Localization, internationalization
In the context of computers:
- Internationalization (/internationalisation): the part of software design that lets it easily be used in various languages and locales.
- Localization (/localisation): adapting software to a specific locale, and the part of the OS environment that defines locale-specific handling of certain details, including the formatting of numbers, dates, and money amounts.
- Globalization: term used by e.g. IBM, referring to both.
To save a lot of typing:
- Internationalization is also known as i18n
- Localization is also known as L10n
- similarly, accessibility is abbreviated to a11y, and canonicalization to c14n
See also:
- http://en.wikipedia.org/wiki/Internationalization_and_localization
- http://en.wikipedia.org/wiki/Locale
Unsorted
Linux locale setting
To see the active locale settings, use
locale
This should show something like:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
To see the locales that are installed, use:
locale -a
which typically shows anywhere between a few and a few dozen locales. I removed most locales from my system, so I only have a few:
C
en_GB
en_GB.iso88591
en_GB.utf8
en_US
en_US.iso88591
en_US.utf8
POSIX
Say I want to use en_US.utf8. (Note: copy-paste the name you want verbatim from that list. The spelling differs a little between OSes, e.g. en_US.utf8 versus en_US.UTF-8.)
I would probably edit login settings (the best way to do this varies per system) to include something like:
export LC_ALL="en_US.utf8"
export LANG="en_US.utf8"
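Since the spelling of locale names differs between systems, it can help to check a name against `locale -a` before exporting it. A minimal sketch, assuming a POSIX shell; `pick_locale` is a hypothetical helper name (not a standard command), and it falls back to C when none of the candidates is installed:

```shell
# pick_locale: print the first candidate that `locale -a` lists, else C.
# (hypothetical helper, for illustration only)
pick_locale() {
    for candidate in "$@"; do
        # -x matches whole lines only, -F treats the name as a fixed string
        if locale -a 2>/dev/null | grep -qxF -- "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf '%s\n' C
}

# Try both common spellings, then use whichever exists on this system.
chosen=$(pick_locale en_US.utf8 en_US.UTF-8)
export LANG="$chosen"
```

This only sets LANG; whether to also set LC_ALL, as in the lines above, is a separate choice.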
So what does the above actually do?
Specific programs and functions that look for these variables will do formatting and parsing in a specific, more localized way.
More central:
LC_TIME - date and time formatting: the names of weekdays and months, and the order of the date fields
- but various libraries only use some of this (e.g. strftime lets you control the order yourself, though it does pick up the names)
LC_CTYPE - character classification: which characters are letters, punctuation, etc., plus case conversion, transliteration, and the character encoding
- in practice: text utilities may parse text slightly differently, e.g. the way they look for word boundaries.
LC_MESSAGES - language to use for messages
LC_NUMERIC - number formatting, e.g. whether . or , is the decimal separator, and which is the thousands separator
- in practice: a lot of things don't use this, so you'll probably see a mix. (verify)
LC_COLLATE - sort order. In practice it is often more bother than it is useful
- it's less flexible than it needs to be for many real-world collation rules
- in some locales, ranges like [A-Z] in regexes also match lowercase letters, which can break scripts
And relatively rarely used stuff:
LC_MONETARY - how to format money amounts
- few things use this
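LC_PAPER - paper size (height and width in mm)
LC_NAME - formats for personal names and salutations
LC_ADDRESS - formats for postal addresses and other location-related items
LC_TELEPHONE - formats for telephone numbers
LC_MEASUREMENT - measurement system: 1 for metric, 2 for US
LC_IDENTIFICATION - metadata about the locale definition itself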
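To see what a category actually defines, the `locale` command can also print individual keywords. A small sketch, using the C locale because it should be present everywhere; `decimal_point` is a POSIX keyword name, and the exact set of keywords listed per category varies between systems:

```shell
# Print a single keyword's value for a given locale (here: C).
c_decimal=$(LC_ALL=C locale decimal_point)
echo "decimal separator in C: $c_decimal"

# -k prefixes each value with its keyword name;
# passing a category name lists all keywords in that category.
LC_ALL=C locale -k LC_NUMERIC
```

Replacing C with another installed locale name shows how that locale differs.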
Additionally there are LANG and LC_ALL. Neither needs to be set if the categories above are, because:
- LC_ALL forces the locale for all categories, overriding both the individual LC_* variables and LANG
- and seems meant to more easily force consistent behaviour in scripts
- LANG sets the default locale for all categories, and setting any individual category overrides it
- so this is a convenience thing, and not directly picked up (verify)
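The lookup order described above (LC_ALL first, then the category's own variable, then LANG) can be sketched as a small function. This is an illustration of the precedence only, not how any library implements it; `resolve_locale` is a hypothetical name, empty values are treated as unset, and the final fallback to C is an assumption (POSIX leaves that default to the implementation):

```shell
# resolve_locale LC_ALL_VALUE CATEGORY_VALUE LANG_VALUE
# Prints the locale name that would apply to one category.
# (hypothetical helper; takes the values as arguments so that
# nothing in the real environment is changed)
resolve_locale() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"      # LC_ALL wins over everything
    elif [ -n "${2:-}" ]; then
        printf '%s\n' "$2"      # then the category variable, e.g. LC_NUMERIC
    elif [ -n "${3:-}" ]; then
        printf '%s\n' "$3"      # then LANG as the default
    else
        printf '%s\n' C         # assumed fallback when nothing is set
    fi
}

resolve_locale ""  "de_DE.UTF-8" "en_US.UTF-8"   # prints de_DE.UTF-8
resolve_locale "C" "de_DE.UTF-8" "en_US.UTF-8"   # prints C
resolve_locale ""  ""            "en_US.UTF-8"   # prints en_US.UTF-8
```

To apply it to the current environment, something like `resolve_locale "${LC_ALL:-}" "${LC_NUMERIC:-}" "${LANG:-}"` should work.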
Other notes:
- there is often a POSIX and a C locale.
- C is meant to be a simple, non-interpreting locale, useful to force collation to be bytewise, to have characters always be single bytes (most other locales are UTF-8 these days), and to force the decimal separator to be . (e.g. useful for some shell arithmetic in scripts)
- POSIX is similar. In some cases it is an alias of C[1], in other cases it is a separate definition that differs in a few details, apparently such as having no explicit definition for sorting non-ASCII bytes, while still being effectively the same as C (verify)
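Two of the effects of the C locale mentioned above can be seen by forcing it for a single command. A minimal sketch, assuming a POSIX shell with the usual `sort`, `paste` and `printf`; prefixing one command with LC_ALL=C leaves the rest of the session untouched:

```shell
# Bytewise collation: in C, uppercase ASCII letters sort before lowercase ones.
sorted=$(printf '%s\n' b A a B | LC_ALL=C sort | paste -sd ' ' -)
echo "$sorted"       # A B a b

# Decimal separator: in C, numbers are formatted with a '.'.
formatted=$(LC_ALL=C printf '%.2f' 3.14159)
echo "$formatted"    # 3.14
```

Under many other locales the same sort would interleave upper and lower case, and the same printf could print a comma.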
https://sourceware.org/glibc/wiki/Locales#Locale_File_Format
https://unix.stackexchange.com/questions/87745/what-does-lc-all-c-do