Localization, internationalization


This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In the context of computers:

  • internationalization (/internationalisation): designing software so that it can easily be used in various languages and locales.
  • localization (/localisation): adapting to one specific locale. Refers to the wish, the implementation, and the part of the OS environment that defines locale-specific handling of certain details, including the formatting of numbers, dates, and money amounts.
  • globalization: term used by IBM, referring to both.


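To save a lot of typing, these are often abbreviated, with the number counting the letters left out between the first and the last:

  • internationalization is also known as i18n
  • localization is also known as L10n
  • similarly, accessibility is seen as a11y, and canonicalization as c14n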




Unsorted

Linux locale setting


To see the active locale settings, use

locale

This should show something like:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=


To see the locales that are installed, use:

locale -a

which typically shows anywhere between a few and a few dozen locales. I removed most locales from my system, so I only have a few:

C
en_GB
en_GB.iso88591
en_GB.utf8
en_US
en_US.iso88591
en_US.utf8
POSIX


Say I want to use en_US.utf8. (Note: copy-paste the name you want verbatim from that list. The spelling differs a little between OSes, e.g. en_US.utf8 versus en_US.UTF-8.)


I would probably edit login settings (the best way to do this varies per system) to include something like:

export LC_ALL="en_US.utf8"
export LANG="en_US.utf8"
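Since a mistyped or uninstalled name can lead to warnings rather than a working setup, it may help to check the name against locale -a before exporting it. A minimal sketch, assuming a POSIX shell with grep; locale_installed and wanted are names made up for this illustration:

```shell
# Succeed if the given locale name appears verbatim in `locale -a`.
locale_installed() {
    locale -a 2>/dev/null | grep -qxF -- "$1"
}

wanted="en_US.utf8"   # assumption: replace with a name copied from `locale -a`
if locale_installed "$wanted"; then
    export LANG="$wanted"
else
    echo "locale $wanted is not installed; leaving LANG as it is" >&2
fi
```

The match is deliberately exact (-x and -F), in line with the advice to copy the name verbatim.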


So what does the above actually do?

Programs and library functions that look at these variables will do formatting and parsing (of dates, numbers, messages, sort order and such) in a more localized way. Things that do not look at them are unaffected.
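As a small illustration, date is one program that looks at these variables. The sketch below assumes a POSIX shell; long_date is a made-up helper name, and nl_NL.utf8 is just an example that only has an effect if that locale happens to be installed:

```shell
# Print today's date using weekday and month names from the given locale.
# The format string fixes the field order; only the names are localized.
long_date() {
    LC_ALL=$1 date '+%A %d %B %Y'
}

long_date C            # English names, since the C locale has no translations
long_date nl_NL.utf8   # assumption: differs from the above only if installed
```

Setting the variable in front of a single command like this leaves the rest of the shell session untouched.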


More central:

LC_TIME - how to show the names of weekdays and months, and some of the field ordering

but various libraries only use part of this (e.g. strftime lets you control the order yourself, though it does pick up the localized names)

LC_CTYPE - helps define which characters are letters, punctuation, and so on; also upper/lowercase mapping, transliteration, etc.

in practice: text utilities may parse text slightly differently, e.g. in the way they look for word boundaries.

LC_MESSAGES - language to use for messages

LC_NUMERIC - e.g. whether . or , is used as the decimal separator, and which as the thousands separator

in practice: a lot of things don't use this, so you'll probably see a mix. (verify)

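LC_COLLATE - sorting order. Actually more bother than it is useful:

it's less flexible than it needs to be for many real-world collation rules
in some locales it may break things like [A-Z] regexes, because the range can follow the locale's ordering rather than ASCII order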
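A common way around the collation surprises is to force the C locale for just the command that sorts or matches. A minimal sketch, assuming POSIX sort and grep; sort_bytewise and count_upper are made-up helper names:

```shell
# Sort stdin bytewise, ignoring whatever LC_COLLATE the user has set.
sort_bytewise() {
    LC_ALL=C sort
}

# Count input lines containing an ASCII capital, with [A-Z] meaning exactly that.
count_upper() {
    LC_ALL=C grep -c '[A-Z]'
}

printf '%s\n' b A a B | sort_bytewise   # A, B, a, b: capitals first in byte order
printf '%s\n' abc ABC | count_upper     # 1: only the ABC line matches
```

In other locales the same input may well sort as a, A, b, B or similar, which is the difference the override removes.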


And relatively rarely used stuff:

LC_MONETARY - how to format money amounts

few things use this

LC_PAPER - paper size (height and width, in mm)

LC_NAME

LC_ADDRESS

LC_TELEPHONE

LC_MEASUREMENT - measurement system: 1 for metric, 2 for US

LC_IDENTIFICATION
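To see what a category actually contains, the locale command can print its keywords and values. A minimal sketch, assuming a locale command that supports -k (glibc's does); show_category is a made-up helper name:

```shell
# Print the keywords and values that one locale defines for one category.
# Arguments: locale name, category name (e.g. LC_NUMERIC).
show_category() {
    LC_ALL=$1 locale -k "$2"
}

show_category C LC_NUMERIC       # includes decimal_point="."
show_category C LC_MEASUREMENT   # shows the measurement value for that locale
```

Swapping C for a name from locale -a shows how that locale differs; if the name is not installed, expect a warning and C-like output instead.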


Additionally there are LANG and LC_ALL. Neither needs to be set if the above are, because:

  • LC_ALL forces the locale for all categories, overriding both the individual LC_ variables and LANG
and seems meant to more easily force consistent behaviour in scripts
  • LANG sets the default locale for all categories - and setting any individual category overrides it for that category
so this is a convenience thing, and not directly picked up (verify)
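That precedence can be written out as a small shell function. This is a sketch of the lookup order only, not of what any particular libc does internally; effective_locale is a made-up name, and the final fallback to C is an assumption that matches glibc's default:

```shell
# Print the locale name that applies to one category, following
# LC_ALL > LC_<category> > LANG > "C". Empty values count as unset.
effective_locale() {
    category=$1                      # e.g. LC_TIME
    eval "specific=\${$category-}"   # value of that category's variable, if any
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'                 # assumed default when nothing is set
    fi
}

effective_locale LC_TIME   # prints whatever the current environment implies
```

The locale command without arguments reports much the same thing; values it shows in quotes are implied by LANG or LC_ALL rather than set directly.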


Other notes:

  • there is often a POSIX and a C locale.
    • C is meant to be a simple, non-interpreting locale, useful to force collation to be bytewise, to have characters always be bytes (most other locales are UTF-8 these days), and to force the decimal separator to be . (e.g. useful for some shell arithmetic in scripts)
    • POSIX is similar. In some cases it is an alias of C[1]; in other cases it is a definition that differs in a few details (apparently, for example, no explicit definition for sorting non-ASCII bytes) but is still effectively the same as C (verify)
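For the decimal separator case, a script can pin the C locale around just the formatting step. A minimal sketch, assuming a shell whose printf supports %f (bash and dash do); fmt_fixed is a made-up helper name:

```shell
# Format a number with two decimals and a . separator, whatever the
# caller's locale is. The subshell keeps the LC_ALL change from leaking.
fmt_fixed() {
    ( LC_ALL=C; export LC_ALL; printf '%.2f\n' "$1" )
}

fmt_fixed 3.5   # 3.50
```

Without the override, a shell running in a locale that uses , as the decimal separator may print 3,50, or reject 3.5 as an invalid number.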



https://unix.stackexchange.com/questions/149111/what-should-i-set-my-locale-to-and-what-are-the-implications-of-doing-so/149129#149129

https://sourceware.org/glibc/wiki/Locales#Locale_File_Format

https://unix.stackexchange.com/questions/87745/what-does-lc-all-c-do




Manpath: can't set the locale; make sure $LC_* and $LANG are correct

https://unix.stackexchange.com/questions/269159/problem-of-cant-set-locale-make-sure-lc-and-lang-are-correct