
GNU/Linux Time and Localization

Maintaining the correct time and localization settings helps you properly administer a GNU/Linux system and maintain its data integrity.

Note: If you are not familiar with the GNU/Linux command line interface, review the Conventions page before proceeding.


Coordinated Universal Time (UTC)

World clocks run on Coordinated Universal Time (UTC), usually with a fixed offset for the time zone. UTC depends on the International Atomic Time (TAI), which is a weighted average of about three hundred atomic clocks in more than fifty national laboratories throughout the world (atomic clocks synchronize themselves to one another via satellites to a deviation of about 0.1 milliseconds).

One second of UTC is equivalent to one second of TAI.

Leap Seconds

UTC is adjusted to the mean solar time at the meridian of Greenwich, England (also called UT1). The Earth's rotation is slowly decelerating due to influences such as tidal friction and plate tectonics, which makes UT1 seconds slightly longer than those of UTC.

Therefore, UTC must occasionally put in a leap second, so that the atomic UTC and the astronomical UT1 do not drift too far apart. Officially, UTC and UT1 should not differ by more than 0.9 seconds.

Potentially, leap seconds can happen twice a year, usually at the end of the months of June and December. Typically, they are announced six months ahead. Leap seconds are increasingly considered a nuisance. For example, the leap second of June 30th, 2012 triggered a bug in the Linux kernel that, in extreme cases, led to computers completely locking up.

Hardware and System Clocks

Every computer has a battery-operated hardware or Complementary Metal-Oxide-Semiconductor (CMOS) clock that is set via the firmware and continuously runs, even when the computer is off. The Linux kernel uses the hardware clock during system boot to set its internal system clock (also referred to as the kernel clock).

The system clock counts time in consecutive seconds since January 1st, 1970, 00:00:00 UTC (i.e., the UNIX Epoch time). When the system is booted, the current date and time are read from the hardware clock and used to initialize the system clock.
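
As a minimal sketch of what "seconds since the epoch" means, the following converts two second counts into readable UTC dates. It assumes GNU date (for the -d @SECONDS syntax); the epoch_to_utc function name is only illustrative.

```shell
# Convert a count of seconds since the UNIX epoch into a UTC date string.
# Relies on GNU date's -d (--date) option and its @SECONDS syntax.
epoch_to_utc() {
    date -u -d "@$1" +'%Y-%m-%d %H:%M:%S'
}

epoch_to_utc 0            # 1970-01-01 00:00:00 (the epoch itself)
epoch_to_utc 1000000000   # 2001-09-09 01:46:40 (one billion seconds later)
```

Nothing here touches the system clock; the commands only format a number.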

This is done using the hwclock program. The system needs to know whether the hardware clock is set to UTC or the local zone time. The latter may be necessary for the sake of other operating systems on the computer.

The system clock is used for file system time stamps and can be queried with the date command. The hwclock command outputs the hardware clock time. If you require a time representation in your own software, you should not necessarily use the system time, since it is mostly geared towards the internal use of the operating system.


The system clock can be set using the date command. The desired date and time are passed in as an argument:

# date MMDDhhmm[[CC]YY][.ss]

For example, # date 030319012010.30 sets the clock to March 3rd, 2010, 19:01:30. At a minimum, you need to pass the month, day, hour, and minute (e.g., # date 03031901). Use the -u (--utc, --universal) option to set the clock in UTC.

A more intuitive way to set the system clock is to use date's -s ex_string (--set=ex_string) option:

# date -s ex_string

ex_string can be a value like '2021-08-15 12:25:50'.
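
Because -s changes the clock immediately, it can help to check first how a string will be interpreted. The sketch below assumes GNU date, whose -d (--date) option parses a string the same way but only prints the result; preview_date is a hypothetical helper name.

```shell
# Preview how GNU date parses a date string.
# -d only prints the parsed time; it never sets the clock and needs no root rights.
preview_date() {
    date -u -d "$1" +'%Y-%m-%d %H:%M:%S (%s seconds since the epoch)'
}

preview_date '2021-08-15 12:25:50'
# 2021-08-15 12:25:50 (1629030350 seconds since the epoch)
```

The -u option is used so that the string is read as UTC and the result does not depend on the local time zone.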

Different date and time formats can be specified with the + sign and percent encodings. For example:

$ date +'%Y-%m-%dT%H:%M:%S%z'

Characters like T, colon (:), and dashes (-) are displayed as is.

Percent encodings include:

%%
A literal %.
%d
Two-digit day.
%H
Two-digit hour in 24-hour time.
%I
Two-digit hour in 12-hour time.
%j
Three-digit day of the year (001-366).
%m
Two-digit month.
%M
Two-digit minute.
%s
The number of seconds since January 1st, 1970, the UNIX epoch.
%S
Current second.
%U
Two-digit week of the year, with Sunday as the first day of the week (00-53).
%V
Two-digit ISO week number, with Monday as the first day of the week (01-53).
%X
Locale's time representation (e.g., 23:13:48).
%y
Two-digit year.
%Y
Four-digit year.
%z
Time zone offset from UTC.
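
The following sketch applies a few of these encodings to one fixed instant, so that the output is reproducible. It assumes GNU date; the fmt helper and the chosen timestamp (2021-08-15 12:25:50 UTC) are only for illustration.

```shell
# A fixed instant (2021-08-15 12:25:50 UTC) keeps the output reproducible.
t=1629030350

# Format the fixed instant in UTC using the given format string.
fmt() {
    date -u -d "@$t" +"$1"
}

fmt '%Y-%m-%dT%H:%M:%S%z'   # 2021-08-15T12:25:50+0000
fmt '%j'                    # 227 (day of the year)
fmt '%s'                    # 1629030350 (seconds since the epoch)
fmt '100%%'                 # 100% (%% yields a literal percent sign)
```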

You can embed date output such as +'%Y%m%d%H%M%z' in strings or file paths using command substitution (e.g., "/path/to/file/$(date +'%Y%m%d%H%M%z')"). This is useful for adding timestamps to file names.


To set the hardware clock while a system is running, first set the system clock using the date command. Then, transfer the system time to the hardware clock using the -w (--systohc) option:

# hwclock -w

Alternatively, use the --set and --date options to directly set the hardware clock (without the kernel noticing):

# hwclock --set --date=ex_string

Regardless, hwclock tries to store data concerning the systematic deviation of the hardware clock to the /etc/adjtime file. As a rule, hardware clocks are imprecise.

Popular GNU/Linux distributions transfer the system clock's time to the hardware clock on shutdown. This is based on the premise that the system time can be precisely kept via network synchronization methods like the Network Time Protocol (NTP).

Time Zones

The time zone on a GNU/Linux system is not a unique, system-wide setting, but belongs to the inheritable properties of a process. The Linux kernel measures time in seconds since January 1st, 1970, midnight UTC, so when dealing with time zones, it is a question of formatting this number of seconds.

The time zone is set in the /etc/timezone file. This file contains the name of an entry for one of the files in the /usr/share/zoneinfo/ directory (e.g., America/Los_Angeles). Files in /usr/share/zoneinfo/ are not readable text files and contain time zone data like the offset from UTC, the daylight saving time rules, and similar details.

/etc/localtime is a copy of (or symbolic link to) the file in /usr/share/zoneinfo/ that contains the information for the time zone specified in /etc/timezone. It may be prudent to have a copy of this file at /etc/localtime because there are system configurations where /usr/ may be on a separate partition or on a different machine. System time is important enough that it should be correctly available early in the boot process.

You can override a time zone setting by setting the TZ environment variable (e.g., TZ=Asia/Hong_Kong). The TZ variable can be used to describe time zones without having to use the data stored in /usr/share/zoneinfo/. In the simplest case, you can specify the abbreviated name of the desired time zone and the offset from UTC.

The offset must have an explicit sign (+ for time zones west of the prime meridian, - for time zones east of it), followed by a time span in the format HH:MM:SS (the minutes and seconds are optional, and there are currently no time zones with a seconds offset). For example, export TZ=CET-1 would select Central European Time, but without considering daylight saving time (DST).
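
To illustrate that the time zone is a per-process setting and that the sign of the offset is inverted compared to everyday notation, this sketch formats the same instant (the UNIX epoch) under three TZ values. It assumes GNU date; show_epoch_in is a hypothetical helper, and the zone strings are plain abbreviation-plus-offset specifications that do not require /usr/share/zoneinfo/.

```shell
# Format the UNIX epoch under a given TZ value.
# TZ is set only for the single date process started here.
show_epoch_in() {
    TZ="$1" date -d @0 +'%Y-%m-%d %H:%M %Z'
}

show_epoch_in UTC0    # 1970-01-01 00:00 UTC
show_epoch_in CET-1   # 1970-01-01 01:00 CET (one hour east of Greenwich)
show_epoch_in EST5    # 1969-12-31 19:00 EST (five hours west of Greenwich)
```

The shell that runs these commands keeps its own time zone; only the child processes see the modified TZ.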

To specify DST, too, you have to give the name of the DST time zone, its offset from normal time (in case it is not plus one hour), and a rule for switching to and from DST. The DST rule consists of a day specification and an optional time specification (separated by a slash), where the day specification may take one of three forms:

  1. Jn The day number within the year, counted from 1 to 365. February 29th is ignored.
  2. n The day number within the year, counted from 0 to 365. In leap years, February 29th is counted.
  3. Mm.w.d Day (d) of week (w) in month (m). d is between 0 (Sunday) and 6 (Saturday), w is between 1 and 5, where 1 is the first week in the month and 5 is the last one, and m is a value between 1 and 12.
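
As a sketch of the Mm.w.d form, the TZ value below describes a zone that is one hour east of UTC (CET), with a DST zone (CEST) that starts on the last Sunday of March and ends on the last Sunday of October at 03:00. The rule string is an illustrative assumption that approximates Central European rules; GNU date is assumed for -d @SECONDS.

```shell
# CET-1      standard time, one hour east of UTC
# CEST       DST zone name (default offset: one hour ahead of standard time)
# M3.5.0     DST starts: month 3, last week (5), Sunday (0), default time 02:00
# M10.5.0/3  DST ends:   month 10, last week (5), Sunday (0), at 03:00
rule='CET-1CEST,M3.5.0,M10.5.0/3'

TZ="$rule" date -d @1610712000 +'%Y-%m-%d %H:%M %Z'   # 2021-01-15 13:00 CET
TZ="$rule" date -d @1626350400 +'%Y-%m-%d %H:%M %Z'   # 2021-07-15 14:00 CEST
```

Both timestamps are 12:00 UTC on their respective days, so the one-hour difference in the output comes entirely from the DST rule.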

Time zones are named after the most populous city in the region that the time zone applies to. For example, Switzerland is covered by Europe/Zurich, while Russia uses eleven time zones in total, not all of which are listed under Europe.

/usr/share/zoneinfo/ also contains convenience time zones like Poland or Hongkong. Zulu refers to UTC, which is often given with a Z suffix (e.g., 12:00Z), where the North Atlantic Treaty Organization (NATO) phonetic alphabet spells Z as Zulu.

GNU/Linux provides various tools to manage time zone files. The time zone compiler, zic, lets you create your own time zone files and convert them to the format required by the C library. The zdump command outputs a time zone file (or most of its content) in a readable format.

You can output the times in multiple time zones using zdump and watch:

$ ZONES='Asia/Tokyo Europe/Berlin America/New_York'
$ watch -t zdump "${ZONES}"

The watch command repeatedly runs a command, displaying its output and errors (the first screenful), until interrupted with Control+c. By default, the command is run every two seconds. The -t (--no-title) option turns off the header showing the interval, command, and current time at the top of the display, as well as the following blank line.

The manual page for tzset has more details on time zone conversion.

Network Time Protocol (NTP)

Often, it is important for all hosts on a network to use approximately the same system time. Network file systems (e.g., NFS) or authentication infrastructure (e.g., Kerberos) can be disrupted by computers whose time noticeably deviates from that of the server, so stable system operation cannot be assured. It is best to automatically synchronize the clocks of all hosts on the local network as much as possible.

Also, it is sensible to synchronize the clocks of all network computers not only to each other, but to an accurate external time base, like an atomic clock. Likely, you will have to resort to using a time server accessible via the Internet. Publicly available time servers are often operated by universities or Internet Service Providers (ISPs).

The Network Time Protocol (NTP) was designed to address these needs. Details on NTP can be found in RFC1305 or on


Likely, the NTP server software and related tools discussed in this section are not available by default on your GNU/Linux distribution. To properly install the required NTP server package, refer to your GNU/Linux distribution's documentation.


The daemon for time synchronization is ntpd. ntpd can act as a client and communicate via NTP with radio-controlled clocks or time servers. Alternatively, ntpd can act as a server and pass its synchronized time on to other hosts.


ntpd is configured by means of the /etc/ntp.conf file. An example /etc/ntp.conf file could look something like this:

server  # Local clock -- Not a good time source
fudge stratum 10  # Unsynchronised

# Time servers from the public pool
server iburst
server iburst
server iburst

# Miscellaneous
driftfile /var/lib/ntp/ntp.drift
logfile /var/log/ntp

The first server entry relates to the local clock, which is not considered reliable and is only used in emergencies (e.g., if no time server can be reached). The stratum value (i.e., stratum 10) describes the distance of the clock from the official atomic time. A computer that is directly connected to the atomic clock is at stratum 1, and a computer that gets its time from that computer is at stratum 2, and so on.

The levels of the NTP server hierarchy, known as strata, are:

Stratum 0
A reference time source, e.g., the Naval atomic clock.
Stratum 1
NTP servers that get their time from a reference time source, like an atomic clock.
Stratum 2
NTP servers that get their time from stratum 1 servers.
Stratum 3
NTP servers that get their time from stratum 2 servers.
Stratum n
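NTP servers that get their time from stratum n-1 servers. The hierarchy continues down to stratum 15; a stratum value of 16 marks a host as unsynchronized.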

The iburst option on the server lines (e.g., server iburst) ensures that ntpd will quickly acquire the current time when it is starting up.

You can add an entry for a network time provider like this:

server ex_time_server_IP_address_or_DNS_name

A system running ntpd can simultaneously function as both a time consumer (client) and a time provider (server). It assumes that all systems involved are configured to use UTC time.

To reduce the load on public time sources, a very limited number of systems on an internal network should be configured to sync time with the public time provider. For example, a single Stratum 3 server can be configured on a network that gets its time from a public Stratum 2 server on the Internet. Then, all internal hosts can be configured to get their time from that single Stratum 3 server.

Network Time Protocol (NTP) Terms

Step
Large time adjustments.
Slew
Small time adjustments.
Insane Time
The term ntpd uses for a time difference of more than seventeen minutes between the time provider and the time consumer.
Drift
Incidental clock frequency errors. This represents the drift of the motherboard's hardware clock measured in parts per million (PPM). Drift is recorded in the /var/lib/ntp/drift/ directory in a driftfile.
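
The relationship between small adjustments, large adjustments, and insane time can be sketched as a simple classification of the clock offset. The thresholds used here (128 milliseconds for a large adjustment and 1000 seconds, about seventeen minutes, for insane time) are assumptions based on ntpd's commonly documented defaults, and classify_offset_ms is a hypothetical helper, not part of any NTP tool.

```shell
# Classify a clock offset given in whole milliseconds (may be negative).
# Thresholds are assumed ntpd defaults: 128 ms and 1000 s.
classify_offset_ms() {
    ms=${1#-}                          # drop a leading minus sign (absolute value)
    if [ "$ms" -gt 1000000 ]; then     # more than 1000 s, about seventeen minutes
        echo 'insane'
    elif [ "$ms" -gt 128 ]; then       # too large to correct gradually
        echo 'large adjustment'
    else
        echo 'small adjustment'
    fi
}

classify_offset_ms 5         # small adjustment
classify_offset_ms -500      # large adjustment
classify_offset_ms 1200000   # insane
```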

NTP Pools

NTP with many clients can heavily tax a network or a time server. If you do not have direct access to a time server, the best approach is to use the NTP pool. The NTP pool consists of various publicly available time servers that are accessed by clients by means of a DNS round-robin scheme.

This means that an address like points fairly randomly to one of several thousand public time servers anywhere in the world. Since all of these servers roughly provide the same time, this is not an issue for clients. However, for server operators, this means that the load is equally shared, rather than being concentrated on a few time servers because their names are especially well-known.

In practice, you should specify three time servers from the NTP pool. You may get a better quality time by concentrating on geographic partial pools.


The above configuration example helps keep the network load low. If your ISP offers a time server, you can also use that and two time servers from the pool. Newer versions of ntpd support the pool directive, which is optimized for the use of NTP pools. You can specify more than one pool directive (duplicate servers will be removed), but one is usually enough.

If you consider synchronizing the time for a complete network to the NTP pool, you should make one of the computers on your network an NTP server and synchronize only this server to the NTP pool. The other hosts on your network should obtain their time from your local time server.

In this case, you should probably not confine yourself to a single time server. Configure at least two time servers (e.g., ntp1 and ntp2) and point each one to the other with the peer directive (a peer line naming ntp2 on ntp1, and the other way around on ntp2), so that the two time servers can synchronize to each other.

On the clients, use the following configuration:

server iburst
server iburst

The above lets you tolerate losing one time server, but your Internet connection to the external time servers remains a reliability bottleneck. If you want to be sure, you need several independent Internet connections. If you do not have these other connections, it may be cheaper to buy a few GPS receivers.
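
If many clients need the same two-server setup, the client lines can be generated rather than typed by hand. This is only a sketch: ntp_client_conf is a hypothetical helper, and ntp1.example.com and ntp2.example.com are placeholder names that stand in for your own internal time servers.

```shell
# Print one "server NAME iburst" line for each host name given as an argument.
ntp_client_conf() {
    for host in "$@"; do
        printf 'server %s iburst\n' "$host"
    done
}

# Placeholder names; substitute the names of your own time servers.
ntp_client_conf ntp1.example.com ntp2.example.com
```

The output could be reviewed and then appended to /etc/ntp.conf on each client by whatever configuration management method you already use.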

Broadcast Servers

If you have a large local network, the constant synchronization messages of the various ntpds can create a considerable load on the network. In such a situation, it is wise to configure the time server as a broadcast server that periodically sends unsolicited time announcements to network hosts.

With a directive like broadcast, you can turn your ntpd into a broadcast server that will send time announcements to the network. The broadcast server must still get its own time from somewhere. Therefore, you will still need the server or pool directives. On the clients, you should use the broadcastclient directive. server or pool are not required there.

With the broadcast approach, an attacker can easily impersonate a broadcast server in order to distribute spurious time announcements. To avoid this, time announcements should be cryptographically authenticated (this is the default, and must be explicitly deactivated in case it is not desired).

In the simplest case, you can generate a set of symmetric keys using the ntp-keygen command:

# mkdir '/etc/ntp-keys' && cd '/etc/ntp-keys' &&
    ntp-keygen -M
Built against OpenSSL OpenSSL 1.1.1g  21 Apr 2020, using version OpenSSL 1.1.1k  25 Mar 2021
Generating new md5 file and link

ntp-keygen -M generates a /etc/ntp-keys/ntpkey_MD5key_tails.3843293183 key file, as well as a symbolic link ntp.keys in the /etc/ntp-keys/ directory. The file contains ten MD5 keys and ten SHA1 keys that you get to pick from.

The index (left-hand column) of the desired key must then be specified in the /etc/ntp.conf file:

keys /etc/ntp-keys/ntp.keys
broadcast key 1

The key file must also be available on the clients. There, you need to enter the following lines in /etc/ntp.conf:

keys /etc/ntp-keys/ntp.keys
trustedkey 1

The symmetric keys should not be readable by ordinary users. Newer versions of ntpd also support an asymmetric encryption scheme.


The /var/lib/ntp/drift/ntp.drift file is used to store the systematic drift of the hardware clock. ntpd must observe the hardware clock for some time to do so, but then works without constantly referring back to the time servers.

When an adjustment is necessary, you can approximately set the system clock with the # sntp -S ex_ntp_server command and then set the hardware clock with the system clock using the previously mentioned # hwclock -w command.

In situations where intermittent Internet connections are expected, chrony can be used as a replacement for ntpd (by default, GNU/Linux distributions like Fedora use chronyd). For more information on chrony, refer to


The ntpq command can be used to control NTP servers. It supports a number of commands that you can use to talk to NTP servers and query data, or (if you have the appropriate access rights) change their configuration. Commands can be specified on the command line or at an interactive prompt.

# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
 0.debian.pool.n .POOL.          16 p    -   64    0    0.000   +0.000   0.000
 1.debian.pool.n .POOL.          16 p    -   64    0    0.000   +0.000   0.000
 2.debian.pool.n .POOL.          16 p    -   64    0    0.000   +0.000   0.000
 3.debian.pool.n .POOL.          16 p    -   64    0    0.000   +0.000   0.000
-ny-time.gofile.     2 u   35  128  377  150.784   +3.000   2.036
+time.cloudflare       3 u   46  128  377   79.630   +5.018   8.221
-ntp1.ny1.ap.fou     2 u   56  128  377  147.268   +4.395   1.551
#                                 3 u   42  128  377  148.875   +4.301   1.264
                                  2 u   63  128  377   82.415   +5.398   2.452
                 .PPS.            1 u   36  128  377   91.929   +1.319   1.143
+t1.time.gq1.yah     2 u   45  128  377  100.901   +3.979   1.641
+time.richiemcin    2 u   45  128  377  149.135   +3.969   1.401
+2601:603:b7f:fe    2 u   60  128  377  110.596   +5.737   1.547
-t2.time.bf1.yah      2 u   55  128  377  160.067   +5.163   1.552
*ntp5-1.mattnord   2 u  101  128  377  118.654   +4.840   1.006

For example, in the output of a # ntpq -c peers command, there are numerous columns of data:

remote
The (possibly remote) time server.
refid
The source that the time server obtains its time from.
st
The stratum, or the time server's distance from the atomic clock.
t
Specifies which role your host plays with regard to the remote host. Possible values include:

  • u unicast or manycast client
  • b broadcast or multicast client
  • p pool source
  • l local (reference clock)
  • s symmetric (peer)
  • A manycast server
  • B broadcast server
  • M multicast server
when
Denotes the period of time since the last contact with the remote server.
A number without a unit refers to seconds. m, h, and d after the number stand for minutes, hours, and days, respectively.
poll
Specifies the polling interval (in seconds).
reach
Indicates how successful recent queries to the server were. Interpret the value as the octal representation of an 8-bit shift register, where the least significant bit represents the most recent query. Therefore, the value 17 stands for four successful queries (unsuccessful queries are represented as value-zero bits).
delay
Lists the delay, i.e., the round-trip time of queries to the remote host (in milliseconds).
offset
Displays the offset of the server relative to the host (in milliseconds).
jitter
The size of the time discrepancies between two samples (in milliseconds).
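
The reach value is easier to read once it is decoded. The sketch below converts the octal value to a number and counts the bits that are set, i.e., how many of the last eight queries succeeded. reach_successes is a hypothetical helper; it relies on printf accepting a leading 0 as an octal prefix.

```shell
# Count how many of the last eight polls succeeded, given an octal reach value.
reach_successes() {
    bits=$(printf '%d' "0$1")   # a leading 0 makes printf read the value as octal
    count=0
    while [ "$bits" -gt 0 ]; do
        count=$(( count + (bits & 1) ))   # add the lowest bit (1 = success)
        bits=$(( bits >> 1 ))             # move on to the next older poll
    done
    printf '%s\n' "$count"
}

reach_successes 377   # 8 (all of the last eight polls succeeded)
reach_successes 17    # 4 (only the four most recent polls succeeded)
reach_successes 0     # 0 (the server has not been reached)
```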

The first character of each # ntpq -c peers output line denotes the status of the remote host:

(space)
The remote host does not talk to this host, is the host itself, or uses this host as a time source.
x, -
This remote host is ignored because its time does not appear accurate enough.
#
A good remote host, but it is still being ignored (because there are better ones). Qualifies as a stand-in.
+
A good remote host that is being taken into account.
*
Currently the preferred (primary) remote host.

The refid column can assume various values. Likely, you will see either an IP address or one of the common abbreviations:

.LOCL.
Local clock of the unreliable kind (i.e., with a very high stratum value).
.PPS.
The pulse per second. A very accurate time signal, e.g., an atomic clock or a GPS receiver. GPS satellites are considered atomic clocks. A host with a GPS receiver is therefore considered to be on stratum 1.
The maximum inaccuracy can be measured in microseconds per second. The PPS signal only provides a precise sequence of seconds, similar to a metronome. You need to get the actual time from elsewhere.
.WWVB.
The WWVB time signal, which is broadcast from a long-wave radio transmitter in Fort Collins, CO and can be received throughout most of North America. The achievable accuracy depends on how much effort you are willing to invest on the receiving side.
Very simple and cheap receivers like those in popular radio-controlled clocks and watches synchronize to a precision of + or - 0.1 seconds.
.BCST.
The remote end is a network where this host serves as a broadcast server.

Network Time Protocol (NTP) Considerations

Using a time server on the Internet means not having to set up and maintain additional hardware, but there can be noticeable network load. It is also necessary to configure a firewall such that your NTP client (which can then act as a time server on the next-higher stratum inside your network) can contact the time server on the Internet, which means another possible attack vector.

The radio-controlled clock is another peripheral that can develop faults and must be maintained. Fortunately, radio-controlled clocks are very cheap, so you can operate two or three clocks, for redundancy. In a security-minded environment, a radio-controlled clock on a time server inside the demilitarized zone (DMZ) may be the method of choice, since firewall security is not compromised.

By default, the broadcast server sends a datagram containing the current time every 64 seconds. If a freshly started ntpd on a broadcast client receives such a datagram, it waits for a random (brief) interval and then starts a number of direct queries to the broadcast server, in order to set its clock and calibrate the connection.

After that, it only listens for further broadcast datagrams and slows down or speeds up the clock to reconcile any differences. If the running clock on the host differs too much from the time available via NTP, it prefers doing nothing over setting the time with a large jump.

You need to restart ntpd in order to activate the automatic clock setting mechanism:

# systemctl restart ntp (Debian)

# systemctl restart ntpd (Fedora)


If you do not intend to serve NTP to networked clients or connect to local hardware clocks, you can use a simple NTP client called systemd-timesyncd. By default, GNU/Linux distributions like Debian are configured to use systemd-timesyncd.

systemd-timesyncd is a daemon for synchronizing the system clock across the network. It implements a Simple Network Time Protocol (SNTP) client that focuses only on querying time from one remote server and synchronizing the local clock to it.

Time servers are configured in /etc/systemd/timesyncd.conf. Here is an example configuration file from a Debian system:

#  This file is part of systemd.
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
# See timesyncd.conf(5) for details.


Set the local time from one of your NTP servers:

# sntp ex_ntp_server

Finally, activate the NTP service in systemd:

# timedatectl set-ntp true

Time Commands

timedatectl, timedatectl status
Show the current settings of the system clock and hardware clock (RTC), including whether network time synchronization through systemd-timesyncd.service is active. This command shows the system time zone, as well.
# date MMDDhhmm[[CC]YY][.ss]
Set system date and time.
The items in square brackets are optional. The year can be either two or four digits. If the year is not specified, it is assumed to have stayed the same. If the seconds are not given, the value is assumed to be 0.
The minimum information required is the month, day, hour, and minute (e.g., # date 01031204). MM is for month, DD is for day, hh is for hour, and mm is for minute.
date +'%s'
Display the seconds since the UNIX epoch. This can be a useful value for timestamps.
A more comprehensive timestamp value can be displayed with date +'%Y%m%d%H%M%S'.
less /etc/adjtime, timedatectl status | grep 'local'
Determine whether the hardware clock is using UTC or local time.
UTC is preferable on machines running only GNU/Linux. Zone time may be better if the computer is running other operating systems, as well.
# hwclock, # hwclock -r, # hwclock --show, # hwclock --get
Read the hardware clock and print its time to the standard output in the ISO 8601 format. The time is shown in local time, even if the hardware clock is in UTC.
The number of seconds at the end of the output is the time that elapsed between when the command was issued and when the hardware clock was read.
# hwclock -l, # hwclock --localtime, # hwclock -u, # hwclock --utc
Indicate which timescale the hardware clock is set to.
The hardware clock may be configured to use either the UTC or the local timescale, but nothing in the clock says which one is being used. The -l and -u options give this information to the hwclock command.
If you specify neither -l nor -u, the one last given with a set function (e.g., -w, -s), as recorded in /etc/adjtime, is used. If the /etc/adjtime file does not exist, the default is UTC.
# hwclock -w, # hwclock --systohc
Set the hardware clock from the system clock and update the timestamps in /etc/adjtime.
# hwclock -s, # hwclock --hctosys
Set the system clock from the hardware clock.
# timedatectl set-ntp false && timedatectl set-time 'YYYY-MM-DD hh:mm:ss'
Set both the system and hardware clock date and time. Optionally, you can choose to set only the date or only the time.
# adjtimex
Display or set the kernel time variables.
You may need to install this command from your distribution's repository (e.g., # apt install adjtimex). Any user may display time variables, but only the root user can change them.
adjtimex is used for non-ntpd contexts and is used to correct the system time for systematic drift. It gives you raw access to the kernel time variables (e.g., # adjtimex -t 9999 -f 485452, where -t is short for --tick and -f is short for --frequency). To see the current values of tick and frequency, run adjtimex --print.
# sntp -S ex_ntp_server
Set the local clock from the specified synchronized server.
ntpq
Monitor ntpd operations and determine performance. This command can be run interactively or using command line arguments.

ntpq output includes the following columnar data:

remote
The hostname or IP address of the time provider.
refid
The type of reference source.
st
The stratum of the time provider.
t
The role your host plays in regard to the remote host. Possible values include:

  • u unicast or manycast client
  • b broadcast or multicast client
  • p pool source
  • l local (reference clock)
  • s symmetric (peer)
  • A manycast server
  • B broadcast server
  • M multicast server
when
The number of seconds since the last time poll.
poll
The number of seconds between two time polls.
reach
An octal value recording whether or not the time server was reached in each of the last eight polls (377 means that all eight polls succeeded).
delay
The time (in milliseconds) that it took for the time provider to respond to the request.
offset
The time difference between the local system clock and the time provider (in milliseconds).
jitter
The size of the time discrepancies between two samples (in milliseconds).

The peers command can be used to see details about each of the servers defined with the server keyword in /etc/ntp.conf.

The associations command gives you more details about each server, including how well the remote server is performing.

If a server is prefixed with a *, it is the current primary time source. If a server is prefixed with a +, it is a good time source, but not the main server. With any other prefix, the server is not currently considered good for time synchronization, but it continues to be monitored.

# ntptrace ex_ntp_server
Determines where a given NTP server gets its time from and follows the chain of NTP servers back to their master time source. If given no arguments, it starts with localhost.

Internationalization and Localization

Internationalization (or i18n for short, since there are 18 letters between the first and last character of the word) is the preparation of a software system so that localization becomes possible.

Localization (L10n) is the adaptation of a software system to the local customs of different countries or cultural groups. The primary aspect of localization is the language of the user interface, including the messages printed by the system.

Another important aspect is the data that is being processed by the system. Such data may require special character encodings and input facilities. Aspects like notations for dates, times, and currencies, the collating order of alphabetic characters, and other minor details are also covered by localization.

For the Linux kernel, internationalization is not a pressing issue, since there is widespread consensus that the kernel should not be loaded with error messages in multiple languages. The expectation is that anyone who gets to see these messages understands enough English to be able to interpret them.

On the other hand, GNU/Linux distributions contain vast amounts of application software that can benefit from localization. The major distributions are available in a wide variety of localized versions. There are also diverse special GNU/Linux distributions that concentrate on specific cultural groups and attempt to support them well.

While commercial software manufacturers often engage local subsidiaries or paid contractors to do the localization work, the localization of Free/Libre Open Source Software (FLOSS) is mostly done by volunteers.

The most important prerequisite for the internationalization and localization of programs in foreign languages is that the system needs to be able to display the script of the language in question. The traditional character encoding for computers is the American Standard Code for Information Interchange (ASCII).

When taking into account the requirements of foreign languages, ASCII was often found to be inadequate.

American Standard Code for Information Interchange (ASCII)

The American Standard Code for Information Interchange (ASCII) represents 128 different characters, of which 33 (positions 0-31 and position 127) are reserved for control characters (e.g., line feed, horizontal tabulation, and bell). The remaining 95 characters include uppercase and lowercase letters, digits, and a selection of special characters, mostly punctuation marks.
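
To make the numbering concrete, this sketch prints the position of a few printable characters in the ASCII table. It relies on the printf behavior of treating an argument that starts with a quote as the numeric value of the character that follows; ascii_code is a hypothetical helper name.

```shell
# Print the decimal ASCII code of a single character.
# printf treats an argument with a leading quote as that character's numeric value.
ascii_code() {
    printf '%d\n' "'$1"
}

ascii_code A     # 65
ascii_code a     # 97
ascii_code 0     # 48
ascii_code '~'   # 126 (the last printable character, just before position 127)
```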

ISO/IEC 8859

As pressure from international computer users mounted, ASCII gave way to extended character sets that use all 256 possible values of a byte to encode characters.

The most widely used extended character sets are those described in the ISO/IEC 8859 standard, which includes character sets for many different languages. ISO/IEC 8859 actually consists of a set of numbered, separately published parts that are often considered separate standards.

The focus of the ISO/IEC 8859 standard is on information interchange, rather than elegant typography. So, various characters necessary for beautiful output are missing from the encodings.

ISO/IEC 8859 does not address East Asian languages like Chinese or Japanese because the character sets of these languages far exceed the 256 characters that fit into a single ISO/IEC 8859 code table.

Unicode and ISO 10646 (Universal Coded Character Set)

Unicode and ISO 10646 (the Universal Coded Character Set, or UCS) are parallel efforts to create one single character set to cover all alphabets of the world. Initially, both standards were separately developed, but were later merged after the world's software developers eschewed the complexity of ISO 10646.

Today, Unicode and ISO 10646 standardize the same characters with identical codes. The difference between the two is that ISO 10646 is a pure character table (i.e., an extended ISO 8859), while Unicode contains additional rules for details like lexical sorting order, normalization, and bidirectional output. With Unicode, characters also have various extra properties that indicate the ways in which a character can be combined with others.

The ISO 10646 character set contains not just letters, digits, and punctuation marks, but also ideographs (e.g., Chinese and Japanese characters), mathematical characters, and more. Each of these characters is identified by a unique name and an integer that is called a code point.

There are over 1.1 million code points in the character set, of which only the first 65,536 are in common use. These make up the Basic Multilingual Plane (BMP).

Basic Multilingual Plane
"Roadmap to Unicode BMP.svg" by Drmccreedy is in the public domain

Unicode and ISO 10646 code points are written in the form U+0040, where the digits after U+ give the code point as a hexadecimal number (four digits for code points in the BMP, up to six beyond it).
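As a minimal sketch of this notation, the shell's printf can print the code point of an ASCII character. The function name code_point is hypothetical; the leading single quote in the argument is the POSIX printf convention for "use the numeric value of this character". For non-ASCII characters the result depends on the shell and the active locale, so the sketch sticks to ASCII.

```shell
#!/bin/sh
# code_point CHAR: print the code point of an ASCII character as U+XXXX.
# (Hypothetical helper name, for illustration only.)
code_point() {
    # "'c" tells printf to use the numeric value of the character c.
    printf 'U+%04X\n' "'$1"
}

code_point '@'
code_point 'A'
```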

Universal Coded Character Set (UCS-2) and Universal Coded Character Set Transformation Format - 8-bit (UTF-8)

Unicode and ISO 10646 specify code points (i.e., integer numbers for the characters in the character set), but do not specify how to handle these code points. Encodings are defined to explain how to represent the code points inside a computer.

The simplest encoding is the Universal Coded Character Set (UCS-2), in which a single code value between 0 and 65,535 is used for each character, which is represented by two bytes. Therefore, UCS-2 is limited to characters in the BMP. Also, Western-world data, which would otherwise be represented by an 8-bit encoding like ASCII or ISO Latin-1, require twice the storage space, since two bytes are used per character instead of one.
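The storage overhead can be checked with a short sketch like the following. It assumes an iconv that knows the UCS-2LE encoding name (as glibc's does); the explicit little-endian variant is chosen here so that no byte order mark is added to the output.

```shell
#!/bin/sh
# Five ASCII letters: one byte each in ASCII, two bytes each in UCS-2.
text='hello'

# $(( ... )) strips any padding that some wc implementations add.
ascii_bytes=$(( $(printf '%s' "$text" | wc -c) ))
ucs2_bytes=$(( $(printf '%s' "$text" | iconv -f ASCII -t UCS-2LE | wc -c) ))

echo "ASCII: $ascii_bytes bytes, UCS-2: $ucs2_bytes bytes"
```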

The Universal Coded Character Set Transformation Format - 8-bit (UTF-8) encoding is capable of representing any character in ISO 10646, while maintaining backward compatibility with ASCII (but not with ISO-8859-1, whose characters above position 127 take two bytes in UTF-8). It encodes the code points U+0000 through U+10FFFF using one to four bytes, where the ASCII characters occupy a single byte only.

The design goals of UTF-8 are:

ASCII characters represent themselves.
This makes UTF-8 compatible with all programs that deal with byte strings (i.e., arbitrary sequences of 8-bit bytes), but assign special meaning to some ASCII characters.
No first byte appears in the middle of a character.
If one or more complete bytes are lost or mutilated, it is still possible to locate the beginning of the next character.
The first byte of each character determines its number of bytes.
This ensures that a byte sequence representing a specific character cannot be part of a longer sequence representing a different character, and makes it efficient to search strings for substrings at the byte level.
The byte values FE and FF are not used.
These bytes are used at the beginning of UCS-2 texts in order to identify the byte ordering inside of the text. Since these byte values never occur in valid UTF-8, UTF-8 documents and UCS-2 documents cannot be confused.
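The rule that the first byte determines the length of a character can be sketched as a small classifier over byte values. The function name utf8_seq_len is hypothetical, and the sketch only looks at the leading bit pattern; it does not reject the lead bytes C0, C1, and F5 through F7, which a strict UTF-8 decoder also refuses.

```shell
#!/bin/sh
# utf8_seq_len BYTE: given a byte value (0-255, decimal), print the length
# of the UTF-8 sequence it starts, 0 for a continuation byte, or "invalid".
# (Hypothetical helper name, for illustration only.)
utf8_seq_len() {
    b=$1
    if   [ "$b" -lt 128 ]; then echo 1        # 0xxxxxxx: ASCII
    elif [ "$b" -lt 192 ]; then echo 0        # 10xxxxxx: continuation byte
    elif [ "$b" -lt 224 ]; then echo 2        # 110xxxxx: starts 2 bytes
    elif [ "$b" -lt 240 ]; then echo 3        # 1110xxxx: starts 3 bytes
    elif [ "$b" -lt 248 ]; then echo 4        # 11110xxx: starts 4 bytes
    else echo invalid                         # F8-FF, including FE and FF
    fi
}

utf8_seq_len 65     # 'A'
utf8_seq_len 195    # 0xC3, first byte of U+00E4
utf8_seq_len 164    # 0xA4, second byte of U+00E4
utf8_seq_len 254    # 0xFE
```

Because continuation bytes are recognizable on their own, a reader that lands in the middle of a character can skip forward until the function reports a non-zero length.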

At present, UTF-8 is the encoding of choice for representing Unicode data on a GNU/Linux system. Run man 7 utf-8 for more information.


The iconv command converts text from one character encoding to another. In the simplest case, it converts the contents of the files specified on the command line from one character encoding to another.

By default, the result is written to the standard output. The to and from encodings are specified with the -f ex_from_encoding (--from-code=ex_from_encoding) and -t ex_to_encoding (--to-code=ex_to_encoding) options, respectively:

iconv -f ex_from_encoding -t ex_to_encoding ex_file...

If no from encoding is given, the default is derived from the current locale's character encoding. If no to encoding is given, the default is also derived from the current locale's character encoding.

The -o ex_output_file (--output=ex_output_file) option can be used to directly write the output to a file:

iconv -f ex_from_encoding -t ex_to_encoding -o ex_output_file ex_file...

If no input file is given (i.e., ex_file...), iconv reads its standard input.
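As a small sketch of a conversion that reads its standard input, the following turns the UTF-8 text xäy into ISO-8859-1 and dumps the resulting bytes. The input is written with octal escapes so that the encoding of the script file itself does not matter; the variable name latin1_hex is only illustrative.

```shell
#!/bin/sh
# 'x', U+00E4 (two bytes in UTF-8: C3 A4), 'y'
latin1_hex=$(printf 'x\303\244y' |
    iconv -f UTF-8 -t ISO-8859-1 |   # convert standard input
    od -An -tx1 |                    # dump the bytes as hexadecimal
    tr -d ' \n')                     # drop od's spacing

# In ISO-8859-1 the umlaut is the single byte E4.
echo "$latin1_hex"
```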

When iconv encounters invalid input, or a character that cannot be represented in the target encoding, it reports an error and exits. This can be countered by appending the //IGNORE or //TRANSLIT suffixes to the target encoding.

If you append the string //IGNORE to ex_to_encoding, characters that cannot be converted are discarded and an error is printed after the conversion.

If the string //TRANSLIT is appended to ex_to_encoding, characters being converted are transliterated when needed and possible, i.e., when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters outside the target character set that cannot be transliterated are replaced with a question mark in the output.

$ echo 'xäöüy' | iconv -f 'UTF-8' -t 'ASCII//IGNORE'
iconv: illegal input sequence at position 9
$ echo 'xäöüy' | iconv -f 'UTF-8' -t 'ASCII//TRANSLIT'

The -c option can be used to silently drop invalid characters:

$ echo 'xäöüy' | iconv -c -f 'UTF-8' -t 'ASCII'

The -l (--list) option is used to list all known character set encodings:

$ iconv -l | head

Keep in mind that iconv cannot necessarily convert content between every pair of these encodings.


In contrast to iconv, the convmv command converts the encoding of filenames, not file contents:

convmv -f ex_from_encoding -t ex_to_encoding ex_file...

convmv -f ex_from_encoding -t ex_to_encoding ex_directory...

As with iconv, the -f ex_from_encoding and -t ex_to_encoding options are used to specify the from and to encodings, respectively. If a directory is provided instead of a file, convmv converts the name of the directory itself; add the -r option to make convmv go through the directory recursively and convert every filename in the tree. By default, convmv only prints what it would do; the --notest option is required to actually rename the files.

Likely, you will need to install the convmv command from your distribution's repository (e.g., # apt install convmv, # dnf install convmv).


Installed locales can be viewed by using the locale command's -a (--all-locales) option:

$ locale -a | head

Each locale name is made up of two or three components:

  1. Language code (ISO 639)
  2. Country code (ISO 3166)
  3. Encoding (optional)

For example, American English with UTF-8 encoding is the en_US.UTF-8 locale.

The system's locale determines:

  • The language and encoding of text displayed on the screen.
  • Your character sets.
  • The default sort order.
  • The default number format.
  • The default currency format.
  • How the date and time are displayed.

Locale Categories

Your locale settings are determined by the values assigned to the following locale categories:

LANG
Specifies the default locale value for all LC_* locale categories.
LANGUAGE
Overrides LC_MESSAGES.
LC_ADDRESS
Configures the default address format.
LC_ALL
Overrides all other LC_* locale categories.
LC_COLLATE
Configures your sorting rules.
LC_CTYPE
Configures the default character type and encoding.
LC_MEASUREMENT
Configures the default measurement unit.
LC_MESSAGES
Configures natural language messages.
LC_MONETARY
Configures your currency format.
LC_NAME
Configures the default personal name format.
LC_NUMERIC
Configures your number format.
LC_PAPER
Configures your default paper size.
LC_TELEPHONE
Configures the default telephone number format.
LC_TIME
Configures the date and time display.

You can use the -k (--keyword-name) option to determine what the value of a locale category means:

$ locale -k 'LC_PAPER'

The actual definitions that these settings are based on are located in the /usr/share/i18n/locales/ directory. You can create your own locale definition and the localedef program will take care of the work required to implement it into your system (i.e., the command compiles locale definition files).

Locale Precedence

The system uses the following approach to figure out which setting is authoritative:

  1. If LC_ALL is set, its value counts.
  2. If LC_ALL is not set, but the LC_* category for the topic in question is set, its value counts.
  3. If neither LC_ALL nor the appropriate LC_* category is set, but LANG is set, then its value counts.
  4. When none of these are defined, a compiled-in default value is used.
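The lookup order that POSIX specifies (LC_ALL first, then the individual LC_* category, then LANG, then a built-in default) can be sketched as a shell function. The name effective_locale is hypothetical, and the sketch treats an empty variable like an unset one, as the C library does. The locale command performs the real resolution; this only mirrors the logic.

```shell
#!/bin/sh
# effective_locale CATEGORY: print the locale value that wins for CATEGORY
# (e.g., LC_TIME). Hypothetical helper name, for illustration only.
effective_locale() {
    category=$1
    # Indirect lookup: fetch the value of the variable named by $category.
    eval "category_value=\${$category:-}"

    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"            # 1. LC_ALL overrides everything
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"    # 2. the specific LC_* category
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"              # 3. LANG as the general default
    else
        printf '%s\n' 'C'                  # 4. built-in default
    fi
}

# Example in a subshell, so the caller's settings are left alone.
( LC_ALL=''; LC_TIME='C'; LANG='POSIX'; effective_locale LC_TIME )
```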


The LC_CTYPE locale category can be assigned a locale value using the following syntax:

LC_CTYPE=language[_territory][.codeset][@modifier]

The syntax uses these values (the placeholder names follow the glibc documentation):

language
Specifies the ISO 639 language code to be used (specified in lowercase).
territory
Specifies the ISO 3166 country code to be used (specified in uppercase).
codeset
Specifies the character set to be used.
modifier
Specifies other locale attributes, e.g., dialect or currency.
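To make the parts of a locale value concrete, here is a sketch that splits a name of the general shape language[_territory][.codeset][@modifier] using plain shell parameter expansion. The function name parse_locale and the output layout are hypothetical; the part names are the ones used in the glibc documentation.

```shell
#!/bin/sh
# parse_locale NAME: split a locale name into its four parts.
# (Hypothetical helper name, for illustration only.)
parse_locale() {
    name=$1
    modifier='' codeset='' territory=''

    # Peel the optional parts off from right to left.
    case $name in *@*) modifier=${name#*@};  name=${name%%@*} ;; esac
    case $name in *.*) codeset=${name#*.};   name=${name%%.*} ;; esac
    case $name in *_*) territory=${name#*_}; name=${name%%_*} ;; esac
    language=$name

    printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
        "$language" "$territory" "$codeset" "$modifier"
}

parse_locale 'de_DE.UTF-8'
parse_locale 'de_DE@euro'
```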


With LC_ALL, you can simultaneously set all locale parameters.


A LANG value of C or POSIX (which are equivalent) describes the built-in default that programs use if they cannot find another valid setting. This is useful if you want a program to deliver a predictable output format.

$ LANG=de_DE.UTF-8 ls -l '/bin/ls'
-rwxr-xr-x. 1 root root 137912  8. Jun 03:26 /bin/ls
$ LANG=ja_JP.UTF-8 ls -l '/bin/ls'
-rwxr-xr-x. 1 root root 137912  6月  8 03:26 /bin/ls
$ LANG=fi_FI.UTF-8 ls -l '/bin/ls'
-rwxr-xr-x. 1 root root 137912  8. 6. 03:26 /bin/ls
$ LANG=C ls -l '/bin/ls'
-rwxr-xr-x. 1 root root 137912 Jun  8 03:26 /bin/ls

Above, four different LANG settings yield four different date stamps. Depending on the language setting, the date stamp can appear to programs like awk or cut -d' ' to consist of one, two, or three fields, which is fatal if a script is designed to parse this output. In such ambiguous cases, it is best to pin programs whose output depends on the language setting to a standard that will definitely exist (i.e., use an explicit LANG=C, or LC_ALL=C if other LC_* variables may be set in the environment).
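A script that needs a stable format might pin the locale for just the one command it parses, as in this sketch. It assumes GNU date (for the -d '@0' option, which selects the start of the UNIX Epoch) and uses LC_ALL=C so that no other locale variable in the environment can interfere.

```shell
#!/bin/sh
# Pin the locale for this one command only; the rest of the script and
# the user's session keep their own settings.
stamp=$(LC_ALL=C date -u -d '@0' '+%b %d %Y')

# With the C locale, the month abbreviation is always English and the
# stamp always has exactly three space-separated fields.
month=${stamp%% *}
echo "stamp: $stamp"
echo "month: $month"
```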

The system language is not really a property of the entire system, but a parameter of each individual session. In the normal flow of operations, the login shell or graphical desktop environment is initialized with a specific language setting, and the subprocesses of this shell inherit this setting, just as they inherit the current working directory and resource limits.

The controlling factor for the language of a session is the value of the LANG locale category. In the simplest case, it consists of three parts:

  1. A language code according to ISO 639
  2. An underscore character
  3. A country code as per ISO 3166

The country code is important because usage may differ between two countries even though they share the same language in principle (e.g., German in Germany and in Austria).

Some extensions may follow this plain specification, such as a character encoding (separated by a period) or a variant (separated by a @). The following are all valid values:

de_DE.ISO-8859-15     German as used in Germany, according to ISO Latin-9
de_AT.UTF-8           Austrian German, Unicode/UTF-8-based
de_DE@euro            German as used in Germany, including the Euro sign (ISO Latin-9)

Here is an example of how different LANG settings affect the output:

$ for i in 'en_US' 'de_DE' 'de_AT' 'fi_FI' 'fr_FR'; do
>   LANG="$i.UTF-8" date +'%B %Y'
> done
August 2021
August 2021
August 2021
elokuu 2021
août 2021

The example above presupposes that the system in question provides support for the given language. For example, Debian lets you pick which settings should be supported and which should not. If you select a setting that your system does not support, the system falls back to a built-in default, which is usually English.

The value of the LANG locale category influences not only the interface language, but also the entire cultural setup of a GNU/Linux system.

This includes:

  • Time and date formatting
  • Number and currency formatting: Often used by programs that make use of the printf() and scanf() C functions. Other programs have to query the variable themselves and accordingly format their output.
  • Character classification: Whether a character counts as a letter, a special character, or something else depends on the language.
  • Character collating order: In German dictionaries and similar publications, for example, umlauts (i.e., vowels with diacritical marks, like ä) are considered equivalent to their base characters, while in name lists, like phone number directories, umlauts are sorted according to their transliteration (e.g., ä as ae).

    Usually, dictionaries go by the structure of the ideographs and their numbers of strokes when collating ideograph-based languages (e.g., Japanese and Chinese), while computers sort according to Latin transliterations.


The LANGUAGE locale category is only evaluated by programs that use the GNU gettext infrastructure to translate their messages into different languages. The most obvious difference between LANGUAGE and LANG is that LANGUAGE allows you to enumerate multiple languages (separated by colons).

This lets you specify a list, like so:

LANGUAGE=en:fr:de

The above means English, or else French, or else German. The first language for which a program actually provides messages is the one used. LANGUAGE is preferred over LANG, at least for programs that use GNU gettext.

Localization Commands

timedatectl list-timezones
List available time zones, one per line.
zic ex_file...
Read text from ex_file... and create the time conversion information files specified in this input. If ex_file is -, the standard input is read.
zdump ex_time_zone...
Print the current time in each ex_time_zone zone file.
# timedatectl set-timezone ex_time_zone
Set the time zone to the specified value. Available time zone files can be displayed with the timedatectl list-timezones command.
iconv -l, iconv --list
List all known character set encodings.
iconv -f ex_from_encoding -t ex_to_encoding ex_file...
iconv --from-code=ex_from_encoding --to-code=ex_to_encoding ex_file...
Convert text from one character encoding to another. Output is sent to the standard output.
The -o ex_output_file (--output=ex_output_file) option can be used to directly write the output to a file, e.g., iconv -f ex_from_encoding -t ex_to_encoding -o ex_output_file ex_file....
Invalid character input errors can be addressed by appending the //IGNORE or //TRANSLIT suffixes to the target encoding (e.g., -t 'ASCII//IGNORE', -t 'ASCII//TRANSLIT'). Alternatively, you can tell iconv to silently drop invalid characters with the -c option.
convmv -f ex_from_encoding -t ex_to_encoding ex_file...,
convmv -f ex_from_encoding -t ex_to_encoding ex_directory...
Convert filenames from one character encoding to another.
If a directory is provided instead of a file, convmv converts the name of the directory itself; add the -r option to make convmv go through the directory recursively. By default, convmv only prints what it would do; the --notest option is required to actually rename the files.
locale -a
Display a list of all available locales.
locale -k ex_locale_keyword_or_category,
locale --keyword-name ex_locale_keyword_or_category
For each keyword whose value is being displayed, also include the name of that keyword, so that the output has the format keyword=value.
localedef -f ex_charmap_file -i ex_input_file ex_output_path,
localedef --charmap=ex_charmap_file --inputfile=ex_input_file ex_output_path
Read the indicated charmap and input files, compile them to a binary form, and place them in the output path.


You can find more information on the commands discussed above by examining the Linux User's Manual, either at the command line or online.
