
Starting with localization #3997

Open
@tertsdiepraam


TL;DR: I want to add a new util for locale generation and provide locale-aware functionality in uucore.


uutils currently follows the C locale for most of its operations and mostly ignores the system's locale settings. This has led to issues and PRs like these:

We've mostly been putting this off due to missing libraries in Rust, but this recently changed with the release of icu4x. It covers many of the things we need, like locale-aware datetime formatting and locale-aware collation.

However, it requires data to operate on, which is different from the usual data generated by locale-gen and friends (if I understand correctly). There are essentially two viable ways to include data with icu4x [1]:

  1. Store a blob on the filesystem to read at runtime (BlobDataProvider).
  2. Encode the data as Rust code included in the binary (BakedDataProvider).
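With option 1, each util has to locate and read the blob at runtime. The following is a minimal, std-only sketch of that lookup. The environment variable `UUTILS_LOCALE_DIR`, the default directory and the file name `icu_data.postcard` are hypothetical placeholders, since the location is still an open question. The bytes it returns would then be handed to icu4x's `BlobDataProvider`, which is not shown here.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical environment variable that overrides the data directory.
const DATA_DIR_VAR: &str = "UUTILS_LOCALE_DIR";
/// Hypothetical default location; the real directory is still undecided.
const DEFAULT_DATA_DIR: &str = "/usr/share/uutils/locale";
/// Hypothetical name of the blob written by the proposed locale-gen util.
const BLOB_FILE: &str = "icu_data.postcard";

/// Pick the directory holding the generated locale data: the override
/// variable wins if it is set and non-empty, otherwise use the default.
/// `lookup` abstracts over the environment so this can be tested.
fn data_dir<F>(lookup: F) -> PathBuf
where
    F: Fn(&str) -> Option<String>,
{
    match lookup(DATA_DIR_VAR) {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(DEFAULT_DATA_DIR),
    }
}

/// Read the blob at runtime. A missing file is reported as `Ok(None)` so
/// callers can fall back to C-locale behaviour instead of failing.
fn load_blob(dir: &Path) -> io::Result<Option<Vec<u8>>> {
    match std::fs::read(dir.join(BLOB_FILE)) {
        Ok(bytes) => Ok(Some(bytes)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let dir = data_dir(|var| std::env::var(var).ok());
    match load_blob(&dir)? {
        // In a real implementation these bytes would go to BlobDataProvider.
        Some(bytes) => println!("loaded {} bytes of locale data", bytes.len()),
        None => println!("no locale data in {}; using the C locale", dir.display()),
    }
    Ok(())
}
```

Treating a missing blob as "no data" rather than an error keeps the utils usable on systems where the optional locale-gen step was never run.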

Since we don't know up front which locales we might need, I think we need to use the BlobDataProvider and allow users to generate their own locale data on demand. So, I propose we do the following:

  1. Add a new util, called locale-gen or something similar
    • This util downloads and stores the locale data in a global directory (I'm not sure where, could also be controlled by an environment variable).
    • This util would be a wrapper around the icu_datagen crate [2].
    • It could also read from system config files and install any necessary locales based on the system config automatically.
    • Since this util needs access to the internet, we will run into issues similar to those we had with uudoc back when it automatically downloaded examples, so it needs to be optional [3].
  2. Create locale-aware functionality in uucore as much as possible, so that the utils themselves don't have to bother with checking the right environment variables, loading the icu data, etc.
    • For example, to determine the collation locale, the LC_ALL, LC_COLLATE and LANG env vars need to be checked (in that order of precedence).
    • For the utils, we then just expose a sort/collate function that checks (and caches) the locale and performs the correct collation.
  3. Change the utils to use the locale-aware functions provided by uucore.
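To make point 2 more concrete, here is a minimal sketch of how uucore could resolve and cache the collation locale. It follows the POSIX precedence (`LC_ALL`, then `LC_COLLATE`, then `LANG`, with empty values treated as unset and `C` as the fallback). The helper names `resolve_locale` and `collation_locale` are hypothetical, and the actual icu4x-based collation is deliberately left out.

```rust
use std::sync::OnceLock;

/// Resolve the locale for one category using POSIX precedence:
/// LC_ALL overrides the category variable (e.g. LC_COLLATE), which
/// overrides LANG. Empty values count as unset; the fallback is "C".
/// `lookup` abstracts over the environment so the logic is testable.
fn resolve_locale<F>(category: &str, lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    for var in ["LC_ALL", category, "LANG"] {
        if let Some(value) = lookup(var) {
            if !value.is_empty() {
                return value;
            }
        }
    }
    "C".to_string()
}

/// Hypothetical uucore helper: read the collation locale from the
/// process environment once and cache it for the rest of the run.
fn collation_locale() -> &'static str {
    static LOCALE: OnceLock<String> = OnceLock::new();
    LOCALE.get_or_init(|| resolve_locale("LC_COLLATE", |var| std::env::var(var).ok()))
}

fn main() {
    println!("collation locale: {}", collation_locale());
}
```

Utils would only call something like `collation_locale()` (or a collate function built on top of it) and never touch the environment variables themselves.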

Do you see any problems with this approach? Are there alternatives we should explore first?

Footnotes

  1. They also have FsDataProvider which is meant for development only.

  2. This crate also has a CLI, but we need to tailor it for use with coreutils, by setting nicer defaults for our purpose.

  3. icu_datagen uses reqwest, which will lead to similar problems as in https://github.com/uutils/coreutils/pull/3184
