Yoshinori K. Okuji
Sun, 1 Aug 2004 00:34:30 +0200
One of the goals of GRUB 2 is internationalization (i18n). By i18n, I
mean the following:
- Messages can be translated
- Non-ASCII characters can be used in config files
- Non-ASCII characters can be displayed
- Non-ASCII characters can be used in file names
The last one might not be important, because you normally don't use
non-ASCII characters in the names of OS image files (consider
/boot/vmlinuz). But I think filesystems should at least be able to list
file names with non-ASCII characters, so that the user can see some
useful information on the screen when she types "ls".
The first one should be implemented in the same way as gettext, but it
has not been implemented yet.
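For illustration, here is a minimal sketch of a gettext-style lookup in C, assuming an in-memory catalog. The names `struct msg_entry`, `set_catalog` and `translate` are hypothetical and not part of GRUB; real gettext reads its catalogs from binary .mo files and looks messages up more efficiently than this linear search. As in gettext, the original message serves as the key, and it is returned unchanged when no translation exists.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: the original (English) message is the key,
   the translation is a UTF-8 string.  */
struct msg_entry
{
  const char *msgid;
  const char *msgstr;
};

/* Hypothetical global catalog; empty until set_catalog () is called.  */
static const struct msg_entry *current_catalog;
static size_t current_catalog_size;

void
set_catalog (const struct msg_entry *entries, size_t n)
{
  current_catalog = entries;
  current_catalog_size = n;
}

/* Like gettext (): return the translation of MSGID, or MSGID itself
   when the catalog has no entry for it.  Linear search for brevity.  */
const char *
translate (const char *msgid)
{
  for (size_t i = 0; i < current_catalog_size; i++)
    if (strcmp (current_catalog[i].msgid, msgid) == 0)
      return current_catalog[i].msgstr;
  return msgid;
}

/* The usual gettext convention: mark translatable strings with _().  */
#define _(s) translate (s)
```

Falling back to the untranslated message means a missing or incomplete catalog never leaves the user without any message at all.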
The third one already works in my test environment. I tested it with
Japanese, and it worked fine.
When you want i18n, you need to decide on a character set and its
encodings. For now, I have chosen Unicode as the standard character set,
with UTF-8 and UCS-4 as the encodings. I think this is a good idea,
because Unicode makes things a bit simpler than mixing many character
sets (as ISO-2022-JP-2 does). UTF-8 is used in most places, because it
is backward compatible with ASCII (every ASCII string is also a valid
UTF-8 string). UCS-4 is used only in the console code at the moment.
This makes the implementation of a console device easier, because each
character can be represented in a fixed size (this is not completely
true, because of ligatures).
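The console-side conversion from UTF-8 to UCS-4 can be sketched as follows. This is a minimal decoder written for this note, not GRUB's code, and the function name `utf8_to_ucs4` is hypothetical. It replaces invalid or truncated sequences with U+FFFD, and for brevity it does not reject overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GRUB's actual code): decode IN_LEN bytes of
   UTF-8 from IN into UCS-4 code points in OUT.  Invalid or truncated
   sequences become U+FFFD; overlong forms and surrogates are accepted
   for brevity.  Returns the number of code points written, which is
   never more than OUT_MAX.  */
size_t
utf8_to_ucs4 (const unsigned char *in, size_t in_len,
              uint32_t *out, size_t out_max)
{
  size_t i = 0, n = 0;

  while (i < in_len && n < out_max)
    {
      unsigned char c = in[i++];
      uint32_t code;
      size_t extra, k;

      if (c < 0x80)                    /* 0xxxxxxx: ASCII, 1 byte */
        {
          out[n++] = c;
          continue;
        }
      else if ((c & 0xE0) == 0xC0)     /* 110xxxxx: 2-byte sequence */
        {
          code = c & 0x1F;
          extra = 1;
        }
      else if ((c & 0xF0) == 0xE0)     /* 1110xxxx: 3-byte sequence */
        {
          code = c & 0x0F;
          extra = 2;
        }
      else if ((c & 0xF8) == 0xF0)     /* 11110xxx: 4-byte sequence */
        {
          code = c & 0x07;
          extra = 3;
        }
      else                             /* stray continuation or bad lead byte */
        {
          out[n++] = 0xFFFD;
          continue;
        }

      /* Append 6 payload bits from each continuation byte (10xxxxxx).  */
      for (k = 0; k < extra; k++)
        {
          if (i >= in_len || (in[i] & 0xC0) != 0x80)
            break;
          code = (code << 6) | (in[i] & 0x3F);
          i++;
        }

      /* A sequence cut short yields U+FFFD; the offending byte is
         left in place and decoded on the next iteration.  */
      out[n++] = (k < extra) ? 0xFFFD : code;
    }
  return n;
}
```

Once a line is in UCS-4, the console can index it as an array of fixed-size cells, which is exactly the simplification described above.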
Therefore, you must assume that strings in your code are encoded in
UTF-8. And you must not assume that the length of a string (in bytes)
is equal to its width on the screen. For example, many European
characters take 2 bytes in UTF-8, but they are displayed as 1-column
characters. So you must carefully distinguish between the length and
the column width.
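A minimal sketch of the three different measures of a string, assuming valid UTF-8 input: the byte length (`strlen`), the number of characters, and the column width. The helpers `utf8_char_count`, `ucs4_width` and `utf8_columns` are hypothetical, and `ucs4_width` is only a rough approximation that treats a few East Asian ranges as double-width; a real width table (such as the one behind `wcwidth` in many C libraries) also handles combining marks, control characters and many more ranges.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count characters (code points) by counting the
   bytes that are not continuation bytes (10xxxxxx).  Assumes valid UTF-8.  */
size_t
utf8_char_count (const char *s)
{
  size_t count = 0;
  for (const unsigned char *p = (const unsigned char *) s; *p; p++)
    if ((*p & 0xC0) != 0x80)
      count++;
  return count;
}

/* Hypothetical, very rough column width of one code point: 2 for a few
   common East Asian wide ranges, 1 for everything else.  */
static int
ucs4_width (uint32_t c)
{
  if ((c >= 0x1100 && c <= 0x115F)                   /* Hangul Jamo */
      || (c >= 0x2E80 && c <= 0xA4CF && c != 0x303F) /* CJK, kana, ... */
      || (c >= 0xAC00 && c <= 0xD7A3)                /* Hangul syllables */
      || (c >= 0xF900 && c <= 0xFAFF)                /* CJK compatibility */
      || (c >= 0xFF00 && c <= 0xFF60)                /* fullwidth forms */
      || (c >= 0xFFE0 && c <= 0xFFE6))
    return 2;
  return 1;
}

/* Hypothetical helper: decode each code point and sum the column widths.
   Assumes valid UTF-8; the inner loop stops at the terminating NUL.  */
size_t
utf8_columns (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t cols = 0;

  while (*p)
    {
      uint32_t c;
      int extra;

      if (*p < 0x80)
        { c = *p; extra = 0; }
      else if ((*p & 0xE0) == 0xC0)
        { c = *p & 0x1F; extra = 1; }
      else if ((*p & 0xF0) == 0xE0)
        { c = *p & 0x0F; extra = 2; }
      else
        { c = *p & 0x07; extra = 3; }
      p++;
      while (extra-- > 0 && (*p & 0xC0) == 0x80)
        c = (c << 6) | (*p++ & 0x3F);
      cols += ucs4_width (c);
    }
  return cols;
}
```

For "café" the three values (bytes, characters, columns) are 5, 4 and 4; for the two-character Japanese string "日本" they are 6, 2 and 4.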
In reality, what does the user want to do? I guess she wants to:
- See messages in her own language
- See titles in the menu in her own language
- Write comments in config files in her own language
So I'd like to assume that config files are written in UTF-8. Maybe we
can support other encodings, but this requires either some heuristic
methods or an explicit declaration of the encoding used. I feel that
this is too much, since people normally do not use UCS-2 or UTF-7 in
text files. But some people may want to use a "legacy" character set,
such as ISO-8859-1 or EUC-JP.
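If legacy encodings ever had to be handled, one simple heuristic is to check whether a config file is well-formed UTF-8. The sketch below, with the hypothetical name `is_valid_utf8`, does that, rejecting overlong forms, UTF-16 surrogates and code points above U+10FFFF. It is only a heuristic: ISO-8859-1 or EUC-JP text containing non-ASCII bytes usually fails the check, but some legacy byte sequences happen to be valid UTF-8, and pure ASCII passes (which is fine, since ASCII is also UTF-8).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return true if BUF[0..LEN) is well-formed UTF-8.
   Rejects overlong forms, surrogates (U+D800..U+DFFF) and code points
   above U+10FFFF.  Used here only as a heuristic to spot legacy text.  */
bool
is_valid_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      uint32_t code, min;
      size_t extra;

      if (c < 0x80)
        {
          i++;                   /* ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { code = c & 0x1F; extra = 1; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { code = c & 0x0F; extra = 2; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { code = c & 0x07; extra = 3; min = 0x10000; }
      else
        return false;            /* continuation byte or 0xF8..0xFF as lead */

      if (len - i <= extra)
        return false;            /* sequence truncated by end of buffer */

      for (size_t k = 1; k <= extra; k++)
        {
          if ((buf[i + k] & 0xC0) != 0x80)
            return false;        /* expected a continuation byte */
          code = (code << 6) | (buf[i + k] & 0x3F);
        }

      /* Smallest value for this length, Unicode range, no surrogates.  */
      if (code < min || code > 0x10FFFF
          || (code >= 0xD800 && code <= 0xDFFF))
        return false;

      i += extra + 1;
    }
  return true;
}
```

A loader could fall back to an error or warning when this returns false, instead of guessing which legacy encoding was meant.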
I'm not sure whether it is a good idea to translate error messages from
commands. This would probably be a bad idea from the developers' point
of view, but it is useful for ordinary users. More thought is needed.
Do you have any ideas or questions?