Re: etc/HELLO markup etc.
Sat, 22 Dec 2018 11:41:05 -0800
Eli Zaretskii wrote:
> If Han unification is the only important user of the charset property,
> then yes, we could remove the rest of the charset info from HELLO.
Yes, that's the case.
> the current HELLO just keeps the information
> that was there before recoding it in UTF-8, nothing was added.
Sure, but the non-Han markup is merely a relic of that file's old method of
encoding, which avoided Unicode and instead used ISO 2022 escape sequences to
switch among various 8- and 16-bit encodings, as that was the only way to show
text in (say) Russian under the constraints of the old method. The non-Han
markup is completely unnecessary now that the file uses UTF-8. (The Han markup
probably isn't needed either, though I also would like Handa's opinion on that.)
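
To make the contrast between the two encoding methods concrete, here is a small Python sketch. It assumes Python's standard iso2022_jp codec as a stand-in for the ISO 2022 family; that is not the coding system etc/HELLO actually used, and the sample string is made up for illustration.

```python
# Illustrative only: Python's iso2022_jp codec stands in for the ISO 2022
# family; it is not the coding system that etc/HELLO used.
ESC = b"\x1b"


def escape_count(data: bytes) -> int:
    """Count ESC bytes, each of which starts a charset-switching sequence."""
    return data.count(ESC)


sample = "Hello こんにちは"  # hypothetical sample text

iso2022 = sample.encode("iso2022_jp")
utf8 = sample.encode("utf-8")

# The ISO 2022 stream has to switch character sets around the kana,
# whereas the UTF-8 stream contains no escape sequences at all.
print(iso2022)
print(escape_count(iso2022), escape_count(utf8))
```

With the old method, something had to record which character set each run of text was in. With UTF-8 the bytes alone identify every character, which is why that bookkeeping is no longer needed for non-Han text.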
>> Although the etc/HELLO markup might be of interest to those who care about
>> annotating languages in the text, it's irrelevant to the ordinary purpose of
>> that file, which is to show textual translations of "Hello"
> That's not the original purpose of that file. The purpose is to show
> scripts, not languages, and to show how we display different scripts
> in the same buffer.
OK, but either way the non-Han markup is irrelevant to the ordinary purpose of
>> It's still not a good user interface, though, as it is difficult to see the
>> markup's effect when visiting etc/HELLO in the usual way
> If the usual way is via find-file and its ilk, then you should see the
> same results as with "C-h h", so I'm not sure I understand what you
I meant that one cannot see the markup's effect when visiting the file with
either C-h h or find-file in the usual way. It's useless markup.
> In what way most of what you say is not applicable to etc/enriched.txt
Other forms of enriched-text markup are typically easily visible. If I visit
etc/enriched.txt I can easily see which parts are marked white on blue
background, which parts are marked italic, etc. Invisible enriched-text markup
is much harder to deal with when editing an enriched-text file.
>> the file is not a good showroom for how to maintain multilingual
> What other facilities are you aware of or can suggest for showing
> multilingual text with such level of detail and precision?
In practice the most common and often the best way to deal with the situation is
to do what the non-markup part of etc/HELLO is already doing: indicate within
the text itself what language or script is being used, to help the reader who
may be unacquainted with them, and with enough punctuation within the text so
that the reader can easily see what's going on. This technique has been used for
centuries; it's by far the most popular technique in common practice today, and
it suffices for this particular application (with the possible exception of its
Chinese and Japanese text).
>> It's not a good sign that there seem to be errors in the
>> possibly-useful (i.e., CJ) markup that nobody has noticed since the
>> markup was introduced in May, and that I noticed these errors now
>> only because I was visiting the file literally.
> Which errors? I don't think we discovered any errors.
Yes, and that's the point! The approach we're taking is not good for dealing
with the situation.
One example of such an error is that "日本語" has no charset properties even
though it's obviously intended to use a Japanese script (since it follows the
word "Japanese"). I'm sure there are others.
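
As a rough illustration of why the missing property goes unnoticed, the Python sketch below lists the code points of that string. The helper name describe is made up for this example; only the standard unicodedata module is used.

```python
import unicodedata


# Illustrative only: plain Unicode text carries no language tag, so the
# same unified code points serve Chinese and Japanese text alike.
def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe("日本語"):
    print(line)
```

Every character comes back as a CJK UNIFIED IDEOGRAPH, with nothing in the text itself saying "Japanese". Any such hint has to come from outside the characters, so when it is absent the text still displays and the omission is easy to miss.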