
Re: etc/HELLO markup etc.

From: Paul Eggert
Subject: Re: etc/HELLO markup etc.
Date: Sat, 22 Dec 2018 11:41:05 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

Eli Zaretskii wrote:

> If Han unification is the only important user of the charset property,
> then yes, we could remove the rest of the charset info from HELLO.

Yes, that's the case.
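Han unification is why a charset or language annotation can still carry information. Unicode assigns the Japanese and Chinese forms of the same ideograph a single code point. Once text is decoded, the characters alone no longer record which national charset they came from, so a font cannot tell from them which regional glyph style to use. The following is a minimal sketch of that loss. It assumes CPython's built-in `euc_jp` and `gb2312` codecs, and Python is used only for illustration; nothing in this thread uses it.

```python
import unicodedata

# The same two kanji, encoded in a Japanese and a Chinese legacy charset.
jp_bytes = "日本".encode("euc_jp")   # EUC-JP (based on JIS X 0208)
cn_bytes = "日本".encode("gb2312")   # GB 2312

# Decode each back to Unicode.
jp_text = jp_bytes.decode("euc_jp")
cn_text = cn_bytes.decode("gb2312")

print(jp_bytes != cn_bytes)  # True: the legacy byte sequences differ
print(jp_text == cn_text)    # True: both map to the same unified code points

# Nothing in the decoded characters says "Japanese" or "Chinese".
for ch in jp_text:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+65E5 CJK UNIFIED IDEOGRAPH-65E5
# U+672C CJK UNIFIED IDEOGRAPH-672C
```

Any Japanese/Chinese distinction therefore has to live outside the characters, for example in a text property or in the surrounding text.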

> the current HELLO just keeps the information
> that was there before recoding it in UTF-8, nothing was added.

Sure, but the non-Han markup is merely a relic of that file's old method of encoding, which avoided Unicode and instead used ISO 2022 escape sequences to switch among various 8- and 16-bit encodings, as that was the only way to show text in (say) Russian under the constraints of the old method. The non-Han markup is completely unnecessary now that the file uses UTF-8. (The Han markup probably isn't needed either, though I also would like Handa's opinion on that.)
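The following minimal Python sketch illustrates the difference between the two encoding methods. ISO 2022 is stateful: escape sequences switch the active character set mid-stream. UTF-8 needs no such switching. The sketch assumes CPython's built-in `iso2022_jp` codec. Japanese stands in for Russian here only because CPython ships an ISO 2022 codec for it. The helper `escape_sequences` is hypothetical, written just for this example.

```python
ESC = b"\x1b"

def escape_sequences(data: bytes) -> list[bytes]:
    """Hypothetical helper: return each 3-byte ESC sequence in data.

    Three bytes suffice for the designators ISO-2022-JP uses,
    e.g. ESC $ B (JIS X 0208) and ESC ( B (ASCII).
    """
    seqs = []
    i = data.find(ESC)
    while i != -1:
        seqs.append(data[i:i + 3])
        i = data.find(ESC, i + 3)
    return seqs

text = "Hello 日本語"

iso = text.encode("iso2022_jp")  # stateful: switches charsets via escapes
utf8 = text.encode("utf-8")      # stateless: every character self-describing

print(escape_sequences(iso))   # [b'\x1b$B', b'\x1b(B']
print(escape_sequences(utf8))  # []

# Both round-trip to the same text; only ISO 2022 needed charset switching.
assert iso.decode("iso2022_jp") == utf8.decode("utf-8") == text
```

Under the old method, knowing which charset was active was part of reading the file. In UTF-8 the code points carry that information themselves.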

>> Although the etc/HELLO markup might be of interest to those who care about
>> annotating languages in the text, it's irrelevant to the ordinary purpose of
>> that file, which is to show textual translations of "Hello"

> That's not the original purpose of that file.  The purpose is to show
> scripts, not languages, and to show how we display different scripts
> in the same buffer.

OK, but either way the non-Han markup is irrelevant to the ordinary purpose of the file.

>> It's still not a good user interface, though, as it is difficult to see the
>> markup's effect when visiting etc/HELLO in the usual way

> If the usual way is via find-file and its ilk, then you should see the
> same results as with "C-h h", so I'm not sure I understand what you
> mean here.

I meant that one cannot see the markup's effect when visiting the file with either C-h h or find-file in the usual way. It's useless markup.

> In what way is most of what you say not applicable to etc/enriched.txt
> in general?

Other forms of enriched-text markup are typically easily visible. If I visit etc/enriched.txt I can easily see which parts are marked white on blue background, which parts are marked italic, etc. Invisible enriched-text markup is much harder to deal with when editing an enriched-text file.

>> the file is not a good showroom for how to maintain multilingual

> What other facilities are you aware of or can suggest for showing
> multilingual text with such level of detail and precision?

In practice the most common and often the best way to deal with the situation is to do what the non-markup part of etc/HELLO is already doing: indicate within the text itself what language or script is being used, to help the reader who may be unacquainted with them, and with enough punctuation within the text so that the reader can easily see what's going on. This technique has been used for centuries, it's by far the most popular technique in common practice today, and it suffices for this particular application (with the possible exception of its Chinese and Japanese text).

>> It's not a good sign that there seem to be errors in the
>> possibly-useful (i.e., CJ) markup that nobody has noticed since the
>> markup was introduced in May, and that I noticed these errors now
>> only because I was visiting the file literally.

> Which errors?  I don't think we discovered any errors.

Yes, and that's the point! The approach we're taking is not good for dealing with the situation.

One example of such an error is that "日本語" has no charset properties even though it's obviously intended to use a Japanese script (since it follows the word "Japanese"). I'm sure there are others.
