lynx-dev

Re: lynx-dev hyphenation (was tech. question: translating strings)


From: Vlad Harchev
Subject: Re: lynx-dev hyphenation (was tech. question: translating strings)
Date: Tue, 7 Sep 1999 14:58:49 +0500 (SAMST)

On Mon, 6 Sep 1999, Klaus Weide wrote:

> >[...] 
> > - is this correct?). Pattern matching is implemented as finite-state machine
> > in libhnj (the transitions are calculated when reading hydict). Apparently, if
> > two languages use different keycodes, it's possible to concatenate hydicts to
>                               ^^^^^^^^
> > get the hyrules that will hyphenate two languages at the same time - 
> 
> Why are you talking about keycodes?  That doesn't seem to make much sense,
> what kind of "keycodes" do you mean?  X11 keycodes?  Is the program bound
> to *that*?  (I hope not.) 

 Of course, I incorrectly used "keycodes" where I meant "character codes" (but
you could have guessed that).
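
 For readers unfamiliar with the pattern matching mentioned in the quoted text,
here is a minimal sketch of how TeX-style (Liang) hyphenation patterns are
applied to a word. It is an illustration only: libhnj uses a finite-state
machine built when the hydict is read, whereas this sketch swaps in plain
dictionary lookups over all substrings. The names `parse_pattern` and
`hyphenate` and the toy patterns are made up for the example and come from no
real dictionary.

```python
import re

def parse_pattern(pat):
    """Split a pattern such as 'a1b' into letters 'ab' and weights [0, 1, 0].

    weights[k] is the weight of the position just before letter k.
    """
    letters = re.sub(r"\d", "", pat)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights

def hyphenate(word, patterns):
    """Insert '-' wherever the highest matching weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."      # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)       # scores[j]: weight before padded[j]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is not None:
                for k, w in enumerate(weights):
                    scores[start + k] = max(scores[start + k], w)
    pieces, last = [], 0
    for i in range(1, len(word)):
        # word[i] is padded[i + 1], so the break before it is scores[i + 1]
        if scores[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)

# Toy patterns, invented for this example.
print(hyphenate("cabin", ["a1b"]))          # ca-bin
print(hyphenate("cabin", ["a1b", "2bi"]))   # cabin (the even weight 2 wins)
```

 Odd weights permit a break and even weights forbid one, which is how a
pattern set encodes exceptions.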
 
> That sentence would make more sense to me if you replaced "keycode" with
> "character", with the understanding that "character" is meant in the
> ISO10646/Unicode sense.

 I don't know what ISO10646 means.
 
> >                                                                       so I'm
> > afraid English phrases like StarDivision will be hyphenated incorrectly if
> > the hydict for French is loaded, since AFAIK French and English both use the
> > Latin-1 encoding (at least the keycodes of both languages are not disjoint).
> 
> Well of course phrases in one language will be hyphenated incorrectly if
> patterns for a different language are used.  That should be no surprise!
> (The only way around wrong hyphenation of proper names etc. from another
> language that I can think of would be to have specific exceptions, in your
> case in the French patterns.)

 Yes.
 
> The fact that the dictionary allows you to combine patterns for two (or
> more) languages IF AND ONLY IF their letter repertoires are completely
> non-overlapping should be viewed as a hack that can be useful in some
> situations, nothing more.  You can use one set of (combined) patterns for
> Russian+English, but it won't work for Russian+German or Russian+Ukrainian
> (assuming the two languages need different patterns), and you cannot use
> the same combined set for Ukrainian+English (under the same assumption) or

 I don't understand: why "the same"? (Taken literally, the answer is trivial.)
It seems you want to stress that language information must be associated with
the hyrules; otherwise I don't see what your conclusion is. If you mean that
the language the hyrules apply to must be supplied along with them, then of
course there will be a comment in the hyrules saying which language they are
for, and it is the user's responsibility to notice, understand, and remember
that, and not to select German hyrules for English texts. From here on I assume
the user is not brain-damaged. I don't want to build foolproof protection into
the hyrules machinery (in any case, that can't be achieved with Unicode in any
of its incarnations, at least not for German and English).

> for German+English or for Russian+German.
> 
> In fact, you cannot express that last one at all, not to mention

 Do you really mean the "last one", i.e. R+G, not the G+E?
 I don't know which letter codes German uses (IMO only a few non-Latin
characters, not more than 5).

> combinations like Russian+Greek, unless you go to transform and apply the
> patterns in some representation of UCS (since no 8-bit charset I know has
> all the necessary letters combined).  That combining effect is only useful
> for X+English (and one or two other languages of all human languages in
> place of English), since only for English (and those other one or two) is
> the necessary character repertoire completely part of the common 8-bit
> charsets.  (Well not even that; if the hyphenation patterns remain bound
 
 I assume that the hyphenation rules for, say, Russian don't contain any Latin
letters. So I don't understand what "X+English" you are talking about.

> to a specific charset, you cannot put patterns for the correct spelling
> of such useful words as naïve or blasé or brassière in a combined Russian+
 
 It's obvious that E+G hyphenation rules won't work together correctly even if
written in Unicode, since there is no separate "capital English letter F" and
"capital German letter F".

 I'm seeing these words for the first time; maybe they are not so useful :)?
What do they mean?
 Leaving out these words won't hurt the hyphenation of the entire E+R document
too much.
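
 To illustrate what "bound to a specific charset" means for words such as
naïve, blasé, and brassière, here is a small sketch. It assumes the words are
written as Latin-1 bytes and read back as KOI8-R; KOI8-R is only an assumed
example of an 8-bit Cyrillic charset, since the thread does not name one. The
helper name `reinterpret` is made up for the example.

```python
# The same byte values stand for different letters in different 8-bit
# charsets, so patterns stored as raw bytes only make sense in one charset.
words = ["naïve", "blasé", "brassière"]

def reinterpret(word, written_as="latin-1", read_as="koi8_r"):
    """Encode in one 8-bit charset, then decode the same bytes in another."""
    return word.encode(written_as).decode(read_as)

garbled = [reinterpret(w) for w in words]
for original, seen in zip(words, garbled):
    print(original, "->", seen)
```

 The ASCII letters survive unchanged; only the accented letters, whose byte
values lie above 127, turn into Cyrillic capitals.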

> English dictionary.)
>[...] 
 And IMO you haven't seen the TeX (and libhnj) hyphenation rules for Russian:
they don't contain any Latin characters (even though all Latin characters are
present in every Cyrillic encoding). The same goes for any other language: the
hyrules are built by scanning a lot of text in that language.
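
 The condition under which two sets of hyrules can simply be concatenated (no
letter in common) can be checked mechanically. A minimal sketch, assuming the
patterns are already decoded to Unicode strings; the function names and the
toy patterns are invented for the example and are not real hyrules:

```python
def letters(patterns):
    """Collect the letters used in a pattern set, ignoring digits and '.'."""
    return {ch.lower() for pat in patterns for ch in pat if ch.isalpha()}

def can_concatenate(patterns_a, patterns_b):
    """True only if the two pattern sets share no letter at all."""
    return letters(patterns_a).isdisjoint(letters(patterns_b))

# Toy patterns, made up for illustration.
english = ["a1b", ".ex1", "4ing."]
french = ["1ba", "é1c"]
russian = ["1бя", "ж1ы"]

print(can_concatenate(english, russian))  # the two sets share no letters
print(can_concatenate(english, french))   # both sets use Latin letters
```

 Such a check only says that the combined patterns will not interfere with
each other; it says nothing about whether either set is right for a given text.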

 Please state the conclusion of all your text that follows my "Yes". I can't
deduce it (unless it's just "use Unicode!").

 Best regards,
  -Vlad

