From: Klaus Weide
Subject: Re: lynx-dev Tweaking HTML.c to insert characters (was: UTF-8 display questions)
Date: Fri, 9 Jun 2000 11:31:59 -0500 (CDT)

[ For other readers, or if you read this message in the lynx-dev
archives: this message is in charset iso-8859-3.  You may have
to set the "assumed character set" to that, and maybe change the
Raw/CJK setting to OFF, in order to make sense of it, if reading
with lynx.  It'll make even more sense with Display character set
set to "RFC 1345 Mnemonic" if you can't display the character 'ĉ',
that is a LATIN SMALL LETTER C WITH CIRCUMFLEX, directly... ]
[ Well actually, reading this in the archives adds another level
of complication.  It seems that the archiving software recognizes
iso-8859-3 and converts characters to HTML character entities, but
sometimes gets it wrong. There is an invalid "&ccirce" mysteriously
appearing in
<http://www.flora.org/lynx-dev/html/month062000/msg00152.html>. ]

On 9 Jun 2000, Sergei Pokrovsky wrote:

> Well, your piece works well if "display character set" is set to
> ASCII; it ASCIIzes as expected.  But it fails in the trivial case, when
> "display character set" is UTF-8; then the fallback branch is taken :)

> I normally have "UTF-8" for the display character set in my cfg; but
> it doesn't work.

I hope you have seen my followup message on that by now, with the
correction; and I hope that *that* will work as expected. :)
Sorry for the non-working code.

>   Klaus> character, or else return the replacement string normally
>   Klaus> from the def7_uni.tbl file.
> 
>   Klaus> it goes.
> 
> That's my report.  BTW, the usual ASCIIzation of the esperantic
> letters in Latin-3 (and Unicode) is by adding x for the hat or breve;

Is this the only scheme in use?  I remember vaguely that I read
about some alternatives, something using 'h'.

> so that the usual test phrase which contains all the accented letters
> of Esperanto,
> 
> Eĥoŝanĝo ĉiuĵaŭde       (i.e. "Echo change on every Thursday")
> 
> becomes
> 
> Ehxosxangxo cxiujxauxde
> 
> (the letter x is not a member of Esperanto alphabet, and besides, x is
> at the end of the alphabet, so that the ASCIIzed words sort well).
> Lynx simply drops the hats (Ehosango ciujaude), kaj "ŝanĝo" (change)
> becomes "sango" (blood)  ;)

You can change these in src/chrtrans/def7_uni.tbl.  For example, for the
LATIN SMALL LETTER U WITH BREVE, take U+016d out of the line which reads

   0x75  U+0169  U+016b  U+016d  U+016f  U+0173  # u

and add another line somewhere

   U+016d:ux
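
[ A minimal sketch, not part of Lynx: the same kind of line could be
generated for all the accented Esperanto letters.  It assumes only the
"U+xxxx:string" form shown above; the names ESPERANTO_BASE, x_system and
tbl_lines are made up for illustration.  Each code point would still have
to be taken out by hand from whatever existing line currently maps it. ]

```python
# Accented Esperanto letters mapped to their unaccented ASCII base letter.
ESPERANTO_BASE = {
    "ĉ": "c", "ĝ": "g", "ĥ": "h", "ĵ": "j", "ŝ": "s", "ŭ": "u",
    "Ĉ": "C", "Ĝ": "G", "Ĥ": "H", "Ĵ": "J", "Ŝ": "S", "Ŭ": "U",
}


def x_system(text):
    """Replace each accented Esperanto letter with its base letter plus 'x'."""
    return "".join(
        ESPERANTO_BASE[ch] + "x" if ch in ESPERANTO_BASE else ch
        for ch in text
    )


def tbl_lines():
    """Yield one 'U+xxxx:string' replacement line per accented letter."""
    for ch, base in ESPERANTO_BASE.items():
        yield "U+%04x:%sx" % (ord(ch), base)


if __name__ == "__main__":
    for line in tbl_lines():
        print(line)
```

Here x_system("Eĥoŝanĝo ĉiuĵaŭde") gives "Ehxosxangxo cxiujxauxde", the
form quoted above, so the generated lines can be checked against the test
phrase before editing the table.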


Much of the existing def7_uni.tbl is from me, and it wasn't meant
to be the definitive transliteration.  A lot of it is ad hoc and
can be improved; it's just that not many people have shown interest.
If you make these changes, and think they are of general use, please
send patches.

There is a potential problem, in that those strings are not language-
or locale-specific.  So ŝ -> sx may be right for Esperanto, but not
for some other language that also uses that character.  (Maybe ŝ -> sh,
or whatever the alternative is, would be better there?)
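
[ To make the problem concrete, here is a hypothetical sketch of what a
scheme-specific lookup might look like.  Lynx does not work this way; the
table keys "x" and "h" and the function transliterate are assumptions for
illustration only.  An unknown scheme falls back to simply dropping the
accent, which is the behaviour complained about above. ]

```python
import unicodedata

# Hypothetical per-scheme tables (lowercase letters only, for brevity).
SCHEMES = {
    "x": {"ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux"},
    # In the 'h' convention the breve on u is usually just dropped.
    "h": {"ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u"},
}


def strip_accents(text):
    """Fallback: decompose and drop combining marks (loses the hats)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def transliterate(text, scheme=None):
    """Use the table for the given scheme, else drop accents."""
    table = SCHEMES.get(scheme)
    if table is None:
        return strip_accents(text)
    # Letters missing from the table pass through the accent-dropping fallback.
    return "".join(table.get(ch, strip_accents(ch)) for ch in text)
```

With this, transliterate("ŝanĝo", "x") gives "sxangxo" and
transliterate("ŝanĝo", "h") gives "shangho", while transliterate("ŝanĝo")
with no scheme gives "sango".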

   Klaus



