From: Sergei Pokrovsky
Subject: Re: lynx-dev Tweaking HTML.c to insert characters (was: UTF-8 display questions)
Date: 09 Jun 2000 17:01:03 +0700
User-agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.6

>>>>> Klaus Weide writes:

  Klaus> On 8 Jun 2000, Sergei Pokrovsky wrote:
  >> There's another question about lynx massaging.  I've changed the
  >> superscript rendition to my liking, but there are some other
  >> similar changes I'd like to make.  One is the DFN element.  I'd
  >> like to implement it with quotes in lynx.  The problem is that
  >> I'd like to have curly quotes if available.  Is it possible to
  >> specify Unicode quotes _and_ to be able to have a translation if
  >> they are missing in the current font/charset?  (For the time
  >> being I've put _underscore_ there to mark DFN.)

  Klaus> There is no precedent for this kind of thing in HTML.c.  The
  Klaus> text that HTML.c functions see is generally already supposed
  Klaus> to be in the final (display character set) representation, so
  Klaus> you'd have to convert to that.  I would try something like
  Klaus> the following, under 'case HTML_DFN:' in HTML_start_element
  Klaus> (and something equivalent in HTML_end_element):

Well, your piece works well if "display character set" is set to
ASCII; it ASCIIzes as expected.  But it fails in the trivial case, when
"display character set" is UTF-8; then the fallback branch is taken :)

  Klaus> UCTransUniCharStr will just UTF-8-encode the Unicode value if
  Klaus> current_char_set says UTF-8, or return the correct byte if
  Klaus> we're in one of the windows-* D.C.S. that has this quote
  Klaus> character, or else return the replacement string normally
  Klaus> from the def7_uni.tbl file.

I normally have "UTF-8" for the display character set in my cfg; but
it doesn't work.
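
The behaviour expected in the UTF-8 case can be sketched outside Lynx as
follows.  This is only an illustration under stated assumptions: the
names utf8_encode and dfn_open_quote are hypothetical and not part of
Lynx, and the plain ASCII quote used as the fallback stands in for
whatever replacement string the translation tables would supply.

```c
#include <stddef.h>

/* Hypothetical helper, not Lynx code: encode one Unicode code point as
 * a NUL-terminated UTF-8 string.  `out` must hold at least 5 bytes.
 * Returns the number of bytes written (excluding the NUL), 0 on error. */
size_t utf8_encode(unsigned long cp, char *out)
{
    size_t n;

    if (cp < 0x80UL) {
        out[0] = (char) cp;
        n = 1;
    } else if (cp < 0x800UL) {
        out[0] = (char) (0xC0 | (cp >> 6));
        out[1] = (char) (0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000UL) {
        out[0] = (char) (0xE0 | (cp >> 12));
        out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char) (0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFFUL) {
        out[0] = (char) (0xF0 | (cp >> 18));
        out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char) (0x80 | (cp & 0x3F));
        n = 4;
    } else {
        n = 0;                  /* outside the Unicode range */
    }
    out[n] = '\0';
    return n;
}

/* Hypothetical: pick the opening quote for DFN.  With a UTF-8 display
 * the curly quote U+201C is encoded directly; otherwise (or if the
 * buffer is too small) an ASCII double quote is the assumed fallback. */
const char *dfn_open_quote(int display_is_utf8, char *buf, size_t buflen)
{
    if (display_is_utf8 && buflen >= 5 && utf8_encode(0x201CUL, buf) > 0)
        return buf;
    return "\"";
}
```

With a UTF-8 display this yields the three bytes E2 80 9C; the point of
the report above is that the fallback branch is reached instead.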

  Klaus> This is completely untested; if you use it, let us know how
  Klaus> it goes.

That's my report.  BTW, the usual ASCIIzation of the accented Esperanto
letters in Latin-3 (and Unicode) is by adding x for the hat or breve;
so that the usual test phrase which contains all the accented letters
of Esperanto,

Eĥoŝanĝo ĉiuĵaŭde       (i.e. "Echo change on every Thursday")

becomes

Ehxosxangxo cxiujxauxde

(the letter x is not a member of the Esperanto alphabet, and besides, x
is at the end of the alphabet, so that the ASCIIzed words sort well).
Lynx simply drops the hats (Ehosango ciujaude), and "ŝanĝo" (change)
becomes "sango" (blood)  ;)
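
The x convention described above is easy to express as a lookup table.
The following is a minimal standalone sketch, assuming UTF-8 input; the
names esperanto_x and esperanto_to_x are hypothetical and this is not
how Lynx's replacement tables are implemented.

```c
#include <stddef.h>
#include <string.h>

/* One accented Esperanto letter (as 2-byte UTF-8) and its x-form. */
struct xmap {
    const char *utf8;
    const char *ascii;
};

static const struct xmap esperanto_x[] = {
    {"\xC4\x88", "Cx"}, {"\xC4\x89", "cx"},     /* U+0108, U+0109 */
    {"\xC4\x9C", "Gx"}, {"\xC4\x9D", "gx"},     /* U+011C, U+011D */
    {"\xC4\xA4", "Hx"}, {"\xC4\xA5", "hx"},     /* U+0124, U+0125 */
    {"\xC4\xB4", "Jx"}, {"\xC4\xB5", "jx"},     /* U+0134, U+0135 */
    {"\xC5\x9C", "Sx"}, {"\xC5\x9D", "sx"},     /* U+015C, U+015D */
    {"\xC5\xAC", "Ux"}, {"\xC5\xAD", "ux"}      /* U+016C, U+016D */
};

/* Copy `in` to `out`, replacing each accented Esperanto letter with its
 * base letter plus 'x'.  All other bytes are copied unchanged.  Output
 * is truncated to fit `outlen` and always NUL-terminated when
 * outlen > 0.  Returns the number of bytes written, excluding the NUL. */
size_t esperanto_to_x(const char *in, char *out, size_t outlen)
{
    size_t n = 0;

    if (outlen == 0)
        return 0;
    while (*in != '\0') {
        const char *repl = NULL;
        size_t i, rlen;

        for (i = 0; i < sizeof esperanto_x / sizeof esperanto_x[0]; i++) {
            if (strncmp(in, esperanto_x[i].utf8, 2) == 0) {
                repl = esperanto_x[i].ascii;
                break;
            }
        }
        rlen = repl ? 2 : 1;
        if (n + rlen >= outlen)
            break;              /* keep room for the terminating NUL */
        if (repl) {
            out[n] = repl[0];
            out[n + 1] = repl[1];
            in += 2;            /* skip both bytes of the UTF-8 letter */
        } else {
            out[n] = *in;
            in += 1;
        }
        n += rlen;
    }
    out[n] = '\0';
    return n;
}
```

Fed the test phrase, this produces "Ehxosxangxo cxiujxauxde", so "ŝanĝo"
stays distinct from "sango".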


