Re: LYNX-DEV U201e and U201c

From: Klaus Weide
Subject: Re: LYNX-DEV U201e and U201c
Date: Tue, 18 Mar 1997 15:57:15 -0600 (CST)

On Tue, 18 Mar 1997, Christopher R. Maden wrote:
> [Hynek Med]
> > On Mon, 17 Mar 1997, Klaus Weide wrote:
> > > For example, in German the low double quotation mark is used at
> > > the beginning of quotes.  But a German reader will recognize [,,]
> > > on a text screen as just two commas, whereas ["] is immediately
> > > recognized as a quotation mark (not in the same position or
> > > appearance as in print, but that comes with the medium "text
> > > screen" and is not unexpected.)  The same goes, to a lesser
> > > extent, for the other double-character replacements, and for the
> > > single comma.
> > > 
> > > And I think it is easier to read about "Fierlingerismus" than it
> > > is to read about ,,Fierlingerismus``.  Agreed?
> I don't really have a preference for `` or '' vs. ".  I thought that
> an effort should be made to represent bent quotes as such, but it's
> really something best decided by people who actually use such things.
> (Does the chartrans stuff have a mapping from Windows CP1250 "smart"
> quotes into the Unicode points?  That would be cool.)

That is how things are already done.  Otherwise Hynek wouldn't have
seen "U201e" and "U201c" on the screen where the text "as transmitted"
was using Microsoft codepoints.  The basic chartrans mechanism goes like
this:

                    T1              T2
           charset  --->  Unicode   --->  display (C)haracter set

T1 and T2 are (conceptually) just two tables: T1 changes with the
incoming character set, T2 with the display character set.
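
As a rough illustration of the two-table scheme above, here is a minimal
Python sketch.  It is not Lynx's actual code: the table names, the
function translate(), the tiny table contents and the use of U+FFFD for
unknown input bytes are all assumptions made for the example.  The
"U201e"-style fallback mimics what Hynek saw on screen when the display
table had no entry for a code point.

```python
# T1: incoming charset -> Unicode.  Hypothetical excerpt for Windows CP1250;
# a real table would cover all 128 high bytes.
T1_CP1250 = {0x84: 0x201E, 0x93: 0x201C, 0x94: 0x201D}

# T2: Unicode -> display character set.  Hypothetical 7-bit approximations.
T2_ASCII = {0x201E: '"', 0x201C: '"', 0x201D: '"'}


def translate(data: bytes, t1: dict, t2: dict) -> str:
    """Translate incoming bytes to display text via Unicode (T1, then T2)."""
    out = []
    for b in data:
        # Step T1: 7-bit bytes map to themselves; high bytes go through
        # the table.  Bytes missing from T1 become U+FFFD (an assumption).
        cp = b if b < 0x80 else t1.get(b, 0xFFFD)
        # Step T2: ASCII passes through; other code points are looked up,
        # and anything without an entry is shown as "Uxxxx".
        if cp < 0x80:
            out.append(chr(cp))
        elif cp in t2:
            out.append(t2[cp])
        else:
            out.append("U%04x" % cp)
    return "".join(out)


if __name__ == "__main__":
    text = b"\x84Fierlingerismus\x93"
    print(translate(text, T1_CP1250, T2_ASCII))  # with a T2 entry for the quotes
    print(translate(text, T1_CP1250, {}))        # without one: Uxxxx fallback
```

Swapping T1 changes the incoming charset and swapping T2 changes the
display character set, without either table knowing about the other.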

For comparison this is the "old" method:

                    T'1             T'2
           charset  --->  entities  --->  display (C)haracter set
             |                                ^
             |                                |
             +-------("raw" shortcut)---------+

with only one fixed table T'1 (for iso-8859-1) and several tables T'2.
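
For contrast, the old path could be sketched like this, again in Python
and again purely as an illustration.  The names T1_PRIME, T2_PRIME_ASCII
and old_translate(), the handful of table entries, the "?" placeholder
and the raw flag are hypothetical; in particular, when the "raw"
shortcut is taken is not specified here, so the sketch leaves that
decision to the caller.

```python
# T'1: the single fixed table, iso-8859-1 byte -> entity name (excerpt).
T1_PRIME = {0xE4: "auml", 0xF6: "ouml", 0xFC: "uuml", 0xDF: "szlig"}

# T'2: entity name -> bytes in the display character set.  One such table
# would exist per display character set; this one is a 7-bit approximation.
T2_PRIME_ASCII = {"auml": b"ae", "ouml": b"oe", "uuml": b"ue", "szlig": b"ss"}


def old_translate(data: bytes, t2_prime: dict, raw: bool = False) -> bytes:
    """Old method: bytes -> entities -> display bytes, or the raw shortcut."""
    if raw:
        # "raw" shortcut: hand the incoming bytes to the display unchanged.
        return data
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)              # 7-bit characters pass through
        else:
            entity = T1_PRIME.get(b)   # step T'1 (None if no entity known)
            out += t2_prime.get(entity, b"?")  # step T'2, "?" if unmapped
    return bytes(out)


if __name__ == "__main__":
    print(old_translate(b"Gr\xfc\xdfe", T2_PRIME_ASCII))
    print(old_translate(b"Gr\xfc\xdfe", T2_PRIME_ASCII, raw=True))
```

Because T'1 is fixed, an incoming charset other than iso-8859-1 has no
translated path in this sketch; only the raw shortcut remains.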

Difficulties come not from the basic new mechanism (it is only used for
8-bit charsets, not the CJK ones), but from deciding where to do the
translations, from having both the old and the new method (I was not bold
enough to ditch the old one), and from dealing with cases where
characters are invalid or cannot be translated, etc.

