Re: column numbers for non-ASCII characters in error messages

From: John Cowan
Subject: Re: column numbers for non-ASCII characters in error messages
Date: Sat, 18 Dec 2010 17:59:10 -0500
User-agent: Mutt/1.5.18 (2008-05-17)

Ben Pfaff scripsit:

>         * Byte offset from beginning of line.

Definitely no.
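
A small illustration of why byte offsets mislead, assuming the source
line is encoded as UTF-8 (the helper name byte_offset is hypothetical,
not taken from any tool discussed in this thread):

```python
def byte_offset(line: str, index: int) -> int:
    """Return the UTF-8 byte offset of the character at `index` in `line`."""
    # Encode only the prefix before `index`; its byte length is the offset.
    return len(line[:index].encode("utf-8"))


line = "naïve = 1"
# The reader sees five characters before the space, but "ï" occupies
# two bytes in UTF-8, so the byte offset runs ahead of the visible column.
print(byte_offset(line, 5))
```

The gap grows with every non-ASCII character on the line, so the
reported column drifts further from what the user sees on screen.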

>         * Display width from beginning of line, with double-wide
>           characters counting as two positions and combining
>           characters (e.g. combining accents) counting as zero
>           positions.

This is only an approximation to true display width, but it's a pretty
good one.  The only thing I would add is to count conjoining initial jamo
<1100-115F> as double-width and the other conjoining jamo <1160-11FF>
as zero-width, thus making the resulting assembled hangul syllable
always double-width rather than varying between double- and triple-width.
The only downside is that in old Korean script a syllable sometimes has
more than one initial jamo, but I think that can be lived with.
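
A minimal sketch of that counting rule, assuming Python's standard
unicodedata module is an acceptable source of East Asian Width and
general-category data.  The function names are hypothetical, and
treating only categories Mn and Me as zero-width is a simplification
of "combining characters":

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate the number of display columns one code point occupies."""
    cp = ord(ch)
    # Conjoining initial jamo: count as double-width.
    if 0x1100 <= cp <= 0x115F:
        return 2
    # Other conjoining jamo: count as zero-width, so an assembled
    # hangul syllable always totals two columns.
    if 0x1160 <= cp <= 0x11FF:
        return 0
    # Nonspacing and enclosing marks (e.g. combining accents).
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide and Fullwidth characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_column(line: str, index: int) -> int:
    """Display width of everything before the character at `index`."""
    return sum(char_width(ch) for ch in line[:index])
```

With this rule the decomposed sequence <1112 1161 11AB> and the
precomposed syllable <D55C> both measure two columns.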

>         * Grapheme clusters (user-visible characters) from
>           beginning of line, as specified in Unicode Standard
>           Annex #29 "Unicode Text Segmentation".

This is close to what I describe above, but doesn't distinguish between
single- and double-width characters, which I think is a mistake.

I marvel at the creature: so secret and         John Cowan
so sly as he is, to come sporting in the pool   address@hidden
before our very window.  Does he think that     http://www.ccil.org/~cowan
Men sleep without watch all night?
