Re: Wide and UTF-8 international characters

bug-ncurses

From:	Thomas Dickey
Subject:	Re: Wide and UTF-8 international characters
Date:	Sat, 17 May 2003 19:10:23 -0400
User-agent:	Mutt/1.2.5i

On Sat, May 17, 2003 at 04:25:21PM -0600, D. Stimits wrote:
> So it sounds like the 8th bit is no longer used as a flag...is that 
> correct? But also that 1 or more bytes are then added with each 
> character cell to provide attribute data...is that correct?

yes.
 
> I assume that the actual character then is always converted to a wide 
> character, even if it is just common text not requiring a wide character 
> (because it is easier to deal with uniform wide characters than 
> varying-width multibyte representations with escape sequences to mark 
> character set changes). How many bytes does the current ncurses use to 
> store non-attribute character data? I would guess two 8-bit bytes 
> internally per cell.

For wide characters, more than that: each cell has to allow for combining
characters (more than one ;-).  The attributes are stored separately from the
characters:

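#define CCHARW_MAX      5       /* max wide characters stored per cell */
typedef struct
{
    attr_t      attr;                   /* video attributes for the cell */
    wchar_t     chars[CCHARW_MAX];      /* base character, then combining characters */
}
cchar_t;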
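
A minimal sketch of how a cell laid out this way might be filled with a base
character plus combining characters.  It does not use ncurses itself: cell_t,
cell_attr_t, CELL_MAX, cell_set and cell_length are hypothetical names that
only mirror the layout of the struct above (the library's own interface for
this is setcchar/getcchar).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

#define CELL_MAX 5                      /* assumption: mirrors CCHARW_MAX */

typedef unsigned long cell_attr_t;      /* hypothetical stand-in for attr_t */

typedef struct {
    cell_attr_t attr;                   /* attributes, kept apart from the text */
    wchar_t     chars[CELL_MAX];        /* base character, then combining ones */
} cell_t;

/* Store a base character followed by its combining characters in one cell.
 * Returns the number of wide characters stored, or -1 if the string is
 * empty or does not fit in the cell. */
static int cell_set(cell_t *cell, const wchar_t *wcs, cell_attr_t attr)
{
    size_t len;

    assert(cell != NULL && wcs != NULL);
    len = wcslen(wcs);
    if (len == 0 || len > CELL_MAX)
        return -1;
    memset(cell, 0, sizeof(*cell));     /* unused slots stay L'\0' */
    cell->attr = attr;
    wmemcpy(cell->chars, wcs, len);
    return (int) len;
}

/* Count the wide characters held in a cell (unused slots are L'\0'). */
static int cell_length(const cell_t *cell)
{
    int n = 0;

    while (n < CELL_MAX && cell->chars[n] != L'\0')
        n++;
    return n;
}
```

For example, cell_set(&c, L"e\u0301", 0) stores 'e' followed by U+0301
(combining acute accent) in a single cell, which is why one wchar_t per cell
is not enough.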

> > that was up until mid-2001 - I didn't quite know where to begin rewriting,
> > but one of the contributors got it moving.  ncurses 5.3 was good enough to
> > use - the current code probably has isolated bugs, but I don't see any
> > that are related to wide-characters.  Not all functions are tested - so
> > I've been reviewing, adding test-programs for places that are noticeably
> > not covered.
> 
> Currently on Linux, I could display a copyright symbol ('c' inside of a 
> circle) by outputting 169 decimal cast as character (8 bits) to the 
> terminal. I'm looking at the man page for echochar, and it appears that 
> ncurses came up with its own version of something similar to html/xml 
> character entities, but the ncurses version is not as complete as 
> html/xml entities. If I were to use a printw function with a %c format, 
> feeding it 169 decimal (or anything from 128 through 255), will ncurses 
> ever represent the output appearance differently than had I fed that 
> decimal number (cast as 8 bit character) directly to a standard linux 
> console or xterm?

yes/no: the actual value written to the terminal depends on the locale.

169 is the Latin-1 (ISO-8859-1) code for the copyright sign.  If your locale is
one of the ones that uses 8-bit characters, there's no real difference.  If it's
one that uses UTF-8, the ISO-8859-1 values are represented the same way
internally, but the bytes written to the terminal differ.  UTF-8 uses byte
values 128-255 only as parts of multibyte sequences, so code 169 is encoded as
the two bytes 0xC2 0xA9.
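
A minimal sketch of that difference, assuming the value is an ISO-8859-1 code
in the range 0-255.  latin1_to_utf8 is a hypothetical helper written for
illustration, not an ncurses function; it only shows which bytes a UTF-8
encoding of the same code would contain.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one ISO-8859-1 code (0-255) as UTF-8.
 * Writes 1 or 2 bytes to out and returns how many were written. */
static int latin1_to_utf8(unsigned char code, unsigned char out[2])
{
    assert(out != NULL);
    if (code < 0x80) {
        /* 0-127: same single byte in Latin-1 and UTF-8 */
        out[0] = code;
        return 1;
    }
    /* 128-255: two bytes, 110xxxxx 10xxxxxx */
    out[0] = (unsigned char) (0xC0 | (code >> 6));
    out[1] = (unsigned char) (0x80 | (code & 0x3F));
    return 2;
}
```

With code 169 this yields the bytes 0xC2 0xA9, whereas an 8-bit locale would
send the single byte 0xA9.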

-- 
Thomas E. Dickey <address@hidden>
http://invisible-island.net
ftp://invisible-island.net

Current Thread

Wide and UTF-8 international characters, John Smith, 2003/05/09
- Re: Wide and UTF-8 international characters, Thomas Dickey, 2003/05/09
  - Re: Wide and UTF-8 international characters, D. Stimits, 2003/05/16
    - Re: Wide and UTF-8 international characters, Thomas Dickey, 2003/05/16
    - Re: Wide and UTF-8 international characters, D. Stimits, 2003/05/17
    - Re: Wide and UTF-8 international characters, Thomas Dickey <=
