Re: local chars displayed as numbers

From: Reiner Steib
Subject: Re: local chars displayed as numbers
Date: Sat, 23 Sep 2006 13:34:44 +0200
User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.0.50 (gnu/linux)

On Sat, Sep 23 2006, Jason Rumney wrote:

> Kenichi Handa wrote:
>> At least windows-1252 doesn't cover all eight-bit bytes.
>> There are a few invalid bytes: 0x81, 0x8c, 0x8e...
> 0x8c is "Latin capital ligature Oe", and 0x8e is "Latin capital letter Z with
> caron" according to Windows XP character map. 0x8d is missing, as is 0x90
> (nbsp in latin-1). I'm not sure if the latter is just filtered out from
> display in character map though (0x20 space is also not displayed).

NO-BREAK SPACE is A0 in both Latin-1 and windows-1252. All printable
characters present in Latin-1 are also in windows-1252 at the same
position, i.e. windows-1252 is a superset of Latin-1.
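
The overlap between the two code pages can be checked byte by byte. The
following is only a sketch: it relies on the "latin-1" and "cp1252" codecs
shipped with Python, whose tables may differ in detail from what Emacs or
the Windows API use, and the helper name differing_bytes is made up for
illustration.

```python
def differing_bytes():
    """Return the byte values that Python's latin-1 and cp1252 codecs
    decode differently (or that cp1252 cannot decode at all)."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")  # latin-1 maps every byte to U+00xx
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None  # position left undefined in Python's cp1252 table
        if cp1252 != latin1:
            diffs.append(b)
    return diffs


diffs = differing_bytes()
# Every difference lies in 0x80..0x9F, where Latin-1 has only C1 controls.
print([hex(b) for b in diffs])
# NO-BREAK SPACE sits at 0xA0 in both code pages.
print(bytes([0xA0]).decode("latin-1") == bytes([0xA0]).decode("cp1252"))
```

With these codecs, the two tables agree everywhere outside 0x80..0x9F, so
the superset relation holds for the printable characters.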

,----[ http://en.wikipedia.org/wiki/Windows-1252 ]
| According to the information on Microsoft's and the Unicode
| Consortium's websites positions 81, 8D, 8F, 90, and 9D are
| unused. However the Windows API call for converting from code pages
| to Unicode maps these to the corresponding C1 control codes. The
| euro character at position 80 was not present in earlier versions of
| this code page, nor were the S and Z with caron (háček)
`----

While I don't know if these five positions (81, 8D, 8F, 90, and 9D)
are sufficient to distinguish raw-text from windows-1252, together
with Eli's suggestion (detect null bytes) it might give good results.
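
A minimal sketch of that combined heuristic, in Python for brevity rather
than Emacs Lisp. The function name looks_like_raw_text and the constant
UNUSED_CP1252 are hypothetical, and whether the check is reliable enough
in practice is exactly the open question above.

```python
# The five positions the Wikipedia excerpt lists as unused in windows-1252.
UNUSED_CP1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def looks_like_raw_text(data: bytes) -> bool:
    """Return True if data contains a null byte or one of the five
    positions that windows-1252 leaves unused.

    This is a heuristic only: a False result does not prove that the
    data really is windows-1252.
    """
    return any(b == 0 or b in UNUSED_CP1252 for b in data)


print(looks_like_raw_text(b"binary\x00blob"))           # null byte present
print(looks_like_raw_text(b"\x81\x8d"))                 # unused positions
print(looks_like_raw_text("café €".encode("cp1252")))   # valid cp1252 text
```

Short binary files that happen to avoid all six byte values would still
slip through, so this can only reduce misdetection, not eliminate it.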

Bye, Reiner.
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
