[Txr-users] First cut at Unicode support.
From: Kaz Kylheku
Subject: [Txr-users] First cut at Unicode support.
Date: Wed, 11 Nov 2009 09:30:13 -0800
Hi all,
I've committed to Git the first round of changes to make txr handle
international text.
Text is internally represented using wide characters.
The lex scanner for the language recognizes the UTF-8 encoding;
it can decode all code points in the range [0, 0x10FFFF].
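
To make the decoding step concrete, here is a minimal sketch of a standalone
UTF-8 decoder covering the same range. It is not txr's scanner code;
utf8_decode is a hypothetical helper, and rejecting overlong forms and
surrogates is an assumption about what a strict decoder would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from txr: decode one UTF-8 sequence
   from s (len bytes available) and store the code point in *out.
   Returns the number of bytes consumed, or 0 if the input is
   malformed, truncated, overlong, a surrogate, or above 0x10FFFF. */
size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *out)
{
    unsigned long cp, min;
    size_t need, i;

    assert(s != NULL && out != NULL);

    if (len == 0)
        return 0;

    if (s[0] < 0x80) {                    /* 0xxxxxxx: plain ASCII */
        *out = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte form */
        cp = s[0] & 0x1F; need = 2; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte form */
        cp = s[0] & 0x0F; need = 3; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte form */
        cp = s[0] & 0x07; need = 4; min = 0x10000;
    } else {
        return 0;                         /* stray continuation or invalid lead */
    }

    if (len < need)
        return 0;

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)        /* must be 10xxxxxx */
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return need;
}
```

The result is returned as unsigned long, not wchar_t, so the sketch does not
depend on how wide wchar_t is on a given platform.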
However, the regex engine is not yet converted to handle wide characters.
If you do a class match against a wide character, there will likely be
an out-of-bounds memory access, oops!
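
As an illustration of how such an out-of-bounds access can arise, the sketch
below assumes a character class stored as a 256-bit bitmap indexed by the
character value. That layout is a guess, not something confirmed for txr's
regex engine, and char_class, class_add and class_match are hypothetical names.
The range check in class_match is the kind of stopgap that would keep a wide
character from indexing past the table.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

#define CLASS_BITS 256  /* assumed table size: one bit per 8-bit character */

/* Hypothetical type, not from txr: a set of 8-bit characters. */
typedef struct {
    unsigned char bits[CLASS_BITS / CHAR_BIT];
} char_class;

/* Add an 8-bit character to the class. */
void class_add(char_class *cc, unsigned char ch)
{
    assert(cc != NULL);
    cc->bits[ch / CHAR_BIT] |= (unsigned char) (1u << (ch % CHAR_BIT));
}

/* Test a wide character against the class. Without the range check,
   any value of 256 or more would index beyond the end of bits[]. */
int class_match(const char_class *cc, wchar_t ch)
{
    unsigned long idx = (unsigned long) ch;

    assert(cc != NULL);

    if (idx >= CLASS_BITS)
        return 0;  /* outside the table: treated as "not in class" */

    return (cc->bits[idx / CHAR_BIT] >> (idx % CHAR_BIT)) & 1;
}
```

Treating every wide character as a non-match is only a safety net; it does not
make classes such as [α-ω] work, which needs a representation that can cover
the whole code point range.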
Moreover, there is a great deal of reliance on wide character I/O (including
conversion to and from an encoding) from the C library. All I/O needs
to be converted to the internal streams library, which will do its own
conversion to and from UTF-8.
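
For the output direction, a streams library doing its own conversion would
need something like the encoder sketched below. This is an illustration only;
utf8_encode is a hypothetical helper and does not describe the interface of
txr's streams library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from txr: encode code point cp as
   UTF-8 into buf, which must have room for at least 4 bytes.
   Returns the number of bytes written, or 0 if cp is a surrogate
   or lies above 0x10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    assert(buf != NULL);

    if (cp < 0x80) {
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;  /* surrogates are not valid scalar values */
    if (cp < 0x10000) {
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char) (0xF0 | (cp >> 18));
        buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;      /* beyond the Unicode range */
}
```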
There is a dependency on the wchar_t type, which can't hold all Unicode
characters on all platforms (some compilers have a 16-bit wchar_t).
There will have to be some #ifdefs to do something sane if input
is encountered which contains characters above the range of
wchar_t.
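
One possible shape for those #ifdefs is sketched below, using the standard
WCHAR_MAX macro from <wchar.h>. The names WCHAR_HOLDS_UNICODE and to_wchar are
hypothetical, and substituting '?' for an unrepresentable character is just
one assumed policy; rejecting the input with an error would be another.

```c
#include <assert.h>
#include <wchar.h>

/* Compile-time check: can wchar_t represent every code point up to
   0x10FFFF? WCHAR_HOLDS_UNICODE is a hypothetical macro name. */
#if WCHAR_MAX >= 0x10FFFF
#define WCHAR_HOLDS_UNICODE 1
#else
#define WCHAR_HOLDS_UNICODE 0
#endif

/* Hypothetical helper, not taken from txr: convert a decoded code
   point to wchar_t. On platforms with a narrow wchar_t, code points
   that do not fit are replaced with '?' (an assumed policy). */
wchar_t to_wchar(unsigned long cp)
{
    assert(cp <= 0x10FFFFul);

#if !WCHAR_HOLDS_UNICODE
    if (cp > (unsigned long) WCHAR_MAX)
        return L'?';
#endif

    return (wchar_t) cp;
}
```

With a 32-bit wchar_t the range check is compiled out entirely, so the common
case pays nothing for it.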
Cheers ...