Re: Unicode support in io Forge package

From: Markus Mützel
Subject: Re: Unicode support in io Forge package
Date: Sat, 19 Oct 2019 09:40:48 +0200

Looking at the code again, I believe the aim was to map the Latin-1 subset of UTF-16.
But there might also be something wrong with the conversion function that is set up in oct2xls.m at around line 197.
Maybe I'll have time to think about it again in the next few days.


This message was sent from my Android mobile phone with GMX Mail.
On 19.10.19, 08:16, "Markus Mützel" <address@hidden> wrote:

IIRC, the interface uses UTF-16. The conversion function really only works for Latin-1 encoded input.
There really is no UTF-8 in this. TBH, I chose the name before I got a sufficient grasp of that encoding mess.
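
To illustrate the "Latin-1 subset of UTF-16" idea, here is a minimal sketch in Python (not the io package's Octave code; the helper names are hypothetical). It assumes the conversion simply widens each Latin-1 byte to one 16-bit code unit, which happens to match real UTF-16 for code points up to U+00FF and nothing beyond.

```python
# Illustration only: Python stands in for the Octave code under discussion,
# and all helper names below are hypothetical.

def widen_latin1(raw):
    """Widen each Latin-1 byte to one 16-bit code unit (same numeric value)."""
    return list(raw)

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def fits_latin1(text):
    """Report whether every character of text is representable in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

name = "Mützel"
# For Latin-1 text, widening the bytes gives exactly the UTF-16 code units.
widened_matches = widen_latin1(name.encode("latin-1")) == utf16_units(name)

# A character outside Latin-1, such as the euro sign (U+20AC), has no
# single-byte Latin-1 form, so a byte-widening conversion cannot produce it.
euro_fits = fits_latin1("\u20ac")
```

Under that assumption, text limited to Latin-1 round-trips cleanly, while anything else needs a real encoding conversion.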

Forge packages usually target a wider range of Octave versions. I don't know whether this workaround can be safely removed without losing support for Latin-1 in older Octave versions supported by io.
I didn't re-read the code, but I believe that "unicode2native" is used if it is available.


PS: Sorry for top-posting. My mobile phone app doesn't allow otherwise.
This message was sent from my Android mobile phone with GMX Mail.
On 19.10.19, 07:04, Andrew Janke <address@hidden> wrote:
Hi, Octave and io maintainers,

I'm confused by the Unicode support in the io package. In particular,
the functions unicode2utf8 and utf82unicode, and the "encode_utf"
options in some of the ods/xls read/write functions.

What is the encoding that utf82unicode/unicode2utf8 are calling
"unicode" here? It looks like it's doing a single-byte encoding,
treating each byte as an unsigned int 0-255, and treating those 0-255
values directly as Unicode code point values. That's not any of the
standard Unicode encodings. (But I think it is exactly the same as
Latin-1/ISO 8859-1.)
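
A quick way to check that equivalence, sketched in Python rather than Octave (the function name is hypothetical and this is not the io package's code): treating each byte as a code point gives the same result as decoding Latin-1 for all 256 byte values, and it mangles real UTF-8 input.

```python
# Illustration only: a stand-in for the byte-as-code-point behaviour
# described above, not the actual utf82unicode/unicode2utf8 code.

def bytes_as_code_points(raw):
    """Treat each byte value 0-255 directly as a Unicode code point."""
    return "".join(chr(b) for b in raw)

all_bytes = bytes(range(256))
# The mapping agrees with Latin-1 (ISO 8859-1) decoding for every byte value.
same_as_latin1 = bytes_as_code_points(all_bytes) == all_bytes.decode("latin-1")

# UTF-8 input is not handled: "ü" is two bytes (0xC3 0xBC) in UTF-8, so the
# byte-wise mapping yields two separate characters instead of one.
mangled = bytes_as_code_points("\u00fc".encode("utf-8"))
```

So the name "unicode" fits only in the sense that Latin-1 byte values coincide with the first 256 Unicode code points.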

As I understand it, since about Octave 4.4, Octave's internal encoding
(that is, how it interprets Octave char arrays) is either UTF-8 or an
opaque array of bytes; it's never in the "system code page" or some
other locale-specific encoding.

Is this UTF-8 support in io still relevant/correct? Maybe it should be
deprecated or renamed/removed? Since Octave now supports UTF-8, I think
you'd want to just leave UTF-8 text as is in all cases.

