octave-maintainers

Re: Handle encoding of Octave strings


From: mmuetzel
Subject: Re: Handle encoding of Octave strings
Date: Thu, 17 May 2018 03:52:42 -0700 (MST)

> What does Matlab do?  If your choice is different, I am sure that we
> will see bug reports about it.

In Matlab:
>>  str = 'aäbc'
str =
aäbc
>> str(1)
ans =
a
>> str(2)
ans =
ä
>> str(3)
ans =
b
>> str(4)
ans =
c
>> whos str
  Name      Size            Bytes  Class    Attributes
  str       1x4                 8  char               


So in Matlab one "char" occupies 2 bytes, whereas in Octave one "char"
occupies 1 byte.
Do we want to change the way Octave stores its char class? Initially I was
in favor of keeping the relation 1 char = 1 byte (hence using UTF-8). But
indexing would be more straightforward if we changed to UTF-16 (1 "char" =
2 bytes), at least for the BMP, which encompasses characters from most
current scripts.
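
To make the size difference concrete, here is a small sketch that encodes the same four-character string both ways and looks at what indexing by storage unit returns. Python is used only for illustration; this is neither Matlab nor Octave code, and the variable names are made up for the example.

```python
import struct

s = "a\u00e4bc"  # "aäbc", with "ä" as the single code point U+00E4

# UTF-8: one storage unit is 1 byte; "ä" needs two of them.
utf8 = s.encode("utf-8")

# UTF-16: one storage unit is 2 bytes; every BMP character fits in one unit.
utf16_bytes = s.encode("utf-16-le")
utf16_units = struct.unpack("<%dH" % (len(utf16_bytes) // 2), utf16_bytes)

print(len(utf8))                            # 5 (bytes)
print(len(utf16_units), len(utf16_bytes))   # 4 (units) 8 (bytes)

# Indexing the second storage unit:
print(hex(utf8[1]))          # 0xc3, only the lead byte of "ä"
print(hex(utf16_units[1]))   # 0xe4, the whole character "ä"

# Outside the BMP, UTF-16 needs two units (a surrogate pair) as well.
non_bmp_units = len("\U0001F600".encode("utf-16-le")) // 2
print(non_bmp_units)         # 2
```

The 4 units / 8 bytes of the UTF-16 form match the `1x4` size and 8 bytes that `whos` reports in Matlab above, while the UTF-8 form has 5 one-byte elements, so the second element is only part of "ä".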

A first step towards this could be to add "from_u8", "to_u8", ("from_u16",
"to_u16") methods to our char class. 
Then we would need to identify all places in the code where we construct
char arrays from external sources (.m files, terminal, reading from files,
...) and where we pass strings to external sources (library functions,
writing to files, ...).
When this is done we might be able to switch the internal representation
from C-"char" to "uint16_t" without breaking everything...
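
A rough sketch of the boundary-conversion idea, written in Python purely for illustration. The class `Char16Array` is hypothetical and is not Octave's char class; only the method names `from_u8` and `to_u8` are taken from the proposal above. The point is that external data stays UTF-8, and conversion happens only where strings enter or leave.

```python
class Char16Array:
    """Hypothetical stand-in for a char array stored as 16-bit code units."""

    def __init__(self, units):
        self.units = tuple(units)  # each entry is one 16-bit code unit

    @classmethod
    def from_u8(cls, data):
        # Input boundary: decode UTF-8 bytes, then split into UTF-16 code units.
        raw = data.decode("utf-8").encode("utf-16-le")
        return cls(int.from_bytes(raw[i:i + 2], "little")
                   for i in range(0, len(raw), 2))

    def to_u8(self):
        # Output boundary: reassemble the code units and re-encode as UTF-8.
        raw = b"".join(u.to_bytes(2, "little") for u in self.units)
        return raw.decode("utf-16-le").encode("utf-8")

    def __len__(self):
        return len(self.units)

    def __getitem__(self, i):
        return self.units[i]


external = "a\u00e4bc".encode("utf-8")   # e.g. bytes read from a .m file
arr = Char16Array.from_u8(external)

print(len(arr))                  # 4
print(hex(arr[1]))               # 0xe4
print(arr.to_u8() == external)   # True
```

With conversions confined to such entry and exit points, the rest of the code would only ever see 16-bit units, which is what would allow the storage type to be swapped later.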

Do you think that this is feasible?

Markus



--
Sent from: http://octave.1599824.n4.nabble.com/Octave-Maintainers-f1638794.html


