octave-bug-tracker

[Octave-bug-tracker] [bug #64139] character encoding scheme with fileread


From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #64139] character encoding scheme with fileread
Date: Wed, 3 May 2023 12:25:20 -0400 (EDT)

Update of bug #64139 (project octave):

        Operating System:               GNU/Linux => Any                    

    _______________________________________________________

Follow-up Comment #1:

Octave stores character arrays as UTF-8 encoded bytes. Matlab stores them as
UTF-16 code units. Neither program lets you change this internal encoding.
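The Octave sketch below makes the difference concrete. In Octave, one non-ASCII character takes up more than one element of a char array, because each element is a single UTF-8 byte. In Matlab, the same character would be one element with the value 228. The character is built from its Latin-1 byte so that the example does not depend on how the script file itself is encoded.

```octave
% Build "ä" (U+00E4) from its single ISO-8859-1 byte, 228.
s = native2unicode (uint8 (228), 'ISO-8859-1');

numel (s)      % 2: the character is stored as two UTF-8 bytes
double (s)     % 195 164, i.e. 0xC3 0xA4

% Converting back to a single-byte encoding recovers the original byte.
unicode2native (s, 'ISO-8859-1')   % uint8 value 228
```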

If you really need UTF-16, you can convert manually. Be aware, though, that
the result cannot be used as a character vector in Octave; it is just a
vector of integers:

>> u16char = unicode2native(fileread('example.txt'), ['utf-16', nthargout(3, 'computer'), 'e']);
>> double(typecast(u16char(1:floor(numel(u16char)/2)*2), 'uint16'))
ans =

   65279     111      99     116      97     118     101


There seems to be a bug in Octave that causes an additional zero byte to be
returned when converting to UTF-16. That is the reason for the indexing in the
last line, which should not be needed.
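The conversion and the workaround can be wrapped in a small function. This is only a sketch, and the name utf16_code_units is hypothetical. Trimming to an even number of bytes assumes that the spurious zero byte is appended at the end, as the indexing in the transcript implies. If the bug is fixed, the trim does nothing.

```octave
1;  % script marker so the function below can be defined in this file

function u16 = utf16_code_units (str)
  % Hypothetical helper: return the UTF-16 code units of the UTF-8 char
  % array STR as a uint16 row vector in the machine's native byte order.
  [~, ~, endian] = computer ();        % 'L' (little) or 'B' (big endian)
  enc = ['UTF-16', endian, 'E'];       % 'UTF-16LE' or 'UTF-16BE'
  bytes = unicode2native (str, enc);
  % Keep only an even number of bytes to drop the spurious trailing zero.
  bytes = bytes(1:2*floor (numel (bytes)/2));
  u16 = typecast (bytes, 'uint16');
end
```

For example, `double (utf16_code_units (fileread ('example.txt')))` performs the same steps as the transcript.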

If I understand correctly, the 'encoding' argument of fileread specifies the
*input* encoding of the file to be read. That syntax is currently not
supported in Octave.
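Until fileread accepts an encoding argument, one possible workaround is to read the raw bytes and convert them with native2unicode. This is a sketch, not Octave's own implementation. The helper name fileread_encoded is hypothetical, and it assumes the encoding name is one that native2unicode accepts (ISO-8859-1 in the usage lines).

```octave
1;  % script marker so the function below can be defined in this file

function str = fileread_encoded (fname, encoding)
  % Hypothetical helper: read FNAME as raw bytes and convert them from
  % ENCODING to Octave's internal UTF-8 representation.
  [fid, msg] = fopen (fname, 'r');
  if (fid < 0)
    error ('fileread_encoded: cannot open "%s": %s', fname, msg);
  end
  bytes = fread (fid, Inf, 'uint8=>uint8').';   % raw bytes as a row vector
  fclose (fid);
  str = native2unicode (bytes, encoding);
end

% Usage: write the ISO-8859-1 bytes of "café" to a temporary file, read them back.
fname = [tempname(), '.txt'];
fid = fopen (fname, 'w');
fwrite (fid, uint8 ([99 97 102 233]));
fclose (fid);
txt = fileread_encoded (fname, 'ISO-8859-1');
delete (fname);
double (txt)   % 99 97 102 195 169 ("é" is now two UTF-8 bytes)
```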



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64139>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



