octave-bug-tracker


From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #49348] Treat multi-byte characters as one character for char array
Date: Sat, 15 Oct 2016 07:31:01 +0000 (UTC)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0

URL:
  <http://savannah.gnu.org/bugs/?49348>

                 Summary: Treat multi-byte characters as one character for char array
                 Project: GNU Octave
            Submitted by: mmuetzel
            Submitted on: Sat 15 Oct 2016 07:30:58 AM GMT
                Category: Interpreter
                Severity: 3 - Normal
                Priority: 5 - Normal
              Item Group: Matlab Compatibility
                  Status: None
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: dev
        Operating System: Any

    _______________________________________________________

Details:

In Octave, one char always seems to be exactly one byte long. This causes the
following problems when reversing or indexing into a character array that
contains multi-byte characters:

t='abcäöü'
t(end:-1:1)
size(t)
t(4:6)


In Octave:

>> t='abcäöü'
t = abcäöü
>> t(end:-1:1)
ans = �öä�cba
>> size(t)
ans =
   1   9
>> t(4:6)
ans = ä�
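The Octave output above is consistent with the string being stored as raw UTF-8 bytes. A minimal sketch, assuming UTF-8 and writing the bytes out explicitly so the result does not depend on the encoding of the source file (the variable names are only illustrative):

```octave
% UTF-8 bytes of 'abcäöü' (assumption: the string is UTF-8 encoded).
t = char ([97 98 99 195 164 195 182 195 188]);
b = double (t);

% A UTF-8 continuation byte has the bit pattern 10xxxxxx.
is_cont = bitand (b, 192) == 128;

n_bytes = numel (b)         % 9, the length Octave reports
n_chars = sum (~is_cont)    % 6, the length Matlab reports

% t(4:6) covers the bytes 195 164 195: all of "ä" plus the first half of "ö".
double (t(4:6))
```

Counting only the bytes that are not continuation bytes gives the number of characters, which is why the two programs disagree on size(t).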


In Matlab:

>> t='abcäöü'
t =
abcäöü
>> t(end:-1:1)
ans =
üöäcba
>> size(t)
ans =
     1     6
>> t(4:6)
ans =
äöü
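As a possible workaround on the byte-based char array, the Matlab result for the reversal can be emulated by cutting the bytes at character boundaries first. This is only a sketch that assumes valid UTF-8 input; it is not how either program implements indexing:

```octave
% UTF-8 bytes of 'abcäöü' (assumption: the string is UTF-8 encoded).
t = char ([97 98 99 195 164 195 182 195 188]);
b = double (t);

% A character starts at every byte that is not a continuation byte.
starts = find (bitand (b, 192) ~= 128);
stops  = [starts(2:end) - 1, numel(b)];

% Cut the byte array into one piece per character, then reverse the pieces.
pieces = arrayfun (@(s, e) t(s:e), starts, stops, 'UniformOutput', false);
r = [pieces{end:-1:1}];

double (r)   % 195 188 195 182 195 164 99 98 97, the bytes of 'üöäcba'
```

Each multi-byte sequence stays intact, so no invalid byte sequences are produced.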



I also noticed that char() does not support values above 255 and garbles
Unicode code points above 127:
In Octave:

>> char (269)
warning: range error for conversion to character value
ans =
>> char(228)
ans = �


In Matlab:

>> char(269)
ans =
č
>> char(228)
ans =
ä
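For reference, the code points 228 and 269 each need two bytes in UTF-8, so a single byte with the value 228 is not a valid character on a UTF-8 terminal. A small sketch of the two-byte encoding rule follows; utf8_2byte is a hypothetical helper and is only valid for code points from 128 to 2047:

```octave
% Hypothetical helper: UTF-8 encoding of a code point cp with 128 <= cp < 2048.
% The first byte is 110xxxxx (upper 5 bits), the second is 10xxxxxx (lower 6 bits).
utf8_2byte = @(cp) [192 + floor(cp / 64), 128 + mod(cp, 64)];

b228 = utf8_2byte (228)   % 195 164, the bytes of "ä"
b269 = utf8_2byte (269)   % 196 141, the bytes of "č"
```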


Is this a deliberate design choice in Octave? Would it be possible for Octave's
char class to treat each Unicode character as a single element, no matter how
many bytes it occupies in a given encoding (UTF-8 seems to be a good choice)?
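One way to picture the request is to index code points instead of bytes. The following sketch decodes the bytes into code points by hand; it assumes valid UTF-8 and handles only one- and two-byte sequences, which is enough for the example string:

```octave
% UTF-8 bytes of 'abcäöü' (assumption: the string is UTF-8 encoded).
b = [97 98 99 195 164 195 182 195 188];

starts = find (bitand (b, 192) ~= 128);   % first byte of each character
lead   = b(starts);
two    = lead >= 192;                     % characters that use two bytes

% One-byte characters are their own code point; two-byte characters combine
% the lower 5 bits of the lead byte with the lower 6 bits of the next byte.
cp = lead;
cp(two) = (lead(two) - 192) * 64 + (b(starts(two) + 1) - 128);

cp        % 97 98 99 228 246 252
cp(4:6)   % 228 246 252, the code points of "äöü"
```

With one element per code point, size, indexing and reversal behave as in the Matlab transcript above.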




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?49348>
