From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #49348] Treat multi-byte characters as one character for char array
Date: Sat, 15 Oct 2016 07:31:01 +0000 (UTC)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0
URL:
  <http://savannah.gnu.org/bugs/?49348>

Summary: Treat multi-byte characters as one character for char array
Project: GNU Octave
Submitted by: mmuetzel
Submitted on: Sat 15 Oct 2016 07:30:58 AM GMT
Category: Interpreter
Severity: 3 - Normal
Priority: 5 - Normal
Item Group: Matlab Compatibility
Status: None
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: dev
Operating System: Any

_______________________________________________________

Details:

In Octave, one char seems to be one byte long always. This leads to the following issues when reversing or indexing into a character array with multi-byte characters:

t='abcäöü'
t(end:-1:1)
size(t)
t(4:6)

In Octave:
>> t='abc\303\244\303\266\303\274'
t = abcäöü
>> t(end:-1:1)
ans = �öä�cba
>> size(t)
ans =
   1   9
>> t(4:6)
ans = ä�

In Matlab:
>> t='abcäöü'
t =
abcäöü
>> t(end:-1:1)
ans =
üöäcba
>> size(t)
ans =
     1     6
>> t(4:6)
ans =
äöü

I also noticed that char() doesn't support values >255 and messes up Unicode characters >127:

In Octave:
>> char (269)
warning: range error for conversion to character value
ans =
>> char(228)
ans = �

In Matlab:
>> char(269)
ans =
č
>> char(228)
ans =
ä

Is this a design choice in Octave?
Would it be possible that Octave's char class treated Unicode characters as such no matter how many bytes they use in any encoding (UTF-8 seems to be a good choice)?

_______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?49348>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/
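
The difference described in the report, indexing by byte versus indexing by character, can be reproduced outside Octave. The following is a minimal sketch in Python (not Octave), assuming the string is stored as UTF-8; the variable names are illustrative only and nothing here reflects how Octave or Matlab is implemented internally.

```python
# Illustration only (Python, not Octave). Assumes UTF-8 storage.
# The escapes spell "abc" followed by a-umlaut, o-umlaut, u-umlaut.
text = "abc\u00e4\u00f6\u00fc"
raw = text.encode("utf-8")  # byte view, analogous to the 1x9 array in the report

# Byte view: each umlaut occupies two bytes in UTF-8, so there are 9 elements.
byte_count = len(raw)

# Reversing byte by byte separates lead bytes from continuation bytes;
# decoding then substitutes U+FFFD for the orphaned bytes.
reversed_bytes = raw[::-1].decode("utf-8", errors="replace")

# Elements 4..6 (1-based, as in t(4:6)) cover one full character plus a
# dangling lead byte.
sliced_bytes = raw[3:6].decode("utf-8", errors="replace")

# Character (code point) view: 6 elements, safe to reverse and slice.
char_count = len(text)
reversed_chars = text[::-1]
sliced_chars = text[3:6]

# Code point 228 is a single character, but the single byte 228 is not
# valid UTF-8 on its own.
char_228 = chr(228)
byte_228 = bytes([228]).decode("utf-8", errors="replace")

print(byte_count, char_count)
print(reversed_bytes, reversed_chars)
print(sliced_bytes, sliced_chars)
print(char_228, byte_228)
```

The byte-view results correspond to the Octave output quoted in the report, and the character-view results to the Matlab output.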