From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859-1 input
Date: Sat, 8 Apr 2023 12:34:21 -0400 (EDT)

Follow-up Comment #31, bug #57107 (project octave):


> In MATLAB and older versions of Octave, I could use regexp to test if the
> object could be a possible valid UBJSON buffer by something like
>
> regexp(char(['U' 255]), '^\s*[\[\{SCHiUIulmLMhdDTFZN]')
>
> but now this raises an error in Octave 6+ but not elsewhere.

Matlab uses UTF-16 for its char arrays, so `char(255)` is a valid character
there. Octave uses UTF-8, and the byte 255 (0xFF) is not valid as part of any
UTF-8 sequence.
IIUC, you are inspecting a byte sequence that starts with something that can
be interpreted as ASCII characters but later turns into a "random" byte
sequence.
Maybe you could first find the first byte that can't be interpreted as ASCII
with something like `find(~isascii(char(['U' 255])), 1, 'first')`, and then
feed only the part before it to `regexp`.
That might also speed up the regexp execution a little, since less data needs
to be sent back and forth...
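
The suggestion above could be sketched roughly as follows. This is only an
illustration: the variable names (`buf`, `idx`, `prefix`, `is_candidate`) are
made up here, the pattern is the one from the quoted report, and the `'once'`
option is added just to get a single start index back.

```octave
% Hypothetical buffer: the ASCII marker 'U' followed by one raw byte.
buf = char(['U' 255]);

% Index of the first element outside the ASCII range 0..127 (empty if none).
idx = find(~isascii(buf), 1, 'first');
if isempty(idx)
  prefix = buf;             % the whole buffer is ASCII
else
  prefix = buf(1:idx-1);    % keep only the part before the first non-ASCII byte
end

% Only the ASCII prefix is passed to regexp.
pattern = '^\s*[\[\{SCHiUIulmLMhdDTFZN]';
is_candidate = ~isempty(regexp(prefix, pattern, 'once'));
```

If the very first byte is already non-ASCII, `prefix` is empty and
`is_candidate` is false.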


In `encodevarname`, you are checking for `exist('unicode2native','builtin')`.
But that function is implemented as a .m file in Octave. E.g., for me with
Octave 8.1.0 on Windows:

>> which unicode2native
'unicode2native' is a function from the file C:\Program Files\GNU Octave\Octave-8.1.0\mingw64\share\octave\8.1.0\m\strings\unicode2native.m
>> exist('unicode2native','builtin')
ans = 0
>> exist('unicode2native','file')
ans = 2


Would it help to adapt that check?
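
One possible way to adapt it, as a sketch only: I don't know how
`encodevarname` uses the result, and the variable names here are made up.
Without a type argument, `exist` returns 2 for a file on the load path (which
covers .m functions) and 5 for a built-in function, so accepting either value
covers both cases.

```octave
% Query without restricting the type: 2 means a file (e.g. an .m function),
% 5 means a built-in function.
code = exist('unicode2native');

% Treat the function as available in either case.
has_unicode2native = any(code == [2 5]);
```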


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



