Re: [changeset] Asian Characters and strchr()
From: Jaroslav Hajek
Subject: Re: [changeset] Asian Characters and strchr()
Date: Wed, 11 Mar 2009 09:03:13 +0100
On Wed, Mar 11, 2009 at 8:53 AM, Ben Abbott <address@hidden> wrote:
>
> On Mar 11, 2009, at 3:33 PM, Jaroslav Hajek wrote:
>
>> On Wed, Mar 11, 2009 at 8:24 AM, Ben Abbott <address@hidden> wrote:
>>>
>>> I noticed that fileparts gives an error when the full file name contains Asian
>>> characters.
>>>
>>> octave:209> fileparts ("System/Library/Fonts/junk.ttf")
>>> error: subscript indices must be either positive integers or logicals.
>>> error: called from:
>>> error:   /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/strings/strchr.m at line 40, column 19
>>> error:   /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/miscellaneous/fileparts.m at line 30, column 10
>>>
>>> It appears that there is a simple fix for strchr, but it will depend upon
>>> the ASCII equivalent for Asian fonts.
>>>
>>> I'm seeing negative values.
>>>
>>> fullfile = "System/Library/Fonts/junk.ttf";
>>> octave:211> double(fullfile)
>>> ans =
>>>
>>> Columns 1 through 16:
>>>
>>> 83 121 115 116 101 109 47 76 105 98 114 97 114 121 47 70
>>>
>>> Columns 17 through 32:
>>>
>>> 111 110 116 115 47 -27 -115 -114 -26 -106 -121 -25 -69 -122 -23 -69
>>>
>>> Columns 33 through 37:
>>>
>>> -111 46 116 116 102
>>>
>>> Can anyone tell me what the permissible range for integer values of Asian
>>> characters is?
>>
>> I think a char->double conversion is supposed to yield nonnegative
>> values, so this seems buggy.
>>
>>> I'm planning to patch strchr, any reason I shouldn't do that?
>>>
>>
>> I don't think there's a bug in strchr. This is clearly caused by the
>> negative values.
>
> For Asian fonts the values are 16-bit ... unsigned or signed, I don't know.

No, they're not. See your own example. Octave has no support for UTF-8
strings, so unless "char" is more than 8 bits wide, each element will be
an 8-bit number, and a multibyte character occupies several elements.
Thus, "strchr" won't search for Japanese characters (but that does not
matter here, since you only need to find an ASCII character).
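Ben's own output bears this out. Read as unsigned bytes (add 256 to each negative value, so -27 is 0xE5), columns 22 through 33 are E5 8D 8E E6 96 87 E7 BB 86 E9 BB 91: four three-byte UTF-8 sequences, not 16-bit values. Every byte of a multibyte UTF-8 sequence is 0x80 or above, so an ASCII delimiter such as '/' or '.' can never match inside one, which is why a byte-wise search is enough for fileparts. The C++ sketch below uses a hypothetical helper, `decode_utf8`, that handles only 1- to 3-byte sequences without validation. It decodes those bytes to the code points U+534E, U+6587, U+7EC6 and U+9ED1:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical minimal UTF-8 decoder sketch: handles 1- to 3-byte
// sequences only, does no validation, and stops at a truncated trailing
// sequence. Returns one Unicode code point per decoded character.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        // Read bytes through unsigned char so 0x80..0xFF are never negative,
        // whatever the signedness of plain char on this platform.
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        const std::size_t len = b0 < 0x80 ? 1 : ((b0 & 0xE0) == 0xC0 ? 2 : 3);
        if (i + len > s.size()) {
            break;  // truncated sequence at the end of the string
        }
        std::uint32_t cp = 0;
        if (len == 1) {
            cp = b0;  // 0xxxxxxx: plain ASCII
        } else if (len == 2) {
            // 110xxxxx 10xxxxxx
            cp = ((b0 & 0x1Fu) << 6) |
                 (static_cast<unsigned char>(s[i + 1]) & 0x3Fu);
        } else {
            // Assumed 1110xxxx 10xxxxxx 10xxxxxx (4-byte forms not handled)
            cp = ((b0 & 0x0Fu) << 12) |
                 ((static_cast<unsigned char>(s[i + 1]) & 0x3Fu) << 6) |
                 (static_cast<unsigned char>(s[i + 2]) & 0x3Fu);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Each of the four characters consumes three elements of the Octave string, while '.', 't', 't', 'f' consume one each.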
Currently, the sign of the char -> double conversion is left up to the
compiler (whether plain "char" is signed is implementation-defined in
C++), which I don't think is good. I think we should guarantee the
result to be nonnegative, as Matlab does. Shall I make a patch, or do
you wish to do it?
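Here is a minimal C++ sketch of the proposed guarantee. It assumes the conversion is done one `char` at a time, and the helper names `char_to_double` and `string_to_doubles` are hypothetical, not Octave's actual code. Routing each value through `unsigned char` yields 0 to 255 whether or not plain `char` is signed:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: convert a char to double as a nonnegative code
// (0..255), independent of the signedness of plain char.
double char_to_double(char c) {
    return static_cast<double>(static_cast<unsigned char>(c));
}

// Hypothetical helper: convert a whole byte string, element by element,
// in the way double(str) is expected to behave.
std::vector<double> string_to_doubles(const std::string& s) {
    std::vector<double> out;
    out.reserve(s.size());
    for (char c : s) {
        out.push_back(char_to_double(c));
    }
    return out;
}
```

This is the same idiom the `<cctype>` functions such as `isalpha` rely on, since their argument must be representable as `unsigned char` (or be EOF). If strchr uses character codes as 1-based indices (an assumption; the m-file is not shown here), for example code + 1, nonnegative codes keep every such index at least 1.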
--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz