Re: [changeset] Asian Characters and strchr()
From: Jaroslav Hajek
Subject: Re: [changeset] Asian Characters and strchr()
Date: Wed, 11 Mar 2009 10:30:25 +0100
On Wed, Mar 11, 2009 at 9:14 AM, Ben Abbott <address@hidden> wrote:
>
> On Mar 11, 2009, at 4:03 PM, Jaroslav Hajek wrote:
>
>> On Wed, Mar 11, 2009 at 8:53 AM, Ben Abbott <address@hidden> wrote:
>>>
>>> On Mar 11, 2009, at 3:33 PM, Jaroslav Hajek wrote:
>>>
>>>> On Wed, Mar 11, 2009 at 8:24 AM, Ben Abbott <address@hidden> wrote:
>>>>>
>>>>> I noticed that fileparts gives an error when the full file name
>>>>> contains Asian characters.
>>>>>
>>>>> octave:209> fileparts ("System/Library/Fonts/junk.ttf")
>>>>> error: subscript indices must be either positive integers or logicals.
>>>>> error: called from:
>>>>> error: /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/strings/strchr.m at line 40, column 19
>>>>> error: /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/miscellaneous/fileparts.m at line 30, column 10
>>>>>
>>>>> It appears that there is a simple fix for strchr, but it will depend
>>>>> upon the ASCII equivalent for Asian fonts.
>>>>>
>>>>> I'm seeing negative values.
>>>>>
>>>>> fullfile = "System/Library/Fonts/junk.ttf";
>>>>> octave:211> double(fullfile)
>>>>> ans =
>>>>>
>>>>> Columns 1 through 16:
>>>>>
>>>>>   83  121  115  116  101  109   47   76  105   98  114   97  114  121   47   70
>>>>>
>>>>> Columns 17 through 32:
>>>>>
>>>>>  111  110  116  115   47  -27 -115 -114  -26 -106 -121  -25  -69 -122  -23  -69
>>>>>
>>>>> Columns 33 through 37:
>>>>>
>>>>> -111   46  116  116  102
>>>>>
>>>>> Can anyone tell me what the permissible range for integer values of
>>>>> Asian characters is?
>>>>
>>>> I think a char->double conversion is supposed to yield nonnegative
>>>> values, so this seems buggy.
>>>>
>>>>> I'm planning to patch strchr, any reason I shouldn't do that?
>>>>>
>>>>
>>>> I don't think there's a bug in strchr. This is clearly caused by the
>>>> negative values.
>>>
>>> For Asian fonts the values are 16-bit ... unsigned or signed I don't know.
>>
>> No, they're not. See your own example. Octave has no support for UTF-8
>> strings, so unless "char" is more than 8 bits, the result will be an
>> 8-bit number. Thus, "strchr" won't search for Japanese characters (but
>> this does not matter here, since you need to find just an ASCII
>> character).
>> Currently, the sign of char -> double is left up to the compiler,
>> which I don't think is good. I think we should guarantee that to be
>> nonnegative, the same as Matlab does. Shall I make a patch, or do you
>> wish to do it?
>
> OK, I hadn't considered how many bits Octave was using for characters.
>
> In any event, please do make a patch (I'm not competent enough in C++ to
> do it myself).
>
> In the meantime, I'll avoid using fileparts and strchr when there may be
> Asian characters present.
>
> Ben
>
Fix is uploaded.
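
For illustration only (this is not the changeset itself, and it is Python rather than Octave or C++): the negative values quoted above are consistent with UTF-8 bytes read through a signed 8-bit char. A minimal sketch, assuming that interpretation, which maps them back to the 0..255 range and checks that they form valid UTF-8. The names signed_values and to_unsigned are made up for this example.

```python
# Values reported by double(fullfile) in columns 22 through 33,
# copied from the quoted output above.
signed_values = [-27, -115, -114, -26, -106, -121, -25, -69, -122, -23, -69, -111]


def to_unsigned(value):
    """Reinterpret a signed 8-bit value (-128..127) as unsigned (0..255)."""
    return value & 0xFF


unsigned_values = [to_unsigned(v) for v in signed_values]
print(unsigned_values)
# -> [229, 141, 142, 230, 150, 135, 231, 187, 134, 233, 187, 145]

# Assumption: the file name was stored as UTF-8. Decoding raises
# UnicodeDecodeError if the bytes are not valid UTF-8.
decoded = bytes(unsigned_values).decode("utf-8")
print(len(decoded))  # -> 4 (twelve bytes, three bytes per character)
```

Every value lands above 127, i.e. outside ASCII, which is why a search for "/" or "." is unaffected once the conversion is guaranteed to be nonnegative.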
regards
--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz