Re: [changeset] Asian Characters and strchr()

octave-maintainers

From:	Jaroslav Hajek
Subject:	Re: [changeset] Asian Characters and strchr()
Date:	Wed, 11 Mar 2009 10:30:25 +0100

On Wed, Mar 11, 2009 at 9:14 AM, Ben Abbott <address@hidden> wrote:
>
> On Mar 11, 2009, at 4:03 PM, Jaroslav Hajek wrote:
>
>> On Wed, Mar 11, 2009 at 8:53 AM, Ben Abbott <address@hidden> wrote:
>>>
>>> On Mar 11, 2009, at 3:33 PM, Jaroslav Hajek wrote:
>>>
>>>> On Wed, Mar 11, 2009 at 8:24 AM, Ben Abbott <address@hidden> wrote:
>>>>>
>>>>> I noticed that fileparts gives an error when the full file name
>>>>> contains Asian characters.
>>>>>
>>>>> octave:209> fileparts ("System/Library/Fonts/junk.ttf")
>>>>> error: subscript indices must be either positive integers or logicals.
>>>>> error: called from:
>>>>> error:   /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/strings/strchr.m at line 40, column 19
>>>>> error:   /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/miscellaneous/fileparts.m at line 30, column 10
>>>>>
>>>>> It appears that there is a simple fix for strchr, but it will depend
>>>>> upon the ASCII equivalent for Asian fonts.
>>>>>
>>>>> I'm seeing negative values.
>>>>>
>>>>> fullfile = "System/Library/Fonts/junk.ttf";
>>>>> octave:211> double(fullfile)
>>>>> ans =
>>>>>
>>>>> Columns 1 through 16:
>>>>>
>>>>> 83   121   115   116   101   109    47    76   105    98   114    97   114   121    47    70
>>>>>
>>>>> Columns 17 through 32:
>>>>>
>>>>> 111   110   116   115    47   -27  -115  -114   -26  -106  -121   -25   -69  -122   -23   -69
>>>>>
>>>>> Columns 33 through 37:
>>>>>
>>>>> -111    46   116   116   102
>>>>>
>>>>> Can anyone tell me what the permissible range for integer values of
>>>>> Asian characters is?
>>>>
>>>> I think a char->double conversion is supposed to yield nonnegative
>>>> values, so this seems buggy.
>>>>
>>>>> I'm planning to patch strchr, any reason I shouldn't do that?
>>>>>
>>>>
>>>> I don't think there's a bug in strchr. This is clearly caused by the
>>>> negative values.
>>>
>>> For Asian fonts the values are 16-bit ... whether unsigned or signed, I don't know.
>>
>> No, they're not. See your own example. Octave has no support for UTF-8
>> strings, so unless "char" is more than 8 bits, the result will be an
>> 8-bit number. Thus, "strchr" won't search for Japanese characters (but
>> this does not matter here, since you only need to find an ASCII
>> character).
>> Currently, the sign of char -> double is left up to the compiler,
>> which I don't think is good. I think we should guarantee it to be
>> nonnegative, the same as Matlab does. Shall I make a patch, or do you
>> wish to do it?
>
> OK, I hadn't considered how many bits Octave was using for characters.
>
> In any event, please do make a patch (I'm not competent enough in C++ to do
> it myself).
>
> In the meantime, I'll avoid using fileparts and strchr when there may be
> Asian characters present.
>
> Ben
>
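
To illustrate the 8-bit point above: the twelve negative values in the report are consistent with four three-byte UTF-8 sequences, each byte read as a signed 8-bit number. The following is a minimal C++ sketch, not code from Octave; the function name count_code_points is hypothetical, and it assumes the file name was UTF-8 encoded.

```cpp
#include <cstddef>
#include <vector>

// Count UTF-8 code points in a byte sequence given as signed values,
// as printed by double() in the report. Continuation bytes have the bit
// pattern 10xxxxxx, so every byte that is NOT a continuation byte starts
// a new character. (Hypothetical helper, for illustration only.)
std::size_t count_code_points(const std::vector<int>& values) {
    std::size_t n = 0;
    for (int v : values) {
        // Recover the raw byte from its signed reading, e.g. -27 -> 0xE5.
        unsigned byte = static_cast<unsigned>(v) & 0xFFu;
        if ((byte & 0xC0u) != 0x80u)
            ++n;
    }
    return n;
}
```

Applied to the twelve negative values from columns 22 through 33, this counts 4 characters; the whole 37-byte string counts 29.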

Fix is uploaded.
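
For readers without the changeset at hand, here is a minimal C++ sketch of the idea, not the uploaded patch itself. The function names are hypothetical, and the first one forces the signed case so that it behaves the same whichever signedness the compiler picks for plain char.

```cpp
#include <string>
#include <vector>

// Mimics a char -> double conversion on a platform where plain `char`
// is signed: bytes above 0x7F come out negative.
// (Hypothetical helper, for illustration only.)
std::vector<double> to_double_signed(const std::string& s) {
    std::vector<double> out;
    for (char c : s)
        out.push_back(static_cast<double>(static_cast<signed char>(c)));
    return out;
}

// Routes each byte through `unsigned char` first, so every result lies
// in 0..255 regardless of the compiler's choice for plain `char`.
std::vector<double> to_double_nonnegative(const std::string& s) {
    std::vector<double> out;
    for (char c : s)
        out.push_back(static_cast<double>(static_cast<unsigned char>(c)));
    return out;
}
```

For the bytes 0xE5 0x8D 0x8E, the first function gives -27, -115, -114, as in the report, and the second gives 229, 141, 142, which are valid subscript indices.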

regards

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
