Re: [changeset] Asian Characters and strchr()

octave-maintainers

From:	Jaroslav Hajek
Subject:	Re: [changeset] Asian Characters and strchr()
Date:	Wed, 11 Mar 2009 09:03:13 +0100

On Wed, Mar 11, 2009 at 8:53 AM, Ben Abbott <address@hidden> wrote:
>
> On Mar 11, 2009, at 3:33 PM, Jaroslav Hajek wrote:
>
>> On Wed, Mar 11, 2009 at 8:24 AM, Ben Abbott <address@hidden> wrote:
>>>
>>> I noticed that fileparts gives an error when the full file name contains
>>> Asian characters.
>>>
>>> octave:209> fileparts ("System/Library/Fonts/junk.ttf")
>>> error: subscript indices must be either positive integers or logicals.
>>> error: called from:
>>> error:
>>>
>>> /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/strings/strchr.m
>>> at line 40, column 19
>>> error:
>>>
>>> /Users/bpabbott/Development/mercurial/octave-3.1.53/scripts/miscellaneous/fileparts.m
>>> at line 30, column 10
>>>
>>> It appears that there is a simple fix for strchr, but it will depend upon
>>> the ASCII equivalent for Asian fonts.
>>>
>>> I'm seeing negative values.
>>>
>>> fullfile = "System/Library/Fonts/junk.ttf";
>>> octave:211> double(fullfile)
>>> ans =
>>>
>>> Columns 1 through 16:
>>>
>>>  83   121   115   116   101   109    47    76   105    98   114    97
>>> 114   121    47    70
>>>
>>> Columns 17 through 32:
>>>
>>>  111   110   116   115    47   -27  -115  -114   -26  -106  -121   -25
>>> -69  -122   -23   -69
>>>
>>> Columns 33 through 37:
>>>
>>> -111    46   116   116   102
>>>
>>> Can anyone tell me what the permissible range for integer values of Asian
>>> characters is?
>>
>> I think a char->double conversion is supposed to yield nonnegative
>> values, so this seems buggy.
>>
>>> I'm planning to patch strchr, any reason I shouldn't do that?
>>>
>>
>> I don't think there's a bug in strchr. This is clearly caused by the
>> negative values.
>
> For Asian fonts the values are 16-bit ... unsigned or signed I don't know.

No, they're not. See your own example. Octave has no support for UTF-8
strings, so unless "char" is wider than 8 bits, each element will be an
8-bit number. Thus "strchr" cannot search for Japanese characters (but
that does not matter here, since you only need to find an ASCII
character).
Currently, the sign of the char -> double conversion is left up to the
compiler, which I don't think is good. I think we should guarantee that
the result is nonnegative, the same as Matlab does. Shall I make a patch,
or do you wish to do it?
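
To illustrate the point about 8-bit values and their sign, here is a
minimal sketch in Python (not Octave code, and not the proposed patch).
It takes the numbers printed by double(fullfile) in the quoted message,
maps each signed 8-bit value to its unsigned equivalent, and then looks
for the ASCII separators. The helper name to_unsigned and the assumption
that the file name is UTF-8 encoded are mine, not part of the thread.

```python
# Values copied from the double(fullfile) output quoted above.
signed_codes = [
    83, 121, 115, 116, 101, 109, 47, 76, 105, 98, 114, 97, 114, 121, 47, 70,
    111, 110, 116, 115, 47, -27, -115, -114, -26, -106, -121, -25, -69, -122,
    -23, -69, -111, 46, 116, 116, 102,
]


def to_unsigned(values):
    """Map signed 8-bit values (-128..127) to unsigned ones (0..255).

    Hypothetical helper: it mimics reading each char as unsigned.
    """
    return [v & 0xFF for v in values]


codes = to_unsigned(signed_codes)

# ASCII bytes keep their value, so searching for "/" and "." still works
# on the raw 8-bit data (0-based positions here, unlike Octave).
last_slash = max(i for i, c in enumerate(codes) if c == ord("/"))
last_dot = max(i for i, c in enumerate(codes) if c == ord("."))

# Assumption: the bytes between the last "/" and the last "." are UTF-8.
name = bytes(codes[last_slash + 1:last_dot]).decode("utf-8")

print(min(codes) >= 0)   # all values are now valid nonnegative codes
print(len(name))         # number of decoded characters in the base name
```

With nonnegative codes, the twelve bytes between the last "/" and the
".ttf" suffix decode as four three-byte characters, which is consistent
with the values being 8-bit pieces of a multi-byte encoding rather than
16-bit numbers.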


-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
