pspp-dev




From: Ben Pfaff
Subject: Re: [pre-lexer-2 1/4] command: Factor command name matching out of command.c.
Date: Sun, 17 Oct 2010 09:25:58 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

John Darrington <address@hidden> writes:

> On Sat, Oct 16, 2010 at 04:27:27PM -0700, Ben Pfaff wrote:
>      John Darrington <address@hidden> writes:
>      
>      > I've no doubt this patch is an improvement.
>      > However, I'm worried about how this is going to work with non-ASCII encodings.
>      > For example, some recent syntax files that I've seen have UTF-8 "hard" spaces
>      > (0xc2 0xa0) instead of the normal ' '.
>      
>      Does SPSS actually treat a "hard space" as white space?  Looking
>      at the C, Java, and XML standards, none of them appear to treat
>      hard spaces as white space; it appears to be rejected as invalid.
>      
> I can only really answer that with a question: "What do you mean by `treat as whitespace'"?
>
> Based upon syntax file examples that I have found on the web, it certainly
> appears to be true that a hard space is interpreted as a keyword separator in
> syntax.

OK.  I guess we can presume that SPSS accepts code points 0xa0
and 0x20 as equivalent in syntax then.

That's too bad--all of the C, Java, and XML white space
characters are code points below 0x80.  Since the other
characters that are valid parts of command names are also below
0x80, this code could have essentially ignored UTF-8 encoding if
SPSS did not treat 0xa0 as white space.  Oh well.
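To make the consequence concrete, here is a minimal C sketch of command-name matching that accepts U+00A0 as well as ASCII white space between the words of a name such as "SORT CASES". U+00A0 is encoded in UTF-8 as the two bytes 0xC2 0xA0, so the matcher can no longer classify white space one byte at a time. The function names `syntax_space_len` and `match_command_name` are hypothetical, not PSPP's actual API, and the word-boundary rule (any ASCII byte other than a letter, digit, or underscore ends a word) is an assumption made for the sketch.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper.  Returns the number of bytes of white space at the
   start of S (N bytes of UTF-8), or 0 if S does not start with white space.
   ASCII white space counts as 1 byte; U+00A0 NO-BREAK SPACE, encoded as
   0xC2 0xA0, counts as 2 bytes. */
static size_t
syntax_space_len (const char *s, size_t n)
{
  const unsigned char *p = (const unsigned char *) s;
  if (n >= 1 && (p[0] == ' ' || p[0] == '\t' || p[0] == '\n'
                 || p[0] == '\r' || p[0] == '\f' || p[0] == '\v'))
    return 1;
  if (n >= 2 && p[0] == 0xc2 && p[1] == 0xa0)
    return 2;
  return 0;
}

/* Hypothetical matcher.  Returns true if S (N bytes of UTF-8) begins with
   command name NAME.  Letters are compared case-insensitively (ASCII only);
   each single space in NAME matches a run of one or more white-space
   characters in S, including hard spaces. */
static bool
match_command_name (const char *name, const char *s, size_t n)
{
  size_t i = 0;
  for (const char *c = name; *c != '\0'; c++)
    {
      if (*c == ' ')
        {
          size_t len = syntax_space_len (s + i, n - i);
          if (len == 0)
            return false;           /* Words must be separated. */
          while (len > 0)           /* Skip the whole run of separators. */
            {
              i += len;
              len = syntax_space_len (s + i, n - i);
            }
        }
      else
        {
          if (i >= n
              || toupper ((unsigned char) s[i]) != toupper ((unsigned char) *c))
            return false;
          i++;
        }
    }

  /* NAME must end at a word boundary (assumed rule): end of input, white
     space, or an ASCII byte that cannot continue a word, such as '.'. */
  if (i == n || syntax_space_len (s + i, n - i) > 0)
    return true;
  unsigned char next = (unsigned char) s[i];
  return next < 0x80 && !isalnum (next) && next != '_';
}
```

With this approach "SORT", a hard space, and "CASES" matches "SORT CASES" exactly as an ordinary space would, while a lone 0xC2 byte at the end of the input is not mistaken for white space. The cost Ben describes is visible here: without U+00A0, `syntax_space_len` could have been a single-byte test.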

> However, other questions remain.  For example, some commands (e.g.,
> AUTORECODE /MISSING) alter their behaviour when "blank" string
> values are encountered. I don't know exactly what it means for
> a string value to be "blank".  If anyone can do some
> experiments with spss and report the results it would be very
> much appreciated.

I've always had the assumption that 0x20 was the sole code point
accepted as "blank" in these situations.
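Under that assumption, deciding whether a string value is blank reduces to checking that every byte is 0x20. The C sketch below uses a hypothetical function name, `is_blank_value`, and is not PSPP's code. Note that under this rule a UTF-8 hard space (0xC2 0xA0) makes a value non-blank, which is exactly the case John's question leaves open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check.  Returns true if the N-byte string value S is
   "blank", assuming 0x20 is the only blank code point: every byte must be
   an ASCII space.  An empty value is vacuously blank. */
static bool
is_blank_value (const char *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (s[i] != ' ')
      return false;
  return true;
}
```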
-- 
Ben Pfaff 
http://benpfaff.org


