Re: var_is_valid_name
From: John Darrington
Subject: Re: var_is_valid_name
Date: Fri, 27 Mar 2009 10:12:26 +0900
User-agent: Mutt/1.5.13 (2006-08-11)
On Thu, Mar 26, 2009 at 04:58:13PM -0700, Ben Pfaff wrote:
> I think that getting rid of var_is_valid_name() would do more
> than what we want. In particular, I think that a user of the GUI
> would then be able to create a variable that could not be used in
> syntax, which means that it could not be used in GUI procedures
> that internally use syntax.
We already have that problem :(
For example, if you load up the file at
http://savannah.gnu.org/file/databas.sav?file_id=1906 you'll see that
it has variable names with non-ASCII characters. Thus the GUI
generates syntax which the lexer thinks is invalid. I think this is a
limitation of the lexer which needs to be addressed, but that's the
subject of another thread ....
> Variable names that would be
> troublesome include those that start with a digit (or consist
> only of digits), or contain special characters such as spaces or
> double quotes.
The problem with those criteria as you've described them is that they
depend upon the encoding. For example, the byte which corresponds to a
space in ASCII (0x20) might well be an ordinary character in some other
encoding. Similarly, a byte which is a digit in one character set
could be a letter in another. This won't happen in any ISO
encoding or in UTF-8, but I just don't know about the general case.
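
To make the encoding dependence concrete, here is a minimal sketch in C. It is an illustration only, not code from PSPP: the function name and the sample byte sequences are assumptions chosen for the example. It scans a buffer byte by byte, with no knowledge of the encoding, and shows two cases where a byte that would be a space or a digit in ASCII is really part of a different character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reports whether any byte in BUF would be an
   ASCII space (0x20) or an ASCII digit (0x30..0x39) when the buffer
   is scanned byte by byte, ignoring its real encoding. */
bool
has_ascii_space_or_digit_byte (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    if (buf[i] == 0x20 || (buf[i] >= 0x30 && buf[i] <= 0x39))
      return true;
  return false;
}

/* U+0120 (LATIN CAPITAL LETTER G WITH DOT ABOVE) in UTF-16LE: the
   low byte is 0x20 although the text contains no space. */
static const unsigned char utf16le_g_dot[] = { 0x20, 0x01 };

/* The same character in UTF-8: both bytes are >= 0x80, so a
   byte-wise scan cannot mistake them for ASCII. */
static const unsigned char utf8_g_dot[] = { 0xC4, 0xA0 };

/* U+0080 in GB18030 (a four-byte sequence): the second and fourth
   bytes are 0x30 although the text contains no digit '0'. */
static const unsigned char gb18030_u0080[] = { 0x81, 0x30, 0x81, 0x30 };
```

The byte-wise scan reports a match for the UTF-16LE and GB18030 buffers and none for the UTF-8 one, which is consistent with the point above: such tests are safe only for encodings where bytes below 0x80 always mean ASCII.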
> The real problem here is that we are disallowing an unreasonable
> number of characters in identifiers, right? To fix that, we can
> adjust lex_is_id1 and lex_is_id2. Perhaps all we need to do is
> to add "|| c >= 128" to the test in lex_is_id1(). Although that
> assumes that we are using a sane character encoding such as UTF-8
> or ISO Latin-#, we seem to be moving internally toward UTF-8 for
> everything anyhow.
In fact, we're only using UTF-8 internally in the GUI. Thinking about
recent problems reported by international users has led me to
conclude that we have to delay conversion to UTF-8 until quite a high
level if we are to have any chance of being compatible with SPSS's
data files and friendly to international users.
Anyway, so far as the current problem is concerned, var_is_valid_name
needs to be much less fussy. I'll try your proposed solution since I
think it may work, as you say, for "sane" character sets. If it
doesn't work, or if we find a system file with an insane encoding then
we'll have to look at this again.
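
For reference, the proposal quoted above (adding "|| c >= 128" to the first-character test) might look roughly like the following. This is a sketch under stated assumptions, not the actual PSPP source: the sketch_ names are hypothetical, and the exact set of permitted ASCII punctuation ('@', '#', '$', '.', '_') is assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the first-character test.  Accepts ASCII
   letters, a few punctuation characters (assumed set), and, per the
   proposal, any byte >= 128. */
bool
sketch_is_id1 (char c_)
{
  unsigned char c = (unsigned char) c_;
  return ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
          || c == '@' || c == '#' || c == '$'
          || c >= 128);
}

/* Hypothetical stand-in for the test on subsequent characters:
   everything accepted above, plus ASCII digits, '.' and '_'. */
bool
sketch_is_idn (char c_)
{
  unsigned char c = (unsigned char) c_;
  return (sketch_is_id1 (c_) || (c >= '0' && c <= '9')
          || c == '.' || c == '_');
}

/* A less fussy name check built on the two tests above.  Rejects the
   empty string, a leading digit, spaces and double quotes, but lets
   any non-ASCII byte through. */
bool
sketch_is_valid_name (const char *name)
{
  assert (name != NULL);
  if (!sketch_is_id1 (name[0]))
    return false;
  for (size_t i = 1; name[i] != '\0'; i++)
    if (!sketch_is_idn (name[i]))
      return false;
  return true;
}
```

With this sketch, "1abc", "123", "a b" and a name containing a double quote are still rejected, while "\xC3\xA5r" (UTF-8) and "\xE5r" (ISO Latin-1) are both accepted. It deliberately ignores any other checks a real validity function may perform, and it inherits the "sane encoding" assumption: in an encoding where bytes below 128 can occur inside a multibyte character, it may give wrong answers.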
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.