

Lexing rules for the single quote character

From: Lipping Joonas
Subject: Lexing rules for the single quote character
Date: Tue, 4 Mar 2014 08:56:10 +0000

I've been investigating the second issue listed at 
http://wiki.octave.org/Projects#Interpreter , which states that "if (expr) 
'this is a string' end" should be tokenized as IF expr STRING END. Currently, 
the first single quote character is translated to the token HERMITIAN; that 
is, the "if" condition is not expr itself but its hermitian. After that, 
"this", "is", "a" and "string" are interpreted as variable names, and the single 
quote after them is taken to begin a single-quoted string. The dual nature of 
the single quote leads to some ambiguities, such as:

if (A - B) ' != C D' end
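To make the two readings concrete, here is a minimal sketch in Python of a toy tokenizer for a tiny subset of the language. It is not Octave's real lexer. All token names except HERMITIAN are made up, and so is the `space_forces_string` switch. With the switch off, whitespace before a quote is ignored, which approximates the whitespace-tolerant Octave behavior described here. With it on, a space-preceded quote always opens a string, which approximates the Matlab convention. The sketch ignores command syntax, doubled-quote escapes, and every other context the real lexer tracks.

```python
import re

# Toy lexer for a tiny Octave subset; a hypothetical sketch, not Octave's lexer.
KEYWORDS = {"if", "end"}
SINGLE_CHAR = {"(": "LPAREN", ")": "RPAREN", "+": "OP", "-": "OP"}
# A quote directly after one of these token kinds can be a transpose.
TRANSPOSABLE = {"IDENT", "RPAREN", "HERMITIAN"}


def tokenize(src, space_forces_string=False):
    """Return a list of (kind, text) tuples.

    space_forces_string=False: whitespace before a quote is ignored, so a quote
    after an identifier, ')' or another transpose is always HERMITIAN.
    space_forces_string=True: Matlab-like, a quote preceded by whitespace
    always opens a string.
    """
    tokens, i, saw_space = [], 0, False
    while i < len(src):
        c = src[i]
        if c.isspace():
            saw_space = True
            i += 1
            continue
        prev = tokens[-1][0] if tokens else None
        if c == "'":
            if prev in TRANSPOSABLE and not (saw_space and space_forces_string):
                tokens.append(("HERMITIAN", "'"))
                i += 1
            else:
                # The quote opens a string; doubled-quote escapes are not handled.
                end = src.find("'", i + 1)
                if end == -1:
                    raise SyntaxError(f"unterminated string at column {i}")
                tokens.append(("STRING", src[i + 1:end]))
                i = end + 1
        elif c.isalpha() or c == "_":
            word = re.match(r"\w+", src[i:]).group(0)
            tokens.append((word.upper() if word in KEYWORDS else "IDENT", word))
            i += len(word)
        elif src.startswith("!=", i):
            tokens.append(("OP", "!="))
            i += 2
        elif c in SINGLE_CHAR:
            tokens.append((SINGLE_CHAR[c], c))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r}")
        saw_space = False
    return tokens


def kinds(src, **options):
    """Token kinds only, for compact comparison."""
    return [kind for kind, _ in tokenize(src, **options)]


line = "if (A - B) ' != C D' end"
print(kinds(line))                            # whitespace-tolerant reading
print(kinds(line, space_forces_string=True))  # Matlab-like reading
```

In the whitespace-tolerant mode, both quotes in the line come out as HERMITIAN, which gives the comparison reading. In the Matlab-like mode, the line collapses to IF LPAREN IDENT OP IDENT RPAREN STRING END, and the string holds " != C D".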

Are we comparing the hermitian of A - B to C and, if that holds, setting ans 
to the hermitian of D? Or are we checking the truth value of A - B and 
evaluating the string ' != C D'? This comes up less often in Matlab, because 
Matlab does not allow whitespace before the postfix hermitian operator: if 
there is a space before the single quote, it is either a string terminator or 
bad syntax. In Octave, however, the postfix hermitian tolerates whitespace. The 
line of code above also illustrates why it is problematic to solve this by 
tokenizing "if (expr) '" (with a space) as IF expr SQ_STRING_START and 
"if (expr)'" (without a space) as IF expr HERMITIAN: that would impose 
non-obvious rules on what the "if" expression is allowed to look like. For 
instance,

if ((A + B) ' != B) C end

should intuitively be the same when the "readability" parens around the if 
expression are removed,

if (A + B) ' != B C end

yet the new rule would make the latter statement erroneous. Looking ahead to 
check whether the single quote characters pair up later might help somewhat, 
but that is only a heuristic improvement. Applying the restriction to ALL 
parenthesized expressions could work a bit better, though it has the downside 
of non-optional parens (and there is probably at least one person somewhere 
whose code it would break). In that case, it might also be necessary to apply 
the same restriction to hermitians, so that in

A    '''''''

all single quotes are hermitians but in

A'  '
A ''''' '
(A) '

the rightmost single quote is a string delimiter. Otherwise,

if (A + B)' 'hello' end

will confuse the lexer.
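The restriction can be sketched as a toy Python tokenizer with the same caveats as any sketch here. It is not Octave's lexer, and it handles neither command syntax nor doubled-quote escapes. In this version, a quote that is separated by whitespace from a preceding `)` or HERMITIAN token opens a string. A quote after an identifier still tolerates whitespace. All token kinds except HERMITIAN are made up for illustration. That includes `UNCLOSED_STRING`, which marks a string opened but not closed within a fragment. The `strict_after` parameter is also hypothetical.

```python
import re

# Toy lexer for a tiny Octave subset; a hypothetical sketch, not Octave's lexer.
KEYWORDS = {"if", "end"}
SINGLE_CHAR = {"(": "LPAREN", ")": "RPAREN", "+": "OP", "-": "OP"}
# A quote directly after one of these token kinds can be a transpose.
TRANSPOSABLE = {"IDENT", "RPAREN", "HERMITIAN"}


def tokenize(src, strict_after=frozenset({"RPAREN", "HERMITIAN"})):
    """Return (kind, text) tuples. A quote separated by whitespace from a
    preceding token whose kind is in strict_after opens a string."""
    tokens, i, saw_space = [], 0, False
    while i < len(src):
        c = src[i]
        if c.isspace():
            saw_space = True
            i += 1
            continue
        prev = tokens[-1][0] if tokens else None
        if c == "'":
            if prev in TRANSPOSABLE and not (saw_space and prev in strict_after):
                tokens.append(("HERMITIAN", "'"))
                i += 1
            else:
                end = src.find("'", i + 1)  # doubled-quote escapes not handled
                if end == -1:
                    # String opened but not closed within this fragment.
                    tokens.append(("UNCLOSED_STRING", src[i + 1:]))
                    break
                tokens.append(("STRING", src[i + 1:end]))
                i = end + 1
        elif c.isalpha() or c == "_":
            word = re.match(r"\w+", src[i:]).group(0)
            tokens.append((word.upper() if word in KEYWORDS else "IDENT", word))
            i += len(word)
        elif src.startswith("!=", i):
            tokens.append(("OP", "!="))
            i += 2
        elif c in SINGLE_CHAR:
            tokens.append((SINGLE_CHAR[c], c))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r}")
        saw_space = False
    return tokens


def kinds(src, **options):
    return [kind for kind, _ in tokenize(src, **options)]


for fragment in ["A    '''''''", "A'  '", "A ''''' '", "(A) '"]:
    print(repr(fragment), kinds(fragment))

line = "if (A + B)' 'hello' end"
print(kinds(line))                               # restricted rule
print(kinds(line, strict_after=frozenset()))     # fully whitespace-tolerant
```

Under the restricted rule, every quote in `A    '''''''` is HERMITIAN, and the rightmost quote in each of the other three fragments opens a string. The statement `if (A + B)' 'hello' end` lexes to IF LPAREN IDENT OP IDENT RPAREN HERMITIAN STRING END. Passing `strict_after=frozenset()` restores full whitespace tolerance. The quotes around `hello` then become two HERMITIAN tokens around an identifier, preceded by the first HERMITIAN. The token stream is still well-formed but means something entirely different.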

What do we want to do here? I have some code that treats every single quote 
separated by whitespace from a preceding ")" or HERMITIAN token as a string 
delimiter. Preliminary inspection indicates that it doesn't cause any new test 
failures, but I need to poke around a bit more to be sure. I could write some 
tests to accompany it and submit a patch.
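Tests for such a rule could take a table-driven form. The sketch below is hypothetical. It uses Python rather than Octave's own test format, and a toy lexer rather than the real one. `STRICT_AFTER`, `CASES` and `run_cases` are made-up names. The table pins down the rule's behavior on the wiki example and on the cases above. The last entry records the cost noted earlier. Once whitespace after `)` is significant, `if (A + B) ' != B C end` no longer contains a transpose. Instead, it opens a string that is never closed.

```python
import re

# Toy lexer for a tiny Octave subset; a hypothetical sketch, not Octave's lexer.
KEYWORDS = {"if", "end"}
SINGLE_CHAR = {"(": "LPAREN", ")": "RPAREN", "+": "OP", "-": "OP"}
TRANSPOSABLE = {"IDENT", "RPAREN", "HERMITIAN"}
STRICT_AFTER = {"RPAREN", "HERMITIAN"}  # proposed rule: whitespace matters here


def tokenize(src):
    tokens, i, saw_space = [], 0, False
    while i < len(src):
        c = src[i]
        if c.isspace():
            saw_space = True
            i += 1
            continue
        prev = tokens[-1][0] if tokens else None
        if c == "'":
            if prev in TRANSPOSABLE and not (saw_space and prev in STRICT_AFTER):
                tokens.append(("HERMITIAN", "'"))
                i += 1
            else:
                end = src.find("'", i + 1)  # doubled-quote escapes not handled
                if end == -1:
                    tokens.append(("UNCLOSED_STRING", src[i + 1:]))
                    break
                tokens.append(("STRING", src[i + 1:end]))
                i = end + 1
        elif c.isalpha() or c == "_":
            word = re.match(r"\w+", src[i:]).group(0)
            tokens.append((word.upper() if word in KEYWORDS else "IDENT", word))
            i += len(word)
        elif src.startswith("!=", i):
            tokens.append(("OP", "!="))
            i += 2
        elif c in SINGLE_CHAR:
            tokens.append((SINGLE_CHAR[c], c))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r}")
        saw_space = False
    return tokens


# (source, expected token kinds) under the proposed rule.
CASES = [
    # The goal from the wiki: IF expr STRING END.
    ("if (expr) 'this is a string' end",
     ["IF", "LPAREN", "IDENT", "RPAREN", "STRING", "END"]),
    # Transposes written without spaces are unaffected.
    ("if (A - B)'' end",
     ["IF", "LPAREN", "IDENT", "OP", "IDENT", "RPAREN", "HERMITIAN", "HERMITIAN", "END"]),
    # A spaced quote after a transpose opens a string.
    ("if (A + B)' 'hello' end",
     ["IF", "LPAREN", "IDENT", "OP", "IDENT", "RPAREN", "HERMITIAN", "STRING", "END"]),
    # The cost: a spaced transpose after ')' is no longer a transpose.
    ("if (A + B) ' != B C end",
     ["IF", "LPAREN", "IDENT", "OP", "IDENT", "RPAREN", "UNCLOSED_STRING"]),
]


def run_cases():
    """Return (source, expected, actual) for every failing case."""
    failures = []
    for src, expected in CASES:
        actual = [kind for kind, _ in tokenize(src)]
        if actual != expected:
            failures.append((src, expected, actual))
    return failures


print(run_cases() or "all cases pass")
```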

