Re: [Bug-apl] Parsing Numbers

Hi Nick,

Historically, APL has required consecutive numbers to be separated by at least one character that can't be part of a number.

This differs from the lex approach, which matches the longest legal token and doesn't care whether the next token begins immediately after it.

Since doing it differently doesn't add any new functionality, it's probably best to stick with the way it's always been done.

Such occurrences in people's code are almost always typos, and it's best to report them to the user as errors.
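
Here is a minimal Python sketch of the traditional rule described above. The scanner takes the longest number it can, then reports an error if the very next character could also be part of a number. It assumes a simplified number syntax: an optional high minus, digits with an optional decimal point, and an optional E exponent. The names scan_numbers, BadNumber and NUMBER_CHARS are hypothetical, and this is not GNU APL's actual scanner.

```python
import re

# Hypothetical sketch of the traditional APL rule: take the longest number,
# then reject it if the very next character could also be part of a number.
# Assumed number syntax: optional high minus, digits with an optional
# decimal point (or a leading point), and an optional E exponent.
NUMBER = re.compile(r"¯?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE]¯?[0-9]+)?")
NUMBER_CHARS = set("0123456789.¯eE")


class BadNumber(ValueError):
    """Raised when numbers are not properly separated (hypothetical name)."""


def scan_numbers(text):
    """Scan a blank-separated numeric vector such as '1 ¯2.5 3E¯6'."""
    values = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1
            continue
        m = NUMBER.match(text, pos)
        # No number here, or the number runs straight into another
        # number character: report it instead of guessing a split.
        if m is None or (m.end() < len(text) and text[m.end()] in NUMBER_CHARS):
            raise BadNumber(f"bad number at position {pos}: {text[pos:]!r}")
        # Python's float() uses '-' where APL uses the high minus.
        values.append(float(m.group().replace("¯", "-")))
        pos = m.end()
    return values
```

With this rule, scan_numbers("¯5 ¯6 ¯7") gives three numbers. "¯5¯6¯7" and "1E6E7" both raise BadNumber, so the likely typo is reported rather than silently split.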

Regards,

Mike

On Fri, Aug 28, 2015 at 10:23 AM, Nick Lobachevsky <address@hidden> wrote:

Before the parsing comes the lexical analysis...

Have a look at the ancient Unix lex (or flex) utility for some insight
into how GNU APL might recognise numbers. Also, have a look at the
APL.LEX definition file from Timothy Budd's APLc compiler project,
available at http://home.earthlink.net/~swsirlin/aplc.tar.Z (a compressed
tar archive). More info here: http://home.earthlink.net/~swsirlin/aplcc.html

Budd's Lex regex definitions for numeric constants are:

    (".ng"{ws})?[0-9]+\.[0-9]*([eE][+-]?[0-9]+)? {return( lexnum(RCON));}
    (".ng"{ws})?[0-9]*\.[0-9]+([eE][+-]?[0-9]+)? {return( lexnum(RCON));}
    (".ng"{ws})?[0-9]+ {return( lexnum(ICON)); }

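With Lex, the longest match wins. Evidently, the reason for the two
similar real number definitions is to accept abbreviated forms like 1.e3
and .1e3 as well as the more complete 1.0e3 and 0.1e3. Between them, the
two patterns also require at least one digit on one side of the decimal
point, so a lone . does not match as a number.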
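
To see the longest-match behaviour in isolation, here is a small Python transcription of just these three numeric rules. It is a sketch under stated assumptions: the optional (".ng"{ws}) prefix is left out, only the numeric rules are modelled (not the rest of APL.LEX), and the helper name longest_match is hypothetical. As in lex, every rule is tried at the current position, the longest match is kept, and ties go to the rule listed first.

```python
import re

# The three numeric rules from Budd's APL.LEX, transcribed to Python regexes.
# Assumption: the optional (".ng"{ws}) prefix is omitted for brevity.
RULES = [
    ("RCON", re.compile(r"[0-9]+\.[0-9]*(?:[eE][+-]?[0-9]+)?")),
    ("RCON", re.compile(r"[0-9]*\.[0-9]+(?:[eE][+-]?[0-9]+)?")),
    ("ICON", re.compile(r"[0-9]+")),
]


def longest_match(text, pos=0):
    """Return (kind, lexeme) for the longest rule match at pos, or None.

    Like lex, a tie between equally long matches goes to the earlier rule.
    """
    best = None
    for kind, rx in RULES:
        m = rx.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best
```

For example, longest_match("1.e3") is ("RCON", "1.e3") via the first rule, longest_match(".1e3") needs the second rule, and longest_match("42.") prefers RCON over the shorter ICON match "42". longest_match(".") returns None, because neither real-number rule accepts a point with no digits.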

So to my Lex-influenced way of thinking,

¯5¯6¯7
really should be three negative numbers, as the high minus
unambiguously begins the next numeric token.

1E6E7
1E6 is a complete numeric token and processing ends for that number
immediately. What follows is E7, which looks like a variable or
function name.

1E¯¯6
would be four tokens: 1, then E, then a lone high minus, then negative 6.

1E¯
would be three tokens: 1, then E, then a lone high minus.

1D¯¯6
would also be four tokens, so it is interesting that this gives a Bad Number error.
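
The tokenizations above can be checked with a small lex-style tokenizer written in Python. This is a hypothetical sketch, not GNU APL's grammar. Its token set is an assumption chosen to mirror the examples: APL-style numbers with a high minus and an E exponent, names made of letters and digits, and a lone high minus. At each position every rule is tried, the longest match wins, and ties go to the earlier rule.

```python
import re

# Hypothetical lex-style tokenizer mirroring the examples above.
# Assumed token set: APL-style numbers (high minus, optional decimal point,
# optional E exponent with high minus), names, and a lone high minus.
RULES = [
    ("NUM", re.compile(r"¯?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE]¯?[0-9]+)?")),
    ("NAME", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("MINUS", re.compile(r"¯")),
]


def tokenize(text):
    """Split text into (kind, lexeme) pairs using longest-match-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for kind, rx in RULES:
            m = rx.match(text, pos)
            # Keep strictly longer matches only, so ties favour earlier rules.
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        kind, m = best
        tokens.append((kind, m.group()))
        pos = m.end()
    return tokens
```

Under these assumptions, tokenize("1E6E7") gives [('NUM', '1E6'), ('NAME', 'E7')], and tokenize("1D¯¯6") gives the four tokens 1, D, ¯ and ¯6. In other words, a pure longest-match lexer would not report a bad number here.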

From:	Mike Duvos
Subject:	Re: [Bug-apl] Parsing Numbers
Date:	Fri, 28 Aug 2015 11:12:19 -0700