[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: How the backquote and the comma really work?

From: Pascal J. Bourguignon
Subject: Re: How the backquote and the comma really work?
Date: Wed, 12 Aug 2015 18:30:52 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Michael Heerdegen <address@hidden> writes:

> Marcin Borkowski <address@hidden> writes:
>> Interestingly, there's a lot of buzz about Lisp /interpreter/ written
>> in Lisp, but not so much about Lisp /reader/ written in Lisp.  In
>> fact, I didn't find one on the Internet.

Not looking good enough.

and of course, there's one in each lisp implementation.

> Good question.  Maybe it's because doing such things is mainly for
> educational reasons, and when you want to learn how a language works,
> studying the interpreter is more beneficial.

But also, it's assumed that by teaching the most complex subjects,
people will be able to deal with the less complex subjects by

Sometimes indeed it looks like not.

>> Now I'm wondering: is my approach (read one token at a time, but never
>> go back, so that I can't really "peek" at the next one) reasonable?
>> Maybe I should just read all tokens in a list?  I do not like this
>> approach very much.  I could also set up a buffer, which would contain
>> zero or one tokens to read, and put the already read token in that
>> buffer in some cases (pretty much what TeX's \futurelet does.  Now
>> I appreciate why it's there...).

Most languages are designed to be (= to have a grammar that is) LL(1);
there are also LR(0), SLR(1), LALR(1) languages, but as you can see, the
parameter is at most 1 in general.   What this means is that the parser
can work my looking ahead at most 1 token.  That is, it reads the
current tokens, and it may look the next token, before deciding what
grammar rule to apply.  Theorically, we could design languages that
require a bigger look-ahead, but in practice it's not useful; in the
case where the grammar would require longer look ahead, we often can
easily add some syntax (a prefix keyword) to make it back into LL(1) (or
LALR(1) if you're into that kind of grammar).

Why is it useful?  Because it allows to read, scan and parse the source
code by leaving it in a file and loading only one or two tokens in
memory at once: it is basically an optimization for when you're
inventing parsers on computers that don't have a lot of memory in the 60s.

And then! Even the first FORTRAN compiler, the one in 63 passes,
actually kept the program source in memory (4 Kw), and instead loaded
alternatively the passes of the compiler to process the data structures
of the program that remained in memory!

So indeed, there's very little reason to use short look-ahead, only that
we have a theorical body well developped to generate parsers
automatically from grammar of these forms.

So, reading the whole source file in memory (or actually, already having
it in memory, eg. in editor/compiler IDEs), is also a natural solution.

Also for some languages, the processing of the source is defined in
phases such as you end up easily having the whole sequence of tokens in
memory. For example, the C preprocessor (but that's another story).

Finally, parser generators such as PACKRAT being able to process
grammars with unlimited lookahead, can benefit from pre-loading the
whole source in memory.

In any case, it's rather an immaterial question, since on one side, you
have abstractions such as lazy streams that let you process sequences
(finite or infinite) as an I/O stream where you get each element in
sequence and of course, you can copy a finite stream back into a
sequence.  Both abstractions can be useful and used to write elegant
algorithms.  So it doesn't matter.  Just have a pair of functions to
convert buffers into streams and streams into buffer and use whichever
you need for the current algorithm!

> I really don't get the point in which way the Python example would have
> advantages over yours.  The only difference is that your version
> combines the two steps that are separate in the Python example.  Your
> version is more efficient, since it avoids building a very long list
> that is not really needed and will cause a lot of garbage collection to
> be done afterwards.

Nowadays sources, even of complete OS such as Android, are much smaller
than the available RAM.  Therefore loading the whole file in RAM and
building an index of tokens into it will be more efficient than
performing O(n) I/O syscalls.

__Pascal Bourguignon__       
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk

reply via email to

[Prev in Thread] Current Thread [Next in Thread]