Re: How the backquote and the comma really work?
Sun, 23 Aug 2015 10:30:23 +0200
On 2015-08-12, at 18:30, Pascal J. Bourguignon <address@hidden> wrote:
> Michael Heerdegen <address@hidden> writes:
>> Marcin Borkowski <address@hidden> writes:
>>> Interestingly, there's a lot of buzz about a Lisp /interpreter/ written
>>> in Lisp, but not so much about a Lisp /reader/ written in Lisp. In
>>> fact, I didn't find one on the Internet.
> You're not looking hard enough. And of course, there's one in each
> Lisp implementation.
But it's often written in C or something similar, not in Lisp.
>> Good question. Maybe it's because doing such things is mainly for
>> educational reasons, and when you want to learn how a language works,
>> studying the interpreter is more beneficial.
> But also, it's assumed that by teaching the most complex subjects,
> people will be able to deal with the less complex subjects by
> themselves. Sometimes, indeed, it looks like they can't.
Especially if one doesn't have a CS background, and is mostly
self-taught. Also, it's not that I'm unable to deal with that; after
a few iterations, I usually succeed. My problem was that I felt I was
doing it suboptimally, and I wanted to see how smarter/more
knowledgeable people deal with it.
>>> Now I'm wondering: is my approach (read one token at a time, but never
>>> go back, so that I can't really "peek" at the next one) reasonable?
>>> Maybe I should just read all tokens into a list? I do not like this
>>> approach very much. I could also set up a buffer, which would contain
>>> zero or one tokens to read, and put the already read token in that
>>> buffer in some cases (pretty much what TeX's \futurelet does. Now
>>> I appreciate why it's there...).
> Most languages are designed to be (= to have a grammar that is) LL(1);
> there are also LR(0), SLR(1), LALR(1) languages, but as you can see, the
> parameter is at most 1 in general. What this means is that the parser
> can work by looking ahead at most 1 token. That is, it reads the
> current token, and it may look at the next token, before deciding what
> grammar rule to apply. Theoretically, we could design languages that
> require a bigger look-ahead, but in practice it's not useful; in the
> cases where the grammar would require a longer look-ahead, we can often
> easily add some syntax (a prefix keyword) to bring it back into LL(1)
> (or LALR(1) if you're into that kind of grammar).
Now my lack of education is easily seen. I've only heard about formal
grammars (well, I had one class about them - I mean, /one class/, 90
minutes, some 15 years ago).
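
Still, the one-token look-ahead idea itself seems simple enough to
sketch. If I understand correctly, such a parser just needs `peek' and
`next' operations; here is a minimal, made-up Elisp example (the toy
grammar and all the names are mine, just for illustration):

  ;; A minimal LL(1)-style sketch: the parser decides which grammar
  ;; rule to apply by looking at a single token of input.  Tokens are
  ;; just symbols and numbers in a list.
  (defvar my/tokens nil
    "Remaining input tokens.")

  (defun my/peek ()
    "Look at the next token without consuming it."
    (car my/tokens))

  (defun my/next ()
    "Consume and return the next token."
    (pop my/tokens))

  ;; Toy grammar: expr -> NUMBER | "(" expr "+" expr ")"
  (defun my/parse-expr ()
    (cond ((numberp (my/peek))          ; one token decides the rule
           (my/next))
          ((eq (my/peek) 'lparen)
           (my/next)                    ; consume "("
           (let ((a (my/parse-expr)))
             (unless (eq (my/next) '+) (error "Expected +"))
             (let ((b (my/parse-expr)))
               (unless (eq (my/next) 'rparen) (error "Expected )"))
               (list '+ a b))))
          (t (error "Unexpected token: %S" (my/peek)))))

  ;; (setq my/tokens '(lparen 1 + lparen 2 + 3 rparen rparen))
  ;; (my/parse-expr) => (+ 1 (+ 2 3))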
> Why is it useful? Because it allows you to read, scan and parse the
> source code by leaving it in a file and loading only one or two tokens
> into memory at a time: it is basically an optimization for when you're
> inventing parsers in the '60s, on computers that don't have much memory.
And basically, this confirms my intuition that reading one token at
a time is not necessarily a stupid thing to do.
> And then! Even the first FORTRAN compiler, the one with 63 passes,
> actually kept the program source in memory (4 Kw), and instead loaded
> the compiler's passes one after another to process the data structures
> of the program that remained in memory!
> So indeed, there's very little reason to use short look-ahead; it's
> only that we have a well-developed theoretical body for generating
> parsers automatically from grammars of these forms.
> So, reading the whole source file into memory (or actually, already
> having it in memory, e.g. in editor/compiler IDEs) is also a natural
> solution. Also, for some languages, the processing of the source is
> defined in phases such that you easily end up having the whole
> sequence of tokens in memory. For example, the C preprocessor (but
> that's another story).
> Finally, parser generators such as PACKRAT, which can process
> grammars with unlimited look-ahead, benefit from pre-loading the
> whole source into memory.
Thanks for sharing - as hinted above, I have a lot to learn!
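
Out of curiosity, I also tried to sketch the memoization trick that,
as far as I understand, makes packrat parsing run in linear time.
Everything below is a toy of my own devising (the names, the grammar,
all of it), not any real parser generator:

  ;; Memoized recursive descent over an in-memory string.  A "result"
  ;; is the position after a successful match, or nil on failure.
  (defvar my/input "")
  (defvar my/memo (make-hash-table :test #'equal))

  (defun my/memoized (rule fn pos)
    "Call FN on POS, caching its result under (RULE . POS)."
    (let ((key (cons rule pos)))
      (if (eq (gethash key my/memo 'missing) 'missing)
          (puthash key (funcall fn pos) my/memo)
        (gethash key my/memo))))

  (defun my/char-at (pos c)
    "Match the literal character C at POS."
    (when (and (< pos (length my/input))
               (eq (aref my/input pos) c))
      (1+ pos)))

  ;; PEG-style toy grammar:  S <- A "c" / A "d",  A <- "a" A / "a"
  (defun my/parse-A (pos)
    (my/memoized 'A
                 (lambda (p)
                   (let ((p1 (my/char-at p ?a)))
                     (when p1 (or (my/parse-A p1) p1))))
                 pos))

  (defun my/parse-S (pos)
    (or (let ((p (my/parse-A pos))) (and p (my/char-at p ?c)))
        ;; The second alternative asks for A at the same position;
        ;; the memo table answers instantly instead of re-descending.
        (let ((p (my/parse-A pos))) (and p (my/char-at p ?d)))))

  ;; (setq my/input "aaad") (clrhash my/memo) (my/parse-S 0) => 4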
> In any case, it's rather an immaterial question, since you have
> abstractions such as lazy streams that let you process a sequence
> (finite or infinite) as an I/O stream, getting each element in turn;
> and of course, you can copy a finite stream back into a sequence.
> Both abstractions can be useful and can be used to write elegant
> algorithms. So it doesn't matter. Just have a pair of functions to
> convert buffers into streams and streams into buffers, and use
> whichever you need for the current algorithm!
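
For concreteness, that pair of conversion functions might look like
this (a sketch assuming lexical binding; the names are mine, and a nil
element would end the stream early, which is good enough for
illustration):

  ;; -*- lexical-binding: t -*-
  ;; A "stream" here is just a closure that yields one element per
  ;; call, and nil when exhausted.
  (defun my/list-to-stream (list)
    "Return a function that yields successive elements of LIST."
    (lambda () (pop list)))

  (defun my/stream-to-list (stream)
    "Drain STREAM back into a list."
    (let (result elt)
      (while (setq elt (funcall stream))
        (push elt result))
      (nreverse result)))

  ;; (my/stream-to-list (my/list-to-stream '(a b c))) => (a b c)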
And most probably I'll end up coding an abstraction along those lines, with
a function for looking at the next token without “consuming” it, and
a function for “popping” the next token. Converting between buffers and
streams wouldn’t be very useful for me, since I would either lose the
whole text structure (line-breaks, comments), or have to do a lot of
work to actually preserve it.
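
Concretely, the peek/pop abstraction I have in mind might look
something like this: a one-slot pushback buffer on top of Emacs buffer
positions (a sketch; the names and the trivial notion of "token" are
mine):

  (defvar my/pushback nil
    "A token that was peeked at but not yet consumed.")

  ;; A token is just a chunk of non-whitespace text together with its
  ;; buffer positions -- exactly the information I don't want to lose.
  (defun my/read-token-at-point ()
    "Read one token starting at point; return nil at end of buffer."
    (skip-chars-forward " \t\n")
    (unless (eobp)
      (let ((beg (point)))
        (skip-chars-forward "^ \t\n")
        (list (buffer-substring-no-properties beg (point))
              beg (point)))))

  (defun my/peek-token ()
    "Return the next token without consuming it."
    (or my/pushback
        (setq my/pushback (my/read-token-at-point))))

  (defun my/pop-token ()
    "Consume and return the next token."
    (if my/pushback
        (prog1 my/pushback (setq my/pushback nil))
      (my/read-token-at-point)))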
>> I really don't see in what way the Python example would have
>> advantages over yours. The only difference is that your version
>> combines the two steps that are separate in the Python example. Your
>> version is more efficient, since it avoids building a very long list
>> that is not really needed and would cause a lot of garbage collection
>> afterwards.
> Nowadays, sources, even those of a complete OS such as Android, are
> much smaller than the available RAM. Therefore loading the whole file
> into RAM and building an index of tokens into it will be more
> efficient than performing O(n) I/O syscalls.
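
For illustration, such an index of tokens can be as simple as this
sketch (tokens being just whitespace-separated chunks; the name is
mine):

  (defun my/token-index (string)
    "Return a list of (START . END) pairs, one per token in STRING."
    (let ((pos 0) (len (length string)) index)
      (while (and (< pos len)
                  (string-match "[^ \t\n]+" string pos))
        (push (cons (match-beginning 0) (match-end 0)) index)
        (setq pos (match-end 0)))
      (nreverse index)))

  ;; (my/token-index "foo  bar baz") => ((0 . 3) (5 . 8) (9 . 12))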
OTOH, here I walk an Emacs buffer and not an external file. Moreover,
as I said, I don’t want to lose info on where I am in the source.
Faculty of Mathematics and Computer Science
Adam Mickiewicz University