guile-devel

Re: Parsing


From: Ian Grant
Subject: Re: Parsing
Date: Mon, 15 Sep 2014 15:59:34 -0400

On Mon, Sep 15, 2014 at 2:29 PM, Stefan Israelsson Tampe <address@hidden> wrote:
I find that logic programs together with memoization and auto-cut idioms help in making powerful parsers, really fun at least.

You should definitely look at Tom's paper then; the "director's cut" is here:

   http://www.tom-ridge.com/doc/ridge14sle_extended.pdf

He explicitly mentions the analogy with cut, and the parsers are memoised. He's a theorem prover by trade, and he started this by producing machine-checked formal proofs in HOL4 of the soundness and completeness of combinator parsers.

I implemented a python parser using those [...] you may think that it will backtrack horribly, but thanks to memoization it will not, and gives the AST quite quickly.
 
No, I believe you! And you should see what you can do with ambiguous grammars. Ridge gives some crazy Ackermann-like numbers. The parser's cache can be used as a compact representation of the otherwise hyper-exponential size of the ASTs an ambiguous grammar generates.

"For example, for input length 19, there are 441152315040444150 parse trees, but as with most exponential behaviours it is not feasible to actually compute all these parse trees. Now let us consider the following parser, which is identical except that it computes (in all possible ways) the length of the input parsed [...] Naively we might expect that this also exhibits exponential behaviour, since presumably the parse trees must all be generated, and the actions applied. This expectation is wrong. Running this example parser on an input of size 19 returns in 0.02 seconds with a single result 19. For an input of size 100, this parser returns a single result 100 in 5 seconds, and over a range of inputs this parser exhibits polynomial behaviour rather than exponential behaviour"
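To see the effect Ridge describes without his machinery, here is a minimal Python sketch. The grammar E -> E E | "1" and the span-indexed memoisation are my own simplification, not Ridge's actual code or grammar: counting the parse trees of "1"*n gives the Catalan numbers, which blow up exponentially, yet running the "length parsed" semantic action over the same memo table collapses each cell to a small set of values and stays polynomial.

```python
from functools import lru_cache

# Ambiguous grammar: E -> E E | "1", over the input "1" * n.
# The memo table is indexed by span [i, j), as in a chart parser.

def count_trees(n):
    """Number of parse trees for "1"*n: the Catalan number C(n-1)."""
    @lru_cache(maxsize=None)
    def c(i, j):
        total = 1 if j - i == 1 else 0   # the E -> "1" production
        for k in range(i + 1, j):        # every split for E -> E E
            total += c(i, k) * c(k, j)
        return total
    return c(0, n)

def lengths(n):
    """Same grammar, but the semantic action returns the length parsed.

    Each memo cell holds a *set* of semantic values, so the
    exponentially many duplicate results collapse as they are built.
    """
    @lru_cache(maxsize=None)
    def l(i, j):
        vals = {1} if j - i == 1 else set()
        for k in range(i + 1, j):
            vals |= {a + b for a in l(i, k) for b in l(k, j)}
        return frozenset(vals)
    return l(0, n)

print(count_trees(10))  # 4862 trees for ten tokens
print(lengths(19))      # frozenset({19}): one value, computed quickly
```

The point is that the cache, not the forest of trees, is the real data structure: the trees are only ever implicit in it, which is presumably why the cache works as the compact representation mentioned above.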

This amused me, because I implemented the earlier (pre-oracle) parser he described in Standard ML, and I did something which I thought later was a bit stupid: I executed all the semantic actions _during the parse_ and cached the semantic results in the "memo-pad" along with the syntactic derivation. It wasn't so slow, because of the memoization. But I am wondering now whether there is not some application to proof-search that one could use this technique on. A parse is a derivation of a kind, so with the right kind of grammar it could be a derivation of some sort of proof, and if that were an ambiguous grammar then it could explore non-deterministic relations, like a Prolog interpreter does.

I am talking out of my hat again, but maybe someone who has a clearer idea of these things will see a spark amongst all the smoke. But I hope no-one would spend a year or something looking for a way to prove P=NP because of what I'd said. (Unless they actually proved P=NP, which would be very cool!)

Oh, guile-log is based on C code that's pretty restrictive in the kind of cpu you can use, basically only amd64 atm. That can be changed, but there is really not much interest in this software beyond a nice little playground for this little schemer.

Is it a lot of C? Maybe you could replace that with lightning assembler, and then it would run on a lot of hardware?

Ian

