http://satyakiran.googlepages.com/google_soc.txt Google Summer of Code Application Mentor - GNU CONTENTS 1. NAME / CONTACT INFORMATION 2. PROJECT TITILE 3. PROBLEM DESCRIPTION / SUGGESTED SOLUTION / DELIVERABLES 4. PLAN FOR THE PROJECT 5. QUALIFICATION 5. RESUME 6. REFERENCES *********************** NAME / CONTACT INFORMATION ************************************************************ Satya Kiran Popuri 1062 W Taylor st Apt 2 Chicago IL 60607 USA. Phone: 773-780-1101 ( Cell ) *********************** PRJECT TITLE ************************************************************************** GNU Bison: Burke-Fisher Error Correction *********************** PROBLEM DESCRIPTION /SUGGESTED SOLUTION *********************************************** To implement Burke-Fisher Error Correction into Bison C output. Summary: YACC traditionally implements error correction with error productions and the inbuilt 'error' token. This requires the grammar writer to clutter her grammar with possible error productions. The burden of strategic error recovery is put on the grammar writer. A global error recovery algorithm like Burke-Fischer error repair can handle errors without any explicit error productions in the grammar. The algorithm looks for a single token insertion/deletion/replacement to the left of (and including) the erroneous token and tries to parse through the error token successfully. This strategy is based on the fact that an incorrect token can occur in the input string much before the token where the error is actually discovered. The algorithm is described in the paper titled "A Practical Method for LR and LL Syntactic Error Diagnosis and Recovery" by Michael G. Burke and Gerald A. Fisher. The full text of this paper is available from: http://www.cs.berkeley.edu/~jcondit/pl-prelim/burke87practical.pdf Solution: An implementation of the Burke-Fisher Error Correction algorithm requires two simultaneous parsers to be generated in the bison output. One parser performs shift/reduce actions on the "current" stack and a second "old" parser follows the same actions k tokens behind the current parser. When an error is encountered, the current parser can call the old parser to try all possible single token replacements starting k positions to the left of the current token. These k tokens are stored in a steady queue. When the old parser is able to parse R=4 tokens ahead, with one of the single token insertions/deletions/replacements, we can consider the error to be recovered successfully. ML-YACC, a parser generator written in ML programming language already offers this facility. An ML implementation of the above algorithm is available online at: http://www.cs.cmu.edu/afs/cs/user/fp/courses/95-lp/code/elpsml/base.sml Semantic Actions: Since bison is often used to generate parsers for many different languages, we cannot guarantee side-affect free semantic actions. Hence the semantic actions of the grammar productions reduced by the "current" parser must be deferred until the "old" parser has parsed successfully. All semantic actions are carried out on the "old stack". The role of the "current" parser is just to check syntax. Additional Bison declarations for error recovery: When token insertions are tried, we need some semantic value for the inserted token. These values can be supplied by adding a new %value directive to bison. Also, the grammar writer will be able to suggest some error corrections for token replacement by using the %change declaration as implemented in ML-YACC. Deliverables: 1. The changed bison code. Particularly the parser generator part of the code will change a lot. Also minor changes to the front end to check for a new command line option for Burke-Fisher error recovery mode. 2. Documentation of all changes made. *********************** PROJECT SCHEDULE *********************************************** 1. May 23 - June 16 Understand Bison code base and do a low level design document of the code changes to be made. 2. June 19- July 21 Make required code changes in bison and do unit tests. Also prepare regression and system test cases. 3. Jul 24 - Aug 13 System testing and regression testing with existing bison error mechanism and also the new mechanism implemented. 4. Aug 14 - Aug 21 Documentation - Documentation will be carried out with the development phase. This final stage is only to refine the documentation. *********************** QUALIFICATION *********************************************** I studied compiler design during spring 2006 at UIC. LR parsing theory was a large part of the curriculum. I used Bison to generate a parser and abstract syntax tree for a small subset of C as part of the course (see http://satyakiran.googlepages.com/cminus for details). But this was only a toy compiler. Now I want to explore real world problems. I think a compiler compiler will enable me to understand things more deeply than just a compiler. Another important reason for taking up this project is that I always wanted to contribute to free software, but never got a chance. I thought now is a good time and opportunity to dive into the world of free software. *********************** RESUME *********************************************** Please visit http://satyakiran.googlepages.com/hresume.html to view my resume. Thanks. *********************** REFERENCES *********************************************** 1. Prof V.N. VenkataKrishnan, Dept. of Computer Science, University of Illinois a Chicago, Chicago, IL. Ph: address@hidden