[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[GSoC] "Parse using AST" Final Report

From: Vishal Gupta
Subject: [GSoC] "Parse using AST" Final Report
Date: Thu, 23 Aug 2018 18:26:03 +0530

Hello everyone,

Google Summer of Code provided me the opportunity to work with Automake during
the summers. Here is the final report on work done during the same.


                              [GSoC'18] Parse using
                              Abstract Syntax Tree - Final Report

Branch:- experimental/gsoc/ast

Commit History:- cf633fb61 to 5aebec2cc

1. Introduction

The goal of the project was to generate an abstract syntax tree after parsing
the input . It has separate lexer, parser and tree generation
module so that new features can be added incrementally. For debugging purpose
the AST generated is displayed as image using graph description language.

The current implementation can parse the following type of statements:
* Primaries : All the primaries are identified.
* Multiline statements : Statements on multiple lines ending with backslash
  and its combination with other type of statements
* Conditional statements : Single or nested conditional statements
  [ if-else-endif block ]
* Subdirs directive : The parser parses the file in the said
  subdirectory and generates the tree in that subdirectory.
* Include directive : The parser recursively parses the included file and
  merge the generated tree to the parent tree by serializing at child process
  and deserializing at parent process.

2. Technical Description

* Lexer :- It converts the input file line by line into tokens using perl
  regex. Whenever called by parser, it returns an array of tokens. Each token
  has a name and if required, a value.

* Bison Grammar :- The grammar of Automake is written in GNU Bison. It is used
  to generate state description and state transition diagram. Writing in Bison
  is easy and exact syntax can be captured. Adding new features and debugging
  them becomes easy as no change is required in parser. The tool is only
  required at maintainer end to generate the parsing table.

* Parsing Table :- It is an array of hashes. Each array index indicate the
  state and hash consist of key value pair (Key for token acceptable for that
  state and value is the next transition state). For reduction of grammar and
  generation of tree, a special 'reduce' key is used in Hash. It points to
  array containing number of tokens to reduce and corresponding function to
  call in Tree module. The start state is 0 and acceptance state is stored in
  variable with value decided by converter.

* Converter :- It takes as input the state description and state transition
  file generated from Bison and outputs the parsing table in a format
  understood by perl.

* Parser and AST :- It parses the input file using tokens provided by lexer
  and the parsing table provided by converter. LR parsing algorithm
  similar to this [] is implemented.
  At each reduction of the grammar, the node is generated for the tree
  by calling its corresponding function in tree module specified in parsing
  table. After successful parsing, the parser outputs the generated tree in
  DOT format.

* Graph Description Language and dot utility :- For visualization of abstract
  syntax tree, it is printed in the graph description language. It provide an
  easy interface to display directed and undirected graphs. As the size of
  input file increases, the tree increases in width significantly as compared
  to height as all the statements identified during parsing are attached to
  single master node. Unflatten utility is used to fix this. At last, dot
  utility is used to convert the graph to image file.

* Testing :- Test cases for features implemented are provided in “t” directory
  A simple script is executed which generates AST for the test files.
  Currently visual inspection of tree is required for validation.

3. Future Work

* Make rules :- Currently only basic make rules are identified by parser.
  Grammar needs to be extended to handle complex make rules.

* Using AST to generate :- Current project was to generate AST
  from input file. This AST can be used to generate Converting
  AST to the output file is another big task. It would require to implement
  how different variables are handled, what to output for corresponding
  statement and how to integrate with the current code base.

4. Conclusion

Learnt and enjoyed a lot. Applying the compiler theory learnt in class to
practical work with a language I was knowing only some basics was a wonderfull
experience. It also helped me to understand and follow good coding practices.

Most of the automake features are implemented. I will be working to fix the
grammar to identify complex make rules. With that, AST can be generated for
most of the files which are identified by Automake. As this is a new feature,
in the current state it cannot be merged in codebase. When generation of from AST be implemented can it be merged. I would like to continue
working with that part, but some guidance in that direction will be required.

Sorry for delay in presenting the report as I was not well.

Vishal Gupta

reply via email to

[Prev in Thread] Current Thread [Next in Thread]