Re: Emacs contributions, C and Lisp

From: Eric Ludlam
Subject: Re: Emacs contributions, C and Lisp
Date: Fri, 09 Jan 2015 23:06:35 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 01/09/2015 10:09 AM, David Engster wrote:
> Richard Stallman writes:
>> You and several others are trying to pressure me to decide to make
>> GCC output the full AST.  I have seen insults and harassment.
>
> Not from me, and I haven't seen anything like it on this thread.
>
>> This is not the way to convince me.  It is the way to make me resent
>> your behavior.
>
> I've no idea what I've done to earn your resentment. I think my behavior
> was entirely reasonable, given that I've started with this only because
> you asked to base our tooling efforts on GCC. Anyway, you don't have to
> worry that I'll continue with this.

This conversation seems unnecessarily final. Richard has valid concerns that need details. But the AST (which I know almost nothing about) is huge, judging by the parsers I've written, so no matter how many details we think are needed, someone else can always point to a piece that might be unnecessary.

I wrote the first pass at the "smart completion" engine that currently ships in Emacs: the parts of CEDET that include EDE and Semantic. While I personally think it is pretty awesome, it really isn't hard to fool, which is where a lot of this GCC interest comes from. It took many years of my part-time work (and contributions from others like David) to assemble what is there now into a robust, well-tested system.

The basic pieces of the system, which is implemented in Emacs Lisp, consist of a parser generator plus some parsers written in a bison-like syntax, including a C++ parser. Due to limitations of Emacs' performance, only the parts of the language that handle definitions are implemented (i.e., tags for functions, variables, structures, etc.). The parser outputs a tag table with lots of details. Having a full parser generator and the parsers is what makes this convenient to do. Hacks like etags, GNU Global, etc. can't produce enough information for the next step.

The next step is the completion engine. This is where regexp hacks exist to "parse" a statement like:

  i = foo.bar.substring

which peels it apart into a variable "i", a notion of assignment, and ("foo" "bar" "substring") via several assumptions, such as that users don't write code like this:

  i /* some variable */
  = /* equals */
  foo. /* mystruct */
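
As a rough illustration of that peeling-apart step, here is a minimal sketch in Python (not the Emacs Lisp that Semantic actually uses). The function name and the regexp are hypothetical, made up for this example. It makes the same kind of assumption described above: the statement sits on one line with no comments between the symbols.

```python
import re

# Hypothetical pattern: an optional "lhs =" followed by a dotted chain of
# identifiers, optionally ending in a trailing "." (completion requested there).
_STATEMENT = re.compile(
    r"^\s*(?:(?P<lhs>[A-Za-z_]\w*)\s*=\s*)?"
    r"(?P<chain>[A-Za-z_]\w*(?:\s*\.\s*[A-Za-z_]\w*)*)\s*\.?\s*$"
)


def split_statement(text):
    """Return (assigned_variable_or_None, [chain symbols]), or None if the
    text does not fit the simple one-line shape this sketch assumes."""
    m = _STATEMENT.match(text)
    if m is None:
        return None
    chain = [part.strip() for part in m.group("chain").split(".")]
    return m.group("lhs"), chain
```

Calling `split_statement("  i = foo.bar.substring")` gives the variable `"i"` and the list `["foo", "bar", "substring"]`, while the commented, multi-line form above is rejected outright, which is exactly how a regexp approach gets fooled.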

The engine then goes and looks up i in reverse to see what it is. It then looks up foo in various tables that get built of known symbols, derives the data type, and thus members of foo. It iterates down through the "." symbols dereferencing each symbol by data type to get to the next step. This depends on the fact that most projects compile all their headers "the same way" so that tables parsed from some header included in this C file will have the same symbols when included in a different C file.
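
The lookup walk just described can be sketched like this, again in Python for illustration only. The two tables and every name in them are invented for the example; the real tag tables in Semantic carry far more detail and are built by the parsers, not written by hand.

```python
# Hypothetical tag tables (assumptions for this sketch):
#   VARIABLES maps a variable name to its type name,
#   TYPES maps a type name to {member name: member type name}.
VARIABLES = {"i": "int", "foo": "Outer"}
TYPES = {
    "Outer": {"bar": "Inner", "count": "int"},
    "Inner": {"substring": "String", "size": "int"},
}


def complete(chain, variables=VARIABLES, types=TYPES):
    """Dereference every symbol except the last by data type, then offer
    the members of the resulting type that start with the last symbol."""
    *resolved, prefix = chain
    if not resolved:
        # Nothing to dereference: complete against the known variables.
        return sorted(name for name in variables if name.startswith(prefix))
    current = variables.get(resolved[0])
    for name in resolved[1:]:
        # Step through one "." by looking the member up in the current type.
        current = types.get(current, {}).get(name)
    members = types.get(current, {})
    return sorted(m for m in members if m.startswith(prefix))
```

For example, `complete(["foo", "bar", "s"])` walks `foo` to `Outer`, `bar` to `Inner`, and then offers the members of `Inner` that start with `s`. If any symbol along the way is missing from the tables, the walk ends with no completions, which is one of the ways incomplete tag data shows up to the user.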

With that background, there are a couple of options for a GCC plugin. One option would be to have one plugin that outputs tags compatible with some standard. Naturally I suggest the one already in Emacs. A second plugin could be used to figure out all the state I mentioned earlier when looking up symbols, and provide completions directly (i.e., a list of text strings to offer as completions). That plugin would ONLY be used for completion, and all the internal logic couldn't be reused for another purpose in Emacs.

The alternative is to dump out the AST into an Emacs-friendly form, and write the above logic in Emacs instead. This is convenient because Emacs is easy to hack, and gcc plugins (based on what I've been reading) are really complicated. In terms of "get up and running quickly", dumping a big scary data structure out of a scary environment into a friendly, easy-to-hack environment is a desirable path for us and, as Richard points out, for non-free software too.

I personally think that if there were a good way to bridge the gap so that gcc could directly output tags for the existing Semantic engine, then there would be an incremental benefit of nearly perfect tag generation for the existing tool AND a performance boost. It won't solve the whole problem though. To solve the rest of it, we'd need a gcc plugin to parse a file up to a chosen point. For a file with 1000 lines of code and included headers, gcc needs the WHOLE AST to make sense of "the last line", or the part that needs the completion. This is because we can't guess at what isn't needed until it has all actually been processed.

This is where the boundary between gcc and Emacs comes in. In theory, the GCC plugin could process the AST and output ONLY the completions, or ONLY whatever was asked for (local types, scope information, refactoring data or what not). An alternative might be to output a subset of the AST that is local to the completion area for processing in Emacs, and depend on our old Emacs code to do the type lookup, etc. This would improve the current completion, but it could still be fooled, depending on the quality of the Emacs data, which is now, by definition, incomplete.

In the past (call it year 2000) people thought my smart completion was lame (i.e., inaccurate) and slow compared to dynamic abbreviation completion, and claimed that "dabbrevs is good enough". This proposal could be "good enough" for this single feature.

So, I've laid out some scenarios that are "not full AST" friendly. There are some benefits (performance), and tradeoffs (difficulty). Even so, we've only touched on one feature. There are lots of other features in the existing Semantic tool already in Emacs that are derived from having a parser built right into Emacs, such as highlighting code with syntax errors (but only the code for definitions, not the logic). I have a long list of other things I'd love to do, such as redoing font-lock with the many "hints" about what your code is doing that only the compiler could know, but I can't, because writing a parser from scratch is actually pretty hard and error-prone, whether or not it is done in Emacs. Many folks have touched on those features in a myriad of other thread replies on this mailing list. I've taken my best stab at some of them that seemed attainable, but feel I've gone as far as I can aside from some incremental improvements, or just adding new languages.

I've been very thankful for David's help with the CEDET project, and the many improvements he's made to our existing smart completion engine. For myself, and I imagine for David, having gone through that and persevered in simulating a compiler for so long to try to get these features, only to have dabbrev people scoff on one side and clang users sneer on the other, is disheartening. The hope of having a "real compiler" to lean on could open so many doors for us that we just can't get to right now, so it is hard not to be discouraged by non-technical issues.

I would hope that David, who is looking into the gcc plugin route, and Richard can find a reasonable compromise that enables Emacs to have data from gcc that would enable our existing tools to grow in accuracy, and would encourage folks who do not have the skills to hack gcc plugins to contribute new features. I suspect that isn't possible until someone learns more about gcc's AST and thinks about what a good abstraction model for Emacs is, and how it could be applied to the existing pretty good smart completion system.
