Re: GNU Bison: D language support
Sun, 10 Feb 2019 11:05:43 +0100
> On 8 Feb 2019, at 20:12, H. S. Teoh <address@hidden> wrote:
> On Thu, Feb 07, 2019 at 07:10:12PM +0100, Akim Demaille wrote:
> [...] Which list do we use for such
> discussions? bison-patches seems to be primarily for discussing actual
> code changes, rather than discussion of this sort.
bison-patches is really the appropriate place. Currently it mostly
looks like a verbatim record of the patches that make it into the repo, but
back in the days when we were several maintainers, that's where
discussions took place.
>> Well, I made no effort to have the C++ parser derive from some
>> base class. Yet, there are a couple of "virtual" in the API, but
>> let's consider them historical artifacts. I don't think it makes
>> much sense to have a hierarchy of parsers.
> Yeah, I didn't think so either. Because of this, I'm inclined to have
> Bison emit a parser struct rather than a class -- at least by default.
> If it's not too onerous I suppose we could make it a user-configurable
> option. But I don't anticipate anyone clamoring for that, so perhaps we
> should just stick with struct.
I also agree here. One problem we face in Bison is that we have many
options already, and writing the test suite is itself a challenge.
And it does happen that some combination is not tested, and misbehaves.
We should avoid offering too many options, unless it is quite clear
that there is a need to offer a specific behavior. Likewise, the parser API
should remain narrow IMHO.
>>>> - On a more high-level note, I'm wondering how flexible the API of
>>>> the parser can be. The main thought behind this is that given
>>>> enough flexibility, we may be able to target, e.g., @nogc, @safe,
>>>> pure, etc. With @safe probably a pretty important target, if
>>>> it's possible to do so. While this depends of course on the
>>>> exact code the user puts into the .y file, a worthy goal is to
>>>> make the emitted D code @safe (pure, etc.) by default unless the
>>>> user writes address@hidden code in the .y file.
>> I cannot comment on this. But the generated parser should aim at
>> the fewest constraints. So the generated code itself should not
>> require a GC, IMHO.
> Makes sense. Using a struct instead of a class would help towards not
> requiring a GC. :-)
> Supporting @safe is certainly a worthwhile goal, though it does impose
> certain restrictions: unions that involve pointer members, for example,
> are verboten in @safe. There's the @trusted escape-hatch for such
> occasions, though I personally don't feel comfortable with the idea of
> @trusted code that's auto-generated, since the point of @trusted is for
> a human reviewer to vet the code for memory safety where the compiler
> cannot prove it, and that's incompatible with auto-generation.
There are tons of papers that prove the correctness of the algorithm
implemented in Bison, so you should not feel worried about that. Of
course, proving an algorithm and proving an implementation are not the
same thing :) Besides, the user is free to mess around in her actions.
So if it's possible, what is purely generated code (without bits from
the user) should probably be trusted. When it comes to user actions,
maybe we need to be more cautious. A %define variable such as
api.parser.trusted, maybe, so that the user can declare for herself
whether she wants to make that claim.
>>>> - How flexible can the lexer API be? For example, currently
>>>> lexer.yyerror takes a string argument, which requires using
>>>> std.format in various places. If permissible, I'd like to have
>>>> yyerror take a generic input range instead, so that we can avoid
>>>> the inherent memory allocation of std.format (e.g., if we wish to
>>>> target @nogc).
>> lexer.yyerror? yyerror is expected to be part of the parser,
>> not the scanner.
> OK, I may have been confused by the calc.y example, where CalcLexer
> declares a yyerror() method.
I wrote that in a Q&D way, based on the Java example. And I had not
>> It's painful that the interface of yyerror has to be declared
>> by the user in C and C++, but, again, that's rather an historical
>> scar. You should aim at a fixed signature.
> Makes sense. D does allow static introspection of function signatures,
> so potentially one approach could be for the generated code to detect
> the signature of yyerror and adapt accordingly.
I'm not sure you need this. (So far) Bison calls yyerror only with
a single string: it assembles the error message before passing it to
yyerror. So, at least for a start, you could keep yyerror's interface
simple: possibly location, then string.
> I see. So it's basically a hook for user-defined code to handle errors
> however the user sees fit.
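
A minimal sketch of that fixed shape, written in C++ only for illustration: the generated code assembles the whole message and hands the user's hook a location followed by a single string. Every name here (Location, ErrorCollector, report_unexpected) is hypothetical and not part of any Bison skeleton.

```cpp
#include <sstream>
#include <string>

// Hypothetical location type: a line/column pair.
struct Location {
    int line;
    int column;
};

// Hypothetical user-side hook with the fixed signature discussed above:
// location first, then the already-assembled message.
struct ErrorCollector {
    std::ostringstream log;

    void yyerror(const Location& loc, const std::string& msg) {
        log << loc.line << ':' << loc.column << ": " << msg << '\n';
    }
};

// Stand-in for the generated parser (an assumption, not Bison's code):
// it builds the complete message itself and passes one string to the hook.
template <typename Handler>
void report_unexpected(Handler& handler, const Location& loc,
                       const std::string& unexpected,
                       const std::string& expected) {
    handler.yyerror(loc, "syntax error, unexpected " + unexpected +
                         ", expecting " + expected);
}
```

Because the message arrives fully formed, the hook stays free to print it, collect it, or throw, without the parser needing to know which.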
>>>> - On a more general note, I'd like to make the parser/lexer APIs
>>>> range-based as much as possible, esp. when it comes to
>>>> string-handling. But I'm just not sure how much the APIs are
>>>> expected to conform to the analogous C/C++/Java APIs.
>> Because in practice the maintenance falls on the shoulders of
>> the Bison maintainers, we want the API to remain as alike as
>> possible, without being unnatural to the host language.
> Makes sense.
> I was hoping for a Bison API more idiomatic to D,
I'm not saying it should not be! I agree it should be idiomatic to D.
But when there are different roughly equivalent options, I'd like to
stick to the one used in the other backends.
> e.g., instead of
> explicitly binding to a lexer object, the parser could simply receive an
> input range of tokens (an input range in D is any type that supports the
> iteration primitives .empty, .front, .popFront). This can default to
> yylex, but the user would be able to pass any token source to the
> parser, including pre-baked arrays of tokens, e.g. in a unittest to
> ensure the parser handles certain specific token sequences correctly.
Beware of the test suite...
I agree that what you suggest sounds good though (and I actually have to mock
this in the test suite on top of the yylex interface).
Let's follow your path, and see where it goes.
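
To make the idea concrete, here is a rough C++ analogue (not D, and not generated by Bison) of a parser that accepts any token source exposing the three primitives empty, front and popFront. The token kinds, ArrayTokens and parse_sum are all hypothetical; the toy "parser" only recognizes NUM ('+' NUM)* and stands in for a real generated one.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

enum class TokenKind { Num, Plus, End };

struct Token {
    TokenKind kind;
    int value;  // meaningful only for Num
};

// Hypothetical token source mirroring D's input-range primitives
// (.empty, .front, .popFront) over a pre-baked array of tokens.
class ArrayTokens {
public:
    explicit ArrayTokens(std::vector<Token> toks) : toks_(std::move(toks)) {}

    bool empty() const { return pos_ >= toks_.size(); }
    const Token& front() const { return toks_[pos_]; }
    void popFront() { ++pos_; }

private:
    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};

// Stand-in for a generated parser: accepts NUM ('+' NUM)* and sums the
// numbers. It works with any type providing empty/front/popFront.
template <typename Range>
bool parse_sum(Range tokens, int& result) {
    result = 0;
    bool expect_num = true;
    while (!tokens.empty() && tokens.front().kind != TokenKind::End) {
        const Token& t = tokens.front();
        if (expect_num && t.kind == TokenKind::Num) {
            result += t.value;
        } else if (!expect_num && t.kind == TokenKind::Plus) {
            // Operator consumed; a number must follow.
        } else {
            return false;  // unexpected token
        }
        expect_num = !expect_num;
        tokens.popFront();
    }
    return !expect_num;  // input must end right after a number
}
```

A unit test can then feed a fixed token array straight to the parser with no scanner involved, which is the use case described above.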
>>>> - Can Bison handle UTF-8 lexer/parser rules? D uses UTF-8 by
>>>> default, and it would be nice to leverage this support instead of
>>>> manually iterating over bytes, as is done in a few places.
>> Bison does not care about your encoding, it sits on top of a
>> stream of tokens, not a stream of characters. Again, because
>> of history, it accepts bytes as tokens-of-the-poor, but it should
>> not learn to read UTF-8, that's not its business.
> What I had in mind when I wrote that was yytnamerr_(), which appears to
> be used for formatting error messages.
yytnamerr is an abomination. We need to work on this (in all the languages)
in the near future. Don't focus on it too much right now, we will probably
do something better.
[GitHub vs bison-patches]
> How would that work? Just submit PRs to your github repo and get CI,
> then post the patches and close the PRs? Just wondering what the
> currently accepted process is.
bison-patches is the proper place for the humans to discuss the patch
until it is validated. GitHub's PRs will provide you with a CI on travis-ci.org
(we used to be on .com, I recently moved to .org where I can have five
concurrent slaves instead of three), and will provide me with an easy
means to import your work into master.