[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Re-working yyitems

From: Akim Demaille
Subject: Re: Re-working yyitems
Date: Wed, 8 May 2019 17:41:55 +0200

Hi Valentin,

Thanks a lot for your effort!

> Le 7 mai 2019 à 04:23, Valentin Tolmer <address@hidden> a écrit :
> Hi,
> I'm still in the progress of refactoring glr.cc to be more c++-friendly and
> eventually introduce variants. Right now, I'm trying to eliminate the last
> use of YYMALLOC: yyitems.
> yyitems is an (optionally) expandable array of yyGLRStackItem, which
> contains all the items in all the glr stacks. It's currently handled as an
> arena-style array where allocating a new item is just bumping up the
> yynextFree pointer. When it needs to grow, reallocation is a bit tricky
> since we have to update all the cross-referencing fields of the elements.
> Same for compression (IIUC, it's compressed when we get back to a single
> stack after removing the others). The main advantage is that
> allocating/deleting items on the stack is near-instantaneous, but it makes
> the code complex.


> If we want to avoid re-writing a vector-like container, the grow operation
> should be a simple copy. However, the elements have internal pointers to
> each other. The possibilities are thus:
> - Keep the current implementation: fast, but ugly and complex

Something that you have not described, is how much effort you think
it would take.  I suppose this first approach is basically asking for
no development effort, which is also a good thing.

In particular, I am worried about ruining the performances, so I would
feel much better if we could keep the "current implementation" as a base
line for benches.  Then we would have something to compare alternatives
against, which might rule them out for being too slow.

Parsers must be fast.  There's a huge number of parser generators, and
we should not shoot ourselves in the ungulate's hoof; it must run as
fast as possible.

> - Switch to indexes instead of pointers for internal references, and switch
> the container to vector<item>: extra care must be taken when deleting
> elements to update the indexes, but otherwise growing, pushing and popping
> should be painless

> - Switch to a vector<unique_ptr<item>>: This way the pointers are stable.
> We're counting on malloc to handle the arena for us, and
> growing/pushing/popping/deleting is painless. However, we lose access to
> the index of an element (we can't work it out from a pointer), which is
> only used in debug. If the index itself is not important and can be
> replaced by a UID, it would be easy enough to add one to the item. This is
> the approach I propose.

I have still not looked at your work, but maybe we should start reviewing?
And focus on them for Bison 3.5?  Then we explore these tracks?  Or are
you saying that variants cannot be tailored to work yet until you are done
with these changes?

Note that so that we stick to C++98...  I am not strongly against requiring
C++11 for glr.cc, but I would feel more comfortable if users of glr.cc
could give their opinion.

> Detailed summary of the operations:
> Currently:
> yyGLRStackItem* yyitems; -> raw array of items.
> item.yystate/yyoption -> state or option (union field)
> item.yystate.yypred/item.yyoption.yystate -> raw pointer to another state.
> - when growing:
>  - if state:
>    - reloc yystate.yypred
>    - reloc yystate.yysemantics.yyfirstVal
>  - if option:
>    - reloc yyoption.yystate
>    - reloc yyoption.yynext
>  - reloc yysplitPoint
>  - reloc all the yystacks[].yystate
> - when compressing:
>  - update the yypred
>  - null the split point and last deleted
>  - after split point:
>    - update yystate
>    - update yystacks[0].yystate
> - when creating an element:
>  - essentially no-op
>  - bump nextFree
>  - callers should call reserve_glrstack to make sure there is headroom.
> - when accessing an element:
>  - faster (?), element is inlined in the vector.
> - when dumping:
>  - easy access to the element's index (&item - yyitems)
> Proposed:
> std::vector<std::unique_ptr<yyGLRStackItem>> yyitems; -> array of
> unique_ptr of
> items. Overall, much simpler code, removing several fields, simplifying the
> growing/compressing functions, guaranteed memory safety, ...
> - when growing:
>  - small vector, cheaper to move
>  - simple push_back
> - when compressing:
>  - std::remove_if or equivalent
> - when creating an element:
>  - cost of allocation
> - when accessing an element:
>  - cost of dereference (but anyway the elements were not often accessed
>    sequentially (?))
> - when dumping:
>  - no access to the element's index
>    - Is it needed?
>    - Do they need to be dense? Or can we assign a UID to each element?
> Need to
>      store the UID, though.

Don't worry too much about the additional space costs in debugging mode.
Just don't emit the corresponding piece of code with parse.trace is not

Reading your summary, it does look much simpler and probably not so costly.

But if IIUC, we are adding an indirection for all the yyitems.  I
suppose that it's because the union is split in two different structs
of different sizes?  I guess that's where we might lose quite some
efficiency: many allocations instead of a single large one.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]