Re: Optimization for reading *.d files
From: Paul Smith
Subject: Re: Optimization for reading *.d files
Date: Sun, 19 Mar 2017 13:39:21 -0400
On Sat, 2017-03-18 at 22:49 -0700, brenorg wrote:
> > I'd prefer to investigate improving the existing parser, rather than
> > create a completely separate parser path.
>
> I could agree if the difference were 10x or more. But I believe 2x a
> reasonable gain from removing so many features. From what I looked on the
> code, the main hassle comes from the target specific variable assignment.
The thing is, the parser has to walk through all the characters on each
line at least once. It can know, after that walk, whether there's
anything special about a given word or complete line. There's no reason
the parser should have to do a lot of extra work, if it already knows
that there isn't anything interesting here to work on. This is kind of
the same idea as the existing shell invocation fast-path: if the command
line in the recipe is simple enough then we don't have to invoke a
shell: we just invoke the command directly. We tell whether it's simple
enough by merely looking for certain special characters in the command;
if we see any of them we automatically fall through to the slow path
(full shell invocation).
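To make the fast-path idea concrete, here is a minimal sketch in Python (not make's actual C code). The character set `SHELL_SPECIAL` and the helper names are hypothetical, chosen for illustration; the real check in make is more involved and also has to consider shell builtins and quoting.

```python
# Hypothetical set of characters that force the slow path. This is an
# assumption for illustration, not the list GNU make actually uses.
SHELL_SPECIAL = set("|&;<>()$`\\\"'*?[]#~=%{}\n")

def needs_shell(command):
    """Return True if any special character appears in the command."""
    return any(ch in SHELL_SPECIAL for ch in command)

def plan_invocation(command):
    """Return the argv that would be executed for one recipe line."""
    if needs_shell(command):
        # Slow path: hand the whole line to the shell.
        return ["/bin/sh", "-c", command]
    # Fast path: split on whitespace and run the program directly.
    return command.split()
```

A plain compile command takes the fast path, while anything containing a pipe or a variable reference falls through to the shell.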
Maybe that information isn't being provided where it's needed, to allow
short-cuts to be taken in the parser. Or maybe there's some complexity
here that I haven't thought of that makes this impossible.
But I don't see why even a 2x slowdown should be expected from the full
parser.
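The same idea applied to the parser might look like the sketch below, again in Python rather than make's C. It walks the line once, records whether anything "interesting" was seen, and only takes the short-cut for a plain "targets: prerequisites" line. The `INTERESTING` set and the function name `try_fast_parse` are assumptions for illustration, and the sketch assumes continuation lines have already been joined.

```python
# Hypothetical characters that mean the line needs the full parser
# (variable references, assignments, comments, patterns, and so on).
INTERESTING = set("$=;#%|&\\")

def try_fast_parse(line):
    """Return (targets, prereqs) for a plain rule line, else None.

    None means "nothing could be decided cheaply; use the full parser".
    """
    colons = 0
    plain = True
    for ch in line:              # the one walk the parser must do anyway
        if ch == ":":
            colons += 1
        elif ch in INTERESTING:
            plain = False
    if not plain or colons != 1:
        return None
    left, right = line.split(":")
    targets = left.split()
    if not targets:
        return None
    return targets, right.split()
```

Lines with a target-specific variable assignment, a double colon, or a drive-letter path return None here and would go through the existing slow path unchanged.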
> So to sum up:
> 0 - I will get back with results for a newer version.
> 1 - How crazy it would be to make it multi-threaded?
The question we have to ask is: what is the upside? If your disk is
slow, then you're waiting for your disk: a hard disk has a single head
assembly, so it effectively services one read at a time no matter how
many threads you have. Waiting for MORE disk IO isn't going to speed
things up appreciably if the time spent actually parsing files is small
compared to the wait time for more content to parse.
If the parse time is more equal to the disk IO time, then you might get
some benefit from having some amount of lookahead, either by async IO or
one extra thread.
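A rough sketch of the "one extra thread" lookahead, in Python for brevity: a single reader thread stays up to `depth` files ahead of the parser via a bounded queue. The function names and the `depth` parameter are hypothetical, and error handling is deliberately minimal.

```python
import queue
import threading

def _reader(paths, q):
    # Producer: read each file fully, then signal completion with None.
    try:
        for path in paths:
            with open(path, "r") as f:
                q.put(f.read())
    finally:
        q.put(None)  # always unblock the consumer, even after an error

def parse_with_lookahead(paths, parse, depth=4):
    """Parse files in order while one thread reads up to `depth` ahead."""
    q = queue.Queue(maxsize=depth)
    t = threading.Thread(target=_reader, args=(paths, q), daemon=True)
    t.start()
    results = []
    while True:
        text = q.get()
        if text is None:
            break
        results.append(parse(text))
    t.join()
    return results
```

This only overlaps reading with parsing; it does not make the disk itself any faster, which is why the benefit depends on parse time being comparable to IO time.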
The question is: do you REALLY get performance gains that justify this
added complexity? I'm not convinced it's a no-brainer. I'd need to see
some analysis showing exactly where the time is going during the parsing.
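One crude way to get a first split between IO time and parse time is sketched below in Python. The `time_phases` helper is hypothetical, and `parse` stands in for whatever parsing step is being measured. Note that on a warm page cache the read time will look much smaller than on a cold run, so both cases are worth measuring.

```python
import time

def time_phases(paths, parse):
    """Return seconds spent reading vs. parsing the given files."""
    read_s = 0.0
    parse_s = 0.0
    for path in paths:
        t0 = time.perf_counter()
        with open(path, "r") as f:
            text = f.read()
        t1 = time.perf_counter()
        parse(text)
        t2 = time.perf_counter()
        read_s += t1 - t0      # time waiting on the file system
        parse_s += t2 - t1     # time spent in the parser stand-in
    return {"read": read_s, "parse": parse_s}
```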
> 2- This should be configurable with a very strong disclaimer. The
> alternative scanner wouldn't do any sanity check, so it could be dangerous.
> 3 - Other option could involve creating a separate tool to collect a bunch
> of "simple files" and pre-process them into a compact database. That
> resulting file could then be read into the makefile. By doing that, Make
> would have to understand this internal compact database format. Still, it
> would probably need a lot code, even more than the simple scanner.
It's quite possible something like this could be done via an extension,
either Guile or a shared library, that maintained a database. To make
it really efficient we'd need a new API that allowed extensions to
define new rules, or at least prerequisite definitions, but even without
that condensing the values to a single instance (as you've discovered)
could be helpful.
I mean something like defining a new function that would parse a .d
file and add content into some kind of database. If you use Guile you
could use guile-dbi with sqlite, or just roll your own. This function
would be run as part of the recipe, after the compiler generates the .d
file (something tricky would need to be done here because normally
functions are expanded before the recipe is invoked).
Then in addition a new function would be defined that extracted content
from the database and fed it to make to define the rules. Something
like:
    DATABASE = makedeps.sqlite
    $(guile (create-rules $(DATABASE)))
    %.o : %.c
            $(CC) ... # create .d file
            $(guile (parse-deps $(DATABASE) $*.d))
As I said, this won't work directly because the guile function would be
expanded before the compiler is run; something would have to happen
there.
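To illustrate the database side only, here is a small Python stand-in using the standard sqlite3 module. It is not the Guile extension itself; the function names simply mirror the hypothetical `parse-deps` and `create-rules` above, and the table layout is an assumption. It also assumes file names contain no spaces or colons.

```python
import sqlite3

def open_db(path):
    """Open (or create) the dependency database."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS deps (target TEXT, prereq TEXT)")
    return db

def parse_deps(db, text):
    """Store the 'target: prereqs' rules found in the text of a .d file."""
    logical = text.replace("\\\n", " ")       # join continuation lines
    found = {}
    for line in logical.splitlines():
        if ":" not in line:
            continue
        left, right = line.split(":", 1)
        for target in left.split():
            found.setdefault(target, set()).update(right.split())
    for target, prereqs in found.items():
        # Drop stale entries so removed headers do not linger.
        db.execute("DELETE FROM deps WHERE target = ?", (target,))
        db.executemany("INSERT INTO deps VALUES (?, ?)",
                       [(target, p) for p in sorted(prereqs)])
    db.commit()

def create_rules(db):
    """Return makefile text with one rule per stored target."""
    rules = {}
    query = "SELECT target, prereq FROM deps ORDER BY target, prereq"
    for target, prereq in db.execute(query):
        rules.setdefault(target, []).append(prereq)
    return "".join("%s: %s\n" % (t, " ".join(p)) for t, p in rules.items())
```

The text returned by `create_rules` is what would be handed back to make for evaluation; when the parsing step runs relative to the compiler is still the open problem described above.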