Re: Optimization for reading *.d files
From: brenorg
Subject: Re: Optimization for reading *.d files
Date: Sun, 19 Mar 2017 11:12:50 -0700 (PDT)
Paul Smith-20 wrote
> On Sat, 2017-03-18 at 22:49 -0700, brenorg wrote:
>> > I'd prefer to investigate improving the existing parser, rather than
>> > create a completely separate parser path.
>>
>> I could agree if the difference were 10x or more. But I believe 2x is a
>> reasonable gain from removing so many features. From what I saw in the
>> code, the main hassle comes from target-specific variable assignment.
>
> The thing is, the parser has to walk through all the characters on each
> line at least once. It can know, after that walk, whether there's
> anything special about a given word or complete line. There's no reason
> the parser should have to do a lot of extra work, if it already knows
> that there isn't anything interesting here to work on.
Yes, that was the hope I had before seeing the code. Unfortunately, the code
is not well structured enough to make this optimization simple to implement.
That's why I followed the "simpler scanner" path.
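For reference, the kind of single-pass check I had in mind looks roughly like
this (an untested sketch of my own, not how the real parser is organized): flag
a line as "plain" - a bare "target: prereqs" line with no variable references,
no assignments and no functions - so the rest of the parser could skip its
expensive paths.

#include <string.h>

/* Sketch only: scan one makefile line and report whether it can take
   a fast path.  Real make has to track much more state than this.  */
static int
line_is_plain_dependency (const char *line)
{
  /* Any of these characters forces the full parser.  */
  if (strpbrk (line, "$=;%") != NULL)
    return 0;

  /* Exactly one ':' separating targets from prerequisites.  */
  const char *colon = strchr (line, ':');
  if (colon == NULL || strchr (colon + 1, ':') != NULL)
    return 0;

  return 1;
}

A check like this only pays off if the fast path can then hand the target and
prerequisite names straight to the internal structures without re-scanning the
line.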
Paul Smith-20 wrote
>> So to sum up:
>> 0 - I will get back with results for a newer version.
>> 1 - How crazy would it be to make it multi-threaded?
>
> The question we have to ask is what is the upside. If your disk is
> slow, then you're waiting for your disk: a HD has only one head so you
> can only read one file at a time from the disk no matter how many
> threads you have. Waiting for MORE disk IO isn't going to speed things
> up appreciably, if the time spent actually parsing files is small
> compared to the wait time for more content to parse.
>
> If the parse time is closer to the disk IO time, then you might get
> some benefit from having some amount of lookahead, either by async IO or
> one extra thread.
>
> The question is do you REALLY get performance gains for this added
> complexity? I'm not convinced it's a no-brainer. I'd need to see some
> analysis showing exactly where the time is going during the parsing.
I don't think the disk plays much of a role here. If the OS file cache is hot, most
of the time should be spent in the parser - and that is what I see.
I ran perf on the actual code parsing a large number of files, and 80% of
the time goes to eval_makefile/eval.
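For completeness, the "one extra thread" lookahead you describe could look
roughly like this (my sketch, POSIX threads, no real error handling - not code
from make; parse_buffer stands in for make's eval):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for make's IO: read an entire file into a malloc'd buffer. */
static char *
read_whole_file (const char *path)
{
  FILE *f = fopen (path, "rb");
  if (!f)
    return NULL;
  fseek (f, 0, SEEK_END);
  long len = ftell (f);
  rewind (f);
  char *buf = malloc (len + 1);
  if (buf)
    {
      size_t got = fread (buf, 1, len, f);
      buf[got] = '\0';
    }
  fclose (f);
  return buf;
}

static void *
read_thread (void *path)
{
  return read_whole_file (path);
}

/* Parse file i while a second thread reads file i+1. */
void
parse_all (char **paths, int n, void (*parse_buffer) (const char *))
{
  char *cur = read_whole_file (paths[0]);
  for (int i = 0; i < n; i++)
    {
      pthread_t t;
      int have_next = (i + 1 < n);

      if (have_next)
        pthread_create (&t, NULL, read_thread, paths[i + 1]);

      if (cur)
        parse_buffer (cur);
      free (cur);

      if (have_next)
        pthread_join (t, (void **) &cur);
    }
}

But with a hot cache the read side returns almost immediately, which is
consistent with the profile above.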
Paul Smith-20 wrote
>> 2 - This should be configurable with a very strong disclaimer. The
>> alternative scanner wouldn't do any sanity checks, so it could be
>> dangerous.
>> 3 - Another option could involve creating a separate tool to collect a
>> bunch of "simple files" and pre-process them into a compact database. That
>> resulting file could then be read into the makefile. By doing that, Make
>> would have to understand this internal compact database format. Still, it
>> would probably need a lot of code, even more than the simple scanner.
>
> It's quite possible something like this could be done via an extension,
> either Guile or a shared library, that maintained a database. To make
> it really efficient we'd need a new API that allowed extensions to
> define new rules, or at least prerequisite definitions, but even without
> that condensing the values to a single instance (as you've discovered)
> could be helpful.
>
> I mean something like, defining a new function that would parse a .d
> file and add content into some kind of database.
I love the idea. A generic callback API would be nice and easy to support.
I don't know much about Guile. I will take a look at that.
Next steps are to see how far "condensing" the values takes me, and to get back
here if I think we can do better.
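For what it's worth, recent GNU Make (4.0+) already has a hook for the
shared-library route: a loadable object can register a new function with
gmk_add_function() and feed rules back with gmk_eval(). A minimal sketch - the
function name "parse-deps" and the line-by-line parsing are just illustrative,
and a real version would build the condensed database you describe:

/* deps_db.c - built as deps_db.so and pulled in with "load deps_db.so".
   Sketch only.  */
#include <stdio.h>
#include <gnumake.h>

int plugin_is_GPL_compatible;

/* $(parse-deps file.d): read a .d file and hand each line to make. */
static char *
parse_deps (const char *name, unsigned int argc, char **argv)
{
  FILE *f = fopen (argv[0], "r");
  char line[4096];

  (void) name;
  (void) argc;

  if (!f)
    return NULL;

  while (fgets (line, sizeof line, f))
    {
      /* A real version would join backslash continuations and
         deduplicate prerequisites before handing them to make.  */
      gmk_eval (line, NULL);
    }

  fclose (f);
  return NULL;   /* the function expands to nothing */
}

int
deps_db_gmk_setup (const gmk_floc *floc)
{
  gmk_add_function ("parse-deps", parse_deps, 1, 1, GMK_FUNC_DEFAULT);
  return 1;
}

It would be loaded with "load deps_db.so" and called as "$(parse-deps foo.d)"
from the makefile.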