[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Micro-optimization in ledger_format

From: Vadim Zeitlin
Subject: Re: [lmi] Micro-optimization in ledger_format
Date: Wed, 16 Jan 2019 22:08:29 +0100

On Wed, 16 Jan 2019 20:23:29 +0000 Greg Chicares <address@hidden> wrote:

GC> Please take a look at
GC>   commit cbb6cade2f89
GC>   Measure speed of various stream-cast implementations
GC> which I'll push presently.

 Yes, I've actually been looking at it already, after receiving the
notification email about it.

GC> There, I added some alternatives to the existing
GC> 'stream_cast_test.cpp', which uses the same fundamental
GC>   std::stringstream ss;
GC>   ss << source;
GC>   ss >> destination;
GC>   assert(ss.eof());
GC> technique, and can benefit from the same optimizations as ledger_format().
GC> I've left IIFE out of that commit, because my unfamiliarity with that
GC> technique is just an obstacle to understanding. We can always reconsider
GC> re-IIFE-ifying the code later. But already you'll see that I wrote
GC>   std::stringstream& imbued() // The name will make sense when you read it.
GC> to return a reference, while PR 104 has
GC>   static std::stringstream interpreter{[]() {... return ss;} ()};
GC> where I don't see a reference. How can a std::stringstream be returned
GC> other than by reference, given that it lacks the special member functions
GC> that would seem to be required? Or are reference semantics used implicitly,
GC> somehow?

 I can answer this one: stringstream does have special member functions, it
is movable, i.e. provides move ctor and move assignment operator, so this
is why returning it by value works.

GC> I'm seeing timings like this:
GC>   Speed tests...
GC>   stream_cast     : 4.707e-006 s mean;          4 us least of 2125 runs
GC>   minimalistic    : 3.638e-006 s mean;          3 us least of 2750 runs
GC>   static stream   : 2.335e-006 s mean;          2 us least of 4283 runs
GC>   static facet too: 2.271e-006 s mean;          2 us least of 4404 runs
GC>   without str()   : 2.207e-006 s mean;          2 us least of 4532 runs
GC> The comments in the left-hand column are terribly terse, but comments in
GC> the code explain what's going on.

 I wonder why did you decide to number cast_N functions and fN lambdas
(and why do the latter even exist in the first place as named variables
instead of just passing them directly to TimeAnAliquot()?) instead of using
some suffixes based on the labels above? This would make it simpler to
track which test corresponds to which result IMO.

GC> Because you reported an overall speedup
GC> in PDF generation on the order of ten percent, I figured that this
GC> subroutine would have to become about twice as fast--which is approximately
GC> what I observe, so...no surprise there.

 FWIW I see something similar under Linux with clang:

  Speed tests...
  stream_cast     : 3.584e-06 s mean;          3 us least of 2790 runs
  minimalistic    : 2.725e-06 s mean;          2 us least of 3670 runs
  static stream   : 1.947e-06 s mean;          1 us least of 5137 runs
  static facet too: 1.846e-06 s mean;          1 us least of 5419 runs
  without str()   : 1.574e-06 s mean;          1 us least of 6354 runs

but IMO in both cases the tests running time is too small and I think they
should be running for longer than they do to have more accurate results
(but I haven't bothered checking if this is really the case...).

GC> The most important improvement is making the std::stringstream static.

 Yes, constructing streams is expensive.

GC> Imbuing a facet statically, OAOO, is a comparatively tiny improvement, at
GC> least for the particular facet used in this unit test.

 But it's the same one as is used in the real code, isn't it?

GC> It looks like just adding the word 'static' (instead of full-blown
GC> IIFE) suffices for the one really dramatic improvement.

 It probably does, I don't know/remember if we tested just this change
separately. But OTOH it makes sense -- to me at least -- to use an IIFE to
fully initialize the stream anyhow, regardless of performance
considerations. Again, I realize that it may seem like a weird idiom if
this is the first time you see it, but I like it because it often allows me
to make a local variable const even if it has to be initialized in several
steps: if it's not modified later, I can just stash these steps into a
lambda and still initialize the variable in a single statement.

 Of course, here the stream is not, and can't be, const, but the idea of
having a (local) function initializing it still seems nice to me.

GC> I imagine that there's a possibility for error when we call clear() without
GC> at the same time calling str(""). We expect the EOF flag in the static
GC> stream object after each use, so we certainly need to clear that flag
GC> before each reuse; yet clear() would also wipe out the 'bad' and 'fail'
GC> flags, potentially leaving unwanted data in the buffer. We could of course
GC> use rdstate() to test EOF only, but that sounds too complicated, and it's
GC> cheap to call str().

 Yes, you're right, sorry for missing this.

GC> I think this unit test should suffice for deciding what to change; we'll
GC> run a few manual tests anyway to see how much faster PDF generation
GC> becomes on our machines, but ledger_format() doesn't need its own unit
GC> test at this time.

 Yes, you've already done what I was just thinking of potentially doing.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]