[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Micro-optimization in ledger_format

From: Greg Chicares
Subject: Re: [lmi] Micro-optimization in ledger_format
Date: Thu, 17 Jan 2019 01:35:30 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0

On 2019-01-16 21:08, Vadim Zeitlin wrote:
> On Wed, 16 Jan 2019 20:23:29 +0000 Greg Chicares <address@hidden> wrote:
> GC>   commit cbb6cade2f89
> GC>   Measure speed of various stream-cast implementations
> GC> There, I added some alternatives to the existing
> GC> [...unit test for stream_cast<>(), which...] uses the same fundamental 
> [...]
> GC> technique, and can benefit from the same optimizations as ledger_format().
> GC> [...] But already you'll see that I wrote
> GC>   std::stringstream& imbued() // The name will make sense when you read 
> it.
> GC> to return a reference, while PR 104 has
> GC>   static std::stringstream interpreter{[]() {... return ss;} ()};
> GC> where I don't see a reference. How can a std::stringstream be returned
> GC> other than by reference, given that it lacks the special member functions
> GC> that would seem to be required? Or are reference semantics used 
> implicitly,
> GC> somehow?
>  I can answer this one: stringstream does have special member functions, it
> is movable, i.e. provides move ctor and move assignment operator, so this
> is why returning it by value works.

Thanks, I had leapt to an incorrect conclusion. Originally, I wrote
this without the '&', and now when I again try omitting it...

--- a/stream_cast_test.cpp
+++ b/stream_cast_test.cpp
@@ -71 +71 @@ To cast_2(From from, To = To())
-std::stringstream& imbued()
+std::stringstream imbued()

...I again see these diagnostics:

/opt/lmi/src/lmi/stream_cast_test.cpp: In function 'std::stringstream imbued()':
/opt/lmi/src/lmi/stream_cast_test.cpp:75:12: error: use of deleted function 
'std::__cxx11::basic_stringstream<_CharT, _Traits, 
_Alloc>::basic_stringstream(const std::__cxx11::basic_stringstream<_CharT, 
_Traits, _Alloc>&) [with _CharT = char; _Traits = std::char_traits<char>; 
_Alloc = std::allocator<char>]'
     return interpreter;
In file included from /opt/lmi/src/lmi/stream_cast.hpp:53:0,
                 from /opt/lmi/src/lmi/stream_cast_test.cpp:24:
/usr/lib/gcc/i686-w64-mingw32/7.3-win32/include/c++/sstream:734:7: note: 
declared here
       basic_stringstream(const basic_stringstream&) = delete;

...so apparently at least the copy ctor is undefined--and when I realized
that, I seemed to remember that it is so because streams are inherently
problematic to copy.

It crossed my mind to try move semantics instead, but at the time I
avoided doing so because of your review comment on PR 104. Nevertheless,
I just tried it now, with this diff vs. HEAD (which uses 'git diff -U0'
for extreme terseness):

diff --git a/stream_cast_test.cpp b/stream_cast_test.cpp
index d391ed6f..f7d471cd 100644
--- a/stream_cast_test.cpp
+++ b/stream_cast_test.cpp
@@ -71 +71 @@ To cast_2(From from, To = To())
-std::stringstream& imbued()
+std::stringstream imbued()
@@ -75 +75 @@ std::stringstream& imbued()
-    return interpreter;
+    return std::move(interpreter);
@@ -83 +83 @@ To cast_3(From from, To = To())
-    std::stringstream& interpreter {imbued()};
+    std::stringstream&& interpreter {imbued()};
@@ -103 +103 @@ To cast_4(From from, To = To())
-    std::stringstream& interpreter {imbued()};
+    std::stringstream&& interpreter {imbued()};

and the result seems quite interesting:

  stream_cast     : 4.835e-006 s mean;          4 us least of 2069 runs
  minimalistic    : 3.807e-006 s mean;          4 us least of 2627 runs
  static stream   : 2.392e-006 s mean;          2 us least of 4181 runs
  static facet too: 2.929e-006 s mean;          3 us least of 3415 runs
  without str()   : 2.795e-006 s mean;          3 us least of 3578 runs

In light of the extra improvement on the last two lines (e.g., compared
to the original timings quoted immediately below), do you still want to
follow the canonical advice cited in your PR 104 review comment, i.e.,
to avoid that using rvalue references on the theory that they inhibit
RVO, which "should" be more efficient...except that it seemingly isn't?

How can such a large difference as
  static stream   : 2.392e-006 s mean;          2 us least of 4181 runs
  static facet too: 2.929e-006 s mean;          3 us least of 3415 runs
be obtained by replacing a static local variable with a reference
initialized by a function call?

Or is this yet another case where I've misinterpreted the data? Or,
alternatively, could it be that these measurements for MinGW gcc-7.3
reflect some peculiarity version of that compiler?

> GC> I'm seeing timings like this:
> GC> 
> GC>   Speed tests...
> GC>   stream_cast     : 4.707e-006 s mean;          4 us least of 2125 runs
> GC>   minimalistic    : 3.638e-006 s mean;          3 us least of 2750 runs
> GC>   static stream   : 2.335e-006 s mean;          2 us least of 4283 runs
> GC>   static facet too: 2.271e-006 s mean;          2 us least of 4404 runs
> GC>   without str()   : 2.207e-006 s mean;          2 us least of 4532 runs
> GC> 
> GC> The comments in the left-hand column are terribly terse, but comments in
> GC> the code explain what's going on.
>  I wonder why did you decide to number cast_N functions and fN lambdas
> (and why do the latter even exist in the first place as named variables
> instead of just passing them directly to TimeAnAliquot()?) instead of using
> some suffixes based on the labels above? This would make it simpler to
> track which test corresponds to which result IMO.

Numbering them took less effort than naming them well, at a time when
I was focusing all the effort I could muster on analyzing the underlying
issues--which [above] I still don't believe I understand.

And the only reason they exist in the first place as named variables is
that I haven't yet internalized the syntax and can't write it fluently,
so I copied it from the
        << "\n  Database speed tests..."
section of 'input_test.cpp'--where, IIRC now, the variables exist only
because two reasonably short lines like these:
      auto f3 = [&db]     {db.query<oenum_alb_or_anb>(DB_AgeLastOrNearest);};
        << "\n  query<enum>(scalar) : " << TimeAnAliquot(f3)
were less bad than a single 120-character line.

Such flaws can be addressed in a later revision, after we establish
whether the "prefer RVO to move semantics" advice truly applies here.

> GC> Because you reported an overall speedup
> GC> in PDF generation on the order of ten percent, I figured that this
> GC> subroutine would have to become about twice as fast--which is 
> approximately
> GC> what I observe, so...no surprise there.
>  FWIW I see something similar under Linux with clang:
>   Speed tests...
>   stream_cast     : 3.584e-06 s mean;          3 us least of 2790 runs
>   minimalistic    : 2.725e-06 s mean;          2 us least of 3670 runs
>   static stream   : 1.947e-06 s mean;          1 us least of 5137 runs
>   static facet too: 1.846e-06 s mean;          1 us least of 5419 runs
>   without str()   : 1.574e-06 s mean;          1 us least of 6354 runs
> but IMO in both cases the tests running time is too small and I think they
> should be running for longer than they do to have more accurate results
> (but I haven't bothered checking if this is really the case...).

There's an argument that lets us force the tests to take longer.
Changing each of the five relevant lines in this fashion:

-        << "\n  stream_cast     : " << TimeAnAliquot(f0)
+        << "\n  stream_cast     : " << TimeAnAliquot(f0, 10.0)

produces [with the move-semantics patch above]:

  stream_cast     : 4.550e-006 s mean;          3 us least of 21977 runs
  minimalistic    : 3.027e-006 s mean;          3 us least of 33033 runs
  static stream   : 2.153e-006 s mean;          2 us least of 46448 runs
  static facet too: 2.505e-006 s mean;          2 us least of 39916 runs
  without str()   : 2.481e-006 s mean;          2 us least of 40303 runs

As we've discussed before, the most robust statistic here may actually
be the minimum time ("least" above), at least if we have enough timer
resolution to measure it (which may not be the case here--we could
calculate and display nanoseconds instead, but IIRC we don't actually
have an accurate nanosecond timer).

For comparison, here are results of HEAD without the move-semantics
patch, but with the extra TimeAnAliquot() argument immediately above:

  stream_cast     : 5.155e-006 s mean;          4 us least of 19400 runs
  minimalistic    : 3.095e-006 s mean;          3 us least of 32315 runs
  static stream   : 2.053e-006 s mean;          2 us least of 48722 runs
  static facet too: 2.079e-006 s mean;          2 us least of 48104 runs
  without str()   : 2.080e-006 s mean;          2 us least of 48067 runs

which, as ever, are to be taken cum grano salis.

> GC> Imbuing a facet statically, OAOO, is a comparatively tiny improvement, at
> GC> least for the particular facet used in this unit test.
>  But it's the same one as is used in the real code, isn't it?

There are two real functions:
 - stream_cast<>(), which uses the blank_is_not_whitespace_locale() facet
     (used as rarely as possibly, perhaps close to never)
 - ledger_format(), which uses std::comma_punct
     (used so often for PDFs that its effect is palpable)
But I assume that analyzing the effect of one facet is sufficient to
optimize the use of any, because I'm a facet egalitarian:

  What though on hamely fare we dine,
  Wear hodden grey, an' a that;
  Gie fools their silks, and knaves their wine;
  A Man's a Man for a' that

> GC> It looks like just adding the word 'static' (instead of full-blown
> GC> IIFE) suffices for the one really dramatic improvement.
>  It probably does, I don't know/remember if we tested just this change
> separately. But OTOH it makes sense -- to me at least -- to use an IIFE to
> fully initialize the stream anyhow, regardless of performance
> considerations. Again, I realize that it may seem like a weird idiom if

s/weird/mondo bizarro/

    // Initialize an integer to zero.
    static int zero{[]() {int z {0}; return z;} ()};

In C++98, I could write that off the top of my head, and feel highly
confident that I'd gotten it right the first time.

> this is the first time you see it, but I like it because it often allows me
> to make a local variable const even if it has to be initialized in several
> steps: if it's not modified later, I can just stash these steps into a
> lambda and still initialize the variable in a single statement.

What's wrong with writing a simple auxiliary function (like imbued()
in this unit test, though in a namespace and with a less cryptic name)
right above its single point of use?

Having slept on it, however briefly, I awake viewing this as the
opposite of syntactic sugar--this is more like syntactic olestra:
(Be glad you live in a country that bans it.)

>  Of course, here the stream is not, and can't be, const, but the idea of
> having a (local) function initializing it still seems nice to me.

Putting it into a function: highly beneficial.

Making that function local: slight incremental benefit, but is it
really worth that outlandish syntax? Can we reasonably aspire to
fluency in such a language?

> GC> I imagine that there's a possibility for error when we call clear() 
> without
> GC> at the same time calling str(""). We expect the EOF flag in the static
> GC> stream object after each use, so we certainly need to clear that flag
> GC> before each reuse; yet clear() would also wipe out the 'bad' and 'fail'
> GC> flags, potentially leaving unwanted data in the buffer. We could of course
> GC> use rdstate() to test EOF only, but that sounds too complicated, and it's
> GC> cheap to call str().
>  Yes, you're right, sorry for missing this.

Okay, we'll add
which AFAICT writes its argument in the way that most clearly tells
the compiler what we want it to do. Then, once we've figured out the
std::move() vs. RVO thing (well, once you've figured it out and
explained it to me), we'll be ready to change both ledger_format and

reply via email to

[Prev in Thread] Current Thread [Next in Thread]