lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Preliminary emit_ledger code refactoring


From: Vadim Zeitlin
Subject: Re: [lmi] Preliminary emit_ledger code refactoring
Date: Wed, 29 Jul 2015 16:14:46 +0200

On Wed, 29 Jul 2015 10:14:23 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2015-07-16 20:41, Vadim Zeitlin wrote:
GC> [...]
GC> >   Use ledger_emitter class instead of [pre_]emit_ledger() functions.
GC> 
GC> I'm studying this patchset now, and I'm stuck on this particular patch.
GC> 
GC> Previously the ledger-emitter implementation was just two functions:
GC>   double LMI_SO pre_emit_ledger(...);
GC>   double LMI_SO     emit_ledger(...);
GC> 
GC> It might seem sensible enough to combine those into one single class
GC>   class LMI_SO ledger_emitter
GC> because, after all, some of the arguments of those two functions were
GC> the same and may as well be stored as members.

 The real impetus behind this change is the need to store as members
something that is _not_ passed as arguments to these functions, the rest is
just an extra benefit, although the ability to easily add an end() method
without duplicating the arguments already passed to the previously called
[pre_]emit_ledger() is also very nice.

GC> OTOH, this patch adds
GC>   class ledger_single_format_emitter
GC> with derived classes
GC>   class ledger_emitter_pdf_file       :public ledger_single_format_emitter
GC>   class ledger_emitter_pdf_to_printer :public ledger_single_format_emitter
GC>   class ledger_emitter_pdf_to_viewer  :public ledger_single_format_emitter
GC>   class ledger_emitter_test_data      :public ledger_single_format_emitter
GC>   class ledger_emitter_text_stream    :public ledger_single_format_emitter
GC>   class ledger_emitter_custom_0       :public ledger_single_format_emitter
GC>   class ledger_emitter_custom_1       :public ledger_single_format_emitter
GC> and additionally
GC>   class ledger_emitter_single_output_base :public 
ledger_single_format_emitter
GC> with its derived classes
GC>   class ledger_emitter_spreadsheet    :public 
ledger_emitter_single_output_base
GC>   class ledger_emitter_group_roster   :public 
ledger_emitter_single_output_base
GC> and further
GC>   class ledger_emitter_impl
GC> all of which makes it much more complex.

 I think it depends on the point of view. From the point of view of the
functions/class user I think it makes things unambiguously simpler because
all these extra classes are not seen at all (they're completely private)
and using the public class is shorter, simpler and more clear than using the
functions, just look at the diff to group_values.cpp (the only additions
are due to extra timer-related lines which make sense for other reasons,
but ledger emission-related changes just remove the now unneeded arguments,
basically).

 From the point of implementation, the change replaces a single monolithic
function with a long chain of "ifs" with a collection of classes providing
different implementations of virtual functions. While this is not
obviously a simplification if you compare it with the original code in
emit_ledger.cpp as we still have the same chain of "if"s, I'd argue that
the new "if"s are slightly simpler because they're very homogeneous and
they only occur once while they were repeated in both pre_emit_ledger() and
emit_ledger() before. But the real reason for this change is that without
it we would have to repeat this chain of "ifs" one more time in
post_emit_ledger() or whatever end() would have been called and I really
didn't want to do this. Replacing "if"s with virtual methods call is, I
believe, a rather well-known way of dealing with this problem and so this
is what I used.

 Next, ledger_emitter_single_output_base is really just a way to reuse code
and avoid duplicating these 2-3 lines which are the same for the 3 classes
(2 existing ones and a new one). It could be just removed, of course, but
then the assert and std::remove() call in its ctor would need to be copied
thrice. Using this class is just a way to avoid this (notice, BTW, that
this is only possible when using classes to represent different choices,
with the "if chain" approach we'd have no choice but to duplicate it).

 Finally, ledger_emitter_impl is the only part of my changes which clearly
does add some complexity. This is done in order to reduce compilation
dependencies, as usual, with the "pImpl" idiom, and completely isolates the
clients of the ledger_emitter class from any changes to its implementation.
I'm used to writing all the new classes in this way, not only because it
helps a lot with the compilation speed of big projects, but also because it
clearly separates the interface (what is in the header file) from the
implementation (everything else). But if you believe it's an overkill here,
ledger_emitter_impl can, of course, be easily merged into ledger_emitter
itself.

GC> Maybe that's in the eye of the beholder. Maybe this is a more modern way of
GC> doing things, and I've simply fallen behind the times. Maybe this is clearer
GC> from a functional-programming point of view, while I'm clinging to a 1950s
GC> imperative style and understand C control structures more readily than class
GC> hierarchies. Whatever the reason, I've tried pretty hard to like this,

 This is definitely not new and as purely OO (and not FP) as it gets, but
the main thing is that it is not about liking, or disliking, it at all
because the benefits are perfectly objective and not subjective: when
adding a new emitter kind ("inforce bill"?), you would have had to add code
specific to it in 3 different places with the old approach: to
pre_emit_ledger(), emit_ledger() and post_emit_ledger() (the latter is
needed when producing any kind of reports with a footer, so it's really
non-negotiable). With the new design, you add a single class implementing a
well-defined (and checked by compiler) set of pure virtual methods and you
modify the code in exactly one place to map a bit in mcenum_emission to
this class. With a chain of "if"s it was trivially easy to forget to update
one of them and even if you did find all places to update, your code was
scattered among them. With the class-based approach you have a clear view
of what needs to be done and it is done in a single place, inside the same
class. Don't you like this?

GC> but I'm stuck because, to me, this way seems harder to understand.

 Superficially the old code was simpler to follow, but it was a deceptive
simplicity. Looking at the code in emit_ledger() you could easily forget
that it relied on something done in pre_emit_ledger() a long way away from
the code you were looking at. Looking at it now, you see everything related
to the given emission kind in one place.

 And, as explained above, the code is definitely easier to modify.

GC> > They are needed because the
GC> > PDF generation can't be done in a single pass, if only because of the page
GC> > numbers (but also for a few other reasons), and so the code needs to keep
GC> > its state between the calls to emit_ledger().
GC> 
GC> How big and complicated is that state?

 Very much so.

GC> Is it the entire premium-quote-pdf implementation,

 Yes.

GC> or just some small part of that? I'd guess that it's not simple, just
GC> from my limited experience with trying to use html for a comparable
GC> purpose--in order to say "page M of N" on each page, you need to know
GC> N, which means you have to lay out the whole thing and then go back and
GC> plug N in. Of course, in that example, if a single integer N is the
GC> only state that needs to be preserved, then that's quite simple; but I
GC> doubt the present case is that trivial.

 There is the number of pages and also the totals which can only be
computed at the end, so the generation just can't be done in a single pass.
Because of this, all the data is just stored initially and then everything
is generated at once when end() is called.

GC> Could this state be stored elsewhere?

 It is, there is a separate class storing it.

GC> Alternatively, could we use emit_ledger() to write a TSV file--much like
GC> the present PrintRosterTabDelimited() but with different contents
GC> distilled out of the ledger objects--and generate the PDF from that file?

 Sorry but I really don't see any advantages of doing it like this.

GC> > It also needs to know when
GC> > the last ledger is added, and while it could be done by adding a
GC> > post_emit_ledger() function or even just supposing that the call to
GC> > emit_ledger() with a composite ledger indicates the end of the generation,
GC> > adding a class with a separate end() method looked like the most natural
GC> > solution to me.
GC> 
GC> The (unpatched) emit_ledger() can be used for a single_cell_document (.ill),
GC> in which case there's no composite. But for premium quotes it can be used
GC> only for a multiple_cell_document (.cns), in which case the composite by
GC> its nature is always the last ledger generated--that's required by the
GC> problem domain.

 So do you think end() should be dropped and call to add_composite() be
used instead? This would work, of course, but it just seems ugly to me
because it overloads the add_composite() function with a second role
("finalize ledger emission") that is not obvious at all.


 Let me summarize: if I understand you correctly, the perceived sources of
extra complexity are:

0. The existence of emit_ledger class instead of 2 (or 3) free functions.
1. The use of class hierarchy instead of duplicating a chain of "if"s.
2. The existence of ledger_emitter_single_output_base intermediate class.
3. The use of ledger_emitter_impl to isolate interface from implementation.

 I can understand the desire to get rid of (3), use of "impl" is an
acquired taste in C++ (it would be better if the language just did it
automatically behind the screens as other ones do) and lmi is not such a
huge project that the use of it is indispensable for the project
well-being. So let's drop this if you want.

 The use of ledger_emitter_single_output_base is a small detail and I could
accept that introducing an extra class to avoid triplicating 2 lines of
code is an overkill and could drop it too if you object to it more strongly
than I object to code duplication.

 The use of class hierarchy is, IMO, a very good thing. I explained that

(a) Any extra complexity is not seen from the outside.
(b) This makes the code easier to modify.
(c) The simplicity of the existing code is an illusion.

above and I can't really add anything to it. I strongly believe that the
approach using it instead of a (triplicated, again) chain of "if"s is much
better. But if you absolutely can't stand it, I would obviously have no
choice but to use the "if"s.

 Finally, the use of an object with several methods instead of free
functions is just a necessity as any non trivial report generation emission
type requires having internal state. I can't imagine doing things otherwise
unless the group premium generation code is taken completely outside of the
ledger emission context. But to fit it into it (which, I think, is the
right thing to do), we just need to introduce the ledger_emitter class and
I don't see any reasonable way around it -- nor do I understand why should
we be looking for one, while I know of people who use C++ without
exceptions, templates or deep object hierarchies, I've never heard about
any rules against using _simple_ objects.


 To summarize the summary, I'm unrepentant about the perceived complexity
introduced by this patch because I still believe it's done for good
reasons. The only part which I don't care much about is (3) but I hope you
can reconsider your feelings about (1) because I'd really like to keep it.
(2) is trivial so I'll do whatever you prefer and (0) just can't be
changed.

 Please let me know what should I do/how should I change the patches,
VZ

reply via email to

[Prev in Thread] Current Thread [Next in Thread]