lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Control flow in page_with_tabular_report::render()


From: Vadim Zeitlin
Subject: Re: [lmi] Control flow in page_with_tabular_report::render()
Date: Mon, 12 Feb 2018 19:47:28 +0100

On Sun, 11 Feb 2018 20:23:36 +0000 Greg Chicares <address@hidden> wrote:

[...]
GC> > GC> I understand that render() doesn't work that way today: instead, it
GC> > GC> waits until it's about to print something, and decides at that point
GC> > GC> whether to print a row, print a blank line, or start a new page.
GC> > GC> But in principle all those decisions can be made in one unit-testable
GC> > GC> class and embodied in some data structures that it can provide.
GC> > 
GC> >  Yes.
GC> 
GC> What are those data structures? What information do they embody,
GC> and in what fashion? That's what I'm trying to figure out.

 In general, focusing on data structures is, of course, an excellent idea
(even if, regrettably, some Finnish guy who wrote a hobby OS beat you to
the most quotable way of formulating it), but I'm not sure if finding the
most appropriate data structure for the outputs of a pagination algorithm
is really the best way to handle this particular problem, as this would make
the code using rather unnatural IMO. I'd prefer to use what is formally
called "template method design pattern" instead and keep these data
structures implicit in the algorithm itself and just allow to customize
what happens when something interesting -- e.g. starting a new page,
outputting or skipping a row -- happens. I'll try to justify my point of
view below.

GC> Given this preliminary sketch:
GC> 
GC> /// Spread J rows in groups of K over a number of pages of length L.
GC> ///
GC> /// For example, lay out
GC> ///   [J] 100 data rows
GC> ///   [K] in groups of five, with blank lines between groups
GC> ///   [L] on pages with 37 usable lines (net of headers and footers)
GC> ///
GC> /// data consists of "rows"; pages consist of "lines"
GC> ///
GC> /// Preconditions:
GC> ///  0 <= J
GC> ///  0 < K <= L
GC> 
GC> ...my candidates for questions we'd like to ask are:
GC>  - how many pages are needed?
GC>  On a "normal" page (not the last):
GC>  - how many groups are there?
GC>  - how many data rows are consumed?
GC>  - how many output lines are printed?
GC>  On the last (which might be the only) page:
GC>  - [same questions, I suppose, with different answers]
GC> Should there be, say...
GC>   one data_page_layout struct for the last page,
GC>   one data_page_layout struct for any and all prior pages, and
GC>   one int for the total number of pages?
GC> Should 'data_page_layout' above be a struct, a tuple, or what?
GC> Instead of two structs (last page, other pages), might we want
GC> a vector<struct> so that the render() logic can just iterate
GC> across it rather than do an "is this the last page yet"
GC> calculation that could be coded incorrectly...especially since
GC> the cardinality of the vector is going to be tiny?

 I think representing the outputs of this algorithm as something the code
using it could iterate over is indeed a good idea, i.e. I'd make it return
a collection (so vector, in C++) of pages and each page would provide an
iterator over either groups (but then we'd need to pass it the number of
rows in the group, which could be less than maximum) or rows (but then we'd
need to have a way to distinguish between blank and non-blank rows).

 However any such approach would have 2 fundamental (AFAICS) problems:
first, the code using it would be more complicated than currently because
we need to render the fixed part of the page in order to compute its height
and feed it to this algorithm so that it could have the number of rows per
page, yet we also need to do it during iteration. This just can't be not
cumbersome (a bit like this sentence).

 Second, this completely separates the handling of the row and its vertical
position, as one of them is (implicitly, now) in the general part, while
the other one is still in PDF code. I find this problematic because they
must be in sync, and while it should be simple to notice if this is not the
case, it's still not automatic and I don't like that.


 By contrast, in a "template method" approach, we'd have some paginator
class with virtual methods (or callbacks, for which we could use any
callable, such as std::function<> or a lambda, if you prefer) that would be
called when

1. The new page starts, and return the size of the fixed part (and render
   it if necessary).
2. A row is output, and return the height of the row (and render it). Do
   nota that this method is called with the vertical position of the row
   that would be updated by paginator itself.
3. A row is skipped (i.e. a blank row is output): this could be merged
   with (2) if desired, but I think it's nicer to keep it separate.

 This would still allow us to keep the algorithm entirely generic and
easily testable; make the code using it very simple and without any loops
at all; keep the row index and vertical position in sync automatically.
It wouldn't make the data structures apparent, which is, again, a bad thing
in theory, but I have trouble convincing myself that this is really the
case in this particular example, where the data structures are so simple,
while the code is not, and making the data structures explicit complicates
the code rather than simplifies it.

 So what do you think of this approach?


GC> What I'm trying to say is: could we please just call B::foo()
GC> instead of C::foo(), because it's much easier for me to read
GC> and I don't think we'll ever want to implement a C::foo()?

 This is completely unrelated to the rest of this message and is a much
more narrow, and so less important, issue, but I'd still like to re-state
that I strongly disagree with the proposal above. In a well-designed class
hierarchy it shouldn't matter which particular base class method is called
and seeing the call to B (grand parent class) method instead of C (direct
parent class) is unusual, fragile (as explained before) and immediately
provokes the question of why do we need to skip C implementation here. A
call to the base class method, whether it exists or not, is a clear and
simple indication that the derived class augments the base class behaviour,
which is fine. A call to indirect base class method is a big warning sign
indicating that something weird is going on and always requires an
explanatory comment IMO. Let's please not hard-code the base class name in
the code, it's unnecessary, makes the code easier to break and confusing.

 Regards,
VZ


reply via email to

[Prev in Thread] Current Thread [Next in Thread]