lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Pagination anomaly


From: Greg Chicares
Subject: Re: [lmi] Pagination anomaly
Date: Mon, 5 Feb 2018 22:04:02 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 2018-02-05 19:24, Vadim Zeitlin wrote:
> On Sun, 4 Feb 2018 13:59:43 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> I've applied and pushed both changesets, and rebuilt lmi with MinGW
> GC> (without rebuilding wx). The previously-reported anomaly in the
> GC> seventh cell is no longer seen (thanks), but now there seems to be
> GC> another problem: I load the input census shared the other day,
> GC> press Ctrl-Shift-I [Census | Print case to PDF], and all expected
> GC> PDF files are written, but a messagebox warns:
> GC>   Logic error: 1 missing extra page(s)
> GC> on line 1207 of 'ledger_pdf_generator_wx.cpp'.
> 
>  Sorry, it took me a long(er than expected) time to find this problem
> because I couldn't see it initially -- because I completely forgot that I
> had intentionally modified my sample.policy to insert a "FIXME-VZ
> NonGuaranteedFootnote" string into the corrseponding datum because I wasn't
> sure where this string was supposed to appear.

First of all, IIRC, there's a 'sample2' product that should differ from
'sample' in that all its '.policy' strings are non-empty. At least that
was the intention as far as I can remember. But if I had remembered it
better then I would have used 'sample2' for the test case discussed here.

Are you _generating_ these 'sample*' product files, as opposed to simply
extracting them from some tarball in lmi.nongnu.org's ftp area? If you're
running lmi's [non-auto]make files, then these 'sample*' files are
generated and installed, and should have a current date, which is good;
if they date from the last millennium, that's a problem.

Perhaps I should add another 'sample' product for each ledger type.

> And this, of course, changed
> the page layout just so slightly, but enough to hide the problem. After
> undoing this change to sample.policy, I could see the bug immediately and,
> examining the PDF generated for the composite ledger more closely, I saw
> that there was an even worse problem in it and all the other ones: the
> first year on the non-first page of each tabular report was skipped because
> of a bug I introduced in the last fix. The fix is trivial and I've created
> https://github.com/vadz/lmi/pull/63 for it (which also contains another
> small change) or you can just do this:

I think it's better for me to "apply your pull-request" (or "merge
your branch", or however a git maven would express that). That's
cleaner than applying a patch copied out of an email, it fits
better with your way of working, and it's actually easier for me
now that I've registered your repository as a "remote".

> but I still have a few more problems/questions with/about this code.
> 
>  First one is that it's painfully obvious that I can't avoid bugs in it, so
> it would be great to have tests verifying that this works correctly.

Bentley said "While the first binary search was published in 1946,
the first binary search that works correctly for all values of n did
not appear until 1962." (And his "correct" version was wrong for
arrays larger than half the machine's addressable memory.) Even TeX
has defects: it's a universal human condition--so, we test everything.

> Unfortunately this code is not easy to test in isolation, so I need to
> either refactor it in some non-trivial and not really natural way to allow
> it, or we need to implement the text output idea from the other thread, but
> do it in a way allowing to test the pagination logic. The latter would
> probably be more useful, wouldn't it?

This is a hard question. Superficially, yes, of course it would be
beneficial to have text output available as a side effect. But the
real question--how to ensure that pagination is always correct--may
be orthogonal to that. For example, flat-text output blocks contain
an integral number of lines, but a pdf renderer probably needs to
deal with fractional measurements: 5+1 surely equals 6, but does
4.999 + 1.001?

In some ideal sense it seems that it ought to be possible to abstract
out the pagination code into a small, easily-testable routine. But
AFAICS that would essentially mean taking a vector of the vertical
sizes of a set of blocks, and comparing its total to the available
page size.
  P = Page length
  H = Header (vertical) size
  F = Footer size
  Q = size of each Quinquennial group of table elements
Then the problem is determining the maximum N such that
  P - H - F >= N*Q
with due regard to padding between blocks. However, without having
studied the code yet, I would suspect that this inequality might
not be the source of the difficulty here, and that abstracting it
out wouldn't be helpful. I'm tempted to delete this paragraph before
sending because it seems only to belabor the obvious, but I'll keep
it in the hope that it helps us to see something less obvious that
actually might prove helpful.

>  Second problem is that currently there are 2 functions dealing with page
> breaks: page_with_tabular_report::render() and get_extra_pages_needed().
> This gives me twice as many opportunities to introduce bugs in them (as if
> I needed it), so I wonder if I should remove the latter and just run the
> code from render() twice, once with output disabled, to just compute the
> number of pages needed, and second time for real.

Again, I haven't studied this code, but from my naive ivory-tower
perspective this sounds like a Surpassingly Good Idea.

> One disadvantage of doing
> it like this is that get_extra_pages_needed() is currently more efficient
> than render(), but I'm not sure if it's really noticeable.

Can you measure it easily?

BTW, we currently have a header that only defines

enum enum_output_mode
    {e_output_normal
    ,e_output_measure_only

and I'd like to move that into 'oecumenic_enumerations.hpp', where
a number of similar enums already reside. In doing so, I'd like to
rename everything so that whole entity is self-documenting--e.g.:

enum oenum_render_or_measure
    {oe_render
    ,oe_measure
    };

The android documentation speaks of screen rendering as
  "a two pass process: a measure pass and a layout pass"
although they also have a later "draw" step; so I figure that
the meanings of "render" and "measure" in the PDF or HTML context
are unmistakably clear. Any objection?

> Another possible
> advantage of the current approach is that get_extra_pages_needed() result
> is used as a check for render() and that if (exactly) one of them is buggy,
> chances are that we notice it due to the warning which was triggered. Does
> this justify keeping both functions?

Brooks [_The Mythical Man Month_, page 64 of the Anniversary Edition] said:
  "Never go to sea with two chronometers; take one or three."
but that doesn't mean that having two, we should throw one overboard.
Sometimes I've retained two competing implementations during testing,
using one to check the other, and popping up a messagebox if they
disagree. Rarely, I've even done that in production, stealthily
drafting end users to do the final testing, after fixing every
discrepancy we were able to find by any other means...and ultimately
removing the redundant implementation after a few months.

Clearly we're dealing with a hard problem here, and such an approach
might be fruitful. If it's easy enough to make one implementation
call the other for comparison, then it's be a shame to throw one
away, especially if it might readily have found a subtle error.

>  Third and final problem is that I think that the current code is still
> wrong, although at least now it's consistently wrong in both functions. The
> case when it is possibly wrong is when the page can fit 5*n+k rows with
> k = 2, 3 or 4 and the total number of rows is 5*n+l with l<k. In this case,
> all but the last page contain exactly n groups, but the last but one page
> could contain n+1 groups, with the last one being incomplete. Instead,
> currently it still contains n groups and a new page is used for the last
> incomplete group, so we could have a page with just one row on it, even if
> there was space for up to 4 rows on the previous page. This seems wrong to
> me because it doesn't use the pages optimally, but I'm not completely sure
> whether this should really be changed because it has a potential advantage
> of having exactly the same number of years on all the previous pages. E.g.,
> to take the concrete example I've been testing with, there are 74 years in
> the composite PDF and the page could fit 39 of them. As this is only enough
> for 7 complete groups, the first page shows the years 1..35 and the second
> one currently (i.e. with the fix above applied) shows the years 36..70,
> with the years 71..74 overflowing to the last page. Would it be better to
> show 1..35 on the first page, but 36..74 on the second one, even if it
> would make it longer than the first one? I think it would, but I'd be
> grateful if you could please confirm it.

Usually I try to abbreviate quoted text so that it's not too much longer
than the answer, but that won't work here, so the answer is "Yes":
  {1..35}, {36..70}, {71..74} wastes a page, objectionably
  {1..35}, {36..74} is a tasteful tradeoff of uniformity for brevity
The latter is preferable.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]