Re: [lmi] Discussion of using external templates for PDF generation


From: Greg Chicares
Subject: Re: [lmi] Discussion of using external templates for PDF generation
Date: Mon, 24 Jul 2017 15:57:54 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 2017-07-23 23:02, Vadim Zeitlin wrote:
> 
>  This is the second part of the PDF generation discussion. In the first one
> I've shown the current approach and the first natural question is: what's
> not to like about it? My main problem with it is that it's repetitive and
> while I probably should have defined an array of terms names/definitions
> in the particular example I chose to make it less so, using arrays doesn't
> work for the other pages and even in this example I'm not totally sure it
> would help things that much because of the conditions surrounding some of
> the paragraphs.

Much of that page consists of terms and their definitions, like this:

  Death Benefit: The amount payable by reason of death.
  -----term----  ------------definition----------------

so it's natural to consider an approach like

  for(auto const& i : term_definition_pairs)
       {os << i.first << ": " << i.second << '\n';}

but I think you're right to avoid that because of the conditionals. The
array isn't the lowest-level concept here, because the conditionals are
at a lower level and their complexity would probably invade the array
code, polluting it. Maybe it could be done by iterating over a vector
of function pointers like {DeathBenefitTermAndDefinition, ...} where the
complexity would be walled off inside individual functions, but most of
them would be tiny and we'd wonder why we didn't just inline them: that
would be at best a premature optimization that we might ultimately want
to remove (e.g., if we find it bothersome to track down individual
functions and yearn for a flat and linear representation, preferring
linearity over the elimination of redundancy), so it seems better not to
do it in the first place.
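
For concreteness, here is a minimal sketch of the function-pointer idea weighed above. Every name in it (ledger_values, permit_withdrawals, account_value_term, death_benefit_term, render_terms) is hypothetical and chosen only for illustration; none is taken from lmi. It shows how each term's conditionals could be walled off in a function of its own, and also how small most such functions would be.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the values an illustration page would consult.
struct ledger_values
{
    std::string av_name;            // e.g. "Account"
    bool        permit_withdrawals; // hypothetical product-level flag
};

// A trivial term: a constant string, no conditionals.
std::string death_benefit_term(ledger_values const&)
{
    return "Death Benefit: The amount payable by reason of death.";
}

// A complex term: its name varies, and one clause is conditional.
std::string account_value_term(ledger_values const& v)
{
    std::string s = v.av_name + " Value: The accumulation at interest"
        " of the net premiums paid,";
    if(v.permit_withdrawals)
        {s += " less any withdrawals,";}
    s += " less any monthly charges deducted.";
    return s;
}

using term_function = std::string (*)(ledger_values const&);

// The loop itself stays free of conditionals: it only calls each function.
std::string render_terms(ledger_values const& v)
{
    static std::vector<term_function> const terms
        {account_value_term
        ,death_benefit_term
        };
    std::ostringstream os;
    for(auto const& f : terms)
        {
        assert(nullptr != f);
        os << f(v) << '\n';
        }
    return os.str();
}
```

Calling render_terms() with permit_withdrawals set to false omits the "less any withdrawals," clause; the vector fixes the order of the terms, so the page's linear sequence is visible in one place even though the wording is scattered across functions.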

>  The other problem is that this is still C++ code and I think it's too
> intimidating for anybody but C++ programmers to modify it. Again, I don't
> know if it's actually a problem or an advantage from your point of view,
> but from purely technical perspective, I think it would make sense to pull
> all the contents into an external file, which could then be edited
> independently.

Yes, especially because (as you note below) C++ code must be compiled.

Looking over the example (in which I suppose you've omitted the simplest
cases whose strings are invariant):

> <p align="center">
> Column Headings and Key Terms Used in This Illustration
> </p>
> <font size="-1">
> <p>
> <b>{{AvName}} Value:</b>

Here is an example of inherent complexity. Most terms are constant strings,
but this (and {{CsvName}}) vary by product.

> The accumulation at interest of the net premiums paid,
> {{#SinglePremium}}

Probably some conditional is wanted here, but we'd want it to depend on
a different parameter: in the problem domain, "SinglePremium" is not
naturally equivalent to "PermitWithdrawals".

> less any withdrawals,
> {{/SinglePremium}}
> less any monthly charges deducted.
> </p>
> <p>
> <b>{{CsvName}} Value:</b>
> {{AvName}} Value less policy debt.
> {{#Has1035ExchCharge}}
> {{CashSurrValueFootnote}}
> {{/Has1035ExchCharge}}
> </p>

If 'CashSurrValueFootnote' is appropriately named, then it must be used
elsewhere as a footnote, so including it here is probably redundant.
We might want to establish a 'CsvDefn' variable that includes only the
crucial portion of that footnote. We might want to establish separate
variables for each term's name and definition, and push them all into
the '.policy' files. It's difficult to know until we begin to rewrite
this whole thing. Until now, we haven't tried to, because the quirky
verbosity of XSL is too great a barrier to refactoring, and even to
comprehension.

> {{^IsInforce}}
> <p>
> <b>Current Illustrated Crediting Rate:</b>
> {{CreditingRateFootnote}}
> </p>
> {{/IsInforce}}

Here's a different kind of complexity: this conditional governs not
the contents of the paragraph, but whether or not it's shown at all.
I guess all simple term-definition pairs are alike, but each complex
one is complex in its own way; and I think that's a further argument
against a design assumption that this page is essentially about
term-definition pairs of (usually) constant strings: its essence is
the unique complexity of each item--i.e., not the largely-uniform
list structure, but the multifarious exceptions.

> and the code itself would just read this file into a string and pass it to
> interpolate_html, before calling output_html() with it.
> 
>  There are several problems/disadvantages with doing this, however, here is
> a list of the most important ones I'm currently aware of and haven't
> solved yet:
> 
> - Some conditions above can't be expressed in Mustache: e.g.
>   "ModifiedSinglePremium || ModifiedSinglePremium0" one, so it would mean
>   either duplicating the contents guarded by these conditions or predefining
>   "ModifiedSinglePremiumOrModifiedSinglePremium0" variable in the code,
>   which is arguably better but is still not great. Alternatively, we could
>   extend Mustache syntax -- but this contradicts the point just above. Or
>   we could switch to Handlebars (these web 2.0 guys are so funny), which is
>   (almost exact) superset of Mustache, but supports "{{#if condition}}"
>   blocks. However for Handlebars there is no decent C++ library, so I'd
>   have to write one myself, which not only contradicts the point above, but
>   would take some non-trivial amount of time, too.

Right on cue, another variety of complexity enters the picture.

In this case, though, I question the semantics: when I see variables named thus:
  SomeVariable
  SomeVariable0
I reach for my eraser. Indeed, the corresponding definitions of
  set_modified_single_premium
  set_modified_single_premium0
are a coding horror. It is as though, in order to choose the best basketball
player from among this mailing list's subscribers, we wrote functions like:

  bool plays_basketball(Person p)
    {return 'V' == p.first_name()[0] && 'Z' == p.last_name()[0];}

  bool plays_basketball_0(Person p)
    {bool is_european = p.home_address.contains("ô");
     return 'm' == p.first_name().back() && 't' == p.last_name()[3]
         && is_european;}

perhaps using the second one to select an appropriate anthem to play before
the game starts. Exhaustive black-box-only testing would find no problem
with this code given today's subscriber list, but we don't really want to
perpetuate its semantics, especially not at the cost of choosing a different
template system that might otherwise not be required.

The best way to extirpate such horrors is to push the information into the
product database (the "Person" database in this example). When we want to
query a database to ascertain athletic abilities, the natural approach is
to add appropriate database fields.
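
Continuing the analogy, a minimal sketch of what "add appropriate database fields" might look like. The types and names here (person_record, person_database, add_person, and both query functions) are hypothetical, invented for this example only; the point is merely that each fact is stored explicitly instead of being inferred from the spelling of a name or an address.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record: each fact we want to query is a field of its own.
struct person_record
{
    std::string name;
    bool        plays_basketball; // stored, not guessed from initials
    std::string anthem;           // stored, not guessed from the address
};

using person_database = std::map<std::string, person_record>;

void add_person
    (person_database&     db
    ,std::string const&   key
    ,person_record const& record
    )
{
    bool const inserted = db.emplace(key, record).second;
    assert(inserted); // keys are assumed to be unique
    (void)inserted;
}

// Two independent questions, two independent fields: no "_0" variant needed.
bool plays_basketball(person_database const& db, std::string const& key)
{
    return db.at(key).plays_basketball; // throws std::out_of_range if absent
}

std::string anthem_for(person_database const& db, std::string const& key)
{
    return db.at(key).anthem;
}
```

With fields like these, a new subscriber changes only the data, and neither query needs to be rewritten.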

> - Some things that can be done by directly drawing on wxPdfDC can't be done
>   with wxHTML, the most trivial example is the blue (sorry, #002F6C) border
>   on the cover page. So I'd need to either extend wxHTML to support borders
>   like this or have some hack for drawing just this part of the page in
>   code. Of course, a more difficult problem is not this simple rectangle,
>   but drawing the lines in different tables which are also not directly
>   supported by wxHTML. And modifying wxHTML is not exactly simple, Vaclav
>   did do a good job with it, but this was 20 years ago and, well, quite a
>   few things have changed since then...

This issue in isolation wouldn't cause us to choose one approach over another.
It could be handled in post-processing. For instance, we could emit HTML like:

    Initial Columns     Other Columns
  -------------------   -------------
  Column 0   Column 1   Column 2
  --------   --------   --------

and then change strings of hyphens (or of some distinctive hyphen-like
character) to solid lines when rendering it to PDF.
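
A rough sketch of the detection half of that post-processing step, under the assumption that the renderer sees the text one line at a time in a fixed-pitch layout. The function name hyphen_runs and the minimum run length are made up for this example; actually drawing the lines on the PDF device context is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// For one line of text, return (start column, length) for every run of
// consecutive hyphens that is at least 'min_run' characters long. Shorter
// runs, such as the hyphen in "pre-tax", are taken to be ordinary text.
std::vector<std::pair<std::size_t, std::size_t>>
hyphen_runs(std::string const& line, std::size_t min_run = 3)
{
    assert(0 < min_run);
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t i = 0;
    while(i < line.size())
        {
        if('-' != line[i])
            {
            ++i;
            continue;
            }
        std::size_t const start = i;
        while(i < line.size() && '-' == line[i])
            {++i;}
        if(min_run <= i - start)
            {runs.emplace_back(start, i - start);}
        }
    return runs;
}
```

Each returned pair could then be mapped to a horizontal rule spanning the same columns. Using a distinctive hyphen-like character instead of '-' would make the min_run heuristic unnecessary.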

> - The tables are, actually, the problem that I haven't yet managed to solve
>   satisfactorily with HTML templates. They're clearly easier to handle in
>   the code, there are multiple problems with generating them from HTML. The
>   first and obvious one is that basic Mustache just has no way of doing
>   this at all. If we used Handlebars, we could use its "helpers", which are
>   blocks like {{#my_table some args}} that invoke a custom function
>   "my_table", defined at the code level, with the given arguments. But
>   using Handlebars is problematic, as mentioned above. So finally I decided
>   to make this work with just {{my_table_some_args}} by defining the
>   appropriate variable but so far I didn't yet manage to do even this
>   (mostly due to lack of time though, not any fundamental problems).

That is a real problem. We can't decide which approach is better until we
know whether both can handle tables adequately.

> - The second problem with tables is even worse as I just don't know how
>   to solve it currently: it's about page breaks. In C++ code it's not
>   really nice either, but is doable because I know the height of the page
>   and the height of each line of text, so it's just a question of tracking
>   the number of lines and I do this already in group_quote_pdf_generator_wx
>   code. But with external templates I am really not sure how to do it, I
>   think it could involve modifying wxHTML again.

When I was young and foolish I proposed this:
  http://trac.wxwidgets.org/ticket/5730
My motivation was to generate illustrations as HTML that could be displayed
and printed with wxHTML. I won't claim today that this patch was any good,
but I just thought I'd mention it.

Page numbering is a crucial regulatory requirement: the regulators' concern
is that illustrations often explain important details on later pages, but
some sales representatives might present only the first few pages. That's
why we must say e.g. "Page 3 of 9": the "of 9" lets customers know if they
didn't receive a complete package.

Here's an idea that probably doesn't help: write the HTML without page
breaks (as HTML is normally written), and then break it into pages and add
page numbers when rendering it to PDF. That would work well enough for
breaking up the tables themselves, but we need headers and footers on
each page.

A further complication is that we want spacing between consecutive sets
of five table lines, without breaking any five-line group across pages.
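
To make those constraints concrete, here is a minimal sketch of the arithmetic, assuming every table row has the same height and that the number of text lines available for rows on a page (after headers and footers) is already known. The names page_range, paginate, and footer are hypothetical, and this is not the logic in group_quote_pdf_generator_wx.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Rows [first_row, last_row) of the table are printed on one page.
struct page_range
{
    int first_row;
    int last_row;
};

// Split 'row_count' rows into pages. Rows come in groups of 'group_size',
// consecutive groups on a page are separated by one blank line, and no
// group is ever split across pages.
std::vector<page_range> paginate
    (int row_count
    ,int lines_per_page
    ,int group_size = 5
    )
{
    assert(0 <= row_count);
    assert(0 < group_size && group_size <= lines_per_page);
    // g groups need g * group_size lines plus g - 1 separators, so the
    // largest g that fits is (lines_per_page + 1) / (group_size + 1).
    int const groups_per_page = (lines_per_page + 1) / (group_size + 1);
    int const rows_per_page   = groups_per_page * group_size;
    std::vector<page_range> pages;
    for(int first = 0; first < row_count; first += rows_per_page)
        {
        int const last = std::min(first + rows_per_page, row_count);
        pages.push_back({first, last});
        }
    return pages;
}

// The "of N" part is known only after pagination is complete, so footers
// have to be written in a second pass (or the page count precomputed).
std::string footer(int page_index, int page_count)
{
    return "Page " + std::to_string(1 + page_index)
        + " of "   + std::to_string(page_count);
}
```

For example, with 17 usable lines per page, three five-row groups fit (15 rows plus 2 separators), so 32 rows occupy three pages and the last footer reads "Page 3 of 3". The count of table pages would still have to be added to the count of non-table pages to get the total for the whole illustration.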

> - While the code is repetitive, it does allow easily defining helpers
>   making it less so, e.g. add_term_paragraph() function in the example
>   above. In HTML, everything would need to be written out by hand, which
>   could well end up being quite annoying too. Again, Handlebars helpers
>   could help with this, but for now I am still not convinced switching to
>   Handlebars is worth it.

External templates would seem to encourage sprawling and repetitive code,
but I'm not terribly concerned about that, at least not right now, because
it would have the considerable offsetting advantage of linearity. It would
be good to preserve the ability to condense it in the future, and maybe
"handlebars" would help there (or maybe we should just push more of the
variation into the product database); but that's a solution-domain idea,
and we don't necessarily know how to condense it in the problem domain.

> - This is not a technical problem, but I'd still like to mention it: it's
>   easy to imagine that you can write arbitrary HTML when editing the
>   template files because you could open them in the browser (which, on its
>   own, is an important _advantage_ of this approach, of course) and see
>   that it renders it correctly, but this is not at all the case as wxHTML
>   is limited to HTML3, i.e. 1990s state of things. Currently, with the
>   (very few) predefined HTML tags and attributes that we have, all HTML
>   generated by C++ code using these helpers is almost guaranteed to be
>   displayed correctly ("almost" because you could always use unsupported
>   attribute value, they're untyped).

Maybe I'm missing something here, because I don't see any problem at all.
wxHTML renders only HTML3, but AFAICT HTML3 fully meets our needs here.
If we happen to write invalid HTML3, running it through a validator will
find our mistakes for us.

OTOH, yes, being able to browse the HTML files is a big advantage.
Another is that we could add a command to preview an illustration as
HTML, in a wxHTML window: that could be a more-detailed optional
alternative to the present calculation summary, and would be faster
(I imagine) than any PDF preview.

>  The advantages of this approach are probably quite clear, but let
> me list them here just to be explicit:
> 
>  + Much better (although still not perfect because complex conditions still
>    require modifying C++ code in order to introduce ad hoc variables for
>    them) separation between the code and the contents.

I'm pretty sure that a lot of the variation should be pushed into
the product database, which multiplies this benefit.

>  + Ability to update the PDF without recompiling the program. This may not
>    seem like much, but it's very appreciable during development, being able
>    to tweak HTML slightly and just rerun the illustration generation is
>    orders of magnitude faster than updating the code, rebuilding,
>    relaunching the program, reloading the illustration and finally
>    generating it again.

That's an enormous advantage. Last week, I was fine-tuning some XSL
changes, and I was able to do so in real time while staying on the
phone with Kim and describing the effects to her, because regenerating
a PDF file after changing the XSL takes only four seconds--faster than
we can hope for gcc to compile and link today.

>  + Possibility to preview HTML in any browser. This is probably not that
>    useful with the already finished template files as Mustache stuff gets
>    in the way (although by using a (subset of the) standard template
>    syntax, we get the ability to run any of Mustache processor programs to
>    get rid of it if necessary), but I believe it could be quite handy when
>    drafting a new page for example: you'd start from just HTML, make sure
>    it looks like you intend in the browser and then just replace some parts
>    of it with variables and/or put section begin/end markers around other
>    parts.

Yes, that also sounds like a big advantage.

>  There are other possible/potential advantages too, but I think even just
> these ones are convincing enough to agree that it would be better to
> generate illustrations using external templates. The only question, but
> unfortunately not the least one, is whether I can manage to produce tables in this
> way. All the rest translates to external templates without any real
> problems (except for minor details like the blue frame on the cover page),
> but I'm still not sure what to do about the tables.

Tables...and pagination, too.

>  But to be honest, while writing these emails and thinking about all this
> again, I became personally convinced that we should use external templates
> in any case: for everything if possible, but if not, then I'd still like to
> use them for everything except the tables and handle the tables in some
> other way. This does mean that a lot of work already done (e.g. all the
> HTML generation helpers...) will have been wasted

Sunk costs are sunk. No tears must be shed over that.

> , but even nicely
> type-safe HTML generation code in C++ is still not as nice as just writing
> this HTML directly in a file.

We already have lots of C++ code that emits HTML, and it can never be
nearly as simple and clean as an actual HTML file.

>  Would you agree with switching to this approach?

I agree that it seems clearly better, provided that we can find good
solutions to the pagination and table issues; so isn't resolving those
issues the critical path that we must traverse before turning this strong
inclination into a final decision?



