I'm aware that there are several database gurus reading this group,
and I'd appreciate any comments or suggestions you (or anyone else)
may have regarding the following.


Preliminary design for a PostgreSQL report generator.
=====================================================


There exist very few Free (libre) report generators for GNU and UNIX
systems.  Those that do exist are generally inadequate for my purposes
in several ways:

* Graphical report generators don't run in a "batch mode" for
  automation by cron and other processes.
* Most define their own input language to run queries, and provide
  other programmatic constructs (looping, conditionals etc.).  This
  means learning Yet Another Language, which is usually crippled.
* Most incorporate their own markup language into the input language,
  e.g. for headings, tables, bold and italic, etc.  They are always
  inferior to "full" markup/formatting languages e.g. troff, LaTeX
  etc.
* Most arbitrarily constrain the output format, usually to PostScript
or PDF.

In summary, these tools are not "UNIX-like" in that they do not just
do one job very well; they do several jobs very inflexibly.  The total
number of Free report generators meeting the following criteria number
zero.

I've made a preliminary design of a new report generator.  This is
unlike existing report generators in that it:

* Defines only a very minimal language, which is implemented in terms
  of GNU m4, and is hence extensible through m4 macros.  The input
  (markup) language and reporting language are logically separate.
* No new language is created: all of m4, including looping and
  conditionals may be utilised by the user either directly or in their
  own macros.  They are used internally by the provided macros.
* The macros may be freely used in the input file, which may be in any
  format desired (plain text, *roff, LaTeX, DocBook etc.).
* A post-processor is run on the final output, to process e.g. with
  groff, TeX, mail, xmlto etc.  This makes the program amenable to
  trivial automation, and any output format is possible.


The initial design is as follows:

Get parameters:

RP_PARAM(NAME, TYPE, DEFAULT, CONSTRAINT)
RP_PARAM_UNQUOTED(NAME, TYPE, DEFAULT, CONSTRAINT)

NAME is the name of the parameter, which is defined via m4_pushdef as
an m4 macro.  TYPE is the PostgreSQL data type expected.  DEFAULT is
the default value (which must match the given TYPE).  CONSTRAINT is a
constraint on the allowed values, used for input validation; it must
be a valid SQL SELECT statement.  SELECT may be used to return a set
of allowed values, but can it be used to validate ranges?  For
example, "SELECT 2 AS int WHERE (int > 4)" is not valid.

By default, the value will be quoted with single quotes (') unless
RP_PARAM_UNQUOTED is used.

These parameters can be stripped out of the input file using m4_trace,
and presented in a user interface or provided via command-line
options.  They are used to customise the report (e.g. print all sales
between DATE1 and DATE2).


Substitute parameters:

Use the NAME above, which will be expanded appropriately by m4.


Run a query:

RP_SELECT(QUERY, TEXT)

Runs an SQL SELECT query (minus the initial "SELECT").  For each row
in the result set, the fields of the row will be defined via
m4_pushdef and TEXT expanded.  TEXT may contain more RP_SELECT
queries, to enable recursion.  For example, a top-level query could
pull out the names of customers, and then the nested query (run once
for each row in the top-level result set) could select all of the
customer orders over the last year.  The macros substituted for each
row go out of scope (via m4_popdef) at the end of TEXT, and may be
overridden by nested queries.

[Nested queries will be run in a separate m4 pass, but there is a need
to preserve predefined fields from previous higher-level queries.]
This is possible to be automated entirely with m4 (quote a quoted
definition of the same macro in the expansion text).  i.e. wrap
m4_define/m4_popdef:
  RP_DEFINE
  RP_PUSHDEF
  RP_POPDEF

RP_EXEC(QUERY)

Execute an SQL statement.  No result set will be returned; instead
RP_AFFECTED_ROWS will be defined via pushdef.  This macro is not
available in "safe mode".  This could be used to log how many times a
report was printed, and by whom, for example.

After each m4 run, the report generator will use m4_trace to determine
if there are any more queries left to run.  If not, it will run a
final m4 pass, and then invoke the post-processor.  The final pass
will contain additional/alternate macros to expand previously
untouched macros, and alter existing ones.

Output:

RP_POSTPROCESS(COMMAND)

Pipe all output to this point to the command (or pipeline of commands)
specified by command.  This might be to run troff with its
preprocessors or LaTeX, or mail to send an email.  This is not run
directly by m4 (though it might be a possibility).  It expands to a
magic token ("_rp_postprocess_command") which is read by the report
generator itself after the final m4 pass, and run via popen().

The post-processor should be overridable as a command-line option.


The program itself will be written in C++ using libpqxx, and so will
use all of the libpq environment variables itself.  It will invoke m4
with stdin and stdout redirected to pipes from and to itself
respectively, or possibly as temporary files if required.


TODO:

Need to handle temporary files.  This should be possible with
m4_maketemp, but cleanup should be automatic (use m4_wrap?).  Some
files may be needed for several m4 passes, however...

RP_TEMPFILE(NAME, PREFIX)
RP_TEMPFILE_FINAL(NAME, PREFIX)

NAME is a macro which will contain the temporary file name.  PREFIX is
a name to pass to m4_maketemp (without the XXXXXX required by
m4_maketemp).  RP_TEMPFILE_FINAL will defer deletion until completion
of the final pass, otherwise it will be deleted at the end of the
current pass.

Need to handle processing of specific subsets of data with an external
program.  For example, sending data to gnuplot, then including the
generated plot in the report.  m4 does not appear to allow this
directly (diversion of output to a temporary file).  This might need
to be done manually by the report generator prior to the final pass
(if the data can be extracted with m4_trace), and the final pass can
provide a null macro to forget the used input.  Would this be possible
with diversions and m4_syscmd?  (The input could exceed the max size
of an argument if I use m4_syscmd([echo 'm4_undivert(n)' > tmpfile]),
and there are also quoting issues.

Is m4 the most appropriate language?  I still need to do manual
processing for database queries etc., but m4 will provide the actual
interface, and will allow trivial programming and extensibility.
Ideally, and "all-in-one" solution would be preferable, but I need to:
1) Have a very simple interface so that non-programmers may use it.
2) Be reasonably extensible, but I'm not bothered about providing a
   "full" programming language--if you want that, you may as well use
   Perl or Python directly.
3) Cope with nested queries and expanding the input text
   appropriately.
4) Capable of doing more complex reporting e.g. using plot, gnuplot, R
   or other external programs to generate input e.g. figures, while
   still having a simple interface.
Any ideas?


Many thanks,
Roger