I'm aware that there are several database gurus reading this group, and I'd appreciate any comments or suggestions you (or anyone else) may have regarding the following. Preliminary design for a PostgreSQL report generator. ===================================================== There exist very few Free (libre) report generators for GNU and UNIX systems. Those that do exist are generally inadequate for my purposes in several ways: * Graphical report generators don't run in a "batch mode" for automation by cron and other processes. * Most define their own input language to run queries, and provide other programmatic constructs (looping, conditionals etc.). This means learning Yet Another Language, which is usually crippled. * Most incorporate their own markup language into the input language, e.g. for headings, tables, bold and italic, etc. They are always inferior to "full" markup/formatting languages e.g. troff, LaTeX etc. * Most arbitrarily constrain the output format, usually to PostScript or PDF. In summary, these tools are not "UNIX-like" in that they do not just do one job very well; they do several jobs very inflexibly. The total number of Free report generators meeting the following criteria number zero. I've made a preliminary design of a new report generator. This is unlike existing report generators in that it: * Defines only a very minimal language, which is implemented in terms of GNU m4, and is hence extensible through m4 macros. The input (markup) language and reporting language are logically separate. * No new language is created: all of m4, including looping and conditionals may be utilised by the user either directly or in their own macros. They are used internally by the provided macros. * The macros may be freely used in the input file, which may be in any format desired (plain text, *roff, LaTeX, DocBook etc.). * A post-processor is run on the final output, to process e.g. with groff, TeX, mail, xmlto etc. This makes the program amenable to trivial automation, and any output format is possible. The initial design is as follows: Get parameters: RP_PARAM(NAME, TYPE, DEFAULT, CONSTRAINT) RP_PARAM_UNQUOTED(NAME, TYPE, DEFAULT, CONSTRAINT) NAME is the name of the parameter, which is defined via m4_pushdef as an m4 macro. TYPE is the PostgreSQL data type expected. DEFAULT is the default value (which must match the given TYPE). CONSTRAINT is a constraint on the allowed values, used for input validation; it must be a valid SQL SELECT statement. SELECT may be used to return a set of allowed values, but can it be used to validate ranges? For example, "SELECT 2 AS int WHERE (int > 4)" is not valid. By default, the value will be quoted with single quotes (') unless RP_PARAM_UNQUOTED is used. These parameters can be stripped out of the input file using m4_trace, and presented in a user interface or provided via command-line options. They are used to customise the report (e.g. print all sales between DATE1 and DATE2). Substitute parameters: Use the NAME above, which will be expanded appropriately by m4. Run a query: RP_SELECT(QUERY, TEXT) Runs an SQL SELECT query (minus the initial "SELECT"). For each row in the result set, the fields of the row will be defined via m4_pushdef and TEXT expanded. TEXT may contain more RP_SELECT queries, to enable recursion. For example, a top-level query could pull out the names of customers, and then the nested query (run once for each row in the top-level result set) could select all of the customer orders over the last year. The macros substituted for each row go out of scope (via m4_popdef) at the end of TEXT, and may be overridden by nested queries. [Nested queries will be run in a separate m4 pass, but there is a need to preserve predefined fields from previous higher-level queries.] This is possible to be automated entirely with m4 (quote a quoted definition of the same macro in the expansion text). i.e. wrap m4_define/m4_popdef: RP_DEFINE RP_PUSHDEF RP_POPDEF RP_EXEC(QUERY) Execute an SQL statement. No result set will be returned; instead RP_AFFECTED_ROWS will be defined via pushdef. This macro is not available in "safe mode". This could be used to log how many times a report was printed, and by whom, for example. After each m4 run, the report generator will use m4_trace to determine if there are any more queries left to run. If not, it will run a final m4 pass, and then invoke the post-processor. The final pass will contain additional/alternate macros to expand previously untouched macros, and alter existing ones. Output: RP_POSTPROCESS(COMMAND) Pipe all output to this point to the command (or pipeline of commands) specified by command. This might be to run troff with its preprocessors or LaTeX, or mail to send an email. This is not run directly by m4 (though it might be a possibility). It expands to a magic token ("_rp_postprocess_command") which is read by the report generator itself after the final m4 pass, and run via popen(). The post-processor should be overridable as a command-line option. The program itself will be written in C++ using libpqxx, and so will use all of the libpq environment variables itself. It will invoke m4 with stdin and stdout redirected to pipes from and to itself respectively, or possibly as temporary files if required. TODO: Need to handle temporary files. This should be possible with m4_maketemp, but cleanup should be automatic (use m4_wrap?). Some files may be needed for several m4 passes, however... RP_TEMPFILE(NAME, PREFIX) RP_TEMPFILE_FINAL(NAME, PREFIX) NAME is a macro which will contain the temporary file name. PREFIX is a name to pass to m4_maketemp (without the XXXXXX required by m4_maketemp). RP_TEMPFILE_FINAL will defer deletion until completion of the final pass, otherwise it will be deleted at the end of the current pass. Need to handle processing of specific subsets of data with an external program. For example, sending data to gnuplot, then including the generated plot in the report. m4 does not appear to allow this directly (diversion of output to a temporary file). This might need to be done manually by the report generator prior to the final pass (if the data can be extracted with m4_trace), and the final pass can provide a null macro to forget the used input. Would this be possible with diversions and m4_syscmd? (The input could exceed the max size of an argument if I use m4_syscmd([echo 'm4_undivert(n)' > tmpfile]), and there are also quoting issues. Is m4 the most appropriate language? I still need to do manual processing for database queries etc., but m4 will provide the actual interface, and will allow trivial programming and extensibility. Ideally, and "all-in-one" solution would be preferable, but I need to: 1) Have a very simple interface so that non-programmers may use it. 2) Be reasonably extensible, but I'm not bothered about providing a "full" programming language--if you want that, you may as well use Perl or Python directly. 3) Cope with nested queries and expanding the input text appropriately. 4) Capable of doing more complex reporting e.g. using plot, gnuplot, R or other external programs to generate input e.g. figures, while still having a simple interface. Any ideas? Many thanks, Roger