
Re: RFC: goals for rewritten output subsystem

From: Ed
Subject: Re: RFC: goals for rewritten output subsystem
Date: Fri, 30 May 2008 20:00:42 +0100

2008/5/30 Ben Pfaff <address@hidden>:
> 2. Problem: PSPP output is not easily machine-readable in a
>   semantically meaningful way.  That is, data produced as part
>   of the output is difficult to extract for use by other
>   software or by subsequent PSPP procedures.  Another aspect of
>   the same issue is that PSPP tests that compare output end up
>   comparing cosmetic details of the formatting, not just the data
>   produced.
>   Goal: The new output subsystem should be able to produce
>   machine-readable, semantically meaningful data output in at
>   least one widely understood format, such as CSV or an XML
>   schema.  Then tests can compare this output format.

There seems to me to be a disconnect here between the problem and the
goal as stated. The problem that "output is difficult to extract for
use by ... subsequent PSPP procedures" is not addressed specifically
(unless the argument is that parsing XML or CSV back into memory is
the only solution for this; various in-memory options seem, at least
naively, to be viable as well, to offer much better performance, and
perhaps to be less cumbersome for the user).
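To make the in-memory alternative concrete, here is a minimal sketch
in C. It assumes a hypothetical `struct out_table` holding numeric
results in row-major order; none of these names exist in PSPP, and
column names are assumed to need no CSV quoting. A subsequent
procedure could read cells directly with `out_table_get`, while
`out_table_to_csv` derives the machine-readable form that tests or
external software would compare, so a single representation could
serve both goals.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory result table; these names are illustrative
   only and do not exist in PSPP. */
#define OUT_MAX_COLS 8

struct out_table
{
  size_t n_rows;
  size_t n_cols;
  const char *col_names[OUT_MAX_COLS]; /* assumed to need no CSV quoting */
  const double *values;                /* n_rows * n_cols, row-major */
};

/* Direct access for a subsequent procedure: no parsing involved. */
static double
out_table_get (const struct out_table *t, size_t row, size_t col)
{
  assert (row < t->n_rows && col < t->n_cols);
  return t->values[row * t->n_cols + col];
}

/* Appends formatted text to BUF at offset *USED.  Returns 0 on success,
   or -1 if the text (plus its terminator) does not fit in SIZE bytes. */
static int
append (char *buf, size_t size, size_t *used, const char *fmt, ...)
{
  va_list args;
  int n;

  va_start (args, fmt);
  n = vsnprintf (buf + *used, size - *used, fmt, args);
  va_end (args);
  if (n < 0 || (size_t) n >= size - *used)
    return -1;
  *used += (size_t) n;
  return 0;
}

/* Writes T as CSV (a header line, then one line per row) into BUF.
   Returns the number of characters written, or -1 if BUF is too small. */
static int
out_table_to_csv (const struct out_table *t, char *buf, size_t size)
{
  size_t used = 0;

  for (size_t c = 0; c < t->n_cols; c++)
    if (append (buf, size, &used, "%s%s", c ? "," : "", t->col_names[c]) < 0)
      return -1;
  if (append (buf, size, &used, "\n") < 0)
    return -1;

  for (size_t r = 0; r < t->n_rows; r++)
    {
      for (size_t c = 0; c < t->n_cols; c++)
        if (append (buf, size, &used, "%s%g", c ? "," : "",
                    out_table_get (t, r, c)) < 0)
          return -1;
      if (append (buf, size, &used, "\n") < 0)
        return -1;
    }
  return (int) used;
}
```

Because the CSV is generated from the same structure a later
procedure would read, tests could compare it without depending on
cosmetic formatting, and nothing would need to parse output back in.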

> 4. Problem: Tables larger than memory cannot be efficiently
>   formatted.  (This is why the LIST procedure more or less
>   sidesteps the output subsystem, without producing real tables
>   in its output.)
>   Goal: Efficiently support tables larger than memory.

I haven't been reading the codebase for very long, but I'm confused
about how support for larger-than-memory datasets is meant to work.
In the stats area at least, I don't see the (highly complex)
infrastructure needed to support these use cases. Is it intended that
_everything_ work with larger-than-memory tables, or merely some
things? This again will affect output design, since the viability of
keeping some structures in memory depends on whether this is
one-size-fits-all or not.
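For comparison, one common way to accommodate tables larger than
memory is to stream rows from a producer to a sink, so that only one
row is held at a time. The sketch below assumes hypothetical
`row_source_fn` and `struct row_sink` interfaces, plus toy example
source and sink implementations; it is not PSPP's case-reader or
output API. An output design built around whole in-memory tables
could not work this way, which is why the question of whether every
procedure must handle larger-than-memory tables matters.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical illustrations, not PSPP's API. */

/* Fills ROW with the next row; returns 1 if a row was produced, 0 at the end. */
typedef int row_source_fn (void *aux, double *row, size_t n_cols);

/* Consumes one finished row, e.g. by writing it to a file. */
struct row_sink
{
  void (*emit_row) (void *aux, const double *row, size_t n_cols);
  void *aux;
};

#define MAX_STREAM_COLS 16

/* Moves rows from NEXT_ROW to SINK one at a time.  Only a single row
   buffer is live at any point, so memory use does not grow with the
   number of rows.  Returns the number of rows streamed. */
static size_t
stream_table (row_source_fn *next_row, void *src_aux, size_t n_cols,
              const struct row_sink *sink)
{
  double row[MAX_STREAM_COLS];
  size_t n_rows = 0;

  assert (n_cols <= MAX_STREAM_COLS);
  while (next_row (src_aux, row, n_cols))
    {
      sink->emit_row (sink->aux, row, n_cols);
      n_rows++;
    }
  return n_rows;
}

/* Example source: rows (i, i*i) for i = 0 .. limit-1, generated on demand. */
struct counter_source
{
  size_t next;
  size_t limit;
};

static int
counter_source_next (void *aux, double *row, size_t n_cols)
{
  struct counter_source *src = aux;

  assert (n_cols == 2);
  if (src->next >= src->limit)
    return 0;
  row[0] = (double) src->next;
  row[1] = (double) src->next * (double) src->next;
  src->next++;
  return 1;
}

/* Example sink: keeps only running totals, standing in for a file writer. */
struct total_sink
{
  size_t rows;
  double last_col_sum;
};

static void
total_sink_emit (void *aux, const double *row, size_t n_cols)
{
  struct total_sink *tot = aux;

  tot->rows++;
  tot->last_col_sum += row[n_cols - 1];
}
```

The trade-off is that a streaming sink cannot, for example, size
columns from the whole table before writing the first row, whereas an
in-memory table can.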

Otherwise, this seems like a good framing of the goals of an output subsystem.

