[lmi] Boost serialization library


From: Greg Chicares
Subject: [lmi] Boost serialization library
Date: Thu, 27 Nov 2008 00:44:52 +0000
User-agent: Thunderbird 2.0.0.17 (Windows/20080914)

Recently, Vaclav mentioned the idea of using the boost serialization
library instead of 'ihs*pios.?pp', and I wanted to share my thoughts.

The old 'pstream' serialization library is available here:
  http://www.ibiblio.org/pub/Linux/devel/lang/c++/pstream-0.0.3.tar.gz
There was no really strong reason to use serialization for lmi's '.db4'
files in the first place. The legacy application from which lmi descends
happened to use a long-forgotten class library that built serialization
into its mother-of-all-objects class, so it was convenient to use it for
this purpose. When I replaced that classlib, I chanced upon this GPL
library that was easily made binary-compatible with the old classlib.

The 'pstream' library is more than ten years old, and unmaintained.
There's no reason to keep it, save that dread which

    ...puzzles the will
    And makes us rather bear those ills we have
    Than fly to others that we know not of

or, in a latter-day poet's epistemological reflection:

    There are known knowns.
    There are things we know that we know.

    There are known unknowns.
    That is to say,
    there are things that we now know we don't know.

    But there are also unknown unknowns.
    There are things we do not know we don't know.

I'm not familiar with the boost serialization library: to me, it's an
unknown unknown. Is it a header-only library, or does it require separate
compilation? Does it work with Comeau C++? Is it fast? Does it preserve
the full precision of floating-point numbers? But those are only known
unknowns. Most boost work is quite good, but the quality isn't uniform,
and using another boost library introduces yet another dependency.
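
To make the floating-point question concrete, here is a minimal sketch of
the property any text format would need: a value written as decimal text
and read back should compare equal to the original. It uses only the
standard library and says nothing about what boost actually does. The
function names are hypothetical, and std::numeric_limits<>::max_digits10
assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: write a double as decimal text with enough
// significant digits (17 for IEEE 754 binary64) to recover the same
// value when the text is read back.
std::string to_text(double d)
{
    std::ostringstream oss;
    oss << std::setprecision(std::numeric_limits<double>::max_digits10) << d;
    return oss.str();
}

// Hypothetical helper: read back a value written by to_text().
double from_text(std::string const& s)
{
    std::istringstream iss(s);
    double d = 0.0;
    iss >> d;
    return d;
}

// For contrast: the stream's default precision is six significant
// digits, which generally does not preserve the value.
std::string to_text_default(double d)
{
    std::ostringstream oss;
    oss << d;
    return oss.str();
}
```

With these helpers, from_text(to_text(x)) should equal x for finite x,
while from_text(to_text_default(x)) generally will not. Infinities and
NaNs need separate handling and are outside this sketch.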

There's a part of me that says it might be better to build it ourselves:

http://www.joelonsoftware.com/articles/fog0000000007.html
| 'Find the dependencies -- and eliminate them.'
| If it's a core business function -- do it yourself, no matter what.

OTOH, I look in our inventory and find 'streamable.?pp', which is rather
sketchy. It underlies input ('input.hpp') and output ('ledger.hpp'); and
the database class ('ihs_dbdict.hpp') has similar needs. Can all three
use one library for file storage? That'd be a big win.

Perhaps the boost serialization library will turn out to be the best
solution for all those needs, as well as for 'configurable_settings.?pp'
and the other product files:
  *.fnd *.pol *.rnd *.tir
In the legacy application, BTW, the old 'pstream' library was used for
storing input. That was a bad choice, notably because the files weren't
human readable, and also because they weren't platform independent. But
the boost library can save and load xml, so perhaps it'll be ideal, and
it definitely seems like a good idea to experiment with it.

We don't need all the fancier persistence features, e.g., deep pointer
save and restore. Here are the attributes that seem most desirable:

 - The format should be xml. Often we need to compare one file with
   another, and there are useful comparison tools for flat files.
     http://www.nongnu.org/lmi/index.html
   | [database] files can be printed as flat text. [...]
   | if you want to see how two similar products differ, you can use a
   | side-by-side file-comparison tool to compare the text output.

 - The particular xml format doesn't matter very much on principle. We
   already have a function print_databases() to transform '.db4' files
   to flat text for study and comparison, and it can't be hard to adapt
   it to use xml input if necessary.

 - The particular xml format may matter for the practical purpose of
   compatibility with existing files, especially if we can use the same
   library for input. However, one xml format can be translated into
   another, so that shouldn't be a serious obstacle.

 - The particular xml format may matter for interoperability with
   other systems. To me today, it's an "unknown unknown", so I can't
   gauge the importance of this point. As one example, the data that
   class actuarial_table reads are available as xml. Perhaps we could
   transform that xml into something the boost library could read. It
   would also be useful to be able to write such files.

 - "Versioning" is crucial, and it's good to see that the boost library
   addresses it. Here
     http://lists.nongnu.org/archive/html/lmi/2008-11/msg00011.html
   is a problem that versioning could solve. This command
     grep detritus *.?pp
   indicates a technique we've used quite successfully for ensuring that
   input files are backward compatible.

 - Robustness. Of course. Users don't want to lose their data. I assume
   that boost has taken care of this.

 - Run-time speed.

 - Compactness. An excessively verbose format could be difficult for
   humans to read, and might affect speed adversely.
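
As an illustration of the versioning attribute above, here is a minimal
sketch of a version-aware loader: files that predate a field get a
default for it, keys that are no longer used are ignored, and files
written by a newer program are rejected. Everything in it is made up for
the example: the "key=value" format, the 'record' struct, and the field
names. It is neither the boost mechanism nor the existing lmi code, and
it assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record, for illustration only; not an lmi class.
struct record
{
    int    issue_age;
    double face_amount; // Added in hypothetical format version 1.
};

int const current_version = 1;

// Parse "key=value" lines into a map. The format is invented for
// this sketch.
std::map<std::string,std::string> parse(std::string const& text)
{
    std::map<std::string,std::string> m;
    std::istringstream iss(text);
    std::string line;
    while(std::getline(iss, line))
        {
        std::string::size_type const eq = line.find('=');
        if(std::string::npos == eq) continue; // Skip lines without '='.
        m[line.substr(0, eq)] = line.substr(1 + eq);
        }
    return m;
}

record load(std::string const& text)
{
    std::map<std::string,std::string> m = parse(text);
    // A file with no version field is treated as version 0.
    int const version = m.count("version") ? std::stoi(m["version"]) : 0;
    if(current_version < version)
        {
        throw std::runtime_error("File was written by a newer version.");
        }

    record r;
    r.issue_age = std::stoi(m.at("issue_age"));
    // Version 0 files predate 'face_amount'; supply a default.
    r.face_amount = (1 <= version) ? std::stod(m.at("face_amount")) : 0.0;
    // Keys that are no longer used (say, a hypothetical 'old_flag')
    // are never looked up, so old files that contain them still load.
    return r;
}
```

For example, load("issue_age=45\nold_flag=1\n") would yield a record
with issue_age 45 and the default face_amount, even though that text has
no version field and carries an obsolete key.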

I recommend we proceed with an experiment to ascertain whether this
boost library is suitable for our grand purpose of reading and writing
xml files in a consistent way. If not, then we should follow Spolsky's
"do it yourself" advice for the long term. It might still make sense in
the short term to replace the 'pstream' thing with the boost library if
that saves some cross-platform grief and doesn't hurt speed too much.

Sorry if this seems noncommittal, but the last time I endorsed a novel
approach without exploring its "unknown unknowns" was here:
  http://lists.nongnu.org/archive/html/lmi/2008-11/msg00003.html
and I'm eager not to repeat that costly mistake.




