Re: [lmi] Terse list of valuable projects

From: Greg Chicares
Subject: Re: [lmi] Terse list of valuable projects
Date: Thu, 18 Mar 2010 01:39:44 +0000
User-agent: Thunderbird (Windows/20090812)

On 2010-03-17 14:42Z, Vaclav Slavik wrote:
> On Tue, 2010-03-16 at 20:10 +0000, Greg Chicares wrote:
>> I suppose it would be convenient to use "XTbML", if it suits our needs
>> well enough, 
> IMO, if it's at least somehow standard or if data are readily available
> in that form (and especially if both are true, as here), then it would
> be greatly preferable to use that format instead of anything custom,
> even if the latter was better designed.

Okay. I'd like to make a couple of tangential comments about these
tables and about serializing vectors with xml.

Let's use the term "rate tables" to refer to anything stored in either
of the SOA formats, and "database" for what's stored in '.db4' files.
We always look up rate tables through rate-table index numbers in the
database. Here's an example from 'dbdict.cpp':

    int guar_coi_dims[n] = {1, 1, 3, 1, 1, 1, 1};
    // smoker, nonsmoker, unismoke
    double guar_coi_tables[] = {111, 109, 107};
            ,TDBValue(DB_GuarCOITable, n, guar_coi_dims, guar_coi_tables)

  [equivalently:
      1 1 3 1 1 1 1 reshape 111 109 107
  if you speak APL]

Combine those indices with the particular rate-table file here
(in 'my_prod.cpp'):
    foo.GuarCOIFilename                = "qx_cso";
and (e.g.) nonsmokers get table 109 in that file.
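That lookup can be sketched as follows. This is a simplified illustration,
not lmi's actual TDBValue code; the enum and function names are invented
for the example:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: DB_GuarCOITable varies only along the smoker
// axis, whose extent is 3, so the database stores one table number
// per smoker class (all other axes have extent 1).
enum smoking {smoker = 0, nonsmoker = 1, unismoke = 2};

int guar_coi_table_number(smoking s)
{
    // Table numbers from the 'dbdict.cpp' snippet above.
    static int const table_numbers[] = {111, 109, 107};
    return table_numbers[s];
}

// Combine with the filename from 'my_prod.cpp' to identify a table.
std::string guar_coi_table_id(smoking s)
{
    return "qx_cso:" + std::to_string(guar_coi_table_number(s));
}
```

With these assumed names, guar_coi_table_id(nonsmoker) yields
"qx_cso:109", matching the example above.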

The question almost asks itself: why don't we move each table's data
into the database? At first blush, it's an attractive idea, which
brings these advantages:
  - Transparency: end users wouldn't need to look up a file name in
    a '.pol' file and a table number in a '.db4' file, then use
    another program to examine the data a rate table contains. [If we
    store each table in a standalone xml file, however, then the '.pol'
    part vanishes: the xml files could be named, e.g., 'N.rates' where
    N is the table number...but we'd have to make the numbers unique....]
  - Logical grouping: as in the example above, rate tables usually come
    in sets that vary across such axes as gender, and often across several
    axes simultaneously. The SOA ought to have provided for that, but
    they didn't; yet our database can handle it naturally. This is
    significant because rate tables are generally used in such sets: users
    normally think only of which set is wanted, and consider variation
    across natural axes to be a distracting detail that's better hidden.
  - Consistency: storing all numeric arrays in xml in the same way would
    be simpler, but that's not a big enough advantage to overcome real
    drawbacks.
And, OTOH, that idea has serious disadvantages:
  - Published sources: instead of directly using published rate tables,
    we'd be using copies.
  - Error: when actuaries copy a rate table, the copy is rarely identical
    to the original. Sad, but true.
  - Size: a rate table often contains about 100 x 100 floating-point
    numbers, but many rate tables (especially the standard ones published
    by the SOA) are shared by many products in many companies. (You can
    think of "product" as what is designated by the filename without the
    extension--i.e., "xyz" in {'xyz.db4', 'xyz.pol', etc.}.) Duplicating
    rate tables takes more space than pointing to single instances. This
    isn't a large concern per se, but it multiplies "Error" above.
  - Comprehensibility: the inverse of "Logical grouping" above. Generally,
    end users want to know which logical group is used--for instance,
    rate tables {107, 109, 111} in the code snippet above are all part of
    a "1980 CSO 80% Male" group, and that's the most important thing.
    They would sometimes, but less often, want to see the data contained
    in a rate table. Therefore, to show only that lowest-level data would
    be a step in the wrong direction, because it would eliminate the most
    important information.
So I think it's better to store tables in the format used for publishing
them, even though we use our own different format elsewhere.

On a different subject...the "XTbML" format represents vectors thus:
  <Y t="0">0.02182</Y>
  ...
  <Y t="100">1.00000</Y>
OTOH, your 'xml_serialize.hpp' format wraps each value in an element of
its own, which seems to be the most usual way of doing it--for example,
a python dict entry like:

    'propC': (3, 4, 5),

is typically serialized along these lines:

    <item key="propC">
      <data>
        <item>3</item>
        <item>4</item>
        <item>5</item>
      </data>
    </item>

What I'm wondering is why no one seems to serialize this way instead:

    <item key="propC">
      <data>3 4 5</data>
    </item>
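To make the contrast concrete, here's a sketch that emits a vector in
both styles. It uses plain std::ostringstream, not the actual
'xml_serialize.hpp' API, and the element names merely follow the
examples above:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Per-element style: one <item> element per value (the usual way).
std::string serialize_items(std::vector<double> const& v)
{
    std::ostringstream oss;
    oss << "<data>";
    for(double d : v) oss << "<item>" << d << "</item>";
    oss << "</data>";
    return oss.str();
}

// Whitespace-separated style: all values in a single text node.
std::string serialize_flat(std::vector<double> const& v)
{
    std::ostringstream oss;
    oss << "<data>";
    for(std::vector<double>::size_type i = 0; i != v.size(); ++i)
        {
        if(0 != i) oss << ' ';
        oss << v[i];
        }
    oss << "</data>";
    return oss.str();
}
```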

With so few data, it doesn't matter much; but in the following example
(using the output of lmi's 'print_matrix_test' unit test, which prints
a 10 X 1 X 1 X 2 X 5 matrix) the xml is easier for humans to read, edit,
and validate without those intrusive <item> tags:

        0.00418 0.00107 0.00099 0.00098 0.00095
        0.0009  0.00086 0.0008  0.00076 0.00074

        0.00073 0.00077 0.00085 0.00099 0.00115
        0.00133 0.00151 0.00167 0.00178 0.00186

        0.0019  0.00191 0.00189 0.00186 0.00182
        0.00177 0.00173 0.00171 0.0017  0.00171

        0.00173 0.00178 0.00183 0.00191 0.002
        0.00211 0.00224 0.0024  0.00258 0.00279

        0.00302 0.00329 0.00356 0.00387 0.00419
        0.00455 0.00492 0.00532 0.00574 0.00621

        0.00671 0.0073  0.00796 0.00871 0.00956
        0.01047 0.01146 0.01249 0.01359 0.01477

        0.01608 0.01754 0.01919 0.02106 0.02314
        0.02542 0.02785 0.03044 0.03319 0.03617

        0.03951 0.0433  0.04765 0.05264 0.05819
        0.06419 0.07053 0.07712 0.0839  0.09105

        0.09884 0.10748 0.11725 0.12826 0.14025
        0.15295 0.16609 0.17955 0.19327 0.20729

        0.22177 0.23698 0.25345 0.27211 0.2959
        0.32996 0.38455 0.4802  0.65798 1
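Reading the whitespace-separated form back is just as simple. A sketch
using std::istringstream (again, not lmi's actual reader--extraction
with operator>> already handles arbitrary whitespace, including the
blank lines between planes above):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse a whitespace-separated text node, like the matrix data
// above, back into a vector of doubles.
std::vector<double> parse_flat(std::string const& text)
{
    std::istringstream iss(text);
    std::vector<double> v;
    double d;
    while(iss >> d)
        {
        v.push_back(d);
        }
    return v;
}
```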
