From: Greg Chicares
Subject: Re: Parsing, and parsimony [Was: [lmi] overview of C++ expression template libraries]
Date: Tue, 30 May 2006 21:10:55 +0000
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

On 2005-9-6 14:15 UTC, Greg Chicares wrote:
> In 'input_sequence.?pp' we have a hand-coded recursive-descent parser.
> It's a lot like Stroustrup's example in TC++PL3, which in turn is a lot
> like the example in the Red Dragon Book. It's 1526 lines of code to
> maintain, or 3222 if you count '*seq*.?pp'.
> I gag when people ask for enhancements to this "little language"; but
> it does need enhancements

...such as:

0. Calendar year: Today, if comments are to be believed, we allow
  // GRAMMAR duration-scalar: integer
  // GRAMMAR duration-scalar: @ integer
  // GRAMMAR duration-scalar: # integer
  // GRAMMAR duration-scalar: duration-constant
  // TODO ?? calendar year not yet implemented
and it would be helpful to permit calendar years as duration-scalars. I'm
not sure how to do this best.

One line of thought would say that '2006' is unambiguous because we've
already used only, I think, [0, 100], representing plausible human ages; we
might extend that to, say, [0, 120], but there's an absolute limit imposed by
foreseeable medical technology that's certainly less than 1000, while calendar
years before 1000 AD are never of interest to us.

Another line of thought would hold that no ambiguity is good, even in a range
that is erroneous for all uses we foresee: our foresight is limited, and at
any rate error messages need to be unambiguous. Imagine saying:
  1000 is ambiguous but clearly an error. It might mean 1000 AD, but
  that was long ago. It might mean an interval of 1000 years, but no
  insurance contract lasts that long. Whatever you meant, try again.
It's far better to say things like:
  Duration 1000 exceeds maturity date. ["maturity date" is a technical term]
  Calendar year must not precede 1900.
And we already require '@' to signify ages, so something like 'y2006' would
follow that precedent; perhaps 'year 2006' would be more readable.
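To make the prefix idea concrete, here is a minimal sketch, not the existing parser. It assumes the 'year 2006' spelling; the names duration_kind, duration_scalar, parse_duration_scalar, and diagnose are all hypothetical, and the nine-digit cap is an arbitrary guard against overflow. The meaning of '#' is not restated here, so the sketch records only that the prefix was seen.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical classification of a duration-scalar by its prefix.
enum class duration_kind {plain, at_prefixed, hash_prefixed, calendar_year};

struct duration_scalar
{
    duration_kind kind;
    int           value;
};

// Parse "integer", "@ integer", "# integer", or "year integer".
// Returns std::nullopt for anything else. The 'year' keyword is the
// proposed extension, not syntax that exists today.
std::optional<duration_scalar> parse_duration_scalar(std::string const& s)
{
    std::size_t i = 0;
    duration_kind kind = duration_kind::plain;
    if(0 == s.compare(0, 1, "@"))
        {kind = duration_kind::at_prefixed; i = 1;}
    else if(0 == s.compare(0, 1, "#"))
        {kind = duration_kind::hash_prefixed; i = 1;}
    else if(0 == s.compare(0, 4, "year"))
        {kind = duration_kind::calendar_year; i = 4;}

    // Skip blanks between the prefix and the integer.
    while(i < s.size() && ' ' == s[i]) {++i;}

    // Require between one and nine digits, and nothing after them.
    std::size_t const n_digits = s.size() - i;
    if(0 == n_digits || 9 < n_digits) {return std::nullopt;}
    int value = 0;
    for(; i < s.size(); ++i)
        {
        if(!std::isdigit(static_cast<unsigned char>(s[i]))) {return std::nullopt;}
        value = 10 * value + (s[i] - '0');
        }
    return duration_scalar{kind, value};
}

// Because the kind is known, the message can be specific.
// Returns an empty string if no problem is detected.
std::string diagnose(duration_scalar const& d)
{
    if(duration_kind::calendar_year == d.kind && d.value < 1900)
        {return "Calendar year must not precede 1900.";}
    return "";
}
```

With an explicit prefix, 'year 1000' and a bare '1000' never collide, so each can get its own diagnostic.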

1. Percentages etc.: It would be nice to allow '0.05' to be entered as '5%',
and alternatively as '500bp' where 'bp' is read as 'basis points': common
financial jargon for hundredths of a percent. Floating literals with these
suffixes should also be allowed, e.g., '5.73%'. 'bp' is an atom: '5b' is
not permitted.

Changes like this, BTW, show the elegance of describing the "little language"
instead of writing a parser by hand: starting from C99, we just add
productions like
    floating-constant '%' [opt]
    floating-constant 'bp' [opt]
or so I hope.
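For a single scalar, the two suffixes might be handled as sketched below. This is only an illustration under assumptions: parse_scaled_number is a hypothetical name, and the numeric conversion is left entirely to std::strtod() rather than reimplemented. Note that strtod() is locale-dependent and also accepts leading whitespace and spellings such as "inf", which a real implementation might want to reject.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Convert "0.05", "5%", or "500bp" to a double; std::nullopt otherwise.
// The suffix must match exactly, so "5b" is rejected.
std::optional<double> parse_scaled_number(std::string const& s)
{
    char const* const begin = s.c_str();
    char* end = nullptr;
    double const value = std::strtod(begin, &end);
    if(end == begin) {return std::nullopt;} // No number was found.

    std::string const suffix(end); // Whatever strtod() did not consume.
    if(suffix.empty()) {return value;}
    if("%"  == suffix) {return value / 100.0;}   // Percent.
    if("bp" == suffix) {return value / 10000.0;} // Basis points.
    return std::nullopt;
}
```

Because the suffix is examined only after strtod() has consumed the number, floating literals such as '5.73%' need no special treatment.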

Here an issue arises. Today, this "little language" is used for 'sequence'
fields, as distinguished from 'scalar' fields; every field is one or the
other by its nature. It would not be good to allow '5.73%' (which end users
find much more expressive) only in the one and not in the other. 'Scalar'
fields today use 'numeric_io*.?pp', which is a wrapper for snprintf() and
the strto* family of functions. Now it makes little sense to reimplement
strtod(), for instance, because that is truly difficult to get right; and
the same can be said of snprintf(). So I suspect that we're led to an
intermediate routine that handles both 'sequence' and 'scalar' fields; but
here I'm imagining that the parser delegates to 'numeric_io*.?pp', yet it
doesn't seem to, and I must admit that I spent several minutes looking at
'input_sequence.cpp' and can't guess how it handles numbers.

2. Geometric progressions: End users have asked for a way to express, say,
   1000, increasing by five percent per year, compounded, for ten years
producing the values
   1000, 1050, 1102.5, 1157.625, 1215.50625, ...
where presumably we'd perform no rounding. I'm not sure this problem should
be addressed within this "little language", which might easily become an
unmanageably big "little language". It might be better for each 'sequence'
field to have its own 'multiplier' field, which would itself be a 'sequence'
field. But I'm not sure.
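Whichever syntax is chosen, the values themselves are simple to generate. A minimal sketch, assuming one value per year and no rounding; geometric_progression is a hypothetical name, not an existing function:

```cpp
#include <vector>

// Return 'years' values: initial, initial*(1+rate), initial*(1+rate)^2, ...
// No rounding is performed. A nonpositive 'years' yields an empty vector.
std::vector<double> geometric_progression(double initial, double rate, int years)
{
    std::vector<double> v;
    if(years <= 0) {return v;}
    v.reserve(static_cast<std::vector<double>::size_type>(years));
    double x = initial;
    for(int j = 0; j < years; ++j)
        {
        v.push_back(x);
        x *= 1.0 + rate;
        }
    return v;
}
```

Calling geometric_progression(1000.0, 0.05, 10) gives ten values that begin with 1000 and 1050, up to floating-point error.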

3. Iterative solves: This term, familiar to end users, refers to finding X
such that, for example, paying a premium of X in period [d0, d1], ceteris
paribus, results in a target value of B at duration db. Here, X may mean a
yearly premium to be paid, an annual loan, a death benefit, or certain other
things. Today, the parameters d0, d1, B, and db are given on a separate
screen. It's an attractive idea to fold them into the "little language",
which could then come much closer to expressing all that needs to be known
about a particular 'sequence' field; yet that would seem to complicate the
language quite a bit.
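For readers unfamiliar with the term, a solve of this kind can be sketched with plain bisection. Everything here is an assumption made for illustration: toy_value is a stand-in projection (level premium paid at the start of each year in [d0, d1], accumulated at a fixed rate), not the real calculation, and solve_for is a hypothetical name. Bisection is used only because it is the simplest root-finder to show.

```cpp
#include <functional>

// Toy projection, NOT the real calculation: pay 'premium' at the start
// of each year in [d0, d1], accumulate at 'rate', and return the value
// at duration db.
double toy_value(double premium, int d0, int d1, int db, double rate)
{
    double value = 0.0;
    for(int year = 0; year < db; ++year)
        {
        if(d0 <= year && year <= d1) {value += premium;}
        value *= 1.0 + rate;
        }
    return value;
}

// Find X with f(X) approximately equal to target, by bisection.
// Assumes f is continuous and increasing on [lo, hi], and that
// f(lo) <= target <= f(hi). The iteration cap guards against a
// tolerance too small for floating point to honor.
double solve_for
    (std::function<double(double)> const& f
    ,double target
    ,double lo
    ,double hi
    ,double tolerance
    )
{
    for(int n = 0; n < 200 && tolerance < hi - lo; ++n)
        {
        double const mid = lo + (hi - lo) / 2.0;
        if(f(mid) < target) {lo = mid;} else {hi = mid;}
        }
    return lo + (hi - lo) / 2.0;
}
```

The point of the sketch is that d0, d1, B, and db are the only inputs the solve needs beyond the projection itself, which is why folding them into the "little language" is tempting.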
