lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Stream cast problems with trailing spaces and more


From: Greg Chicares
Subject: Re: [lmi] Stream cast problems with trailing spaces and more
Date: Wed, 29 Jan 2020 02:15:53 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.0

[Although I replied just hours ago to a more recent message on the same
topic, this older message contains some intriguing detail that deserves
to be addressed however belatedly.]

On 2018-11-16 01:00, Vadim Zeitlin wrote:
> 
>  I've been looking at stream_cast() code because it started throwing
> "Trailing whitespace remains" exception with the latest MSVS version
> (15.9.0) when initializing Input::ListBillDate field from its JDN given in
> the string form in this class ctor, which means that no illustrations can
> be created when using lmi built this compiler, which is, as you can
> imagine, somewhat problematic. Leaving aside the question of why does this> 
> member need to be initialized from a string in the first place,

Initialization from string for uniformity:

Input::Input()
    :IssueAge                         {"45"}
    ,RetirementAge                    {"65"}
    ,Gender                           {"Male"}
...
    ,ListBillDate                     {"2440588"} // Assume no inforce so old

> the sudden
> appearance of this problem is due to a bug fix in the latest version of the
> MSVS standard library implementation which has started to (correctly, as it
> turned out) set failbit when calling std::ws() if the stream was already at
> EOF. Libstdc++ doesn't do it yet, however this was accepted as a bug (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88034) and so will hopefully
> change in the future. And, FWIW, clang libc++ behaves correctly, i.e. as
> latest MSVS.

Fascinating. I thought ms used the dinkumware standard library, which
purports to be so rigorously tested that it shouldn't have had such an
error, even decades ago.

>  Beyond the problem with MSVS (and clang with libc++ if we ever need to use
> it, which is not currently the case as I only use clang with libstdc++ so
> far),

Probably we should treat clang as favorably as any other free toolchain.

> I find the code in stream_cast.hpp very suspicious. If you look at
> the part inside "#if !defined LMI_COMO_WITH_MINGW" guard,

[old comeau-specific code has been removed]

> it checks for
> "!(interpreter >> std::ws)", i.e. if failbit is set on the stream after
> calling ws() on it. According to the standard[*], this should only happen
> if the stream is already at EOF. But this should be the case only in the
> "successful" case, i.e. when all the input was consumed by "interpreter >>
> result" above, while the code handles this as an error for some reason.

That's where I introduced the defect. The classic one-liner, given in
'stream_cast_test.cpp' as template function streamlined(), is:

    if
        (  !(interpreter << from)
        || !(interpreter >> result)
        || !(interpreter >> std::ws).eof()
        )
        throw ...

There are at least three conditions there that might cause an
exception to be thrown. I successfully distinguished those on the
'<< from' and '>> result' lines. But I mishandled the last physical
line. I took it to mean

  extract std::ws, which might or might not succeed, much as the
    two lines preceding it might (so it seemed like a good idea
    to report any failure); and then
  invoke eof(), which similarly might or might not succeed

but the real meaning is

  extract std::ws to ignore trailing whitespace (which is to be
    treated as irrelevant), paying no attention to whether or not
    any such whitespace was so ignored; and then
  verify that we're at EOF

>  So, AFAICS, this test is simply reversed. It so happens that, due to
> libstdc++ bug above, this doesn't trigger any errors when converting valid
> inputs currently, as libstdc++ (and previous versions of MSVS, like the one
> I still use) never sets failbit anyhow. However it definitely doesn't give
> the expected error if there is any trailing whitespace in the input
> neither.

That's the root cause of the defect: I coded various exceptions
but failed to test whether they could be thrown. Now I've gone
back and tried to construct tests for all of them, but I couldn't
find a way to elicit the "trailing whitespace" diagnostic,
apparently because, as you point out, it cannot be elicited:

> Instead, it only gives the "Unconverted data remains" error when
> non-space (see below) trailing whitespace is present. This can be easily
> seen by applying this patch:

Thanks.

>  Note that the matters are further confused by the use of the special
> blank_is_not_whitespace_locale() for the interpreter stream.

BTW, that locale is used so that natural-language strings like:

#define solve_type_NAMES \
    {"No solve" \
    ,"Specified amount" \
    ,"Employee premium" \

can be extracted as a unit, regardless of embedded '0x020' ASCII spaces.
(No such strings embed any more exotic types of whitespace.)

> So everything
> I wrote above was correct for non-space white space, but real trailing
> spaces, wouldn't be consumed by ws() because they're explicitly not
> considered to be whitespace here. So if "2454687 " is used above instead of
> "2454687\n", the test still fails, but it does it with a different message:
> 
>     ???? test failed: Caught exception
>         'Unconverted data remains converting '2454687 ' from type 'char 
> const*' to type 'calendar_date'.'
>       when
>         '^Trailing whitespace remains .+'
>       was expected.
>     [file stream_cast_test.cpp, line 107]
> 
> because std::ws doesn't consume anything in this case.

Indeed.

>  I think the above convincingly shows that there is something wrong with
> this code. But before suggesting how to fix it, I'd like to ascertain the
> intention of this function. It seems to want to handle trailing spaces as
> an error when they're not significant, while at the same time preserving
> spaces, presumably including trailing ones, when converting between
> strings. Is this correct?

No. There was no such deep thought behind it. I just took a standard
idiom (streamlined() above) and broke its one-line implementation
into pieces so that I could distinguish errors arising for each piece;
but I did that incorrectly.

And specializations like

template<>
inline std::string stream_cast<std::string> (std::string from ,std::string)
{
    return from;
}

are only for efficiency: they aren't intended to vary the behavior
based on argument types.

>  If so, I'd suggest splitting the string and non-string implementations, as
> they clearly [try to] do incompatible things. In fact, it looks like the
> primary template is already not used for strings due to the existence of a
> specialization for std::string at the end of stream_cast.hpp header. So I
> think that the imbue() call in the primary template should be simply
> removed (at the very least, removing it doesn't break the existing tests,
> which is a good sign).

It didn't break 'stream_cast_test', but it does after some commit that
I'll push soon.

However, suppressing the imbue() line would always have broken 'input_test':

???? uncaught exception: std::runtime_error: Unconverted data remains 
converting 'Allow MEC' from type 'mc_enum<mcenum_mec_avoid_method>' to type 
'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
>'.

> As an aside, it seems very suspicious to want to
> preserve space but not, for example, tabs, to me anyhow. The rationale
> explaining that "spaces are more common" given in the comment above this
> function doesn't convince me at all. On the contrary, IMO if a bug (and
> losing the part of the string after a tab would be a bug) happens more
> rarely, it's worse, and not better, than if it happened more frequently.

As-yet uncommitted correction:

 /// Blank is the only whitespace character not treated as whitespace,
-/// because blanks are more common than other whitespace characters in
-/// std::strings.
+/// because blanks are deliberately used in strings like "Allow MEC"
+/// that are mapped to enumerators in 'mc_enum_types.xpp', where other
+/// whitespace characters would not be used.

>  Also, is detecting trailing whitespace separately from arbitrary trailing
> characters really necessary?

I think so: "Allow MEC" is good, and I suppose "Allow MEC\n" is good if
we're reading a flat-text file line by line, but "Allow MEC " is wrong.

> Personally I don't think so, so I would just
> remove the "interpreter >> std::ws" clause completely and content myself
> with the "Unconverted data remains" error.

Please take a look at how I changed that (once I push the commit).
"interpreter >> std::ws" is a valid part of the customary idiom;
the error was testing stream state with operator!() after it.

There's no need to distinguish these erroneous cases:
  "Allow MEC   "
  "Allow MEC sometimes"

>  Finally, while I'm looking at this code, I can't help wondering why do we
> have a dummy "To" parameter in this function template? Is it some
> workaround for ancient and not supported any more compiler(s)? Because it
> doesn't seem to be needed any longer.

Isn't it needed because the template-parameter list and the argument list
have different orders?

  template<typename To, typename From>
  To stream_cast(From from, To = To())

With
-  To stream_cast(From from, To = To())
+  To stream_cast(From from)
I get:

i686-w64-mingw32-g++ ... ... /opt/lmi/src/lmi/datum_string.cpp -odatum_string.o
In file included from /opt/lmi/src/lmi/value_cast.hpp:29,
                 from /opt/lmi/src/lmi/datum_string.hpp:29,
                 from /opt/lmi/src/lmi/datum_string.cpp:24:
/opt/lmi/src/lmi/stream_cast.hpp:153:20: error: template-id 
‘stream_cast<std::__cxx11::string>’ for
 ‘std::__cxx11::string stream_cast(char*, std::__cxx11::string)’ does not match 
any template declaration
 inline std::string stream_cast<std::string>(char* from, std::string)
                    ^~~~~~~~~~~~~~~~~~~~~~~~
/opt/lmi/src/lmi/stream_cast.hpp:99:4: note: candidate is: ‘template<class To, 
class From> To stream_cast(From)’
 To stream_cast(From from)
    ^~~~~~~~~~~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]