[Octave-bug-tracker] [bug #52594] textscan continues past EOL

From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #52594] textscan continues past EOL
Date: Wed, 6 Dec 2017 10:54:26 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0

Follow-up Comment #4, bug #52594 (project octave):

OK, a number of things there then.

The %d being translated to NaN is a consequence of textscan::scan_one() using
textscan::read_double() for integer formats (read_double is essentially a
custom scanf for the buffered stream):

    double v;    // Matlab docs say 1e30 etc should be valid for %d and
                 // 1000 as a %d8 should be 127, so read as double.
                 // Some loss of precision for d64 and u64.

I suppose the easiest thing is to check, after the double read, whether the
value is NaN or +/-Inf.  On the other hand, I suspect even that wouldn't be
compatible with Matlab in odd cases such as reading, say, 1.23e-04 with
%d%s, which in Matlab might produce 1 and ".23e-04".
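
The check described above, and the strtol-like behaviour suspected of Matlab,
might be sketched as follows.  The names double_to_saturated_int and
read_integer_prefix are hypothetical and not part of Octave; rounding to
nearest and treating NaN/Inf as a failed conversion are assumptions, not
confirmed Matlab behaviour.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <string>
#include <utility>

// Hypothetical helper (not Octave code): convert a value that was read as a
// double into integer type T, saturating at the limits of T.
// ASSUMPTION: NaN and +/-Inf are reported as a failed conversion.
template <typename T>
bool double_to_saturated_int (double v, T& out)
{
  if (! std::isfinite (v))
    return false;                       // NaN or +/-Inf: let the caller decide

  const double lo = static_cast<double> (std::numeric_limits<T>::min ());
  const double hi = static_cast<double> (std::numeric_limits<T>::max ());
  const double r = std::round (v);      // ASSUMPTION: round to nearest

  if (r <= lo)
    out = std::numeric_limits<T>::min ();
  else if (r >= hi)
    out = std::numeric_limits<T>::max ();
  else
    out = static_cast<T> (r);

  return true;
}

// Read a leading base-10 integer with std::strtol and return it together
// with the unread remainder of the string.
std::pair<long, std::string> read_integer_prefix (const std::string& s)
{
  char *end = nullptr;
  long value = std::strtol (s.c_str (), &end, 10);
  return { value, std::string (end) };
}
```

With these, 1000 read for an 8-bit type saturates to 127, and
read_integer_prefix ("1.23e-04") yields 1 with ".23e-04" left over for a
following %s.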

I'm wondering if we could make better use of the C++/C formatted read
routines somehow rather than replicating them.  For example, fscanf() is a
variable-argument routine, declared in C as
int fscanf (FILE *stream, const char *format, ...).

That is presented in terms of the way it would be called at compilation time.
However, I think that from a programming perspective the arguments can be
viewed as simply a list.
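
One way to see the "list of arguments" view is the standard vsscanf(), which
takes the arguments as a single va_list.  A minimal sketch, where
scan_with_list is a hypothetical wrapper and not an Octave function; note
that a va_list still cannot be assembled portably from a format that is only
known at run time, so this illustrates the idea rather than solving it.

```cpp
#include <cstdarg>
#include <cstdio>

// Hypothetical wrapper: collects the variable arguments into a va_list and
// hands that list, as one object, to the standard std::vsscanf.
int scan_with_list (const char *input, const char *format, ...)
{
  std::va_list args;
  va_start (args, format);
  int n_converted = std::vsscanf (input, format, args);
  va_end (args);
  return n_converted;
}
```

A call such as scan_with_list ("42 abc", "%d %15s", &i, word) behaves like
the equivalent sscanf() call and returns the number of items converted.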

On the other hand, the thing that this textscan routine adds is a delimiter.
A custom delimiter wouldn't work so well in the approach I described in the
previous paragraph.  So, rather than handling the delimiter as part of the
stream scan, it might be better to first slice and dice the whole line:

1) Search for the next EOL and extract that string
2) Break that line up according to delimiters
3) Loop through each individual hunk scanning with sscanf() while
interpreting d, u, s, etc.

The disadvantage of this approach is that breaking a line up according to
delimiters produces a dynamically sized result, i.e., a list of strings.  But
perhaps if that is done in a smart way the dynamic growth won't be much after
the first one or two lines (i.e., the data will have reached pretty much the
full expected number of variables and max data width in the first few lines).
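
One "smart way" might be to keep the list of strings alive between lines and
overwrite it in place, so that it grows only when a line has more fields than
any line before it.  A minimal sketch; FieldBuffer is a hypothetical class,
and whether a std::string keeps its storage across assign() is typical of
implementations rather than something the standard guarantees.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical buffer of fields that is reused from one line to the next.
class FieldBuffer
{
public:

  // Split LINE on DELIM, overwriting the strings left from the previous
  // line.  Returns the number of fields found in this line.
  std::size_t split (const std::string& line, char delim)
  {
    m_count = 0;
    std::string::size_type start = 0;
    while (true)
      {
        std::string::size_type pos = line.find (delim, start);
        std::string::size_type len
          = (pos == std::string::npos) ? std::string::npos : pos - start;

        // Grow only when this line has more fields than any seen so far.
        if (m_count == m_fields.size ())
          m_fields.emplace_back ();

        m_fields[m_count++].assign (line, start, len);

        if (pos == std::string::npos)
          break;
        start = pos + 1;
      }
    return m_count;
  }

  // Field I of the most recently split line (valid for I < count).
  const std::string& field (std::size_t i) const { return m_fields[i]; }

  // Number of string slots ever created, i.e. the widest line seen.
  std::size_t allocated () const { return m_fields.size (); }

private:

  std::vector<std::string> m_fields;
  std::size_t m_count = 0;
};
```

After splitting "a,bb,ccc" and then "x,y" with the same buffer, allocated ()
stays at 3 while the second call reports 2 fields.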

