octave-maintainers

xtextscan [WAS: Re: strread.m]


From: Philip Nienhuis
Subject: xtextscan [WAS: Re: strread.m]
Date: Thu, 04 Aug 2011 23:38:40 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

John W. Eaton wrote:
> On  3-Aug-2011, Philip Nienhuis wrote:
>
> |>  I will probably try to write textscan in C++.  It's up to you whether
> |>  you want to continue fixing problems in strread, but given the
> |
> | Do you have a time schedule in mind?
> | That would help me make a better decision of what to do.
>
> I started working on it yesterday.  So far I've only implemented the

Magnificent.

Are you planning to get it finished before Octave 3.4.3?

Just today I prepared a fix for bug #33876 along the lines I sketched yesterday... never mind.

> part that decodes the format.  I'll try for at least some of the
> conversions today.  Then I may need help in figuring out how to
> properly return the variables that are read from the file.  Then we
> will also need to handle the parameter/value options.

Whitespace and delimiter processing took a bit of sorting out.

There are also some "implicit" options, like the presence of a trailing "\n" in the input stream.

Once you get the format string properly parsed, I suppose it is fairly straightforward to match it against the input stream.
But just FYI, here is some ML r2007a behavior that I find peculiar:

Assume an input string '54321a'.
When the format string '%f321a' is applied to it, it turns out that Matlab prefers to interpret the format as '%f32': it ignores the remaining digits of the literal as well as the trailing "a", yielding 54321 (class single).
If you do
  c = textscan ('54321a', '%f321a', 'returnonerror', 0)
it emerges that ML first parses the number as far as it can, rather than first analyzing the trailing literal to see where the numeric field is supposed to end. To read the field as a double you'd need '%f 321a' (yielding 54321), or, if you'd rather expect 54, use '%2f64321a'.
Another one:
  c = textscan ('54321a', '%2f64') gives {54; 32; 1}
(The given field width is ignored for the last number, "1", which is nevertheless reported as OK. Setting 'returnonerror' to 0 shows that ML complains about row 4, the "a".)

I find this behavior (among other things: mixing up a literal if it starts with digits, and a lax interpretation of a user-specified field width) a bit inconsistent from a user's point of view. Of course, from a programmer's POV it may just be obvious, although I don't see it.

These examples do show that setting the returnonerror parameter to false is vital for understanding what ML does.

The point here:
I assume (that is, I hope) you have a clearer view of this than me, but IMO we should be wary of striving for ML compatibility so much that we wander into various degrees of bug-for-bug compatibility.
Or should we call it "surprise-for-surprise" compatibility?


> The diffs below are what I have now.  You can do things like
>
>    fid = fopen ("any-existing-file");
>    xtextscan (fid, "any format here for testing")
>
> and xtextscan will display the components of the format.

I can't comment as this is the Octave dialect of C++ :-)  (beyond me)
Thank you anyway.

Philip

