octave-maintainers
Re: strread.m


From: Philip Nienhuis
Subject: Re: strread.m
Date: Wed, 03 Aug 2011 21:47:32 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

John W. Eaton wrote:
On  3-Aug-2011, Philip Nienhuis wrote:

| John W. Eaton wrote:
|
|>  to have many more format options.  So why handle textscan with
|>  strread?
|
| Because Octave's textscan has been written that way.
| Perhaps the thinking was along the lines of "there's a scripted
| strread.m available; a binary strread replacement can easily be swapped
| in as soon as there is one."
| Ben might be able to tell you more (he is the author of textscan).

| Anyway, would there be a problem in extending (parts of) Octave's
| strread (and textread) versatility beyond that of Matlab's? I guess not.

The Matlab docs say that textscan is intended to replace both textread
and strread.  And since textscan seems more versatile than either of
the obsolete functions, it seems better to me to write a complete
textscan implementation in C++ and then perhaps try to use that to
implement strread and textread.  Though I think there may be problems
even with that.  For example, what does Matlab do with

   [a, b] = strread ('1 8 1', '%u8 %u');

a =
        1
b =
        1

vs

   a = textscan ('1 8 1', '%u8 %u');

a =
        [2x1 uint8][8]

a{1} =
        1
        1

a{2} =
                8

?  I expect that in the first case, it will skip reading the 8 and
return a and b as double values,

Indeed.

         while in the second it will read all
three values and return the first and third as uint8 values and the
second as a uint32 value.

Yep.

              If so, then I don't know how you would take
the format that is passed to strread and convert it to something that
textscan can use to obtain the same result as strread.

Yeah, that is an unavoidable strread deficiency, caused by strictly ignoring whitespace: it means accepting that "cuddling" literals in the format string can match non-cuddling literals in the file (string). But I have seen Matlab textscan behaviour that is not much better; those corner cases are just more concealed.

Imitating this strread behaviour in Octave (which IMO comes close to bug-for-bug compatibility) would follow the approach outlined in my long post in bug #33875:

- Separate the format string parsing into a utility function in the /private subdir of /io.
- Let textscan, textread and also strread call these functions directly (they need parts of this anyway to, among other things, determine the number of output args).
- Separate more parts of the dev source strread.m into /private utility functions (exploring the file's column build-up and matching it to the format; comment line handling; maybe more).
- Have textread and textscan communicate with strread, and vice versa, using undocumented parameter/value pairs to convey info and modify behavior.

The latter (communication using undocumented args) is also needed for properly resuming reading by textscan.
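To make the first two points a bit more concrete, here is a minimal sketch of what such a shared helper might look like. The function name parse_format_sketch and its return values are made up for illustration; they are not taken from the patches in bug #33875 or from the existing strread.m. It only splits the format into whitespace-separated words and counts the output arguments they imply. It does not try to decide whether the "8" in "%u8" is part of the conversion or a literal (the ambiguity discussed above), and it does not handle "%%".

```octave
1;  # script-file marker so the function below can be defined inline

## Hypothetical helper: name, arguments and return values are assumptions
## made for this sketch, not existing Octave code.
function [specs, nout] = parse_format_sketch (fmt)
  ## Whitespace in the format is ignored, so split on it.
  specs = regexp (fmt, '\S+', 'match');
  ## Words starting with "%" are conversions; "%*..." reads a field but
  ## returns nothing, so it does not count as an output argument.
  is_conv = strncmp (specs, '%', 1);
  is_skip = strncmp (specs, '%*', 2);
  nout = sum (is_conv & ! is_skip);
endfunction

[specs, nout] = parse_format_sketch ('%s %*f %u8 %u');
printf ("%d words, %d outputs\n", numel (specs), nout);
```

A caller such as textread or textscan could then check nout against nargout before any data is read, instead of each function repeating that logic.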

Splitting up strread would make the code easier to maintain as well. But it wouldn't solve the fundamental issues of the way Octave's strread parses files (the biggest headache for %g, %c, and %[] formats).

Admittedly this all looks like, or it just is, prolonged polishing of a big kludge, but IMO it is doable, would need less time investment, and could be done faster than rebuilding textscan (-.oct) from the ground up (though I could be wrong there, of course).

If you think it's a waste of time, just say so; no offense taken.

Dumping my work in favor of a compiled textscan (or an oct-file called by textscan-as-it-stands) isn't a problem for me and is even preferable. I just needed my patches to get urgent things done. I might still go ahead fixing the current scripts for myself as long as I see an urgent need while there is no viable alternative. (the beauty of open source)

I've got little proficiency at C++; I can understand simple existing code and even fix little things, but creating complete functions is beyond me.
So fixing the script versions is all I can do.

Philip

