Re: strread.m

octave-maintainers

From:	Philip Nienhuis
Subject:	Re: strread.m
Date:	Wed, 03 Aug 2011 21:47:32 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

John W. Eaton wrote:

On  3-Aug-2011, Philip Nienhuis wrote:

| John W. Eaton wrote:
|
|>  to have many more format options.  So why handle textscan with
|>  strread?
|
| Because Octave's textscan has been written that way.
| Perhaps the thinking was along the lines of "there's a scripted
| strread.m available; a binary strread replacement can easily be swapped
| in as soon as there is one."
| Ben might be able to tell you more (he is the author of textscan).

| Anyway, would there be a problem in extending (parts of) Octave's
| strread (and textread) versatility beyond that of Matlab's? I guess not.

The Matlab docs say that textscan is intended to replace both textread
and strread.  And since textscan seems more versatile than either of
the obsolete functions, it seems better to me to write a complete
textscan implementation in C++ and then perhaps try to use that to
implement strread and textread.  Though I think there may be problems
even with that.  For example, what does Matlab do with

   [a, b] = strread ('1 8 1', '%u8 %u');

vs

   a = textscan ('1 8 1', '%u8 %u');


a =
        [2x1 uint8][8]

a{1} =
        1
        1

a{2} =
                8

?  I expect that in the first case, it will skip reading the 8 and
return a and b as double values,


Indeed.

         while in the second it will read all
three values and return the first and third as uint8 values and the
second as a uint32 value.


Yep.

              If so, then I don't know how you would take
the format that is passed to strread and convert it to something that
textscan can use to obtain the same result as strread.

Yeah, a strread deficiency that is unavoidable and caused by strictly ignoring whitespace: accepting that "cuddling" literals in the format string can match non-cuddling literals in the file (string). But I have seen ML textscan behaviour that is not much better; those corner cases are just more concealed.
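To make the `%u8 %u` example concrete, below is a deliberately simplified Octave model of how each function might split that format. It assumes, as the exchange above suggests, that strread takes only the letter after `%` as the conversion, so the trailing `8` is left over as a literal (which, because whitespace is ignored, then matches the separate `8` in `'1 8 1'`). It also assumes that textscan reads `%u8` as a uint8 conversion. `format_tokens` is a hypothetical helper for illustration only. Neither strread.m nor textscan parses formats this way internally.

```octave
1;  # script marker: lets this file define functions before using them

## Hypothetical helper (not part of Octave): split a format string into
## conversion and literal tokens under a simplified model of each reader.
function toks = format_tokens (fmt, style)
  if (strcmp (style, "textscan"))
    ## textscan model: %d8..%d64 and %u8..%u64 name integer classes.
    conv = '%\*?(?:[du](?:8|16|32|64)?|[fs])';
  else
    ## strread model: only the letter is the conversion; digits that
    ## follow it are left over as a literal.
    conv = '%\*?[dufs]';
  endif
  toks = {};
  ## Whitespace only separates tokens; it is never matched itself.
  words = strsplit (strtrim (fmt), " ");
  for k = 1:numel (words)
    w = words{k};
    while (! isempty (w))
      m = regexp (w, ['^' conv], "match", "once");
      if (! isempty (m))
        toks{end+1} = m;               # conversion specifier
      else
        m = regexp (w, '^[^%]+', "match", "once");
        if (isempty (m))
          m = w(1);                    # stray '%': keep it as a literal
        endif
        toks{end+1} = ["lit:" m];      # literal the input must contain
      endif
      w = w(numel (m)+1:end);
    endwhile
  endfor
endfunction

disp (strjoin (format_tokens ('%u8 %u', "strread"), " | "))   # %u | lit:8 | %u
disp (strjoin (format_tokens ('%u8 %u', "textscan"), " | "))  # %u8 | %u
```

Both readings yield two conversions, so the number of outputs matches. The difference is that under the strread model the `8` in the input is consumed by a literal, while under the textscan model it is read as data.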

Imitating this strread behaviour in Octave (which IMO comes close to bug-for-bug compatibility) would follow the approach outlined in my long post in bug #33875:
- Separate the format string parsing into a utility function in the /private subdir of /io.
- Let textscan, textread and also strread call these functions directly (they need parts of this anyway to, among other things, determine the number of output args).
- Separate more parts of the dev source of strread.m into /private utility functions (exploring the file's column build-up and matching it to the format; comment line handling; maybe more).
- Have textread and textscan communicate with strread (and vice versa) using undocumented parameter/value pairs to convey info and modify behavior.

The latter (communication using undocumented args) is also needed for properly resuming reading by textscan.
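A rough sketch of the first two points (one shared format parser that also reports how many output arguments a format yields) might look like this. The name `parse_format`, its location, the set of recognized conversions, and the rule that `%*` conversions produce no output are assumptions for illustration. This is not existing Octave code.

```octave
1;  # script marker: lets this file define functions before using them

## Hypothetical shared helper (e.g. a parse_format.m in io/private): parse a
## format once and report its conversions and how many outputs they yield.
function [convs, nout] = parse_format (fmt)
  ## '%', optional '*' (read but discard), optional width/precision, then a
  ## conversion: %d/%u with an optional integer-class suffix, a letter, or %[...].
  convs = regexp (fmt, '%\*?\d*(?:\.\d+)?(?:[du](?:8|16|32|64)?|[fsnqc]|\[[^\]]*\])', "match");
  ## Suppressed conversions ('%*...') consume input but produce no output.
  nout = sum (! strncmp (convs, "%*", 2));
endfunction

[convs, nout] = parse_format ('%s %*s %d8 %f');
printf ("%d conversions, %d outputs\n", numel (convs), nout);  # 4 conversions, 3 outputs
```

In this sketch, strread and textread could compare `nout` with `nargout`, and textscan could use it to size its output cell array, before any data is read.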

Splitting up strread would make the code easier to maintain as well. But it wouldn't solve the fundamental issues with the way Octave's strread parses files (the biggest headaches being the %g, %c, and %[] formats).

Admittedly this all looks like (or simply is) prolonged polishing of a big kludge, but IMO it is doable, would need less time investment, and could be done faster than rebuilding textscan (as an .oct file) from the ground up (though I could be wrong there, of course).


If you think it's a waste of time, just say so; no offense taken.

Dumping my work in favor of a compiled textscan (or an oct-file called by textscan as it stands) isn't a problem for me; I'd even prefer it. I just needed my patches to get urgent things done. I might still go ahead fixing the current scripts for myself as long as I see an urgent need and there is no viable alternative (the beauty of open source).

I have little proficiency in C++; I can understand simple existing code and even fix little things, but creating complete functions is beyond me.

So fixing the script versions is all I can do.

Philip
