octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4]


From: Philip Nienhuis
Subject: Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4]
Date: Fri, 20 Sep 2013 13:56:59 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Markus wrote:
Am 2013-09-18 22:34, schrieb Philip Nienhuis:
:
<snip>
:
If xlsxread.m works OK to your satisfaction, we'll have to think on:
- Do you want it to remain Matlab-compatible, or Octave style

It's already Matlab-incompatible ;)

- Do you want in the OF-io package
- If yes (I suppose so), well: how then?
* do we split it up cf. the other interfaces, or
* keep it in one file as just xlsxread
* do you want to add more options, like
> specifying a range to read

Range-read would be nice but it has the lowest priority for me.

OK; although I can imagine this to be a desirable option for many users.

What I'd like to do in the next 1-2 Weeks is:

1) get rid of the named NaN cols and rows who Matlab ignores too

If there are extremely many of those, and reading them from XML takes lots of time, sure, invest there. Otherwise, have a look at xls2oct.m in the io package, it has a final section for stripping enveloping empty rows and columns. It may do just what you need.

2) implement Date and Time interpretation. (never take a look at it)

At that point (date & time) the OF io package is not compatible with Matlab. There are several reasons for that, but I won't digress into those. Just note that severe headaches are looming. Octave's xlsread returns numeric date "values" for dates later than 31 Dec 1899, and text strings for earlier dates; Matlab's xlsread usually -not always- returns all dates as text strings. To convert numerical Excel date values (in the 1/1/900 date system) to Octave datenums just add 693960

3) clean up the code
3.1) here and there little optimizations
3.2) split and merge everything for OF-io package.

> specifying sheets by name and/or by index number

This is implemented long time ago ;) You can do
xlsxread('filename.xlsx','Stupid Map Name') or
xlsxread('filename.xlsx',2) for example.

What does the index stand for? ("2" in the second example)
Is it the position of the worksheet tabs counted from the left, or
the order in which the worksheets have been created (age)? Perusing the XML in expanded .xlsx files I couldn't say with certainty.
In the io package it is the former.

etc etc.

Splitting up would impose a little speed penalty, but add a lot of
flexibility.
You could have a look on how things work for xlsread.m in the io
package and see if you like the philosophy there:

xlsread.m is a wrapper for xlsopen-xls2oct-[xls2oct-][...-]xlsclose
fuctions, to allow reading successive worksheets w/o repeated opening
& closing of a complete spreadsheet file (one of Matlab's more
braindead implementation details).

xlsopen.m is a wrapper for interface-specific functions in ./private/
Same for xls2oct.m and xlsclose.m. The interesting interface-specific
part is at the bottom of those files.

Interface-specific functions would be needed for:
1. Opening the spreadsheet file (unzipping it to a temp dir);
(This has already been done (for ODS) using your solution)

The question is, should we keep the unpack file from upstream octave
3.7.x as __unpack function to make it work in octave 3.6.x (most user
are using this) too? For me this is the best solution and still don't
hurt developement and usage in 3.7.x.

You mean: bug #39148?
There's an easier solution: simply including the patched unpack.m from Octave 3.7.x in the OF io ./inst dir, it would then shadow the "buggy" unpack function from core Octave 3.6.x The only potential harm would be a (another?) start-up message about shadowing that may arouse some overly cautious novice users.

I do plan to make one more OF io package release (1.2.4) for Octave 3.6.x in the next weeks; I plan to include the Java-free odsread there (almost ready), hopefully also your xlsxread.
For those two new features I'd need the patched unpack from 3.7.x as well.

In a more general sense, I guess that once Octave 3.8 is out, most OF package maintainers would rather concentrate on making their packages work with that rather than keeping backward compatibility with Octave-3.6.x. Backward compatibility is just too much for a volunteer project.
So I wouldn't worry too much about unpack.m

2. Detecting the occupied cell range in a worksheet;
3. Reading the data;

2 and 3 is one function (__readnr) in my xlsxread.

It would be very handy to split those. Again, if feasible.

In the java-free odsread.m-to-come (actually private/__OCT_spsh2oct__.m and /private/__OCT_getusedrange__.m), I invoke separate strategies for detecting occupied cell ranges and reading actual data.
In ODS that is easily done; I haven't looked in detail at OOXML yet.

4. Closing the spreadsheet file;
(Extremely easy)

This is the first i do in __readnr :D

Yes but then you'll loose the unzipped temp dir. For each successive worksheet in the same file you'd have to unpack it again, imposing a potentially severe speed penalty. In many a Matlab forum or news group this is one of the popular recurring xlsread issues. The only advice from the Mathworks is to either accept it or dive into ActiveX / Visual Basic programming (yech)

In the OF io package, xlsopen.m returns a struct with a pointer (file handle) to the unzipped file, or temp dir, precisely to avoid reopening/re-unpacking. I just keep those struct pointers around until I'm finally done with the file in question (often only at the end of the Octave session).

In the next week i'm more busy too because it's beginning of term.
So we'll see how fast xlsxread is going forward :)

Even if still a bit incomplete (no range read, no occupied cell range detection, etc) it would be worthwile to include it in upcoming 1.2.4 As long as xlsxread.m doesn't return clearly faulty results I see no obstacles there. We'll just mark it as "experimental".

After all, development will go on, and on, and on :-)

Philip


reply via email to

[Prev in Thread] Current Thread [Next in Thread]