Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4]

octave-maintainers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4]

From:	Philip Nienhuis
Subject:	Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4]
Date:	Fri, 20 Sep 2013 13:56:59 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Markus wrote:

Am 2013-09-18 22:34, schrieb Philip Nienhuis:

:
<snip>
:

If xlsxread.m works OK to your satisfaction, we'll have to think on:
- Do you want it to remain Matlab-compatible, or Octave style


It's already Matlab-incompatible ;)

- Do you want in the OF-io package
- If yes (I suppose so), well: how then?
* do we split it up cf. the other interfaces, or
* keep it in one file as just xlsxread
* do you want to add more options, like
> specifying a range to read


Range-read would be nice but it has the lowest priority for me.


OK; although I can imagine this to be a desirable option for many users.

What I'd like to do in the next 1-2 Weeks is:

1) get rid of the named NaN cols and rows who Matlab ignores too

If there are extremely many of those, and reading them from XML takeslots of time, sure, invest there.Otherwise, have a look at xls2oct.m in the io package, it has a finalsection for stripping enveloping empty rows and columns. It may do justwhat you need.

2) implement Date and Time interpretation. (never take a look at it)

At that point (date & time) the OF io package is not compatible withMatlab. There are several reasons for that, but I won't digress intothose. Just note that severe headaches are looming.Octave's xlsread returns numeric date "values" for dates later than 31Dec 1899, and text strings for earlier dates; Matlab's xlsread usually-not always- returns all dates as text strings. To convert numericalExcel date values (in the 1/1/900 date system) to Octave datenums justadd 693960

3) clean up the code
3.1) here and there little optimizations
3.2) split and merge everything for OF-io package.

> specifying sheets by name and/or by index number


This is implemented long time ago ;) You can do
xlsxread('filename.xlsx','Stupid Map Name') or
xlsxread('filename.xlsx',2) for example.


What does the index stand for? ("2" in the second example)
Is it the position of the worksheet tabs counted from the left, or

the order in which the worksheets have been created (age)? Perusing theXML in expanded .xlsx files I couldn't say with certainty.

In the io package it is the former.

etc etc.

Splitting up would impose a little speed penalty, but add a lot of
flexibility.
You could have a look on how things work for xlsread.m in the io
package and see if you like the philosophy there:

xlsread.m is a wrapper for xlsopen-xls2oct-[xls2oct-][...-]xlsclose
fuctions, to allow reading successive worksheets w/o repeated opening
& closing of a complete spreadsheet file (one of Matlab's more
braindead implementation details).

xlsopen.m is a wrapper for interface-specific functions in ./private/
Same for xls2oct.m and xlsclose.m. The interesting interface-specific
part is at the bottom of those files.

Interface-specific functions would be needed for:
1. Opening the spreadsheet file (unzipping it to a temp dir);
(This has already been done (for ODS) using your solution)


The question is, should we keep the unpack file from upstream octave
3.7.x as __unpack function to make it work in octave 3.6.x (most user
are using this) too? For me this is the best solution and still don't
hurt developement and usage in 3.7.x.


You mean: bug #39148?

There's an easier solution: simply including the patched unpack.m fromOctave 3.7.x in the OF io ./inst dir, it would then shadow the "buggy"unpack function from core Octave 3.6.xThe only potential harm would be a (another?) start-up message aboutshadowing that may arouse some overly cautious novice users.

I do plan to make one more OF io package release (1.2.4) for Octave3.6.x in the next weeks; I plan to include the Java-free odsread there(almost ready), hopefully also your xlsxread.

For those two new features I'd need the patched unpack from 3.7.x as well.

In a more general sense, I guess that once Octave 3.8 is out, most OFpackage maintainers would rather concentrate on making their packageswork with that rather than keeping backward compatibility withOctave-3.6.x. Backward compatibility is just too much for a volunteerproject.

So I wouldn't worry too much about unpack.m

2. Detecting the occupied cell range in a worksheet;
3. Reading the data;


2 and 3 is one function (__readnr) in my xlsxread.


It would be very handy to split those. Again, if feasible.

In the java-free odsread.m-to-come (actually private/__OCT_spsh2oct__.mand /private/__OCT_getusedrange__.m), I invoke separate strategies fordetecting occupied cell ranges and reading actual data.

In ODS that is easily done; I haven't looked in detail at OOXML yet.

4. Closing the spreadsheet file;
(Extremely easy)


This is the first i do in __readnr :D

Yes but then you'll loose the unzipped temp dir. For each successiveworksheet in the same file you'd have to unpack it again, imposing apotentially severe speed penalty.In many a Matlab forum or news group this is one of the popularrecurring xlsread issues. The only advice from the Mathworks is toeither accept it or dive into ActiveX / Visual Basic programming (yech)

In the OF io package, xlsopen.m returns a struct with a pointer (filehandle) to the unzipped file, or temp dir, precisely to avoidreopening/re-unpacking.I just keep those struct pointers around until I'm finally done with thefile in question (often only at the end of the Octave session).

In the next week i'm more busy too because it's beginning of term.
So we'll see how fast xlsxread is going forward :)

Even if still a bit incomplete (no range read, no occupied cell rangedetection, etc) it would be worthwile to include it in upcoming 1.2.4As long as xlsxread.m doesn't return clearly faulty results I see noobstacles there. We'll just mark it as "experimental".


After all, development will go on, and on, and on :-)

Philip

[Prev in Thread]

Current Thread

[Next in Thread]

Re: xlsread in Octave 3.6.4, (continued)
- Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/30
  - Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/30

Prev by Date: Re: assert () taking long time
Next by Date: About two open bug reports for the dataframe package
Previous by thread: Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4]
Next by thread: Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4]
Index(es):
- Date
- Thread