octave-maintainers

Re: xlsread in Octave 3.6.4


From: Markus Bergholz
Subject: Re: xlsread in Octave 3.6.4
Date: Wed, 4 Sep 2013 02:54:07 +0200




On Tue, Sep 3, 2013 at 11:42 PM, Philip Nienhuis <address@hidden> wrote:
<moved from help-octave to octave-maintainers ML>

Markus Bergholz wrote:
On Mon, Sep 2, 2013 at 11:38 AM, Markus Bergholz <address@hidden> wrote:
    On Mon, Sep 2, 2013 at 12:10 AM, Markus Bergholz <address@hidden> wrote:
        On Sun, Sep 1, 2013 at 11:42 PM, PhilipNienhuis <address@hidden> wrote:
            Markus Bergholz wrote

<snip>


             >>>> Markus Bergholz wrote
             >>>> > I haven't followed this thread and its issue, but I've written an
             >>>> > xlsxread function which doesn't need Java.
             >>>> > But it's very rudimentary, works only on Linux and is a
             >>>> > quick & dirty write-down.
             >>>> > Furthermore, you have to remove the string-analysis part if your
             >>>> > sheet doesn't contain strings.
             >>>> > But maybe it helps someone else, or someone wants to improve it, or
             >>>> > someone rewrites it in C/C++ as an oct-file to get it even faster
             >>>> > than Matlab (for me it's still faster than the Java stuff atm).

<snip, see thread on help-octave ML>


    I've made a few quick and dirty changes, changed to the GPL license and
    committed the broken range part too.

    https://github.com/markuman/xlsxread

    It's now platform independent and, once again, faster than before
    (~58 seconds). Now it's nearly twice as fast as Matlab (~110 seconds).
    Enough time to waste on ranges, strings etc. in the future.



Here comes version 0.6: https://github.com/markuman/xlsxread

* strings and calculations are now replaced with NaN (without any speed
  loss!)
* tested with an Excel 2007 and an Excel for Mac 2011 file (example files
  are added)
* it now uses nested functions, which should make it easier to integrate
  into octave-io

Ranges and empty columns still don't work!
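
To make the NaN replacement concrete, here is a minimal Python sketch of the idea. It is not the Octave code from the repository; the function name numeric_cells and the regular expressions are assumptions made for illustration. It scans the worksheet XML for <c> cell elements and returns NaN for shared-string cells (t="s") and for cells that carry a formula (<f>).

```python
import math
import re

# One <c ...>...</c> cell element. The lookbehind skips self-closing
# cells such as <c r="A1" s="1"/>, which carry no value at all.
CELL_RE = re.compile(r'<c\b([^>]*)(?<!/)>(.*?)</c>', re.S)
REF_RE = re.compile(r'\br="([A-Z]+[0-9]+)"')
VALUE_RE = re.compile(r'<v>(.*?)</v>', re.S)
FORMULA_RE = re.compile(r'<f\b')


def numeric_cells(sheet_xml):
    """Return {cell_ref: float}; string and formula cells become NaN."""
    values = {}
    for attrs, body in CELL_RE.findall(sheet_xml):
        ref = REF_RE.search(attrs)
        if ref is None:
            continue  # no cell reference, nothing to key the value on
        value = VALUE_RE.search(body)
        is_string = 't="s"' in attrs
        has_formula = FORMULA_RE.search(body) is not None
        if is_string or has_formula or value is None:
            values[ref.group(1)] = math.nan
            continue
        try:
            values[ref.group(1)] = float(value.group(1))
        except ValueError:
            # e.g. error cells; treated like strings in this sketch
            values[ref.group(1)] = math.nan
    return values
```

Because strings and formulas are decided from the attributes and the element body of each cell, no lookup in the shared-strings table is needed, which is presumably why the replacement costs no extra time.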

Good work Markus.

Anyway, sorry to come up with a few more potential gotchas:

i know it's not finished ;)
 

- Interesting would be if your code properly handles merged and hidden cells. I don't know what they look like in raw OOXML.

- Does OOXML have repeated-rows and repeated-columns "folding"?
E.g., ODS1.2 has the table:TableNumberRowsRepeated and table:TableNumberColumnsRepeated tags.

Yes, I think we are talking about the same thing. The last time I took a look at it was 3 months ago. That's the next step.
And it would be helpful if others could contribute some example files for this situation. This is all I've got: https://github.com/markuman/xlsxread/tree/master/example
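
On the folding question, a hedged Python sketch of one possible approach. It assumes that worksheet cells carry an r="C3" style reference and that empty cells are simply left out of the XML, which is how the files I have seen behave; the helper names ref_to_index and to_grid are hypothetical. Placing each value by its reference, instead of by its order in the file, leaves skipped columns as NaN.

```python
import math
import re

_REF = re.compile(r"^([A-Z]+)([0-9]+)$")


def ref_to_index(ref):
    """Convert a reference such as 'C3' to zero-based (row, col)."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError("not a cell reference: %r" % ref)
    letters, digits = match.groups()
    col = 0
    for ch in letters:
        # Column letters are a base-26 number with A=1 ... Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


def to_grid(cells):
    """Place {ref: value} into a dense row-major grid padded with NaN."""
    if not cells:
        return []
    placed = {ref_to_index(ref): value for ref, value in cells.items()}
    n_rows = max(row for row, _ in placed) + 1
    n_cols = max(col for _, col in placed) + 1
    grid = [[math.nan] * n_cols for _ in range(n_rows)]
    for (row, col), value in placed.items():
        grid[row][col] = value
    return grid
```

For example, to_grid({"A1": 1.0, "C1": 3.0}) yields one row with three columns, the middle one NaN.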

 

It would be really good to have a Java-free (and ActiveX-free) spreadsheet reading capability in Octave, even if only a basic one. 
Sergei suggested a Perl-based solution; but Perl would still be a dependency, not all systems have Perl installed (e.g., Windows). 

So this is obsolete now.
 
You've made a first try for OOXML; I have a basis for decoding ODS lying around, it doesn't work at all yet but might not need undue amounts of attention.
You made the vital piece: unzipping the spreadsheet file to disk.
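
The unzip step works because an .xlsx file is an ordinary ZIP archive with the worksheets stored as XML members. A minimal Python sketch (not the Octave implementation; the default member name xl/worksheets/sheet1.xml is what Excel typically writes, but it is an assumption here, since the authoritative location is recorded in the workbook relationships):

```python
import zipfile


def list_worksheets(xlsx):
    """Return the worksheet XML members found in an .xlsx archive."""
    with zipfile.ZipFile(xlsx) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        )


def read_sheet_xml(xlsx, member="xl/worksheets/sheet1.xml"):
    """Read one worksheet as text without unpacking the archive to disk."""
    with zipfile.ZipFile(xlsx) as archive:
        return archive.read(member).decode("utf-8")
```

Both functions accept a file name or an open binary file object. Reading the member straight from the archive avoids temporary files; unpacking to disk, as the current code does, is equally valid.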

For inclusion in the OF io package (in a later stage, first try to get your version robust and fail-safe) I'd suggest to see how the various "interfaces" are built and called in the OF io package.


I guess that's also the message of the "ToDo" and "ChangeLog" parts of the README.md file:
* do the cell folding (e.g. empty columns) part
* make it more robust (there may be >100 xlsx variations not yet accounted for)
* clean up the code and fit it to the io package
But next week I'm traveling, so there won't be any updates from me in the next ~2 weeks.
 
For follow-up I'd suggest to move this discussion to the maintainers list. I've swapped help-octave into octave-maintainers.


Oh BTW another idea (that I explored in 2009 but couldn't get to work at the time):
There is a binary (compiled) xmlread function, currently in the io package. Maybe with a proper "template" it could just read the worksheets into a struct in RAM, faster than regexp can decode them. The missing piece is the "template" (sorry for my lack of XML proficiency & lingo). Unfortunately that xmlread is very tersely, if not badly, documented.
However there are xml toolboxes around that could be gotten to work in Octave.
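
For comparison with the regexp route, a small Python sketch of reading a worksheet through a real XML parser into an in-memory structure. It uses Python's standard xml.etree.ElementTree rather than the io package's xmlread, whose interface I am not sure about; the function name parse_sheet is hypothetical. The namespace URI is the SpreadsheetML main namespace.

```python
import math
import xml.etree.ElementTree as ET

# Namespace used by worksheet parts in OOXML spreadsheets.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def parse_sheet(sheet_xml):
    """Parse worksheet XML into {cell_ref: float}, NaN for non-numerics."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for cell in root.iterfind(".//m:sheetData/m:row/m:c", NS):
        ref = cell.get("r")
        if ref is None:
            continue  # this sketch relies on explicit cell references
        value = cell.find("m:v", NS)
        is_string = cell.get("t") == "s"
        has_formula = cell.find("m:f", NS) is not None
        if is_string or has_formula or value is None or value.text is None:
            cells[ref] = math.nan
            continue
        try:
            cells[ref] = float(value.text)
        except ValueError:
            cells[ref] = math.nan
    return cells
```

Whether a parser beats a single regexp pass on large sheets would have to be measured; the sketch only shows that no separate "template" is needed to walk the cell elements.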

I'll take a look at it.
But at the moment I'm happy with this solution. It's easy and fast to handle several cases (like t="s" and formulas like <f>) with just one regexp command, even if it's nested.

 

Philip



--
icq: 167498924
XMPP|Jabber: address@hidden
