From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #54100] fread using SKIP larger than zero is extremely slow
Date: Mon, 11 Jun 2018 16:48:57 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0

Follow-up Comment #1, bug #54100 (project octave):

Without even testing, I think there is a clear source of slowdown.  The code
that stands out to me is in libinterp/corefcn/oct-stream.cc, around lines
6635-6659:


            if (is && skip != 0 && nel == block_size)
              {
                // Seek to skip.
                // If skip would move past EOF, position at EOF.

                off_t orig_pos = tell ();

                seek (0, SEEK_END);

                off_t eof_pos = tell ();

                // Is it possible for this to fail to return us to
                // the original position?
                seek (orig_pos, SEEK_SET);

                off_t remaining = eof_pos - orig_pos;

                if (remaining < skip)
                  seek (0, SEEK_END);
                else
                  seek (skip, SEEK_CUR);

                if (! is)
                  break;
              }


The above is inside a while loop that reads in blocks of data.  The seek
(skip, SEEK_CUR) is probably OK, but the code preceding it, which seeks to the
end of the file and back on every iteration just to find the EOF position, is
most likely disrupting the stream's buffering (cache) and slowing things down.

If possible, the context might allow checking the EOF location just once
before the loop starts.  But another option might be to use the stream's
existing features.  I'm pretty sure that there is some way to simply do the
seek and then query the status of the input stream, i.e., whether it has
advanced past the end of the file.

Furthermore, the read routine will set some flags if it runs out of data to
read at the EOF:

http://www.cplusplus.com/reference/istream/istream/read/
"If the input sequence runs out of characters to extract (i.e., the
end-of-file is reached) before n characters have been successfully read, the
array pointed to by s contains all the characters read until that point, and
both the eofbit and failbit flags are set for the stream."

so just check those flags.  Upon getting those error flags, THEN retroactively
compute how many fields were actually read.  Even if that number isn't readily
available in the stream library, simply record the position with
"cpos = tell ();" prior to every file read.  In that case, one would know the
position of the start of the last read, and the EOF position can be obtained
easily.

In summary, the way to speed this up is to not compute how many bytes to read
before each read.  Instead just let the C library do its thing.  (If one wants
to compute how many bytes to read, do the whole computation beforehand, i.e.,
compute something like N_blocks, the number of full blocks to read, and
N_leftover, the number of bytes left over for the last non-full block.)
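
The up-front computation mentioned in the parenthetical could look roughly
like the following.  This is only a sketch under the assumption that each
full block is followed by exactly `skip` bytes and that the number of bytes
remaining in the file is already known; ReadPlan and plan_read are
hypothetical names, and overflow of block_size + skip is not handled:

```cpp
#include <cstdint>

// Hypothetical plan for a read-with-skip pass over `remaining` bytes.
struct ReadPlan
{
  std::uint64_t n_blocks;    // number of full blocks that can be read
  std::uint64_t n_leftover;  // bytes available for the last non-full block
};

ReadPlan plan_read (std::uint64_t remaining, std::uint64_t block_size,
                    std::uint64_t skip)
{
  if (block_size == 0)
    return ReadPlan {0, 0};

  // Each full block consumes block_size bytes of data plus skip bytes.
  const std::uint64_t stride = block_size + skip;
  std::uint64_t n_blocks = remaining / stride;
  std::uint64_t rest = remaining % stride;

  // The tail may still hold one more full block whose trailing skip
  // is cut short by EOF.
  if (rest >= block_size)
    {
      ++n_blocks;
      rest = 0;
    }

  return ReadPlan {n_blocks, rest};
}
```

With such a plan the loop reads n_blocks full blocks, seeking by skip between
them, then reads n_leftover bytes once, with no EOF checks inside the loop.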

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?54100>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



