Subject: Re: [Duplicity-talk] [patch] massive performance fix for large volume sizes
From: Peter Schuller
Date: Mon, 10 Sep 2007 20:15:29 +0200
User-agent: Mutt/1.5.16 (2007-06-09)
>  def get_data_block(self, fp, max_size):
>      """Return pair (next data block, boolean last data block)"""
> -    buf = fp.read(max_size)
> +    buf = fp.read(min(max_size, 64*1024))
>      if len(buf) < max_size:
>          if fp.close(): raise DiffDirException("Error closing file")
>          return (buf, 1)
This is broken. I just noticed that Python, being Python, has unusual
read() semantics: a read(n) is actually guaranteed to return n bytes
except at EOF, and the duplicity code is written to exploit this. As a
result, none of the code is written to handle multiple reads being
required to fill a block. I will have to look at it properly to come
up with a correct fix; in the meantime, don't apply this one. I
patched it this way because in any normal API a read() may always
return short unless you explicitly ask for other behavior, so capping
the read felt like a very safe change... apparently not.
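For reference, a proper fix would have to loop until the block is filled or genuine EOF is reached, so that a capped (short) read can no longer be mistaken for end-of-file. A minimal standalone sketch of that idea (the name `read_block` and its signature are mine, not duplicity's, and file-closing and error handling are omitted):

```python
import io


def read_block(fp, max_size, chunk_size=64 * 1024):
    """Read up to max_size bytes from fp, looping over short reads.

    Returns (data, last) where last is True only when fp hit true EOF
    before max_size bytes were collected. Capping each read() at
    chunk_size keeps individual reads small without breaking the
    "short result means EOF" assumption the caller relies on.
    """
    chunks = []
    remaining = max_size
    while remaining > 0:
        chunk = fp.read(min(remaining, chunk_size))
        if not chunk:  # empty read: genuine end-of-file
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    data = b"".join(chunks)
    return (data, len(data) < max_size)


# Example: a 100000-byte stream split into 64 KiB blocks.
fp = io.BytesIO(b"x" * 100000)
block1, last1 = read_block(fp, 65536)  # full block, not last
block2, last2 = read_block(fp, 65536)  # short block, last
```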
Note, however, that the performance improvement is not bogus; the
reason I discovered this to begin with was that it took seconds to
process even tiny files of only a couple of kilobytes. In other words,
the difference does not all lie in the fact that you end up reading
less data.
--
/ Peter Schuller
PGP userID: 0xE9758B7D or 'Peter Schuller <address@hidden>'
Key retrieval: Send an E-Mail to address@hidden
E-Mail: address@hidden Web: http://www.scode.org