Windows truncating data - Bug in src/libpspp/ext-array.c
From: John Darrington
Subject: Windows truncating data - Bug in src/libpspp/ext-array.c
Date: Wed, 13 Jun 2012 06:53:16 +0000
User-agent: Mutt/1.5.18 (2008-05-17)
Thanks to the efforts of Harry and Henry, I think we have a lead on
this problem which has been reported on and off on Windows where
the dataset gets truncated and mysterious write errors on the temporary
files are reported. It would seem that the bug has the
potential to cause problems on systems other than Windows too.
I believe the cause is the optimisation in src/libpspp/ext-array.c
which contains this code:
static bool
do_seek (const struct ext_array *ea_, off_t offset)
{
  struct ext_array *ea = CONST_CAST (struct ext_array *, ea_);
  if (!ext_array_error (ea))
    {
      if (ea->position == offset)
        return true;
      else if (fseeko (ea->file, offset, SEEK_SET) == 0)
        {
          ea->position = offset;
          return true;
        }
      else
        error (0, errno, _("seeking in temporary file"));
    }
  return false;
}
The lines:
      if (ea->position == offset)
        return true;
avoid performing a seek if the destination of the potential seek
happens to be the current position (which would be the most common case).
This is OK, except when the current operation is a write and the previous
one was a read (or vice versa).
The POSIX spec says:
  When a file is opened with update mode ( '+' as the second or third character
  in the mode argument), both input and output may be performed on the
  associated stream. However, the application shall ensure that output is not
  directly followed by input without an intervening call to fflush() or to a
  file positioning function ( fseek(), fsetpos(), or rewind()), and input is
  not directly followed by output without an intervening call to a file
  positioning function, unless the input operation encounters end-of-file.
[http://pubs.opengroup.org/onlinepubs/009695399/functions/fopen.html]
By avoiding the seek, we are violating this condition.
The Microsoft documentation basically says the same:
  When the "r+", "w+", or "a+" access type is specified, both reading and
  writing are allowed (the file is said to be open for "update"). However, when
  you switch between reading and writing, there must be an intervening fflush,
  fsetpos, fseek, or rewind operation. The current position can be specified for
  the fsetpos or fseek operation, if desired.
[http://msdn.microsoft.com/en-us/library/yeby3zcb%28v=vs.80%29.aspx]
Interestingly, until quite recently, the GNU libc documentation, after
mentioning that this requirement exists in the ANSI standard, then had the
sentence:
  The GNU C library does not have this limitation; you can do arbitrary reading
  and writing operations on a stream in whatever order.
But this sentence has recently been deleted, and the bug report which
gave rise to its deletion suggests that it was erroneous, and had been
for a long time:
[http://pubs.opengroup.org/onlinepubs/009695399/functions/fopen.html]
So it would seem that it is unsafe (even on GNU/Linux) not to seek (or flush)
before switching between reading and writing.
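
As a minimal illustration of the rule quoted above (not code from PSPP), the
following sketch switches direction on an update-mode stream and puts a
positioning call at each switch. The function names read_then_write and
demo_update_stream are hypothetical, and plain fseek() is used in place of
fseeko() only to keep the example standard C.

```c
#include <assert.h>
#include <stdio.h>

/* Reads 4 bytes into FIRST, then overwrites the next 4 bytes with SECOND.
   Returns 0 on success, -1 on failure. */
static int
read_then_write (FILE *f, char first[4], const char second[4])
{
  assert (f != NULL);
  if (fread (first, 1, 4, f) != 4)
    return -1;

  /* Input followed by output: a positioning call is required, even though
     this one does not move the file position. */
  if (fseek (f, 0L, SEEK_CUR) != 0)
    return -1;

  return fwrite (second, 1, 4, f) == 4 ? 0 : -1;
}

/* Hypothetical demo: fills OUT with the final 8-byte file contents. */
static int
demo_update_stream (char out[9])
{
  char first[4];
  int ok;
  FILE *f = tmpfile ();       /* tmpfile() opens the stream for update. */
  if (f == NULL)
    return -1;

  ok = fwrite ("abcdWXYZ", 1, 8, f) == 8;
  rewind (f);                 /* Output followed by input. */
  ok = ok && read_then_write (f, first, "1234") == 0;
  rewind (f);                 /* Output followed by input again. */
  ok = ok && fread (out, 1, 8, f) == 8;
  out[8] = '\0';
  fclose (f);
  return ok ? 0 : -1;
}
```

The fseek (f, 0L, SEEK_CUR) call is exactly the kind of "seek to where we
already are" that the optimisation in do_seek skips.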
I sent Harry a patch which disabled this optimisation completely. Henry's
report suggested that this fixed the problem, but caused the operation
to take a lot longer to run (not surprising). I suggest that the correct fix
should involve a flag in the ext_array struct which records the direction of
the most recent operation (read or write) and ensures that the seek is always
done if the direction has changed.
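
One possible shape for such a fix is sketched below. This is only an
illustration under stated assumptions, not the actual patch: struct
ext_array_sketch, enum op, do_seek_sketch, sketch_read and sketch_write are
all hypothetical stand-ins for the real ext_array code, error reporting is
reduced to a boolean, and fseek()/long replace fseeko()/off_t to keep the
example standard C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Direction of the most recent transfer (hypothetical). */
enum op { OP_NONE, OP_READ, OP_WRITE };

/* Hypothetical stand-in for struct ext_array. */
struct ext_array_sketch
  {
    FILE *file;
    long position;      /* File offset as tracked by the caller. */
    enum op last_op;    /* OP_NONE forces a seek on the next access. */
  };

/* Skips the seek only if both the offset and the direction are unchanged. */
static bool
do_seek_sketch (struct ext_array_sketch *ea, long offset, enum op op)
{
  assert (op == OP_READ || op == OP_WRITE);
  if (ea->position != offset || ea->last_op != op)
    {
      if (fseek (ea->file, offset, SEEK_SET) != 0)
        {
          ea->last_op = OP_NONE;
          return false;
        }
      ea->position = offset;
    }
  ea->last_op = op;
  return true;
}

static bool
sketch_read (struct ext_array_sketch *ea, long offset, void *buf, size_t n)
{
  if (!do_seek_sketch (ea, offset, OP_READ))
    return false;
  if (fread (buf, 1, n, ea->file) != n)
    {
      ea->last_op = OP_NONE;    /* Position unknown: seek next time. */
      return false;
    }
  ea->position += (long) n;
  return true;
}

static bool
sketch_write (struct ext_array_sketch *ea, long offset,
              const void *buf, size_t n)
{
  if (!do_seek_sketch (ea, offset, OP_WRITE))
    return false;
  if (fwrite (buf, 1, n, ea->file) != n)
    {
      ea->last_op = OP_NONE;    /* Position unknown: seek next time. */
      return false;
    }
  ea->position += (long) n;
  return true;
}
```

With this arrangement, consecutive reads or consecutive writes at the tracked
position still avoid the seek, so the common sequential case keeps the benefit
of the optimisation, while any read/write switch always goes through fseek().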
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.