Re: [Bug-wget] Race condition on downloaded files among multiple wget instances
From: Tim Ruehsen
Subject: Re: [Bug-wget] Race condition on downloaded files among multiple wget instances
Date: Wed, 04 Sep 2013 09:38:15 +0200
User-agent: KMail/4.10.5 (Linux/3.10-2-amd64; KDE/4.10.5; x86_64; ; )

On Tuesday 03 September 2013 23:17:09 Ángel González wrote:
> On 03/09/13 11:16, Tim Ruehsen wrote:
> > What should it say then?
> > My ideas are limited to something like
> > "There was an unexpected signal SIGBUS. It may be a bug or a misuse of
> > Wget or your hardware is broken. Please think about it.".
> >
> > This does not give more information than a "SIGBUS".
> > Ideas welcome.
>
> Well, if it shall provide more information...
>
> Error reading links.html file. I was expecting it to have 23K, but it now
> suddenly has only 420 bytes. It seems that another program has changed it
> behind my back. It is unacceptable to perform my job under these conditions.
> *wget exited*

That would be very nice, if it were possible. Right now I have no idea how to
print something like the above. I ran Tomas Hozza's test under valgrind with a
Wget build that has debug info. I got SIGBUS in 18 of 20 runs, but at
completely different places in the code. In this misuse test, SIGBUS can occur
anywhere Wget accesses (reads or writes) the memory that wget_read_file()
mapped. It is completely random and unpredictable when an outside process
changes the file size and/or content at the same time.

SIGBUS could also occur for any other reason (e.g. real bugs in Wget).

As was already said, replacing mmap with read would avoid the crash
(wget_read_file() reads as many bytes as there are, without checking the
length of the file beforehand). But without additional logic, it might read
random data (many processes writing into the file at the same time, not
necessarily the same data). Wget would try to parse / change (-k) it and the
result would be broken, but no error would be printed. So replacing mmap is
not a solution, though it may be part of one.
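
A rough sketch of the read-based variant, to show why it cannot fault: it
never asks for the file size, it just reads until read() returns 0. The name
slurp_file() is hypothetical and this is not the actual wget_read_file()
code. Note that it still returns whatever bytes happen to be in the file, so
it has exactly the "random data" problem described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not Wget's wget_read_file().
 * Reads the whole file into a malloc'ed, NUL-terminated buffer.
 * Returns NULL on error; the caller frees the buffer. */
char *slurp_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (!buf) {
        close(fd);
        return NULL;
    }

    for (;;) {
        if (len + 1 >= cap) {               /* keep one byte for the NUL */
            size_t ncap = cap * 2;
            char *nbuf = realloc(buf, ncap);
            if (!nbuf) {
                free(buf);
                close(fd);
                return NULL;
            }
            buf = nbuf;
            cap = ncap;
        }

        ssize_t n = read(fd, buf + len, cap - len - 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted, try again */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;                          /* end of file, whatever size it is now */
        len += (size_t)n;
    }

    close(fd);
    buf[len] = '\0';
    *len_out = len;
    return buf;
}
```
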

Now to the possible solutions that come to mind:

1. While downloading / writing data, Wget could compute a checksum of the file.
That allows a check later, when re-reading the file. In this case we could
really tell the user: hey, someone trashed our file while we were working...
To get this working, we must remove the mmap code.

2. Use tempfiles / tempdirs only and move them to the right place afterwards.
That would bring in some kind of atomicity, though there are still conflicts
to solve (e.g. a second Wget instance is faster: should we overwrite existing
files / directories?).

3. Keep html/css files in memory after downloading. These are the ones we
later re-read to parse them for links/URLs. Write them to disk after parsing,
using a tempfile and a move/rename to get atomicity.

4. Use (advisory) file locks. This only helps against other Wget instances (is
that enough?). And with -k you have to keep the descriptor open for each file
until Wget is done downloading everything. This is not practical, since
there could be tens or hundreds of thousands of files to download.
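
The tempfile-and-rename step from points 2 and 3 could look roughly like the
sketch below. The helper name write_file_atomically() is invented for this
example and is not an existing Wget function. It relies on rename() replacing
the target atomically when both names are on the same filesystem, which is
why the temporary file is created next to the destination.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not part of Wget.
 * Writes `len` bytes to "<path>.XXXXXX" and renames it over `path`.
 * Returns 0 on success, -1 on failure. */
int write_file_atomically(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                          /* path too long for this sketch */

    int fd = mkstemp(tmp);                  /* created with mode 0600 */
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        left -= (size_t)w;
    }

    if (fsync(fd) != 0)                     /* data on disk before it gets its name */
        goto fail;
    if (close(fd) != 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    if (rename(tmp, path) != 0)             /* readers see old or new, never a mix */
        goto fail;
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

This version silently overwrites an existing file, so the last writer wins.
If the first writer should win instead, link(tmp, path) followed by
unlink(tmp) is one option, since link() fails with EEXIST when the target
already exists.
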

If someone would like to work on a patch, here is my opinion: I would
implement 1., as it is the least complex to code (though it needs more CPU).
Point 4 would not work in any case.
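
For point 1, a minimal sketch of the idea: update a running checksum for every
chunk written during the download, remember it together with the length, and
compare both when the file is re-read for parsing. The names fnv1a_update()
and file_matches() are hypothetical. FNV-1a is used here only because it fits
in a few lines; it is an assumption of this sketch, and a real patch would
probably use a stronger hash.

```c
#include <stdint.h>
#include <stdio.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL    /* 64-bit FNV-1a offset basis */
#define FNV_PRIME  0x100000001b3ULL         /* 64-bit FNV prime */

/* Fold `len` more bytes into the running checksum `h`.
 * Start with h = FNV_OFFSET and call this for each chunk written. */
uint64_t fnv1a_update(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical check before parsing, not part of Wget.
 * Returns 1 if the file still has the recorded length and checksum,
 * 0 if it was changed, -1 if it could not be read. */
int file_matches(const char *path, uint64_t expected_sum, size_t expected_len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    unsigned char buf[8192];
    uint64_t h = FNV_OFFSET;
    size_t total = 0, n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        h = fnv1a_update(h, buf, n);
        total += n;
    }

    int err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    return total == expected_len && h == expected_sum;
}
```

When file_matches() returns 0, Wget would know the file name and the expected
size, so it could print a message much like the one Ángel suggested.
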
Regards, Tim