bug-tar

Re: tar extract errors occurring rarely but with same symptoms


From: Daniel Villeneuve
Subject: Re: tar extract errors occurring rarely but with same symptoms
Date: Sun, 6 Jun 2021 01:23:16 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0

On 6/5/21 4:24 PM, Paul Eggert wrote:
> On 6/5/21 9:39 AM, Daniel Villeneuve wrote:
>
>> And when a problem like the above occurs, our process tries it again and it 
>> usually succeeds (that is, I've never witnessed two successive failures).
>
> When you say "it usually succeeds", do you mean that you extract from the exact 
> same tarball and it works sometimes, but not others? Or that you generate a new tarball 
> and the extraction fails from the new tarball?

The same tarball is extracted anew.

> Can you supply an example of a tarball where extraction failed?

Unfortunately, these tarballs contain proprietary information.
I might try to build something similar and expect a failure on _this_ instance.

>> # suspicious SIGCHLD
>> wait4(47737, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 47737
>> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=47737, 
>> si_uid=759245037, si_status=0, si_utime=674, si_stime=37} ---

> That looks OK to me; tar is being signaled that gzip exited.

Indeed, I have 58 extracts (with 2 failures) that all have this wait4+SIGCHLD.
It happens that the 2 failures are the only ones for which the SIGCHLD happens 
so late (just before newfstatat).
As you say, maybe it's not relevant.

>> # some successful file time updates
>
> Yes, these are from tar creating delayed symlinks (or hardlinks to symlinks) 
> when the symlink contents are dicey and could have caused problems if they had 
> been extracted earlier.

>> # first errors
>> newfstatat(AT_FDCWD, "XXXXXX_config/static/users", 0x7ffc9b136f50, 
>> AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
>
> This looks like the start of another attempt to create a delayed symlink; the 
> first step is to get the status of the placeholder file. Please go back to 
> earlier parts of the strace output, and look for references to this same 
> filename. What syscalls do you see?

This is what happens earlier in the traces, both the one which fails and the 
one which succeeds:

openat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", 
O_WRONLY|O_CREAT|O_EXCL, 000) = 4
fstat(4, {st_mode=S_IFREG|000, st_size=0, ...}) = 0
close(4)                                = 0
read(3, "XXXXXX_config/static/users\0\0\0\0\0\0"..., 10240) = 10240
openat(AT_FDCWD, "XXXXXX_config/static/users", O_WRONLY|O_CREAT|O_EXCL, 000) = 4
fstat(4, {st_mode=S_IFREG|000, st_size=0, ...}) = 0
close(4)                                = 0

where unlockInLine.bm is a symlink to an existing file, and users is a symlink 
to a non-existent file.
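The placeholder step in the trace above (openat with O_WRONLY|O_CREAT|O_EXCL and mode 000, then fstat, then close) can be sketched in Python roughly as follows. This is an illustrative approximation of what the syscalls show, not GNU tar's code; the function name create_placeholder is hypothetical, and returning (st_dev, st_ino) assumes that this identity is what a later phase checks before replacing the file.

```python
import os


def create_placeholder(path):
    """Hypothetical sketch: create an empty mode-000 regular file at `path`
    and return its (st_dev, st_ino), mirroring the traced sequence
        openat(AT_FDCWD, path, O_WRONLY|O_CREAT|O_EXCL, 000)
        fstat(fd)
        close(fd)
    """
    # O_EXCL makes creation fail (FileExistsError) if anything already
    # exists at `path`, including a symlink.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o000)
    try:
        st = os.fstat(fd)
    finally:
        os.close(fd)
    # Assumed identity used later to recognize the untouched placeholder.
    return st.st_dev, st.st_ino
```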

At the end of the extract that fails:

# a few files with correct time settings
newfstatat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 0) = 0
linkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el6-gnu/jdk1.8.0", AT_FDCWD, 
"prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 0) = 0
newfstatat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 0) = 0
linkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el6-gnu/jdk1.8.0", AT_FDCWD, 
"prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 0) = 0
# first error (symlink to a non-existent file)
newfstatat(AT_FDCWD, "XXXXXX_config/static/users", 0x7ffc9b136f50, 
AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
# next error (symlink to an existing file)
newfstatat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", 0x7ffc9b136f50, 
AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
# all subsequent newfstatat calls also fail with ENOENT
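To pull the failing paths out of a full trace, something like the following Python sketch could be used. It assumes raw strace output with one syscall per line (the lines quoted here were wrapped by the mail client), and it does not handle escaped quotes inside file names. The helper name missing_placeholders is hypothetical. ENOENT from a no-follow stat can be perfectly normal earlier in an extraction, so only matches in the final delayed-link phase are of interest.

```python
import re

# Matches a strace line such as:
#   newfstatat(AT_FDCWD, "a/b", 0x7ffc..., AT_SYMLINK_NOFOLLOW) = -1 ENOENT (...)
_ENOENT_LSTAT = re.compile(
    r'newfstatat\(AT_FDCWD, "(?P<path>[^"]*)", .*AT_SYMLINK_NOFOLLOW\) = -1 ENOENT'
)


def missing_placeholders(lines):
    """Return, in trace order, every path whose no-follow stat got ENOENT."""
    found = []
    for line in lines:
        m = _ENOENT_LSTAT.search(line)
        if m:
            found.append(m.group("path"))
    return found
```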

At the end of the extract that succeeds:

# a few files with correct time settings
newfstatat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 0) = 0
linkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el6-gnu/jdk1.8.0", AT_FDCWD, 
"prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 0) = 0
newfstatat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 0) = 0
linkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el6-gnu/jdk1.8.0", AT_FDCWD, 
"prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 0) = 0
# here, a success
newfstatat(AT_FDCWD, "XXXXXX_config/static/users", {st_mode=S_IFREG|000, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "XXXXXX_config/static/users", 0) = 0
symlinkat("../../gui/config/static/users", AT_FDCWD, 
"XXXXXX_config/static/users") = 0
utimensat(AT_FDCWD, "XXXXXX_config/static/users", [UTIME_OMIT, 
{tv_sec=978325200, tv_nsec=0} /* 2001-01-01T00:00:00-0500 */], AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "XXXXXX_config/static/users", {st_mode=S_IFLNK|0777, 
st_size=29, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "XXXXXX_config/static/users", 
O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 3
newfstatat(3, "", {st_mode=S_IFLNK|0777, st_size=29, ...}, AT_EMPTY_PATH) = 0
close(3)                                = 0
# followed by other successes
newfstatat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", 
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", 0) = 0
symlinkat("../../gui/config/static/unlockInLine.bm", AT_FDCWD, 
"XXXXXX_config/static/unlockInLine.bm") = 0
utimensat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", [UTIME_OMIT, 
{tv_sec=978325200, tv_nsec=0} /* 2001-01-01T00:00:00-0500 */], AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", 
{st_mode=S_IFLNK|0777, st_size=39, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", 
O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 3
newfstatat(3, "", {st_mode=S_IFLNK|0777, st_size=39, ...}, AT_EMPTY_PATH) = 0
close(3)                                = 0
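The successful sequence above (stat the placeholder without following links, unlink it, create the symlink, then set the link's own mtime) can be approximated in Python as below. It is a sketch that mirrors the traced syscalls, not GNU tar's implementation; the function name replace_placeholder and its "missing"/"changed"/"replaced" results are hypothetical. Python's os.utime has no equivalent of UTIME_OMIT, so the sketch re-reads and reuses the link's current atime, which, unlike the traced call, is not atomic. The "missing" branch corresponds to the ENOENT seen in the failing trace. As a side check, the st_size values 29 and 39 in the trace are exactly the lengths of the two link targets.

```python
import os


def replace_placeholder(path, target, expected_id, mtime_ns):
    """Hypothetical sketch: swap the placeholder at `path` for a symlink.

    expected_id is the (st_dev, st_ino) recorded when the placeholder was
    created. Returns "missing" if the placeholder is gone (ENOENT),
    "changed" if something else is at `path`, otherwise "replaced".
    """
    try:
        st = os.lstat(path)          # newfstatat(..., AT_SYMLINK_NOFOLLOW)
    except FileNotFoundError:
        return "missing"
    if (st.st_dev, st.st_ino) != expected_id:
        return "changed"
    os.unlink(path)                  # unlinkat(AT_FDCWD, path, 0)
    os.symlink(target, path)         # symlinkat(target, AT_FDCWD, path)
    # Approximates utimensat(..., [UTIME_OMIT, mtime], AT_SYMLINK_NOFOLLOW):
    # keep the link's current atime and set its mtime, without following it.
    atime_ns = os.lstat(path).st_atime_ns
    os.utime(path, ns=(atime_ns, mtime_ns), follow_symlinks=False)
    return "replaced"
```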

> In the original data, what is the file XXXXXX_config/static/users? I assume it's a symlink; what 
> does it point to? Also, what are "XXXXXX_config" and "XXXXXX_config/static"? Are 
> they symlinks to directories?

XXXXXX_config/static/users: symlink to a non-existent file
XXXXXX_config/static: real directory
XXXXXX_config: real directory

> Was XXXXXX_config one of the files you explicitly mentioned when creating the 
> tarball? Did it follow other files?

Yes, it was one of the directory entries named explicitly.
This is a plain tar of a directory, except that its first-level entries are
listed explicitly, in a different order, and only those that actually exist
are included.
No directory is "finalized" more than once (which, as I understand it, would
require --delay-directory-restore).
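An archive with a similar shape, for attempting a shareable reproduction, could be built with Python's tarfile module as sketched below. This is an assumption-laden stand-in: the member order, the gui/ directory contents and the file data are hypothetical, the hardlinked prebuilt/ entries are left out, and tarfile's default header format (PAX) differs from GNU tar's, so the result is not equivalent to an archive made by GNU tar. The link targets and the timestamp are copied from the traces; the targets contain "..", which is presumably what makes tar treat them as dicey and delay them. Extraction would then be repeated with GNU tar to look for the failure.

```python
import io
import tarfile

# Timestamp seen in the traced utimensat calls (2001-01-01T00:00:00-0500).
MTIME = 978325200


def _member(name, kind, **attrs):
    """Build a TarInfo of the given type with the traced mtime."""
    info = tarfile.TarInfo(name)
    info.type = kind
    info.mtime = MTIME
    for key, value in attrs.items():
        setattr(info, key, value)
    return info


def build_repro_archive(path):
    """Hypothetical sketch: write a gzip'd tarball mimicking the layout
    described in this thread."""
    with tarfile.open(path, "w:gz") as tar:
        # Hypothetical first-level order: XXXXXX_config/ before gui/.
        for d in ("XXXXXX_config", "XXXXXX_config/static"):
            tar.addfile(_member(d, tarfile.DIRTYPE, mode=0o755))
        # Symlink to a file that does exist in the archive.
        tar.addfile(_member("XXXXXX_config/static/unlockInLine.bm", tarfile.SYMTYPE,
                            linkname="../../gui/config/static/unlockInLine.bm"))
        # Dangling symlink: its target is never added.
        tar.addfile(_member("XXXXXX_config/static/users", tarfile.SYMTYPE,
                            linkname="../../gui/config/static/users"))
        for d in ("gui", "gui/config", "gui/config/static"):
            tar.addfile(_member(d, tarfile.DIRTYPE, mode=0o755))
        data = b"hypothetical bitmap contents\n"
        tar.addfile(_member("gui/config/static/unlockInLine.bm", tarfile.REGTYPE,
                            size=len(data), mode=0o644), io.BytesIO(data))
```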

> Please send along any data that could help us reproduce the bug. Thanks.

I have not managed to reproduce the problem myself, with any archive I have
created. I've only observed it from time to time, across many runs.

I've diffed the strace outputs, and apart from addresses and read sizes, the
only differences are the point at which SIGCHLD is received and the tail of the
trace starting at the first error on the users symlink.
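A diff that ignores the two kinds of noise mentioned (pointer addresses and read sizes) could be sketched in Python as follows. It assumes traces recorded without -f, -tt or similar options, so that lines carry no PID or timestamp prefix; the function names normalize and trace_diff are hypothetical. Read lines are collapsed to just their file descriptor, since both the data preview and the byte count vary with how the input is chunked.

```python
import difflib
import re

# Hex pointers such as 0x7ffc9b136f50 differ between runs.
_ADDR = re.compile(r"0x[0-9a-f]+")
# A read() call; only its file descriptor is kept.
_READ = re.compile(r"^read\((\d+), .*")


def normalize(line):
    """Strip run-to-run noise from one strace line."""
    line = line.rstrip("\n")
    m = _READ.match(line)
    if m:
        # Drop the data preview and byte count of read().
        return f"read({m.group(1)}, ...)"
    return _ADDR.sub("0xADDR", line)


def trace_diff(ok_lines, bad_lines):
    """Unified diff of two traces after normalization (empty if identical)."""
    a = [normalize(l) for l in ok_lines]
    b = [normalize(l) for l in bad_lines]
    return list(difflib.unified_diff(a, b, "ok.trace", "bad.trace", lineterm=""))
```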

I'll try to find a way to share a failing tarball.

Regards,
--
Daniel Villeneuve




