bug-guix

bug#20597: ‘unlinkat’ bug in Linux 4.0.2 leads to tar test failure


From: Pádraig Brady
Subject: bug#20597: ‘unlinkat’ bug in Linux 4.0.2 leads to tar test failure
Date: Sun, 24 May 2015 12:57:56 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 24/05/15 12:33, Ludovic Courtès wrote:
> (Please keep address@hidden Cc'd.)
> (Gnulib: please scroll further down for the ‘unlinkat’ issue.)
> 
> Andy Patterson <address@hidden> skribis:
> 
>>> I suppose this is Guix 0.8.2 on top of another distribution, right?  Did
>>> you install from source or from the binary tarball?  Did you enable
>>> substitutes (info "(guix) Substitutes")?
>>
>> I was using the USB install medium in a live environment.
> 
> So this is on GuixSD 0.8.2.  ‘test-suite.log’ indeed mentions
> Linux-libre 4.0.2.
> 
>> I had substitutes enabled (I'm pretty sure they're enabled by default
>> here, but I also enabled them manually just to be sure). I wasn't able
>> to install anything with substitutes enabled; it would always stall
>> while trying to update the substitutes list from hydra. When my
>> network went down briefly, it informed me that it was still at 0.0%
>> before exiting. I think that this is probably a separate issue, but
>> one which I was less concerned about since I didn't want to use
>> substitutes anyway.
> 
> OK.
> 
> hydra.gnu.org is unfortunately too often overloaded these days, so you
> probably arrived on a bad day.  Nevertheless, the solution to this
> specific issue is for you to use substitutes to circumvent the bug
> described below.
> 
>>> Does the build succeed if you run it another time with:
>>>
>>>   guix build tar -K -c 1
>>
>> I tried this (with --no-substitutes), but I don't think the test suite
>> actually runs in parallel. I didn't notice any difference in that regard
>> when it was running; it seemed to take up the same amount of time with
>> or without -c 1. I had the same tests fail with the flag enabled.
> 
> Oh you must be right.  Looking at tests/Makefile.in, I see:
> 
> --8<---------------cut here---------------start------------->8---
> check-local: atconfig atlocal $(TESTSUITE)
>       $(SHELL) $(TESTSUITE) $(TESTSUITEFLAGS)
> --8<---------------cut here---------------end--------------->8---
> 
> ... which shows that ./testsuite is not automatically passed -j,
> contrary to what I thought.
> 
> <http://lists.gnu.org/archive/html/bug-tar/2014-08/msg00010.html>
> reports a similar issue but on a different OS.
> 
> I just tried this in a GuixSD VM with Linux-libre 4.0.2:
> 
> --8<---------------cut here---------------start------------->8---
>   mkdir foo
>   mkdir bar
>   echo foo/foo_file > foo/foo_file
>   echo bar/bar_file > bar/bar_file
>   tar -cvf foo.tar --remove-files -C foo . -C ../bar .
>   find .
>   stat bar
> --8<---------------cut here---------------end--------------->8---
> 
> And indeed, it fails (that is, ‘bar’ is left behind.)  It works fine on
> 4.0.4-gnu though.
> 
> On 4.0.2-gnu, I strace’d the ‘tar’ command above:
> 
> --8<---------------cut here---------------start------------->8---
> openat(AT_FDCWD, "foo", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
> 
> [...]
> 
> openat(4, ".", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 5
> 
> [...]
> 
> openat(5, "foo_file", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 6
> 
> [...]
> 
> openat(4, "../bar", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> newfstatat(5, ".", {st_mode=S_IFDIR|0755, st_size=60, ...}, AT_SYMLINK_NOFOLLOW) = 0
> openat(5, ".", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 6
> 
> [...]
> 
> openat(6, "bar_file", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 7
> fstat(7, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
> write(1, "./bar_file\n", 11)            = 11
> read(7, "x\n", 2)                       = 2
> fstat(7, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
> close(7)                                = 0
> fstat(6, {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
> brk(0x1a34000)                          = 0x1a34000
> close(6)                                = 0
> write(3, "./\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
> 10240) = 10240
> close(3)                                = 0
> unlinkat(4, "foo_file", 0)              = 0
> unlinkat(AT_FDCWD, "foo", AT_REMOVEDIR) = 0
> unlinkat(5, "bar_file", 0)              = 0
> unlinkat(4, "../bar", AT_REMOVEDIR)     = -1 ENOENT (No such file or directory)
> --8<---------------cut here---------------end--------------->8---
> 
> Contrast this with the same thing on 4.0.4-gnu:
> 
> --8<---------------cut here---------------start------------->8---
> unlinkat(4, "foo_file", 0)              = 0
> unlinkat(AT_FDCWD, "foo", AT_REMOVEDIR) = 0
> unlinkat(5, "bar_file", 0)              = 0
> unlinkat(4, "../bar", AT_REMOVEDIR)     = 0
> --8<---------------cut here---------------end--------------->8---
> 
> So this looks like a 4.0.2 kernel bug that Gnulib’s unlinkat should
> perhaps work around.
> 
> Thoughts?

Maybe. How widely deployed was 4.0.2? (It's not used in Red Hat land,
for example.) How many versions was the bug present for?
If it was just a fleeting issue, then there is less incentive to
work around it.
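
To check other kernels without involving tar at all, the tail of the
trace above could be replayed directly. What follows is only a minimal
sketch in Python, not something taken from tar or Gnulib: the function
name replay_unlinkat_sequence is made up for illustration, a descriptor
for a scratch directory stands in for AT_FDCWD, and it assumes a platform
where os.open, os.unlink and os.rmdir accept dir_fd. It returns None when
the final rmdir succeeds and the errno name otherwise.

```python
import errno
import os
import shutil
import tempfile


def replay_unlinkat_sequence():
    """Replay the unlinkat calls from the trace in a scratch directory.

    Returns None if the final rmdir succeeds, else the errno name.
    """
    workdir = tempfile.mkdtemp()
    flags = os.O_RDONLY | os.O_DIRECTORY
    fds = []
    try:
        # Same layout as the reproducer: foo/foo_file and bar/bar_file.
        for name in ("foo", "bar"):
            os.mkdir(os.path.join(workdir, name))
            with open(os.path.join(workdir, name, name + "_file"), "w") as f:
                f.write("x\n")

        top = os.open(workdir, flags)            # stands in for AT_FDCWD
        fds.append(top)
        foo_fd = os.open("foo", flags, dir_fd=top)        # fd 4 in the trace
        fds.append(foo_fd)
        bar_fd = os.open("../bar", flags, dir_fd=foo_fd)  # fd 5 in the trace
        fds.append(bar_fd)

        os.unlink("foo_file", dir_fd=foo_fd)
        os.rmdir("foo", dir_fd=top)
        os.unlink("bar_file", dir_fd=bar_fd)
        try:
            # The call that returned ENOENT on 4.0.2-gnu: "../bar" is
            # looked up relative to the already-removed "foo".
            os.rmdir("../bar", dir_fd=foo_fd)
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))
        return None
    finally:
        for fd in fds:
            os.close(fd)
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    result = replay_unlinkat_sequence()
    verdict = "ok" if result is None else "failed: " + result
    print(os.uname().release, verdict)
```

Running it on a few kernel versions would show how long the problem
was around, and whether it is independent of tar.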

cheers,
Pádraig





