[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: false-positive failure of the root-removal test
From: |
Pádraig Brady |
Subject: |
Re: false-positive failure of the root-removal test |
Date: |
Thu, 15 Oct 2015 02:44:40 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 |
On 14/10/15 18:43, Jim Meyering wrote:
> Running a massively parallel "make very-expensive-check"
> (-j73 on a 48-core system), the rm/r-root.sh test would fail
> about 1-in-2 or 1-in-3 trials due to expiration of the 2-second
> timeout here:
>
> diff --git a/tests/rm/r-root.sh b/tests/rm/r-root.sh
> index c06332a..4e645e6 100755
> --- a/tests/rm/r-root.sh
> +++ b/tests/rm/r-root.sh
> @@ -88,7 +88,7 @@ exercise_rm_r_root ()
> skip_exit='CU_TEST_SKIP_EXIT=1'
> fi
>
> - timeout --signal=KILL 2 \
> + timeout --signal=KILL 5 \
> env LD_PRELOAD=$LD_PRELOAD:./k.so $skip_exit \
> rm -rv --one-file-system "$@" > out 2> err
>
> I made the above change and observed that the whole test then
> succeeded 6 times in a row. Then I read the comment above that change:
>
> # exercise_rm_r_root: shell function to test "rm -r '/'"
> # The caller must provide the FILE to remove as well as any options
> # which should be passed to 'rm'.
> # Paranoia mode on:
> # For the worst case where both rm(1) would fail to refuse to process the "/"
> # argument (in the cases without the --no-preserve-root option), and
> # intercepting the unlinkat(1) system call would fail (which actually already
> # has been proven to work above), and the current non root user has
> # write access to "/", limit the damage to the current file system via
> # the --one-file-system option.
> # Furthermore, run rm(1) via timeout(1) that kills that process after
> # a maximum of 2 seconds.
>
> So maybe compromise at 3 seconds (with that, it's passed 4 times so far)?
> Probably better still: I'll remember this and decrease -j's argument from
> 1+3N/2 to something slightly less abusive.
The comment above that is a bit more explicit, i.e.
# This isn't terribly expensive, but it must not be run under heavy load.
# The reason is the conservative 'timeout' setting below to limit possible
# damage in the worst case which yields a race under heavy load.
# Marking this test as "expensive" therefore is a compromise, i.e., adding
# this test to the list ensures it still gets _some_ (albeit minimal)
# coverage while not causing false-positive failures in day to day runs.
It would be better to avoid the race of course.
timeout(1) isn't great protection anyway
as 2 seconds allows for a lot of unlink calls.
Perhaps leveraging gdb to limit the number of unlink calls is better.
...
So I had a look and ended up debugging the debugger :/
https://sourceware.org/bugzilla/show_bug.cgi?id=10079
...
anyway the attached leverages gdb to add the protection
which should be better and free of races.
cheers,
Pádraig.
coreutils-rm-root-test-race.patch
Description: Text Data
- false-positive failure of the root-removal test, Jim Meyering, 2015/10/14
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test,
Pádraig Brady <=
- Re: false-positive failure of the root-removal test, Bernhard Voelker, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Bernhard Voelker, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15
- Re: false-positive failure of the root-removal test, Pádraig Brady, 2015/10/15
- Re: false-positive failure of the root-removal test, Jim Meyering, 2015/10/15