bug#22533: Python bytecode reproducibility


From: Marius Bakke
Subject: bug#22533: Python bytecode reproducibility
Date: Tue, 06 Mar 2018 00:21:21 +0100
User-agent: Notmuch/0.26 (https://notmuchmail.org) Emacs/25.3.1 (x86_64-pc-linux-gnu)

Ricardo Wurmus <address@hidden> writes:

> I have applied this patch locally:
>
> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
> index 5f701701a..0d1ecc3c6 100644
> --- a/gnu/packages/python.scm
> +++ b/gnu/packages/python.scm
> @@ -359,8 +359,42 @@ data types.")
>                                "Lib/ctypes/test/test_win32.py" ; fails on aarch64
>                                "Lib/test/test_fcntl.py")) ; fails on aarch64
>                    #t))))
> -    (arguments (substitute-keyword-arguments (package-arguments python-2)
> -                 ((#:tests? _) #t)))
> +    (arguments
> +     (substitute-keyword-arguments (package-arguments python-2)
> +       ((#:tests? _) #t)
> +       ((#:phases phases)
> +        `(modify-phases ,phases
> +           (add-after 'unpack 'patch-timestamp-for-pyc-files
> +             (lambda _
> +               ;; We set DETERMINISTIC_BUILD to only override the mtime when
> +               ;; building with Guix, lest we break auto-compilation in
> +               ;; environments.
> +               (setenv "DETERMINISTIC_BUILD" "1")
> +               (substitute* "Lib/py_compile.py"
> +                 (("source_stats\\['mtime'\\]")
> +                  "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
> +
> +               ;; Use deterministic hashes for strings, bytes, and datetime
> +               ;; objects.
> +               (setenv "PYTHONHASHSEED" "0")
> +
> +               ;; Reset mtime when validating bytecode header.
> +               (substitute* "Lib/importlib/_bootstrap_external.py"
> +                 (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
> +                  "source_mtime = 1"))
> +               #t))
> +           (add-after 'unpack 'disable-timestamp-tests
> +             (lambda _
> +               (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
> +                 (("test_bad_marshal")
> +                  "disable_test_bad_marshal")
> +                 (("test_no_marshal")
> +                  "disable_test_no_marshal")
> +                 (("test_non_code_marshal")
> +                  "disable_test_non_code_marshal"))
> +               #t))
> +           (add-before 'check 'allow-non-deterministic-compilation
> +             (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
>      (native-search-paths
>       (list (search-path-specification
>              (variable "PYTHONPATH")
>
> It allows me to build python-six and python-sip reproducibly.  It does
> not fix problems with Python 2, and I haven’t yet tested if it causes
> any new problems.
>
> It’s a little worrying that I had to disable three more tests that I
> think shouldn’t have failed.

Wow, nice work!  I can't tell what's going on with the tests; they do
some bytecode manipulation.  Maybe they do not expect the low
timestamp somehow?

https://github.com/python/cpython/blob/374c6e178a7599aae46c857b17c6c8bc19dfe4c2/Lib/test/test_importlib/source/test_file_loader.py#L457-L484
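
For context on why the timestamp matters at all, here is a rough sketch
of how the source mtime ends up in the .pyc header.  It assumes Python
3.7 or newer (for the `invalidation_mode` argument and the 16-byte
header layout; 3.6 uses a 12-byte header with the mtime at a different
offset).  The helper names `compile_with_mtime` and `header_mtime` are
made up for illustration, and only the header field is inspected here.

```python
import os
import py_compile
import tempfile


def compile_with_mtime(source_path, pyc_path, mtime):
    """Set the source file's mtime, byte-compile it, and return the .pyc bytes."""
    os.utime(source_path, (mtime, mtime))
    py_compile.compile(
        source_path,
        cfile=pyc_path,
        doraise=True,
        # Force the timestamp-based header even if SOURCE_DATE_EPOCH is set.
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as f:
        return f.read()


def header_mtime(pyc_bytes):
    """Extract the mtime field, assuming the 3.7+ layout:
    magic (4 bytes), flags (4), mtime (4), source size (4)."""
    return int.from_bytes(pyc_bytes[8:12], "little")


def demo():
    """Compile the same source with two different mtimes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        first = compile_with_mtime(src, os.path.join(tmp, "a.pyc"), 1000000000)
        second = compile_with_mtime(src, os.path.join(tmp, "b.pyc"), 1500000000)
    return first, second
```

The two results differ even though the source text is identical, which
is the non-determinism the patch removes by pinning the mtime to 1.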

I guess we'll do at least one 'core-updates' before 3.7 is released, so
it makes sense to include this.  It should also give us some experience
that might be relevant for 2.7, since it probably won't get the upstream
reproducibility patch that relies on 3.7 features.

The only remark I have is: is introducing a new variable necessary?
SOURCE_DATE_EPOCH implies that the user wants a deterministic build;
the upstream patch doesn't actually honor it outside of making the
hashing method deterministic.  So, I think it might be enough to just
test for SOURCE_DATE_EPOCH instead of DETERMINISTIC_BUILD.  The former
is also already set in the build environment.
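
To make the suggestion concrete, this is roughly the decision the
substituted expression in Lib/py_compile.py would make if it tested
SOURCE_DATE_EPOCH.  The function `effective_mtime` is a hypothetical
stand-in written for this message, not something that exists in
CPython or in the patch.

```python
import os


def effective_mtime(source_mtime, environ=os.environ):
    """Return the mtime to record in the bytecode header.

    Hypothetical helper: mirrors the patched expression, but keys on
    SOURCE_DATE_EPOCH rather than a new DETERMINISTIC_BUILD variable.
    """
    if "SOURCE_DATE_EPOCH" in environ:
        # Deterministic build requested: use the same fixed value as the patch.
        return 1
    # Otherwise keep the real timestamp so auto-compilation still works.
    return int(source_mtime)
```

Outside the build environment the variable is normally unset, so users
would keep the regular mtime-based invalidation.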

However, I just noticed that you unset DETERMINISTIC_BUILD before the
'check' phase.  Did it break more things?

I suppose we'll have to set PYTHONHASHSEED somewhere in
python-build-system as well.  Did you check if that makes a difference
for numpy?  Perhaps it's enough to set it if we add an auto-compilation
step?
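
A quick way to see what PYTHONHASHSEED changes, as a sketch only: run a
child interpreter with a fixed seed and compare string hashes across
runs.  The helper name `string_hash_in_child` and the sample string are
arbitrary choices of mine.

```python
import os
import subprocess
import sys


def string_hash_in_child(seed, text="guix"):
    """Return hash(text) as computed by a fresh interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout)
```

With the seed fixed, repeated runs give the same hash.  With hash
randomization left on, the value normally changes from one run to the
next, and so can the iteration order of sets of strings, which is how
it can leak into compiled output.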
