From: Bengt Richter
Subject: bug#41669: Cross-compiled powerpc64-linux bootstrap-tarballs not reproducible
Date: Thu, 11 Jun 2020 00:20:08 +0200
User-agent: Mutt/1.10.1 (2018-07-13)

Hi Chris, et al,

On +2020-06-09 23:15:01 -0700, Chris Marusich wrote:
> Hi Vincent and everyone,
> Vincent Legoll <vincent.legoll@gmail.com> writes:
> > Is that showing the same (or a similar) problem :
> >
> > https://data.guix-patches.cbaines.net/gnu/store/0lcbxpw1vrca02dzpzw2rxhad7pn4zw7-gcc-objc-5.5.0
> >
> > ?
> Can you clarify what you mean?  I'm not sure what you're referring to.
> Chris Marusich <cmmarusich@gmail.com> writes:
> > At present, it seems possible that within the context of a single
> > machine, gcc-stripped-tarball-5.5.0.drv builds reproducibly, but on a
> > different machine, it may (reproducibly) build a different output.
> > I'm a bit paranoid about making mistakes, so I'll perform another full
> > GC and then try yet again to build gcc-stripped-tarball-5.5.0.drv in
> > order to verify whether it truly produces the same output when all (or
> > nearly all) of its inputs are rebuilt from scratch.
> I repeated the experiment on the same machine (it took a day or two to
> build), and the result was the same: on my machine,
> gcc-stripped-tarball-5.5.0.drv builds identical output to what it built
> before. To be clear, using Guix 8159ce1970d91567468cf1bacac313099a009d2a
> on an x86_64-linux machine, I tried (yet again) the following steps:

> Efraim's diff looks a little different in statx.h, even though he used
> the same Guix commit as me.  Maybe this is because he cross-compiled on
> an aarch64-linux machine, while I cross-compiled on an x86_64-linux
> machine.  In the other cases, it looks like the binary files differ in
> basically the same ways.  I will share some examples below.
> Here is some diffoscope output between my c++ and Efraim's (many other
> sections also differed in similarly cryptic ways):

> If I'm reading this correctly, one problem seems to be that our GCC
> toolchains are putting symbols at different locations.  This issue (and
> maybe others) could be trickling down, causing other aspects of the
> binaries to differ (e.g., in length).  Nothing really stands out, but
> when we discussed this on IRC, we thought perhaps factors like the
> following might contribute to the non-reproducibility:
> - Perhaps we are all running different Linux kernel versions?  In some
>   cases, the kernel version can unfortunately influence the build
>   output, so this might be worth testing.
> - Perhaps the GCC Makefiles etc. are doing something non-deterministic?

Questions triggered in my mind:

Where are respective machines getting their rules for packing and
aligning structs and unions?

Is any struct definition or rule/flag source generated dynamically, so
that different rules could come from different defaults, from .config
files, or even from stale memoized results crossing from one domain
into another?

Could pointer arithmetic be done in one domain and the resulting offset
be misused in another? Could the wrong C preprocessor be picked up?
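
To make the offset question concrete, here is a small sketch in Python
(not anything taken from the GCC build). It assumes a made-up record of
one char followed by one 32-bit integer, and shows what happens when a
padded offset is applied to data that was written without padding:

```python
import struct

# Hypothetical record: one char tag followed by one 32-bit unsigned int.
# "<" means little-endian, standard sizes, no alignment padding.
packed_size = struct.calcsize("<cI")   # 1 + 4 bytes, no padding
native_size = struct.calcsize("@cI")   # whatever this machine's C ABI uses

record = struct.pack("<cI", b"x", 0x01020304)
tag, value = struct.unpack("<cI", record)

# Misuse: read the integer at offset 4, as if the writer had padded the
# char out to a 4-byte boundary. The buffer is zero-filled to 8 bytes so
# the read stays in bounds.
(wrong,) = struct.unpack_from("<I", record.ljust(8, b"\0"), 4)

print(packed_size, native_size)
print(hex(value), hex(wrong))
```

The two reads disagree although both look at "the same" field, which is
the kind of mismatch I was wondering about.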

Could the sort-key comparisons used to canonicalize ordering differ
between machines?
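
As an illustration only, the following Python sketch sorts the same
made-up file names with two different keys. The case-folded key is just
a stand-in for a locale-aware collation; real locale rules differ in
detail, and I do not know whether anything in this build sorts this way:

```python
# Hypothetical file names, chosen to mix upper case, lower case and "_".
names = ["Makefile", "a.out", "README", "b.o", "_start.o"]

# Byte-wise ordering, as in the C locale: upper-case ASCII sorts first.
bytewise = sorted(names, key=lambda s: s.encode("utf-8"))

# Case-insensitive ordering, a rough stand-in for a non-C locale.
folded = sorted(names, key=str.casefold)

print(bytewise)
print(folded)
```

If an archive or link step consumed such a list, the two orders would
yield different bytes from identical inputs.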

Hope that's not all red herrings :)
Sorry for the noise otherwise.

> - Something else?

Hm, maybe a race condition between processes that should be
order-independent but are not?

Then, if different hardware components on different systems -- disks,
memory, processors -- cause different but repeatable patterns of waits
(convoying?), you could get builds that are repeatable on each machine
but differ between machines.

I guess you'd have to figure out which order was really right, and force
the order of processing explicitly to that order, so all systems would
do it that way.
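
A toy sketch of what I mean, in Python. The two lists are invented
stand-ins for the order in which parallel jobs happen to finish on two
machines; nothing here is measured from a real build:

```python
import hashlib

def digest(chunks):
    """Hash (name, data) pairs in the order they are given."""
    h = hashlib.sha256()
    for name, data in chunks:
        h.update(name.encode("utf-8"))
        h.update(data)
    return h.hexdigest()

# Hypothetical: the same work units, finishing in a different order.
machine_a = [("a.o", b"AAA"), ("b.o", b"BBB"), ("c.o", b"CCC")]
machine_b = [("b.o", b"BBB"), ("a.o", b"AAA"), ("c.o", b"CCC")]

# Combining in arrival order bakes the timing into the result.
arrival_differs = digest(machine_a) != digest(machine_b)

# Forcing one explicit order before combining removes the dependence.
sorted_same = digest(sorted(machine_a)) == digest(sorted(machine_b))

print(arrival_differs, sorted_same)
```

Each machine on its own would keep producing the same digest, which
would fit the "reproducible here, different there" pattern.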

> Avenues of investigation:
> - If anything obvious stands out from the diffoscope output, please
>   leave a comment.
> - Try building with different kernel versions on the same machine, to
>   see if they differ.
> - If somebody else could please confirm that running the following
>   command reports no difference on their own machine (i.e., exit code
>   0), that would be good to know, since it would help further solidify
>   the theory that on a single machine, the build of gcc-static-5.5.0.drv
>   is reproducible, even if it is not reproducible across machines:
>   guix build --no-substitutes --check --target=powerpc64-linux-gnu \
>        -e '(@@ (gnu packages make-bootstrap) %gcc-static)'
> - Try building two different versions of gcc-7.5.0 (maybe by hand?), and
>   then use them to build a simple reproduction case and compare results.
>   If we're lucky, maybe this will help us understand the problem better.
> We'll get there!
> -- 
> Chris
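
For comparing the results of such a reproduction case, a rough Python
sketch like the following could narrow down which files differ before
reaching for diffoscope. The function names are my own invention, and
symlinks to directories are not recorded:

```python
import hashlib
import os

def file_digests(root):
    """Map each relative file path under root to a SHA-256 hex digest."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk in a fixed order on every machine
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                # Compare symlinks by target instead of following them.
                digests[rel] = "symlink:" + os.readlink(path)
                continue
            with open(path, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests

def differing_files(root_a, root_b):
    """Return sorted relative paths that differ or exist on one side only."""
    a, b = file_digests(root_a), file_digests(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Calling differing_files() on two unpacked output directories gives a
short list of paths to inspect more closely.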

Bengt Richter
