Re: missing patch for texlive-bin (e77412362f)
From: zimoun
Subject: Re: missing patch for texlive-bin (e77412362f)
Date: Thu, 03 Feb 2022 19:51:24 +0100
Hi Timothy,
On Thu, 03 Feb 2022 at 10:46, Timothy Sample <samplet@ngyro.com> wrote:
>> But the question is if Disarchive disassembles and preserves external
>> patches. Timothy?
[...]
> The bad news is that 0.75 is not there. At first I was going to
> apologize for the shortcomings of the sampling approach... until I
> realized you are trying to trick me! ;) Unless I’m misreading the Git
> history, that patch appeared and disappeared on core-updates and was
> never part of master.
Given the good news, couldn't the same be applied to these patches?
For instance, one missing patch, as Maxime pointed out, is there:
https://github.com/archlinux/svntogit-packages/blob/155510dd18d2f290085f40d2a95a3701db4a224d/texlive-bin/repos/extra-x86_64/pdftex-poppler0.75.patch
And SWH contains it:
https://archive.softwareheritage.org/browse/revision/155510dd18d2f290085f40d2a95a3701db4a224d/?path=texlive-bin/repos/extra-x86_64/pdftex-poppler0.75.patch
Therefore, somehow the “only” missing step is to disassemble this data
and add an entry to the database, no?
I do not follow what you mean by «was never part of master».  After the
merge, what was core-updates and what was master is somehow
indistinguishable, no?  Or are you only walking first-parent links after
the merge commit?  Well, Git history and ordering lead to headaches, as
the git-log documentation shows. :-)
I think it is fine to simplify a “complex” history by sampling while
walking only the first parent.
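The first-parent question can be made concrete with a toy repository (a
sketch only; the branch name and commit messages are illustrative, not
Guix's actual history):

```shell
# Build a throwaway repo with a commit that exists only on a merged
# side branch, then compare a full walk with a first-parent walk.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "m1"
main=$(git rev-parse --abbrev-ref HEAD)
git checkout -q -b core-updates
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "cu1: add patch"
git checkout -q "$main"
git -c user.name=t -c user.email=t@t merge -q --no-ff -m "merge core-updates" core-updates

# The full walk visits the commit made on core-updates...
git log --oneline | grep -c "cu1"                        # prints 1
# ...while a first-parent walk skips it:
git log --oneline --first-parent | grep -c "cu1" || true   # prints 0
```

So a sampler that follows only first parents never sees a patch that
appeared and disappeared entirely on the side branch before the merge.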
> The way the PoG script tracks down sources is pretty robust. It takes
> the derivation graph to be canonical, and only uses the graph of
> high-level objects (packages, origins, etc.) for extra info. I do my
> best to follow the links of the high-level objects, and then verify that
> I did a good job by lowering them and checking coverage against the set
> of derivations found by following the derivation graph. Since the
> derivation graph necessarily contains everything that matters, this is a
> good way to track down all the high-level objects that matter. See
> <https://git.ngyro.com/preservation-of-guix/tree/pog-inferior.scm#n113>
> for a rather scary looking procedure that finds the edges of the
> high-level object graph.
Cool!  Thanks for explaining and pointing out how PoG works.
> That being said, coverage is not perfect. The most obvious problem (to
> me) is the sampling approach. Surely there are sources that are missed
> by only examining one commit per week. This can be checked and fixed by
> using data from the Guix Data Service, which has data from essentially
> every Guix commit.
No, the Data Service and even Cuirass use a sampling approach too; they
do not process all the commits.
Cuirass uses an «every 5 minutes» approach; CI-savvy people, please
correct me if I am mistaken.  The Data Service uses a «batch
guix-commits» approach; more details in this thread [1].
Well, the coverage question is twofold, IMHO:
1. preserve what is currently entering Guix
2. archive what was previously available in Guix
About #1, the main mechanisms are sources.json, “guix lint”, and
updating the disarchive-db (now done by CI).  What is missed should be
fixed by #2.
About #2, it is hard to fix all the issues at once.  One commit per week
already provides a good view for spotting some problems.  Somehow,
processing all the commits just means burning more CPU; it seems “easy”
once the infrastructure is in place, no?
1: <https://yhetil.org/guix/863617oe1h.fsf@gmail.com/>
Cheers,
simon