guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

to PR or not to PR, is /that/ the question?


From: Giovanni Biscuolo
Subject: to PR or not to PR, is /that/ the question?
Date: Wed, 13 Sep 2023 17:31:41 +0200

Hi Simon,

please forgive me if I continue drifting in this beautiful sea...

This message is /very/ long but it's just because I'm trying to do my
best to provide a deep analisys of the **big** problems with eventually
adopting a «web based PR model» (please see below for a definition) for
contributions to Guix.

Please skip this if you don't have enough time to drift like I'm doing!

Also, please consider I don't have much experience on how a «web based
PR model» works and /doesn't work/ in practice... but I'm studying :-)

Simon Tournier <zimon.toutoune@gmail.com> writes:

[...]

> This branch of the looooong thread starts with:
>
>         Re: How can we decrease the cognitive overhead for contributors?
>         Attila Lendvai <attila@lendvai.name>
>         Fri, 25 Aug 2023 08:07:53 +0000
>         
> id:JRUs8LVm3AtAh0MnHjE5rnhB4sNET0vWTOO2N3w2KfvKoM3CALRNwHTmwJ0Y9Bg3ZDWCs8j-1bMCo9aRiaoac8TAkuXAvrWgSwt_8RcwhQA=@lendvai.name
>         
> https://yhetil.org/guix/JRUs8LVm3AtAh0MnHjE5rnhB4sNET0vWTOO2N3w2KfvKoM3CALRNwHTmwJ0Y9Bg3ZDWCs8j-1bMCo9aRiaoac8TAkuXAvrWgSwt_8RcwhQA=@lendvai.name
>         https://lists.gnu.org/archive/html/guix-devel/2023-08
>
> Attila speaks about the PR model.

Actually he is speaking about «the web based PR model».

I don't want to sound pedantic, but [1]:

--8<---------------cut here---------------start------------->8---

It's worth noting that the term pull request is not universally used:
GitLab calls them merge requests for example. Furthermore I regard the
terms pull request and merge request to be poorly named, as the terms
can be conflated with terminology used by your version control tool
(e.g. git pull or git merge).  And the implementations of a pull or
merge request may not even perform a pull or a merge (you can also
rebase a pull/merge request, but nobody is calling them rebase
requests). A modern day pull request is so much more than a version
control tool operation or even a simple request to pull or merge a
branch: it is a nexus to track the integration of a proposed change
before during and after that change is integrated.

--8<---------------cut here---------------end--------------->8---

Please let me underline: PR it's *a* nexus, one of the different one
available with a DVCS.  With git, for example, we also have "git
request-pull", with it's email based workflow.

--8<---------------cut here---------------start------------->8---

But alas. Because GitHub coined the term and is the most popular
collaboration platform implementing this functionality, I'll refer to
this general workflow as implemented on GitHub, GitLab, Bitbucket, and
others as pull requests for the remainder of this post.

[...] At its core, the pull request is fundamentally a nice UI and
feature layer built around the common Git feature branch workflow.

--8<---------------cut here---------------end--------------->8---

As I tried to tell elsewhere, all so called "pull request" features
incorporated in various "web based forges" are just /user interfaces/
built around the Git feature branch workflow.

...and guess what: Guix _is_ implementing a Git feature branch workflow
[2]!  Guix is also starting to use a clever "request merging" workflow,
integrated with qa.guix.gnu.org [3], see
https://issues.guix.gnu.org/65846 as an example.

Guix is just not implementing a feature branch workflow to incorporate
_commits_ coming from the contributors, and I still think this is wise.

Furthermore:

--8<---------------cut here---------------start------------->8---

[...] I think it is time for industry to scrutinize the pull request
model and to evolve it into something better.

I know what you are thinking: you are thinking that pull requests work
great and that they are popular because they are a superior model
compared to what came before. These statements - aside from some nuance
- are true. But if you live in the version control space (like I do) or
are paid to deliver tools and workflows to developers to improve
productivity and code/product quality (which I am), the deficiencies in
the pull request workflow and implementation of that workflow among
vendors like GitHub, GitLab, Bitbucket, etc are obvious and begging to
be overhauled if not replaced wholesale.

[...] In other words, the way I see the world is that a specific
vendor's pull request implementation is just that: an implementation
detail. And like all implementation details, they should be frequently
scrutinized and changed, if necessary.

--8<---------------cut here---------------end--------------->8---

The rest of the article is worth reading since it is a professional
analisys of the problems with "Pull Requests" /implementations/ and
workflow, backed by some (sort of?) research:

--8<---------------cut here---------------start------------->8---

[...] the current implementation of pull requests actively discourages
the many smaller changes workflow. And since smaller changes result in
higher quality and faster reviews, today's implementations of pull
requests are undermining quality and velocity.

[...] I posit that in order for us to author more, smaller changes, we
must either a) create more, smaller pull requests or b) have pull
request reviews put emphasis on the individual commits (as opposed to
the overall merge diff).

If we were to author more, smaller pull requests, this would seemingly
necessitate the need for dependencies between pull requests in order to
maintain velocity. And dependencies between pull requests adds a
potentially prohibitive amount of overhead.

--8<---------------cut here---------------end--------------->8---

Ouch! :-D

--8<---------------cut here---------------start------------->8---

In other words, I don't think you can implement the multiple pull
request model reliably and without causing excessive burden on people
without fundamentally changing the requirement that a pull request be a
Git branch. (I'd love to be proven wrong.)

--8<---------------cut here---------------end--------------->8---

Can we consider this a thoughtful argument _not_ to use "GitHub-like
pull requests" when contributing to Guix?

--8<---------------cut here---------------start------------->8---

Therefore, I don't think the more, smaller changes workflow can be
easily practiced with multiple pull requests using the common GitHub
model without effectively moving the definition of a pull request away
from equivalence with a Git branch

[...] Unfortunately, a trivial change of the default to show individual
commits instead of the merge diff is not so simple, as many authors and
projects don't practice clean commit authorship practices, where
individual commits are authored such that they can be reviewed in
isolation.

[...] A handful of mature projects - like the Linux kernel, Firefox,
Chrome, Git, and Mercurial - practice the series of individually-good
commits model, which I'll call a commit-centric workflow.

[...] I'm a strong proponent of a clean commit history where each commit
in the final repository history stands as good in isolation. But I tend
to favor more grown-up software development practices and am a version
control guru. That being said, the subject/debate is fodder for another
post.

--8<---------------cut here---------------end--------------->8---

Guix is also practicing a "individually-good commits model", right?

--8<---------------cut here---------------start------------->8---

If GitHub (or someone else) switched the pull request default to a
per-commit review without otherwise changing the relationship between a
pull request and a Git branch, that would force a lot of less
experienced users to familiarize themselves with history rewriting in
Git. This would impose considerable pain and suffering on pull request
authors, which would in turn upset users, hurt engagement, etc.

[...] But even if these services did emphasize individual commits by
default in pull request reviews, there's still a handful of significant
deficiencies that would undermine the more, smaller changes workflow
that we desire.

While it is possible to review individual commits, all the review
comments are still funneled into a single per pull request timeline view
of activity. If submitter and reviewer make the effort to craft and
subsequently review individual commits, your reward is that all the
feedback for the discrete units of change gets lumped together into one
massive pile of feedback for the pull request as a whole.

[...] Even if GitHub (or someone else) implements robust per-commit
review for pull requests, there's still a problem with velocity. And
that problem is that if the pull request is your unit of integration
(read: merging), then you have to wait until every commit is reviewed
before integration can occur. [...] I argue this is less optimal than a
world where a change integrates as soon as it is ready to, without
having to wait for the changes after it. As an author and maintainer, if
I see a change that is ready to integrate, I prefer to integrate it as
soon as possible, without delay.

The longer a ready-to-integrate change lingers, the longer it is
susceptible to bit rot (when the change is no longer valid/good due to
other changes in the system).

[...] What happens when some commits in a pull request are integrated
and the author rebases or merges their local branch against their new
changes? This may or may not just work. And when it doesn't just work,
the author can easily find themselves in merge conflict hell, where one
commit after the other fails to apply cleanly and their carefully
curated stack of commits quickly becomes a liability and impediment to
forward progress.

[...] While it is certainly possible to integrate changes as soon as
they are ready with a pull request workflow, I think that it is awkward
and that by the time you've made enough changes to accommodate the
workflow, very little is left of the pull request workflow as we know it
and it is effectively a different workflow altogether.

[...] But even if you don't buy into the change size arguments, there's
still a very valid reason why we should think beyond pull requests as
they are implemented today: tool scalability.

[...] Unfortunately, the tight coupling of pull requests to Git
branches/refs introduces unbound growth and a myriad of problems
associated with it. Most projects may not grow to a size that
experiences these problems. But as someone who has experience with this
problem space at multiple companies, I can tell you the problem is very
real and the performance and scalability issues it creates undermines
the viability of using today's implementation of pull requests once
you've reached a certain scale. Since we can likely fix the underlying
scaling issues with Git, I don't think the explosion of Git refs is a
long-term deal breaker for scaling pull requests. But it is today and
will remain so until Git and the tools built on top of it improve.

* Exploring Alternative Models

A pull request is merely an implementation pattern for the general
problem space of integrating a proposed change. There are other patterns
used by other tools. Before I describe them, I want to coin the term
integration request to refer to the generic concept of requesting some
change being integrated elsewhere.  GitHub pull requests and GitLab
merge requests are implementations of integration requests, for example.

[...] Before Git and GitHub came along, you were probably running a
centralized version control tool which didn't support offline commits or
feature branches (e.g. CVS or Subversion). In this world, the common
mechanism for integration requests was exchanging diffs or patches
through various media - email, post to a web service of a code review
tool, etc. Your version control tool didn't speak directly to a VCS
server to initiate an integration request. Instead, you would run a
command which would export a text-based representation of the change and
then send it somewhere.

--8<---------------cut here---------------end--------------->8---

Please let me say this in other words: email exchange of integration
requests is VCS agnostic: right?

I'm not proposing to replace git with another VCS :-D... but who knows
in 5 years what tool will fit better? :-O

--8<---------------cut here---------------start------------->8---

Today, we can classify integration requests by whether or not they speak
the version control tool's native protocol for exchanging data or
whether they exchange patches through some other mechanism. Pull
requests speak the VCS native protocol. Tools like Review Board and
Phabricator exchange patches via custom HTTP web services. Typically,
tools using non-native exchange will require additional client-side
configuration, including potentially the installation of a custom tool
(e.g. RBTools for Review Board or Arcanist for Phabricator). Although
modern version control tools sometimes have this functionality built-in.

[...] An interesting outlier is Gerrit, which ingests its integration
requests via git push. [...] The wizard behind the curtain here is that
Gerrit runs a special Git server that implements non-standard behavior
[...]. When you push to refs/for/master, Gerrit receives your Git push
like a normal Git server would. But instead of writing a ref named
refs/for/master, it takes the incoming commits and ingests them into a
code review request!

[...] On the surface, it may seem like using the version control tool's
native data exchange is a superior workflow because it is more native
and more modern. (Emailing patches is so old school.) [...] Instead, you
run git push and your changes can be turned into an integration request
automatically or with a few mouse clicks. And from a technical level,
this exchange methodology is likely safer, as round-tripping a
text-based representation of a change without data loss is surprisingly
finicky.

[...] But despite being more native, modern, and arguably robust,
exchange via the version control tool may not be better.

[...] When your integration request requires the use of a version
control tool's wire protocol, the client likely needs to be running that
version control tool. With other approaches like exchange of text based
patches, the client could be running any software it wanted: as long as
it could spit out a patch or API request in the format the server
needed, an integration request could be created! This meant there was
less potential for lock-in, as people could use their own tools on their
machines if they wanted and they (hopefully) wouldn't be inflicting
their choice on others.

[...] Because Firefox is using Phabricator (Review Board and Bugzilla
before that) for code review and because Phabricator ingests text-based
patches, the choice of the VCS on the client doesn't matter that much
and the choice of the server VCS can be made without inciting a holy war
among developers who would be forced to use a tool they don't
prefer. Yes, there are good reasons for using a consistent tool
(including organizational overhead) and sometimes mandates for tool use
are justified. But in many cases (such as random open source
contributions), it probably doesn't or shouldn't matter. And in cases
like Git and Mercurial, where tools like the fantastic git-cinnabar make
it possible to easily convert between the repositories without data loss
and acceptable overhead, adoption of the version control tool's native
wire protocol can exclude or inhibit the productivity of contributors
since it can mandate use of specific, undesired tooling.

[...] With Gerrit, I don't have to create a local Git branch to initiate
an integration request. With pull requests, I'm compelled to. And this
can undermine my productivity by compelling me to practice
less-efficient workflows!

[...] Our final point of comparison involves scalability. When you use
the version control tool wire protocol as part of integration requests,
you have introduced the problem of scaling your version control
server. Take it from someone who has had multiple jobs involving scaling
version control servers and who is intimately aware of the low-level
details of both the Git and Mercurial wire protocols: you don't want to
be in the business of scaling a version control server.

[...] Can your version control server handle ingesting a push every
second or two with reasonable performance?  Unless you are Google,
Facebook, or a handful of other companies I'm aware of, it can't. And
before you cry that I'm talking about problems that only plague the
0.01% of companies out there, I can name a handful of companies under
10% the size of these behemoths where this is a problem for them. And I
also guarantee that many people don't have client-side metrics for their
git push P99 times or reliability and don't even realize there is a
problem! Scaling version control is probably not a core part of your
company's business. Unfortunately, it all too often becomes something
companies have to allocate resources for because of poorly designed or
utilized tools.

Contrast the challenges of scaling integration requests with a native
version control server versus just exchanging patches. With the more
primitive approach, [...]

[...] Unfortunately, solutions like GitHub pull requests and Gerrit's
use of Git refs for storing everything exert a lot of pressure on
scaling the version control server and make this a very real problem
once you reach a certain scale. [...]

* Commit Tracking

[...] For example, if you submit a commit then amend it, how does the
system know that the commit evolved from commit X to X'.

Pull requests don't track commits directly. Instead, a commit is part of
a Git branch and that branch is tracked as the entity the pull request
is built around. The review interface presents the merge diff front and
center. It is possible to view individual commits. But as far as I know,
none of these tools have smarts to explicitly track or map commits
across new submissions.

[...] If all you are familiar with is pull requests, you may not realize
there are alternatives to commit tracking! In fact, the most common
alternative (which isn't do nothing) predates pull requests entirely and
is still practiced by various tools today.

The way that Gerrit, Phabricator, and Review Board work is the commit
message contains a unique token identifying the integration request for
that commit. e.g. a commit message for a Phabricator review will contain
the line Differential Revision:
https://phab.mercurial-scm.org/D7543. Gerrit will have something like
Change-Id: Id9bfca21f7697ebf85d6a6fa7bac7de4358d7a43.

[...] This Git hook will ensure that any newly-created commit has a
Change-ID: XXX line containing a randomly generated, hopefully unique
identifier.

[...] This approach of inserting a tracking identifier into commit
messages works surprisingly well for tracking the evolution of commits!
Even if you amend, reorder, insert, or remove commits, the tool can
often figure out what matches up to previous submissions and reconcile
state accordingly. Although support for this varies by tool.

[...] The tracking of commits is another one of those areas where the
simpler and more modern features of pull requests often don't work as
well as the solutions that came before. Yes, inserting an identifier
into commit messages feels hacky and can be brittle at times (some tools
don't implement commit rewriting very well and this can lead to a poor
user experience). But you can't argue with the results: using explicit,
stable identifiers to track commits is far more robust than the
heuristics that pull requests rely on.  The false negative/positive rate
is so much lower.

[...] The use of explicit commit tracking identifiers may not seem like
it makes a meaningful difference. But it's impact is profound.

The obvious benefit of tracking identifiers is that they allow rewriting
commits without confusing the integration request tool. This means that
people can perform advanced history rewriting with near impunity as to
how it would affect the integration request.

[...] I am a heavy history rewriter. I like curating a series of
individually high-quality commits that can each stand in isolation. When
I submit a series like this to a GitHub pull request and receive
feedback on something I need to change, when I enact those changes I
have to think will my rewriting history here make re-review harder? (I
try to be empathetic with the reviewer and make their life easier
whenever possible. I ask what I would appreciate someone doing if I were
reviewing their change and tend to do that.) With GitHub pull requests,
if I reorder commits or add or remove a commit in the middle of a
series, I realize that this may make review comments left on those
commits hard to find since GitHub won't be able to sort out the history
rewriting. And this may mean those review comments get lost and are
ultimately not acted upon, leading to bugs or otherwise deficient
changes. This is a textbook example of tooling deficiencies dictating a
sub-optimal workflow and outcome: because pull requests don't track
commits explicitly, I'm forced to adopt a non-ideal workflow or
sacrifice something like commit quality in order to minimize risks that
the review tool won't get confused. In general, tools should not
externalize these kinds of costs or trade-offs onto users: they should
just work and optimize for generally agreed-upon ideal outcomes.

[...] Another benefit to tracking identifiers is that they enable
per-commit review to be viable. Once you can track the logical evolution
of a single commit, you can start to associate things like review
comments with individual commits with a high degree of confidence.

[...] A secondary benefit of per-commit review is that this model
enables incremental integration workflows, where some commits in a
series or set can integrate before others, without having to wait for
the entire batch.

[...] But actually deploying this workflow can be tricky. One problem is
that your version control tool may get confused when you rebase or merge
partially landed state. Another problem is it can increase the overall
change rate of the repository, which may strain systems from version
control to CI to deployment mechanisms. Another potential problem
involves communicating review sign-off from integration sign-off. Many
tools/workflows conflate I sign off on this change and I sign off on
landing this change. While they are effectively identical in many cases,
there are some valid cases where you want to track these distinctly. And
adopting a workflow where commits can integrate incrementally will
expose these corner cases. So before you go down this path, you want to
be thinking about who integrates commits and when they are
integrated. (You should probably be thinking about this anyway because
it is important.)

[...] [Forks] existence forces the user to manage an additional Git
remote and branches. It forces people to remember to keep their branches
in sync on their fork. As if remembering to keep your local repository
in sync wasn't hard enough!

[...] Forks are essentially a veneer on top of a server-side git
clone. And the reason why a separate Git repository is used at all is
probably because the earliest versions of GitHub were just a pile of
abstractions over git commands. The service took off in popularity,
people copied its features almost verbatim, and nobody ever looked back
and thought why are we doing things like this in the first place.

[...] Could Git grow features to make the user experience much better so
users don't need to be burdened with complexity or magic and could
simply run commands like git submit --for review?  Definitely!

--8<---------------cut here---------------end--------------->8---

...meanwhile, I don't think thay adopting a «web based PR model» for
the Guix project is a good idea.


--8<---------------cut here---------------start------------->8---

My ideal integration request revolves around individual commits, not
branches.

[...] In this world, the branch would not matter. Instead, commits are
king. Because we would be abandoning the branch name as a tracker for
the integration request, we would need something to replace it,
otherwise we have no way of knowing how to update an existing
integration request!

[...] commit-centric integration requests aren't forcing you to change
your local workflow! If you are the type of person who doesn't want to
curate a ton of small, good-in-isolation commits (it does take a bit
more work after all), nobody would be forcing you to do so. Instead, if
this is your commit authorship pattern, the submission of the proposed
change could squash these commits together as part of the submission,
optionally rewriting your local history in the process. If you want to
keep dozens of fixup commits around in your local history, that's fine:
just have the tooling collapse them all together on submission.

[...] Making integration requests commit-centric doesn't force people to
adopt a different commit authorship workflow. But it does enable
projects that wish to adopt more mature commit hygiene to do so. That
being said, hows tools are implemented can impose restrictions. But
that's nothing about commit-centric review that fundamentally prohibits
the use of fixup commits in local workflows.

[...] I also largely ignored some general topics like the value that an
integration request can serve on the overall development lifecycle:
integration requests are more than just code review - they serve as a
nexus to track the evolution of a change throughout time.

--8<---------------cut here---------------end--------------->8---

In the Guix project (and many other) the evolution of changes throughout
time - made possible by the use of the "integration request" model
instead of the "pull request" model - is tracked by the guix-commit
mailing list, that is auto populated by a server side git hook:
https://lists.gnu.org/archive/html/guix-commits/

> Then many comments later about the
> pros and cons, the discussion is split intro the contributor side and
> the reviewer side of the PR model.  Ricardo reports then their
> experience reviewing Pull Request:
>
>         Re: How can we decrease the cognitive overhead for contributors?
>         Ricardo Wurmus <rekado@elephly.net>
>         Fri, 08 Sep 2023 16:44:41 +0200
>         id:87sf7o67ia.fsf@elephly.net
>         87sf7o67ia.fsf@elephly.net">https://yhetil.org/guix/87sf7o67ia.fsf@elephly.net
>         https://lists.gnu.org/archive/html/guix-devel/2023-09
>
> And I provide one typical example where “our“ model leads to some
> friction for the reviewer: the first step, apply the patches.
>
> And this exact same friction does not exist in the PR model by design of
> this very PR model.

I don't know if really "reviewer apply the first patch" friction does
not exist by design in a PR model (especially a web based one), but
AFAIU many other frictions /do/ exist... by design!

[...]

(I'm studying for a reply to the rest of the message, stay tuned! :-D )

Happy hacking! Gio'



[1]
https://gregoryszorc.com/blog/2020/01/07/problems-with-pull-requests-and-how-to-fix-them/
 (2020-01-07)

[2] look at the Branches listed here: https://qa.guix.gnu.org/

[3] 
https://guix.gnu.org/en/manual/devel/en/html_node/Managing-Patches-and-Branches.html

-- 
Giovanni Biscuolo

Xelera IT Infrastructures

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]