Re: [PATCH 0/5] QEMU Gating CI


From: Daniel P . Berrangé
Subject: Re: [PATCH 0/5] QEMU Gating CI
Date: Mon, 27 Apr 2020 16:20:36 +0100
User-agent: Mutt/1.13.3 (2020-01-12)

On Mon, Apr 27, 2020 at 04:41:38PM +0200, Philippe Mathieu-Daudé wrote:
> On 4/27/20 4:28 PM, Cleber Rosa wrote:
> > On Mon, 27 Apr 2020 12:51:36 +0200
> > Philippe Mathieu-Daudé <address@hidden> wrote:
> > 
> > > On 4/27/20 7:12 AM, Cleber Rosa wrote:
> > > > On Thu, 23 Apr 2020 23:28:21 +0200
> > > > Philippe Mathieu-Daudé <address@hidden> wrote:
> > > [...]
> > > > > In some cases custom runners are acceptable. These runners won't be
> > > > > "gating" but can post informative log and status.
> > > > > 
> > > > 
> > > > Well, I have the feeling that some people maintaining those runners
> > > > will *not* want to have them as "informational" only.  If they
> > > > invest a good amount of time on them, I believe they'll want to
> > > > reap the benefits such as others not breaking the code they rely on.
> > > > If their system is not gating, they lose that and may find
> > > > breakage that CI did not catch.  Again, I don't think "easily
> > > > accessible" hardware should be the only criteria for
> > > > gating/non-gating status.
> > > > 
> > > > For instance, would you consider, say, a "Raspberry Pi 4 Model
> > > > B", running KVM jobs to be a reproducible runner?  Would you blame a
> > > > developer that breaks a Gating CI job on such a platform and says
> > > > that he can not reproduce it?
> > > 
> > > I'm not sure I understood the problem, as I'd answer "yes" but I
> > > guess you expect me to say "no"?
> > > 
> > 
> > What I mean is: would you blame such a developer for *not* having a
> > machine himself/herself that he/she can try to reproduce the failure?
> > And would you consider a "Raspberry Pi 4 Model B" an easily available
> > hardware?
> 
> My view on this is if someone merged code in mainstream QEMU and maintains
> it, and if it is not easy to reproduce the setup (for a bug reported by a CI
> script), then it is the responsibility of the maintainer to resolve it.
> Either by providing particular access to the hardware, or be ready to spend
> a long debugging session over email and multiple time zones.
> 
> If it is not possible, then this specific code/setup can not claim for
> gating CI, and eventually mainstream isn't the best place for it.

I'd be wary of using gating CI as a big stick for hitting contributors
with. The more rules we put in place which contributors have to follow
before their work gets accepted for merge, the less likely someone is
to have a positive experience contributing to the project, or even be
willing to try. This view of gating CI requirements was a negative
aspect of contributing to the OpenStack project, which drove people away.
There was pushback against contributed work because it lacked CI, but
there was often no viable way to actually provide CI in a feasible
timeframe, especially for stuff only testable on physical hardware and
not VMs. Even if you work for a big company, it isn't easy to magic up
money to spend on hardware & hosting to provide CI, as corporate
bureaucracy & priorities will get in your way.

I'd really encourage the more nuanced approach of thinking in terms of
tiered support levels:

  - Tier 1: features that we have gating CI tests for. Will always work.

  - Tier 2: features that we have non-gating CI tests for. Should work at
            time of release, but may be broken for periods in git master.
            
  - Tier 3: features that we don't have CI tests for. Compile tested only,
            relying on end user manual testing, so may or may not work
            at any time or in release.

Obviously tier 1 is the gold standard that we would like everything to
achieve but we'll never achieve that reality unless we cull 90% of QEMU's
code. I don't think that's in the best interests of our users, because
clearly stuff in Tier 2 and Tier 3 is still useful for a large portion of
our end users - not least because Tier 3 is the level everything is at
right now in QEMU unless using a downstream vendor's packages.

The tier levels and CI are largely around setting reasonable quality
expectations. Right now we often have a problem that people want to
re-factor code but are afraid of breaking existing functionality that
guests rely on. This causes delays in merging code or causes people to
not even attempt the refactoring in the first place. This harms our
forward progress in QEMU.

With gating CI, we are declaring that contributors should feel free to
refactor anything as long as it passes gating CI. IOW, contributors only
have to care about Tier 1 features continuing to work. It would be nice
if refactoring does not break stuff in Tier 2 / 3, but if it does, then
that is acceptable collateral damage. We would not block the merge on
stuff that is Tier 2 / 3.
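
As a rough illustration of that rule, here is a minimal sketch in Python.
It is only a sketch under assumptions: the names Tier, JobResult,
merge_allowed and informational_failures are made up for this example and
do not correspond to anything in QEMU or in any CI system.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    GATING = 1      # Tier 1: gating CI, a failure blocks the merge
    NON_GATING = 2  # Tier 2: CI runs, failures are reported only
    UNTESTED = 3    # Tier 3: no CI tests, compile tested only


@dataclass
class JobResult:
    # Hypothetical record of one CI job outcome.
    name: str
    tier: Tier
    passed: bool


def merge_allowed(results):
    """Return True unless at least one Tier 1 (gating) job failed."""
    return all(r.passed for r in results if r.tier is Tier.GATING)


def informational_failures(results):
    """Names of failed Tier 2 jobs: worth reporting, never blocking."""
    return [r.name for r in results
            if r.tier is Tier.NON_GATING and not r.passed]
```

With this shape, a failing Tier 2 job shows up in the report for whoever
maintains that runner, but only a failing Tier 1 job stops the merge.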

Based on what I experienced in OpenStack the other big challenge is
deciding when something can be promoted from Tier 2 to Tier 1. They
had the official gating CI (for Tier 1) being maintained by the core
project infrastructure team. Any CI provided by third party companies
was non-gating (Tier 2), at least in the time I was involved, because
they did not want code merges blocked on the ability to reach third
party companies, who were often hard to contact when CI broke.
So the only real path from Tier 2 to Tier 1 was to give the project
direct access to the CI hardware, instead of having the providing
company self-manage it.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



