qemu-devel

Re: [PATCH] gitlab: remove unreliable avocado CI jobs


From: Stefan Hajnoczi
Subject: Re: [PATCH] gitlab: remove unreliable avocado CI jobs
Date: Tue, 12 Sep 2023 12:19:30 -0400



On Tue, Sep 12, 2023, 12:14 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Tue, Sep 12, 2023 at 05:01:26PM +0100, Alex Bennée wrote:
>
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote:
> >> The avocado-system-alpine, avocado-system-fedora, and
> >> avocado-system-ubuntu jobs are unreliable. I identified them while
> >> looking over CI failures from the past week:
> >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614
> >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654
> >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571
> >>
> >> Thomas Huth suggested on IRC today that there may be a legitimate failure
> >> in there:
> >>
> >>   th_huth: f4bug, yes, seems like it does not start at all correctly on
> >>   alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody
> >>   noticed this by now, this is worrying
> >>
> >> It crept in because the jobs were already unreliable.
> >>
> >> I don't know how to interpret the job output, so all I can do is to
> >> propose removing these jobs. A useful CI job has two outcomes: pass or
> >> fail. Timeouts and other in-between states are not useful because they
> >> require constant triaging by someone who understands the details of the
> >> tests and they can occur when run against pull requests that have
> >> nothing to do with the area covered by the test.
> >>
> >> Hopefully test owners will be able to identify the root causes and solve
> >> them so that these jobs can stay. In their current state the jobs are
> >> not useful since I cannot tell whether job failures are real or
> >> just intermittent when merging qemu.git pull requests.
> >>
> >> If you are a test owner, please take a look.
> >>
> >> It is likely that other avocado-system-* CI jobs have similar failures
> >> from time to time, but I'll leave them as long as they are passing.
> >>
> >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884
> >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> ---
> >>  .gitlab-ci.d/buildtest.yml | 27 ---------------------------
> >>  1 file changed, 27 deletions(-)
> >>
> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> >> index aee9101507..83ce448c4d 100644
> >> --- a/.gitlab-ci.d/buildtest.yml
> >> +++ b/.gitlab-ci.d/buildtest.yml
> >> @@ -22,15 +22,6 @@ check-system-alpine:
> >>      IMAGE: alpine
> >>      MAKE_CHECK_ARGS: check-unit check-qtest
> >> 
> >> -avocado-system-alpine:
> >> -  extends: .avocado_test_job_template
> >> -  needs:
> >> -    - job: build-system-alpine
> >> -      artifacts: true
> >> -  variables:
> >> -    IMAGE: alpine
> >> -    MAKE_CHECK_ARGS: check-avocado
> >
> > Instead of entirely deleting, I'd suggest adding
> >
> >    # Disabled due to frequent random failures
> >    # https://gitlab.com/qemu-project/qemu/-/issues/1884
> >    when: manual
> >
> > See example: https://docs.gitlab.com/ee/ci/yaml/#when
> >
> > This stops the job from running unless someone explicitly
> > tells it to run
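
For illustration, a minimal sketch of what that suggestion might look like when
applied to the job removed in the hunk above (reusing the
.avocado_test_job_template, build-system-alpine and issue link already shown in
this thread; a sketch only, not the actual patch):

    # Same job definition as before, kept in place but made manual-only
    avocado-system-alpine:
      extends: .avocado_test_job_template
      needs:
        - job: build-system-alpine
          artifacts: true
      variables:
        IMAGE: alpine
        MAKE_CHECK_ARGS: check-avocado
      # Disabled due to frequent random failures
      # https://gitlab.com/qemu-project/qemu/-/issues/1884
      when: manual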
>
> What I don't understand is why we didn't gate the release back when they
> first tripped. We should have noticed between:
>
>   https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
>
> and
>
>   https://gitlab.com/qemu-project/qemu/-/pipelines/957154381
>
> that the system tests were regressing. Yet we merged the changes
> anyway.

I think that green series is misleading, based on Richard's
mail on list wrt the TCG pull series:

  https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg04014.html

  "It's some sort of timing issue, which sometimes goes away
   when re-run. I was re-running tests *a lot* in order to
   get them to go green while running the 8.1 release. "


Essentially I'd put this down to the tests being so non-deterministic
that we've given up trusting them.

Yes.

Stefan


With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


