qemu-devel

Re: [RFC] QEMU Gating CI


From: Thomas Huth
Subject: Re: [RFC] QEMU Gating CI
Date: Wed, 4 Dec 2019 09:55:50 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 03/12/2019 15.07, Alex Bennée wrote:
[...]
>> GitLab Jobs and Pipelines
>> -------------------------
>>
>> GitLab CI is built around two major concepts: jobs and pipelines.  The
>> current GitLab CI configuration in QEMU uses jobs only (or putting it
>> another way, all jobs in a single pipeline stage).

Yeah, the initial gitlab-ci.yml file was one of the very first YAML files
and one of the very first CI files that I wrote, with hardly any experience
in this area ... there is definitely a lot of room for improvement here!

>>  Consider the
>> following job definition[9]:
>>
>>    build-tci:
>>     script:
>>     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
>>     - ./configure --enable-tcg-interpreter --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
>>     - make -j2
>>     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
>>     - for tg in $TARGETS ; do
>>         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
>>         ./tests/boot-serial-test || exit 1 ;
>>         ./tests/cdrom-test || exit 1 ;
>>       done
>>     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
>>     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
>>
>> All the lines under "script" are performed sequentially.  It should be
>> clear that there's the possibility of breaking this down into multiple
>> stages, so that a build happens first, and then "common set of tests"
>> run in parallel.  Using the example above, it would look something
>> like:
>>
>>    +---------------+------------------------+
>>    |  BUILD STAGE  |        TEST STAGE      |
>>    +---------------+------------------------+
>>    |   +-------+   |  +------------------+  |
>>    |   | build |   |  | boot-serial-test |  |
>>    |   +-------+   |  +------------------+  |
>>    |               |                        |
>>    |               |  +------------------+  |
>>    |               |  | cdrom-test       |  |
>>    |               |  +------------------+  |
>>    |               |                        |
>>    |               |  +------------------+  |
>>    |               |  | x86_64-pxe-test  |  |
>>    |               |  +------------------+  |
>>    |               |                        |
>>    |               |  +------------------+  |
>>    |               |  | s390x-pxe-test   |  |
>>    |               |  +------------------+  |
>>    |               |                        |
>>    +---------------+------------------------+
>>
>> Of course it would be silly to break down that job into smaller jobs that
>> would run individual tests like "boot-serial-test" or "cdrom-test".  Still,
>> the pipeline approach is valid because:
>>
>>  * Common set of tests would run in parallel, giving a quicker result
>>    turnaround

Ok, full ack for the idea to use separate pipelines for the testing
(Philippe already showed me this idea; he's using it for EDK2 testing
IIRC). But the build-tci example is a rather bad one: its single steps
are basically just a subset of "check-qtest" that skips the tests we
are not interested in here. If we don't mind losing some minutes of
testing, we can simply replace all those steps with "make check-qtest"
again.

I think what we really want to put into different pipelines are the
sub-steps of "make check", i.e.:

- check-block
- check-qapi-schema
- check-unit
- check-softfloat
- check-qtest
- check-decodetree

And of course also the other ones that are not included in "make check"
yet, e.g. "check-acceptance" etc.
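A minimal sketch of how that might look in a .gitlab-ci.yml, assuming a
build stage followed by one job per "make check" sub-target (the job
names and configure invocation here are illustrative, not the actual
QEMU CI configuration):

```yaml
# Illustrative sketch only: one GitLab CI job per "make check"
# sub-target.  Jobs in the same stage run in parallel, and each
# job gets its own log, so a hung test is easy to spot.
stages:
  - build
  - test

build:
  stage: build
  script:
    - ./configure
    - make -j2

check-unit:
  stage: test
  script:
    - make check-unit

check-qtest:
  stage: test
  script:
    - make check-qtest

check-block:
  stage: test
  script:
    - make check-block
```

Note that for the test jobs to see the build output, the build job
would also have to publish artifacts, which runs into exactly the size
limitation discussed further down in this thread.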

> check-unit is a good candidate for parallel tests. For the others it
> depends - I've recently turned most "make check"s back to -j 1 on
> Travis because it's a real pain to see which test has hung when other
> tests keep running.

If I understood correctly, it's not about running the check steps in
parallel with "make -jXX" in one pipeline, but rather about running the
different test steps in different pipelines. So you get a separate
output for each test subsystem.

>> Current limitations for a multi-stage pipeline
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Because it's assumed that each job will happen in an isolated and
>> independent execution environment, jobs must explicitly define the
>> resources that will be shared between stages.  GitLab will make sure
>> the same source code revision will be available on all jobs
>> automatically.  Additionally, GitLab supports the concept of artifacts.
>> By defining artifacts in the "build" stage, jobs in the "test" stage
>> can expect to have a copy of those artifacts automatically.
>>
>> In theory, there's nothing that prevents an entire QEMU build
>> directory from being treated as an artifact.  In practice, there are
>> predefined limits on GitLab that prevent that from being possible,
>> resulting in errors such as:
>>
>>    Uploading artifacts...
>>    build: found 3164 matching files
>>    ERROR: Uploading artifacts to coordinator... too large archive
>>           id=xxxxxxx responseStatus=413 Request Entity Too Large
>>           status=413 Request Entity Too Large token=yyyyyyyyy
>>    FATAL: too large
>>    ERROR: Job failed: exit code 1
>>
>> As far as I can tell, this is an instance-defined limit that's clearly
>> influenced by storage costs.  I see a few possible solutions to this
>> limitation:
>>
>>  1) Provide our own "artifact" like solution that uses our own storage
>>     solution
>>
>>  2) Reduce or eliminate the dependency on a complete build tree
>>
>> The first solution can go against the general trend of not having to
>> maintain CI infrastructure.  It could be made simpler by using cloud
>> storage, but there would still be some interaction with another
>> external infrastructure component.
>>
>> I find the second solution preferable, given that most tests depend
>> on having one or a few binaries available.  I've run multi-stage
>> pipelines with some of those binaries (qemu-img,
>> $target-softmmu/qemu-system-$target) defined as artifacts, and they
>> behaved as expected.  But this could require some intrusive changes
>> to the current "make"-based test invocation.

I think it should be sufficient to define a simple set of artifacts like:

- tests/*
- *-softmmu/qemu-system-*
- qemu-img, qemu-nbd ... and all the other helper binaries
- Makefile*

... and maybe some more files that I'm missing. It's some initial work,
but once we have the basic list, I don't expect it to change much over
time.

 Thomas



