qemu-devel


[Qemu-devel] Our use of #include is undisciplined, and what to do about it


From: Markus Armbruster
Subject: [Qemu-devel] Our use of #include is undisciplined, and what to do about it
Date: Tue, 15 Mar 2016 10:29:06 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

This is kind of a meta-cover-letter for the multiple include cleanup
series I hope to post as time permits.  The first one will go out later
today.

I'm afraid it's a bit long.  You may want to skip ahead to "= What to do
about it =".

Stefan cc'ed because tracing is part of the problem.  Search for
"tracers".


= The status quo and why I hate it =

I've seen several schools of thought on use of #include.

There's the "no #include in headers" school: every .c file includes
exactly the headers it needs, and the prerequisites they need.  Cyclic
inclusion becomes impossible.  You can't sweep cyclic dependencies under
the rug.  Headers are read just once per compilation unit.  The amount
of crap you include is clearly visible.  However, maintaining the
#include directives is a drag, not least because their order matters.
Especially when headers neglect to spell out their dependencies.  Or
they do, but it's wrong.

There's the "headers must be self-contained" school: every header
includes everything it needs.  Headers can be included in any order.
Sorted #include directives are tidy and easy to navigate.  Headers can
be read multiple times, which can only hurt compilation time.  You need
to make an effort to avoid cyclic dependencies and excessive inclusion.

And then there's the school of non-thought: when it doesn't compile,
sprinkle #include on the mess semi-randomly until it does.

We do a bit of all three, but the result looks awfully close to what the
school of non-thought produces.

Every .c file includes qemu/osdep.h first.  For me, a .c file that
includes nothing but that comes out at well over half a megabyte, more
than 23k lines, after preprocessing.  Where does all this crap come from?

  #lines  KiBytes  #files  source
    5233     102       5   QEMU
    8035     159      70   system
    7915     224      73   GLib
    2458      89       1   # lines
   23641     576     149   total

"# lines" are lines added by the preprocessor so the rest of the
compiler can keep track of source locations.

Having the compiler wade through almost half a Megabyte of system+GLib
crap before it begins to consider the code we care about feels wasteful.
Perhaps we should rethink our approach to including library headers.

Of the 102K that are actually our own, just 7K come from include/.  95K
come from qapi-types.h.

Judging from the .d files in my build tree, 95% of the .c files include
qemu-common.h.  That makes things a good deal worse.  Without
NEED_CPU_H, this adds a modest 44K of our own headers, but almost 100K
of system headers:

  #lines  KiBytes  #files  source
    6938     146      16   QEMU
   11426     254      74   system
    7915     224      73   GLib
    2658     100       1   # lines
   28937     726     164   total

NEED_CPU_H adds another 120K of our own headers:

  #lines  KiBytes  #files  source
   11534     263      43   QEMU
   11548     256      78   system
    7915     225      72   GLib
    3370     138       1   # lines
   34367     883     194   total

The average size of a .c file is just over 15KiB.  To get to the actual
C code there, the compiler has to wade through at least 550-880KiB of
headers.  In other words, roughly 2% of the source comes from .c in the
best case.
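As a rough check, this adds 15 KiB of .c source to the preprocessed totals from the tables above. Those totals are 576 KiB with osdep.h only and 883 KiB with qemu-common.h and NEED_CPU_H. The .c file's share of what the compiler reads is then:

```latex
\frac{15}{15 + 576} \approx 2.5\,\% \quad \text{(osdep.h only)}, \qquad
\frac{15}{15 + 883} \approx 1.7\,\% \quad \text{(qemu-common.h with NEED\_CPU\_H)}
```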

But that's not even the worst part.  By far the worst part is our
"touch this and recompile the world" headers.

I find just short of 4000 .d files in my build tree.  Guess how many of
our headers are listed as prerequisites in more than 90% of them
(touching such a header recompiles every .c file whose .d file lists
it)?  *Twenty-two*.  Almost fifty recompile more than half of the world.

Naturally, touching osdep.h or anything it includes recompiles the
world.  These are:

    config-host.h
    include/glib-compat.h
    include/qapi/error.h
    include/qemu/compiler.h
    include/qemu/osdep.h
    include/qemu/typedefs.h
    include/sysemu/os-posix.h
    qapi-types.h

NEED_CPU_H adds

    config-target.h

Fine, except for qapi/error.h and qapi-types.h.  The latter is an itch I
need to scratch urgently.  My first patch series will take a swing at
it.

qemu-common.h without NEED_CPU_H adds

    include/fpu/softfloat.h
    include/qapi/qmp/qdict.h
    include/qapi/qmp/qlist.h
    include/qapi/qmp/qobject.h
    include/qemu-common.h
    include/qemu/atomic.h
    include/qemu/bswap.h
    include/qemu/fprintf-fn.h
    include/qemu/host-utils.h
    include/qemu/module.h
    include/qemu/option.h
    include/qemu/queue.h

Remember, these get included into 95% of all .c files.  I'm pretty sure
the fraction that actually needs QDicts or QemuOpts is much, much lower.

NEED_CPU_H further adds

    include/disas/bfd.h
    include/exec/cpu-all.h
    include/exec/cpu-common.h
    include/exec/cpu-defs.h
    include/exec/exec-all.h
    include/exec/hwaddr.h
    include/exec/memattrs.h
    include/exec/memory.h
    include/hw/hotplug.h
    include/hw/i386/apic.h
    include/hw/irq.h
    include/hw/qdev-core.h
    include/qemu/bitmap.h
    include/qemu/bitops.h
    include/qemu/int128.h
    include/qemu/log.h
    include/qemu/notify.h
    include/qemu/rcu.h
    include/qemu/thread-posix.h
    include/qemu/thread.h
    include/qom/cpu.h
    include/qom/object.h
    include/standard-headers/asm-x86/hyperv.h
    include/standard-headers/linux/types.h
    target-i386/cpu-qom.h
    target-i386/cpu.h
    target-i386/svm.h
    tcg/i386/tcg-target.h
    x86_64-softmmu/config-target.h

A fun exercise is to count the occurrences of each header in .d files
and multiply each count by the header's size.  That's the number of
bytes read from it when compiling from scratch.  Top scorers:

 size * count    size   count
    525760413  698221     753 trace/generated-tracers.h
    298039140   93723    3180 qapi-types.h
    197442619   55759    3541 include/qom/object.h
    185845916   53884    3449 include/exec/memory.h
    143750444   36878    3898 /usr/include/glib-2.0/glib/gunicode.h
    117362690   30643    3830 include/fpu/softfloat.h
    109783272   28164    3898 /usr/include/glib-2.0/glib/gregex.h
    105830700   27150    3898 /usr/include/glib-2.0/glib/gvariant.h
     92972157  123469     753 trace/generated-events.h
     88706786   22757    3898 /usr/include/glib-2.0/glib/gtestutils.h

The grand total of size * count is 5.4 GiBytes.  That's ~4600 times the
size of all .c and .h files in the repo :)


= What to do about it =

The immediately obvious thing to do is reduce "recompile the world"
headers that change frequently.  I've started to do that.

Another one is attacking widely included bulky files (see "Top
scorers").  Some can simply be included less.  Others need to be split,
in particular the generated tracers.

Yet another one is reviewing the way we include system and GLib headers.

But our root problem is our undisciplined use of #include.  Can we agree
on a sane set of rules?  Here's my proposal:

1. Have a carefully curated header that's included everywhere first.  We
   got that already thanks to Peter: osdep.h.

2. Headers should normally include everything they need beyond osdep.h.
   If exceptions are needed for some reason, they must be documented in
   the header.  If all that's needed from a header is typedefs, put
   those into qemu/typedefs.h instead of including the header.

3. Cyclic inclusion is forbidden.

Nice to have: "make check" checks 2. and 3.

Opinions?


