qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] qobject: Rewrite implementation of QDict for in-order tr


From: Daniel P . Berrangé
Subject: Re: [RFC PATCH] qobject: Rewrite implementation of QDict for in-order traversal
Date: Thu, 7 Jul 2022 16:07:44 +0100
User-agent: Mutt/2.2.6 (2022-06-05)

On Thu, Jul 07, 2022 at 04:27:35PM +0200, Markus Armbruster wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
> 
> > On Tue, 5 Jul 2022 at 10:54, Markus Armbruster <armbru@redhat.com> wrote:
> >>
> >> QDict is implemented as a simple hash table of fixed size.  Observe:
> >>
> >> * Slow for large n.  Not sure this matters.
> >>
> >> * A QDict with n entries takes 4120 + n * 32 bytes on my box.  Wastes
> >>   space for small n, which is a common case.
> >>
> >> * Order of traversal depends on the hash function and on insertion
> >>   order, because it iterates first over buckets, then collision
> >>   chains.
> >>
> >> * Special code ensures qdict_size() takes constant time.
> >>
> >> Replace the hash table by a linked list.  Observe:
> >>
> >> * Even slower for large n.  Might be bad enough to matter.
> >>
> >> * A QDict with n entries takes 32 + n * 24 bytes.
> >>
> >> * Traversal is in insertion order.
> >>
> >> * qdict_size() is linear in the number of entries.
> >>
> >> This is an experiment.  Do not commit to master as is.
> >>
> >> The change of traversal order affects expected test output.  I updated
> >> only the tests covered by "make check" so far.  I expect some more to
> >> hide under tests/qemu-iotests/.
> >
> > Seems to fix the 'rocker' device regression, at least in that
> > it no longer gives an error message on startup.
> >
> > The amount of patching you had to do to expected-output files
> > in 'tests' suggests we have quite a lot of test cases that
> > are currently implicitly reliant on the hash table traversal
> > order, which is not guaranteed to remain stable.
> 
> Correct.
> 
> I expect to find a few more in tests not run by "make check".
> 
> >                                                  Regardless of
> > what we do with this patch it would probably be a good idea
> > for whatever is outputting the text these tests are comparing
> > against to be made to use a stable output order (alphabetical??).
> 
> Traversal order before the patch depends on the (fixed) size of the hash
> table and the has function for (string) keys.  Both have remained
> unchanged since the initial commit (2009), which is why we've gotten
> away with relying on it in tests.

We're lucky to have got away with such a crude hash table impl without
anyone (to our knowledge) identifying a way to abuse it for a denial
of service. For robustness, string key hashing functions really ought
to be non-deterministic between invokations of the program at the very
least, even better if non-deterministic across each hashtable instance.
If hashing is deterministic then a malicious user can sometimes abuse
this to force the hash table performance to degrade from O(log(n)) to
O(n). A fast cryptographic hash function is the gold standard.

See also https://lwn.net/Articles/474912/

I recall we discussed this back in the day when that LWN article came
out and decided QEMU was safe-ish. The believe was that QEMU doesn't
take JSON input from any source that is less trustworthy than itself.
Or when it does, it is already expected that the upper level mgmt
layer will have santised input before passing it to QEMU. eg a qcow2
file can contain JSON backing store, but if an app gets a random disk
image from untrusted source, it must already validate that a backing
store absent or sane, before invoking QEMU (eg to  avoid telling
QEMU to open /dev/shadow as its backing store).

None the less, I would still be more comfortable if we have a robust
non-determinstic hashing function, if we have even the slighest feeling
that that O(1) lookup is important for any part of QEMU's QDict usage.

The QMP communication of course is all betwween QEMU and libvirt, where
QEMU is the untrusted party, so libvirt needs protection aaginst
malicious QEMU, but QEMU trustes libvirt.  QEMU doesn't talk to the
guest agent itself, so ok in that area too, and of course the CLI argv
is trusted.

Particular with the multi-process QEMU work though, and/or vhostuser
backends, etc, we are probably moving to a world where some parts of
QEMU are less trusted than other parts of QEMU. So this kind of thing
may yet become important to some degree.

> Traversal order after the patch depends on insertion order.  I think
> this is already an improvement for the tests: now the expected output
> depends on what the test does, not on how qdict.c does its job.

For machine targetted output it is a saner. For human targetted
output I'd say no significant win.

> If we think this still isn't good enough, we can investigate how to get
> test output where the keys are in alphabetical order.

This essentially comes down to the JSON serializer, which iterates over
the keys when outputting json objects. The serializer could apply sorting
of keys itself based on any criteria it chooses, regardless of what the
QDict impl provides. Probably make JSON serializer marginally less
efficient but not neccessarily the end of the world.

Co-incidentally it'd make libvirt a little happier, as we store the
output of many QMP commands for our test suite, and those output docs
can change wildly as we update them for new QEMU versions / git snapshots


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]