
From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v2 0/5] q35: Remove old machines and unused compat code
Date: Fri, 12 Feb 2016 11:58:19 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

"Michael S. Tsirkin" <address@hidden> writes:

> On Thu, Feb 11, 2016 at 01:51:30PM -0200, Eduardo Habkost wrote:
>> On Sat, Feb 06, 2016 at 08:34:07PM +0200, Michael S. Tsirkin wrote:
>> > On Fri, Feb 05, 2016 at 12:46:11PM -0200, Eduardo Habkost wrote:
>> > > On Fri, Feb 05, 2016 at 12:14:16AM +0200, Michael S. Tsirkin wrote:
>> > > > On Thu, Feb 04, 2016 at 05:09:44PM -0200, Eduardo Habkost wrote:
>> > > > > On Thu, Feb 04, 2016 at 08:02:30PM +0200, Michael S. Tsirkin wrote:
>> > > > > > On Thu, Feb 04, 2016 at 03:16:17PM -0200, Eduardo Habkost wrote:
>> > > > > > > On Thu, Feb 04, 2016 at 06:01:50PM +0200, Michael S. Tsirkin wrote:
>> > > > > > > > On Sat, Jan 23, 2016 at 02:02:08PM -0200, Eduardo Habkost wrote:
>> > > > > > > > > This is another attempt to remove old q35 machine code. Now I am
>> > > > > > > > > also removing unused compat code to demonstrate the benefit of
>> > > > > > > > > throwing away the old code that nobody uses.
>> > > > > > > > 
>> > > > > > > > The same thing I said applies - we don't know that nobody uses old q35
>> > > > > > > > machine types.
>> > > > > > > > We do know we don't need to migrate to/from them,
>> > > > > > > > so we can drop compat code.
>> > > > > > > > But please add aliases so people can still start these machines.
>> > > > > > > 
>> > > > > > > If people use them, they can easily update their configurations.
>> > > > > > > I will copy and paste the reply Markus sent 4 months ago below.
>> > > > > > > 
>> > > > > > > On Mon, Sep 14, 2015 at 09:18:47AM +0200, Markus Armbruster wrote:
>> > > > > > > > We've been through this before, but we can go through it once more.
>> > > > > > > > Choices:
>> > > > > > > > 
>> > > > > > > > A. Remove old machine type
>> > > > > > > > 
>> > > > > > > >    A guest using it can't be started.  Easy to understand on the host.
>> > > > > > > >    An error message advising to switch to a newer machine type would be
>> > > > > > > >    a nice touch.
>> > > > > > > > 
>> > > > > > > >    This is a clean break in backward compatibility.  To be mentioned in
>> > > > > > > >    release notes, obviously.
>> > > > > > > > 
>> > > > > > > > B. Change old machine type in a guest-visible way
>> > > > > > > > 
>> > > > > > > >    Depending on the nature of the change and the guest, a guest using it
>> > > > > > > >    either doesn't notice, copes with it successfully, or fails in
>> > > > > > > >    guest-specific ways.  If the latter, the failure can be "guest
>> > > > > > > >    hangs", which is much harder to figure out than A.
>> > > > > > > > 
>> > > > > > > >    Unless we can *demonstrate* that nothing bad happens for all the
>> > > > > > > >    guests people actually use with the old machine types, this is a
>> > > > > > > >    different kind of backward compatibility break.
>> > > > > > > > 
>> > > > > > > >    Demonstrating this feels infeasible to me, but you're welcome to
>> > > > > > > >    try.
>> > > > > > > > 
>> > > > > > > > I could call the difference between the two a tradeoff, but since we've
>> > > > > > > > been through this before, I'll be more blunt: choosing B robs Peter (the
>> > > > > > > > guy with guests where badness happens) to pay Paul (the guy with guests
>> > > > > > > > that cope).  Paul is saved the inconvenience of having to read release
>> > > > > > > > notes or his logs, and change machine types.  Peter pays for that with
>> > > > > > > > figuring out WTF his guests are doing now.
>> > > > > > > > 
>> > > > > > > > As a user, I'd pick a clean break in backward compatibility over a hack
>> > > > > > > > that preserves effective compatibility when it works, but breaks it
>> > > > > > > > uncleanly when it doesn't.
>> > > > > > > > 
>> > > > > > > > As a developer, I'm insisting on it.
>> > > > > > > > 
>> > > > > > > > So, if you want B, the onus is on *you* to show us why nothing bad will
>> > > > > > > > happen.
>> > > > > > > > 
>> > > > > > 
>> > > > > > I agree with the conclusion for option B.  But I think the correct
>> > > > > > solution is not A, it is to analyse changes, maybe even test, and show
>> > > > > > that nothing bad can happen.
>> > > > > 
>> > > > > Do you volunteer for that work?
>> > > > 
>> > > > Nope, sorry. It's your idea, your patchset.
>> > > 
>> > > It's your idea. You are the one proposing to waste resources
>> > > keeping an old machine-type name "working" just because you don't
>> > > want users (who we don't even know if they actually exist) to
>> > > update their configurations on a QEMU upgrade.
>> > > 
>> > > I am proposing the opposite: dropping support for a feature that
>> > > people are unlikely to be using, in a very clear way.
>> > 
>> > What will happen with installed VMs? Will they break?
>> 
>> They won't start unless the QEMU command-line is changed, because
>> they are using a feature QEMU won't support anymore. Why is that
>> a problem?
>
> We don't support installing one machine type, then switching.
> So they won't start unless guest is re-installed.
> Not nice.

As Paolo said, this isn't true.  You're overstating your case.

I'm going to sound like a broken record, but here goes again: switching
machine types means switching hardware.  You have to power off to do it.
The OS will cope if the new hardware is sufficiently similar.

In many cases, you can't know whether it'll cope without testing it.  We
can test only a finite set of (OS, old-hardware, new-hardware) tuples.
Our strategy to deal with this is as follows:

1. We make an effort to ensure switching between old and new machine
   types works for the most common OSes.  This effort is somewhat
   haphazard upstream, because we lack the resources for more systematic
   testing.  There's a reason enterprise downstreams exist.

2. A user who wants maximum guest ABI stability should ask for a
   specific version of the machine type.

   When a user asks for a specific machine type, we had better provide
   exactly that, no ifs, no buts.  Any guest-visible change is a bug.

   Your proposal to silently provide a newer version of q35 would be
   such a bug.

   We very occasionally cheat and make changes that we think are
   extremely unlikely to be noticed.  This is actually the pragmatic
   backward compatibility strategy we employ across the board: an
   incompatibility can be accepted when we're convinced users won't
   notice, and avoiding it would be too costly.

   To make use of this exception for your proposal, *you* have to
   convince us that users won't notice.

   We occasionally drop features that aren't worth their keep.  If a
   replacement exists, we point to it.  Experimental features we even
   drop without prior notice.

   I'll rehash why q35 was experimental until recently below.

3. Users who can accept small ABI variations can use the latest machine
   type of a versioned set, via the version-less alias.  Not asking for
   a machine type is even easier, and gets you the latest version of the
   default machine type, but works only when there is a default machine
   type, and exposes you to a big ABI change when the default machine
   type changes.
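
The naming rules in points 2 and 3, plus the clean failure Eduardo and I
want for dropped machine types, can be sketched roughly as below.  This
is only an illustration in Python, not QEMU's code: the tables, the
MachineTypeError class and resolve_machine_type() are hypothetical, and
the machine type names are examples rather than an authoritative list.

```python
# Illustrative sketch only; this is NOT how QEMU implements machine types.
# All tables and names below are assumptions made up for the example.

# Versioned machine types that are still provided, exactly as named.
SUPPORTED = {"pc-q35-2.4", "pc-q35-2.5", "pc-i440fx-2.5"}

# Version-less aliases always point at the latest version of a set.
ALIASES = {"q35": "pc-q35-2.5", "pc": "pc-i440fx-2.5"}

# Dropped machine types, mapped to a replacement we can suggest.
REMOVED = {"pc-q35-2.3": "pc-q35-2.4"}

# Used when the user does not ask for a machine type at all.
DEFAULT_ALIAS = "pc"


class MachineTypeError(Exception):
    """Raised when the requested machine type cannot be provided."""


def resolve_machine_type(requested=None):
    """Return the concrete machine type for a request, or fail loudly."""
    if requested is None:
        # No machine type given: latest version of the default type.
        requested = DEFAULT_ALIAS
    if requested in ALIASES:
        # Version-less alias: the user accepts small ABI variations.
        return ALIASES[requested]
    if requested in SUPPORTED:
        # Specific version: provide exactly that, never something newer.
        return requested
    if requested in REMOVED:
        # Dropped type: refuse to start instead of silently remapping.
        raise MachineTypeError(
            f"machine type '{requested}' is no longer supported; "
            f"consider switching to '{REMOVED[requested]}'"
        )
    raise MachineTypeError(f"unknown machine type '{requested}'")
```

The point of the sketch is that a dropped name raises an error naming a
replacement, so the user decides when and how to move; at no point does
a request for a specific version come back as a different one.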

>> Why do you want to waste so much time keeping a feature that
>> people are not even supposed to be using?
>
> Why aren't people supposed to use this feature?
>
>
>> > 
>> > > 
>> > > > I am saying, look
>> > > > for some low-hanging fruit.  Find some compat options we can
>> > > > drop without breaking guests, drop just these.  Are there
>> > > > options we need for piix anyway? No point in dropping them at
>> > > > all.
>> > > > 
>> > > > For example, the builtin AML can be dropped since we always use
>> > > > a bios with acpi support now.  It is also trivial to test.
>> > > > 
>> > > > Memory layout is probably ok to change.
>> > > > 
>> > > > Maybe more.
>> > > > 
>> > > > > > 
>> > > > > > Because A suffers from exactly the same problem if people
>> > > > > > just blindly switch to a new machine type.
>> > > > > 
>> > > > > Anything can happen if people change their configurations
>> > > > > blindly.
>> > > > > 
>> > > > > Nobody should change configuration blindly, and that's also
>> > > > > why we shouldn't run a different machine when the user is
>> > > > > asking for an old one. We don't know why the user is asking
>> > > > > for an old machine and we can't make decisions for the user.
>> > > > > Management software might know why an old machine is being
>> > > > > used and might be able to help update the config, but QEMU
>> > > > > doesn't.
>> > > > 
>> > > > What guidance do we provide? Try it and see if it works?  What
>> > > > exactly do we ask user to test? If QEMU developers can't find
>> > > > out whether switching a machine type is safe, what hope is
>> > > > there that management developers can?
>> > > 
>> > > Exactly the same guidance vendors already provide for people that
>> > > want to change machine types today. It depends on who wrote the
>> > > config files and why, and we can't and shouldn't make any
>> > > guesses.
>> > 
>> > AFAIK that's basically "don't do it".
>> 
>> So, are you saying that changing a machine is a risky thing, but
>> at the same time you are proposing that QEMU does that silently
>> for the user?
>
> We do this all the time. Any change in qemu changes something.
> Compatibility is a question of testing.

You are the one proposing to change the old machine types.  The onus is
on *you* to show that your proposed changes won't be noticed and are
worthwhile.

I have no opinion to offer on "won't be noticed", simply because I'm
convinced changing them is an egregious waste of resources.

Eduardo and I don't want to change the old machine types, we want to
*drop* them.  The onus is on us to show that dropping them won't unduly
inconvenience users.  Our arguments:

1. There is no evidence of non-experimental use.

2. There is no sane reason for non-experimental use (see below).

3. Existing users (if any) are better off moving to newer machine types.
   The old machine types provide no stability anyway, so there's no
   extra risk in moving.

4. However, moving them to newer machine types *silently* is
   inappropriate.  Users should be given the chance to assess the risk
   and plan their move.  In the meantime, they can stay on the old QEMU
   version.  Failing attempts to use the old machine types with a
   suitable error message or documentation accomplishes that.

5. Maintaining old machine types that should not be used anymore is a
   pointless waste of resources.

>> Really, just because you believe somebody out there is using an
>> experimental machine-type and is too afraid to change to a newer
>> one, you want to make QEMU developers waste their time testing
>> and maintaining old compat code and old machine-types forever?
>
> What exactly made it experimental?

I've wasted too much time in this thread for today already, so I'll be
brief:

1. Q35 barely worked.  The -hda/-cdrom command line sugar didn't work
   until 2.2.  Chipset emulation was full of bugs until 2.4.  Migration
   didn't work at all until 2.4.  That's where Eduardo proposes to make
   the cut.

2. Until 2.4, we pretty much ignored backward compatibility when
   changing Q35.  For instance, something like commit c8b5b20 would be
   unacceptable for a "real" machine type.

3. The features Q35 provides over i440FX have become usable only
   recently.

Why would anyone go to the trouble to ask for a machine type that
provides only disadvantages for any reason but experimenting with it?
