Re: [PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
From: Kevin Wolf
Subject: Re: [PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
Date: Wed, 23 Oct 2019 10:17:50 +0200
User-agent: Mutt/1.12.1 (2019-06-15)
On 23.10.2019 at 09:57, Peter Xu wrote:
> On Sat, Oct 19, 2019 at 11:41:53AM +0800, Peter Xu wrote:
> > On Wed, Oct 16, 2019 at 11:40:01AM -0300, Eduardo Habkost wrote:
> > > On Wed, Oct 16, 2019 at 10:29:29AM +0800, Peter Xu wrote:
> > > > v2:
> > > > - use uint32_t rather than int64_t [Juan]
> > > > - one more patch (patch 4) to check dup SaveStateEntry [Dave]
> > > > - one more patch to define a macro (patch 1) to simplify patch 2
> > > >
> > > > Please review, thanks.
> > >
> > > I wonder how hard it is to write a simple test case to reproduce
> > > the original bug. We can extend tests/migration-test.c or
> > > tests/acceptance/migration.py. If using -device with explicit
> > > apic-id, we probably don't even need to create >255 VCPUs.
> >
> > I can give it a shot next week. :)
>
> While trying this, I noticed what is probably a block layer issue: q35
> seems to have a problem booting from a very small block device (e.g.
> 512B, which is the image size currently used by migration-test.c).
> For example, this cmdline can boot successfully into the test image:
>
> $qemu -M pc -m 200m -accel kvm -nographic \
> -drive file=$image,id=drive0,index=0,format=raw \
> -device ide-hd,drive=drive0
>
> While this cannot:
>
> $qemu -M q35 -m 200m -accel kvm -nographic \
> -drive file=$image,id=drive0,index=0,format=raw \
> -device ide-hd,drive=drive0

The important difference here is legacy IDE on pc (which works) vs. AHCI
on q35 (which doesn't work). If you add a -device ahci to the -M pc case,
it starts failing, too.

Not sure why AHCI fails, but I'll just CC John, who is the lucky
maintainer of this device. :-)

Kevin

> The q35 boot fails with this error (BIOS debug messages on):
>
> Booting from Hard Disk..invalid basic_access:143:
> a=00000201 b=00000000 c=00000001 d=00000080 ds=0000 es=07c0 ss=d980
> si=00000000 di=00000000 bp=00000000 sp=0000fd8e cs=f000 ip=cb81 f=0202
> invalid basic_access:144:
> a=00000201 b=00000000 c=00000001 d=00000080 ds=0000 es=07c0 ss=d980
> si=00000000 di=00000000 bp=00000000 sp=0000fd8e cs=f000 ip=cb81 f=0202
> .
> Boot failed: could not read the boot disenter handle_18:
> NULL
> k
>
> This corresponds to the following sanity check in SeaBIOS:
>
> static void noinline
> basic_access(struct bregs *regs, struct drive_s *drive_fl, u16 command)
> {
>     ...
>     // sanity check on cyl heads, sec
>     if (cylinder >= nlc || head >= nlh || sector > nls) {
>         warn_invalid(regs);
>         disk_ret(regs, DISK_RET_EPARAM);
>         return;
>     }
>     ...
> }
>
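
The register dump above can be decoded by hand to see what the quoted
check is comparing. The following is only a sketch, assuming the standard
INT 13h CHS register layout (CH = low 8 bits of the cylinder, CL bits 6-7
= high cylinder bits, CL bits 0-5 = sector, DH = head). The names
decode_chs() and chs_in_range() are made up for illustration and are not
SeaBIOS functions, and the geometry values passed in are hypothetical:
the thread does not show what geometry is actually reported for the
512B image on AHCI.

```c
#include <stdbool.h>
#include <stdint.h>

/* CHS address as requested by the caller of INT 13h. */
struct chs {
    uint16_t cylinder;
    uint16_t head;
    uint16_t sector;
};

/* Decode CX/DX using the standard INT 13h CHS register layout. */
static struct chs decode_chs(uint16_t cx, uint16_t dx)
{
    uint8_t ch = cx >> 8;
    uint8_t cl = cx & 0xff;
    struct chs r;

    r.cylinder = ch | ((uint16_t)(cl & 0xc0) << 2); /* 10-bit cylinder */
    r.sector = cl & 0x3f;                           /* 1-based sector */
    r.head = dx >> 8;                               /* DH */
    return r;
}

/*
 * Same comparison as the quoted sanity check, inverted: returns true
 * when the request is inside the logical geometry nlc/nlh/nls.
 * Cylinders and heads are 0-based, sectors are 1-based.
 */
static bool chs_in_range(struct chs r, uint16_t nlc, uint16_t nlh,
                         uint16_t nls)
{
    return !(r.cylinder >= nlc || r.head >= nlh || r.sector > nls);
}
```

With c=00000001 and d=00000080 from the log, decode_chs(0x0001, 0x0080)
gives cylinder 0, head 0, sector 1, i.e. the very first sector of the
disk. That request passes chs_in_range() for a 1/1/1 geometry, which
matches the secs=1,cyls=1,heads=1 case, and it can only be rejected if
one of the three geometry counts is zero.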
> And the cmdline below works even on q35 (as suggested by Fam when we
> talked offline):
>
> $qemu -M q35 -m 200m -accel kvm -nographic \
> -drive file=$image,id=drive0,index=0,format=raw \
> -device ide-hd,drive=drive0,secs=1,cyls=1,heads=1
>
> I think for the migration test we can work around it as above, but I'm
> also curious whether this is a real bug somewhere, because I don't see a
> reason for q35 to refuse to boot from a one-sector image.
>
> Thanks,
>
> --
> Peter Xu
>