Re: [Qemu-devel] [PATCH] s390-ccw: Fix alignment for CCW1

From: Eric Farman
Date: Tue, 29 Aug 2017 14:45:51 -0400

On 08/29/2017 08:45 AM, Cornelia Huck wrote:
On Tue, 29 Aug 2017 08:39:27 -0400
Farhan Ali <address@hidden> wrote:

On 08/29/2017 08:04 AM, Cornelia Huck wrote:
On Mon, 28 Aug 2017 10:28:53 -0400
Farhan Ali <address@hidden> wrote:
On 08/28/2017 10:19 AM, Halil Pasic wrote:

On 08/28/2017 04:15 PM, Farhan Ali wrote:

On 08/28/2017 10:05 AM, Cornelia Huck wrote:
It's the alignment of the CCW which causes the problem.

The exact error message when starting the guest was:

! No virtio device found !

Since it worked for SCSI and CDL, and failed for LDL disks on that particular
system, we are not really sure what caused the failure.
Debugging it further showed that the CCWs for LDL disks were not aligned on a
doubleword boundary.
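
For readers following along, here is a minimal sketch of what "aligned on a doubleword boundary" means for a format-1 CCW. The struct and helper names are illustrative assumptions, not copied from the s390-ccw bios source, and the attributes are the GCC/Clang spelling. Packing alone fixes the 8-byte layout but says nothing about where the object lands in memory; adding aligned(8) asks the compiler to place it on an 8-byte boundary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical format-1 CCW, packed only: the layout is 8 bytes, but the
 * compiler is free to place an instance at any byte address. */
struct ccw1_packed {
    uint8_t  cmd_code;  /* command code */
    uint8_t  flags;     /* chaining and control flags */
    uint16_t count;     /* byte count */
    uint32_t cda;       /* data address */
} __attribute__((packed));

/* Same layout, but every instance must start on an 8-byte boundary. */
struct ccw1_aligned {
    uint8_t  cmd_code;
    uint8_t  flags;
    uint16_t count;
    uint32_t cda;
} __attribute__((packed, aligned(8)));

/* Compile-time checks: the size stays 8, the alignment becomes 8. */
_Static_assert(sizeof(struct ccw1_aligned) == 8, "a format-1 CCW is 8 bytes");
_Static_assert(alignof(struct ccw1_aligned) == 8, "CCW must be doubleword aligned");

/* True if the address is a multiple of 8 (a doubleword boundary). */
static bool is_doubleword_aligned(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Debug helper (hypothetical): trap early if a CCW is misaligned. */
static void ccw_check(const struct ccw1_aligned *ccw)
{
    assert(is_doubleword_aligned(ccw));
}
```

Calling ccw_check() on a CCW just before it is handed to the channel subsystem would catch a misaligned CCW in a debug build instead of surfacing later as a device that is not found.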
This is really, really odd, as the low-level ccw code is the same for
any disk type...
Trying the test on a different system with LDL disks worked fine, with the 
aligned(8) fix.
Do you happen to have an old s390-ccw.img lying around in the test folder?
QEMU might pick up this one (e.g. when calling it without libvirt from the
command line).
I explicitly mention the bios to use with '-bios' option and pick up the
latest bios. Without the aligned fix I see the error and with the fix it
works fine.
Wait, so the fix fixes it? Or am I confused now?

It fixes it on my system and on one other system we tried. But it fails on the
system where this issue was first noticed.

This is very confusing. So you have tried -bios on the system
where the issue was first noticed, and the issue still persists
even though the fixed bios is specified?

On the system where the issue was first noticed, applying the fix to the
bios fixes it for:

1) CDL disks
2) SCSI disks

But it fails for LDL disks.

On my system and one other system, the fix works for all the disk types,
CDL, SCSI and LDL and fixes the issue.

Are you using different toolchains on the failing and the working
systems? Does it work when you copy the bios from a working system?
(Clutching at straws here...)

So yesterday we realized that, for the failing system, the bios wasn't being
built on that system; rather, it was being built on a different system and
copied over to the failing system. :/

Not sure I understand this. I thought the bios was being built on the system it would be used on, with the source residing on a shared disk mounted via NFS.

Oh dear... the system it was built on hopefully was missing the fix,
right? (I'm getting a bit paranoid here.)

I was also getting paranoid watching this. So I did some poking... It looks exactly like Peter suggested last week:


There were multiple $QEMUSRC directories on this system. At least one 2.9.xx version didn't have commit 198c0d1f9df8c4 (and thus wouldn't care about the boundary alignment), while others did. The aligned(8) fix described here was not applied universally, resulting in, uh, inconsistent results. Shared systems are fun. :)

After a little cleanup, the results from that system match what the rest of us have seen/expected.

Building the bios on the failing system with the fix resolves the issue,
and we did not see any more failures.
So I think I can safely say this patch fixes the alignment problem.

Out of interest, which toolchain are you using? My rebuild is on F26.

F24 on the problematic system, F25 on mine, but this was a red herring.

 - Eric

