qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimizat


From: Jan Kiszka
Subject: Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
Date: Mon, 13 Feb 2012 20:22:21 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2012-02-13 19:50, Blue Swirl wrote:
> On Mon, Feb 13, 2012 at 10:16, Jan Kiszka <address@hidden> wrote:
>> On 2012-02-11 16:25, Blue Swirl wrote:
>>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <address@hidden> wrote:
>>>> This enables acceleration for MMIO-based TPR registers accesses of
>>>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>>>> either on older Intel CPUs (without flexpriority feature, can also be
>>>> manually disabled for testing) or any current AMD processor.
>>>>
>>>> The approach introduced here is derived from the original version of
>>>> qemu-kvm. It was refactored, documented, and extended by support for
>>>> user space APIC emulation, both with and without KVM acceleration. The
>>>> VMState format was kept compatible, so was the ABI to the option ROM
>>>> that implements the guest-side para-virtualized driver service. This
>>>> enables seamless migration from qemu-kvm to upstream or, one day,
>>>> between KVM and TCG mode.
>>>>
>>>> The basic concept goes like this:
>>>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>>>>   irqchip) a vmcall hypercall is registered
>>>>  - VAPIC option ROM is loaded into guest
>>>>  - option ROM activates TPR MMIO access reporting via port 0x7e
>>>>  - TPR accesses are trapped and patched in the guest to call into option
>>>>   ROM instead, VAPIC support is enabled
>>>>  - option ROM TPR helpers track state in memory and invoke hypercall to
>>>>   poll for pending IRQs if required
>>>>
>>>> Signed-off-by: Jan Kiszka <address@hidden>
>>>
>>> I must say that I find the approach horrible, patching guests and ROMs
>>> and looking up Windows internals. Taking the same approach to extreme,
>>> we could for example patch Xen guest to become a KVM guest. Not that I
>>> object merging.
>>
>> Yes, this is horrible. But there is no real better way in the absence of
>> hardware assisted virtualization of the TPR. I think MS is recommending
>> this patching approach as well.
> 
> Maybe instead of routing via ROM and the hypercall, the TPR accesses
> could be handled directly with guest invisible breakpoints (like GDB
> breakpoints, but for QEMU internal use), much like other
> instrumentation could be handled.

Gleb answered it already.

>>>> @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
>>>>  {
>>>>     APICCommonState *s = APIC_COMMON(dev);
>>>>     APICCommonClass *info;
>>>> +    static DeviceState *vapic;
>>>>     static int apic_no;
>>>>
>>>>     if (apic_no >= MAX_APICS) {
>>>> @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
>>>>     info = APIC_COMMON_GET_CLASS(s);
>>>>     info->init(s);
>>>>
>>>> -    sysbus_init_mmio(&s->busdev, &s->io_memory);
>>>> +    sysbus_init_mmio(dev, &s->io_memory);
>>>> +
>>>> +    if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
>>>> +        vapic = sysbus_create_simple("kvmvapic", -1, NULL);
>>>> +    }
>>>> +    s->vapic = vapic;
>>>> +    if (apic_report_tpr_access && info->enable_tpr_reporting) {
>>>
>>> I think you should not rely on apic_report_tpr_access being in sane
>>> condition during class init.
>>
>> It is mandatory, e.g. for CPU hotplug, as reporting needs to be
>> consistent accross all VCPUs. Therefore it is a static global, set to
>> false initially. However, you are right, we lack proper clearing of  the
>> access report feature on reset, not only in this variable.
> 
> I'd also set it to false initially.

It's a global variable, thus initialized to false by definition.

>>>> +
>>>> +#define VAPIC_CPU_SHIFT         7
>>>> +
>>>> +#define ROM_BLOCK_SIZE          512
>>>> +#define ROM_BLOCK_MASK          (~(ROM_BLOCK_SIZE - 1))
>>>> +
>>>> +typedef struct VAPICHandlers {
>>>> +    uint32_t set_tpr;
>>>> +    uint32_t set_tpr_eax;
>>>> +    uint32_t get_tpr[8];
>>>> +    uint32_t get_tpr_stack;
>>>> +} QEMU_PACKED VAPICHandlers;
>>>> +
>>>> +typedef struct GuestROMState {
>>>> +    char signature[8];
>>>> +    uint32_t vaddr;
>>>
>>> This does not look 64 bit clean.
>>
>> It's packed.
> 
> I meant "virtual address could be 64 bits on a 64 bit host", not
> structure packing.

This is for 32-bit guests only. 64-bit Windows doesn't access the TPR
via MMIO, thus is not activating the VAPIC.

>>>> +    uint32_t state;
>>>> +    uint32_t rom_state_paddr;
>>>> +    uint32_t rom_state_vaddr;
>>>> +    uint32_t vapic_paddr;
>>>> +    uint32_t real_tpr_addr;
>>>> +    GuestROMState rom_state;
>>>> +    size_t rom_size;
>>>> +} VAPICROMState;
>>>> +
>>>> +#define TPR_INSTR_IS_WRITE              0x1
>>>> +#define TPR_INSTR_ABS_MODRM             0x2
>>>> +#define TPR_INSTR_MATCH_MODRM_REG       0x4
>>>> +
>>>> +typedef struct TPRInstruction {
>>>> +    uint8_t opcode;
>>>> +    uint8_t modrm_reg;
>>>> +    unsigned int flags;
>>>> +    size_t length;
>>>> +    off_t addr_offset;
>>>> +} TPRInstruction;
>>>
>>> Also here the order is pessimized.
>>
>> Don't see the gain here, though.
> 
> There are two bytes' hole between modrm_reg and flags, maybe also 4
> bytes between length and addr_offset (if size_t is 32 bits but off_t
> 64 bits). I'd reverse the order so that members with largest alignment
> needs come first.

Well, but this won't make the struct smaller. I prefer to keep the
ordering in which we also initialize it.

> 
>>>> +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
>>>> +{
>>>> +    target_phys_addr_t paddr;
>>>> +    target_ulong addr;
>>>> +
>>>> +    if (s->state == VAPIC_ACTIVE) {
>>>> +        return 0;
>>>> +    }
>>>> +    for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) 
>>>> {
>>>> +        paddr = cpu_get_phys_page_debug(env, addr);
>>>> +        if (paddr != APIC_DEFAULT_ADDRESS) {
>>>> +            continue;
>>>> +        }
>>>> +        s->real_tpr_addr = addr + 0x80;
>>>> +        update_guest_rom_state(s);
>>>> +        return 0;
>>>> +    }
>>>
>>> This loop looks odd, what should it do, probe for unused address?
>>
>> Seems to deserve a comment: We have to scan for the guest's mapping of
>> the APIC as we enter here without a hint from an TPR accessing
>> instruction. So we probe the potential range, trying to find the page
>> that maps to that known physical address (known in the sense that
>> Windows does not remap the APIC physically - nor does QEMU support that
>> so far).
> 
> Yes, more comments would be nice, especially on theory of operation.
> 
>>>> +static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env,
>>>> +                                    target_ulong *pip, int access)
>>>> +{
>>>> +    const TPRInstruction *instr;
>>>> +    target_ulong ip = *pip;
>>>> +    uint8_t opcode[2];
>>>> +    uint32_t real_tpr_addr;
>>>> +    int i;
>>>> +
>>>> +    if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 
>>>> 0xe0000000) {
>>>
>>> The constants should be using ULL suffix because target_ulong could be
>>> 64 bit, though maybe this is more optimal.
>>
>> target_ulong is 64-bit unconditionally on x86. I'll add this.
> 
> No, target_phys_addr_t is now 64 bits, but target_ulong (register
> size) is 32 bits for i386-softmmu.

Ah, right.

> 
>>>> +
>>>> +/*
>>>> + * Tries to read the unique processor number from the Kernel Processor 
>>>> Control
>>>> + * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be 
>>>> accessed
>>>> + * or is considered invalid.
>>>> + */
>>>
>>> Horrible hack. Is guest OS type or version checked somewhere?
>>
>> This is all about hacking Windows 32-bit. And this check encodes that
>> even stronger. The other important binding is the expected virtual
>> address of the ROM mapping under Windows. I would have preferred
>> checking the version directly, but no one has a complete list of
>> supported guests and their codes.
> 
> Then it would be nice to only enable this with a command line switch,
> so that some random poor non-Windows OS is not patched incorrectly.

I had the same concern, but there is no need to worry, we have
sufficient checks that no other guest is affected. And we do have a
switch as well, the APIC property. But this is better left on by default
to please our guests with optimal performance.

> 
>>>
>>>> +static int get_kpcr_number(CPUState *env)
>>>> +{
>>>> +    struct kpcr {
>>>> +        uint8_t  fill1[0x1c];
>>>> +        uint32_t self;
>>>> +        uint8_t  fill2[0x31];
>>>> +        uint8_t  number;
>>>> +    } QEMU_PACKED kpcr;
>>>
>>> KPCR. Pointers to Windows documentation would be nice.
>>
>> Oops, yes.
>>
>> Unfortunately, this is only an internal structure, not officially
>> documented by MS. However, all supported OS versions a legacy by now, no
>> longer changing its structure.
> 
> This and a note about the supported OS versions could be added as comment.

OK.

For the folks that developed it in qemu-kvm: This targets Windows XP,
Vista and Server 2003, all 32-bit, right?

Jan


-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



reply via email to

[Prev in Thread] Current Thread [Next in Thread]