On 26/09/19 11:35, Li Qiang wrote:
> So without unrestricted guest the main flow is this: KVM sets the
> guest's RFLAGS bit X86_EFLAGS_VM, so when the vCPU enters guest mode
> it is in vm86 mode. In this mode, the CPU forms addresses as in
> real mode (seg*16+offset); the result is a linear address. And in fact,
> vm86 mode still runs under protected mode, so the linear address will be
> translated to a gpa by the identity mapping table. Then it goes to the
> EPT table?
Yes.
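To make the first two steps concrete, here is a minimal C sketch. It is not
KVM code; vm86_linear(), identity_gpa() and example_translation() are names
made up for illustration, and the last step (gpa to host physical address
through EPT) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode / vm86 address formation: linear = (segment << 4) + offset. */
static uint32_t vm86_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/*
 * With an identity-mapped guest page table, every linear address
 * translates to the same guest physical address.
 */
static uint32_t identity_gpa(uint32_t linear)
{
    return linear;
}

/* Usage example: F000:FFF0 is linear 0xffff0, and therefore gpa 0xffff0. */
void example_translation(void)
{
    uint32_t linear = vm86_linear(0xf000, 0xfff0);

    assert(linear == 0xffff0);
    assert(identity_gpa(linear) == 0xffff0);
}
```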
>> ... as soon as the guest tries to enter protected mode, it will get into
>> a situation which is not real mode but doesn't have the segment
>> registers properly loaded with selectors.
>>
>> Therefore, it will either hack things together (enter_pmode) or emulate
>> instructions until the state is accepted even without unrestricted
>> guest support.
>
> Could you please explain this situation in more detail? Why does this happen?
Protected mode entry looks like this:

    mov %cr0, %eax
    or $1, %al
    mov %eax, %cr0
    # [1] now in 16-bit protected mode
    lgdtl gdt32
    ljmpl $8, $2f
    # [2] now in 32-bit protected mode
2:
    .code32
    mov $16, %ax
    mov %ax, %ds
    mov %ax, %es
    mov %ax, %fs
    mov %ax, %gs
    mov %ax, %ss
    # [3] now everything is okay
Between [1] and [3] the vmentry could fail if unrestricted guest is not
enabled. For example (see checks on guest segment registers in the SDM):
- "CS. Type must be 9, 11, 13, or 15 (accessed code segment)." CS in
  real mode is a RW data segment, not a code segment. This applies
  between [1] and [2].
- "SS. If the guest will not be virtual-8086 and the “unrestricted
  guest” VM-execution control is 0, the RPL (bits 1:0) must equal the RPL
  of the selector field for CS." This may not be the case if the segment
  register still holds real-mode values (which are not selectors, just
  base >> 4). This applies between [1] and [3].
- "DS, ES, FS, GS. The DPL cannot be less than the RPL in the selector
  field." Again, the real-mode DPL is zero but the RPL makes no sense if
  the segment registers hold a real-mode value.
You can find more about these checks in guest_state_valid(). Look at the
"else" branch of that function; the "then" branch is for pmode->rmode
transitions. When any of the checks fails, KVM emulates instructions
instead of running the guest in VMX non-root mode (usually it's just a
handful of them, as in the case above).
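As a rough illustration of the three checks quoted from the SDM, here is a
simplified C model. It is only a sketch, not the code in guest_state_valid():
struct seg, the helper names and the register values in stage() are all
made up, and the real checks cover more fields (and all of DS, ES, FS, GS,
where this sketch tracks only DS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a segment register. */
struct seg {
    uint16_t selector; /* visible selector value */
    uint8_t type;      /* 4-bit descriptor type */
    uint8_t dpl;       /* descriptor privilege level */
};

struct segs {
    struct seg cs, ss, ds;
};

static unsigned int rpl(const struct seg *s)
{
    return s->selector & 3; /* RPL is bits 1:0 of the selector */
}

/* "CS. Type must be 9, 11, 13, or 15": bits 3 and 0 must both be set. */
static bool cs_ok(const struct seg *cs)
{
    return (cs->type & 9) == 9;
}

/* "SS. ... the RPL must equal the RPL of the selector field for CS." */
static bool ss_ok(const struct seg *ss, const struct seg *cs)
{
    return rpl(ss) == rpl(cs);
}

/* "DS, ES, FS, GS. The DPL cannot be less than the RPL." */
static bool data_ok(const struct seg *s)
{
    return s->dpl >= rpl(s);
}

static bool state_valid(const struct segs *g)
{
    return cs_ok(&g->cs) && ss_ok(&g->ss, &g->cs) && data_ok(&g->ds);
}

/* Made-up register contents at the points marked [1], [2] and [3]. */
static struct segs stage(int n)
{
    /* [1]: all registers still hold real-mode values (type 3 = RW data) */
    struct segs g = {
        .cs = { 0xf000, 3, 0 },
        .ss = { 0x1001, 3, 0 },
        .ds = { 0x2003, 3, 0 },
    };

    if (n >= 2) /* [2]: the far jump loaded code selector 8 */
        g.cs = (struct seg){ 0x0008, 11, 0 };
    if (n >= 3) { /* [3]: data segments reloaded with selector 16 */
        g.ss = (struct seg){ 0x0010, 3, 0 };
        g.ds = (struct seg){ 0x0010, 3, 0 };
    }
    return g;
}

/* Usage example: only the state at [3] passes all three checks. */
void example_stages(void)
{
    struct segs s1 = stage(1), s2 = stage(2), s3 = stage(3);

    assert(!state_valid(&s1));
    assert(!state_valid(&s2));
    assert(state_valid(&s3));
}
```

In this model the state at [1] fails the CS check, the state at [2] passes
it but still fails the SS and DS checks, and only [3] is fully valid.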
Thanks so much for your explanation. I will read the code more to strengthen my understanding.
Thanks,
Li Qiang
Paolo