qemu-devel

[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet


From: Robert Henry
Subject: [Bug 1866892] Re: guest OS catches a page fault bug when running dotnet
Date: Tue, 24 Mar 2020 20:43:23 -0000

I've stepped through from helper_iret_protected, going deep into the
bowels of the TLB, MMU and page table engine, none of which I
understand. helper_ret_protected faults in the first POPQ_RA. I'll
investigate the value of sp at the time of the POPQ_RA.

Here's the POPQ_RA in target/i386/seg_helper.c:2140

    sp = env->regs[R_ESP];
    ssp = env->segs[R_SS].base;
    new_eflags = 0; /* avoid warning */
#ifdef TARGET_X86_64
    if (shift == 2) {
        POPQ_RA(sp, new_eip, retaddr);
        POPQ_RA(sp, new_cs, retaddr);
        new_cs &= 0xffff;
        if (is_iret) {
            POPQ_RA(sp, new_eflags, retaddr);
        }

and here's the stack.  Note that some of the logical intermediate frames
are optimized out due to -O3 and inlining. (The value of env->error_code
is 1.)

#0  0x0000555555a370c0 in raise_interrupt2
    (env=env@entry=0x5555566ef200, intno=14, is_int=is_int@entry=0, 
error_code=1, next_eip_addend=next_eip_addend@entry=0, 
retaddr=retaddr@entry=140736367565663) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/include/exec/cpu-all.h:426
#1  0x0000555555a377f9 in raise_exception_err_ra
    (env=env@entry=0x5555566ef200, exception_index=<optimized out>, 
error_code=<optimized out>, retaddr=retaddr@entry=140736367565663) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/excp_helper.c:127
#2  0x0000555555a37d69 in x86_cpu_tlb_fill
    (cs=0x5555566e69a0, addr=140727872411616, size=<optimized out>, 
access_type=MMU_DATA_LOAD, mmu_idx=0, probe=<optimized out>, 
retaddr=140736367565663) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/excp_helper.c:697
#3  0x0000555555952295 in tlb_fill
    (cpu=0x5555566e69a0, addr=140727872411616, size=8, 
access_type=MMU_DATA_LOAD, mmu_idx=0, retaddr=140736367565663)
    at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1017
#4  0x0000555555956320 in load_helper
    (full_load=0x555555956140 <helper_le_ldq_mmu>, code_read=false, op=MO_64, 
retaddr=93825010692608, oi=48, addr=140727872411616, env=0x5555566ef200) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/include/exec/cpu-all.h:426
#5  0x0000555555956320 in helper_le_ldq_mmu
    (env=env@entry=0x5555566ef200, addr=addr@entry=140727872411616, 
oi=oi@entry=48, retaddr=retaddr@entry=140736367565663)
    at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1688
#6  0x0000555555956dc0 in cpu_load_helper
    (full_load=0x555555956140 <helper_le_ldq_mmu>, op=MO_64, 
retaddr=140736367565663, mmu_idx=<optimized out>, addr=140727872411616, 
env=0x5555566ef200) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1752
#7  0x0000555555956dc0 in cpu_ldq_mmuidx_ra
    (env=env@entry=0x5555566ef200, addr=addr@entry=140727872411616, 
mmu_idx=<optimized out>, ra=ra@entry=140736367565663)
    at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1799
#8  0x0000555555a4ff09 in helper_ret_protected
    (env=env@entry=0x5555566ef200, shift=shift@entry=2, 
is_iret=is_iret@entry=1, addend=addend@entry=0, retaddr=140736367565663)
    at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/seg_helper.c:2140
#9  0x0000555555a50ff5 in helper_iret_protected (env=0x5555566ef200, shift=2, 
next_eip=-999377888)
    at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/seg_helper.c:2363
#10 0x00007fffbd321b5f in code_gen_buffer ()
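
gdb prints addr and retaddr in decimal, which makes them hard to compare
with the hexadecimal addresses elsewhere in this report. A minimal Python
sketch of the conversion follows. The helper name to_hex64 is made up for
illustration; the two values are copied from the backtrace above.

```python
# Values copied verbatim from the gdb backtrace (printed there in decimal).
GDB_VALUES = {
    "addr": 140727872411616,     # guest virtual address being loaded
    "retaddr": 140736367565663,  # host return address passed down the helpers
}


def to_hex64(value: int) -> str:
    """Format an integer as a zero-padded 64-bit hexadecimal string."""
    return f"0x{value & 0xFFFFFFFFFFFFFFFF:016x}"


for name, value in GDB_VALUES.items():
    print(f"{name:8s} {value:>16d} -> {to_hex64(value)}")
```

The converted retaddr is the same address gdb shows for frame #10 in
code_gen_buffer, and addr falls in the 0x00007ff... range, like the RSP
values in the guest kernel output quoted in the bug description.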

-- 
https://bugs.launchpad.net/bugs/1866892

Title:
  guest OS catches a page fault bug when running dotnet

Status in QEMU:
  New

Bug description:
  The linux guest OS catches a page fault bug when running the dotnet
  application.

  host = metal = x86_64
  host OS = ubuntu 19.10
  qemu emulation, without KVM, with "tiny code generator" tcg; no plugins;
  built from head/master
  guest emulation = x86_64
  guest OS = ubuntu 19.10
  guest app = dotnet, running any program

  qemu sha=7bc4d1980f95387c4cc921d7a066217ff4e42b70 (head/master Mar 10,
  2020)

  qemu invocation is:

  qemu/build/x86_64-softmmu/qemu-system-x86_64 \
    -m size=4096 \
    -smp cpus=1 \
    -machine type=pc-i440fx-5.0,accel=tcg \
    -cpu Skylake-Server-v1 \
    -nographic \
    -bios OVMF-pure-efi.fd \
    -drive if=none,id=hd0,file=ubuntu-19.10-server-cloudimg-amd64.img \
    -device virtio-blk,drive=hd0 \
    -drive if=none,id=cloud,file=linux_cloud_config.img \
    -device virtio-blk,drive=cloud \
    -netdev user,id=user0,hostfwd=tcp::2223-:22 \
    -device virtio-net,netdev=user0

  
  Here's the guest kernel console output:

  
  [ 2834.005449] BUG: unable to handle page fault for address: 00007fffffffc2c0
  [ 2834.009895] #PF: supervisor read access in user mode
  [ 2834.013872] #PF: error_code(0x0001) - permissions violation
  [ 2834.018025] IDT: 0xfffffe0000000000 (limit=0xfff) GDT: 0xfffffe0000001000 
(limit=0x7f)
  [ 2834.022242] LDTR: NULL
  [ 2834.026306] TR: 0x40 -- base=0xfffffe0000003000 limit=0x206f
  [ 2834.030395] PGD 80000000360d0067 P4D 80000000360d0067 PUD 36105067 PMD 
36193067 PTE 8000000076d8e867
  [ 2834.038672] Oops: 0001 [#4] SMP PTI
  [ 2834.042707] CPU: 0 PID: 13537 Comm: dotnet Tainted: G      D           
5.3.0-29-generic #31-Ubuntu
  [ 2834.050591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
0.0.0 02/06/2015
  [ 2834.054785] RIP: 0033:0x1555547eaeda
  [ 2834.059017] Code: d0 00 00 00 4c 8b a7 d8 00 00 00 4c 8b af e0 00 00 00 4c 
8b b7 e8 00 00 00 4c 8b bf f0 00 00 00 48 8b bf b0 00 00 00 9d 74 02 <48> cf 48 
8d 64 24 30 5d c3 90 cc c3 66 90 55 4c 8b a7 d8 00 00 00
  [ 2834.072103] RSP: 002b:00007fffffffc2c0 EFLAGS: 00000202
  [ 2834.076507] RAX: 0000000000000000 RBX: 00001554b401af38 RCX: 
0000000000000001
  [ 2834.080832] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
00007fffffffcfb0
  [ 2834.085010] RBP: 00007fffffffd730 R08: 0000000000000000 R09: 
00007fffffffd1b0
  [ 2834.089184] R10: 0000155555331dd5 R11: 00001555553ad8d0 R12: 
0000000000000002
  [ 2834.093350] R13: 0000000000000001 R14: 0000000000000001 R15: 
00001554b401d388
  [ 2834.097309] FS:  0000155554fa5740 GS:  0000000000000000
  [ 2834.101131] Modules linked in: isofs nls_iso8859_1 dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev input_leds serio_raw parport_pc 
parport sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper 
virtio_net psmouse net_failover failover virtio_blk floppy
  [ 2834.122539] CR2: 00007fffffffc2c0
  [ 2834.126867] ---[ end trace dfae51f1d9432708 ]---
  [ 2834.131239] RIP: 0033:0x14d793262eda
  [ 2834.135715] Code: Bad RIP value.
  [ 2834.140243] RSP: 002b:00007ffddb4e2980 EFLAGS: 00000202
  [ 2834.144615] RAX: 0000000000000000 RBX: 000014d6f402acb8 RCX: 
0000000000000002
  [ 2834.148943] RDX: 0000000001cd6950 RSI: 0000000000000000 RDI: 
00007ffddb4e3670
  [ 2834.153335] RBP: 00007ffddb4e3df0 R08: 0000000000000001 R09: 
00007ffddb4e3870
  [ 2834.157774] R10: 000014d793da9dd5 R11: 000014d793e258d0 R12: 
0000000000000002
  [ 2834.162132] R13: 0000000000000001 R14: 0000000000000001 R15: 
000014d6f402d040
  [ 2834.166239] FS:  0000155554fa5740(0000) GS:ffff97213ba00000(0000) 
knlGS:0000000000000000
  [ 2834.170529] CS:  0033 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 2834.174751] CR2: 000014d793262eb0 CR3: 0000000036130000 CR4: 
00000000007406f0
  [ 2834.178892] PKRU: 55555554
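
  The error_code(0x0001) and PTE values in the console output can be
  decoded by hand. The sketch below assumes the standard x86-64 bit
  layout for the page fault error code and for a 4 KiB page table entry;
  the function names are hypothetical and are not taken from QEMU or the
  kernel.

```python
def decode_pf_error_code(code: int) -> dict:
    """Split an x86 page fault error code into its low flag bits."""
    return {
        "protection_violation": bool(code & (1 << 0)),  # 0 means page not present
        "write": bool(code & (1 << 1)),                 # 0 means read
        "user_mode_access": bool(code & (1 << 2)),      # 0 means supervisor access
        "reserved_bit": bool(code & (1 << 3)),
        "instruction_fetch": bool(code & (1 << 4)),
    }


def decode_pte(pte: int) -> dict:
    """Split an x86-64 4 KiB page table entry into commonly used fields."""
    return {
        "present": bool(pte & (1 << 0)),
        "writable": bool(pte & (1 << 1)),
        "user": bool(pte & (1 << 2)),
        "accessed": bool(pte & (1 << 5)),
        "dirty": bool(pte & (1 << 6)),
        "no_execute": bool(pte & (1 << 63)),
        "frame": pte & 0x000FFFFFFFFFF000,  # physical address bits 12..51
    }


# Values copied from the guest console output above.
error = decode_pf_error_code(0x0001)
pte = decode_pte(0x8000000076D8E867)

print(error)
print({k: (hex(v) if k == "frame" else v) for k, v in pte.items()})
```

  Under these bit definitions, error code 0x0001 describes a
  supervisor-level read that hit a present page, which matches the two
  "#PF:" lines, and the PTE is marked present, writable, user-accessible
  and no-execute.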

  I run the application from a shell with `ulimit -s unlimited`
  (unlimited stack size).

  The application creates a number of threads, and those threads make a
  lot of calls to sigaltstack() and mprotect(); see the relevant source
  for dotnet here
  
https://github.com/dotnet/runtime/blob/15ec69e47b4dc56098e6058a11ccb6ae4d5d4fa1/src/coreclr/src/pal/src/thread/thread.cpp#L2467

  Using strace -f on the app shows that no alt stacks come anywhere near
  the failing address; all alt stacks are in the heap, as expected.
  None of the mmap/mprotect/munmap syscalls were given arguments in
  high memory, at 0x7fffffff0000 and up.
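
  A check like this can be automated over the strace log. The following
  is a rough sketch, not the procedure actually used for this report: it
  assumes the usual `strace -f` line format, the function name
  high_memory_calls is hypothetical, and the sample lines are invented
  purely to exercise the code.

```python
import re

THRESHOLD = 0x7FFFFFFF0000  # "high memory" boundary named in the report

# First argument of the three syscalls of interest (hex address or NULL).
CALL_RE = re.compile(r"\b(mmap|mprotect|munmap)\((0x[0-9a-fA-F]+|NULL)")
# Hexadecimal return value, e.g. the address chosen by mmap(NULL, ...).
RET_RE = re.compile(r"\)\s*=\s*(0x[0-9a-fA-F]+)")


def high_memory_calls(lines, threshold=THRESHOLD):
    """Return (syscall, address) pairs whose address is >= threshold."""
    hits = []
    for line in lines:
        call = CALL_RE.search(line)
        if call is None:
            continue
        name, arg = call.group(1), call.group(2)
        addresses = []
        if arg != "NULL":
            addresses.append(int(arg, 16))
        ret = RET_RE.search(line)
        if name == "mmap" and ret is not None:
            addresses.append(int(ret.group(1), 16))
        hits.extend((name, a) for a in addresses if a >= threshold)
    return hits


# Invented sample lines; a real run would read the strace log file instead.
SAMPLE = [
    "[pid 13540] mmap(NULL, 8392704, PROT_NONE, "
    "MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f0a1c000000",
    "[pid 13540] mprotect(0x7f0a1c001000, 8388608, PROT_READ|PROT_WRITE) = 0",
    "[pid 13541] mprotect(0x7fffffffc000, 4096, PROT_NONE) = 0",
]

for name, address in high_memory_calls(SAMPLE):
    print(f"{name} touches {address:#x}")
```

  An empty result on the real log would correspond to the observation
  above that no such syscall touched the region near the failing address.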

  gdb (with default signal stop/print/pass semantics) does not report
  any signals prior to the kernel bug being tripped, so I doubt the
  alternate signal stack is actually used.

  When I run the same dotnet binary on the host (i.e., on "bare metal"),
  the host kernel seems happy and dotnet runs as expected.

  I have not tried a different QEMU version, guest OS, or host OS.



