[Qemu-arm] Expensive emulation of CPU condition flags

From: Shuang Zhai
Subject: [Qemu-arm] Expensive emulation of CPU condition flags
Date: Thu, 30 Jun 2016 18:13:56 +0000

Hi everyone.

When running an ARMv7 guest on an x86 host, we observed that a guest instruction that sets the condition flags is often translated into 10 or more host instructions. The cause appears to be the way the frontend emulates the condition flags. For instance:

Target ARM instruction:

cmp  r9, 0x21 ;

IR instructions:

movi_i32 tmp5,$0x21
sub_i32 NF,r9,tmp5
mov_i32 ZF,NF
setcond_i32 CF,r9,tmp5,geu
xor_i32 VF,NF,r9
xor_i32 tmp7,r9,tmp5
and_i32 VF,VF,tmp7
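
To make the cost easier to see, here is a minimal C sketch of what the IR sequence above appears to compute. The struct and function names (arm_flags, cmp_flags) are hypothetical and chosen only for illustration. The flag encoding (N and V taken from bit 31, Z meaning "ZF is zero") is inferred from the IR and is an assumption, not something copied from the QEMU source.

```c
#include <stdint.h>

/* Hypothetical container for the four flag variables named in the IR. */
typedef struct {
    uint32_t NF;  /* assumed: bit 31 carries the N flag */
    uint32_t ZF;  /* assumed: Z is set when this value is zero */
    uint32_t CF;  /* 0 or 1 */
    uint32_t VF;  /* assumed: bit 31 carries the V flag */
} arm_flags;

/* Models the IR emitted for "cmp rn, imm", one statement per IR group. */
static arm_flags cmp_flags(uint32_t rn, uint32_t imm)
{
    arm_flags f;
    f.NF = rn - imm;                  /* sub_i32 NF,r9,tmp5 */
    f.ZF = f.NF;                      /* mov_i32 ZF,NF */
    f.CF = (rn >= imm);               /* setcond_i32 CF,r9,tmp5,geu (unsigned compare) */
    f.VF = (f.NF ^ rn) & (rn ^ imm);  /* xor_i32, xor_i32, and_i32 */
    return f;
}
```

Each of the four results is written to its own variable, which is why the host code needs a separate store per flag.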

Host x86 instructions:

sub    $0x21,%ebx
mov    %ebx,0x208(%r14)
mov    %ebx,%r12d
mov    %r12d,0x20c(%r14)
cmp    $0x21,%ebp
setae  %r13b
movzbl %r13b,%r13d
mov    %r13d,0x200(%r14)
xor    %ebp,%ebx
xor    $0x21,%ebp
and    %ebp,%ebx
mov    %ebx,0x204(%r14)

Imagine a tight loop in which a cmp instruction computes the termination condition: this overhead can be quite expensive. Lazy evaluation does not seem to help here.
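
For clarity, by lazy evaluation we mean roughly the scheme sketched below: record the operands of the last compare and derive each flag only when something reads it. This is a simplified illustration under our own assumptions, with hypothetical names (lazy_cmp, record_cmp, flag_z, and so on); it is not how QEMU implements anything.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lazy flag state: keep the operands, derive flags on demand. */
typedef struct {
    uint32_t a;  /* first operand of the last compare */
    uint32_t b;  /* second operand of the last compare */
} lazy_cmp;

/* Executing the compare costs only two stores. */
static void record_cmp(lazy_cmp *s, uint32_t a, uint32_t b)
{
    s->a = a;
    s->b = b;
}

/* Each flag is computed only when a consumer asks for it. */
static bool flag_z(const lazy_cmp *s) { return s->a == s->b; }
static bool flag_c(const lazy_cmp *s) { return s->a >= s->b; }  /* no borrow */
static bool flag_n(const lazy_cmp *s) { return ((s->a - s->b) >> 31) != 0; }
static bool flag_v(const lazy_cmp *s)
{
    uint32_t r = s->a - s->b;
    return (((r ^ s->a) & (s->a ^ s->b)) >> 31) != 0;  /* signed overflow */
}
```

In a loop the conditional branch that follows the cmp reads the flags immediately, so deferring the computation saves little.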

We wonder whether any optimization exists for this, e.g., directly mapping the frontend's condition flags to those of the backend. Any suggestions are appreciated.

