Re: [RFC] Reducing NEED_CPU

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] Reducing NEED_CPU_H usage

From:	Richard Henderson
Subject:	Re: [RFC] Reducing NEED_CPU_H usage
Date:	Tue, 10 Jan 2023 11:56:50 -0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 12/28/22 08:16, Alessandro Di Federico wrote:

## `target_ulong`

`target_ulong` is `uint32_t` in 32-bit targets and `uint64_t` in 64-bit
targets.

Problem: This is used in many many places to represent addresses in
code that could become target-independent.

Proposed solution: we can convert it to:

     typedef uint64_t target_address;


We have other typedefs that are better for this, e.g. vaddr.

However, at some point we do want to keep some target addresses in the proper size. Forinstance within the softmmu tlb, where CPUTLBEntry is either 16 or 32 bytes, depending.

(On the other hand, if we drop support for 32-bit hosts, as we keep threatening to do,then CPUTLB is always 32 bytes, and we might as well use vaddr there too. But not until32-bit hosts are gone.)


The problem with this is that, if arithmetic operations are performed
on it, we might get undesired results:

     // Was: char load_data(target_ulong address)
     char load_data(target_address address) {
       char *base_address = get_base_address();
       // On a 32-bits target this would overflow, it doesn't with
       // uint64_t
       target_address real_address = address + 1;
       return *(base_address + real_address);
     }

Doesn't, or shouldn't matter, because we should never do anything like this in genericcode. Note that


    vaddr ptr = ...;
    cpu_ldl_le_data(env, ptr + offset)

does not have the problem you describe, because any overflow is truncated within the loadfunction.

## `abi_ulong`

Similar to `target_ulong`, but with alignment info.


Pardon?  There's no alignment info in abi_ulong.

The difference is that 'target_ulong' is the size of the target register, and 'abi_ulong'is the 'unsigned long' in the target's C ABI. Consider e.g. x32 (x86_64 with ilp32 abi),for which target_ulong is 64-bit but abi_ulong is 32-bit.


This only applies to user-only, and should not matter for this project.

There *is* an 'abi_ptr' type, which is shared between softmmu and user-only, which mightbe able to be replaced by 'vaddr'. Or 'typedef vaddr abi_ptr' in softmmu mode. I haven'tdone a survey on that to be certain.

## `TCGv`

`TCGv` is a macro for `TCGv_i32` for 32-bit targets and `TCGv_i64`
for 64-bit targets.

The idea is that this macro should only be visible to target-specific code, and the macroprovides the swizzling/encoding to the concrete type functions.

Problem: it makes `tcg-op.h`


This is fine.

and, more importantly, `tcg-op.c`

This one requires some work within tcg/ to handle two target address sizes simultaneously.It should not be technically difficult to solve, but it does involve adding a few TCGopcodes and adjusting all tcg backends.

Solution: transform current functions using them into target-specific
wrappers that dispatch to target-agnostic functions that accept
`TCGv_dyn` instead of `TCGv`:

     typedef struct {
         union {
             TCGv_i32 i32;
             TCGv_i64 i64;
         };
         bool is_64;
     } TCGv_dyn;

This forgets that both TCGv_i32 and TCGv_i64 are represented by TCGTemp, which contains'TCGType type' to discriminate. This is not exposed to target/, but it's there.


Anyway, there's no need for this.

## `TARGET_` macros

These are macros that provide target-specific information.

Problem: they need to be abandoned in translation units that need to
become target agnostic.

Solution: promote them to fields of a `struct`.
Current ideas:

     TARGET_TB_PCREL -> TranslationBlock.pc_rel


I'd been thinking a bit on the cpu, but a CF_* bit works well.
It gets initialized for each TB from CPUState.tcg_cflags.
TBD where we'd initialize the new bit for each cpu...

     TARGET_PAGE_BITS -> TranslationBlock.page_bits
     TARGET_PAGE_MASK -> TranslationBlock.page_mask

You need to look at how TARGET_PAGE_BITS_VARY works. The memory subsystem needs rewritingif we were to support multiple page sizes. What we can support now is one single globalpage size, selected at startup.

     TARGET_PAGE_ALIGN -> CPUArchState.page_align
                       -> DisasContextBase.page_align


This remains a trivial macro based on the variable TARGET_PAGE_MASK.

     TARGET_LONG_BITS -> TCGContext.long_bits


Yes.

I've been considering how to generalize this to arbitrary address widths, in order tobetter support ARM top-byte-ignore and RISC-V J extension (pointer masking). But in theshort term I'm happy with this number being exactly 64 or 32.

     TARGET_PAGE_SIZE -> ???


Remains a trivial macro based on the variable TARGET_PAGE_MASK.

     TCG_OVERSIZED_GUEST -> ???

Goes away if we drop support for 32-bit hosts, or restrict 32-bit hosts to 32-bit guests.I have no other good ideas.

     TARGET_FMT_lx -> ???


VADDR_PRIx, mostly.  May need resolving on a case-by-case basis.

     CPU_RESOLVING_TYPE -> ???


Would need to be part of the per-target shared library interface.

## `CPUState`

`CPUState` is a target-agnostic `struct` representing common information
of `ArchCPU`.

Problem: given an (opaque) pointer to `CPU${Arch}State`,
target-agnostic code often wants to reach `CPUState`, but this requires
knowledge of the layout of `ArchCPU`, which is target-dependent.

Solution: have a target-specific function to obtain the pointer to
`CPUState` given `CPU${Arch}State`. Where this function would go, and
how it would be retrieved, needs more consideration.
I imagine a table of target-specific function pointers.

For the most part, generic code should be converted to use CPUState as much as possible,and only target-specific code should deal with CPUArchState.

However, generic code and the tcg backend need to be able to find CPUNegativeOffsetStatefrom CPUArchState. This is (and must be) a per-target constant; we cannot allow fullflexibility of function pointers.

We currently have CPUState.env_ptr; we can add 'CPUNegativeOffsetState *neg_ptr' to match,then the per-target constant is the difference between the two pointers.

## `CPUNegativeOffsetState`

`CPUNegativeOffsetState` is a `struct` placed in `ArchCPU` before
`CPU${Arch}State`.

     struct ArchCPU {
         /*< private >*/
         CPUState parent_obj;
         /*< public >*/

CPUNegativeOffsetState neg;

         CPUAlphaState env;

/* This alarm doesn't exist in real hardware; we wish it did.

     */ QEMUTimer *alarm_timer;
     };

Problem: used in several parts of the code that need to become
target-agnostic.

Solution: make it target-agnostic and push it into `CPUState`, which is
the place for target-agnostic stuff.

I'm not sure why this isn't already the case. I'm probably missing
something here, so feedback is welcome.

For code generation quality, we need small negative displacements from env to findicount_decr (used at the start of each TB), and CPUTLBDescFast[n]. Host specificaddressing modes (arm32, arm64, riscv) require the maximum of these displacements to be >=-(1 << 11).

Thus CPUNegativeOffsetState, and the documented (but not enforced) requirement that 'neg'immediately precede 'env'. The alignment requirements of env mean that there may be somesmall amount of padding between the two.

Before CPUNegativeOffsetState, we had all of those variables within CPUState, and evencomments that they should remain at the end of the struct. But those comments wereignored and one day icount_decr was too far away from env for easy addressing on arm host.Luckily there was an assert that fired, but it didn't get caught right away.

As for NB_MMU_MODES, which CPUTLB depends on, we could fix this at 16. This is largerthan all current targets (arm, 12; ppc 10), and is also the maximum currently supported bythe softmmu tlb api (uint16_t idxmap).

Once upon a time it was quite expensive to have many mmu modes, as we directly allocatedtheir storage within CPUState. But now we dynamically resize softmmu tlbs, so theoverhead of an unused mmu index is fairly low (sizeof(CPUTLBDesc) + sizeof(CPUTLBDescFast)).

# Current status

We currently have a branch where we can build (but not link nor run) a
`x86_64-linux-user` configuration where `NEED_CPU_H` is defined only
for translation units in `target/` and `linux-user/`.


This effort should be exclusive to system mode -- don't consider *-user at all.

Each linux-user target bakes in not just target architecture parameters, which arecomplicated enough, but also C ABI, and kernel API. Moreover, the most common use casefor linux-user is a statically linked binary that operates within a chroot.

All you need for *-user is to make sure you don't accidentally break them while doingsystem mode cleanup.

r~

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [RFC] Reducing NEED_CPU_H usage, Richard Henderson <=
- Re: [RFC] Reducing NEED_CPU_H usage, Alessandro Di Federico, 2023/01/12
  - Re: [RFC] Reducing NEED_CPU_H usage, Richard Henderson, 2023/01/12
  - Re: [RFC] Reducing NEED_CPU_H usage, Richard Henderson, 2023/01/14

Prev by Date: Re: [PATCH 1/1] hw/arm/sbsa-ref.c: Start APs powered off
Next by Date: Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
Previous by thread: [PATCH 0/2] linux-user: revert previous workaround for glibc >= 2.36
Next by thread: Re: [RFC] Reducing NEED_CPU_H usage
Index(es):
- Date
- Thread