qemu-devel

Re: [PATCH 2/5] plugins/cache: implement unified L2 cache emulation


From: Alex Bennée
Subject: Re: [PATCH 2/5] plugins/cache: implement unified L2 cache emulation
Date: Fri, 08 Oct 2021 16:44:49 +0100
User-agent: mu4e 1.7.0; emacs 28.0.60

Mahmoud Mandour <ma.mandourr@gmail.com> writes:

> This adds an implementation of a simple L2 configuration, in which a
> unified L2 cache (storing both instruction and data blocks) is
> maintained for each core separately, with no inter-core interaction
> taken into account. The L2 cache is used as a backup for L1 and is only
> accessed if the requested block is not present in L1.
>
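A minimal sketch of the access order described above, assuming a boolean
access_cache() helper that returns true on a hit (the names here are
illustrative, not necessarily the exact ones in cache.c):

    /* Illustrative only: L1 is probed first and the unified L2 is only
     * consulted when the block is not resident in L1. */
    static void probe_data_access(int core, uint64_t addr, InsnData *insn)
    {
        bool l1_hit;

        g_mutex_lock(&l1_dcache_locks[core]);
        l1_hit = access_cache(l1_dcaches[core], addr);
        if (!l1_hit) {
            insn->l1_dmisses++;
        }
        g_mutex_unlock(&l1_dcache_locks[core]);

        if (l1_hit) {
            return;
        }

        g_mutex_lock(&l2_ucache_locks[core]);
        if (!access_cache(l2_ucaches[core], addr)) {
            insn->l2_misses++;  /* bumped atomically in the MT case, see below */
        }
        g_mutex_unlock(&l2_ucache_locks[core]);
    }
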
> In terms of multi-threaded user-space emulation, the same approximation
> as for L1 is used: a static number of caches is maintained, and every
> memory access initiated by a thread goes through one of the available
> caches.
>
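A sketch of what that static pool could look like at init time; cache_init()
and the l2_* size variables are assumptions rather than quotes from the patch:

    /* One unified L2 (plus its lock) per modelled core; in multi-threaded
     * user mode `cores` is just the size of the pool and each thread is
     * folded onto it, e.g. with cache_idx = vcpu_index % cores. */
    l2_ucaches = g_new(Cache *, cores);
    l2_ucache_locks = g_new0(GMutex, cores);
    for (int i = 0; i < cores; i++) {
        l2_ucaches[i] = cache_init(l2_blksize, l2_assoc, l2_cachesize);
    }
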
> An atomic increment is used to maintain the number of L2 misses per
> instruction.
>
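For the per-instruction L2 miss counter that can be as simple as a compiler
builtin; the exact call used in the patch may differ:

    /* Threads in user-mode emulation can race on the same InsnData, so the
     * shared counter is bumped atomically rather than with a plain ++. */
    __atomic_fetch_add(&insn->l2_misses, 1, __ATOMIC_SEQ_CST);
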
> The default cache parameters of the L2 caches are:
>
>     2MB cache size
>     16-way associativity
>     64-byte blocks
>
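With the usual set-associative arithmetic those defaults work out to
2 MiB / (64 B blocks * 16 ways) = 2048 sets per L2 cache.
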
> Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
> ---
>  contrib/plugins/cache.c | 256 +++++++++++++++++++++++++++-------------
>  1 file changed, 175 insertions(+), 81 deletions(-)
>
> diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
> index a255e26e25..908c967a09 100644
> --- a/contrib/plugins/cache.c
> +++ b/contrib/plugins/cache.c
> @@ -82,8 +82,9 @@ typedef struct {
>      char *disas_str;
>      const char *symbol;
>      uint64_t addr;
> -    uint64_t dmisses;
> -    uint64_t imisses;
> +    uint64_t l1_dmisses;
> +    uint64_t l1_imisses;
> +    uint64_t l2_misses;
>  } InsnData;
>  
>  void (*update_hit)(Cache *cache, int set, int blk);
> @@ -93,15 +94,20 @@ void (*metadata_init)(Cache *cache);
>  void (*metadata_destroy)(Cache *cache);
>  
>  static int cores;
> -static Cache **dcaches, **icaches;
> +static Cache **l1_dcaches, **l1_icaches;
> +static Cache **l2_ucaches;
>  
> -static GMutex *dcache_locks;
> -static GMutex *icache_locks;
> +static GMutex *l1_dcache_locks;
> +static GMutex *l1_icache_locks;
> +static GMutex *l2_ucache_locks;

Did you experiment with keeping a single locking hierarchy? I measured
quite high lock contention with perf while running system emulation.
While splitting the locks can reduce contention, I suspect the access
pattern might just lead to two threads serialising twice in a row and
therefore adding to latency.

A single hierarchy might be overly complicated by the current split
between the i and d caches at layer 1, which probably makes sense to keep.
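
One way to read the single-hierarchy idea (purely a sketch, with a
hypothetical per-core hier_locks array, not what the patch currently does):
hold one lock across the whole L1-plus-L2 walk so a thread serialises at
most once per access:

    g_mutex_lock(&hier_locks[cache_idx]);
    if (!access_cache(l1_dcaches[cache_idx], addr)) {
        insn->l1_dmisses++;
        if (!access_cache(l2_ucaches[cache_idx], addr)) {
            insn->l2_misses++;
        }
    }
    g_mutex_unlock(&hier_locks[cache_idx]);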

Otherwise looks reasonable to me:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


