Re: TB Cache size grows out of control with qemu 5.0


From: Alex Bennée
Subject: Re: TB Cache size grows out of control with qemu 5.0
Date: Fri, 17 Jul 2020 15:27:41 +0100
User-agent: mu4e 1.5.4; emacs 28.0.50

Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:

> On Thu, Jul 16, 2020 at 6:27 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>>
>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>>
>> > On Wed, Jul 15, 2020 at 5:58 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:
>> >
>> >> See commit 47a2def4533a2807e48954abd50b32ecb1aaf29a and the next two
>> >> following it.
>> >>
>> >
>> > Thank you Zoltan for pointing out this commit, I agree that this seems to be
>> > the trigger for the issues I'm seeing. Unfortunately the common CI host size
>> > is 1-2G. For example, on Ubuntu Autopkgtests it is 1.5G.
>> > Those of them running guests do so in 0.5-1G size in TCG mode
>> > (as they often can't rely on having KVM available).
>> >
>> > The 1G TB buffer + 0.5G actual guest size + lack of dynamic downsizing
>> > on memory pressure (which never existed) makes these systems OOM-kill
>> > the qemu process.
>>
>> Ooops. I admit the assumption was that most people running system
>> emulation would be doing it on beefier machines.
>>
>> > The patches indicated that the TB flushes on a full guest boot are a
>> > good indicator of the TB size efficiency. From my old checks I had:
>> >
>> > - Qemu 4.2 512M guest with 32M default overwritten by ram-size/4
>> > TB flush count      14, 14, 16
>> > - Qemu 5.0 512M guest with 1G default
>> > TB flush count      1, 1, 1
>> >
>> > I agree that ram/4 seems odd; especially on huge guests that potentially
>> > wastes a lot. And while most environments have a bit of breathing room,
>> > 1G is too big on small host systems, and the common CI system falls
>> > into this category. So I tuned it down to 256M for a test.
>> >
>> > - Qemu 4.2 512M guest with tb-size 256M
>> > TB flush count      5, 5, 5
>> > - Qemu 5.0 512M guest with tb-size 256M
>> > TB flush count      5, 5, 5
>> > - Qemu 5.0 512M guest with 256M default in code
>> > TB flush count      5, 5, 5
>> >
>> > So performance-wise the results are as much in-between as you'd expect
>> > from a TB size in between. And the memory consumption, which (for me) is
>> > the actual current issue to fix, would be back in line again as expected.
>>
>> So I'm glad you have the workaround.
>>
>> >
>> > So on one hand I'm suggesting something like:
>> > --- a/accel/tcg/translate-all.c
>> > +++ b/accel/tcg/translate-all.c
>> > @@ -944,7 +944,7 @@ static void page_lock_pair(PageDesc **re
>> >   * Users running large scale system emulation may want to tweak their
>> >   * runtime setup via the tb-size control on the command line.
>> >   */
>> > -#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
>> > +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (256 * MiB)
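>> >
>> > (For reference, the tb-size control mentioned in that comment is the TCG
>> > accelerator property, set on the command line as e.g.
>> > "-accel tcg,tb-size=256" for a 256 MiB buffer.)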
>>
>> The problem we have is that any number we pick here is arbitrary. And while
>> it did regress your use case, changing it again just pushes a performance
>> regression onto someone else.
>
>
> Thanks for your feedback Alex!
>
> That is true "for you", since 5.0 is released from upstream's POV.
> But from the downstream POV no 5.0 exists for Ubuntu yet, and I'd break
> many places by releasing it like that.
> Sadly the performance gain for the other cases will most likely go unnoticed.
>
>
>> Most (*) 64-bit desktop PCs have 16GB
>> of RAM, and almost all have more than 8GB. And there is a workaround.
>>
>
> Due to our work centering around virtualization, the values representing
> "most 64-bit desktop PCs" aren't the only thing that matters :-)
>
> ...
>
>
>> > This is a bit more tricky than it seems as ram_size is no longer
>> > present in that context, but it is enough to discuss the idea.
>> > Wouldn't that serve all cases - small and large - better than a pure
>> > static default of 1G or always ram/4?
>>
>> I'm definitely against re-introducing ram_size into the mix. The
>> original commit (a1b18df9a4) that broke this introduced an ordering
>> dependency which we don't want to bring back.
>>
>
> I agree with that reasoning, but currently without any size dependency
> the "arbitrary value" we picked to be 1G is even more fixed than it was
> before.
> Compared to pre v5.0 for now I can only decide to
> a) tune it down -> performance impact for huge guests
> b) keep it at 1G -> functional breakage with small hosts
>
>> I'd be more amenable to something that took into account host memory and
>> limited the default if it was smaller than a threshold. Is there a way
>> to probe that that doesn't involve slurping /proc/meminfo?
>>
>
> I agree that a host-size dependency might be the better way to go,
> yet I have no great cross-platform, resilient way to get that value.
> Maybe we can make it "if I can get some value, consider it;
> otherwise use the current default".
> That would improve many places already, while keeping the rest at the
> current behavior.
>
>
>> >
>> > P.S. I added Alex as the author of the offending patch, and Richard/Paolo
>> > for being listed in the MAINTAINERS file for TCG.
>>
>>
>>
> From Zoltan (unifying the thread a bit):
>
>> I agree that this should be dependent on host memory size not guest
>> ram_size but it might be tricky to get that value because different host
>> OSes would need different ways.
>
> Well - where it isn't available we will continue to take the default
> qemu 5.0 already had. If required on other platforms as well they can add
> their way of host memory detection into this as needed.
>
>> Maybe a new qemu_host_mem_size portability
>> function will be needed that implements this for different host OSes.
>> POSIX may or may not have sysconf _SC_PHYS_PAGES and _SC_AVPHYS_PAGES
>
> We should not try to get into the business of _SC_AVPHYS_PAGES and
> try to understand/assume what might be cache or otherwise (re)usable.
> Since we only want some rough alignment with the host's size, _SC_PHYS_PAGES
> should be good enough, and it is available in more places than the other options.
>
>> and Linux has sysinfo, but I don't know how reliable these are.
>
> sysconf is slightly more widely available than sysinfo and has enough for
> what we need.
>
>
> I have combined the thoughts above into a patch and it works well in
> my tests.
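>
> The gist of it is roughly the following - just a simplified sketch to
> illustrate the idea; the helper names are made up here and the /8 divisor
> is the one the numbers below correspond to, the actual patch may differ
> in details:
>
> #include <stddef.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <unistd.h>
>
> #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 ((size_t)1024 * 1024 * 1024) /* 1 GiB */
>
> /* Best-effort host RAM size; 0 means "unknown, keep the default". */
> static uint64_t host_phys_mem(void)
> {
> #if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
>     long pages = sysconf(_SC_PHYS_PAGES);
>     long pagesize = sysconf(_SC_PAGESIZE);
>
>     if (pages > 0 && pagesize > 0) {
>         return (uint64_t)pages * (uint64_t)pagesize;
>     }
> #endif
>     return 0;
> }
>
> static size_t default_code_gen_buffer_size(void)
> {
>     size_t size = DEFAULT_CODE_GEN_BUFFER_SIZE_1;
>     uint64_t phys = host_phys_mem();
>
>     /* Only shrink the default on small hosts, never grow it. */
>     if (phys > 0 && phys / 8 < size) {
>         size = phys / 8;
>     }
>     return size;
> }
>
> int main(void)
> {
>     printf("final tb_size %zu\n", default_code_gen_buffer_size());
>     return 0;
> }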
>
> 32G Host:
> pages 8187304.000000
> pagesize 4096.000000
> max_default 4191899648
> final tb_size 1073741824
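>
> (That max_default matches pages * pagesize / 8: 8187304 * 4096 / 8 =
> 4191899648, which is above the 1G default, so tb_size stays at 1G; on the
> small host below the same formula gives the ~185M value.)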
>
> (qemu) info jit
> Translation buffer state:
> gen code size       210425059/1073736659
> TB count            368273
> TB avg target size  20 max=1992 bytes
> TB avg host size    330 bytes (expansion ratio: 16.1)
> cross page TB count 1656 (0%)
> direct jump count   249813 (67%) (2 jumps=182112 49%)
> TB hash buckets     197613/262144 (75.38% head buckets used)
> TB hash occupancy   34.15% avg chain occ. Histogram: [0,10)%|▆ █ ▅▁▃▁▁|[90,100]%
> TB hash avg chain   1.020 buckets. Histogram: 1|█▁▁|3
>
> Statistics:
> TB flush count      1
> TB invalidate count 451673
> TLB full flushes    0
> TLB partial flushes 154819
> TLB elided flushes  191627
> [TCG profiler not compiled]
>
> => 1G TB size not changed compared to v5.0 - as intended
>
>
> But on a small 1.5G Host it now works without OOM:
>
> pages 379667.000000
> pagesize 4096.000000
> max_default 194389504
> final tb_size 194389504
>
> (qemu) info jit
> Translation buffer state:
> gen code size       86179731/194382803
> TB count            149995
> TB avg target size  20 max=1992 bytes
> TB avg host size    333 bytes (expansion ratio: 16.5)
> cross page TB count 716 (0%)
> direct jump count   98854 (65%) (2 jumps=74962 49%)
> TB hash buckets     58842/65536 (89.79% head buckets used)
> TB hash occupancy   51.46% avg chain occ. Histogram: [0,10)%|▃ ▇ █▂▆▁▄|[90,100]%
> TB hash avg chain   1.091 buckets. Histogram: 1|█▁▁|3
>
> Statistics:
> TB flush count      10
> TB invalidate count 31733
> TLB full flushes    0
> TLB partial flushes 180891
> TLB elided flushes  244107
> [TCG profiler not compiled]
>
> => ~185M which is way more reasonable given the host size
>
> The patch will have a rather large comment in it; I'm not sure if the full
> comment is needed, but I wanted to leave a trace of what and why is going
> on in the code for the next person who comes by.
>
> Submitting as a proper patch to the list in a bit ...

Ahh did you not get CC'ed by my patch this morning?

-- 
Alex Bennée


