Re: [PATCH v17 6/8] softmmu/dirtylimit: Implement virtual CPU throttle
From: Peter Xu
Subject: Re: [PATCH v17 6/8] softmmu/dirtylimit: Implement virtual CPU throttle
Date: Mon, 13 Jun 2022 11:58:57 -0400

On Mon, Jun 13, 2022 at 09:03:24PM +0530, manish.mishra wrote:
>
> On 13/06/22 8:03 pm, Peter Xu wrote:
> > On Mon, Jun 13, 2022 at 03:28:34PM +0530, manish.mishra wrote:
> > > On 26/05/22 8:21 am, Jason Wang wrote:
> > > > On Wed, May 25, 2022 at 11:56 PM Peter Xu <peterx@redhat.com> wrote:
> > > > > On Wed, May 25, 2022 at 11:38:26PM +0800, Hyman Huang wrote:
> > > > > > > 2. Also, this algorithm only controls or limits the dirty rate
> > > > > > > from guest writes. There can be some memory dirtying done by
> > > > > > > virtio-based devices which is accounted only at the qemu level,
> > > > > > > so it may not be accounted through dirty rings; do we have a plan
> > > > > > > for that in the future? This is not an issue for auto-converge,
> > > > > > > since it slows the full VM, but the dirty rate limit only slows
> > > > > > > guest writes.
> > > > > > >
> > > > > > From the migration point of view, the time spent migrating memory
> > > > > > is far greater than the time spent migrating devices emulated by
> > > > > > qemu. I think we can do that when migrating devices costs time of
> > > > > > the same magnitude as migrating memory.
> > > > > >
> > > > > > As to auto-converge, it throttles vcpus by kicking them and forcing
> > > > > > them to sleep periodically. The two seem to have little difference
> > > > > > from the perspective of the internal method, but auto-converge is
> > > > > > kind of "offensive" when doing restraint. I'll read the
> > > > > > auto-converge implementation code and figure out the problem you
> > > > > > pointed out.
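To make the comparison above concrete, here is a generic sketch of the
throttle-by-sleep idea behind auto-converge. This is only an illustration of
the technique, not QEMU's actual cpu-throttle code; the timeslice length and
the percentages are made up:

/* Illustration only: force the vCPU thread to sleep for throttle_pct
 * percent of every timeslice, so guest execution - and hence guest
 * dirtying - slows down regardless of what the guest is doing. */
#include <stdio.h>
#include <time.h>

#define TIMESLICE_MS 10

static void throttle_timeslice(unsigned throttle_pct)
{
    unsigned sleep_ms = TIMESLICE_MS * throttle_pct / 100;
    struct timespec ts = {
        .tv_sec  = 0,
        .tv_nsec = (long)sleep_ms * 1000 * 1000,
    };
    /* In a real VMM this would run in the vCPU thread after it has been
     * kicked out of guest mode; here it is just a plain sleep. */
    nanosleep(&ts, NULL);
}

int main(void)
{
    unsigned pct;
    for (pct = 20; pct <= 80; pct += 20) {
        printf("throttle %u%%: vCPU sleeps %u ms of every %u ms\n",
               pct, TIMESLICE_MS * pct / 100, TIMESLICE_MS);
        throttle_timeslice(pct);
    }
    return 0;
}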
> > > > > This seems to be not virtio-specific, but can apply to any device
> > > > > DMA writing to guest mem (if not including vfio). But indeed virtio
> > > > > can normally be faster.
> > > > >
> > > > > I'm also curious how fast a device DMA could dirty memory. This could
> > > > > be a question to answer for all vcpu-based throttling approaches
> > > > > (including the quota-based approach that was proposed on the KVM
> > > > > list). Maybe for kernel virtio drivers we can have some easier
> > > > > estimation?
> > > > As you said below, it really depends on the speed of the backend.
> > > >
> > > > > My guess is it'll be much harder for DPDK-in-guest (aka userspace
> > > > > drivers) because IIUC that could use a large chunk of guest mem.
> > > > Probably, for vhost-user backend, it could be ~20Mpps or even higher.
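
A rough back-of-envelope on that number (the packet-to-page mapping below is
purely an assumption, not something measured in this thread):

    20 Mpps * 4 KiB/page             ~= 80 GB/s  (worst case: each packet dirties its own page)
    20 Mpps / ~3 pkts per 4 KiB page ~= 27 GB/s  (1500-byte packets packed into pages)

Either way it is orders of magnitude above the ~500MB/s disk case discussed
below, and none of it is bounded by throttling guest writes.
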
> > > Sorry for the late response on this. We ran an experiment with IO on a
> > > virtio-scsi-based disk.
> > Thanks for trying this and sharing it out.
> >
> > > We could see a dirty rate of ~500MBps on my system, and most of that was
> > > not tracked as kvm_dirty_log. Also, for reference, I am attaching the
> > > test we used to avoid tracking in KVM (as attached file).
> > The number looks sane as it seems to be the sequential bandwidth for a
> > disk, though I'm not 100% sure it'll work as expected since you mmap()ed
> > the region with private pages rather than shared, so I'm wondering whether
> > the following will happen (also based on the fact that you mapped twice
> > the size of guest mem, as you mentioned in the comment):
> >
> > (1) Swap-out will start to trigger after you have already read a lot of
> >     data into the mem, then the old-read pages will be swapped out to
> >     disk (and hopefully the swap device does not reside on the same
> >     virtio-scsi disk or it'll be an even more complicated scenario of
> >     mixed IOs..), meanwhile when you finish reading a round and start to
> >     read from offset 0 again, swap-in will start to happen too. Swapping
> >     can already slow things down, and I'm wondering whether the 500MB/s
> >     was really caused by the swap-out rather than by backend disk reads.
> >     More below.
> >
> > (2) Another attribute of private pages AFAICT is that after you read a
> >     page once it does not need to be read again from the virtio-scsi
> >     disk. In other words, I'm thinking whether starting from the 2nd
> >     iteration your program won't trigger any DMA at all but will purely
> >     torture the swap device.
> >
> > Maybe changing MAP_PRIVATE to MAP_SHARED can better emulate what we want
> > to measure, but I'm also not 100% sure whether it would be accurate..
> >
> > Thanks,
> >
> Thanks Peter. Yes, agreed, MAP_SHARED should be used here; sorry I missed
> that 😁.
>
> Yes, my purpose of making the file size larger than RAM_SIZE was to cause
> frequent page cache flushes and re-population of page-cache pages, not to
> trigger swaps. I checked that on my VM I had swapping disabled; maybe
> MAP_PRIVATE did not make a difference because it was read-only.
Makes sense. And yeah, I overlooked the RO part - indeed the page cache will
be used for RO pages as long as they are never written, so it'll behave like
shared. Otherwise, with swap all off, you should have hit OOM anyway and the
process probably would get killed sooner or later. :)
>
> I tested again with MAP_SHARED; it comes to around ~500MBps.
Makes sense. I'd guess that's the limitation of the virtio-scsi backend; IOW,
the logical rate at which a device can dirty memory could be unbounded
(e.g., when we put the virtio backend onto a ramdisk).
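
For reference, below is a minimal sketch of the kind of reader being discussed
above. It is a reconstruction under assumptions (the actual attached test is
not reproduced here; the file path, the 4 KiB page-size constant, and the
endless streaming loop are all illustrative): mmap a file on the virtio-scsi
disk that is larger than guest RAM with MAP_SHARED, then keep streaming
through it so the backend has to DMA fresh data into page-cache pages on
every pass.

/* dma-dirty.c: hypothetical sketch, not the attached test.
 * Stream through a file on the virtio-scsi disk that is larger than guest
 * RAM, so page-cache pages keep getting reclaimed and re-filled by device
 * DMA rather than by guest writes. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file-on-virtio-scsi-disk>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        return 1;
    }

    /* MAP_SHARED so the pages are plain page-cache pages; as noted above, a
     * read-only MAP_PRIVATE mapping behaves the same as long as it is never
     * written. */
    uint8_t *buf = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    volatile uint8_t sink = 0;   /* keep the reads from being optimized out */
    for (;;) {
        /* Touch one byte per 4 KiB page.  Because the file is about twice
         * the guest RAM size, earlier pages are reclaimed before a pass
         * finishes, so every pass forces the backend to DMA data in again. */
        for (off_t off = 0; off < st.st_size; off += 4096) {
            sink += buf[off];
        }
    }
    /* never reached; stop it with Ctrl-C */
}

Running this inside the guest against a file roughly twice the RAM size,
while measuring the dirty rate on the host (e.g. with the calc-dirty-rate QMP
command), should show dirtying that throttling guest writes alone cannot
account for.
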
--
Peter Xu