Re: [Qemu-devel] [PATCH v2 00/16] Postcopy: Hugepage support


From: Alexey Perevalov
Subject: Re: [Qemu-devel] [PATCH v2 00/16] Postcopy: Hugepage support
Date: Tue, 14 Feb 2017 19:22:05 +0300
User-agent: Mutt/1.5.21 (2010-09-15)

Hi David,

Thank you, now it's clear.

On Mon, Feb 13, 2017 at 06:16:02PM +0000, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (address@hidden) wrote:
> >  Hello David!
> 
> Hi Alexey,
> 
> > I have checked your series with 1G hugepages, but only in a 1 Gbit/sec
> > network environment.
> 
> Can you show the qemu command line you're using?  I'm just trying
> to make sure I understand where your hugepages are; running 1G hostpages
> across a 1Gbit/sec network for postcopy would be pretty poor - it would take
> ~10 seconds to transfer the page.

sure
-hda ./Ubuntu.img -name PAU,debug-threads=on -boot d -net nic -net user
-m 1024 -localtime -nographic -enable-kvm -incoming tcp:0:4444 -object
memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages -mem-prealloc
-numa node,memdev=mem -trace events=/tmp/events -chardev
socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control
> 
> > I started Ubuntu with just a console interface and gave it only 1G of
> > RAM; inside Ubuntu I started the stress command
> 
> > (stress --cpu 4 --io 4 --vm 4 --vm-bytes 256000000 &)
> > In such an environment precopy live migration was impossible; it never
> > finished, and just kept sending pages endlessly (it looks like the dpkg
> > scenario).
> > 
> > Also, I modified the stress utility
> > (http://people.seas.harvard.edu/~apw/stress/stress-1.0.4.tar.gz)
> > because it writes the same value `Z` into memory every time. My
> > modified version writes a new, incremented value on every allocation.
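
(For reference, a minimal sketch of roughly what that modification does -
illustrative C only, not the actual patch; logging to a file and the stress
worker plumbing are simplified away:)

/* Illustrative sketch of the modified stress worker (not the actual patch):
 * each pass writes a new, incremented value over the whole buffer and
 * records "sec_since_epoch.microsec:value" after the pass. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

int main(void)
{
    size_t len = 256UL * 1024 * 1024;          /* 256MB, as with --vm-bytes */
    unsigned char *buf = malloc(len);
    struct timeval tv;

    if (!buf) {
        return 1;
    }
    for (unsigned int val = 0; ; val++) {      /* new value on every pass */
        memset(buf, val & 0xff, len);
        gettimeofday(&tv, NULL);
        printf("%ld.%06ld:%u\n", (long)tv.tv_sec, (long)tv.tv_usec, val);
        fflush(stdout);
    }
}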
> 
> I use google's stressapptest normally; although remember to turn
> off the bit where it pauses.

I decided to use it too:
stressapptest -s 300 -M 256 -m 8 -W

> 
> > I'm using Arcangeli's kernel only at the destination.
> > 
> > I got contradictory results. Downtime for the 1G hugepage case is close
> > to the 2MB hugepage case: it took around 7 ms (in the 2MB hugepage
> > scenario downtime was around 8 ms).
> > I based that on query-migrate.
> > {"return": {"status": "completed", "setup-time": 6, "downtime": 6, 
> > "total-time": 9668, "ram": {"total": 1091379200, "postcopy-requests": 1, 
> > "dirty-sync-count": 2, "remaining": 0, "mbps": 879.786851, "transferred": 
> > 1063007296, "duplicate": 7449, "dirty-pages-rate": 0, "skipped": 0, 
> > "normal-bytes": 1060868096, "normal": 259001}}}
> > 
> > The documentation says the downtime field's measurement unit is ms.
> 
> The downtime measurement field is pretty meaningless for postcopy; it's only
> the time from stopping the VM until the point where we tell the destination it
> can start running.  Meaningful measurements really only come from inside
> the guest, or from the page-placement latencies.
>

Maybe we could improve it by receiving such information from the destination?
I would like to do that.
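
For example, something along these lines (a hypothetical sketch, not existing
QEMU code; the names are invented, and a real version would track each
outstanding page separately and send the numbers back over the return path):

/* Hypothetical destination-side measurement: remember when the fault thread
 * requested a page and compute the delta once that page has been placed. */
#include <stdio.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static double fault_request_time;

/* call from the fault thread when a page is requested from the source */
static void mark_fault_requested(void)
{
    fault_request_time = now_sec();
}

/* call once the page has been placed (i.e. UFFDIO_COPY succeeded) */
static void mark_page_placed(const char *rbname, unsigned long offset)
{
    double latency = now_sec() - fault_request_time;

    fprintf(stderr, "page %s+0x%lx placed after %.6f s\n",
            rbname, offset, latency);
}

int main(void)                     /* trivial demo of the two hooks */
{
    mark_fault_requested();
    mark_page_placed("/objects/mem", 0);
    return 0;
}
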
> > So I traced it (I added an additional trace point into postcopy_place_page:
> > trace_postcopy_place_page_start(host, from, pagesize); )
> > 
> > postcopy_ram_fault_thread_request Request for HVA=7f6dc0000000 
> > rb=/objects/mem offset=0
> > postcopy_place_page_start host=0x7f6dc0000000 from=0x7f6d70000000, 
> > pagesize=40000000
> > postcopy_place_page_start host=0x7f6e0e800000 from=0x55b665969619, 
> > pagesize=1000
> > postcopy_place_page_start host=0x7f6e0e801000 from=0x55b6659684e8, 
> > pagesize=1000
> > several pages with 4Kb step ...
> > postcopy_place_page_start host=0x7f6e0e817000 from=0x55b6659694f0, 
> > pagesize=1000
> > 
> > 4K pages, starting from address 0x7f6e0e800000; that's
> > vga.ram, /address@hidden/acpi/tables, etc.
> > 
> > Frankly speaking, right now I have no idea why the hugepage wasn't
> > resent. Maybe my expectation is wrong, as well as my understanding )
> 
> That's pretty much what I expect to see - before you get into postcopy
> mode everything is sent as individual 4k pages (in order); once we're
> in postcopy mode we send each page no more than once.  So your
> huge page comes across once - and there it is.
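
(Understood. Restating that for myself in toy C, just to make sure I have the
idea right - this is not QEMU's real bookkeeping, only the principle that once
postcopy is active a page already marked as sent is never queued again:)

/* Toy illustration, not QEMU's data structures: after postcopy starts,
 * a page that has already been sent is never transferred a second time. */
#include <stdbool.h>
#include <stdio.h>

#define NPAGES (256 * 1024)            /* e.g. 1GiB worth of 4KiB pages */
static bool sent[NPAGES];

static bool should_send(unsigned long page)
{
    if (sent[page]) {
        return false;                  /* already sent once: skip it */
    }
    sent[page] = true;
    return true;
}

int main(void)
{
    printf("first request:  %d\n", should_send(42));   /* prints 1 */
    printf("second request: %d\n", should_send(42));   /* prints 0 */
    return 0;
}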
> 
> > The stress utility also duplicates the value into a file for me, in the format
> > sec_since_epoch.microsec:value
> > 1487003192.728493:22
> > 1487003197.335362:23
> > *1487003213.367260:24*
> > *1487003238.480379:25*
> > 1487003243.315299:26
> > 1487003250.775721:27
> > 1487003255.473792:28
> > 
> > It means that rewriting 256MB of memory, byte by byte, took around 5 sec,
> > but at the moment of migration it took 25 sec.
> 
> Right, now this is the thing that's more useful to measure.
> That's not too surprising; when it migrates, that data is changing rapidly,
> so it's going to have to pause and wait for that whole 1GB to be transferred.
> Your 1Gbps network is going to take about 10 seconds to transfer that
> 1GB page - and that's if you're lucky and it saturates the network.
> So it's going to take at least 10 seconds longer than it normally
> would, plus any other overheads - so at least 15 seconds.
> This is why I say it's a bad idea to use 1GB host pages with postcopy.
> Of course it would be fun to find where the other 10 seconds went!
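
(Quick sanity check of the 10 second figure, assuming the link is fully
saturated and ignoring protocol overhead:
1GiB page = 8 * 2^30 bits ~= 8.6e9 bits; 8.6e9 bits / 1e9 bits/sec ~= 8.6 sec,
so "about 10 seconds" with the usual overheads sounds right.)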
> 
> You might like to add timing to the tracing so you can see the time between 
> the
> fault thread requesting the page and it arriving.
>
Yes, sorry, I forgot about the timing:
address@hidden:postcopy_ram_fault_thread_request Request for HVA=7f0280000000 
rb=/objects/mem offset=0
address@hidden:qemu_loadvm_state_section 8
address@hidden:loadvm_process_command com=0x2 len=4
address@hidden:qemu_loadvm_state_section 2
address@hidden:postcopy_place_page_start host=0x7f0280000000 
from=0x7f0240000000, pagesize=40000000

1487084823.315919 - 1487084818.270993 = 5.044926 sec.
The machines are connected directly by cable, without any routers.

> > One more request.
> > QEMU can use mem_path on hugetlbfs together with the share key
> > (-object
> > memory-backend-file,id=mem,size=${mem_size},mem-path=${mem_path},share=on),
> > and in this case the VM will start and work properly (it will allocate
> > memory with mmap), but on the destination of a postcopy live migration
> > the UFFDIO_COPY ioctl will fail for
> > such a region; in Arcangeli's git tree there is a check preventing it
> > (if (!vma_is_shmem(dst_vma) && dst_vma->vm_flags & VM_SHARED)).
> > Is it possible to handle such a situation in QEMU?
> 
> Imagine that you had shared memory; what semantics would you like
> to see?  What happens to the other process?

Honestly, initially I thought about handling such an error, but I quite forgot
about vhost-user in ovs-dpdk.
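
For concreteness, the call that gets rejected on such a region is the plain
UFFDIO_COPY used to place each page - a minimal sketch (not QEMU's actual
placement code; error handling reduced to perror):

/* Minimal sketch of placing one page with UFFDIO_COPY (not QEMU code).
 * With a shared file-backed mapping (mem-path + share=on) this is the
 * ioctl that fails on the destination, because of the vma check quoted
 * above. */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

static int place_page(int uffd, void *dst, void *src, size_t pagesize)
{
    struct uffdio_copy copy = {
        .dst  = (uintptr_t)dst,    /* page-aligned address in the registered range */
        .src  = (uintptr_t)src,    /* buffer holding the received page data */
        .len  = pagesize,
        .mode = 0,
    };

    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1) {
        perror("UFFDIO_COPY");     /* this is where it fails for the shared region */
        return -1;
    }
    return 0;
}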

> Dave
> 
> > On Mon, Feb 06, 2017 at 05:45:30PM +0000, Dr. David Alan Gilbert wrote:
> > > * Dr. David Alan Gilbert (git) (address@hidden) wrote:
> > > > From: "Dr. David Alan Gilbert" <address@hidden>
> > > > 
> > > > Hi,
> > > >   The existing postcopy code, and the userfault kernel
> > > > code that supports it, only works for normal anonymous memory.
> > > > Kernel support for userfault on hugetlbfs is working
> > > > its way upstream; it's in the linux-mm tree.
> > > > You can get a version at:
> > > >    git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
> > > > on the origin/userfault branch.
> > > > 
> > > > Note that while this code supports arbitrary sized hugepages,
> > > > it doesn't make sense with pages above the few-MB region,
> > > > so while 2MB is fine, 1GB is probably a bad idea;
> > > > this code waits for and transmits whole huge pages, and a
> > > > 1GB page would take about 1 second to transfer over a 10Gbps
> > > > link - which is way too long to pause the destination for.
> > > > 
> > > > Dave
> > > 
> > > Oops I missed the v2 changes from the message:
> > > 
> > > v2
> > >   Flip ram-size summary word/compare individual page size patches around
> > >   Individual page size comparison is done in ram_load if 'advise' has been
> > >     received rather than checking migrate_postcopy_ram()
> > >   Moved discard code into exec.c, reworked ram_discard_range
> > > 
> > > Dave
> > 
> > Thank you; right now it's not necessary to set the
> > postcopy-ram capability on the destination machine.
> > 
> > 
> > > 
> > > > Dr. David Alan Gilbert (16):
> > > >   postcopy: Transmit ram size summary word
> > > >   postcopy: Transmit and compare individual page sizes
> > > >   postcopy: Chunk discards for hugepages
> > > >   exec: ram_block_discard_range
> > > >   postcopy: enhance ram_block_discard_range for hugepages
> > > >   Fold postcopy_ram_discard_range into ram_discard_range
> > > >   postcopy: Record largest page size
> > > >   postcopy: Plumb pagesize down into place helpers
> > > >   postcopy: Use temporary for placing zero huge pages
> > > >   postcopy: Load huge pages in one go
> > > >   postcopy: Mask fault addresses to huge page boundary
> > > >   postcopy: Send whole huge pages
> > > >   postcopy: Allow hugepages
> > > >   postcopy: Update userfaultfd.h header
> > > >   postcopy: Check for userfault+hugepage feature
> > > >   postcopy: Add doc about hugepages and postcopy
> > > > 
> > > >  docs/migration.txt                |  13 ++++
> > > >  exec.c                            |  83 +++++++++++++++++++++++
> > > >  include/exec/cpu-common.h         |   2 +
> > > >  include/exec/memory.h             |   1 -
> > > >  include/migration/migration.h     |   3 +
> > > >  include/migration/postcopy-ram.h  |  13 ++--
> > > >  linux-headers/linux/userfaultfd.h |  81 +++++++++++++++++++---
> > > >  migration/migration.c             |   1 +
> > > >  migration/postcopy-ram.c          | 138 +++++++++++++++++---------------------
> > > >  migration/ram.c                   | 109 ++++++++++++++++++------------
> > > >  migration/savevm.c                |  32 ++++++---
> > > >  migration/trace-events            |   2 +-
> > > >  12 files changed, 328 insertions(+), 150 deletions(-)
> > > > 
> > > > -- 
> > > > 2.9.3
> > > > 
> > > > 
> > > --
> > > Dr. David Alan Gilbert / address@hidden / Manchester, UK
> > > 
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK
> 

-- 

BR
Alexey


