From: Peter Maydell
Subject: Re: [Qemu-devel] [PULL 42/57] Page request: Consume pages off the post-copy queue
Date: Thu, 12 Nov 2015 12:57:28 +0000

On 12 November 2015 at 12:23, Dr. David Alan Gilbert
<address@hidden> wrote:
> * Peter Maydell (address@hidden) wrote:
>> On 12 November 2015 at 12:04, Dr. David Alan Gilbert
>> <address@hidden> wrote:
>> > * Peter Maydell (address@hidden) wrote:
>> >> On 10 November 2015 at 14:25, Juan Quintela <address@hidden> wrote:
>> >> > From: "Dr. David Alan Gilbert" <address@hidden>
>> >> >
>> >> > When transmitting RAM pages, consume pages that have been queued by
>> >> > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
>> >> >
>> >> > Note:
>> >> >   a) After a queued page the linear walk carries on from after the
>> >> > unqueued page; there is a reasonable chance that the destination
>> >> > was about to ask for other closeby pages anyway.
>> >> >
>> >> >   b) We have to be careful of any assumptions that the page walking
>> >> > code makes, in particular it does some short cuts on its first linear
>> >> > walk that break as soon as we do a queued page.
>> >> >
>> >> >   c) We have to be careful to not break up host-page size chunks, since
>> >> > this makes it harder to place the pages on the destination.
>> >> >
>> >> > Signed-off-by: Dr. David Alan Gilbert <address@hidden>
>> >> > Reviewed-by: Juan Quintela <address@hidden>
>> >> > Signed-off-by: Juan Quintela <address@hidden>
>> >>
>> >> I've just discovered that this is causing 'make check' failures on
>> >> my OSX host (unfortunately something in my setup is causing
>> >> 'make check' failures to not always cause a build failure, so I
>> >> didn't notice earlier):
>> >
>> > It's only failing on OSX? Every time or only sometimes?
>>
>> Only OSX, and always. I think OSX is pickier about mutexes really
>> needing to be initialized before use.
>
> OK, at least an 'always' should be easier to debug.
>
>> > If you can find a way to get a backtrace off that qemu_mutex_lock case
>> > that would be great; I'd assume the later errors are the fall out from that.
>>
>> I'll have a look after lunch, but it's usually painful to get a
>> backtrace out of this kind of qtest, because it's clearly starting
>> a whole pile of QEMUs and there's no way I know of to say "only
>> run a few of these tests, not the whole huge pile".
>
> You could add an abort/assert into util/qemu-thread-posix.c qemu_mutex_lock
> in the error path.

abort/assert doesn't print a backtrace.

I added some OSX backtrace-gathering/printing functions to the error path,
and got this:

0   qemu-system-x86_64                  0x000000010c66d203 qemu_mutex_lock + 83
1   qemu-system-x86_64                  0x000000010c2ac7af unqueue_page + 47
2   qemu-system-x86_64                  0x000000010c2ac386 get_queued_page + 54
3   qemu-system-x86_64                  0x000000010c2ac135 ram_find_and_save_block + 165
4   qemu-system-x86_64                  0x000000010c2ab5a2 ram_save_iterate + 130
5   qemu-system-x86_64                  0x000000010c2afa2e qemu_savevm_state_iterate + 302
6   qemu-system-x86_64                  0x000000010c53acbb migration_thread + 571
7   libsystem_pthread.dylib             0x00007fff9146c05a _pthread_body + 131
8   libsystem_pthread.dylib             0x00007fff9146bfd7 _pthread_body + 0
9   libsystem_pthread.dylib             0x00007fff914693ed thread_start + 13
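
For anyone wanting to reproduce this kind of trace, here is a minimal sketch of an error-path backtrace helper. It assumes backtrace() and backtrace_symbols_fd() from <execinfo.h>, which OS X and glibc both provide. The helper name dump_backtrace is hypothetical, and this is not necessarily the code used to produce the trace above.

```c
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: print the calling thread's stack to stderr.
 * Returns the number of frames captured. */
static int dump_backtrace(void)
{
    void *frames[64];

    /* Fill frames[] with return addresses, innermost frame first. */
    int n = backtrace(frames, 64);

    /* Symbolize and write straight to the fd; unlike backtrace_symbols()
     * this does not allocate a result array with malloc(). */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

Calling such a helper just before the abort in the mutex error path would print the frames leading up to the failing lock.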


>
> Could you also add:
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 9bd2ce7..85e5766 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -93,6 +93,7 @@ MigrationState *migrate_get_current(void)
>      };
>
>      if (!once) {
> +        fprintf(stderr,"migrate_get_current do init of current_migration %d\n", getpid());
>          qemu_mutex_init(&current_migration.src_page_req_mutex);
>          once = true;
>      }
> diff --git a/migration/ram.c b/migration/ram.c
> index 4266687..72b46f2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1036,6 +1036,7 @@ static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
>  {
>      RAMBlock *block = NULL;
>
> +    fprintf(stderr,"unqueue_page %d\n", getpid());
>      qemu_mutex_lock(&ms->src_page_req_mutex);
>      if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
>          struct MigrationSrcPageRequest *entry =
>
>
> and make sure that the init happens before the first unqueue (you'll get
> loads of calls to unqueue).

With that change, plus the backtracing:

/x86_64/ahci/flush/retry: OK
/x86_64/ahci/flush/migrate: migrate_get_current do init of current_migration 60427
migrate_get_current do init of current_migration 60428
unqueue_page 60427
0   qemu-system-x86_64                  0x0000000101a751c3 qemu_mutex_lock + 83
1   qemu-system-x86_64                  0x00000001016b4749 unqueue_page + 89
2   qemu-system-x86_64                  0x00000001016b42f6 get_queued_page + 54
3   qemu-system-x86_64                  0x00000001016b40a5 ram_find_and_save_block + 165
4   qemu-system-x86_64                  0x00000001016b3512 ram_save_iterate + 130
5   qemu-system-x86_64                  0x00000001016b79be qemu_savevm_state_iterate + 302
6   qemu-system-x86_64                  0x0000000101942c7b migration_thread + 571
7   libsystem_pthread.dylib             0x00007fff9146c05a _pthread_body + 131
8   libsystem_pthread.dylib             0x00007fff9146bfd7 _pthread_body + 0
9   libsystem_pthread.dylib             0x00007fff914693ed thread_start + 13
qemu: qemu_mutex_lock: Invalid argument
qemu-system-x86_64:Broken pipe
 Not a migration stream
qemu-system-x86_64: load of migration failed: Invalid argument
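
The "Invalid argument" text in the qemu_mutex_lock line is the strerror() string for EINVAL, which pthread functions return directly rather than via errno. As a rough illustration of that reporting pattern and of the init-before-lock ordering being checked here, a minimal sketch using plain pthreads follows. The names checked_mutex_lock and init_then_lock are hypothetical; this is not QEMU's code and it makes no claim about the root cause of the failure above.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: lock the mutex and report any failure.
 * pthread_mutex_lock() returns the error number itself (0 on success). */
static int checked_mutex_lock(pthread_mutex_t *m, const char *who)
{
    int err = pthread_mutex_lock(m);
    if (err) {
        fprintf(stderr, "%s: %s\n", who, strerror(err));
    }
    return err;
}

/* Initialize the mutex before its first use, then lock and unlock it. */
static int init_then_lock(void)
{
    pthread_mutex_t m;
    int err = pthread_mutex_init(&m, NULL);
    if (err) {
        return err;
    }
    err = checked_mutex_lock(&m, "init_then_lock");
    if (err == 0) {
        err = pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return err;
}
```

With the initialization in place the lock is expected to succeed; what happens when locking a mutex that was never initialized is not something POSIX defines, so platforms can differ.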

thanks
-- PMM


