[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation
From: |
Juan Quintela |
Subject: |
Re: [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation |
Date: |
Tue, 03 Nov 2015 19:32:50 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) |
"Dr. David Alan Gilbert" <address@hidden> wrote:
> * Juan Quintela (address@hidden) wrote:
>> "Dr. David Alan Gilbert (git)" <address@hidden> wrote:
>> > From: "Dr. David Alan Gilbert" <address@hidden>
>> >
>> > Signed-off-by: Dr. David Alan Gilbert <address@hidden>
>> > Reviewed-by: David Gibson <address@hidden>
>> > Reviewed-by: Amit Shah <address@hidden>
>>
>> > +/*
>> > + * At the end of migration, undo the effects of init_range
>> > + * opaque should be the MIS.
>> > + */
>> > +static int cleanup_range(const char *block_name, void *host_addr,
>> > + ram_addr_t offset, ram_addr_t length, void
>> > *opaque)
>> > +{
>> > + MigrationIncomingState *mis = opaque;
>> > + struct uffdio_range range_struct;
>> > + trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
>> > +
>> > + /*
>> > + * We turned off hugepage for the precopy stage with postcopy enabled
>> > + * we can turn it back on now.
>> > + */
>> > +#ifdef MADV_HUGEPAGE
>> > + if (madvise(host_addr, length, MADV_HUGEPAGE)) {
>> > + error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
>> > + return -1;
>> > + }
>> > +#endif
>>
>> this should be the same than:
>>
>> qemu_madvise(host_addr, lenght, QEMU_MADV_HUGEPAGE);
>>
>> Only problem I can see, is that there is no way to differentiate that
>> madvise() has given one error or that MADV_HUGEPAGE is not defined.
>>
>> If we really want that:
>>
>> if (QEMU_MADV_HUGEPAGE != QEM_MADV_INVALID) {
>> if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
>> error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
>> return -1;
>> }
>>
>> But I am not sure if we want it.
>
> Yes, so what I've currently got is:
>
> if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
> error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
> return -1;
> }
>
> I'm tempted to add that if check, but the other similar case
> is where you have headers that define HUGEPAGE, but a kernel built without
> it, and in that case the madvise fails, which is a shame, since if the
> kernel hasn't actually got transparent hugepages, then we don't
> care if it failed to turn them on/off - but there doesn't seem
> to be a good way to tell that.
>
>> > + /*
>> > + * We can also turn off userfault now since we should have all the
>> > + * pages. It can be useful to leave it on to debug postcopy
>> > + * if you're not sure it's always getting every page.
>> > + */
>> > + range_struct.start = (uintptr_t)host_addr;
>> > + range_struct.len = length;
>> > +
>> > + if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
>> > + error_report("%s: userfault unregister %s", __func__,
>> > strerror(errno));
>> > +
>> > + return -1;
>> > + }
>> > +
>> > + return 0;
>> > +}
>>
>>
>> I still think that exposing the userfault API all around is a bad idea,
>> that it would be easier to just export:
>>
>> qemu_userfault_register_range(addr, lenght);
>> qemu_userfault_unregister_range(addr, lenght);
>>
>> And hide the details on a header file.
>
> That only hides a tiny bit of the detail;
> for example the ioctl's for UFFDIO_COPY and UFFDIO_ZEROPAGE, have semantics
> associated with them (that they also wake the waiting process for example)
> it's not obvious that another OS would implement it in a similar way
> or what the constraints on it would be. Indeed the previous kernel API
> we had, meant I had to do a lot more work with a similar set of calls
> in userspace. Most of the places where we pull this out into separate
> headers/libraries is where we have an interface that's the same across
> a bunch of different OSs but the detail is different. Currently we only
> have one interaface and no idea what the commonality would be, or how
> much of the semantics that's in postcopy-ram.c would need to move with
> that interface as well.
ok, if it is too difficult (I didn't knew about the associated
semantics), we will wait until something else implemented a similar
interface.
Thanks, Juan.