From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
Date: Mon, 28 Jan 2019 17:14:59 +0000

28.01.2019 19:53, Max Reitz wrote:
> On 28.01.19 17:44, Vladimir Sementsov-Ogievskiy wrote:
>> 28.01.2019 18:59, Max Reitz wrote:
>>> On 28.01.19 12:29, Vladimir Sementsov-Ogievskiy wrote:
>>>> 18.01.2019 17:56, Max Reitz wrote:
>>>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> 
> [...]
> 
>>>>>> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>>>>                 if (alloced < 0) {
>>>>>>                     ret = alloced;
>>>>>>                 } else {
>>>>>> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
>>>>>> +                    trace_backup_do_cow_skip(job, offset);
>>>>>> +                    continue; /* already copied */
>>>>>> +                }
>>>>>> +                if (!lock) {
>>>>>> +                    lock = bdrv_co_try_lock(s->source, offset, 
>>>>>> s->cluster_size);
>>>>>> +                    /*
>>>>>> +                     * Dirty bit is set, which means that there are no 
>>>>>> in-flight
>>>>>> +                     * write requests on this area. We must succeed.
>>>>>> +                     */
>>>>>> +                    assert(lock);
>>>>>
>>>>> What if I have a different parent node for the source that issues
>>>>> concurrent writes?  This used to work fine because the before_write
>>>>> notifier would still work.  After this patch, that would be broken
>>>>> because those writes would not cause a CbW.
>>>>
>>>> But how could you have this different parent node? After appending the filter,
>>>> there should not be such nodes.
>>>
>>> Unless you append them afterwards:
>>>
>>>> And I think, during backup it should be forbidden to append new parents
>>>> to the source, ignoring the filter, as that definitely breaks what the filter does.
>>>
>>> Agreed, but then this needs to be implemented.
>>>
>>>> And it applies to other block jobs with their filters.
>>>> If we appended a filter, we don't want someone else to write bypassing our filter.
>>>>
>>>>>
>>>>> That's not so bad because we just have to make sure that all writes go
>>>>> through the backup-top node.  That would make this assertion valid
>>>>> again, too.  But that means we cannot share PERM_WRITE; see [1].
>>>>
>>>> But we don't share PERM_WRITE on the source in backup_top, only on the target.
>>>
>>> Are you sure?  The job itself shares it, and the filter shares it, too,
>>> as far as I can see.  It uses bdrv_filter_default_perms(), and that does
>>> seem to share PERM_WRITE.
>>
>> And in bdrv_filter_default_perms it does "*nshared = *nshared | BLK_PERM_WRITE"
>> only for child_file, which is the target. The source is child_backing.
> 
> Hm?  bdrv_filter_default_perms() does this, unconditionally:
> 
>>      *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) |
>>                 (c->shared_perm & DEFAULT_PERM_UNCHANGED);
> 
> The backup_top filter does what you describe, but it just leaves
> *nshared untouched after bdrv_filter_default_perms() has done the above.


Oops, yes, I meant my backup_top_child_perm(), not bdrv_filter_default_perms(),
which it calls too. So, OK, I will try to fix it. Sorry for the long extra arguing :(
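
Roughly, I think the fix would be in this direction (just an untested sketch on top of
the current bdrv_filter_default_perms(); the exact role checks and flags are only what
I'd try first, not the final code):

static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
                                  const BdrvChildRole *role,
                                  BlockReopenQueue *reopen_queue,
                                  uint64_t perm, uint64_t shared,
                                  uint64_t *nperm, uint64_t *nshared)
{
    /* start from the usual filter defaults */
    bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared,
                              nperm, nshared);

    if (role == &child_file) {
        /* target child: the filter itself needs to write the copied data */
        *nperm |= BLK_PERM_WRITE;
        *nshared |= BLK_PERM_WRITE;
    } else {
        /*
         * source (backing) child: do not share WRITE, so no other parent can
         * write to the source bypassing backup-top and skip the CBW operation
         */
        *nshared &= ~(uint64_t)BLK_PERM_WRITE;
    }
}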

> 
> [...]
> 
>>>>>> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>>>>>     
>>>>>>         copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>>>>>>     
>>>>>> -    /* job->len is fixed, so we can't allow resize */
>>>>>> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
>>>>>> -                           BLK_PERM_CONSISTENT_READ,
>>>>>> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>>>>>> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
>>>>>> -                           speed, creation_flags, cb, opaque, errp);
>>>>>> -    if (!job) {
>>>>>> +    /*
>>>>>> +     * bdrv_get_device_name will not help to find device name starting from
>>>>>> +     * @bs after backup-top append,
>>>>>
>>>>> Why not?  Since backup-top is appended, shouldn't all parents of @bs be
>>>>> parents of @backup_top then?  (Making bdrv_get_parent_name() return the
>>>>> same result)
>>>>
>>>> bdrv_get_device_name finally goes through role->get_name, and only the root
>>>> role has this handler. After the append we'll have the backing role instead of root.
>>>
>>> Ah, I see, I asked the wrong question.
>>>
>>> Why is block_job_create() called on bs and not on backup_top?  mirror
>>> calls it on mirror_top_bs.
>>
>> Good question. I don't exactly remember; maybe there were more troubles with
>> permissions or something. So, I'll have to try it again..
>>
>> What is more beneficial?
>>
>> My current approach is that the job and the filter are two sibling users of the
>> source node; they do the copying and they are synchronized. And in this way, it is
>> better to read from the source directly, to not create extra intersection between
>> the job and the filter..
>>
>> On the other hand, if we read through the filter, we possibly should do the whole
>> copy operation through the filter..
>>
>> What is the difference between a guest read and a backup-job read, from the
>> filter's POV? I think:
>>
>> For a guest read, the filter MUST read (as we must handle the guest request), and
>> then, if we don't have too many in-flight requests, the ram-cache is not full, etc.,
>> we can handle the already-read data in terms of backup, i.e. copy it somewhere. Or
>> we can drop it, if we can't handle it at the moment..
>>
>> For a job read, we even MAY not read if our queues are full, postponing the job
>> request.
>>
>> So
>>
>> Guest read: MUST read, MAY backup
>> Job read: MAY read and backup
>>
>> So, reading through the filter gives the possibility of a common code path plus
>> native prioritization of copy operations. This of course will need more refactoring
>> of backup, and may be done as a separate step, but definitely, I have to at least
>> try to create the job above the filter.
> 
> Well, as far as I see it, right now backup_top's read function is just a
> passthrough.  I don't see a functional difference between reading from
> backup_top and source, but the fact that you could save yourself the
> trouble of figuring out the job ID manually.
> 
> As for the RAM cache, I thought it was just a target like any other and
> backup_top wouldn't need to care at all...?

Not sure.. We can have a RAM cache, backed by a disk cache, together with the actual
backup target (NBD, for example).. And how they should all finally be connected is a
question; I think it would be simpler to figure that out when I start implementing it.

I hope that the RAM cache can somehow be an alternative to in-flight copy requests. A
RAM cache is better, as its data may easily be reused by guest reads.. Hmm, maybe we'll
finish with two jobs, one copying from the active disk to the RAM cache, and one from
the RAM cache to the target? Or would the RAM cache be inside one backup job?

So again, I'd prefer to return to these questions after backup_top is completed, as
it's simpler to discuss something that works.


-- 
Best regards,
Vladimir
