
Re: [Duplicity-talk] Fwd: AssertionError on every attempt


From: Bruce Merry
Subject: Re: [Duplicity-talk] Fwd: AssertionError on every attempt
Date: Tue, 16 Jun 2015 13:34:39 +0200

I've had a go at the ID cache approach I suggested below. I'm actually
not sure it's needed, since I can't find any docs stating what
consistency guarantees Drive provides (as opposed to Cloud Storage),
but it may also speed up operations (or not: it always conservatively
validates the metadata). Does anyone have any evidence either way? At
the moment the cache is just in-memory rather than persistent.
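
Concretely, the cache logic is along these lines (a simplified sketch
rather than the branch's actual code; it assumes a PyDrive GoogleDrive
handle "drive" and the ID of the backend's folder, "folder_id"):

    from pydrive.files import ApiRequestError

    class IdCache(object):
        """In-memory filename -> Drive file ID cache."""

        def __init__(self, drive, folder_id):
            self.drive = drive
            self.folder_id = folder_id
            self.ids = {}

        def get_file(self, filename):
            """Return a GoogleDriveFile for filename, or None."""
            file_id = self.ids.get(filename)
            if file_id is not None:
                # Conservative validation: fetch the metadata by ID
                # and check the title still matches before trusting
                # the cached entry.
                f = self.drive.CreateFile({'id': file_id})
                try:
                    f.FetchMetadata()
                    if (f['title'] == filename
                            and not f['labels']['trashed']):
                        return f
                except ApiRequestError:
                    pass   # stale: file was deleted or replaced
                del self.ids[filename]
            # Cache miss: query by title rather than listing the
            # whole folder and filtering client-side. (Naive quoting;
            # titles containing ' would need escaping.)
            query = ("title = '%s' and '%s' in parents and "
                     "trashed = false" % (filename, self.folder_id))
            matches = self.drive.ListFile({'q': query}).GetList()
            if not matches:
                return None
            self.ids[filename] = matches[0]['id']
            return matches[0]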

I've also incorporated some of the ideas from Rupert's patch for when
the cache is missed, i.e. querying by title instead of fetching a full
listing and filtering on the client side, plus the fix that lets files
be overwritten instead of creating a second file with the same
filename. I haven't added any facility to delete multiple
identically-named files; I think it will either delete one copy (if
found in the cache) or error out (if not), but that's untested.
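
The overwrite part boils down to reusing the existing file's ID when
uploading, since PyDrive's Upload() updates in place when the file
object carries an ID (again a sketch, reusing the hypothetical IdCache
from above):

    def put_file(drive, cache, filename, source_path):
        f = cache.get_file(filename)
        if f is None:
            # No existing file: create a fresh one in the folder.
            f = drive.CreateFile(
                {'title': filename,
                 'parents': [{'id': cache.folder_id}]})
        f.SetContentFile(source_path)
        f.Upload()   # updates the existing file instead of adding
                     # a second one with the same title
        cache.ids[filename] = f['id']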

The code is at https://code.launchpad.net/~bmerry/duplicity/pydrive-id-cache.
If the opinion is that this approach is usable I can work on the
shortcomings mentioned above.

Regarding the discussion on _delete, I've left it as a warning if the
file doesn't exist, rather than a BackendException, pending Michael's
suggested changes to the delete_list wrapper.
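
In sketch form, that _delete behaviour looks like this (log.Warn is
duplicity's logging call; the Delete() method is PyDrive's, and the
rest reuses the hypothetical cache above):

    from duplicity import log

    def _delete(self, filename):
        f = self.cache.get_file(filename)
        if f is None:
            log.Warn("%s not found on remote, skipping delete"
                     % filename)
            return
        f.Delete()   # permanent deletion via the Drive API
        self.cache.ids.pop(filename, None)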

Bruce

On 10 June 2015 at 20:10, Bruce Merry <address@hidden> wrote:
> On 10 June 2015 at 16:16, Tim Fletcher <address@hidden> wrote:
>> I suspect that this is due to the Google storage back-end having
>> different consistency guarantees for single objects vs directory
>> listings.
>>
>> See https://cloud.google.com/storage/docs/concepts-techniques#consistency
>
> Thanks, that's an interesting link. It describes Cloud Storage rather
> than Drive, but it wouldn't surprise me if Drive is similar, i.e. an
> object store with a filesystem duct-taped on top.
>
> That makes me think that maximum robustness would be achieved by
> having duplicity reference IDs internally and only use filenames for
> presentation to the user. That sounds like it would need major
> architectural changes though, since the list of IDs forming a backup
> set would need to be recorded as part of the backup, instead of being
> discovered from a directory listing.
>
> A halfway point might be to have the client keep its own filename<->ID
> cache in the Duplicity cache directory. Operations would need to query
> the object by ID to validate the cache entry, but I think this would
> allow for strong consistency in cases where the same client is doing
> the accesses (as is the case when an upload is immediately followed by
> a query - different clients are more likely to be separated in time).
>
> I can probably have a go at implementing that in the next week or
> two. Are there helper functions I should look at for the backend to
> discover where the cache directory for the backup lives? And any
> preferences for the format of the cache file? My personal inclination
> would be to go for sqlite, for the safety guarantees it gives over a
> plain pickle/yaml/json/xml file, but that would introduce a
> dependency (see the sqlite sketch after this quoted message).
>
> Bruce
> --
> Dr Bruce Merry
> bmerry <@> gmail <.> com
> http://www.brucemerry.org.za/
> http://blog.brucemerry.org.za/
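
Regarding the cache-file format question in the quote above, the
sqlite option could be as small as this (purely illustrative: the file
name, table, and helpers are made up, not anything duplicity defines):

    import sqlite3

    def open_cache(cache_dir):
        conn = sqlite3.connect(cache_dir + '/pydrive-id-cache.db')
        conn.execute("""CREATE TABLE IF NOT EXISTS file_ids (
                            filename TEXT PRIMARY KEY,
                            drive_id TEXT NOT NULL)""")
        conn.commit()
        return conn

    def remember(conn, filename, drive_id):
        conn.execute("INSERT OR REPLACE INTO file_ids VALUES (?, ?)",
                     (filename, drive_id))
        conn.commit()   # sqlite's journal makes this crash-safe

    def lookup(conn, filename):
        row = conn.execute(
            "SELECT drive_id FROM file_ids WHERE filename = ?",
            (filename,)).fetchone()
        return row[0] if row else None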



-- 
Dr Bruce Merry
bmerry <@> gmail <.> com
http://www.brucemerry.org.za/
http://blog.brucemerry.org.za/


