[bug#33899] [PATCH 0/5] Distributing substitutes over IPFS
From: Ludovic Courtès
Subject: [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS
Date: Fri, 18 Jan 2019 10:52:49 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Hello,
Hector Sanjuan <address@hidden> skribis:
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, January 14, 2019 2:17 PM, Ludovic Courtès <address@hidden> wrote:
[...]
>> Yes, I’m well aware of “unixfs”. The problem, as I see it, is that it
>> stores “too much” in one way (we don’t need to store the mtimes or
>> permissions, though we could ignore them upon reconstruction), and “not
>> enough” in another way (the executable bit is lost, IIUC).
>
> Actually, the only metadata that Unixfs stores is the size:
> https://github.com/ipfs/go-unixfs/blob/master/pb/unixfs.proto and in
> any case the amount of metadata is negligible compared to the actual
> data stored; it also serves to give you a progress bar when you are
> downloading.
Yes, the format I came up with also stores the size so we can eventually
display a progress bar.
> Having IPFS understand which files are part of a single item is important
> because you can pin/unpin, diff, and patch all of them as a whole. Unixfs
> also takes care of handling the case where directories need to be
> sharded because they have too many entries.
Isn’t there a way, then, to achieve the same behavior with the custom
format? The /api/v0/add entry point has a ‘pin’ argument; I suppose we
could leave it at false except when we add the top-level “directory”
node. Wouldn’t that give us behavior similar to that of Unixfs?
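Concretely, I would picture something along these lines (a rough sketch
against the HTTP API of a daemon on localhost:5001; the helper names are
made up, not code from the patch):

# Sketch: add leaf blocks unpinned, then pin only the top-level node.
import requests

API = "http://localhost:5001/api/v0"

def add_unpinned(data: bytes) -> str:
    """Add data without pinning it; return the CID the daemon reports."""
    r = requests.post(f"{API}/add", params={"pin": "false"},
                      files={"file": data})
    r.raise_for_status()
    return r.json()["Hash"]

def pin_root(cid: str) -> None:
    """Recursively pin the top-level node, keeping every block it
    references alive with a single pinset entry."""
    r = requests.post(f"{API}/pin/add", params={"arg": cid})
    r.raise_for_status()

Each store item’s files would go through add_unpinned(), and only the
“directory” node tying them together would get pin_root().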
> When the user puts the single root hash in ipfs.io/ipfs/<hash>, it
> will correctly display the underlying files, and people will be able
> to navigate the actual tree with both the web UI and the CLI.
Right, though that’s less important in my view.
> Note that every file added to IPFS gets wrapped as a Unixfs block
> anyway. You are just saving some "directory" nodes by adding them
> separately.
Hmm, weird. When I do /api/v0/add, I’m really just passing a byte
vector; there’s no notion of a “file” here, AFAICS. Or am I missing
something?
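If I wanted to check, I suppose I could inspect the node that comes
back; a sketch, assuming a local daemon (I have not verified the exact
shape of the output):

# Sketch: add a bare byte vector, then decode the resulting node to see
# whether it comes back wrapped as a unixfs (dag-pb) object.
import requests

API = "http://localhost:5001/api/v0"

resp = requests.post(f"{API}/add", params={"pin": "false"},
                     files={"file": b"just a byte vector"})
cid = resp.json()["Hash"]

# /api/v0/dag/get decodes the node; for unixfs-wrapped data it should
# show a dag-pb node with a unixfs Data field and a list of links.
print(requests.post(f"{API}/dag/get", params={"arg": cid}).json())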
>> > It will probably take some trial and error to get the multipart right
>> > so as to upload everything in a single request. The Go HTTP clients
>> > doing this can be found at:
>> > https://github.com/ipfs/go-ipfs-files/blob/master/multifilereader.go#L96
>> > As you see, a directory part in the multipart will have the
>> > Content-Type header set to "application/x-directory". The best way to
>> > see how "abspath" etc. is set is probably to sniff an
>> > `ipfs add -r <testfolder>` operation (localhost:5001).
>> > Once UnixFSv2 lands, you will be in a position to just drop the sexp
>> > file altogether.
>>
>> Yes, that makes sense. In the meantime, I guess we have to keep using
>> our own format.
>>
>> What are the performance implications of adding and retrieving files one
>> by one like I did? I understand we’re doing N HTTP requests to the
>> local IPFS daemon where “ipfs add -r” makes a single request, but this
>> alone can’t be much of a problem since communication is happening
>> locally. Does pinning each file separately somehow incur additional
>> overhead?
>>
>
> Yes, pinning separately is slow and incurs overhead. Pins are themselves
> stored in a merkle tree, so each pin involves reading, patching, and
> saving that tree. This gets quite slow when you have very large pinsets,
> because the pinset blocks grow, and your pinset will grow very large if
> you do this. Additionally, the pinning operation itself requires a
> global lock, which slows it down further.
OK, I see.
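As an aside, coming back to the single-request upload you described:
going by the go-ipfs-files code you linked to, I would expect the
multipart layout to be roughly as in the sketch below. The field names,
the URL-escaped relative paths, and the empty directory body are my
guesses; sniffing an actual ‘ipfs add -r’, as you suggest, remains the
authoritative reference.

# Sketch: one multipart request carrying a directory and a file, in the
# style of `ipfs add -r`.  Directory parts get the
# "application/x-directory" content type; file parts are named by their
# (escaped) relative path.
import requests

API = "http://localhost:5001/api/v0"

parts = [
    # (field, (filename, body, content-type)); the escaped path and the
    # empty directory body are assumptions based on the quoted Go code.
    ("file", ("testfolder", "", "application/x-directory")),
    ("file", ("testfolder%2Fhello.txt", b"hello world\n",
              "application/octet-stream")),
]

r = requests.post(f"{API}/add", files=parts)
for line in r.iter_lines():
    print(line)  # the daemon streams one JSON object per added entry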
> But even if it were fast, you would not have an easy way to unpin
> anything that becomes obsolete, or to get an overview of where things
> belong. It is also unlikely that a single IPFS daemon will be able to
> store everything you build, so you might soon find yourself using IPFS
> Cluster to distribute the storage across multiple nodes, at which point
> you will effectively be adding remotely.
Currently, ‘guix publish’ stores things as long as they are requested,
and then for the duration specified with ‘--ttl’. I suppose we could
have similar behavior with IPFS: if an item hasn’t been requested for
the specified duration, then we unpin it.
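Roughly like the sketch below, assuming ‘guix publish’ keeps a map from
CID to the time of the last request (that bookkeeping is hypothetical):

# Sketch: unpin items whose last request is older than the TTL.
import time
import requests

API = "http://localhost:5001/api/v0"
TTL = 30 * 24 * 3600  # say, 30 days, in the spirit of --ttl

def unpin_stale(last_requested: dict) -> None:
    """Remove the pin of every item not requested within TTL seconds."""
    now = time.time()
    for cid, last in list(last_requested.items()):
        if now - last > TTL:
            requests.post(f"{API}/pin/rm", params={"arg": cid})
            del last_requested[cid]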
Does that make sense?
Thanks for your help!
Ludo’.