guix-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug#52555] [RFC PATCH v2 0/5] Decentralized substitute distribution wit


From: Maxime Devos
Subject: [bug#52555] [RFC PATCH v2 0/5] Decentralized substitute distribution with ERIS
Date: Wed, 02 Feb 2022 16:27:52 +0100
User-agent: Evolution 3.38.3-1

pukkamustard schreef op wo 02-02-2022 om 12:42 [+0000]:
> > E.g., if the client cannot download the data in the range [start,
> > end]
> > because the corresponding block has disappeared, can it not simply
> > download that range from https://ci.guix.gnu.org/nar/[...]
> > (not sure about the URI) using a HTTP range request?
> 
> This does not work as the mapping from block reference to location in
> NAR can not be known by the client who only holds the ERIS
> URN.

The client not only knows the ERIS URN, it also knows the location of
the nar (over classical HTTP) because it's in the narinfo.

> Furthermore, some blocks will be intermediary nodes - they hold
> references to content blocks (or other intermediary nodes) but not
> content itself.

If an intermediary node (responsible for, say, bytes 900--10000)
is missing, then the bytes 900--10000 could be downloaded via HTTP.
Whether the node is close to the top, or close to the bottom, in ERIS'
variant of Merkle trees, doesn't matter much.

Granted, if the nar is, say, 1 GiB, and the top-level block is missing,
then we'll have to download 1 GiB over HTTP, even if most lower blocks
exist on IPFS/GNUnet/whatever, which isn't really great.

We could also do some combination of the GDBM database and HTTP
Content-Range requests: most nodes are leaf nodes (*).  Instead of
representing all nodes in the database, we could include only
(intermediate) nodes responsible for data of size, say, 4MiB.

(*) At least, that's the case for binary trees, presumably something
similar holds for ERIS.

I don't know the specifics for ERIS, but for (balanced) binary trees,
not storing the leaf nodes would save about 50% (**), which is a rather
nice space saving.

(**) This assumes the ‘block size’ is the size for storing two pointers
to the children, but in practice the block size would be quite a bit
larger, so there would be more space savings?

Perhaps we are overthinking things and the GDBM (***) database isn't
overly large, or perhaps missing blocks are sufficiently rare such that
we could simply download the _entire_ nar from classical HTTP in case
of missing blocks ...

(***) Guix uses SQlite databases, so I would use SQLite instead of GDBM
unless there's a compelling reason to use GDBM instead.

Greetings,
Maxime.

Attachment: signature.asc
Description: This is a digitally signed message part


reply via email to

[Prev in Thread] Current Thread [Next in Thread]