Re: Encoding for Robust Immutable Storage (ERIS)
Sun, 26 Jul 2020 19:28:49 +0200
Thank you for your comments!
For my taste, the block size is much too small. I understand 4k makes
sense for page tables and SATA, but looking at benchmarks 4k is
too small to maximize SATA throughput. I would also worry about
4k for a
request size in any database or network protocol. The overheads per
request are still too big for modern hardware. You could easily
8k, which could be justified with 9k jumbo frames for Ethernet
at least also utilize all of the bits in your paths. The 32k
are close to the 64k which are reportedly the optimum for modern
media. IIRC Torrents even use 256k.
I agree that increasing block size makes sense for improving
in storage and transport.
The overhead from padding may be
large for very small files if you go beyond 4k, but you should
think in terms of absolute overhead: even a 3100% overhead does not
change the fact that the absolute overhead is tiny for a 1k file.
The use-case I have in mind for ERIS is very small pieces of data (not
even small files). Examples include ActivityStreams objects or
Apparently the average size of individual ActivityStreams objects is
less than 1kB (unfortunately I don't have the data to back this up).
I agree that the overhead of 3100% for a single 1kB object is
acceptable. But I would argue that an overhead of 3100% for very many
1kB objects is not. The difference might be a 32 GB database versus
a 1 GB database.
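
To make the arithmetic behind these figures explicit, here is a minimal
sketch. It assumes every object is padded to a whole number of blocks on
its own (no packing of several objects into one block), ignores any
intermediary nodes, and uses binary units (1 KiB = 1024 bytes). The
function names are made up for illustration.

```python
KIB = 1024
GIB = 1024 ** 3


def padded_size(payload_bytes: int, block_bytes: int) -> int:
    """Bytes stored when a payload is padded up to a whole number of blocks."""
    if payload_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    blocks = -(-payload_bytes // block_bytes)  # ceiling division
    return blocks * block_bytes


def overhead_percent(payload_bytes: int, block_bytes: int) -> float:
    """Padding overhead as a percentage of the payload size."""
    stored = padded_size(payload_bytes, block_bytes)
    return 100.0 * (stored - payload_bytes) / payload_bytes


if __name__ == "__main__":
    objects = GIB // KIB  # number of 1 KiB objects in 1 GiB of payload
    for block in (4 * KIB, 32 * KIB):
        total = objects * padded_size(KIB, block)
        print(f"{block // KIB}k blocks: "
              f"{overhead_percent(KIB, block):.0f}% overhead, "
              f"{total / GIB:.0f} GiB stored")
```

With 32k blocks a 1 KiB object carries 31 KiB of padding, which is the
3100% figure, and 1 GiB worth of such objects occupies 32 GiB. With 4k
blocks the same objects carry 300% overhead and occupy 4 GiB.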
Furthermore, you should consider a trick we use in GNUnet-FS,
that we share *directories*, and for small files, we simply include the
full file data in the meta data of the file that is stored with the
directory or search result. So you can basically avoid having to
download tiny files as separate entities, so for files <32k we
overhead this way.
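
A rough sketch of this inlining idea, not GNUnet-FS code: the
`DirectoryEntry` type, the `make_entry` helper and the `store_blocks`
callback are all hypothetical names, and only the 32k limit is taken
from the message above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

INLINE_LIMIT = 32 * 1024  # files smaller than this are inlined ("<32k")


@dataclass
class DirectoryEntry:
    name: str
    size: int
    inline_data: Optional[bytes] = None  # full file data for tiny files
    reference: Optional[str] = None      # pointer to separately stored blocks


def make_entry(name: str, data: bytes,
               store_blocks: Callable[[bytes], str]) -> DirectoryEntry:
    """Inline tiny files in the directory entry, reference the rest."""
    if len(data) < INLINE_LIMIT:
        # No separate download needed: the data travels with the directory.
        return DirectoryEntry(name, len(data), inline_data=data)
    # store_blocks is assumed to encode the data and return a reference.
    return DirectoryEntry(name, len(data), reference=store_blocks(data))
```

A reader of the directory then checks `inline_data` first and only
fetches blocks when it is `None`.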
That makes a lot of sense.
But packing multiple objects into a single transport packet or
for storage on disk/in database works for small block sizes as well. The
optimization just happens at a "different layer".
The key value I see in having small block sizes is that tiny
data can be individually referenced and used (securely).
I'd be curious to see how much the two-pass encoding costs in
-- it might be less expensive than ECRS if you are lucky
big block being cheaper than many small hash operations), or
expensive if you are unlucky (have to actually read the data
disk). I am not sure that it is worth it merely to reduce the
hashes/keys in the non-data blocks. Would be good to have some
this, for various file sizes and platforms (to judge IO/RAM
effects). As I said, I can't tell for sure if the 2nd pass is
free or quite expensive -- and that is an important detail.
with a larger block size, the overhead of an extra key in the
blocks could be quite acceptable.
I think the cost of the two-pass encoding in ERIS is quite
Considering that the hash of the individual blocks also needs to be
computed (as reference in parent nodes), I think ECRS will always
Maybe the answer is not ECRS or ERIS but ECRS and ERIS. ECRS for large
pieces of data, where it makes more sense to have large block size and
single-pass encoding. And ERIS for (very many) small pieces of data,
where a 3100% overhead is too much but the performance penalty is
acceptable and size of data is much smaller than memory.
There might be some heuristic that says: If data is larger than
ECRS, else use ERIS and you get the verification capability.
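
Such a heuristic could be as simple as the following sketch. The cut-off
is deliberately left as a required parameter (`threshold_bytes`, a
hypothetical name) because no concrete value is proposed here; the
`Encoding` enum and `choose_encoding` are likewise made up for
illustration.

```python
from enum import Enum


class Encoding(Enum):
    ECRS = "ECRS"
    ERIS = "ERIS"


def choose_encoding(size_bytes: int, threshold_bytes: int) -> Encoding:
    """Pick ECRS for data above the threshold, ERIS otherwise."""
    if size_bytes < 0 or threshold_bytes < 0:
        raise ValueError("sizes must be non-negative")
    # Large data: large blocks and single-pass encoding (ECRS).
    # Small data: small blocks plus the verification capability (ERIS).
    if size_bytes > threshold_bytes:
        return Encoding.ECRS
    return Encoding.ERIS
```

The right threshold would have to come from measurements, since it
depends on how the two-pass cost compares to available memory.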
If using ECRS, you can add the verification capability by encoding a
list of all the hash references to the ECRS blocks with ERIS. The
read capability of this list of ECRS blocks is enough to verify the
integrity of the original ECRS encoded content (without revealing
What do you think?
For 3.4 Namespaces, I would urge you to look at the GNU Name System
(GNS). My plan is to (eventually, when I have way too much time and
could actually re-do FS...) replace SBLOCKS and KBLOCKS of ECRS with
basically only GNS.
I have been looking into it. It does seem to be a perfect
The crypto is way above my head and using readily available and
implemented primitives would make implementation much easier for
I understand the need for "non-standard" crypto and am following
On 7/10/20 8:59 AM, pukkamustard wrote:
I'd like to request feedback, questions and comments on an
content very much inspired by ECRS that I have been working on: Encoding
for Robust Immutable Storage (ERIS)
The motivation is to use the encoding in a social network like
where short messages and interactions are encoded using ERIS
There is one major difference to ECRS (and a couple smaller ones) that I
would like to highlight:
** Verification capability
ERIS adds a verification capability. Holders of the verification
capability can enumerate all blocks required to decode the content and
verify integrity of the blocks without being able to decode the content.
This enables peers to cache the entire content without being able to
read the content.
The verification capability is enabled by using two keys:
1. A read key to encode the blocks holding content.
2. A verification key (which is deterministically derived from the read
key) to encode the intermediary nodes of the Merkle tree.
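
The relationship between the two keys can be sketched as below. This is
not the derivation ERIS actually specifies: SHA-256, the
`b"verification"` prefix and both function names are assumptions chosen
only to show that the verification key follows deterministically from
the read key but not the other way around. The read key is assumed to be
convergent, that is, a hash of the whole content.

```python
import hashlib


def derive_read_key(content: bytes) -> bytes:
    """Convergent read key: a hash over the entire content (assumption)."""
    return hashlib.sha256(content).digest()


def derive_verification_key(read_key: bytes) -> bytes:
    """One-way derivation of the verification key from the read key.

    The fixed prefix only separates this use of the hash from others;
    it is a placeholder, not a value taken from the ERIS draft.
    """
    return hashlib.sha256(b"verification" + read_key).digest()


if __name__ == "__main__":
    read_key = derive_read_key(b"a short message")
    verification_key = derive_verification_key(read_key)
    print(read_key.hex())
    print(verification_key.hex())
```

Under this assumption the whole content has to be hashed before any
block can be encoded, which is where the second pass comes from. Someone
holding only the verification key cannot recover the read key, so they
can check intermediary nodes without decoding content blocks.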
This makes the scheme slightly more complicated than ECRS and
requires a two-pass encoding (when using convergent
Nevertheless I believe this is a very important feature that
results in a better privacy/complexity/availability trade-off
to in a previous thread
** Block size
Block size is chosen to be 4kB. This is an optimization towards small
content (short messages and social interactions).
Encoded content can be referred to by a URN making it usable in
existing Web (and RDF) settings. This could be added to ECRS.
** No namespacing / keyword search
There are currently no SBlock or KBlock like features. The idea
these features can be built on-top of the base encoding
SBlock and KBlock).
https://openengiadina.gitlab.io/js-eris/ . As well as
I'd be very happy for your insight and feedback.