Re: [GNUnet-developers] A Graph Database for secushare

From: amirouche
Subject: Re: [GNUnet-developers] A Graph Database for secushare
Date: Mon, 25 Mar 2019 00:55:56 +0100
User-agent: Roundcube Webmail/1.3.8

tl;dr: yes, I would like to join the secushare effort.

On 2019-03-24 16:16, carlo von lynX wrote:
On Fri, Mar 22, 2019 at 07:40:15PM +0100, amirouche wrote:
This is a follow up thread about my attempt to build a peer-to-peer graph
database on top of gnunet. My initial thoughts are documented at [0].

Oh, couple of years ago!

I did not notice the date. Time keeps rolling!

We too have recently been thinking of using a graph db

Yes, memes (file + keywords) describe a graph. Where a file
is a vertex (also called node) and keywords are edges (also
called links).

for our social graph in secushare rather than any other
data storage method,

There might be some confusion between the graph built out of memes
stored in the GNUnet network and local storage. The two are somewhat
disconnected. I have investigated so-called graph databases (also
known as graphdbs), namely neo4j and tinkerpop, and those are not
suitable for building social network applications. I am not
even sure they are useful for anything. They gained popularity
because of nosql, because of big data, because of lots of graph
things floating around and because of massive marketing from neo4j.

Simply said graphdbs are not general purpose databases.

Anyway, after much investigation I built a (new) kind of database
that is mostly inspired by datomic, which is itself inspired by
the Resource Description Framework.

If you read about datomic, do not lose time on the transactor,
datalog et al. The most interesting stuff is at [0] where
you learn that the Entity-Attribute-Value model is not that bad.
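
To illustrate the Entity-Attribute-Value idea, here is a minimal
sketch in Python. The entity names, attributes and helper functions
are made up for the example; this is not datomic's API nor the code
of my database, only the shape of the data model.

```python
# Hypothetical facts stored as (entity, attribute, value) triples.
facts = [
    ("meme-1", "type", "mixtape"),
    ("meme-1", "author", "richie"),
    ("meme-1", "year", 2018),
    ("meme-2", "type", "article"),
    ("meme-2", "author", "richie"),
]


def entity(facts, eid):
    """Collect every attribute of one entity into a dict."""
    return {a: v for e, a, v in facts if e == eid}


def entities_with(facts, attribute, value):
    """Return the sorted ids of entities having attribute == value."""
    return sorted({e for e, a, v in facts if a == attribute and v == value})


meme = entity(facts, "meme-1")
by_richie = entities_with(facts, "author", "richie")
```

Adding a new attribute to an entity is just one more triple, which is
why no schema migration is needed in this model.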


The difference with datomic is that (it is written in Scheme
instead of Clojure and) it relies on an embedded database library
called wiredtiger, written in C. Instead of a single-branch
history, it uses a directed acyclic graph like git. That is, it is
a branchable database that you could push to and pull from,
even if that is not implemented yet.
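
A git-like history can be pictured as a map from each commit to its
parents. The sketch below is only an illustration in Python with
invented commit names; it does not reflect how the database stores
its history.

```python
# Hypothetical commit graph: each commit maps to the list of its parents.
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],        # tip of one branch
    "c4": ["c2"],        # tip of another branch, forked from c2
    "c5": ["c3", "c4"],  # merge commit with two parents
}


def ancestors(parents, commit):
    """Return the commit itself plus everything reachable through parents."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def common_ancestors(parents, a, b):
    """Commits shared by both histories, the starting point for a merge."""
    return ancestors(parents, a) & ancestors(parents, b)


shared = common_ancestors(parents, "c3", "c4")
```

Pulling from another peer would then amount to fetching the commits
that are in the remote ancestors set but not in the local one.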

Later, one can imagine a mode of operation where not everything
is stored in the versioned database, somewhat similar to private
mode in web browsers. See [1] for ideas on how to be GDPR compliant.


Based on nano benchmarks, performance is not bad when the data
is in cache: around 20ms, two to five times faster than a similar
non-versioned triple store written in Java. Space-wise, I don't
know exactly, but it is rather hungry (more than 4 times the size
of the stored data). There are optimizations that could be done but
I am not sure; it depends on how full-text and spatio-temporal
indexing is handled. One needs to answer the question:

Is it possible to query revised aka. historic aka. old data along time and space?

Long story short, I am not using a graphdb; I will use a
versioned quad store that preserves lexicographic ordering.
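
Why lexicographic ordering matters can be shown with a small sketch.
It is written in Python and uses an in-memory sorted list as a
stand-in for an ordered key-value store. The class name, the meaning
of the four components (collection, subject, predicate, object) and
the sample data are assumptions for the example, and versioning is
left out entirely.

```python
import bisect


class QuadStore:
    """Toy quad store: quads are tuples of strings kept in sorted order."""

    def __init__(self):
        self._quads = []

    def add(self, quad):
        # Insert at the position that keeps the list sorted, skipping duplicates.
        index = bisect.bisect_left(self._quads, quad)
        if index == len(self._quads) or self._quads[index] != quad:
            self._quads.insert(index, quad)

    def scan(self, prefix):
        """Yield the quads starting with prefix, in lexicographic order."""
        start = bisect.bisect_left(self._quads, prefix)
        for quad in self._quads[start:]:
            if quad[:len(prefix)] != prefix:
                break
            yield quad


store = QuadStore()
store.add(("posts", "meme-2", "type", "essay"))
store.add(("posts", "meme-1", "type", "mixtape"))
store.add(("posts", "meme-1", "author", "richie"))
about_meme_1 = list(store.scan(("posts", "meme-1")))
```

Because the quads sharing a prefix sit next to each other, a lookup
is one seek followed by a sequential read, with no traversal.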

My current idea is to use gnunet fs to share small files (around 1MB)
that would be called memes (or atoms or something else, I am not sure
about the naming), and rely on keywords to link them to others.
Keywords would be the gnunet URIs of other meme files.
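
The linking scheme can be sketched as follows, in Python. The Meme
class and the "uri:..." strings are placeholders invented for the
example; they are not real gnunet URIs, and the lookup only mimics
locally what a keyword search would return.

```python
from dataclasses import dataclass, field


@dataclass
class Meme:
    uri: str                                      # placeholder, not a real gnunet URI
    keywords: list = field(default_factory=list)  # URIs of the memes it links to


memes = [
    Meme("uri:question"),
    Meme("uri:answer-1", ["uri:question"]),
    Meme("uri:answer-2", ["uri:question"]),
    Meme("uri:comment", ["uri:answer-1"]),
]


def linked_to(memes, target):
    """Return the URIs of memes carrying target as a keyword."""
    return [m.uri for m in memes if target in m.keywords]


answers = linked_to(memes, "uri:question")
```

Here the memes are the vertices and each keyword is an edge pointing
at another meme.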

Our idea is to feed the social graph with events coming down many
pubsub multicast trees.

I read about "pubsub multicast trees" on the secushare website.
What is it?

An architecture which should provide better realtimeness as we
would need, given we want to replace Facebook, Whatsapp, Skype
and such.

Almost perfect! You forgot to mention wikipedia and wikidata :)

But still it's just social data coming from the network and
needing to go into a graphdb, so much of the work may actually
be identical?

Yes. We need to figure out how cooperation will happen. As explained
above, I poured a lot of time into the storage and I really don't want
to leave that behind to use neo4j or sqlite. That said, I might
not know enough about the project, and cooperation could happen
without involving my database or scheme.

Also, a social network running on gnunet-fs could already serve
some purposes or turn out useful in addition to the push-oriented
delivery method.

I don't understand what it means.

It could be enough for a first secushare release, with a kind warning
not to expect realtimeness,

I don't expect realtimeness for the time being. What I want
to do first is a clone of quora or stackoverflow because that
is a thing that doesn't exist in the fediverse.

but it sure helps in dealing with upload filters.  :D

What are "upload filters"?

So there are two ways to discover new content in this database.

In our social graph use case the discovery runs over several
layers of "friendship" connections. Say you're using secushare
for the first time - somebody talked you into trying it out
so you have that person as your first contact. Your secushare
implementation will automatically collect basic knowledge on
your contacts first and maybe second degree of contacts.

Like I said previously, I am also looking forward to implementing
a basic full-text search. Actually, I already have a prototype;
it requires some more love to handle structured queries. Think
of something like elasticsearch but embedded in the versioned quadstore.

Once all of that info has become available on your computer
it contains all kinds of hints about content, like your
contact's friend Richie is a DJ and has published some mixtapes.

Yes. It requires structured search to be able to query things like
give me all *mixtapes* that are about *freedom* from *2018*.
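
Such a query can be read as the intersection of several
(predicate, object) constraints. The sketch below, in Python, uses
invented triples and an invented match helper; it scans everything
for simplicity, whereas a real store would use its indices.

```python
# Hypothetical triples: (subject, predicate, object).
triples = {
    ("m1", "type", "mixtape"), ("m1", "topic", "freedom"), ("m1", "year", "2018"),
    ("m2", "type", "mixtape"), ("m2", "topic", "freedom"), ("m2", "year", "2017"),
    ("m3", "type", "essay"), ("m3", "topic", "freedom"), ("m3", "year", "2018"),
}


def match(triples, constraints):
    """Return the sorted subjects satisfying every (predicate, object) pair."""
    candidates = None
    for predicate, obj in constraints:
        found = {s for s, p, o in triples if p == predicate and o == obj}
        candidates = found if candidates is None else candidates & found
    return sorted(candidates or [])


result = match(
    triples,
    [("type", "mixtape"), ("topic", "freedom"), ("year", "2018")],
)
```

Only m1 satisfies the three constraints at once; m2 has the wrong
year and m3 the wrong type.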

The search operation becomes a localhost traversal of the
graphdb, so it is cheap and easy.

Traversal is actually very expensive if you want to
implement full-text search directly in the graphdb. One can
also use the sqlite fts extension. That is exactly the problem
with existing graphdbs: if you need to index a specific
datatype like time segments, geographical data or text, you need
to rely on elasticsearch or something else.

There is also ArangoDB, which might be of some help. I dislike
their query language (in other words, I prefer scheme). Maybe
it is possible to use it as a library, embedded in the
process? MongoDB added that feature recently.

Also you can just go and browse the graphdb for interesting content,
like a flea market of friends' friends.

An improvement over this setup will be to make use of pub-sub (via
cadet?) to live stream changes to the database. I am not there yet.
First, I want the meme network to be a usable slow network (like a
newsgroup or mailing list).

So you thought the same way. Well then, welcome to the
secushare working group maybe?

Yes, definitely. Can you tell me more about psyc? I read it is similar
to XMPP and ActivityStream and that it replaces XML and JSON or something
like that.

What I was thinking today, is that I need to define some higher level
primitives that would allow to more easily implement social applications,
and ease cooperation between them.

I (we?) need to:

A) Settle on a file format to distribute memes on gnunet-fs.
   I don't have a strong opinion on the subject. I like scheme
   expressions :) but I think, as of yet, there is no secure
   s-expr parser out there.

B) Define the vocabulary. I think it is better to re-use
   an existing vocabulary, maybe ActivityStream is a good
   vocabulary. I am clueless about the subject! Maybe it
   is better to start without vocabulary and rework the thing
   if a vocabulary is good enough?

C) How to link a meme to a gnunet identity? Does it mean that
   the public key is published in the meme? IIUC it will then
   allow grabbing the GNS records associated with that identity.

D) Define methods on memes. I think about:

   - publish, that would allow to seed a meme on the network
     with the original meme keywords (included in the

   - flag, that would publish a specific meme that gives some
     credit to a given meme. It can be bad or it can be good, e.g.
     fake news, offensive, SFW, NSFW, like etc. This could be
     added to the set of gnunet-fs keywords that are published
     as part of the original meme. The meme flag could be shared
     or not.

   - share, that would add the meme to the identity's public
     profile and publish it.

   - unindex, like the existing gnunet-unindex feature. (I am not
     sure how this will work with the versioned database. It is
     possible to remove things from history but that would require
     rewriting the database history where the meme is involved.
     I would have liked to avoid a history-rewriting feature.)

   - subscribe, would allow to stay aware of any activity regarding
     the meme. Like new memes published and linked to it. This would
     trigger a gnunet-search call.

E) Identities methods:

   - download, would recursively fetch the identity map and all
     its content that is not marked as 'no-follow' (or something)
     to avoid automatically downloading very big files.

   - subscribe, will connect to identity cadet (or pubsub) to be
     notified of new memes.

   - flag, similar to the meme flag but about the identity's meme.

   - follow, would share a meme on your public profile stating that
     you are following that identity. It will be part of the identity map.
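
As an illustration of the identity download method, here is a minimal
sketch in Python of a recursive fetch that honours a 'no-follow'
mark. The identity map layout, the entry names and the choice to skip
everything below a no-follow entry are assumptions for the example;
no gnunet call is involved.

```python
# Hypothetical identity map: entry name -> (children, no_follow flag).
identity_map = {
    "profile": (["post-1", "video"], False),
    "post-1": (["attachment"], False),
    "attachment": ([], False),
    "video": (["video-chunks"], True),   # marked no-follow: too big
    "video-chunks": ([], False),
}


def download(identity_map, root):
    """Return the entries to fetch, depth first, skipping no-follow entries."""
    fetched = []
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        children, no_follow = identity_map[name]
        if no_follow:
            continue  # neither the entry nor its children are fetched
        fetched.append(name)
        stack.extend(reversed(children))
    return fetched


to_fetch = download(identity_map, "profile")
```

The seen set also protects against cycles, which can occur if two
entries reference each other.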

Also, I must keep in mind that the system must be able to work
offline and synchronize when the Internet is back.
