Re: [GNUnet-developers] A Graph Database for secushare
carlo von lynX
Wed, 27 Mar 2019 20:33:48 +0100
On Mon, Mar 25, 2019 at 12:55:56AM +0100, amirouche wrote:
> tl;dr: yes, I would like to join the secushare effort.
good summary, as it is a long read below :D
sorry for delay... other things on my mind...
other secusharors are welcome to chip into this conversation!!
> There might be a confusion between the graph built out of memes
> stored in gnu network and local storage. This is disconnected
> somewhat. I have investigated so called graph databases also
Yes, we seem to be thinking of graphdb on different layers.
Sounds like a topic for debate in mumble and gitwiki.
> known as graphdb namely neo4j and tinkerpop and those are not
> suitable for building social network applications. I am not
We too got the impression that several products are actually
not suitable. We haven't picked any tool yet.
> Simply said graphdbs are not general purpose databases.
Some can be.. like the graphdb on top of postgresql.
You can walk the graph, then access elements the old way.
> Anyway, after much investigations I built a (new) kind of database
> that is inspired mostly from datomic that is itself inspired by
> Resource Description Framework.
One more thing to examine.
> Long story short, I am not using a graphdb, I will use a
> versioned quad store that preserves lexicographic ordering.
Beyond me right now.
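For readers in the same position, here is a minimal sketch of what a
quad store that "preserves lexicographic ordering" could look like. It is
not amirouche's database: the class name QuadStore, the field order
(graph, subject, predicate, object) and the sample data are assumptions
made for illustration, and versioning is left out entirely. The point it
shows is that keeping the quads sorted turns "everything about alice"
into a cheap prefix scan.

```python
import bisect


class QuadStore:
    """Toy in-memory quad store kept in lexicographic order (hypothetical)."""

    def __init__(self):
        # Sorted list of (graph, subject, predicate, object) string tuples.
        self._quads = []

    def add(self, graph, subject, predicate, obj):
        quad = (graph, subject, predicate, obj)
        i = bisect.bisect_left(self._quads, quad)
        # Insert at the sorted position unless the quad is already stored.
        if i == len(self._quads) or self._quads[i] != quad:
            self._quads.insert(i, quad)

    def scan(self, *prefix):
        """Return every quad starting with the given prefix, in order."""
        # A shorter tuple sorts before any longer tuple it is a prefix of,
        # so bisect_left lands on the first matching quad.
        i = bisect.bisect_left(self._quads, prefix)
        found = []
        while i < len(self._quads) and self._quads[i][:len(prefix)] == prefix:
            found.append(self._quads[i])
            i += 1
        return found


store = QuadStore()
store.add("profile", "bob", "name", "Bob")
store.add("profile", "alice", "name", "Alice")
store.add("profile", "alice", "follows", "bob")
about_alice = store.scan("profile", "alice")
```

A real store would keep the sorted keys on disk in an ordered key-value
store rather than in a Python list, but the lookup pattern is the same.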
> >Our idea is to feed the social graph with events coming down many
> >pubsub multicast trees.
> I read about "pubsub multicast trees" on the secushare website.
> What is it?
If https://secushare.org/pubsub is not an answer, what can be
a more precise question?
> >An architecture which should provide better realtimeness as we
> >would need, given we want to replace Facebook, Whatsapp, Skype
> >and such.
> Almost perfect! You forget to mention wikipedia and wikidata :)
Or anything that is centralised and cloud-based...
> >But still it's just social data coming from the network and
> >needing to go into a graphdb, so much of the work may actually
> >be identical?
> Yes. We need to figure out how cooperation will happen. As explained
> above I poured a lot of time into the storage and I really don't want
> to leave that behind to use a neo4j or sqlite. That said, I might
> not have enough clue about the project and cooperation could happen
> without involving my database or scheme.
Ok. Let's talk.
> >Also, a social network running on gnunet-fs could already serve
> >some purposes or turn out useful in addition to the push-oriented
> >delivery method.
> I don't understand what it means.
Pubsub is a push paradigm: you get things immediately as they
happen. You don't remain stuck on some old cache of things
or see the updates only as they slowly propagate on demand.
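The push idea can be sketched in a few lines. This is an in-process toy,
not secushare's multicast implementation: the class PubSub, the channel
name "alice/status" and the message text are hypothetical. What it shows
is only the control flow, where the publisher drives delivery and the
subscriber never has to poll.

```python
from collections import defaultdict


class PubSub:
    """Toy push-style pubsub: publishing calls every subscriber at once."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Delivery happens here, at publish time, not when someone asks.
        for callback in self._subscribers.get(channel, ()):
            callback(message)


bus = PubSub()
inbox = []
bus.subscribe("alice/status", inbox.append)
bus.publish("alice/status", "back online")
```

With a pull model the subscriber would instead have to ask for updates
and could sit on a stale copy in between.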
> >It could be enough for a first secushare release, with a kind warning
> >not to expect realtimeness,
> I don't expect realtimeness for the time being. What I want
> to do first is a clone of quora or stackoverflow because that
> is a thing that doesn't exist in the fediverse.
Yeah, would be cool if gnunet-fs delivers that!
> >but it sure helps in dealing with upload filters. :D
> What are "upload filters"?
Sarcastic allusion to the copywrong reform the European
parliament ratified yesterday.
> As I said previously, I am also looking forward to implementing
> a basic full text search. Actually, I already have a prototype,
> it requires some more love to handle structured queries. Think
> like elastic search but embedded in the versioned quadstore.
In our idea, all the information to look for is already available
in local memory. Did you do something that searches the gnunet,
somewhat like Grothoff's 'regex' ?
> Yes. It requires structured search to be able to query things like
> give me all *mixtapes* that are about *freedom* from *2018*.
Yes, that's what I would want from a localhost graphdb.
> >The search operation becomes a localhost traversal of the
> >graphdb, so it is cheap and easy.
> Traversal is very expensive actually if you want to
> implement full-text search directly in the graphdb. One can
No, we need a thing that can be
1. walked along the graph, like when you click on friends of a friend
2. find spots in the graph based on full text index
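A minimal sketch of those two requirements side by side, under stated
assumptions: the class LocalGraph and its method names are made up for
illustration, the "full text index" is a plain word-to-node mapping, and
nothing here is tied to any particular graphdb product.

```python
from collections import defaultdict, deque


class LocalGraph:
    """Toy localhost graph with a word index next to it (hypothetical)."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> neighbouring nodes
        self.index = defaultdict(set)  # lowercase word -> nodes

    def add_node(self, node, text=""):
        self.edges[node]  # make sure the node exists even without edges
        for word in text.lower().split():
            self.index[word].add(node)

    def connect(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def within(self, start, hops):
        """1. Walk the graph: nodes at most `hops` steps away from start."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == hops:
                continue
            for neighbour in self.edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, depth + 1))
        seen.discard(start)
        return seen

    def find(self, word):
        """2. Find spots in the graph through the word index."""
        return set(self.index.get(word.lower(), ()))


g = LocalGraph()
g.add_node("alice", "mixtape about freedom")
g.add_node("bob", "photos from 2018")
g.add_node("carol", "freedom of speech notes")
g.connect("alice", "bob")
g.connect("bob", "carol")
friends_of_friends = g.within("alice", 2)
hits = g.find("Freedom")
```

The index lookup gives entry points into the graph without traversing
it, and the walk then continues from those points.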
> Yes, definitely. Can you tell me more about psyc, I read it is similar
> to XMPP and ActivityStream and that it replaces XML and JSON or something
> like that?
PSYC is the combination of actual multicast distribution (which
XMPP and ActivityStream dramatically lack) with an efficient
syntax that doesn't get in the way like XML or JSON.
> What I was thinking today, is that I need to define some higher level
> primitives that would allow to more easily implement social
> and ease cooperation between them.
> I (we?) need to:
> A) Settle on a file format to distribute memes on gnunet-fs.
> I don't have strong opinion on the subject. I like scheme
> expressions :) but I think, as of yet, there is no secure
> s-expr parser out there
I would use PSYC packet format, as gnunet-fs is just a
different transport than psyc-multicast. Some examples
in that benchmark doc.
> B) Define the vocabulary. I think it is better to re-use
> an existing vocabulary, maybe ActivityStream is a good
> vocabulary. I am clueless about the subject! Maybe it
> is better to start without vocabulary and rework the thing
> if a vocabulary is good enough?
We have been using PSYC vocabulary for >20 years now..
it can be updated and integrated with ActivityStreams.
This is the easy part of the job, really. Piece of cake.
> C) How to link a meme to a gnunet identity? Does it mean that
> the public key is published in the meme? IIUC it will then
> allow to grab the GNS records associated with that identity.
We have discussed that in depth in our gitwiki. Since our
system is privacy-oriented, data is NOT published in the
clear to the network - it is encrypted in such a way that
only the intended subscribers can see it. Then, within that
encryption it may of course be signed by the author, unless
authorship is already implicit by the distribution method.
> D) Define methods on memes. I think about:
> - publish, that would allow to seed a meme on the network
> with the original meme keywords (included in the
Anything that isn't an (un)subscribe is a publish, no?
> - flag, that would publish a specific meme that gives some
> credit to a given meme. It can be bad it can be good e.g.
> fake news, offensive, SFW, NSFW, like etc... This could be
> added to the set of gnunet-fs keywords that are published
> as part of the original meme. The meme flag, could be shared
> or not.
If used in the clear this already exposes too much information
about the content. Meta information is either sent over the
same channel as the original data, or in a different channel
"commenting" the original channel.
> - share, that would publish the meme in the identity public
> profile. and publish it.
All pubsubs belong to someone. Depending on which role they
have, things published in them appear somewhere, like they
may define what's written in a person's profile, or they may
be status updates, or they may be info where this person is
GPS-located (best friends only).
> - unindex, like the existing gnunet-unindex feature. (I am not
> sure how this will work with the versioned database. It is
> possible to remove things from history but that would require
> to rewrite the database history where the meme is involved.
> I would have liked to avoid rewriting history feature.)
Oh okay, now I get it. PSYC has operators for what you describe.
Things only exist in a message history by default, which expires
over time or requirements or wishes, and *within* these messages
some parts only exist for a specific message whereas others
make changes to the persistent storage with '+' and '-' operations.
So if you want to persist your phone number, it may be a
+_contact_telephone setting being sent over an @profile channel.
Whoever is entitled to receive that channel can then see your
phone number on your social profile. Whoever isn't, has a
different view of your profile.
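Roughly, in code. This is a sketch of the idea only, not the PSYC packet
format and not its exact semantics: the function apply_message, the
(operator, name, value) tuples, the ":" marker for per-message values,
the _mood variable and the phone number are all assumptions made for
illustration. Only the '+' and '-' operators and _contact_telephone come
from the description above.

```python
def apply_message(state, modifiers):
    """Apply one message to a channel's persistent state.

    '+' stores a value persistently, '-' removes it again, anything
    else lives only for this one message and is returned to the caller.
    """
    transient = {}
    for op, name, value in modifiers:
        if op == "+":
            state[name] = value
        elif op == "-":
            state.pop(name, None)
        else:
            transient[name] = value
    return transient


profile = {}  # persistent state of a hypothetical @profile channel
apply_message(profile, [("+", "_contact_telephone", "555-0100")])
after_add = dict(profile)
only_now = apply_message(profile, [(":", "_mood", "busy")])
after_transient = dict(profile)
apply_message(profile, [("-", "_contact_telephone", None)])
```

Each subscriber replays only the messages it was entitled to receive,
which is why two people can end up with different views of one profile.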
> - subscribe, would allow to stay aware of any activity regarding
> the meme. Like new memes published and linked to it. This would
> trigger a gnunet-search call.
There are no one-time uses and peeks into PSYC channels, since
they build incrementally on the history of messages to create the
channel state. It could be emulated, but the regular procedure is
to subscribe if you are entitled to. Meta-information on channels
that may be of interest is elsewhere.
> E) Identities methods:
> - download, would fetch recursively the identity map and all
> its content that is not marked as 'no-follow' (or something)
> to avoid to automatically downloading very big files.
Doesn't compute for me.
> - subscribe, will connect to identity cadet (or pubsub) to be
> notified of new memes.
For us there is only subscribe.
> - flag, similar to the meme flag but about the identity's meme.
> - follow: share a meme on public profile that you are following
> that identity. It will be part of the identities map.
Your profile can show who you are connected to or following,
just as it can show tons of other things.
> Also, I must keep in mind that the system must be able to work
> offline and synchronized when Internet is back.
That's why we want to have a localhost graphdb, so that we can
walk the social graph just like we do on Friendster when we start
digging into friends of friends. Or whatever social network we
may be using now. We don't need to consult gnunet-fs to navigate
our social network. At least that's the idea with pubsub and