
various suggestions (was Re: [GNUnet-developers] useless crap??)

From: Wayne Scott
Subject: various suggestions (was Re: [GNUnet-developers] useless crap??)
Date: Mon, 29 Apr 2002 21:32:30 -0500 (EST)

[[ I figured I should change the subject.  It wasn't intended as an editorial
   comment. ]]

From: Christian Grothoff <address@hidden>
> That's the thing. The GNUnet 'AND' mechanism is not intended to sort
> out datatypes. Obtaining 5,000 replies for a generic mp3's search is already 
> really bad practice on gnutella, I did not want to encourage people to do 
> this. Too generic keywords also void the goal of deniability (people can 
> blacklist the query 'mp3'). Thus I decided not to put the 'mp3' keyword as 
> default for gnunet-insert-mp3. You can of course manually specify it to 
> gnunet-insert.

So how does AND work?  Do you do two full searches and then take the
intersection, or can you do one search and then filter it with the
later keywords?  If it is the former, then I agree with you.

Is it possible to just search for the content checksum?  People have
built a number of interesting applications on top of Freenet without a
search capability.  However, a number of them cause huge query
overhead as applications probe for new data that might be there.

> > But for testing, you need a standard test file that is very likely to
> > be on every node.  
> True - especially once things are working :-)

You don't need to wait. 

> We are aiming for a generic keyword extraction API. Splitting filenames
> would definitely be a reasonable choice.
> > I actually find it somewhat surprising that filename is NOT stored.
> > It could be encoded as part of the description. If I understand your
> > arch correctly, you can't do a partial string search on filenames, but
> > it would still save the user alot of work.  I would like to be able to
> > just use the "standard" name for a file when extracting it.
> We could change the format of the root-node to include a default
> filename. Sounds reasonable. Any objections/concerns/suggestions?
> I could see splitting the 'description' field in 3 parts:
> mime, filename and description (each variable length, preceeded
> by short indicating length). Any other ideas/suggestions/improvements we 
> should make to the RBlocks?

Sounds good to me, but you probably want input from other people who
have been hacking on this and understand your architecture better.
Consider how you would add future extensions.

> them). Thus having keywords for a file that are hard to obtain (i.e. not 
> automatically from the file/RNode) is usually a good thing (TM). This may 
> actually be a reason for *not* supplying a filename (or at least not one
> that was used for keyword extraction). 

Ok.  Sounds like I should read the paper.  I think this is a reason
for a disclaimer on the insert tool explaining that using the
automatic keyword features makes censoring a given file easier.
Filenames are too useful to omit.  Besides, the current insert-mp3
tool already has this problem: if someone fetches the mp3, they can
recover the full list of keywords from the ID3 tags.

> > The thing I see right now is that ~/.gnunet/data/content is a flat
> > directory.  In most filesystems, directories are NOT indexed and you
> > have to do a linear scan any time you want to find a file.  So you
> > should do like every one else and add a couple more directory levels.
> > (~/.gnunet/data/content/FE/4F/FE4F8155230050000000000065100000C79CA8BA)
> > This way the directory is not too big.  I don't know the ideal number
> > of levels or number of bits at each level, but I KNOW a flat directory
> > will be really slow on ext2.
> That's exactly what I also thought. I'm just not sure that spliting
> the directory like that is the best idea (I'm still pondering the
> issue, until I have a really good solution, it'll probably just stay in 'slow 
> mode'). 

How many files do you expect to see here?  The default config file is
512MB of 1k files.  That is 524288 files in one directory, which is
really slow.  To me that suggests you want to take 2 levels of 4 bits,
for an average of 2048 files per directory with the default config.
Even that is insane. (see below)

> And of course, using a better FS (reiser, ext3, xfs) is recommended. 
> It would be nice to have some profiling code to actually evaluate different 
> approaches/filesystems in order to give (educated) advice to users which FS 
> to use. 

It is quite presumptuous to assume that people will select their
filesystem based on GNUnet. :-) Or even that they will have it on a
separate partition.

> What do you mean by 'a large file based hash'? 

One file that allows efficient hash-based lookups by key and returns
a 1k data block.  Like ndbm, only faster.  I could possibly provide
the mdbm library from bk.  It is way faster than ANY filesystem and
stores the data packed tightly.  You would just need to sync the file
periodically so that the changes are persistent.

This seems much simpler than dealing with the fact that a default
Linux installation will use way too much disk space and will probably
run out of inodes.
