
Re: A niche for the Hurd - next step: reality check

From: Arne Babenhauserheide
Subject: Re: A niche for the Hurd - next step: reality check
Date: Wed, 12 Nov 2008 18:27:55 +0100
User-agent: KMail/1.10.3 (Linux/2.6.25-gentoo-r7; KDE/4.1.3; x86_64; ; )

On Saturday, 08 November 2008 03:05:39, olafBuddenhagen@gmx.net wrote:

> > - You have 1.000.000 files to store, which makes a simple "ls" damn
> >   slow.
> > - So you develop a simple container format with reduced metadata and
> >   specialized access characteristics.
> > - Now you want to make that container accessible via the normal
> >   filesystem.
> >
> > Please check the two attached presentations to see the pain this
> > causes using Linux.

> I must admit that I fail to read the "pain" in these presentations...
>
> The only problem with FUSE in this context seems to be performance. I
> wonder whether Hurd translators would do better on that score.

The pain is that the container program gets really good performance when you run it directly, but as soon as you access it via FUSE it is ten times slower.

> I realized at some point that for some hurdish applications, we need a
> way to store fine-grained structured data. What is the best approach
> for that?
>
> One way is to put it into a file, using some structured file format
> (XML, s-exprs, or the like). The problem is that changing
> (adding/removing/replacing) individual pieces of data in the middle of
> the file is both awkward and inefficient: it requires rewriting the
> file from the affected region up to the end. Also, accessing
> individual data items is quite complicated, as it always requires a
> parser for the respective format.


> Storing as a large directory tree on the other hand allows for very
> easy and efficient access and updates of individual items. However, it
> takes a lot of disk space. (Due to redundant file metadata like
> permissions etc., and also the internal structure of the filesystem
> imposing a lot of overhead with many tiny files.) And working with a
> whole set of data items at once (e.g. copying a subtree, or replacing
> a whole subtree) becomes quite awkward.


> First I was thinking of some kind of DB translator, which stores the
> data in a normal file, but instead of storing the contents linearly,
> uses some internal allocation mechanism -- just like a full-blown DBMS.


> I soon realized though that this would be too rigid in many cases:
> often it is useful to access the *same* data in different ways,
> depending on context. The storage mechanism and the access method are
> indeed quite orthogonal -- what we really want is the ability to
> access *any* data both through a directory tree and through a
> structured file interface as needed. Whether the data is actually
> stored in individual files, or in a container, should be totally
> transparent.


> So on the frontend we want a dual interface that allows accessing the
> data either as directory trees or as structured files. On the backend,
> a normal filesystem, with the aid of containers where appropriate,
> could serve as a temporary solution -- but in the long run, we
> probably want a special filesystem, allowing both efficient storage of
> complex structures and efficient access/update of individual items at
> the same time. I wonder whether this could be implemented as an
> extension of some existing filesystem, or whether some completely new
> approach is required...

I think this whole quoted section wants to go into your Blog :)

And wouldn't this option of accessing a file in two ways be an ideal candidate for namespace-based translator selection?

An example:

$ ls blah,,dir/
$ nano blah,,xml
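The idea behind those two commands could be sketched like this (purely illustrative Python, not Hurd code -- the data, the function names, and the ",,dir"/",,xml" mapping are all invented for the example): one underlying data structure, presented both as a directory tree and as a serialized structured file.

```python
# Illustrative sketch (not Hurd code): the same data exposed two ways,
# as a ",," translator on "blah" might present it. All names invented.
data = {"notes": {"a.txt": "hello", "b.txt": "world"}, "readme": "top"}

def view_as_dir(node, prefix=""):
    """Flatten the nested dict into directory-style paths (the ,,dir view)."""
    paths = []
    for name, value in sorted(node.items()):
        if isinstance(value, dict):
            paths.extend(view_as_dir(value, prefix + name + "/"))
        else:
            paths.append(prefix + name)
    return paths

def view_as_xml(node):
    """Serialize the same data as one structured file (the ,,xml view)."""
    parts = []
    for name, value in sorted(node.items()):
        body = view_as_xml(value) if isinstance(value, dict) else value
        parts.append("<%s>%s</%s>" % (name, body, name))
    return "".join(parts)

print(view_as_dir(data))  # ['notes/a.txt', 'notes/b.txt', 'readme']
print(view_as_xml(data))  # <notes><a.txt>hello</a.txt>...<readme>top</readme>
```

In the real thing the translator would of course serve the views through the filesystem interface instead of returning Python values; the point is only that both views derive from one store, so neither has to be the "real" representation.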

Btw., the main idea behind containers was to reduce the metadata for each single file, so that common operations get faster - for example doing an "ls" on a directory with 1.000.000 files. In a normal filesystem with complete per-file metadata you have to wait minutes for a result, because the metadata of every single file needs to be checked.

The container filesystem allows for more efficient access to the files by

a) reducing the amount of metadata per file (shared metadata) and
b) using a structure which is optimized for this kind of use case.
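A rough sketch of point a), with a record layout I am inventing purely for illustration (one shared owner/group/mode/mtime record for the whole container, plus a minimal name/offset/size entry per file): a listing only has to walk the tiny per-file entries, and full metadata is reconstructed from the shared record only when actually asked for.

```python
# Sketch of shared metadata (layout invented for this illustration):
# one metadata record for the whole container instead of one per file,
# so listing N files only walks N tiny (name, offset, size) entries.
from collections import namedtuple

SharedMeta = namedtuple("SharedMeta", "owner group mode mtime")
Entry = namedtuple("Entry", "name offset size")  # per-file: minimal

shared = SharedMeta(owner=1000, group=1000, mode=0o644, mtime=1226510875)
# 100_000 entries here; the thread's example would use 1_000_000.
entries = [Entry("file%06d" % i, i * 4096, 4096) for i in range(100_000)]

def container_ls(entries):
    """An 'ls' needs only the names -- no per-file stat() lookups."""
    return (e.name for e in entries)

def stat(entry, shared):
    """Full metadata is reconstructed from the shared record on demand."""
    return {"name": entry.name, "size": entry.size,
            "owner": shared.owner, "mode": shared.mode}

first = next(container_ls(entries))
print(first)  # file000000
print(stat(entries[0], shared))
```

The design choice this is meant to show: the per-file cost of a listing shrinks to a few machine words, and the redundant permission/ownership data that a normal filesystem stores a million times over is stored exactly once.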

Best wishes,



-- My stuff: http://draketo.de - stories, songs, poems, programs and stuff :)

-- Infinite Hands: http://infinite-hands.draketo.de - singing a part of the history of free software.

-- Ein Würfel System: http://1w6.org - simply clean (role-playing) rules.

-- PGP/GnuPG: http://draketo.de/inhalt/ich/pubkey.txt

Attachment: signature.asc (digitally signed message part)
