Re: What shall the filter do to bottommost translators

From: Sergiu Ivanov
Subject: Re: What shall the filter do to bottommost translators
Date: Tue, 13 Jan 2009 22:55:05 +0200

On Fri, Jan 9, 2009 at 9:01 AM, <olafBuddenhagen@gmx.net> wrote:
On Wed, Dec 31, 2008 at 02:42:21PM +0200, Sergiu Ivanov wrote:
> On Mon, Dec 29, 2008 at 8:25 AM, <olafBuddenhagen@gmx.net> wrote:
> > On Mon, Dec 22, 2008 at 07:19:50PM +0200, Sergiu Ivanov wrote:

> > The most radical approach would be to actually start a new nsmux
> > instance for each filesystem in the mirrored tree. This might in
> > fact be easiest to implement, though I'm not sure about other
> > consequences... What do you think? Do you see how it could work? Do
> > you consider it a good idea?
> >
> I'm not really sure about the details of such implementation, but when
> I consider the recursive magic stuff, I'm rather inclined to come to
> the conclusion that this will be way too much...

Too much what?...

Too many instances of nsmux, too many context switches, too many
resources consumed for things that could probably be implemented to
work faster with less effort; that is what I meant.

However, now that I've read your mail, only the relatively large
number of processes troubles me. I remember you saying that it is not
a problem on the Hurd and that this is a common situation, but still
I'm afraid there could be too many processes. What do you think?

> Probably, this variant could work, but I'll tell you frankly: I cannot
> imagine how to do that, although I feel that it's possible.

Well, it's really almost trivial...

Let's take a simple example. Say the directory tree contains a file
"/a/b/c". Say nodes "a" and "b" are served by the root filesystem, but
"b" is translated, so "b'" (the effective, translated version of "b")
and "c" are served by something else.

Let's assume we have an nsmux set up at "n", mirroring the root
filesystem. Now a client looks up "/n/a/b/c". (Perhaps indirectly by
looking up "/a/b/c" in a session chroot-ed to "/n".) This results in a
lookup for "a/b/c" on the proxy filesystem provided by nsmux.

Yes, I think I can understand pretty well what you are talking about.

nsmux forwards the lookup to the real directory tree it mirrors, which
is the root filesystem in our case. The root filesystem will look up
"a/b", and see that "b" is translated. It will obtain the root of the
translator, yielding the translated node "b'", and return that as the
retry port, along with "c" as the retry name. (And RETRY_REAUTH I
believe as the retry type.)

Indeed that's how it works. As for the retry type, I think we have
agreed before in this thread that RETRY_REAUTH is returned only when a
``..'' is requested on the root of the translator. That is, in this
case RETRY_NORMAL will occur.

Now in the traditional "monolithic" implementation, nsmux will create a
proxy node for "a/b'", and pass on a port for this proxy node to the
client as retry port, along with the other retry parameters. (Which are
passed on unchanged.)

The client will then do the retry, finishing the lookup.

Yep, that's right, to my knowledge.

Now what about the control ports? The client could do a lookup for
"/n/a/b" with O_NOTRANS for example, and then invoke
file_get_translator_cntl() to get the control port of the translator
sitting on "b". nsmux in this case forwards the request to the original
"/a/b" node, but doesn't pass the result to the client directly.
Instead, it creates a new port: a proxy control port, to be precise. The
real control port is stored in the port structure, and the proxy port is
passed to the client. Any invocations the client does on this proxy
control port are forwarded to the real one as appropriate.

The implementation of this functionality requiring the least effort
would be for nsmux, when the client does file_get_translator_cntl(),
to create a new instance of libnetfs's struct node, create a port to
it, and give this port to the client. Of course, this instance of
struct node need not contain all the fields required in other
instances, which are meant to mirror *filesystem* nodes. In this way,
the existence of the special instance of struct node I'm talking about
would by no means violate any concepts (it seems to me).

As I've already said, I'm strongly inclined to perceive a libnetfs
node in a more general sense than just a filesystem node.

With the "distributed" nsmux, things would work a bit differently. Again
there is a lookup for "/n/a/b/c", i.e. "a/b/c" on the proxy filesystem
provided by nsmux; again it is forwarded to the root filesystem, and
results in "a/b'" being returned along with a retry notification. The
distributed nsmux now creates a proxy node of "a/b" (note: the
untranslated b). It starts another nsmux instance, mirroring "/a/b'",
and attaches this to the "a/b" proxy node.

Again, the client will finish the lookup. (By doing the retry on the new
nsmux instance.)

Aha, sounds great! I hadn't even an inkling of such a possibility :-)
Very beautiful idea!

When the client invokes file_get_translator_cntl() on "/n/a/b", the main
nsmux sees that there is a translator attached to this proxy node
(namely the second nsmux), and returns its control port to the client --
just like any normal filesystem would do. When the client does
file_getcontrol() on "/n/a/b/c", the second nsmux will simply return its
own control port, again like any normal filesystem.

Clear, that's pretty nice and easy to implement and handle.

Now unfortunately I realized, while thinking about this explanation,
that returning the real control port of nsmux, while beautiful in its
simplicity, isn't really useful... If someone invoked fsys_goaway() on
this port for example, it would make the nsmux instance go away, instead
of the actual translator on the mirrored tree.

And we can't simply forward requests on the nsmux control port to the
mirrored filesystem: The main nsmux instance must handle both requests
on its real control port itself (e.g. when someone does "settrans -ga
/n"), and forward requests from clients that did fsys_getcontrol on one
of the proxied nodes to the mirrored filesystem. So we can't do without
passing some kind of proxy control port to the clients, rather than the
nsmux control port.

Hm... I'm thinking of the following thing: can we make nsmux behave
differently in different contexts? Namely, normally requests like
fsys_goaway should go to the translator in the real filesystem,
however, at some point, nsmux will treat them as directed to itself.

One possibility to implement this would be adding a special command
line option which would tell nsmux to forward RPCs to the real
translator. Note that there are also runtime options available for a
translator (and these options are classically the same as simple
command line options) and we can modify them via
fsys_{get,set}_options. When a new instance of nsmux is started by an
already-existing instance, it will be started with this special
command line switch. All meaningful RPCs will be forwarded to the
translator in the real tree. When the parent instance of nsmux would
want to shut down a child, it would just reset the special option in
the child (fsys_set_options) and do an fsys_goaway on the child's
control port.

OTOH, when a *user* (not the parent instance of nsmux) would like to
shut down a child instance (which, BTW, may not be desirable; what do
you think?), they can use the fsysopts command to remove the
corresponding option and then do settrans -ga, for instance, to
achieve the shutdown.

The problem with this approach is that additional operations will be
required to shut down a child instance of nsmux, but I believe that
shutting down will not be required very often and can always be run in
the background (in a different ``garbage collector'' thread, I mean).

Considering that we need the proxy control ports anyway, the whole idea
of the distributed nsmux seems rather questionable now... Sorry for the
noise :-)

Your ideas are always very interesting, and it's always a pleasure to
digest them :-)

> > But let's assume for now we stick with one nsmux instance for the
> > whole tree. With trivfs, it's possible for one translator to serve
> > multiple filesystems -- I guess netfs can do that too...
> Could you please explain in more detail what you mean by this?

Well, normally each translator is attached to one underlying node. This
association is created by fsys_startup(), which is usually invoked
through trivfs_startup() for filesystems using libtrivfs.

However, a translator can also create another control port with
trivfs_create_control(), and attach it to some filesystem location
manually with file_set_translator(). (This is not very common, but the
term server for example does it, so the same translator can be attached
both to the pty master and corresponding pty slave node.) We get a
translator serving multiple filesystems.

Hm, that's interesting... I've never come across this function...

I assume that this is possible with libnetfs as well...

Anyways, this variant would probably behave exactly the same as the
"distributed" one, only that a single process would serve all nsmux
instances, instead of a new process for each instance. It would have the
same problems... So we probably don't need to consider it further.

Some grepping for ``control'' in the source directory of libnetfs
turned up nothing. I could possibly try to borrow the idea of the
implementation from the corresponding libtrivfs function, if needed.
However, I like the multiprocess version better :-)

> > The alternative would be to override the default implementations of
> > some of the fsys and other RPCs dealing with control ports, so we
> > would only serve one filesystem from the library's point of view,
> > but still be able to return different control ports.
> >
> > As we override the standard implementations, it would be up to us
> > how we handle things in this case. Easiest probably would be to
> > store a control port to the respective real filesystem in the port
> > structure of every proxy control port we return to clients.
> This is the variant I was thinking about: custom implementations of
> some RPCs are the fastest way. At least I can imagine quite well
> what is required, and I can tell that this variant will probably be
> the least resource-consuming of all.

Well, if this is the variant you feel most comfortable with, it's
probably best to implement this one :-)

We can still change it if we come up with something better later on...

Yes, indeed. Although I would like the ``distributed'' nsmux
better... ;-)

> > > As for the ``dotdot'' node, nsmux usually knows who is the parent
> > > of the current node; if we are talking about a client using nsmux,
> > > it is their responsibility to know who is the parent of the
> > > current node.
> > >
> > > OTOH, I am not sure at all about the meaning of this argument,
> > > especially since it is normally provided in an unauthenticated
> > > version.
> >
> > AIUI it is returned when doing a lookup for ".." on the node
> > returned by fsys_getroot(). In other words, it normally should be
> > the directory in which the translated node resides.
> Yep, this is my understanding, too. I guess I have to take a glimpse
> into the source to figure out *why* this argument is required...

Well, there must be a way for the translator to serve lookups for the
".." node... So we need to tell it what the ".." node is at some point. 
One might think that a possible alternative approach would be to provide
it once at translator startup, instead of individually on each root
lookup. The behaviour would be different however: Imagine a translator
sitting on a node that is hardlinked (or firmlinked) from several
directories. You can look it up from different directories, and the ".."
node should be different for each -- passing it once on startup wouldn't
work here.

Ah, really... I should have thought better before asking this question
:-) Thank you for the explanation :-)

> > As the authentication is always specific to the client, there is no
> > point in the translator holding anything but an unauthenticated port
> > for "..".
> Sorry for the offtopic, but could you please explain what you mean
> by authentication here? (I would just like to clear out some issues in
> my understanding of Hurd concepts)

I wish someone would clear up *my* understanding of authentication a
bit... ;-)

Anyways, let's try. File permissions are generally checked by the
filesystem servers. The permission bits and user/group fields in the
inode determine which user gets what kind of access permissions on the
file. To enforce this, the filesystem server must be able to associate
the client's UID and GID with the UIDs and GIDs stored in the inode.

So, how does the filesystem server know that a certain UID capability
presented by the user corresponds to say UID 1003?

This is done through the authentication mechanism. I'm not sure about
the details. AIUI, on login, the password server asks the auth server to
create an authentication token with all UIDs and GIDs the user possesses.
(Well, one UID actually :-) ) This authentication token is (normally)
inherited by all processes the user starts.

Now one of the user processes contacts the filesystem server, and wants
to access a file. The filesystem server must be told by the auth server
what UIDs/GIDs this process has. This is what happens during the
reauthentication: A bit simplified, the client process presents its
authentication token to the auth server, and tells it to inform the
filesystem server which UIDs/GIDs it conveys. From now on, the
filesystem server knows which UIDs/GIDs correspond to the port (protid)
the client holds.

(The actual reauthentication process is a bit tricky, because it is a
three-way handshake...)

Wow! That's complex... Thank you for the explanation :-) It helped me
a lot!

It doesn't at all look as if something in authentication is unclear to
you ;-)

This process needs to be done for each filesystem server the client
contacts. This is why a reauthentication needs to be done after crossing
a translator boundary -- or at least that is my understanding of it. The
retry port returned to the client when a translator is encountered
during lookup, is obtained by the server containing the node on which
the translator sits, and obviously can't have the client's
authentication; the client has to authenticate to the new server itself.

Hm... Again, the issue with RETRY_REAUTH, which seems to happen
only when looking up ``..''...

> > > Well, setting translators in a simple loop is a bit faster, since,
> > > for example, you don't have to consider the possibility of an
> > > escaped ``,,'' every time
> >
> > Totally negligible...
> >
> Of course :-) I've got a strange habit of trying to reduce the number
> of string operations...

I also suffer from this unhealthy tendency to think too much about
pointless micro-optimizations... The remedy is to put things in
perspective: In an operation that involves various RPCs, a process
startup etc., some string operations taking some dozens or hundreds of
clock cycles won't even show up in the profile...

Indeed... I'll try to stick with this approach :-)

Code size optimisations are a different thing of course: Not only are
ten bytes saved usually much more relevant than ten clock cycles saved
(except in inner loops of course); but also it tends to improve code
quality -- if you manage to write something with fewer overall
operations, fewer special cases, less redundancy etc., it becomes much
more readable, considerably less error-prone, much easier to modify, and
altogether more elegant...

Of course... I've always tried to keep the code as easy to maintain
as possible... Hence the excessive comments (I hope I've already given
up the habit).

> The main reason that makes me feel uneasy is the fact that retries
> actually involve lookups. I cannot really figure out for now what
> should be looked up if I want to add a new translator in the dynamic
> translator stack...

The not yet processed rest of the file name -- just like in any other
lookup.

These can be additional suffixes, or also additional file name
components if dealing with directories. When looking up
"foo,,x,,y/bar,,z" for example, the first lookup will process "foo,,x"
and return ",,y/bar,,z" as the retry name; the second will process ",,y"
and return "bar,,z"; the third will process "bar,,z" and return ""; and
the last one will finish by looking up "" (i.e. only get a new port to
the same node, after reauthenticating).

Oh yeah! It's only now that I can finally understand your idea!
Everything is clear now, I'll start coding as soon as I have a spare
quantum :-)

I'm not entirely sure, but I think the retry is actually unavoidable for
correct operation, as reauthentication should be done with each new
server.

It seems to me, anyways, that in the standard implementation of
netfs_S_dir_lookup RETRY_REAUTH happens when looking up ``..'' on the
root node of the filesystem. Do you think we have to abandon this
tactic and make nsmux do a RETRY_REAUTH any time it encounters (or
starts) a translator?

BTW, I just realized there is one area we haven't considered at all so
far: Who should be able to start dynamic translators, and as which user
should they run?...

nsmux starts dynamic translators using fshelp_start_translator, which
invokes fshelp_start_translator_long. This latter function creates a
new task and makes it a *child* of the process in which it
runs. Therefore, dynamic translators are children of nsmux and can do
anything the user who starts nsmux can do.

Is this OK? Or are you thinking of the possibility of nsmux being
started at system startup with root privileges?..

> > That's the definition of dynamic translators: They are visible only
> > for their own clients, but not from the underlying node. (No matter
> > whether this underlying node is served by another dynamic
> > translator.)
> That seems clear. What makes me wonder, however, is how a filter will
> traverse a dynamic translator stack, if it will not be able to move
> from a dynamic translator to the one which is next in the dynamic
> translator stack?

Well, note that the filter is started on top of the stack. It is a
client of the shadow node created when starting it, which is a client of
the topmost dynamic translator in the stack, which is a client of its
shadow node, which is a client of the second-to-top dynamic
translator... So fundamentally, there is no reason why it wouldn't be
able to traverse the stack -- being an (indirect) client of all the
shadow nodes in the stack, it can get access to all the necessary
information.

The question is how it obtains that information... And you are right of
course: I said in the past that the translator sitting on a shadow node
doesn't see itself, because it has to see the other translators instead
(so the filter can work). This is obviously a contradiction.

OK, it means that I understand things correctly so far :-)

This is a bit tricky, and I had to think about it for a while. I think
the answer is that a dynamic translator should see both the underlying
translator stack, *and* itself on top of it. For this, the shadow node
needs to proxy all translator stack traversal requests (forwarding them
to the underlying stack), until a request arrives for the node which the
shadow node mirrors, in which case it returns the dynamic translator
sitting on the shadow node.

Let's look at an example. Say we have a node "file" with translators
"a", "b" and "c" stacked on it. The effective resulting node translated
through all three is "file'''". When we access it through nsmux, we get a
proxy of this node.

Now we use a filter to skip "c", so we get a proxy node of "file''" --
the node translated through "a" and "b" only. Having that, we set a
dynamic translator "x" on top of it. nsmux creates a shadow node
mirroring "file''", and sets the "x" translator on this shadow node.
Finally, it obtains the root node of "x" -- this is "file'',,x" -- and
returns an (ordinary non-shadow) proxy node of that to the client.

(To be clear: the "'" are not actually part of any file name; I'm just
using them to distinguish the node translated by static translators from
the untranslated node.)

And now, we use another filter on the result. This is the interesting
part:

First another shadow node is set up, mirroring "file'',,x"; and the
filter attached to it. (So temporarily we get something like
"file'',,x,,f".) Now the filter begins its work: It starts by getting
the untranslated version of the underlying node. The request doing this
(fsys_startup() IIRC?) is sent to the underlying node, i.e. to the second
shadow node, mirroring (the proxy of) "file'',,x". This "x" shadow node
forwards it to the node it mirrors: which is the aforementioned
(non-shadow) proxy node of "file'',,x".

nsmux may be able to derive the completely untranslated "file" node from
that, but this would be rushing it: It would put the first, "file''"
shadow node out of the loop. (Remember that conceptually, we treat the
shadow nodes as if they were provided by distinct translators...) So
what it does instead is obtaining the first shadow node (mirroring the
proxy of "file''"): This one is the underlying node of the "x"
translator, and is considered the untranslated version of the node
provided by "x". It asks this shadow node for the untranslated version
of the node it shadows...

So the "file''" shadow node in turn forwards the request to the proxy of
"file''". This node is handled by nsmux directly, without any further
shadow nodes; so nsmux can directly get to the untranslated "file" node,
and return a proxy of that.

This new proxy node is returned to the first shadow node. The
(conceptual) shadow translator now creates another shadow node: this one
shadowing (the proxy of) the untranslated "file". (Oh no, even more of
these damn shadow nodes!...) A port for this new shadow node is then
returned to the requestor.

The requestor in this case was the "file'',,x" proxy node, which also
sets up a new proxy node, and passes the result (the proxy of the shadow
of the proxy of untranslated "file"...) on to the second (conceptual)
shadow translator, mirroring "file'',,x". This one also creates a new
shadow node, so we get a shadow of a proxy of a shadow of the "file"
proxy node... And this is what the filter sees.

Hm... That's clear (at least it seems to me to be so). I do have
several questions, but I'll put them down at the end of the mail.

In the next step, the filter invokes file_get_translator_cntl() on this
shadow-proxy-shadow-proxy-"file". The request is passed through the
second new shadow node (let's call it the "x" shadow, as it is derived
from the original "file'',,x" shadow node), and the new "x" proxy node
(derived from the "file'',,x" proxy), and the first new shadow node (the
"file" shadow), and through the primary "file" proxy node finally
reaches the actual "file". There it gets the control port of "a". A
proxy of this control port is created, and passed to the "file" shadow
translator, which creates a shadow control... (ARGH!)

This is in turn passed to the intermediate "x" proxy, which creates
another proxy control port -- only to return that to the "x" shadow
translator, which shadows the whole thing again. A
shadow-proxy-shadow-proxy-control port for the "a" translator. Lovely.

Here I'm in the mood for putting various smileys ;-)

Now the filter launches fsys_getroot() on this, again passing through
shadow and proxy and shadow and proxy, finally looking up the root node
of "a" (i.e. "file'"), which is passed up again -- we get
shadow-proxy-shadow-proxy-"file'".

Again file_get_translator_cntl() resulting in
shadow-proxy-shadow-proxy-control of "b", and again fsys_getroot()
yielding shadow-proxy-shadow-proxy-"file''". Then yet another
file_get_translator_cntl(), giving us... Can you guess it?

No-no, I was looking forward to some surprise at this step :-P

Wait, not so hasty :-) This time actually something new happens,
something exciting, something wonderful: The request is passed down by
the "x" shadow and the "x" proxy as usual, but by the time it reaches
the "file''" shadow (the one to which "x" is attached), the routine is
broken: The shadow node realizes that the request is not for some
faceless node further down, but indeed for the very "file''" node it is
shadowing! So instead of passing the request down (which would get the
control port of "c"), the shadow node handles the request itself,
returning the control port of the translator attached to it: i.e. the
control port of "x".

This is again proxied by the "x" proxy, and shadowed by the "x" shadow
translator.

Oh yeah, this is what I was expecting ;-)

The following fsys_getroot() is again passed down by the "x" shadow and
the "x" proxy, and handled by the "file''" shadow: It returns the root
node of "x". The "x" proxy creates another proxy node; the "x" shadow
translator creates another shadow node.

The filter invokes file_get_translator_cntl() yet another time, this
time handled by the top-most (now only active) shadow node directly (the
"x" shadow, to which the filter is attached), returning the control port
of the filter itself. We are there at last.

Doesn't all this shadow-proxy-shadow-proxy-fun make your head spin, in
at least two directions at the same time? If it does, I might say that I
made a point... ;-)

It does ;-) I'm trying to determine the pitch and the yaw of the
container of my brain :-)

While I believe that this approach would work, I'm sure you will agree
that it's horribly complicated and confusing. If the shadow nodes were
really implemented by different translators, it would be terribly
inefficient too, with all this proxying...

Indeed... Hard to disagree...

And it is actually not entirely correct: Let's say we use a filter to
skip "b" (and the rest of the stack on top of it). The filter would
traverse the stack through all this shadow machinery, and would end up
with a port to the root node of "a" that is still proxied by the various
shadow nodes. The filter should return a "clean" node, without any
shadowing on top of it; but the shadow nodes don't know when the filter
is done and passes the result to the client, so they can't put themselves
out of the loop...

While this may be a bit off topic, I would like to ask you what the
issue with correctness is. I can see only the sophisticated shadow
machinery in its complicated action, but everything seems all right
about the way it handles information: nothing gets lost.

All these problems (both the correctness issue and the complexity)
result from the fact that the stack is traversed bottom-to-top: First
all underlying nodes need to be traversed, and only then the shadow node
takes action -- that's why it needs to proxy the other operations, so
it's still in control when its time finally arrives.

The whole thing would be infinitely simpler if we traversed the stack
top to bottom: The shadow node would just handle the first request, and
as soon as the client asks for the next lower translator in the stack,
it would just hand over to that one entirely, not having any reason to
interfere anymore.

And this is what I meant above about making a point: It seems that my
initial intuition about traversing top to bottom being simpler/more
elegant, now proves to have been merited...

Yes, this is really right :-)

Note that unlike what I originally suggested, this probably doesn't even
strictly need a new RPC implemented by all the translators, to ask for
the underlying node directly: As nsmux inserts proxy nodes at each level
of the translator stack, nsmux should always be able to provide the
underlying node, without the help of the actual translator...

(In fact I used this possibility in the scenario described above, when
initially wading down through the layers of dynamic translators, until
we finally get to the static stack...)

Yes, this is right, nsmux can do that. However, how do you suggest the
top-to-bottom traversal should take place (in terms of RPCs)? Shall we
reuse some existing RPCs?.. The top-to-bottom approach is indeed the
only reasonable choice in this situation, and if we sort out this
question, I think I will be able to start implementing this
functionality.

Hm, and another (quite important) question: what do we do with static
translator stacks? There are no shadow nodes between static
translators, so nsmux has no chance of traversing the static stack in
top-to-bottom fashion...

As for the questions about the bottom-to-top approach I'm very
interested to know the following: did you manage to design all this
shadow-proxy machinery just in your mind, or did you put something
down on paper? :-) I'm asking because I had to redraw the same graph
about five times before I got the idea :-)

Ah, and another thing: as far as I could notice from your
descriptions, the role of proxy nodes is rather humble, so my question
is: what is the reason for keeping them at all? I'd rather not include
proxy nodes in between dynamic translators. When a client invokes an
RPC on its port to a proxy node, nsmux knows (obviously) to which
shadow node the proxy node is connected, so nsmux can ignore the proxy
node in the matter of setting translators. My idea is to keep proxy
nodes as a kind of ``handle'', meant only to be seen by the client and
not to be employed in the internal work.

PS. Your mails contain both plain text and HTML versions of the content.
This unnecessarily bloats the messages, and is generally not seen
favourably on mailing lists. Please change the configuration of your
mail client to send plain text only.

I'm sorry, it wasn't intended. I'm using a web interface for mail
management, and when I recently tried to switch browsers, the change
affected my messages. However, I've come back to my usual Firefox, so
no more things like that should occur.

Actually, I was not even aware that the other browser was putting
garbage in my mail :-(

And I'm sorry for the delay: I wasn't quite at home these two days...

