Re: Interface for SCSI transactions ?

From: Olaf Buddenhagen
Subject: Re: Interface for SCSI transactions ?
Date: Sun, 2 Oct 2011 03:50:53 +0200
User-agent: Mutt/1.5.21 (2010-09-15)


On Thu, Sep 29, 2011 at 09:11:54PM +0200, Thomas Schmitt wrote:

> it looks like we need two fundamental decisions now, or else the
> constraints become too fuzzy for developing the desired feature:
> It makes no sense to start working on them before a decision is made.

Indeed, it's an unfortunate situation. Email turnaround times are too
long for this -- I'm sure we could reach a conclusion much quicker in a
realtime discussion on IRC...

I could try to reach a consensus with Samuel, and present it to you as a
fixed decision -- but I'd really prefer to actually discuss this with
your participation :-)

> 1: Shall the new call be implemented on the kernel side as a new
> method of struct device_emulation_ops ?

That's what I was assuming all along -- regardless of whether we go with
a SG-specific call, or something more generic... More on this later.

> 2: Shall the parameters be bundled in serialized structs as far as
> technically feasible ? Alternative: Each primitive data field shall be
> transferred as a separate RPC parameter.

Actually, we discussed three options here: a) explicit RPC parameters
for everything; b) transfer all fixed size fields in a single struct
represented as a flat byte array, and all variable-size fields as
separate parameters; or c) do custom marshalling (RPC-over-RPC).

a) is the cleanest approach, though a bit cumbersome. b) is not as
clean, but there is precedent in some Hurd interfaces, so it's probably
acceptable. c) is very ugly IMHO; it would create various anomalies in
handling; and it is very complex.

> Yes on 2: New struct members could be added in a binary compatible
> way.

That's actually not true. It would only shift the problem. You could
keep using the same RPC ID (and MIG stub code), but you would have to
manually match version numbers and do case-specific handling in your
custom RPC-over-RPC handling code. I see no benefit in this...

> Olaf Buddenhagen wrote:

> > Yes, it's not possible to pass a complete structure as a single
> > parameter, except by treating it as a flat byte (or int) array...
> > [...] The *right* way of doing it is to list the elements of the
> > structure as individual RPC parameters.
> But this will cause a giant prototype. Very ugly to its users.

I wouldn't consider it ugly...

I grant you though that if the same struct is ultimately used on both
sides anyways, extracting and reassembling the fields by hand is a bit
cumbersome. More on this later.

(On the other hand it's still much simpler than implementing custom RPC
marshalling... :-) )

> My current sketch has 9 members in the input struct and 18 in the
> output one. The leaner it is made, the more risk arises that it will
> later need to be enlarged. (I am much in favor of a stable ABI.)

Well, I supported the idea of using a future-proof structure under the
assumption that we can simply reuse the one from Linux (or some other
modern kernel), so we don't need Hurd-specific handling, and can be
reasonably confident that it won't change any time soon... However,
that's apparently not the case. And in fact I realised that even if we
can use a future-proof structure, we would still have to check what the
current driver actually implements, and use different client-side
handling depending on this. Thus it's pretty irrelevant whether the RPC
definition will remain the same; or entirely changes in the future; or
will be pseudo-stable (i.e. using an unchanging MIG definition, but
actually transferring different data by means of a custom marshalling
layer) -- in either case, we will have to handle it separately in the
client code. ABI stability is totally irrelevant here: there is no way
on earth the client code will work without change.

With that in mind, I now think that indeed we should go with an RPC
definition having only the minimal parameters necessary to expose the
capabilities of the current driver; and simply introduce a new RPC once
we actually have a better driver.

(Admittedly, that pretty much voids my argument about being able to
reuse the same interface in a userspace driver in the future... In view
of that, I'm considerably less set on having an elegant one for the
kernel implementation. Still, the generic device_transact() approach
looks *very* ugly after seeing the complexity actually necessary here
:-) )

> I do not understand yet the "RPC-over-RPC" aspect, resp. why it is so
> bad.

That's like loading your car onto a truck and driving the truck, because
you think the radio in your car is not good enough... It might help you
avoid the "problem" -- but it's clearly not an appropriate measure :-)

> To my understanding these calls transport generally defined parameters
> to particular drivers which apply their particular interpretation to
> these parameters. This corresponds to quite common patterns of object
> orientation: polymorphism and inheritance. Are these disliked ?

Well, I must admit that I'm not a big fan of OOP... :-) (I believe the
cases where an object model is really useful are actually quite rare,
and even a complex piece of software won't have more than a handful of
them. Forcing the whole structure of a program into an object model is
just crazy IMHO.)

But anyways, I'm pretty sure that trying to force vastly different
method invocations through a fixed common prototype, would be considered
very ugly in *any* object-oriented design. There are good reasons why
child classes are not only allowed to specialize methods of the base
class, but also to add specific methods with their own specific
parameters... Common methods are only useful when they are invoked from
common code, and thus obviously with common parameters. It makes no
sense whatsoever to unify calls that only work in the context of their
respective subclass, and thus will *never* be invoked from common code.
(Such as an SG transaction.)

Polymorphism doesn't come into it here at all, BTW.

> The generic nature of my current plan is a direct consequence of
> Samuel's proposal/prescription to hook the call into the instances of
> struct device_emulation_ops.

I don't see any reason why the specific SG transaction call can't be
hooked into struct device_emulation_ops just as well. There is
precedent for having a device-specific call in the generic device
interface: device_set_filter() is used only for network interfaces.

(I guess device_set_filter() *could* in theory apply to other devices as
well, if we went down the exokernel route... But that's academic. For
all practical purposes, it only exists for network devices.)

> My current understanding is that a call in
> gnumach/linux/dev/glue/block.c would also work without having a slot
> in struct device_emulation_ops.

Might be possible; but I don't see much use in bypassing the ordinary
route here. It would be pretty ugly IMHO -- introducing confusion,
anomalies, and potential for additional bugs.

> > > Astounding that Linux SG_IO totally hides the auto sense
> > > mechanism,
> > Hm... I don't know anything about this stuff; but my intuition is
> > that if this *can* be hidden from users (without adverse effects),
> > it should... Is it actually useful to have this data?
> The user in this case is a device driver in userspace.

I think we need to be clear about what we mean by "driver" in this
context... There is the part that actually talks to the hardware, and
sits on the server side of the RPC. (I.e. in Mach for now, and a
dedicated driver process in the future.) There is also the other part,
which assembles the SCSI commands to be sent, and lives on the client
side of the RPC.

Is the sense information really something that needs to be communicated
between these two parts, rather than being abstracted by the hardware
access layer? I don't know -- this is your area of expertise :-) But
apparently the Linux developers didn't think it's helpful to expose
this... Would be good to investigate the rationale behind this: trying
to find the relevant discussion(s) on LKML for example.

> > I'm very sceptical of any interface that doesn't use MIG's abilities
> > here, instead doing the marshalling by hand.
> /rant mode on/ I'd rather call it MIG's disabilities here.

Well, if you think the RPC mechanism used throughout the Hurd is
insufficient, the right answer is not to implement something else on top
of it, but rather to propose improvements -- of course followed by
patches... ;-)

None of the arguments you present here seem to be specific to SG
transactions or device handling in any way.

Keep in mind that this is free software. You should never take problems
for granted -- you always have the option to improve whatever you are
building upon, so it really suits your needs :-) That's something free
software developers tend to forget way too often. The free software
stack would be in a much better shape today, if people made good use of
the possibilities we have here...

(The Hurd makes such improvements much easier than monolithic kernels in
many areas -- which is precisely what makes it interesting :-) )

Sorry -- that was me ranting now :-) Actually I think the problem here
is a different one: the impression I get from your statements is that
you have some serious misunderstanding about how Mach IPC and MIG are
supposed to be used...

You might not be aware that the whole *idea* behind the original Mach
research was allowing transparent network IPC between processes running
on machines with possibly different architecture. The Mach designers
definitely put a *lot* of thought into this, probably more than anything
else. So before denying the feasibility of doing cross-architecture
communication with Mach, you might want to consider the possibility that
it might be your own mental model which doesn't fit the Mach
approach...

I for my part consider the network transparency a legacy I'd prefer to
get rid of (just like the actual network IPC code we already purged), as
I believe it's responsible for a substantial part of the complexity of
Mach RPC. (And possibly other design shortcomings as well.)

We never used this possibility; and we do not believe that it's
realistic or useful to keep network transparency at such a low level.
(Nor does anyone else anymore for all I know...)

In the same vein, the possibility of passing a struct as a plain array
is *not* something endorsed by the Mach designers -- but it's something
we can do in the Hurd when it seems convenient, as we simply do not care
about network-transparent IPC.

It's still a bit ugly and less robust of course, and thus I'm rather
ambivalent about this option myself. In the end I chose to do it like
that in my KGI port, but it was mostly "meh, I won't bother, it's just a
proof-of-concept anyways"... If KGI hadn't been dead by the time I did
the port, and I actually wanted it to go upstream, I might have decided
differently. Hard to say.

> It has neither means for aggregation,

That's true. MIG has hardly any abstraction -- it mostly just describes
what is ultimately contained in the Mach IPC message almost 1:1.

It might be useful to introduce abstract types at the IDL level, which
MIG would translate into code that can automatically extract the
individual fields before sending, and reassemble them after receive.

As long as both sender and receiver handle them as a structure anyways,
this shouldn't pose any overhead. It would prevent either side from
working directly with the individual fields without assembling a
struct, though; that would introduce overhead in some cases. This
*might* be a reason why the MIG designers refrained from introducing
such abstract types.
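If MIG ever grew such abstract types, a .defs declaration might look
roughly like this (purely hypothetical syntax -- today's MIG has
nothing of the sort, and the field names are invented):

```
/* Hypothetical extension of MIG .defs syntax -- not valid today. */
struct sg_fixed_args = {
        dxfer_direction : int32_t;
        timeout_ms      : int32_t;
        cmd             : array[16] of uint8_t;
};

routine device_sg_io(
        device          : device_t;
        args            : sg_fixed_args;
        out sense       : array[*:64] of uint8_t);
```

The generated stubs would then extract the fields before sending and
reassemble the struct on receipt, exactly as described above.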

(Another problem might be that to allow network-transparent IPC, MIG
would have to guess the appropriate machine-independent types to safely
hold the value of each field... Not sure whether this is really
something that can be automated in a reasonable fashion. However, did I
mention yet that we do not care about network transparency?... ;-) )

> nor for a neutral representation of data during transmission.

What is a neutral representation?

> Representing arbitrary C-pointer graphs is quite impossible for an RPC
> system.

Actually, if the relations of these pointers are known in advance, I
don't see why they couldn't be expressed in the interface definition,
and automatically serialised by the generated code... *If* you want to
introduce this kind of abstraction, do it properly ;-)

> But MIG could at least provide what SUN RPC resp. XDR does.

I'm not familiar with those.

> > > So i think it is appropriate to prescribe an explicit
> > > representation layer for the structs and their components.
> > ...such as MIG .defs? ;-)
> I would wholeheartedly agree, if MIG wasn't so dull. A representation
> of struct components in the definition language would allow
> conversions, if they become necessary in a distributed system.

I hope this is clear by now, but let me say it again just to be sure:
use individual parameters for all the components, and it would work just
fine. Would -- if the distributed systems actually existed ;-)

Of course it would be more convenient to specify the structs directly,
and let MIG generate code to extract/reassemble the fields. (See above.)
But I think that's really not a big deal -- and technically, the result
is exactly the same anyways...

(In either case it's less efficient than simply passing the structure as
a plain byte array. That's the price you have to pay for network
transparency at IPC level. Considering what I said about network IPC
above, I'm obviously not too keen on paying this price... But the SG
transaction is not performance-critical anyways, so I wouldn't make the
price part of the consideration in this case.)

> The assumption of identical data representation on both sides of an

Another thing that should be clear by now, but just to be sure: there is
no such assumption in either Mach or MIG.

Mach *does* require the same sizes for the individual parameters on both
sides; so if you care about network transparency, you have to use
machine-independent types for the RPC parameters. It's up to you to
assign them from/to machine-specific types in your server and client
code -- doing so is trivial, and much more sane than changing the size
of an IPC message during transfer would be. There is nothing more you
have to do to allow network-transparent communication between processes
running on different architectures.

(Yes, byte order swapping is something Mach takes care of itself -- or
rather would, if the code existed. This is why we have to transfer
explicit type information for each individual field in an IPC message.
This is also why I'd rather be rid of these remnants of network
transparency.)

> lets me raise the question why to use RPCs at all and not shared memory

Apart from the network transparency thing: communication through shared
memory is inevitably much more complex, and almost universally less
efficient than RPC.

> or even a shared address space. 

That's a good question actually... There has been some research on SASOS
(Single Address Space Operating Systems) in the 90s; but it doesn't seem
to have gained any traction. No idea why -- perhaps it just doesn't
offer enough benefits.

As for Mach, that was obviously not an option, considering its original
goal of transparent network IPC.

> To my mind come at least two contemporary forms of mixed
> architectures: The Cloud,

Well, "The Cloud" doesn't really tell us anything, as everyone has a
different idea of what it means...

However, none of the variants I'm familiar with includes a scenario
where processes would do transparent IPC with processes on machines of
a different architecture... Care to elaborate?

> and using graphics boards as number crunchers.

I never thought about using Mach IPC for such things... It might
actually be possible.

But there is nothing *transparent* about the communication between
processes running on CPUs and GPUs; so the problem clearly doesn't arise
here at all -- explicit conversion of data representations is always
possible, and doesn't need any support from Mach or MIG.

> > The Linux sg_io structure seems to have a number of pointers to
> > variable-sized arrays 
> Only one of them, .dxferp, is really of large size and would best be
> represented outside the struct. (In my sketch as two parameters
> "in_data" and "out_data". In the CD use case they are mutually
> exclusive.)
> The others, which i deem useful, can be represented as small byte
> arrays of fixed length,

That would be very un-hurdish ;-)

> or mapped into the struct serialization as small byte arrays of
> variable size with little cost of copying.

...or we can make them separate RPC parameters, so we get it for free!

