Re: Interface for SCSI transactions ?

From: Olaf Buddenhagen
Subject: Re: Interface for SCSI transactions ?
Date: Fri, 7 Oct 2011 08:39:34 +0200
User-agent: Mutt/1.5.21 (2010-09-15)


On Sun, Oct 02, 2011 at 01:44:56PM +0200, Thomas Schmitt wrote:
> Olaf Buddenhagen wrote:

> > I could try to reach a consensus with Samuel, and present it to you as a
> > fixed decision 
> In the end it will happen that way.
> After all it is you who have the overall interest in Hurd, whereas
> i am mainly striving for getting GNU xorriso work on GNU/Hurd with
> similar capabilities as on GNU/Linux.

Well, the problem is that if we decide on a solution you don't like, you
won't be very motivated to follow it :-) That's why I'd prefer a
solution in consensus between all involved parties, i.e. you, Samuel,
and me.

> I still think that the time is ripe to decide my two questions.

> - Really implement a new method of struct device_emulation_ops ?

> - Transmit parameters as structs
>   (rather than fan out all parameters to the RFC definition) ?

The fact that we are still discussing implications (and coming up with
new twists) shows that we are not really ready for a decision yet... :-(

> but what about overall Hurd architecture ?)

We are actually talking about a Mach interface here. As I already
pointed out a couple of times, the Mach device interface in particular
is not very hurdish at all. Furthermore, considering that the kernel
driver interface will be temporary anyway, I'm not really concerned
about compromising the Hurd architecture... (As I already explained, I
have given up on the idea of reusing the same interface for userspace
drivers, which will use a real hurdish interface of course.) I now
mostly want to avoid overengineering, *especially* since this is
temporary and peripheral.

> > Actually, we discussed three options here: a) explicit RPC parameters
> > for everything; b) transfer all fixed size fields in a single struct
> > represented as a flat byte array, and all variable-size fields as
> > separate parameters; or c) do custom marshalling (RPC-over-RPC).
> I dropped (c) for expected performance problems.

Hm, right... I failed to include the fourth variant in my list: the
interface you proposed -- that is, payload as explicit parameter, but
everything else (including variable-length bits) flattened -- is
actually something in between (b) and (c). Let's call it (b+) :-)

Also, (b) actually allows two distinct variations: naive flattening, or
explicit manual serialization. (Like the one used for (b+) and (c), but
simpler without variable-length fields.) For completeness, let's use
(b') for the variant with explicit serialization.

> > > Yes on 2: New struct members could be added in a binary compatible
> > > way.
> > That's actually not true. 
> > [... ] you would have to
> > manually match version numbers and do case-specific handling in your
> > custom RPC-over-RPC handling code.
> Believe me that i have some experience with keeping ABIs stable
> while not hampering their evolution. It would work with (b) and (c)
> (as well as with my idea of (b) being capable of mixed processing
> architectuires). 
> Adding a new member to the end of a struct is easy. One only needs a
> version indicator among the members of the first version.

I don't question your experience :-) Admittedly this wasn't very clear
in my last mail: my point is simply that while it certainly *is*
possible to keep using the RPC with the same formal prototype (as long
as we do not properly expose all the fields as individual parameters),
this is not really useful: we would need separate handling of the
variants in the client and server code anyway -- so we can just as well
explicitly introduce a new RPC with different parameters, while keeping
the original one unchanged. That's the proper way to provide
client/server compatibility in the Hurd. It wouldn't be the first time
we do this.
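(For illustration, a minimal sketch of the struct-versioning scheme Thomas describes -- a version indicator in the first revision, new members only appended -- so an old receiver still parses the prefix it knows. All struct and field names here are hypothetical, not part of any proposed interface:

```c
#include <assert.h>
#include <string.h>

/* First revision: carries a version field from the start. */
struct sg_params_v1 {
    unsigned int version;       /* 1 in the first revision */
    unsigned int timeout_ms;
};

/* Second revision: members only appended, the v1 prefix is unchanged. */
struct sg_params_v2 {
    unsigned int version;       /* 2 */
    unsigned int timeout_ms;
    unsigned int flags;         /* new in v2 */
};

/* A v1 receiver reads only the v1-sized prefix of whatever it gets. */
static unsigned int read_timeout_as_v1(const void *buf)
{
    struct sg_params_v1 p;
    memcpy(&p, buf, sizeof p);
    return p.timeout_ms;
}
```

This works at the byte level; my point above is that on the Hurd the same compatibility is obtained more cleanly by adding a new RPC with new parameters.)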

A more interesting question is: in which direction do we keep
compatibility? I don't think it's necessary to arbitrarily allow mixing
old/new servers and clients. I would probably keep the handling of the
old variant along with the newer ones in the client library (i.e.
libburn) as long as old drivers (servers) are likely to be in use; while
the servers would always expose only the newest variant, best matching
their actual capabilities.
But that's not really relevant for now. It's enough to know that we can
easily keep compatibility at the RPC level whenever we want/need to --
without any manual marshalling ugliness :-)

> (Does MIG tolerate 20+ parameters ?)

I don't think we have any RPC with more than 20 parameters so far... So
I can't say for sure. But I would consider it a very serious bug if
there was any such limitation.

>   /* NOTE: This is an embedded array to allow for answer (x,yes).
>            If fanned out parameters are chosen, then this can be 
>                unsigned char * sbp;
>            like in Linux and can be transmitted as variable length array.
>    */        
>   unsigned char sbp[264];     /* Replies the sense data which indicate errors
>                                  or noteworthy drive conditions.
>                                  >>>
>                                  One could reduce the size of this field to 14
>                                  because one is mainly interested in the
>                                  triplet of KEY, ASC, ASCQ which characterizes
>                                  SCSI errors.
>                                  But MMC-6 allows up to 255+7 bytes in fixed
>                                  format (0x70, 0x71) and guarantees that only
>                                  these fixed formats are emitted by the drive.
>                                */

Well, if we know for sure that it will never be larger than this, I
think it might be acceptable to make it a fixed-size array.

As there are no other variable-sized arrays besides the actual payload,
(b') would actually become the same as (b+) in this case...

Of course I still consider (a) or (b) the best options :-)

>   unsigned int duration;      /* Time taken by cmd (unit: millisec)
>                                  (0xffffffff = time measurement not valid)
>                                */

I'm sure you actually mean "(unsigned int)-1" -- which is only
equivalent to 0xffffffff on machines with 32 bit int :-)
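(That is, the portable spelling of the "not valid" marker is the conversion of -1 to the unsigned type, which yields the type's maximum value on every conforming implementation, whatever the width of int:

```c
#include <assert.h>
#include <limits.h>

/* Portable "duration not valid" marker: (unsigned int)-1 is UINT_MAX
   everywhere, while the literal 0xffffffff only matches it when int
   happens to be 32 bits wide. */
#define DURATION_INVALID ((unsigned int)-1)
```

)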

> > I now think that indeed we should go with an RPC
> > definition having only the minimal parameters necessary to expose the
> > capabilities of the current driver; and simply introduce a new RPC once
> > we actually have a better driver.
> It would be less work for me that way.
> But i think it is not a good idea to plan for incompatible re-implemetation
> already now, while we plan for a first implementation.

My point is that we shouldn't actually plan for any reimplementation in
advance at all -- I'm invoking YAGNI :-) We should think about a better
interface only when we actually introduce a new driver that needs it.

> > Still, the generic device_transact() approach
> > looks *very* ugly after seeing the complexity actually necessary here :-) )
> I agree. If it is a generic device_transact() call, then it should
> have generic parameters. (Or may i say: "must have" ?)
> If we decide for specific parameters, e.g. by exposing them to the RPC
> definition, then we can hardly give sense to a generic call.

Well, obviously it's not a generic device_transact() call if we use
SG-specific parameters, i.e. variant (a) or (b)/(b'). But also for (b+),
I think it would be quite a stretch to claim it's generic -- even when
ignoring the fact that it's unlikely ever to actually be used for
anything else. That's why IMHO it would be ugly to go with (b+) while
still calling it "device_transact()". I'd say the only variant that can
even pretend to be generic is (c) -- but as you pointed out, this would
be somewhat inefficient.

This is on top of the general ugliness of RPC-over-RPC, which applies
to both (b+) and (c) -- either variant makes SG transactions more
convoluted in one way or another than the (already ugly) existing
device_get_status()/device_set_status() calls.

> > > I do not understand yet the "RPC-over-RPC" aspect, resp. why it is so
> > > bad.
> > That's like loading your car onto a truck and driving the truck,
> It is more like the tunnel between France and Britain.
> You put your car into a railroad wagon because its own wheels do not
> fit the track.

That's actually a fairly nice comparison (mine can be considered
backwards in a way...) -- but your description of the setup is not quite
right :-)

The Tunnel trains can transport passengers directly, but they can
*also* carry cars. This is useful, as taking the car along can be
convenient for the travelers before/after passing the Tunnel; and since
cars can't pass the Tunnel directly, tunneling them (pun almost
unintended ;-) ) by train makes perfect sense.

However, the RPC-over-RPC mechanism would have absolutely no use outside
passing the client/server barrier. So it essentially means designing
cars that aren't used for anything else but being loaded on a train to
hold people while passing the tunnel... Which is ridiculous, when people
who don't otherwise need the cars can just as well get seats in the
wagons directly.

> > Well, I must admit that I'm not a big fan of OOP... :-) 
> Aggregation and encapsulation are indispensible for any large
> software project. (Aggregation stems from old Structured Programming,
> afaik.)

Aggregation is important, but works perfectly fine without any OO
syntax. Encapsulation is helpful when used sparingly. Too much
encapsulation, along with other OOP paradigms, makes the program
structure too rigid and impoverished IMHO. Simple things tend to become
cumbersome; and when faced with inevitable evolution, a rigid structure
-- being hard to adapt -- tends to fuck up the design, and thus do more
harm than good.

But that's entirely off-topic here :-)

> The motivation for these potentially messy patterns is re-usability,
> which was the big promise of OOP (not always kept, though).
> My motivation for proposing a generic RPC is the same. It shall
> be reusable with other device classes.

Reusing an interface doesn't provide any inherent benefit by itself.
It's only useful if it means code can be reused without adapting it to
other situations. And that's not the case here: the code doing CD writer
invocations won't be useful for anything else but talking to CD writers
-- no matter how generic you make the interface...

> > Polymorphism doesn't come in here at all BTW.
> My eyes can clearly see it in form of function_code, that selects
> the actuall implementation class, and of generic parameter structs,
> which become senseful only within a specific implementation class.

You are right. I was confused about the meaning of polymorphism. This
is indeed a case of polymorphism -- and for the reasons explained above,
it's not useful here :-)

> > I don't see any reason why the specific SG transaction call can't be
> > hooked into device_emulation_ops() just as well. There is precedence for
> > having a device-specific call in the generic device interface:
> > device_set_filter() is used only for network interfaces.
> Why using the method panel at all in this case ?
> The call could just jump into the appropriate kernel code and perform
> the network specific operation there.
> I understand struct device_emulation_ops as an interface definition,
> which gets implemented by various device classes.
> It makes few sense to burden this interface with a method that can
> only be implemented for a single device class, because its parameters
> are highly specific.

Well, I never tried to really understand the full purpose of the
device_emulation stuff; but having a generic dispatcher definitely makes
sense for really generic calls, such as open(), read(), write(), or
seek(). For inherently device-specific calls -- such as set_filter() or
get_status()/set_status() -- it is not really useful. Apparently the
Mach designers wanted to keep consistency with the other device_ calls
though. This had the unfortunate side effect that to keep the
device_emulation structure compact, they tried to make the
get_status()/set_status() calls generic, instead of using specific
per-device calls.

Now bypassing the device_emulation interface for just one new device
call would be really inconsistent and confusing; so passing it through
the device_emulation layer instead is probably preferable. However, as
this call is not going to be used for anything else, it really doesn't
matter whether it's explicitly specific, or pretends to be generic --
either way, it will occupy NULL slots in all other device classes. For
all practical purposes, there is no downside to making it specific, and
reaping the benefits.

> What Linux does not expose, are some driver status results of the
> REQUEST SENSE command that is issued automatically after the
> command from userspace has failed.
> FreeBSD and Solaris expose this to some degree.

I somehow managed to temporarily forget the most important reason why it
makes sense to stick to whatever the Linux interface does: as the actual
driver we will use in the future will most likely come directly from
Linux, using any substantially different interface would be extra
effort... So better keep close to Linux, unless there is a really really
good reason not to.

But as I said, we should leave these considerations for the time when we
actually port the new driver :-)

> > I for my part consider the network transpacency a legacy I'd prefer to
> > get rid of (just like the actual network IPC code we already purged), as
> > I believe it's responsible for a substantial part of the complexity of
> > Mach RPC. (And possibly other design shortcomings as well.)
> As mentioned in my last post: Cloud and number crunching graphics cards
> are modern forms of mixed architecture.
> I mentioned these examples to counter the assumption that nowadays
> everything runs on homogenous architectures.

I never stated such an assumption :-) What I claim is that transparent
IPC between different architectures is not useful.

I'll ignore the "cloud" bit -- as I stated already in my previous mail,
I'm not aware of any actual mixed-architecture scenario there... As for
GPUs:

> > But there is nothing *transparent* about the communication between
> > processes running on CPUs and GPUs;
> I think that the _published_ architecture of Hurd would be
> well suitable to employ GPU servers.

Actually, the control capabilities of GPUs are too limited to run
autonomous processes on them. The best you can do is to run server
processes on the CPU, which then explicitly push their computations to
the GPU as appropriate. These server processes can take care of any
necessary translation.

A more relevant case for non-symmetric multiprocessing can be found in
the CPU+DSP combinations employed in some embedded setups. The DSPs are
actually fully autonomous processors AFAIK, so you can indeed invoke
RPCs between clients and servers on the CPU and DSP parts. However, this
is also *not* a situation where IPCs can transparently pass architecture
boundaries: any server or client will be explicitly written to run
always on the CPU or always on the DSP part. The situation is clear
while writing the code -- so any necessary data format translation can
easily be done explicitly.

The idea of the network-transparent IPC in Mach on the other hand, was
that machines having different architectures could be connected to form
an SSI cluster, where each node can basically do the same work, and you
never know whether the process you talk to right now runs on the same
architecture. But as I said, there seems to be more or less universal
agreement nowadays that doing network-transparent IPC at the kernel
level is not a good approach.

(One quite promising approach is 9p BTW, which does network transparency
at the filesystem level.)

> > Another problem might be that to allow network-transparent IPC, MIG
> > would have to guess the appropriate machine-independent types to safely
> > hold the value of each field
> Solved by SUN Microsystems in form of XDR and SUN RPC at least two years
> before the text mig.ps was published. RFC 1014 (SUN XDR, 1987) gives
> motivation why certain design decisions were made.
> These competitors are the main reason why MIG disappoints me.
> SUN XDR prescribes that the primitive types are to be represented
> as on SUN MC68000 workstations.
> This saves the effort to attach information about byte sex, word
> size or floating point formats (of which there were several, back
> then).
> Intel x86, VAX, Intergraph Clipper, and others had to convert
> at the interface. Nevertheless SUN NFS was implemented with
> sufficient performance on all these systems using SUN RPC and XDR.

Competitors? It seems to me you are comparing apples to oranges... The
SUN RPC stuff is an explicit network RPC mechanism. Always converting to
a certain well-defined format is fine here, as the conversion cost is
totally irrelevant compared to the cost of the actual network
transaction. The Mach IPC mechanism on the other hand first and
foremost has to provide fast local communication -- the cost of doing
unnecessary conversion in the local case would be unacceptable.

Unnecessary conversion could be avoided, if Mach itself handles
conversions (rather than MIG), as Mach knows when conversion is
unnecessary -- like it indeed already does for byte order swapping.
However, the mere possibility of doing data size changing conversions
would introduce a lot of overhead I believe. Using machine-independent
types at RPC definition level is much more efficient: conversions from
one int type to another in unserialized form are almost trivial; and can
be avoided altogether, if the rest of the code dealing with the fields
in question also uses the machine-independent types.

In fact I still fail to see any other method to handle word size
differences at all. The size of "int" for example varies from 16 to 64
bits on architectures I'm aware of -- how is either Mach or MIG supposed
to know how many bits it actually needs to transfer? Only the programmer
knows this -- and can express it just fine by using machine-independent
types in the RPC definitions.
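(In C terms this simply means spelling the wire structures with fixed-width types, so the field sizes are the same whatever sizeof(int) is on a given architecture. The struct name and fields below are hypothetical, just to show the idea:

```c
#include <assert.h>
#include <stdint.h>

/* Wire structure with machine-independent field sizes: the programmer
   states how many bits each field needs, instead of leaving it to the
   architecture's notion of "int". */
struct sg_wire_params {
    uint32_t dxfer_len;     /* exactly 32 bits on every architecture */
    uint32_t timeout_ms;
    uint8_t  cmd_len;
};
```

)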

> > > Representing arbitrary C-pointer graphs is quite impossible for an RPC
> > > system.
> > Actually, if the relations of these pointers are known in advance, I
> > don't see why they couldn't be expressed in the interface definition
> Let's look at the example of Linux sg_io_hdr
>   unsigned int dxfer_len;     /* [i] byte count of data transfer */
>   void * dxferp;              /* [i], [*io] points to data transfer memory
> This can only be represented in a serialized (byte array) form, if
> the converter knows about the relation of both members. Only then
> it can determine how many bytes have to be taken from .dxferp.

Right... I'm sure basic cases like this one could be expressed in the
RPC definitions somehow; but I guess it wouldn't be pretty... And it
could never cover all possible cases.

Might be another reason why the MIG designers didn't implement any
automated struct handling: without the possibility of handling pointers,
it would be useless in many cases anyway...
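(To spell out Thomas's dxfer_len/dxferp example: a naive byte-copy of the struct would flatten only the pointer *value*; a hand-written marshaller has to know that dxferp points at dxfer_len bytes and pull that data in behind the fixed part. A sketch, with illustrative names and a simplified two-field struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the sg_io_hdr pair under discussion. */
struct xfer {
    uint32_t dxfer_len;     /* byte count of data transfer */
    void *dxferp;           /* points at dxfer_len bytes of payload */
};

/* Flatten as: the 4-byte length field, then the pointed-to payload.
   Only possible because the marshaller knows the two fields' relation.
   Returns the total number of bytes written. */
static size_t marshal_xfer(const struct xfer *x, unsigned char *out)
{
    memcpy(out, &x->dxfer_len, sizeof x->dxfer_len);
    memcpy(out + sizeof x->dxfer_len, x->dxferp, x->dxfer_len);
    return sizeof x->dxfer_len + x->dxfer_len;
}
```

)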

> > Mach *does* require the same sizes for the individual parameters on both
> > sides;
> Is this really specified that way ?

I don't remember reading an explicit specification regarding that. But
from all I've seen so far, I'm pretty sure this is a safe assumption.

