[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] fully enable rpctrace to trace multitask programs.

From: olafBuddenhagen
Subject: Re: [PATCH] fully enable rpctrace to trace multitask programs.
Date: Sun, 14 Jun 2009 23:39:36 +0200
User-agent: Mutt/1.5.19 (2009-01-05)


On Mon, Jun 08, 2009 at 09:27:29PM +0800, Da Zheng wrote:
> olafBuddenhagen@gmx.net wrote:
>> On Sun, May 31, 2009 at 11:32:25AM +0800, Zheng Da wrote:

>>> 1. tracing the source and destination of the RPC requests/replies.
>>> It is required when we move the port rights and it also allows
>>> rpctrace to show the caller of RPCs (I think it is a nice feature).
>> It only shows the caller when it is also a traced task, i.e. this is
>> only meaningful when indeed tracing multitasked stuff, right?
> I think it is still useful even when rpctrace is tracing a single task
> program. rpctrace can know whether a send right is to a traced task or
> not and  thus, don't need to treat mach_port_insert_right() specially
> and can  handle the case below properly.

Well, I'm not sure what the difference to the original rpctrace is
exactly... However, if there is a change in tracing of
source/destination that is meaningful even without actually supporting
multiple tasks, I guess that's an obvious candidate for putting it in a
separate patch :-)

>>> The patch can also fix some bugs reported in savannah, for example,
>>> the traced task hangs when the it gets signals.
>> Are these fixes a side effect of the new functionality, or have you
>> done fixes explicitely? In the latter case, they should be split out
>> into distinct patches...
> I provided a patch before to fix the bug that a traced task hangs when
> it gets signals. see https://savannah.gnu.org/patch/?6830

Ah, nice.

In that case, please keep this change in a seperate patch (as it
originally was), instead of lumping it in with the main multitask patch

>>> +void
>>> +add_task (task_t task)
>>> +{
>>> +  static size_t capability = 100;
>>> +
>>> +  if (traced_tasks == NULL)
>>> +    traced_tasks = malloc (sizeof (task_t) * capability);
>>> +  else if (nb_traced_tasks == capability)
>>> +    {
>>> +      capability *= 2;
>>> +      traced_tasks = realloc (traced_tasks, capability);
>>> +    }
>>> +  assert (traced_tasks);
>>> +
>>> +  traced_tasks[nb_traced_tasks++] = task;
>>> +}
>> actually I don't think it's a good idea to use chunked allocation here
>> at all... Probably just adds unnecessary complexity. Simply realloc()
>> each time -- it's not *that* inefficient. Or use a proper dynamic data
>> structure; like a linked list, or perhaps even better an ihash.
> I think this is the simplest way to handle it. The entry of the array  
> 'traced_tasks' is just a task port.

> The array is also relatively static assuming that task creation and  
> termination is relatively rare, so it is really unnecessary to use  
> dynamic data structure, which increases the complexity and reduces the  
> performance (thought the performance isn't an important issue in 
> rpctrace).

Actually, with all the custom code you need for handling the array, it's
*more* complex than using a standard data structure like ihash!

Of course, the amount of code actually used with ihash would be larger
-- but this code is maintained elsewhere, so it doesn't really figure in
the complexity comparision. What matters is the amount of code you need
to maintain here, and I don't think the custom array handling lowers

I'm not even convinced on the performance point. ihash requires the CPU
to execute more individual instructions of course; but on modern
processors, that's not the major performance factor in most applications
anyways. What usually matters more is cache usage -- and this is
probably better with ihash, as that code is used in other places as
well, and thus it doesn't actually occupy additional space in the caches
when you use it here as well...

Admittedly, the last part is guesswork. But even if it is somewhat less
efficient, I think that avoiding the custom handling is worth it.

>> Yeah, this definitely looks like a job for a dynamic data
>> structure...
> I don't really understand why glibc doesn't provide general data
> structure such as vector and linked list as STL or Java library does.


Most people just use glib for such things nowadays... But the Hurd
predates that by many years; and also, glib is problematic in certain
use cases :-(

> Though Hurd provides ihash, we cannot use it as a dynamic array or
> linked list. They are very common structure. We should have a general
> implementation and include them in some library such as
> libshouldbeinlibc so Hurd developers don't need to implement them
> again  and again when they need them.

Well, single-linked lists are very simple once you understand the idea;
so adding standard functions (or macros) for handling them is probably
considered hardly worthwhile...

Also note that in many cases (including the one above), actually you
*can* use ihash instead of linked lists... While it is a more complex
data structure, I don't think that really matters most of the time.

>>>  struct traced_info
>>>  {
>> I had various doubts about this structure, until I realized that the
>> real issue goes deeper: It is no longer appropriate to use a single
>> structure here, when tracing multiple tasks.
> I thought about separating this structure but I compromised because it  
> needs much bigger modification.

Well, I think that keeping the code maintainable in the long run is more
important than keeping modifications minimal... It is quite normal that
non-trivial changes require some refactoring.

Note that if the refactoring changes can be split out into separate
patches, this helps keeping the main patch with the actual functional
changes smaller and easier to understand...

>>> +struct port_info *notify_pi;
>> Is this actually related to tracing multiple tasks?...
> which one? notify_pi? I would say, yes.


Still, could this be done in a separate patch, to go before the main
patch for tracing multiple tasks?...

>>> +  info = malloc (sizeof (*info));
>>> +  assert (info);
>> Never assert() on malloc() errors!
>> malloc() failures are perfectly possible in normal use; they do *not*
>> indicate program errors. Use normal error reporting for them.
>> (Note that when building with NDEBUG, assert()s are not even compiled at
>> all.)
> Maybe I should use error().


>>> +  err = mach_port_request_notification (mach_task_self (), right,
>>> +                                   MACH_NOTIFY_DEAD_NAME, 1,
>>> +                                   notify_pi->port_right,
>>> +                                   MACH_MSG_TYPE_MAKE_SEND_ONCE, &foo);
>>> +  assert_perror (err);
>> Are you sure that mach_port_request_notification() will only fail when
>> the program messed up, and never in normal use?...
> The program might not work properly as I explained above (see the  
> comments on notify_pi), but it depends on the traced task.
> So I want rpctrace to stop and everyone knows where the problem is.

Well, the question really is whether the function can *only* fail if
there is some bug in rpctrace itself -- in that case, assert() would be
the right thing. However, if it is possible that the call fails even if
rpctrace itself is absolutely correct, you should use a normal error()
rather than assert(), as explained above.

According to the manual, mach_port_request_notification() with
KERN_UREFS_OVERFLOW -- the former at least is something that can happen
in normal operation, even if rpctrace is bug-free. (The latter probably
as well, though I'm not entirely sure about that one...)

>>> @@ -302,6 +500,7 @@ new_send_once_wrapper (mach_port_t right,
>>> mach_port_t *wrapper_right)
>> Your mail client wrapped this line (and many more like this), breaking
>> the patch :-(
> I knew thunderbird had some problems so I used gmail directly.
> It seems gmail isn't suitable for sending patches, either:-(

You might consider using a proper mail client, like mutt... ;-)

IIRC kernel.org has some tips on sending patches with various mail
clients, including Thunderbird and the gmail webmail interface.

>>> +  if (info->receive_right)
>>> +    {
>>> +      /* Remove it from the send right list. */
>>> +      prev = &info->receive_right->next;
>>> +      if (*prev)
>>> +   {
>>> +     while (*prev != info && *prev)
>>> +       prev = &((*prev)->next);
>>> +     assert (*prev);
>>> +     *prev = info->next;
>>> +   }
>> Another case of unnecessary outer IF...
>> This code is actually incorrect I believe: surely the assert() should
>> also trigger when the list is empty?
> The outer IF here is necessary because of the code "prev =  
> &info->receive_right->next;".

Sorry, I guess my wording was confusing. I actually meant the "if
(*prev)" bit, which is unnecessary, just like in the other places I
pointed out...

> If the list is empty, the code inside the inner IF cannot be executed.

Yes, that's my point. The assert() inside the IF block wouldn't get
executed, and in my understanding it should.

Just drop the IF, and everything is fine :-)

>>> +struct traced_info *
>>> +discover_receive_right (mach_port_t send)
>>> +{
>> I'm rather sceptical about this whole function. Is it really necessary
>> to do such a brute-force search for the receive right? Shouldn't we
>> actually always know about all receive rights in all traced tasks, as we
>> are tracing their task ports?
> No, we cannot. A receive right is created by mach_port_allocate(),  
> mach_port_allocate_name() or mach_reply_port().
> The caller of mach_port_allocate() creates the port, but it dosn't get  
> the receive right but the port name in the target task.
> The caller of mach_port_allocate_name() is the same except that the port  
> name is decided by the caller.

> mach_reply_port() cannot be traced by the rpctrace at all since it's a
> system trap.

I see... So there is no way to divert it? That seems strange :-(

Anyways, while this means that we won't see a port created by
mach_reply_port() immediately, we still can start fully tracking it as
soon as it is used in any way, can't we?...

> If we don't treat the first two RPCs specially, rpctrace dosen't know
> a  port has been created and the target task has the receive right. Of
> course, we can do it (understand them and get the receive right) as I
> did before.

That's the right approach IMHO.

> But it becomes unnecessary after discover_receive_right is
> implemented.  This is a more general and elegant (I think:-) way to
> handle the receive  right.

I don't think this is elegant at all. It means that we keep track of
some information, while other is only retreived with a brute-force
search upon need -- this mixup is rather messy.

One way to make it more consistent would be never to try keeping track
of stuff; always fetching the information as needed instead. This would
be very inefficient, though.

The other option is trying to track everything, maintaining a consistent
"world view" at all times.

>>> +      /* If the receive right is in one of traced tasks,
>>> +       * but we don't know its name
>>> +       * (probably because the receive name has been moved),
>>> +       * we need to find it out. */
>> As I said above, I don't see how this could ever happen...
> did you? I forget. maybe the patch is really too large:-)
> as I explained in the comment, a receive right can be moved from one  
> task to another.
> When a receive right is moved, I don't try to get its new port name  
> immediately. Instead, I wait until this function is called.

Yeah, my point was that I don't think it's necessary or elegant to do it
like this -- see above :-)

>> Do I get it right, that this function will create forward ports for
>> any receive rights it finds along the way, as a side effect?
>> Seems questionable to me. At least it should be documented.
> Yes, you are right. The code above does have two functions. I will add
> the comment.

Well, when I said "questionable", I actually meant that I see a problem
with this behaviour. Creating the forward port will create a reference
to a port that otherwise might not have any references at present. You
destroy the forward port when the last external reference is dropped,
but you create a forward port before there is any external reference?
That seems just wrong.

But as I said, I don't like this whole brute-force search anyways :-)

>>> +     /* It's unusual to see an unknown send right from a traced task.
>>> +      * We ignore it. */
>>> +     if (source != UNKNOWN_TASK)
>>> +       {
>>> +         error (0, 0, "get an unknown send right from process %d",
>>> +                (int) task2pid (source));
>>> +         return &dummy_wrapper;
>>> +       }
>> "Unusual"? What does that mean?...
>> Isn't this something that should never happen; and if it does, something
>> must be seriously screwed? In that case, is it even useful to go on at
>> all?...
> Here is one example. see https://savannah.gnu.org/bugs/?26476
> I think the best way is to ignore the case. Thus, the program can still  
> work correctly even though we might not be able to trace some RPCs  
> related to it.

I see.

>>>       {
>>> -       /* This is a receive right that we have been tracing sends to.  */
>> You dropped the old comment, but haven't added a new one for this code
>> block at all...
> I didn't just drop one comment. I dropped the comment and the code below it.
> The new code is explained by other comments.

My point is that the original comment for the whole code block was
useful, and you should add such a comment for your new code block as
well. It's harder to understand the meaning of a whole code block only
from comments of individual pieces.

>>> -       name = info->name;
>>> -       rr = ports_claim_right (info);
>>> -       /* That released the refs on INFO, so it's been freed now.  */
>>> +       struct traced_info *send_wrapper2;
>>> +       char *name;
>>> +       mach_port_t rr;
>> Why did you move the declarations of these two variables down here?
> These variables are only used in this IF clause.

Yes, but that was also the case in the original code...

>> Looks like another change that could go in an extra cleanup patch,
>> but better should not go into a patch that does non-trivial
>> changes...
> I don't understand what you mean here.

While the change might be useful by itself, it is not strictly required
for the actual functional changes you did. Having such non-functional
changes in the same patch makes the actual functional changes harder to

If you want to move the declarations, do it in an extra patch, that
doesn't change other stuff at the same time.

>>> +       /* We have to deallocate the send right in
>>> +        * receive_wrapper->forward before we import the port to
>>> +        * port_info. So the reference count in the port info will be 1,
>>> +        * if it doesn't have any other send rights. */
>>> +       mach_port_deallocate (mach_task_self (), receive_wrapper->forward);
>> This comment is very hard to understand; I don't think it is helpful at
>> all :-(
>> I can only *guess*, from my understanding of the code, that what you are
>> trying to say is: the task just gave up the receive right, so we won't
>> be forwarding to it anymore, and thus need to drop the send right for
>> the forward port.
> If 'forward' has the send right before ports_import_port() is called,  
> the reference count of the new created port info is 2.
> If I don't call mach_port_deallocate before ports_import_port, the  
> reference count is always 2 because 'forward' always hold the send right  
> to the port, and ports_port_deref will not be able to destroy the port  
> info if the source task doesn't have the send right to the port.

Well, the way you describe it doesn't really hit the point IMHO.

You make it sound like the reference count being increased is some kind
of side effect, and we need to "fix" that.

What really happens -- in my understanding -- is that we don't need the
*original* reference, i.e. our send right to the forward port anymore.
That we will get a new reference through the import, and that this
should be our only reference at this point, is really besides the point
I believe.

To put it a different way, we should never need to care whether a
reference count is "1" or "2" or anything at some particular time. The
whole point of reference counting is that we do *not* need to think
about this. All we need to do is make sure that we actually adjust the
reference count when we give up a reference... That's what we do here,
and that is all the comment should say.

>>> +       /* The port has been removed from the send wrapper,
>>> +        * so we cannot put it in the free list. */
>> I don't understand this comment at all :-(
> The original rpctrace recycles send wrappers that lose all send rights  
> by increasing the weak reference count of port info and adding the send  
> wrapper in the free list when the reference count becomes 0.

Actually, this seems questionable to me anyways... It increases the
complexity of the code, and I'm not sure this is justified here. Surely
the performance gain can't matter that much?...

Of course, that's not really related to your changes.

> However, send_wrapper2 cannot be recycled because its port will be  
> removed from the send wrapper by ports_claim_right().
> So I need to decrease its weak reference count with  
> ports_port_deref_weak before calling ports_claim_right.

I see.

>> I don't like constructs of the form "if (foo == a || foo == b) { if (foo
>> == a) {} else {} }"... It's much clearer to use "switch (foo) {}" or "if
>> (foo == a) {} else if (foo == b) {}" IMHO.
>> But I guess this is also a matter of taste...
> if (foo == a || foo == b) was necessary because handling thread_create  
> and task_create share much the same code.
> You can see it https://savannah.gnu.org/patch/?6830
> However, those code disappear after I make it trace multitask programs,  
> but this IF isn't removed.

Ah, OK. This should be more obvious when you keep the original patch
separate, as I requested above.

Also, I suggest following up with a cleanup patch, which removes the
strange construct, now that it's not needed anymore...

>>>       else if (newtypes[0] != name)
>>> -       if (type->msgt_longform)
>>> -         lt->msgtl_name = newtypes[0];
>>> -       else
>>> -         type->msgt_name = newtypes[0];
>>> +       {
>>> +         if (type->msgt_longform)
>>> +           lt->msgtl_name = newtypes[0];
>>> +         else
>>> +           type->msgt_name = newtypes[0];
>>> +       }
>> Putting it in braces is definitely preferable, and in fact recommended
>> by the GNU Coding Standards... However, like all cleanups, this should
>> go in an extra patch -- it is totally unrelated to the other changes.
> I just wanted to satisfy GCC.
> OK. I will put the original code.

Yeah, please submit an extra patch for this cleanup.

>>>     print_data (name, data, nelt, eltsize);
>>> +      fflush (ostream);
>> I guess this is not related to tracing multiple tasks, but rather a
>> general fix?...
> It isn't even a fix.
> I just simplify the debugging.
> I can see what has been received by rpctrace by flushing the stream.
> I can remove it, but it can definitely help the further development of  
> rpctrace.

No, I didn't ask for removing it... Just wanted to suggest doing it an
extra patch again :-)

>>> +     receive_wrapper = hurd_ihash_find (&traced_names, n->not_port);
>>> +     /* The receive wrapper might have been destroyed. */
>> How?
> When all corresponding send wrappers are destroyed, the receive info  
> will be destroyed.
> However, the dead-name notification isn't canceled.

I see.

>> Also, this comment is rather confusing: it looks like it would
>> explain the purpose of the whole IF block, while actually it only
>> explains why the block needs to be conditional...
> Should I say "The receive wrapper hasn't been destroyed"?

I'd rather say something like "the port might have been destroyed in the
meantime", and the explanation you gave above, followed by "if not,
destroy the receiver info and forward port now".

>>> +  /* If it's the request of exec_startup_get_info,
>>> +   * it means that the traced process starts to run */
>> What's the purpose of this? I thought all task/thread creation is
>> intercepted anyways?...
> No. The first thread is created by exec server, which cannot be traced  
> by rpctrace.
> So we have to find this first thread. The best way I can find is  
> observing exec_startup_get_info, which is called when the process starts  
> to run.
> Of course, it has drawback. If the traced program doesn't use glibc  
> (which I believe is really rare), it cannot work.

Hm, is there no other way? Seems a bit fragile to me...

The task might already have gained some ports that we know nothing
about, before we started tracing...

>>> +      munmap (threads, nthreads * sizeof (thread_t));
>> Using vm_deallocate() is clearer.
> I see proc use munmap.

Well, I have seen vm_deallocate() used in the original rpctrace code, so
there isn't even a consistency argument here against vm_deallocate()...

>>> @@ -744,6 +1301,7 @@ trace_and_forward (mach_msg_header_t *inp,
>>> mach_msg_header_t *outp)
>>>        we are consuming its `forward' right in the message we send.  */
>>>     free (info->name);
>>>     info->name = 0;
>>> +   info->forward = 0;
>> Is this related to tracing multiple tasks?
> It might be unnecessary. But I just wanted to reset it to avoid some  
> possible error.
> Yes, you can think it as cleanup or something like that.

So probably also a candidate for an extra patch?...

>>> -  ti = new_send_wrapper (traced_task, &task_wrapper);/* consumes ref */
>>> -  asprintf (&ti->name, "task%d", (int) pid);
>>> +  ti = new_send_wrapper (receive_ti, traced_task, &task_wrapper);
>> Why did you drop the original comment from this line?...
> What comment is dropped?

/* consumes ref */

>> BTW, have you checked that no changes are required to the
>> documentation?...

> The document in where? The only document I can find is
> http://www.gnu.org/software/hurd/hurd/debugging/rpctrace.html, which
> actually isn't document at all.

Well, I assumed rpctrace might have some kind of documentation in the
Hurd manual or something...

> It is a very big patch. It took me almost the whole afternoon to reply it.

Better don't ask how long it took me to write the review in the first
place :-)

> But it seems impossible to split it because the modifications are  
> closely related.

Well, with the rework of the info structures I recommended, you
naturally get a split into three phases at least: first a patch that
untangles some code paths, to make sure different kinds of info
structures never need to be passed to the same function. This would be a
preparation for the second patch, which actually splits the info
structures. This in turn would be a preparation for the main changes,
allowing tracing of multiple tasks.

At least the last phase can be further split down in this fashion
(perhaps the others too) -- in fact, I pointed out several things that
can be split out, in my remarks above.

When doing major functional changes, it's quite normal that you find
many other things that need to be changed along the way, to make the
main change possible. These other changes usually don't make much sense
by themselfs; but nevertheless, it's often possible to do them in
advance, without breaking the existing code. Each such preparation
change can be done in an extra patch, to be commited before doing the
main changes depending on it.

When doing refactoring in netrik, I sometimes did like five preparation
commits, before doing the actual change I intended to do. (Sometimes the
preparations even required other preparations first.) The ultimate patch
doing the main change often boils down almost to a one-liner this way...

I'm aware that working like that is a bit more work than just writing
one large patch. However, it makes the individual patches simple and
clean -- IHMO, this pays off in the end.

Or perhaps it's just me being pedantic ;-)


reply via email to

[Prev in Thread] Current Thread [Next in Thread]