From: Marcus Brinkmann
Subject: Re: Reboots?
Date: Thu, 29 Mar 2001 02:03:41 +0200
User-agent: Mutt/1.3.15i

On Wed, Mar 28, 2001 at 04:45:50PM -0500, Roland McGrath wrote:
> > I wouldn't know how to get it, so I don't know if I can. What do I need for
> > this?
> 
> Does ddb work these days?  Last time I did kernel hacking it was
> oskit-mach, and that dumps a stack trace when it panics.

I don't know, I never used ddb.
 
> > If it isn't "wire" I am looking for, I don't know what I am looking for (a
> > grep showed nothing in proc/).
> 
> You are right.  proc used to wire itself (wire_task_self), but it doesn't
> now (init does).  So this kernel bug is of more concern than I thought.
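
For anyone following along: "wiring" here means pinning all of the
server's memory so it can never take a page fault on its own text or
data. I haven't checked the actual wire_task_self code, but assuming
the standard vm_region/vm_wire interfaces, it would amount to roughly
this sketch:

  #include <mach.h>
  #include <hurd.h>

  /* Sketch only: walk our own address space and wire every mapped
     region.  vm_wire needs the privileged host port.  */
  static kern_return_t
  wire_whole_task (void)
  {
    mach_port_t host_priv;
    error_t err = get_privileged_ports (&host_priv, NULL);
    if (err)
      return err;

    vm_address_t addr = 0;
    while (1)
      {
        vm_size_t size;
        vm_prot_t prot, max_prot;
        vm_inherit_t inherit;
        boolean_t shared;
        mach_port_t obj;
        vm_offset_t offset;

        /* vm_region returns the next region at or above ADDR, or an
           error once we run off the end of the address space.  */
        if (vm_region (mach_task_self (), &addr, &size, &prot,
                       &max_prot, &inherit, &shared, &obj, &offset))
          break;
        if (obj != MACH_PORT_NULL)
          mach_port_deallocate (mach_task_self (), obj);
        if (prot != VM_PROT_NONE)
          vm_wire (host_priv, mach_task_self (), addr, size, prot);
        addr += size;
      }
    mach_port_deallocate (mach_task_self (), host_priv);
    return KERN_SUCCESS;
  }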

I should mention something. I attached two gdbs, and exited the first one
before the second. (I didn't clear the suspend count when starting the
first, and it didn't ask me for the suspend count when exiting, as it did
in another session I tried.) So this might be related to gdb mayhem. I
don't know if running two gdbs at once is fine (it shouldn't crash the
kernel, but...).
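
If it helps anyone reproduce this: the suspend count business is just
ordinary Mach task state and can be inspected and cleared by hand. A
minimal sketch, assuming the standard task_info/task_resume interfaces
(show_and_clear is a name I made up):

  #include <mach.h>
  #include <stdio.h>

  /* Print a task's suspend count and resume it until it is runnable
     again, e.g. to clear suspends left over by a debugger that
     exited without resuming the task.  */
  static kern_return_t
  show_and_clear (task_t task)
  {
    struct task_basic_info info;
    mach_msg_type_number_t count = TASK_BASIC_INFO_COUNT;
    kern_return_t err;

    err = task_info (task, TASK_BASIC_INFO,
                     (task_info_t) &info, &count);
    if (err)
      return err;
    printf ("suspend count: %d\n", (int) info.suspend_count);

    /* Each task_resume undoes one task_suspend.  */
    while (info.suspend_count-- > 0)
      if ((err = task_resume (task)))
        return err;
    return KERN_SUCCESS;
  }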

Anyway, I stuck with one gdb this time, and it didn't crash. After I
exited gdb, the subhurd reported that it couldn't emulate the crash and
would reboot the Hurd now. So the kernel panic in thread_invoke is either
a random crash or a side effect of the two gdbs (I would need to do more
testing to find out; reproducing the crash takes about one hour, so I'd
like to avoid that).
 
> > Sometimes I wonder if the kernel ring buffer proposed by RMS wouldn't be
> > helpful in situations like this.
> 
> Well, maybe.  But it is a lot of overhead.  I'd be more inclined to work
> on a way to make it possible to trace a sub-hurd using rpctrace on
> the parent hurd.

Ok, sounds fine, too.
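
For reference, the ring buffer mentioned below is nothing fancy; here
is a minimal sketch of the kind of per-message log I mean (all names
are hypothetical, this is not the actual patch):

  #include <mach.h>

  #define LOG_ENTRIES 16      /* can be increased if needed */

  struct rpc_log_entry
  {
    mach_port_t port;          /* msgh_local_port (destination) */
    mach_msg_bits_t bits;      /* msgh_bits */
    mach_msg_size_t size;      /* msgh_size */
    mach_port_seqno_t seqno;   /* msgh_seqno */
    mach_msg_id_t id;          /* msgh_id */
  };

  static struct rpc_log_entry rpc_log[LOG_ENTRIES];
  static unsigned int rpc_log_next;

  /* Called on every incoming message; old entries are overwritten,
     so after a crash the buffer holds the last LOG_ENTRIES RPCs.  */
  static void
  rpc_log_msg (mach_msg_header_t *inp)
  {
    struct rpc_log_entry *e = &rpc_log[rpc_log_next++ % LOG_ENTRIES];
    e->port = inp->msgh_local_port;
    e->bits = inp->msgh_bits;
    e->size = inp->msgh_size;
    e->seqno = inp->msgh_seqno;
    e->id = inp->msgh_id;
  }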

I have reproduced exactly the crash Jeff reported, and I have collected
the data. I used a ring buffer of 16 entries (which I can increase if
needed), and the full gdb log is attached. Here are the three ports on
which RPCs were logged immediately before the crash (in interleaved
order; see the left column). If a field is blank, it is the same as the
previous entry in the same column:

  port 218:

order   bits            size    seqno   id
1.      2147488018      32      1246    24021 dostop
2.                              1247    24031 task2proc
3.                              1248    24031
5.                              1249    24018 get_arg_locations
7.                              1250    24030 task2pid
8.                              1251    24012 child

  port 229:

order   bits            size    seqno   id
4.      2147488018      32      0       24013 setmsgport
6.      4370            40      1       24017 set_arg_locations
9.                      24      2       24016 getpids
10.     2147488018      120     3       24022 handle_exceptions
11.                     32      4       24021 dostop
12.                             5       24031 task2proc
13.                             6       24031
15.     4370            24      7       24018 get_arg_locations

  port 279:

order   bits            size    seqno   id
14.     2147488018      32      0       24013 setmsgport
16.     4370            40      1       24017 set_arg_locations

 *** crash ***
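
A note on reading the bits column: those are raw msgh_bits values.
2147488018 is 0x80001112 and 4370 is 0x1112, i.e. the same pair of port
types with and without MACH_MSGH_BITS_COMPLEX set. A little decoder,
assuming the standard macros from <mach/message.h>:

  #include <mach/message.h>
  #include <stdio.h>

  /* E.g. 2147488018 == 0x80001112: complex, remote type 0x12, local
     type 0x11 (presumably a send-once reply port and a send right);
     4370 == 0x1112 is the same without the complex bit.  */
  static void
  decode_bits (mach_msg_bits_t bits)
  {
    printf ("remote %u, local %u, complex %s\n",
            MACH_MSGH_BITS_REMOTE (bits),
            MACH_MSGH_BITS_LOCAL (bits),
            (bits & MACH_MSGH_BITS_COMPLEX) ? "yes" : "no");
  }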

Of course, one data point is not very much. I can run this a few more
times, and we can see if a pattern emerges. We can insert assertions
etc., and we can probably log whole messages. Can we run proc
single-threaded, so that we know exactly where it crashed?
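
On the single-threaded question: assuming proc uses a libports-style
dispatch loop like the other servers, I think it would just mean using
the one-thread variant, roughly like this (proc_bucket and proc_demuxer
are stand-ins for the real names):

  #include <hurd/ports.h>

  /* With ports_manage_port_operations_one_thread, every RPC is
     served from a single thread, so a crash leaves exactly one
     request in progress to look at.  A timeout of 0 means never
     return.  */
  void
  serve_single_threaded (struct port_bucket *proc_bucket,
                         int (*proc_demuxer) (mach_msg_header_t *,
                                              mach_msg_header_t *))
  {
    ports_manage_port_operations_one_thread (proc_bucket,
                                             proc_demuxer, 0);
  }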

Thanks,
Marcus


-- 
`Rhubarb is no Egyptian god.' Debian http://www.debian.org brinkmd@debian.org
Marcus Brinkmann              GNU    http://www.gnu.org    marcus@gnu.org
Marcus.Brinkmann@ruhr-uni-bochum.de
http://www.marcus-brinkmann.de

Attachment: typescript
Description: Text document

