[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: race condition in libports

From: Brent W. Baccala
Subject: Re: race condition in libports
Date: Fri, 5 Jan 2018 17:45:57 -0500

On Sun, Dec 17, 2017 at 11:41 PM, Brent W. Baccala <cosine@freesoft.org> wrote:
On Sun, Dec 17, 2017 at 8:02 AM, Samuel Thibault <samuel.thibault@gnu.org> wrote:

Brent W. Baccala, on sam. 16 déc. 2017 21:37:05 -0500, wrote:
> basically, we're storing ports in a inode-to-port hash, looking them
> up when io_identity() gets called, and removing them from the hash
> when the class's clean routine gets called.

That's the bug: one needs a reference for this.  And it's a weak
reference: identity is fine with getting rid of it.  Could you try the
attached patch?

Makes sense.  I've applied it.  So far, so good.

I'm still having problems with this, even with the stock ext2fs/libpager code.  To reproduce these bugs, I suggest making a copy of the virtual disk (hd0 -> hd1), booting a subhurd on hd1, then trying to build the entire hurd package (dpkg-buildpackage -b).  I've yet to successfully build the hurd packages in a subhurd - there's always some bug or another that gets triggered.  If not this one, then I've been having problems with the proc server.

Back to the io_identity code...

Part of the problem is that sizeof(ino_t) is 8 bytes, while sizeof(hurd_ihash_key_t) is only 4 bytes.  So we need to use the "generalized key interface" to hash the full 8 bytes.

But that doesn't fix everything.  I'm seeing periodic double removal from the hash table.  It originally manifested as an underflow on the weak reference count, but I added a line:

       assert_backtrace(hurd_ihash_value_valid(* i->id_hashloc));

right before the hurd_ihash_locp_remove in id_clean, and this assert periodically triggers.

The most notable thing about it is that it always seems to be triggered by a NO SENDERS notification with a mscount field less than the mscount in the portinfo structure:  

#8  0x0807ed9d in __assert_fail_backtrace (assertion=0x817c080 "hurd_ihash_value_valid(* i->id_hashloc)", 
    file=0x817c03c "/root/hurd-0.9.git20171119/build-deb/../libfshelp/get-identity.c", line=60, 
    function=0x817c0a8 <__PRETTY_FUNCTION__.8282> "id_clean") at ./build-deb/../libshouldbeinlibc/assert-backtrace.c:61
#9  0x08073d61 in id_clean (cookie=0x20550ec0) at ./build-deb/../libfshelp/get-identity.c:60
#10 0x0807a20e in ports_port_deref (portstruct=0x20550ec0) at ./build-deb/../libports/port-deref.c:39
#11 0x0807a9bd in internal_demuxer (outheadp=0x22800ef0, inp=0x22802f00) at ./build-deb/../libports/manage-multithread.c:215
#12 synchronized_demuxer (inp=0x22802f00, outheadp=0x22800ef0) at ./build-deb/../libports/manage-multithread.c:239

(gdb) frame 11
#11 0x0807a9bd in internal_demuxer (outheadp=0x22800ef0, inp=0x22802f00) at ./build-deb/../libports/manage-multithread.c:215
215               ports_port_deref (pi);
(gdb) x/8x inp
0x22802f00:     0x00001700      0x00000020      0x00000000      0x20550ec0
0x22802f10:     0x00000000      0x00000046      0x10012002      0x00000002

(gdb) print *pi
$9 = {class = 0x209ae8, refcounts = {references = {hard = 0, weak = 1}, value = 4294967296}, mscount = 24, 
  cancel_threshold = 0, flags = 0, port_right = 22723, current_rpcs = 0x0, bucket = 0x820e628, hentry = 0x84a7cf8, 
  ports_htable_entry = 0x205b3610}

Notice that the NO SENDER message had an mscount of 2 (the last word in the x/8x), but the portinfo structure has an mscount of 24.

Also, there's no hard references and one weak reference in the portinfo structure.  At this point there should be two weak references - one for the hash table entry and one for the demuxer.  If the hash table assert didn't fail, then we'd be getting a weak reference underflow soon.

I've puzzled over this for days, and this is my current theory for what's happening.

We started with one hard reference (because we have outstanding send rights) and one weak reference (the hash table).  Then we got a NO SENDERS message, so we started to drop the hard reference by demoting it to a weak reference, in libports/port-deref.c.  Now we've got no hards, two weaks, the entry still in the hash table, and nothing locked (we're about to call the dropweak_routine).

At this point, other threads preempt us.  Our portinfo structure is still in the hash table, so more send rights get created (22 more, in the gdb trace above!), they get consumed, and another NO SENDERS message gets generated.  It gets to the dropweak_routine with no hard references and three weak references (the two from before, plus one more from the second NO SENDERS' demuxer).  It removes the hash table entry (dropping one weak reference), drops another weak reference (in ports_port_deref), and finishes running.

Now the first thread (finally) gets to run again.  It's now got no hard references, one weak reference, and the dropweak_routine runs on a portinfo structure that's already been removed from the hash table.  It removes the hash table entry (again), drops the last weak reference, and then tries to drop another weak reference in ports_port_deref, which triggers the underflow.

I think.

I've "fixed" this by making sure we don't remove the hash table entry unless there are exactly two weak references outstanding, but I'm not sure that's the best way to handle it.  It doesn't seem like the dropweak routine should have to be so careful; it shouldn't get called twice like that.

Anyway, the attached patch does three things:

    1. uses generalized keys to hash the entire ino_t
    2. adds an assert to check for double removal
    3. doesn't remove the hash entry unless there are exactly two weak references

It's been "tested" in the sense that I'm not seeing that assert fail anymore; other other bugs having crashed the program today.


Attachment: identity_weak_2
Description: Binary data

reply via email to

[Prev in Thread] Current Thread [Next in Thread]