Re: FIFO race condition on SunOS kernels

From: Vladimir Marek
Subject: Re: FIFO race condition on SunOS kernels
Date: Tue, 1 Jan 2019 23:47:50 +0100
User-agent: Mutt/ (2013-10-16)


> You'd think that establishing a pipe between two processes is a very basic
> UNIX feature that should work reliably on all UNIX variants.

One would think that _opening_ a file is a very basic UNIX feature ...

> But the following script seems to break consistently on Solaris and variants
> (SunOS kernels) when executed by bash, ksh93, or dash. All it does is make
> 100 FIFOs and read a line from each -- it should be trivial.
> And it does work fine on (recent versions of) Linux, macOS, and all the
> BSDs, on all shells.

In horror, I quickly tested on S11.3 SRU34 x86, S11.3 SRU18 sparc,
S11.4 FCS x86, S11.4 FCS sparc, S10 (mostly FCS), and even a development
Solaris version. I have also tested it on S11.3 SRU 35 in an LDOM. In all
cases it prints the lines

this is FIFO 1
this is FIFO 100

I haven't seen a single issue. Then I tried VirtualBox, and it printed
only two lines. Trussing it shows that it gets stuck at

1546/1:         open("/tmp/FIFOs1546/FIFO4", O_RDONLY|O_XPG4OPEN) (sleeping...)

That said, I do use VirtualBox 5.1.22r115126 which is pretty old.

Putting a 0.5s delay anywhere in the loop makes the problem disappear.
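
For reference, here is a minimal sketch of the kind of loop being discussed,
with an optional delay per iteration. The original script is not quoted in
this message, so the function name fifo_test, its two arguments, and the
cleanup steps are assumptions made for illustration. Only the
FIFOs<pid>/FIFO<n> naming and the "this is FIFO n" lines are taken from the
output shown here. It may or may not trigger the hang on a given system.

```shell
# Hypothetical reproduction sketch, not the original script.
# usage: fifo_test [count] [delay_seconds]
fifo_test() {
    count=${1:-100}
    delay=${2:-0}
    dir=${TMPDIR:-/tmp}/FIFOs$$
    mkdir "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        fifo=$dir/FIFO$i
        mkfifo "$fifo" || break
        # Writer runs in the background; its open() blocks until a
        # reader opens the other end of the FIFO.
        echo "this is FIFO $i" > "$fifo" &
        # Reader: the shell opens the FIFO read-only for this redirect,
        # which is the open() call seen sleeping in the truss output.
        IFS= read -r line < "$fifo" && echo "$line"
        wait                        # reap the background writer
        rm -f "$fifo"
        # Optional delay; fractional seconds assume a sleep that accepts them.
        [ "$delay" = 0 ] || sleep "$delay"
        i=$((i + 1))
    done
    rm -rf "$dir"
}
```

Calling "fifo_test 100" runs the loop without a delay, and "fifo_test 100 0.5"
adds the 0.5s delay after each FIFO.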

Looking more closely at the truss output, I can see

5030/1:          4.142629    open("/var/tmp/FIFOs5030/FIFO7", 

and shortly after it gets stuck. Process 5030 is bash.

I don't have debug symbols right now, but I guess from the disassembly
that we are in the redir_open function. I have tried to protect the loop,
taking both EINTR and ERESTART into consideration, but that didn't help.

Let's try to look into the kernel.

$ mdb -k
> 0t1775::pid2proc|::whatthread|::findstack
stack pointer for thread ffffa1002910c840: ffffe33001d95990
[ ffffe33001d95990 _resume_from_idle+0x192() ]
  ffffe33001d959c0 swtch+0x19d()
  ffffe33001d95a30 cv_wait_sig_swap_core+0x19c()
  ffffe33001d95a50 cv_wait_sig_swap+0x18()
  ffffe33001d95ae0 fifo_open+0x43e()
  ffffe33001d95b60 fop_open+0x18f()
  ffffe33001d95d50 vn_openat+0x974()
  ffffe33001d95ec0 copen+0x4fd()
  ffffe33001d95ef0 openat+0x31()
  ffffe33001d95f00 sys_syscall+0x247()

Hmm, so we are waiting on a condition variable in fifo_open. That needs
to be investigated more deeply. It would be great if you could open a case
for this, as we have to prioritize our work ...

At the moment it looks like a bug in Solaris to me, but it shows up only
on VirtualBox. I'll try to look at it more, or rather find someone who
knows the filesystem code.

