From: Marcus Müller
Subject: Re: [Discuss-gnuradio] Thread safety of PMT objects in python
Date: Sat, 9 Jul 2016 14:58:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1
Hello Jonas,
Hm, yes, that sounds like the typical C++ object lifetime problem. (In fact, as I'll explain below, the problem lies deeper than threading: it's about object ownership, which is kind of borked for PMTs here, and actually not only for those.)

You wrote: "The problem arises when accessing 'old' PMTs. That is, PMTs that were handed over to Python from the C++ domain in the past, e.g. through a message handling callback. It appears the PMTs are only valid throughout the duration of the function they were handed to."

So, great that you attached a test case! By the way, segfaults triggered by valid code (which, with few exceptions, means any Python code) are usually bugs, and you're more than invited to open a bug report under [1], but you'll need a gnuradio.org redmine account to see the "New Issue" button.

Rather than just presenting an answer, I'll explain what I'm doing here, so that you (and others) can reproduce it. I think there won't be much new information in here for you, Jonas, but rather than just doing what I did to verify, I thought I'd share it. Roughly:
So, basically, we're stuck with 3.

There's this [2] wiki page that explains what you can do with bog-standard GDB and Python scripts. The current state of affairs is that at least Fedora (and I suspect Arch, too) ships GDB and python-devel (or their Arch/pacman equivalents) with a script that automatically enables Python symbol name resolution when debugging a Python process. That's great, because it lets us see in which Python functions things go wrong!

Then it all comes down to installing the debug info for a lot of libraries (luckily, my GDB even prints the actual package manager commands I need to run to install the missing debug symbols) and running

    gdb --args python /tmp/min_err_repro.py

then, on the GDB shell, "run", wait for the crash, and then "bt" (short for "backtrace"). This led to this output for me:

    #0 0x00007fffef62d2c5 in boost::detail::atomic_count::atomic_exchange_and_add (dv=1, pw=0x39) at /usr/include/boost/smart_ptr/detail/atomic_count_gcc_x86.hpp:67
    #1 boost::detail::atomic_count::operator++ (this=0x39) at /usr/include/boost/smart_ptr/detail/atomic_count_gcc_x86.hpp:30
    #2 pmt::intrusive_ptr_add_ref (address@hidden) at /home/marcus/src/gnuradio/gnuradio-runtime/lib/pmt/pmt.cc:66
    #3 0x00007fffe7e184c5 in boost::intrusive_ptr<pmt::pmt_base>::intrusive_ptr (rhs=..., this=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:92
    #4 boost::intrusive_ptr<pmt::pmt_base>::operator= (rhs=..., this=<synthetic pointer>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:129
    #5 _wrap_write_string (args=<optimized out>, kwargs=<optimized out>) at /home/marcus/src/gnuradio/build/gnuradio-runtime/swig/pmt_swigPYTHON_wrap.cxx:39897
    #6 0x00007ffff7af2796 in call_function (oparg=<optimized out>, pp_stack=0x7fffde621220) at /usr/src/debug/Python-2.7.11/Python/ceval.c:4427
    #7 PyEval_EvalFrameEx ( address@hidden 0x7fffdf682730, for file /home/marcus/.usrlocal/lib64/python2.7/site-packages/pmt/pmt_swig.py, line 3295, in write_string (obj=<swig_int_ptr(this=<SwigPyObject at remote 0x7fffdfcfaed0>) at remote 0x7fffdf659a10>), address@hidden) at /usr/src/debug/Python-2.7.11/Python/ceval.c:3061
    #8 0x00007ffff7af23e2 in fast_function (nk=<optimized out>, na=<optimized out>, n=1, pp_stack=0x7fffde621360, func=<optimized out>) at /usr/src/debug/Python-2.7.11/Python/ceval.c:4513
    #9 call_function (oparg=<optimized out>, pp_stack=0x7fffde621360) at /usr/src/debug/Python-2.7.11/Python/ceval.c:4448

So, yes, your suspicion was pretty right, this has something to
do with the handling of objects in "pythonland".

PMTs are a bit special in a number of ways. I don't like all of them, because they make these polymorphic types, which are meant to be used for portability, less portable :)

So, first of all, pmt::pmt_t is actually a typedef for boost::intrusive_ptr<pmt_base>, which is a refcounting pointer wrapper.

Now, if you hand over a pmt_t from C++ to Python, Python needs your object to be a CPython PyObject, which is the Python-internal "universal" struct that's behind every single Python object. GNU Radio could have written "glue code" by hand for every single thing that we want to expose to Python from C++, but instead, SWIG is used, which (kind of) fully automatically generates wrapper code for C++/C functions, and adds PyObjects with the appropriate properties and function delegates (including type conversions etc.) for all the classes that we need in Python.

So, this all is a bit of an onion situation:

    Python(SWIG-generated PyObject(SWIG type abstraction(Intrusive Pointer(pmt_base))))

Notice how we have a bit of a problem here: Python has its own refcounting for the PyObject* that it handles. In other words, when you do

    key = self.get_tags_in_range(0, offs, offs+1)[0].key

Python increases the refcount of the PyObject that "self.get...[0].key" is, and makes "key" refer to it, but that does not increase the refcount the intrusive_ptr has! So, after the GNU Radio scheduler is done calling work (through C++/Python PyEval delegation), it executes a "pruning" algorithm to identify the tags that no longer need to be held in the block's internal tag registry, and removes them from it, reducing their refcount. If that count hits 0, the pmt_base the intrusive_ptr points to (and the intrusive_ptr itself) gets deallocated. Python's PyObject* doesn't notice any of that. It just happily calls pmt:: functions on non-existing objects when you do

    print self.tags

which can lead to a segfault as early as the second iteration.

Absolutely the same business is happening with your self.messages
contents, only that messages in a single-sender/single-receiver scenario hit a refcount of zero more reliably.

Workaround: yeah.

Cheers,
Marcus

[1] http://gnuradio.org/redmine/projects/gnuradio/issues
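
To make the two-refcount picture from the explanation above concrete, here is a toy model in pure Python. It is only a sketch of the mechanism as described in this mail, not how SWIG or GNU Radio are actually implemented; every name in it (ToyPmt, ToyWrapper, add_ref, release, demo) is made up for illustration.

```python
class ToyPmt(object):
    """Stands in for pmt_base: carries its own (intrusive) refcount."""

    def __init__(self, value):
        self.value = value
        self.refcount = 0
        self.alive = True


def add_ref(obj):
    """Hypothetical counterpart of taking an intrusive reference."""
    obj.refcount += 1


def release(obj):
    """Hypothetical counterpart of dropping an intrusive reference."""
    obj.refcount -= 1
    if obj.refcount == 0:
        # Real C++ would free the memory here; the toy only marks it dead.
        obj.alive = False
        obj.value = None


class ToyWrapper(object):
    """Stands in for the Python-side proxy: it remembers the target but
    never calls add_ref(), so the intrusive count does not know about it."""

    def __init__(self, target):
        self._target = target

    def read(self):
        if not self._target.alive:
            # In the real thing this is where the segfault would happen.
            raise RuntimeError("access to a freed object")
        return self._target.value


def demo():
    tag = ToyPmt("key")
    add_ref(tag)             # the block's tag registry holds one reference
    kept = ToyWrapper(tag)   # Python keeps the proxy; count stays at 1
    first = kept.read()      # fine while we are still "inside work()"
    release(tag)             # the scheduler prunes the tag afterwards
    try:
        kept.read()
        crashed = False
    except RuntimeError:
        crashed = True
    return first, tag.refcount, crashed
```

Calling demo() shows the first read succeeding and the second one failing once the registry's reference is gone, even though Python still holds the proxy object.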
On 04.07.2016 21:33, Jonas Deitmerg wrote:

Hello everyone,

I've recently experienced some unexpected behavior when working with PMTs in messages and tags. Although I have already figured out how to avoid this issue, I'd like to know whether it's a systematic error or just a misunderstanding on my part.

The problem arises when accessing 'old' PMTs. That is, PMTs that were handed over to Python from the C++ domain in the past, e.g. through a message handling callback. It appears the PMTs are only valid throughout the duration of the function they were handed to. To illustrate the problem I have attached some Python code which will reliably crash with a segmentation fault.

Here's my current understanding of what's happening:

1. The block's thread sees a message that needs to be processed.
2. It dispatches the message (packed as pmt::pmt_t) to the callback function through SWIG. I assume the reference counting of the pmt object is lost here.
3. The Python function works on the data, e.g. saves it for later use.
4. Control returns to the C++ side, the pmt object goes out of scope and is freed.
5. Some other Python code tries to access the pmt object and a segfault occurs.

Is this roughly correct? If so, is there a way to solve this nicely? It's obviously possible to unpack the pmt object in step 3 and save the contained data for later use. But I'm probably not the last one to get bitten by this, and it's not exactly fun to debug.

My setup consists of gnuradio 3.7.9.2, swig 3.0.10 and python 2.7.11 running on Arch Linux, kernel 4.6.3, 64 bit.

Thanks in advance
Jonas
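
The unpacking workaround mentioned for step 3 could look roughly like the sketch below: convert the PMT to plain Python data while the handler is still running, and store only the converted data. The class name MessageKeeper and its attributes are hypothetical. The converter is passed in so the sketch runs without GNU Radio; with GNU Radio installed one would presumably pass pmt.to_python, and the handler registration shown in the comment assumes a GNU Radio 3.7-style Python block.

```python
class MessageKeeper(object):
    """Keeps converted payloads instead of the PMT wrappers themselves."""

    def __init__(self, convert):
        # `convert` turns a PMT into plain Python data. With GNU Radio
        # available this would typically be pmt.to_python (assumption).
        self._convert = convert
        self.messages = []

    def handle_msg(self, msg):
        # Convert right away, while msg is still valid inside the handler.
        # Only native Python objects are stored, so nothing refers to the
        # C++ object after the handler returns.
        self.messages.append(self._convert(msg))


# Possible wiring inside a Python block (assumed, not taken from this thread):
#   keeper = MessageKeeper(pmt.to_python)
#   self.set_msg_handler(pmt.intern("in"), keeper.handle_msg)
```

The cost is one conversion per message; in exchange, later code never touches an object whose lifetime is controlled from the C++ side.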