Re: [Discuss-gnuradio] segmentation fault in qa_constellation_receiver_test
From: West, Nathan
Subject: Re: [Discuss-gnuradio] segmentation fault in qa_constellation_receiver_test
Date: Fri, 21 Feb 2014 01:39:33 -0600
On Thu, Feb 20, 2014 at 11:25 PM, Kelly Boswell <address@hidden> wrote:
> After the make test failed for this module, I decided to poke around to see
> if there is an easy fix. I made a script that simply executes the test over
> and over until it seg faults and exits after the core file is created.
>
> address@hidden:~/src/gnuradio/build/gr-digital/python/digital$ ./runtests.sh
> Using Volk machine: avx_64_mmx
> Segmentation fault (core dumped)
>
> address@hidden:~/src/gnuradio/build/gr-digital/python/digital$ gdb /usr/bin/python2.7 core
> (gdb) bt
> #0 0x00007fe8f627fb17 in volk_32fc_32f_dot_prod_32fc_a_avx ()
>    from /home/kelly/src/gnuradio/build/volk/lib/libvolk.so.0.0.0
> #1 0x00007fe8f52dd25f in gr::filter::kernel::fir_filter_ccf::filter(std::complex<float> const*) ()
>    from /home/kelly/src/gnuradio/build/gr-filter/lib/libgnuradio-filter-3.8git.so.0.0.0
> #2 0x00007fe8f143c45b in gr::digital::pfb_clock_sync_ccf_impl::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) ()
>    from /home/kelly/src/gnuradio/build/gr-digital/lib/libgnuradio-digital-3.8git.so.0.0.0
> #3 0x00007fe8f653809e in gr::block_executor::run_one_iteration() ()
>    from /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
> #4 0x00007fe8f6573622 in gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, int) ()
>    from /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
> #5 0x00007fe8f6565ea1 in boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>, void>::invoke(boost::detail::function::function_buffer&) ()
>    from /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
> #6 0x00007fe8f6526610 in boost::detail::thread_data<boost::function0<void> >::run() ()
>    from /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
> #7 0x00007fe8f9adc94a in ?? ()
>    from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.53.0
> #8 0x00007fe8fc8a3f6e in start_thread (arg=0x7fe8e2ffd700)
>    at pthread_create.c:311
> #9 0x00007fe8fc5ce9cd in clone ()
>    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>
> Of course, I had to recompile it with debugging info to glean anything
> useful from the stack trace. So, I did that and I traced the bug to this
> line:
>
> c0Val = _mm256_mul_ps(a0Val, b0Val);
>
> I can't dump the values in a0Val or b0Val, though, because they're
> intermediate values that are optimized away by the optimized kernel code. I
> tried stepping through the assembler instructions but I'm not familiar with
> the various sse and avx extensions. Heck, I'm not even familiar with the
> x86_64 instruction set. So I have a huge learning curve ahead of me, there.
> Is it possible to just dump the values in these __m256 data types to a file
> so I can debug it that way? If that's not easy to do, then I'm willing to
> learn what I have to about the instruction set so I can debug this thing.
> But I would sure appreciate some help if anyone has some advice to offer.
>
> Software version:
> I rebased to the latest version of the next branch last night before I went
> to bed at around 1:30 am CDT.
>
> Operating System:
> address@hidden:~/src/gnuradio/volk/kernels/volk$ uname -a
> Linux octs2 3.11.0-17-generic #31-Ubuntu SMP Mon Feb 3 21:52:43 UTC 2014
> x86_64 x86_64 x86_64 GNU/Linux
> It's Ubuntu 13.10
>
> Hardware: ASUS X750J
> Intel Quad Core i7 4700HQ 2.4GHz
>
> cpuinfo:
> processor : 7
> vendor_id : GenuineIntel
> cpu family : 6
> model : 60
> model name : Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
> stepping : 3
> microcode : 0x8
> cpu MHz : 2401.000
> cache size : 6144 KB
> physical id : 0
> siblings : 8
> core id : 3
> cpu cores : 4
> apicid : 7
> initial apicid : 7
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est
> tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
> tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb
> xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase
> tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
> bogomips : 4789.27
> clflush size : 64
> cache_alignment : 64
> address sizes : 39 bits physical, 48 bits virtual
> power management:
>
Hi Kelly,
First, this is great debugging. Thanks for gathering so much info and
going after a fix on your own.
On to the good stuff. I was able to reproduce this on my i7-4700MQ.
Here's some additional info for the logs:
* constellation_receiver is a hier block with a fir_filter_ccf inside
that calls the VOLK AVX dot product.
* The AVX dot product proto-kernel passes VOLK QA.
* qa_fir_filter.py tests a fir_filter_ccf, and that test passes.
* Just for kicks, I forced VOLK to use the generic kernel and I still
see the segfault.
A few things I'd like to try (please feel free to try them too):
* Go back to a commit just before fir_filter.cc started using
volk_malloc and volk_free. (For bonus points, go back to some point
in time when this test always passes and do a git bisect.)
* Fiddle with the parameters of the test: data length, number of taps
in the filter, etc.
* Test on different processors, although I doubt the result would
change. It would be pretty wild if there was something off in the 4700
line, but it is suspicious that the generic proto-kernel gives the same
result and that nobody else has reported this yet. My guess is that GCC
is actually emitting *very* similar code for the generic and AVX dot
product proto-kernels.
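Since the crash is intermittent, a bisect needs a wrapper that reruns the test several times per commit. Below is a minimal Python sketch of one, under stated assumptions: the file name crash_probe.py and both function names are hypothetical and not part of GNU Radio, and a run only counts as a crash if the process is killed by a signal (an ordinary test failure is ignored). The mapping to exit codes matters because `git bisect run` treats exit codes 1 to 127 (except 125) as "bad" and aborts on anything higher, so a raw segfault, which a shell reports as 139, would stop the bisect instead of marking the commit bad.

```python
"""Rerun a flaky command and report whether any run was killed by a signal.

Hypothetical helper for illustration; not part of GNU Radio.
"""
import subprocess
import sys


def count_runs_until_crash(cmd, max_runs):
    """Run cmd (a list of arguments) up to max_runs times.

    Returns (run_number, signal_number) for the first run that was killed
    by a signal, or None if every run exited on its own.
    """
    for run in range(1, max_runs + 1):
        returncode = subprocess.call(cmd)
        # On POSIX, subprocess reports death by signal N as return code -N.
        if returncode < 0:
            return run, -returncode
    return None


def bisect_exit_code(crash):
    """Map a crash result to the codes `git bisect run` understands:
    0 marks the commit good, 1 marks it bad."""
    return 0 if crash is None else 1


# Command-line use: crash_probe.py MAX_RUNS COMMAND [ARGS...]
if __name__ == "__main__" and len(sys.argv) >= 3:
    crash = count_runs_until_crash(sys.argv[2:], int(sys.argv[1]))
    if crash is not None:
        print("run %d was killed by signal %d" % crash)
    sys.exit(bisect_exit_code(crash))
```

Assuming the script is saved as crash_probe.py, something like `git bisect run python crash_probe.py 50 COMMAND` would apply it to each commit, where COMMAND is whatever rebuilds and runs the failing QA test. The run count of 50 is arbitrary. A commit that crashes only rarely can still be marked good by mistake, so a larger count gives a more trustworthy bisect at the cost of time.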
Nathan