Re: [Discuss-gnuradio] Using volk in Mac: test report


From: Nick Foster
Subject: Re: [Discuss-gnuradio] Using volk in Mac: test report
Date: Fri, 17 Feb 2012 10:33:42 -0800

On Fri, Feb 17, 2012 at 8:14 AM, Tom Rondeau <address@hidden> wrote:
Carles,

Thanks for the report! We'll look into those failures. Hopefully it's just some minor misunderstanding.

As for the generic sometimes being the best arch, I'm not sure I can help too much on it. I can certainly speculate. Having seen this on my own machines and looked at some of the kernels where generic wins out (which have some overlap with yours), I think it's something about the operation being performed. First, we might be able to do something a bit smarter in the Volk kernel. But more likely, it's simply because the operation being performed is so trivial that the SIMD version has little to gain over the generic one.

Another reason could be that the tests aren't long enough to average out OS-level variance while a test runs. The tests use the clock() function to calculate the time difference, which only approximates the processor time used by the process. It might mean that we need to run the tests for a bit longer to see if that makes any difference. I have noticed that in some of the tests where generic wins, it only wins by a very, very small amount of time.
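To illustrate the clock() point, here is a minimal C++ sketch, not Volk's actual test code. It times a trivial kernel over many iterations and reports the average per call. The names multiply_generic, clock_tick_seconds and seconds_per_call are hypothetical and only serve the example. On POSIX systems CLOCKS_PER_SEC is one million, so a reading such as 5e-06s in the output below amounts to only a handful of clock ticks, and the underlying timer may be coarser still.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical stand-in for a trivial kernel: out[i] = a[i] * b[i].
void multiply_generic(float* out, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}

// Smallest time step that clock() can express, in seconds.
double clock_tick_seconds()
{
    return 1.0 / static_cast<double>(CLOCKS_PER_SEC);
}

// Average CPU seconds per kernel call, measured with clock() over
// `iterations` back-to-back calls on vectors of length n.
double seconds_per_call(std::size_t n, unsigned iterations)
{
    assert(n > 0 && iterations > 0);
    std::vector<float> a(n, 1.5f), b(n, 2.0f), out(n, 0.0f);
    volatile float sink = 0.0f; // keeps the compiler from discarding the work

    const std::clock_t start = std::clock();
    for (unsigned i = 0; i < iterations; ++i) {
        multiply_generic(out.data(), a.data(), b.data(), n);
        sink = sink + out[i % n];
    }
    const std::clock_t stop = std::clock();

    return static_cast<double>(stop - start) / CLOCKS_PER_SEC / iterations;
}
```

Calling seconds_per_call(20460, 1) gives a single-shot figure comparable to the QA output, while seconds_per_call(20460, 10000) spreads the measurement over enough ticks that a difference of one or two microseconds stops deciding the winner.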

Please ignore the "best arch" reports printed while the QA code runs; they are very often wrong. The "best arch" report is intended for volk_profile, which reuses the same test code with much larger datasets for better execution time resolution, as Tom suggested. The QA code is only intended to show that Volk is working and to find functions that execute incorrectly. Use volk_profile to benchmark Volk functions; it will create a custom profile for your machine.

One caveat -- the dataset size on E100/NEON is enough that the profiler might run for several hours, so either recompile with smaller datasets or avoid the profiler... eventually I guess I'll make the benchmark program benchmark itself to set appropriate dataset sizes.
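One way a benchmark could "benchmark itself" is sketched below in C++. This is an assumption about how such calibration might look, not something Volk does. The hypothetical helper calibrate_iterations doubles the number of kernel calls until a timed batch takes at least a target amount of CPU time, with a cap so slow hardware does not run for hours. It scales the iteration count rather than the dataset size, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <functional>

// Doubles the iteration count until one timed batch of kernel() calls
// consumes at least min_seconds of CPU time, or the cap is reached.
// Returns the iteration count to use for the real measurement.
std::size_t calibrate_iterations(const std::function<void()>& kernel,
                                 double min_seconds,
                                 std::size_t max_iterations)
{
    assert(min_seconds > 0.0 && max_iterations > 0);
    std::size_t iterations = 1;
    for (;;) {
        const std::clock_t start = std::clock();
        for (std::size_t i = 0; i < iterations; ++i) {
            kernel();
        }
        const double elapsed =
            static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;

        if (elapsed >= min_seconds || iterations >= max_iterations) {
            return iterations;
        }
        // Double, but never step past the cap.
        iterations = (iterations > max_iterations / 2) ? max_iterations
                                                       : iterations * 2;
    }
}
```

A fast desktop would settle on a large count and a slow embedded board on a small one, so both spend roughly the same wall time per kernel.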

--n
 

Tom

On Tue, Jan 17, 2012 at 3:26 PM, Carles Fernandez <address@hidden> wrote:
Hi all,

I would like to use the volk library in a C++ program that uses
gnuradio-core and currently builds under Linux and Mac OS X. In Mac OS X
10.6.8 (Snow Leopard, updated), I used macports for installing
gnuradio-core (which is in version 3.3, enough for my app). Since, in
my understanding (please correct me if I'm wrong), volk is a library
that can live independently from the gnuradio version, I did the
following:

$  git clone git://gnuradio.org/gnuradio
$  cd gnuradio/volk
$  cmake .
$  make
...
[100%] Built target volk_profile
$  sudo make install

Then I ran the tests:

$ lib/test_all

All tests but one passed, and I see that in some functions the generic
architecture is the best one, which is beyond my understanding. The
test that failed is:

...
volk_32fc_32f_multiply_32fc_a: fail on arch sse
Best arch: sse
/Users/carlesfernandez/Documents/workspace/gnuradio/volk/lib/testqa.cc:25:
error in "volk_32fc_32f_multiply_32fc_a_test": check
run_volk_tests(volk_32fc_32f_multiply_32fc_a_get_func_desc(), (void
(*)())volk_32fc_32f_multiply_32fc_a_manual,
std::string("volk_32fc_32f_multiply_32fc_a"), 1e-4, 0, 20460, 1, 0) ==
0 failed [true != 0]
...
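For readers unfamiliar with what such a check compares, the following C++ sketch shows the general idea: compute a plain reference result for a complex-by-real multiply and count the elements of another result that differ by more than a tolerance. It is not Volk's QA code. The names multiply_reference and count_mismatches are hypothetical, and the use of an absolute tolerance is an assumption made for simplicity; how the QA code applies the 1e-4 seen in the message above is not shown in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::vector<std::complex<float> > cvec;

// Reference result: each complex sample multiplied by a real factor.
cvec multiply_reference(const cvec& in, const std::vector<float>& scale)
{
    assert(in.size() == scale.size());
    cvec out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * scale[i];
    }
    return out;
}

// Number of elements whose real or imaginary parts differ by more than
// tol (absolute difference; an assumption for this sketch).
std::size_t count_mismatches(const cvec& a, const cvec& b, float tol)
{
    assert(a.size() == b.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i].real() - b[i].real()) > tol ||
            std::fabs(a[i].imag() - b[i].imag()) > tol) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A test in this style passes when count_mismatches returns zero for every architecture's output compared against the reference.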


I'm quite happy because I see dramatic improvements in some functions
of interest to me (basically I want to implement correlators and mixers,
so I'm sensitive to precisely this function, bad luck), but this
"generic" superiority in some cases intrigues me. I would appreciate
if anyone can shed some light on the internals of volk, or if I have
to configure or install something else. Anyway, thanks to the
developers for releasing such interesting stuff :-)




This is the complete output, for the record:


volk carlesfernandez$ cmake .
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc
-- Check for working C compiler: /usr/local/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found PythonInterp: /opt/local/bin/python (found version "2.6.7")
-- Boost version: 1.48.0
-- Found the following Boost libraries:
--   unit_test_framework
-- checking for module 'orc-0.4'
--   package 'orc-0.4' not found
-- orc files (missing:  ORC_LIBRARY ORC_INCLUDE_DIR ORCC_EXECUTABLE)
-- Check size of void*
-- Check size of void* - done
-- Performing Test have_maltivec
-- Performing Test have_maltivec - Failed
-- Performing Test have_mfpu=neon
-- Performing Test have_mfpu=neon - Failed
-- Performing Test have_mfloat-abi=softfp
-- Performing Test have_mfloat-abi=softfp - Failed
-- Performing Test have_funsafe-math-optimizations
-- Performing Test have_funsafe-math-optimizations - Success
-- 32 overruled
-- Performing Test have_m64
-- Performing Test have_m64 - Success
-- Performing Test have_m3dnow
-- Performing Test have_m3dnow - Success
-- Performing Test have_msse4.2
-- Performing Test have_msse4.2 - Success
-- Performing Test have_mpopcnt
-- Performing Test have_mpopcnt - Failed
-- Performing Test have_mmmx
-- Performing Test have_mmmx - Success
-- Performing Test have_msse
-- Performing Test have_msse - Success
-- Performing Test have_msse2
-- Performing Test have_msse2 - Success
-- orc overruled
-- Performing Test have_msse3
-- Performing Test have_msse3 - Success
-- Performing Test have_mssse3
-- Performing Test have_mssse3 - Success
-- Performing Test have_msse4a
-- Performing Test have_msse4a - Success
-- Performing Test have_msse4.1
-- Performing Test have_msse4.1 - Success
-- Performing Test have_mavx
-- Performing Test have_mavx - Failed
-- Available arches:
generic;64;3dnow;abm;mmx;sse;sse2;sse3;ssse3;sse4_a;sse4_1;sse4_2
-- Available machines: generic;sse2_only;sse2_64;sse3_64;ssse3_64;sse4_1_64
-- Did not find liborc and orcc, disabling orc support...
-- Using install prefix: /usr/local
-- Configuring done
-- Generating done


Tests output:



Running 77 test cases...
Using Volk machine: sse4_1_64
RUN_VOLK_TESTS: volk_16ic_s32f_deinterleave_real_32f_a
sse4_1 completed in 1.5e-05s
sse completed in 5.5e-05s
generic completed in 1.4e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16ic_deinterleave_real_8i_a
ssse3 completed in 7e-06s
generic completed in 8e-06s
Best arch: ssse3
RUN_VOLK_TESTS: volk_16ic_deinterleave_16i_x2_a
ssse3 completed in 1.7e-05s
sse2 completed in 1.1e-05s
generic completed in 2.1e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_16ic_s32f_deinterleave_32f_x2_a
sse completed in 7.4e-05s
generic completed in 2.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16ic_deinterleave_real_16i_a
ssse3 completed in 6e-06s
sse2 completed in 8e-06s
generic completed in 9e-06s
Best arch: ssse3
RUN_VOLK_TESTS: volk_16ic_magnitude_16i_a
sse3 completed in 0.000132s
sse completed in 0.00015s
generic completed in 0.000218s
Best arch: sse3
RUN_VOLK_TESTS: volk_16ic_s32f_magnitude_32f_a
sse3 completed in 0.000113s
sse completed in 0.000107s
generic completed in 2.7e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16i_s32f_convert_32f_a
sse4_1 completed in 1.2e-05s
sse completed in 2e-05s
generic completed in 1.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16i_s32f_convert_32f_u
sse4_1 completed in 1.2e-05s
sse completed in 2.1e-05s
generic completed in 1.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16i_convert_8i_a
sse2 completed in 4e-06s
generic completed in 6e-06s
Best arch: sse2
RUN_VOLK_TESTS: volk_16i_convert_8i_u
sse2 completed in 6e-06s
generic completed in 6e-06s
Best arch: sse2
RUN_VOLK_TESTS: volk_16u_byteswap_a
sse2 completed in 6e-06s
generic completed in 1.5e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_accumulator_s32f_a
sse completed in 2.5e-05s
generic completed in 2.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_x2_add_32f_a
sse completed in 1.9e-05s
generic completed in 2.4e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a
sse completed in 5.5e-05s
generic completed in 7.2e-05s
offset 4 in1: 0.387495 in2: 0.103868
offset 6 in1: 0.201248 in2: -0.203787
offset 8 in1: 0.549574 in2: 0.499452
offset 12 in1: 0.00829957 in2: 0.00535752
offset 14 in1: 0.139478 in2: 0.0225341
offset 23 in1: 0.440276 in2: 0.620457
offset 24 in1: 0.103921 in2: 0.238003
offset 25 in1: 0.126775 in2: 0.290342
offset 29 in1: 0.135211 in2: -0.115313
offset 30 in1: 0.375913 in2: 0.478058
volk_32fc_32f_multiply_32fc_a: fail on arch sse
Best arch: sse
/Users/carlesfernandez/Documents/workspace/gnuradio/volk/lib/testqa.cc:25:
error in "volk_32fc_32f_multiply_32fc_a_test": check
run_volk_tests(volk_32fc_32f_multiply_32fc_a_get_func_desc(), (void
(*)())volk_32fc_32f_multiply_32fc_a_manual,
std::string("volk_32fc_32f_multiply_32fc_a"), 1e-4, 0, 20460, 1, 0) ==
0 failed [true != 0]
RUN_VOLK_TESTS: volk_32fc_s32f_power_32fc_a
sse completed in 0.000989s
generic completed in 0.000985s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_calc_spectral_noise_floor_32f_a
sse completed in 1.8e-05s
generic completed in 4.2e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_s32f_atan2_32f_a
sse4_1 completed in 0.000503s
sse completed in 0.000503s
generic completed in 0.000503s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_32fc_x2_conjugate_dot_prod_32fc_u
generic completed in 1.6e-05s
sse3 completed in 1.5e-05s
Best arch: sse3
RUN_VOLK_TESTS: volk_32fc_deinterleave_32f_x2_a
sse completed in 1.8e-05s
generic completed in 2.3e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_deinterleave_64f_x2_a
sse2 completed in 4.4e-05s
generic completed in 3.8e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32fc_s32f_deinterleave_real_16i_a
sse completed in 2.7e-05s
generic completed in 2e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32fc_deinterleave_real_32f_a
sse completed in 1.1e-05s
generic completed in 1.5e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_deinterleave_real_64f_a
sse2 completed in 1.5e-05s
generic completed in 1.9e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc_a
generic completed in 8.8e-05s
sse_64 completed in 2e-05s
sse3 completed in 2.5e-05s
sse4_1 completed in 2.6e-05s
Best arch: sse_64
RUN_VOLK_TESTS: volk_32fc_index_max_16u_a
sse3 completed in 5e-06s
generic completed in 1e-05s
Best arch: sse3
RUN_VOLK_TESTS: volk_32fc_s32f_magnitude_16i_a
sse3 completed in 3.3e-05s
sse completed in 3.1e-05s
generic completed in 8.1e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_magnitude_32f_a
sse3 completed in 2.2e-05s
sse completed in 2.1e-05s
generic completed in 2.2e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_x2_multiply_32fc_a
sse3 completed in 2.4e-05s
generic completed in 0.000201s
Best arch: sse3
RUN_VOLK_TESTS: volk_32f_s32f_convert_16i_a
sse2 completed in 7e-06s
sse completed in 2.3e-05s
generic completed in 1.9e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_s32f_convert_16i_u
sse2 completed in 1e-05s
sse completed in 2.3e-05s
generic completed in 1.8e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_s32f_convert_32i_a
sse2 completed in 8e-06s
sse completed in 2e-05s
generic completed in 1.4e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_s32f_convert_32i_u
sse2 completed in 1.5e-05s
sse completed in 2.3e-05s
generic completed in 1.5e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_convert_64f_a
sse2 completed in 1.4e-05s
generic completed in 1.6e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_convert_64f_u
sse2 completed in 2.1e-05s
generic completed in 1.6e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_convert_8i_a
sse2 completed in 7e-06s
sse completed in 2.1e-05s
generic completed in 2e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_s32f_convert_8i_u
sse2 completed in 9e-06s
sse completed in 2.5e-05s
generic completed in 2e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32fc_s32f_power_spectrum_32f_a
sse3 completed in 1.8e-05s
generic completed in 1.5e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32fc_x2_square_dist_32f_a
sse3 completed in 3e-06s
generic completed in 4e-06s
Best arch: sse3
RUN_VOLK_TESTS: volk_32fc_x2_s32f_square_dist_scalar_mult_32f_a
sse3 completed in 6e-06s
generic completed in 6e-06s
Best arch: sse3
RUN_VOLK_TESTS: volk_32f_x2_divide_32f_a
sse completed in 2.3e-05s
generic completed in 2.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_x2_dot_prod_32f_a
generic completed in 0.000351s
sse completed in 0.000112s
sse3 completed in 0.000121s
sse4_1 completed in 7.5e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_32f_x2_dot_prod_32f_u
generic completed in 0.000942s
sse completed in 0.000477s
sse3 completed in 0.000267s
sse4_1 completed in 0.000395s
Best arch: sse3
RUN_VOLK_TESTS: volk_32f_index_max_16u_a
sse4_1 completed in 1.6e-05s
sse completed in 2e-05s
generic completed in 7e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_32f_x2_s32f_interleave_16ic_a
sse2 completed in 1.2e-05s
sse completed in 3.6e-05s
generic completed in 2.7e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_x2_interleave_32fc_a
sse completed in 1.4e-05s
generic completed in 1.9e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x2_max_32f_a
sse completed in 1.1e-05s
generic completed in 1.8e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x2_min_32f_a
sse completed in 1.8e-05s
generic completed in 2e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x2_multiply_32f_a
sse completed in 1.4e-05s
generic completed in 1.3e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_normalize_a
sse completed in 6e-06s
generic completed in 5e-06s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_power_32f_a
sse4_1 completed in 0.000523s
sse completed in 0.000521s
generic completed in 0.000521s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_sqrt_32f_a
sse completed in 2.5e-05s
generic completed in 2.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_stddev_32f_a
sse4_1 completed in 8e-06s
sse completed in 6e-06s
generic completed in 2.2e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_stddev_and_mean_32f_x2_a
sse4_1 completed in 9e-06s
sse completed in 6e-06s
generic completed in 2.1e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x2_subtract_32f_a
sse completed in 1.2e-05s
generic completed in 1.3e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x3_sum_of_poly_32f_a
sse3 completed in 6e-06s
generic completed in 1.7e-05s
Best arch: sse3
RUN_VOLK_TESTS: volk_32i_x2_and_32i_a
sse completed in 1.2e-05s
generic completed in 1.4e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32i_s32f_convert_32f_a
sse2 completed in 7e-06s
generic completed in 1e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32i_s32f_convert_32f_u
sse2 completed in 1.1e-05s
generic completed in 1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32i_x2_or_32i_a
sse completed in 1.2e-05s
generic completed in 1.4e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32u_byteswap_a
sse2 completed in 1.3e-05s
generic completed in 2.2e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_64f_convert_32f_a
sse2 completed in 1.1e-05s
generic completed in 1.5e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_64f_convert_32f_u
sse2 completed in 1.9e-05s
generic completed in 1.6e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_64f_x2_max_64f_a
sse2 completed in 2.4e-05s
generic completed in 2.7e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_64f_x2_min_64f_a
sse2 completed in 2.2e-05s
generic completed in 2.5e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_64u_byteswap_a
sse2 completed in 2.7e-05s
generic completed in 2.9e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_8ic_deinterleave_16i_x2_a
sse4_1 completed in 9e-06s
generic completed in 0.000114s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_s32f_deinterleave_32f_x2_a
sse4_1 completed in 1.4e-05s
sse completed in 7.2e-05s
generic completed in 9.5e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_deinterleave_real_16i_a
sse4_1 completed in 5e-06s
generic completed in 3e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_s32f_deinterleave_real_32f_a
sse4_1 completed in 8e-06s
sse completed in 5.3e-05s
generic completed in 4.8e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_deinterleave_real_8i_a
ssse3 completed in 5e-06s
generic completed in 5e-06s
Best arch: ssse3
RUN_VOLK_TESTS: volk_8ic_x2_multiply_conjugate_16ic_a
sse4_1 completed in 1.9e-05s
generic completed in 0.000318s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_x2_s32f_multiply_conjugate_32fc_a
sse4_1 completed in 2.2e-05s
generic completed in 0.000356s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8i_convert_16i_a
sse4_1 completed in 5e-06s
generic completed in 3.3e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8i_convert_16i_u
sse4_1 completed in 6e-06s
generic completed in 3.3e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8i_s32f_convert_32f_a
sse4_1 completed in 7e-06s
generic completed in 4.8e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8i_s32f_convert_32f_u
sse4_1 completed in 1.3e-05s
generic completed in 4.9e-05s
Best arch: sse4_1

*** 1 failure detected in test suite "Master Test Suite"


Best regards,
Carles

_______________________________________________
Discuss-gnuradio mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

