First off, I definitely want to encourage investigations of this sort. Even though I have some thoughts similar to Sylvain's/Tom's about whether VOLK is the right place to do this, I want to encourage *trying* it, since you never know - we could be entirely wrong about whether or not this will work. The only way to know for sure is to try it.
That said, I do think there is a way *within* VOLK to deal with the issue of the input size (i.e. vector size) having a large impact on performance: namely, the custom dispatcher. This is a concept that exists in VOLK but has largely gone unnoticed, because by and large the default dispatcher does a good (or at least good-enough) job of selecting the proper proto-kernel. For off-loading concepts such as utilizing GPUs via OpenCL, a custom dispatcher *could* select the appropriate proto-kernel (including directing the OpenCL implementation to select a CPU- vs. GPU-based implementation, if multiple OpenCL implementations are available) on a per-'work()'-call basis from the GNU Radio scheduler. In other words, instead of relying on volk_profile to select the single best proto-kernel for all calls to that particular VOLK kernel, the dispatcher could use something more akin to FFTW 'wisdom', where different proto-kernels are called for different sizes of matrices/vectors (for example, the CPU SIMDized call instead of the OpenCL call for smaller input sizes).
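To make the 'wisdom' idea a bit more concrete, here is a rough sketch of size-based dispatch in Python. It is not VOLK code and does not use the VOLK dispatcher API: `multiply_cpu_simd`, `multiply_opencl_gpu`, `SizeAwareDispatcher`, and the 4096-point crossover are all made-up names and numbers for illustration, and both "proto-kernels" are plain-Python placeholders. The only point is that the choice is made per call from the input length, rather than once for all calls.

```python
import bisect


# Hypothetical stand-ins for two proto-kernels of one kernel.
# Neither name is a real VOLK symbol.
def multiply_cpu_simd(a, b):
    return [x * y for x, y in zip(a, b)]


def multiply_opencl_gpu(a, b):
    # Placeholder: a real version would hand the buffers to an OpenCL device.
    return [x * y for x, y in zip(a, b)]


class SizeAwareDispatcher:
    """Pick a proto-kernel on every call from a size-indexed 'wisdom' table."""

    def __init__(self, wisdom):
        # wisdom: iterable of (minimum_length, proto_kernel) pairs.
        entries = sorted(wisdom, key=lambda entry: entry[0])
        if not entries or entries[0][0] != 0:
            raise ValueError("wisdom needs an entry with minimum_length 0")
        self._thresholds = [entry[0] for entry in entries]
        self._kernels = [entry[1] for entry in entries]

    def select(self, num_points):
        # Find the last entry whose minimum_length is <= num_points.
        index = bisect.bisect_right(self._thresholds, num_points) - 1
        return self._kernels[index]

    def __call__(self, a, b):
        return self.select(len(a))(a, b)


# Assumed crossover point; in practice it would come from profiling
# each proto-kernel over a range of input sizes.
dispatch_multiply = SizeAwareDispatcher(
    [(0, multiply_cpu_simd), (4096, multiply_opencl_gpu)]
)
```

With this table, a short vector goes to the CPU path and anything of 4096 points or more goes to the (placeholder) OpenCL path. A profiling tool could fill in the table per machine, in the same spirit as volk_profile but recording one winner per size range instead of one winner overall.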
Anyway, I definitely think this is something that should be looked into more. If you are interested in pursuing this, either as a GSoC project or otherwise, I would definitely encourage it, and I'm happy to offer assistance/advice where I can.