Given that floats are faster on ARM devices (and probably on older x86 devices as well), I think it makes sense to switch back to single-precision floats by default.
I don't really care too much either way. Double precision needed real-world testing in the past, and by now I'm confident enough in it. We may want to have the build system suggest different optimal defaults for each architecture instead of a single default for all. This is non-trivial because of issues like cross-compiling, which is common in ARM development.
I'm trying to verify this theory.
Automatically flushing denormals to zero doesn't prevent FP operations from being slower when such numbers are involved. The problem is the higher CPU usage when FP exceptions occur, which happens on all CPUs except modern Intel processors.
I'm not sure which part you don't buy: that there are still some problems with denormals or other FP exceptions in the DSP code, or that denormals are a problem on most architectures? The former is pretty easy to check with the FPE check build option I suggested. The latter is pretty well documented on the net, I think.
Maybe it's something else, like cache misses or so?
Sure, there may be several additional factors. In my opinion, a good strategy is to proceed one step at a time.