I've been working on a vectorized version of nlfilter (which does non-linear filtering over pixel neighborhoods). My version, nlfilter2, is attached along with a driver. Depending on the type of processing, I've found it to be 10 to 1000 times faster than nlfilter. The speed boost comes at the cost of using more memory.
It is not completely compatible with nlfilter, because the processing function is different. Suppose you process an image/matrix with P elements over an m-by-n neighborhood. With nlfilter, the processing function must accept an argument of size m-by-n, and nlfilter calls it P times. With nlfilter2, the processing function must accept an argument with m*n rows and P columns, and it is called only once. Each column holds the neighborhood of one pixel, with the pixels taken in left-to-right, top-to-bottom order.
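To make the difference between the two calling conventions concrete, here is a minimal sketch in plain Python rather than MATLAB. It is not the attached nlfilter2 code. The function names are made up for illustration, and it assumes row-by-row pixel order, row-by-row element order within each column, and a neighborhood centre at offset ((m-1)//2, (n-1)//2). It uses zero-padding, as described above.

```python
def padded_pixel(image, r, c):
    """Return image[r][c], or 0 outside the borders (zero-padding)."""
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return 0


def neighborhoods_as_columns(image, m, n):
    """Build a matrix with m*n rows and P columns (hypothetical helper).

    Column p holds the m-by-n neighborhood of pixel p, with pixels
    scanned row by row (an assumed reading of the ordering).
    """
    top, left = (m - 1) // 2, (n - 1) // 2  # assumed centre offset
    per_pixel = [
        [padded_pixel(image, r + dr - top, c + dc - left)
         for dr in range(m) for dc in range(n)]
        for r in range(len(image)) for c in range(len(image[0]))
    ]
    # Transpose so that each neighborhood becomes a column.
    return [list(row) for row in zip(*per_pixel)]


def filter_per_pixel(image, m, n, fun):
    """nlfilter-style: fun gets one m-by-n neighborhood, called P times."""
    top, left = (m - 1) // 2, (n - 1) // 2
    return [
        [fun([[padded_pixel(image, r + dr - top, c + dc - left)
               for dc in range(n)] for dr in range(m)])
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def filter_all_at_once(image, m, n, fun):
    """nlfilter2-style: fun gets the whole (m*n)-by-P matrix, called once."""
    width = len(image[0])
    flat = fun(neighborhoods_as_columns(image, m, n))  # P results
    # Reshape the flat result back to the image shape.
    return [flat[i:i + width] for i in range(0, len(flat), width)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# The same filter (neighborhood maximum) written for each convention.
slow = filter_per_pixel(image, 3, 3, lambda nb: max(max(row) for row in nb))
fast = filter_all_at_once(image, 3, 3,
                          lambda mat: [max(col) for col in zip(*mat)])
```

Both calls give the same result. The column matrix holds m*n*P values at once, which is where the extra memory goes.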
nlfilter2 is not as robust or general as nlfilter, but I expect it can be made so. I also have a different version of this function that lets you perform neighborhood processing over selected pixels only (selected by a logical index matrix). It also allows other types of padding (both nlfilter and nlfilter2 use only zero-padding). Let me know if you are interested, though it is currently in even rougher form than nlfilter2.
Anyway, I hope someone finds this useful. I'd appreciate (constructive) feedback.
Tony