Thanks, Kieran, it is very much a relief. And thanks to everyone who helped; the discussion got us looking in the right places instead of the wrong ones.
In our porting layer we were doing nothing (except printing a debug message that we never saw anywhere) about a mailbox overflow in sys_mbox_post.
Pbufs were getting dropped right there during burst periods. I believe that when the OOSEQ processing got the packet it was waiting for, it unloaded all the queued-up data onto the mailbox. That's my theory, anyway; it could be a different burst scenario. Either way, our porting layer hurt us.
A better solution might be to have sys_mbox_post return a status that the caller can use to prevent such a resource leak. (I would think sys_mbox_post can't do it itself, since it's unaware of what type of resource is being put in the mailbox.) The caller could then do something more intelligent with the knowledge of a failure: free the resource, set an error code that filters back to the user, or increment a stat that indicates data loss. There are a bunch of possibilities.
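
To make the idea concrete, here is a rough sketch of what I mean. It is not our port and not the lwIP API: every name in it (demo_mbox, demo_mbox_trypost, demo_deliver, demo_stats) is made up for illustration, the mailbox is a plain ring buffer with no locking, and free() stands in for whatever release call the real resource needs (pbuf_free for a pbuf, for example).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MBOX_SIZE 4  /* hypothetical capacity, chosen only for the sketch */

/* Hypothetical fixed-size mailbox; stands in for the port's real one. */
struct demo_mbox {
    void  *slots[DEMO_MBOX_SIZE];
    size_t head;   /* index of the oldest message */
    size_t count;  /* number of messages currently queued */
};

/* Hypothetical counter for data lost to mailbox overflow. */
struct demo_stats {
    unsigned long mbox_drops;
};

/* Non-blocking post. Returns 0 on success, -1 if the mailbox is full.
 * The mailbox never frees msg: it does not know what kind of resource it is. */
static int demo_mbox_trypost(struct demo_mbox *mbox, void *msg)
{
    assert(mbox != NULL);
    if (mbox->count == DEMO_MBOX_SIZE) {
        return -1;
    }
    mbox->slots[(mbox->head + mbox->count) % DEMO_MBOX_SIZE] = msg;
    mbox->count++;
    return 0;
}

/* Fetch the oldest message, or NULL if the mailbox is empty. */
static void *demo_mbox_fetch(struct demo_mbox *mbox)
{
    void *msg;

    assert(mbox != NULL);
    if (mbox->count == 0) {
        return NULL;
    }
    msg = mbox->slots[mbox->head];
    mbox->head = (mbox->head + 1) % DEMO_MBOX_SIZE;
    mbox->count--;
    return msg;
}

/* Caller side: the caller owns buf until the post succeeds, so on failure
 * it releases the buffer, counts the drop, and reports the error upward. */
static int demo_deliver(struct demo_mbox *mbox, struct demo_stats *stats, void *buf)
{
    if (demo_mbox_trypost(mbox, buf) != 0) {
        free(buf);            /* stand-in for the resource's real release call */
        stats->mbox_drops++;  /* makes the data loss visible */
        return -1;
    }
    return 0;
}
```

The point of the split is that the mailbox only reports "full", while the code that knows what the message is decides how to clean up. A real port would also need the usual protection around the queue, which I left out here.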