I'm surprised no one responded to this. My group has had some experience with this.
We found that latency can be extremely large for packetized transmits, especially if the packets are small. Without modification, GNU Radio enforces a minimum buffer size regardless of the buffer size settings in individual blocks (or the top_block). This means packets build up in the output buffers of every block between source and sink. Even a solution that collects packets before the sink will still see large latency, because the upstream output buffers are again filled with packets before anything is sent out.
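To get a feel for the numbers, here is a back-of-the-envelope calculation. The buffer size and block count are illustrative assumptions (not GNU Radio's actual defaults); the 2 Msps rate matches the experiment described below.

```python
# Rough latency from filled output buffers between source and sink.
# buf_items and n_blocks are assumed values for illustration only.
samp_rate = 2e6    # samples per second (2 Msps)
buf_items = 8192   # assumed minimum output buffer size, in samples
n_blocks = 4       # assumed number of blocks between source and sink

per_block_latency = buf_items / samp_rate   # seconds queued per full buffer
total_latency = n_blocks * per_block_latency

print(f"per block: {per_block_latency * 1e3:.2f} ms, "
      f"total: {total_latency * 1e3:.2f} ms")
```

With these assumed numbers, each full buffer alone adds about 4 ms, and a handful of blocks puts you well above the 1 ms figure mentioned below, which is why shrinking the effective buffer size matters so much.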
What you might be able to do:
1. Modify the minimum buffer size. This is non-trivial: a correct implementation will keep the large minimum buffer for efficiency but manage a sliding window across it, where the window size is the desired (smaller) buffer size. This is probably the solution to #3.
2. Use PDUs everywhere, so that packets propagate at the rate they are generated and no queue builds up before the sink.
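The sliding-window idea in option 1 might look roughly like the sketch below. This is pure-Python bookkeeping to illustrate the concept, not actual GNU Radio scheduler code, and the class name is made up:

```python
class SlidingWindowBuffer:
    """Keep a large physical buffer (for scheduler efficiency) but expose
    only a small logical window, limiting how many samples can queue up.
    Illustrative sketch of option 1 above, not GNU Radio code."""

    def __init__(self, physical_size, window_size):
        assert window_size <= physical_size
        self.buf = [0.0] * physical_size
        self.window_size = window_size  # desired (small) effective buffer
        self.read_idx = 0
        self.write_idx = 0
        self.fill = 0                   # items currently queued

    def space_available(self):
        # The writer may only fill up to the window, not the whole buffer.
        return self.window_size - self.fill

    def write(self, items):
        n = min(len(items), self.space_available())
        for x in items[:n]:
            self.buf[self.write_idx] = x
            self.write_idx = (self.write_idx + 1) % len(self.buf)
        self.fill += n
        return n                        # items actually accepted

    def read(self, n):
        n = min(n, self.fill)
        out = []
        for _ in range(n):
            out.append(self.buf[self.read_idx])
            self.read_idx = (self.read_idx + 1) % len(self.buf)
        self.fill -= n
        return out
```

For example, a buffer built as `SlidingWindowBuffer(8192, 512)` keeps the 8192-item allocation but never lets more than 512 items sit queued, which caps the latency contribution of that block.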
We were able to get the latency down with option 1. I asked my colleague, who performed experiments with a USRP N210 on a moderately powerful laptop. The flowgraph was USRP source to USRP sink, and he correlated the signal going into and out of the USRP. With real-time scheduling, high thread priorities, and a small buffer size, he was able to get 1 ms in-to-out latency at a 2 Msps sample rate.
One other tip: the PC clock and the USRP clock are not in sync. If you want precisely timed transmissions, you need something monitoring the USRP's internal clock. You may be able to do this by querying the time periodically; I believe you can do it through the command interface by requesting a time tag on a USRP source stream. You can then keep time by counting samples from that tag.
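Keeping time by counting samples from a tag boils down to simple arithmetic. The sketch below assumes the tag carries the device time as whole seconds plus fractional seconds (UHD's rx_time convention); the helper name and numbers are made up for illustration:

```python
def usrp_time_at(sample_offset, tag_offset, tag_whole_secs, tag_frac_secs,
                 samp_rate):
    """Estimate the USRP's internal time at a given stream sample offset,
    using a time tag seen earlier in the stream. Sketch only."""
    tag_time = tag_whole_secs + tag_frac_secs
    return tag_time + (sample_offset - tag_offset) / samp_rate

# Suppose a tag at absolute sample 1_000_000 reported a device time of
# 5.25 s; at 2 Msps, sample 1_500_000 is 500k samples (0.25 s) later.
t = usrp_time_at(1_500_000, 1_000_000, 5, 0.25, 2e6)
print(t)  # 5.5
```

Note this only stays accurate between tags; re-anchoring on each new time tag avoids accumulating drift between the PC and USRP clocks.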
Good luck, I hope you make some progress and report it here. I for one would be very interested.
Jared.