Hello Yang,
it is true that each block gets its own thread in GNU Radio, but your CPU resources are still limited. What seems to be happening is that one of your paths demands a lot of CPU time and, when it gets it, leaves the other path without enough to perform its tasks. You will have to identify the critical paths of your signal processing and optimize them so that there are enough resources for both TX and RX to run simultaneously.
An easy first step to find where most of the load is generated is to use a tool such as "htop", which can list each thread (one is created per block) along with the percentage of CPU it is using; a high CPU percentage might be a call for optimization.
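If you prefer to log the per-thread numbers rather than watch them interactively, the same information can be read from /proc on Linux. The following is only a minimal sketch, not part of GNU Radio or UHD: the function names are made up for illustration, and it assumes a Linux host and that the threads of your flowgraph carry recognizable names.

```python
from pathlib import Path


def parse_stat_line(line):
    """Return (tid, name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name is wrapped in parentheses and may contain spaces,
    # so split around the first "(" and the last ")".
    left = line.index("(")
    right = line.rindex(")")
    tid = int(line[:left])
    name = line[left + 1:right]
    fields = line[right + 1:].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, i.e. indices 11 and 12 here.
    utime, stime = int(fields[11]), int(fields[12])
    return tid, name, utime + stime


def thread_cpu_ticks(pid, proc_root="/proc"):
    """List (tid, name, cpu_ticks) for every thread of pid, busiest first."""
    rows = []
    for stat in Path(proc_root, str(pid), "task").glob("*/stat"):
        try:
            rows.append(parse_stat_line(stat.read_text()))
        except (OSError, ValueError, IndexError):
            continue  # the thread exited while we were reading it
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Calling thread_cpu_ticks() with the PID of the running flowgraph twice, a few seconds apart, and comparing the tick counts shows which threads accumulated the most CPU time in between. Dividing ticks by os.sysconf("SC_CLK_TCK") converts them to seconds.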
Assuming that you are writing your own blocks in this application, there are multiple ways to further analyze and optimize the performance of your processes. Maybe some of the functions that you are using can be optimized, or replaced with VOLK [1]. You could also turn on the performance counters [2] and use the information to profile your blocks sample-wise. In addition, performance profiling tools such as "perf" or "valgrind" might give you good insight into where most of the processing is being done in your application.
Finally, you can tweak your system a little in order to get more performance from your host machine. Please have a look at the system configuration page [3] of our manual, enable priority scheduling, and set your CPU governors to "performance" (in case you haven't done that yet). This might also boost the performance and reduce the problems that you are encountering. The rest of that configuration page [3] might come in handy for further tweaking.
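To quickly check whether the governors are already set as suggested, something like the sketch below can help. It only reads the standard Linux sysfs cpufreq files and changes nothing; the function names are hypothetical, and it assumes your kernel exposes cpufreq under /sys/devices/system/cpu.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU name (e.g. "cpu0") to its current scaling governor."""
    governors = {}
    for path in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        try:
            # path is <cpu_root>/cpuN/cpufreq/scaling_governor
            governors[path.parent.parent.name] = path.read_text().strip()
        except OSError:
            continue  # CPU went offline or the file is not readable
    return governors


def not_in_performance(governors):
    """Return the CPUs whose governor is anything other than "performance"."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")
```

If not_in_performance(read_governors()) returns an empty list, all cores that expose cpufreq are already on the "performance" governor. An empty result from read_governors() itself means no cpufreq information was found, so it says nothing either way.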
Regards,
-Nicolas