This seems like a pretty complicated problem to tackle for all scenarios. However, I think it wouldn't be too hard to handle the majority of cases.
The perceived volume of an instrument depends on at least the following factors:
* The number of simultaneously synthesized samples for a given note event
* The amplitude of the sample data
* The velocity of the note on event
* The attenuation at the instrument level (local or global zone)
* The attenuation at the preset level (local or global zone)
* Low pass filter settings
* Reverb level
* Chorus level
* Panning (left/right volume)
* Volume envelope
* Modulation envelope to filter cutoff
As you can see, there is a lot that affects the volume which isn't volume specific. What's more, many of these are likely to vary in the effect they have on volume from one SoundFont synthesizer to the next.
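To make the first few factors concrete, here is a minimal sketch of how instrument- and preset-level attenuation and note velocity might combine into a single linear gain. The function names are mine; the 0.1 dB centibel unit follows the SoundFont 2 spec, and the squared velocity curve is only a rough stand-in for the spec's default concave velocity-to-attenuation modulator, which real synthesizers implement with varying fidelity.

```python
import math

def attenuation_to_gain(centibels: float) -> float:
    """Convert a SoundFont attenuation value (centibels, 0.1 dB each)
    to a linear amplitude factor."""
    return 10.0 ** (-centibels / 200.0)

def velocity_gain(velocity: int) -> float:
    """Rough approximation of the default concave velocity curve;
    synthesizers differ in the exact shape they use."""
    v = max(1, min(127, velocity)) / 127.0
    return v * v

def combined_gain(inst_atten_cb: float, preset_atten_cb: float,
                  velocity: int) -> float:
    """Instrument- and preset-level attenuations are additive in the
    SoundFont model, so sum them before converting to linear gain."""
    return attenuation_to_gain(inst_atten_cb + preset_atten_cb) \
        * velocity_gain(velocity)
```

Even this simplified model shows why the problem compounds: every remaining factor (filter, envelopes, effects sends) multiplies into the same chain.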
I would consider several possible solutions and decide which one best fits your particular requirements. The flexibility of SoundFont files provides a lot of ways to do things, which could affect the volume in uncommon ways (for example, mucking about with modulators).
1. Simple volume calculation
- Calculate perceived instrument zone volumes from the sample data and loop parameters
- Adjust volume attenuation on individual instrument zones to normalize note volume across the instrument
- Possibly adjust preset level volume attenuation to adjust for other factors which affect the entire instrument
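As a sketch of solution #1, the following computes per-zone RMS levels from raw sample data and derives the attenuation offsets (in centibels) that would bring every zone down to the quietest one. This deliberately ignores loop parameters, envelopes, and everything else in the factor list above; the function names and the dict-based input are mine, not part of any library API.

```python
import math

def rms_db(samples) -> float:
    """RMS level of normalized sample data (floats in [-1, 1]), in dB."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))

def attenuation_offsets(zone_samples) -> dict:
    """Per-zone attenuation adjustments (centibels) that normalize all
    zones to the level of the quietest one. zone_samples maps a zone
    name to its sample data."""
    levels = {name: rms_db(s) for name, s in zone_samples.items()}
    quietest = min(levels.values())
    # Attenuation is positive-down in SoundFont, so louder zones get
    # a larger added attenuation. 1 dB = 10 centibels.
    return {name: round((db - quietest) * 10)
            for name, db in levels.items()}
```

The resulting offsets would then be added to each zone's existing attenuation generator (via libInstPatch or whatever editing tool you use).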
2. Adjust by calculated volume
- A more in-depth version of #1 for calculating perceived output volume. Factor in other parameters (sample layering, low-pass filter, reverb, chorus, etc.).
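Two of those extra factors are straightforward to sketch: layered zones can add coherently (so a worst-case estimate sums their linear gains), and panning splits energy between channels. The constant-power pan law below is an assumption on my part; some synthesizers use a linear law instead. The SoundFont pan generator ranges from -500 (full left) to +500 (full right).

```python
import math

def pan_gains(pan: float):
    """Constant-power stereo gains for a SoundFont pan generator value
    (-500 = full left, +500 = full right). The constant-power law is an
    assumption; a given synthesizer may pan linearly instead."""
    pos = (pan + 500.0) / 1000.0  # normalize to 0.0 .. 1.0
    theta = pos * math.pi / 2.0
    return math.cos(theta), math.sin(theta)

def layered_peak_estimate(layer_gains) -> float:
    """Worst-case level when several zones sound for one note event:
    amplitudes can add coherently, so sum the linear gains."""
    return sum(layer_gains)
```

Modeling the low-pass filter's effect on perceived loudness is much harder, since it depends on the spectral content of each sample; that is where solution #3 starts to look attractive.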
3. Adjust by synthesized output volume
- Utilize FluidSynth in an automated fashion to synthesize note events, measure the output, and set attenuation accordingly.
If I were tasked with doing this, I'd likely use libInstPatch and its Python binding. For #3 it might be difficult to utilize FluidSynth from Python, so that might be best tackled with C code using libInstPatch's native API.
I would probably start with something like #1 and see how well it does in practice. Then incrementally add additional factors into the perceived volume calculations, until a generally satisfactory result is found.
However, solution #3 would remove the need to calculate perceived output volume from all the SoundFont parameters, measuring the output directly from the resulting synthesis instead. This could be done faster than realtime. It may actually end up being the easiest route in some ways, though a lot of FluidSynth-specific volume behavior would come into play, which may not correlate well with other SoundFont synthesizers.
Hopefully you have found these thoughts useful.