[...] with the interactive play - this is an
area we did not touch on before - a compromise could be achieved (again
using the musical onset markers) by delaying each interactively played
note/sample just enough to make any and all notes played to be delayed
by the same consistent amount [...]
I doubt that this idea would be very popular for real-time control... :-) People have put enormous effort into bringing sound output latency down to low and acceptable levels.
> And that for a single note-on, many different samples (with different
> attack phases) could be started simultaneously.
It would be a meaningful musical discussion to hash out why this needs
to be done at all, and whether in those cases the sound resulting from
the ensemble of those samples played together ends up having a musical
note onset moment instead of the individual samples:
In SoundFonts (and many other sample bank formats), the sound you hear through the speakers after a note-on event can be (and often is) the combination of many different samples. Those samples are mixed together depending on values passed through MIDI messages. How those samples are mixed together can depend on the note value, the note-on velocity, the current time (oscillators) and many other MIDI messages. All of these parameters can affect many different aspects of this mixing process: static volume, pitch, note-on delay, length of the attack phase, sample offset, speed of oscillators, to name just a few. And before and/or after they are mixed together, the resulting sound can then pass through different filters that shape the sound even more. And the reason for this complicated and expressive system is exactly the point you raise below:
if one tries to
reproduce the sound of a preexisting musical instrument, the sound of
that natural instrument when played has a musically meaningful onset -
so by extension, reproducing the sound of that instrument should have
conceptually the same onset, even if for whatever reason the
reproduction of the sound of the instrument ends up getting actually
constructed from multiple "samples" played in parallel [...]
Yes, this is exactly it. But one important point you don't mention here is: most musical instruments do not have fixed sound characteristics. Their sound and - most importantly for this discussion - their attack phase and the shape and length of their onset transients depend on how you play the instrument. And your playing style and many other aspects also affect whether, and how clearly, there is a border between the transient phase and the "musically meaningful sound". I'm sure I could have long debates with fellow musicians about when that "musically meaningful sound" of a particular instrument actually starts.
So in my opinion, your initial premise for this discussion - that there is one or a limited number of musically meaningful note-onset durations, measured in sample offsets, that could easily be compensated - is flawed. For very simple cases and a very narrow musical style it might be OK. But I can't imagine a general system we could implement that achieves this "just-do-the-musically-meaningful-correct-thing" effect you are after when it is applied to the wide range of sounds and music styles that SoundFonts, MuseScore and similar software are used to create today.
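To make that a bit more concrete, here is a rough Python sketch of what I mean. It is not how FluidSynth or the SF2 format actually work internally: the `Zone` class, the `soft_extra_s` "modulator" and all the numbers are invented for illustration, and real SoundFonts have far more parameters (key and velocity ranges, envelope delay and attack, modulators, filters and so on). It only shows that one note-on can start several samples, and that the time until each one reaches its peak depends on how hard the note is played.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical, heavily simplified stand-in for one SoundFont zone."""
    name: str
    key_lo: int
    key_hi: int
    vel_lo: int
    vel_hi: int
    delay_s: float       # envelope delay before the attack starts
    attack_s: float      # attack time at full velocity (127)
    soft_extra_s: float  # made-up modulator: extra attack time at velocity 0


def active_zones(zones, key, velocity):
    """Return every zone whose key and velocity ranges contain the note-on."""
    return [z for z in zones
            if z.key_lo <= key <= z.key_hi and z.vel_lo <= velocity <= z.vel_hi]


def time_to_peak(zone, velocity):
    """Seconds from note-on until this zone's volume envelope peaks."""
    softness = 1.0 - velocity / 127.0
    return zone.delay_s + zone.attack_s + zone.soft_extra_s * softness


def onset_times(zones, key, velocity):
    """Map each sounding zone to its time-to-peak for one note-on."""
    return {z.name: time_to_peak(z, velocity)
            for z in active_zones(zones, key, velocity)}


# Invented values for an imaginary bowed-string preset, not from any real bank.
PRESET = [
    Zone("bow_noise", 0, 127, 0, 127, 0.0, 0.005, 0.0),
    Zone("soft_sustain", 0, 127, 0, 80, 0.02, 0.25, 0.10),
    Zone("hard_sustain", 0, 127, 81, 127, 0.0, 0.04, 0.05),
]

# Same key, soft versus hard note-on.
for velocity in (30, 120):
    times = onset_times(PRESET, 60, velocity)
    print(velocity, {name: round(t, 3) for name, t in times.items()})
```

Even in this toy preset, the zones of a single note peak at different times, and the soft note takes several times longer to reach its peak than the hard one. Which of those moments would be "the" onset to compensate for?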
Yes, there are tools like NotePerformer that attempt to solve this problem, and they seem to do quite a good job at it. But NotePerformer is not a general sample-bank format like SF2. It's not even a synthesizer like FluidSynth. It is a synthesizer fused with a sequencer fused to a specific(!) set of samples and a predetermined set of performance rules. It has its own opinion on how the MIDI notes that you write in your notation editor should be articulated and performed. It takes your input, adds its own musical playing style and performs your music in a certain way, similar to a human musician. They talk about having analysed lots of classical pieces to train their model. I like the idea and I imagine it being a really useful tool if you compose classical music, orchestral film scores or similar music.
But how does NotePerformer sound if you use it to play rock music, reggae, an Irish jig, a Scottish hornpipe, a French two-time bourrée or a Swedish polska? Or something not rooted in the Western musical world? I haven't tried it, but my guess is that the output would not be "musically meaningful" at all in those contexts. I imagine it being like listening to a classically trained violin player performing an Irish jig - it just doesn't sound right. The phrasing, the rhythm, the groove (i.e. tiny shifts in note onset timings) you need are totally different from those of classical music.
Don't get me wrong: I would love to have an open-source NotePerformer. Ideally one you could train yourself so that it can learn many different performance styles. I would even be interested in participating in creating one. It sounds like a really interesting project! But you are barking up the wrong tree here. FluidSynth is not the right software to implement your idea, and I don't think you will get close to your goal by creating an extension to the SoundFont format. If I have understood your goal correctly, then you need to start from a blank slate.