My understanding of FSK is that the most important factor is that the frequencies are orthogonal. It helps me to go through the math of creating the constellation points.
The basis functions in FSK are cosines at whatever frequencies you choose; let's go with binary FSK.
b1 = cos(2*pi*f1 * t);
b2 = cos(2*pi*f2 * t);
s1 = b1;
s2 = b2;
The constellation can be plotted by the zero-lag correlation of the symbols with the basis functions.
symbol1 = [integral(0, bit time, s1.*b1 ), integral(0, bit time, s1.*b2 ) ]
symbol0 = [integral(0, bit time, s2.*b1 ), integral(0, bit time, s2.*b2 ) ]
Orthogonal frequencies will be at right angles (obviously). Frequencies that are positively correlated over the bit time will have an angular distance less than 90 degrees, which means they are "closer", so the probability of bit error is worse.
Orthogonal is basically equivalent to having an integer number of cycles of each tone in the bit time (strictly speaking, that is a sufficient condition rather than the only one).
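To see where the integer-cycles rule of thumb comes from, it may help to write out the cross-correlation of the two basis functions. Here T stands for the bit time (my notation, not anything standard to this thread), and f_1 ≠ f_2:

$$
\int_0^T \cos(2\pi f_1 t)\,\cos(2\pi f_2 t)\,dt
= \frac{\sin\!\big(2\pi (f_1 - f_2) T\big)}{4\pi (f_1 - f_2)}
+ \frac{\sin\!\big(2\pi (f_1 + f_2) T\big)}{4\pi (f_1 + f_2)}
$$

Both sine terms are zero whenever (f_1 - f_2)T and (f_1 + f_2)T are multiples of 1/2. That is certainly the case when f_1 T and f_2 T are both integers, i.e. each tone completes a whole number of cycles in the bit time.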
So Fd is not so important, but orthogonality is. Somewhat counter-intuitive, but try giving the math a shot and playing with it in whatever software you use for this kind of stuff (octave, matlab, python, whatever).
Changing f2 to a decimal will cause non-integer number of cycles for the 2nd symbol, and you can see the angle change. It might help to make the time array something more useful, but hopefully it's good enough to see the concept.
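If you want something to play with, here is a minimal Python/NumPy sketch of the correlations described above. The function name `fsk_constellation`, the 1-second bit time, the sample count, and the example frequencies are all my own assumptions for illustration. The integrals are approximated with a plain Riemann sum, and the angle is computed directly between the two waveforms rather than between the plotted points.

```python
import numpy as np

def fsk_constellation(f1, f2, bit_time=1.0, n=100_000):
    """Correlate two FSK tones with the cosine basis functions over one bit.

    Returns (symbol1, symbol0, angle_deg), where angle_deg is the angle
    between the two waveforms. All defaults are illustrative assumptions.
    """
    # Time array covering one bit period (left Riemann sum, step dt).
    t = np.linspace(0.0, bit_time, n, endpoint=False)
    dt = bit_time / n

    # Basis functions and symbols, as in the post: s1 = b1, s2 = b2.
    b1 = np.cos(2 * np.pi * f1 * t)
    b2 = np.cos(2 * np.pi * f2 * t)
    s1, s2 = b1, b2

    # Zero-lag correlation of each symbol with each basis function.
    symbol1 = np.array([np.sum(s1 * b1), np.sum(s1 * b2)]) * dt
    symbol0 = np.array([np.sum(s2 * b1), np.sum(s2 * b2)]) * dt

    # Angle between the waveforms from their normalized inner product.
    rho = np.sum(s1 * s2) / np.sqrt(np.sum(s1 * s1) * np.sum(s2 * s2))
    angle_deg = np.degrees(np.arccos(np.clip(rho, -1.0, 1.0)))
    return symbol1, symbol0, angle_deg

if __name__ == "__main__":
    # Integer number of cycles for both tones, then a non-integer f2.
    for f2 in (6.0, 5.25):
        sym1, sym0, angle = fsk_constellation(5.0, f2)
        print(f"f2 = {f2}: symbol1 = {sym1}, symbol0 = {sym0}, angle = {angle:.2f} deg")
```

With both tones at an integer number of cycles per bit, the off-diagonal correlations should come out essentially zero and the angle at 90 degrees. Nudging f2 to a non-integer value moves the angle away from 90.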
Copy/paste error, so I'll comment on what I thought of when I read this.
Granted, I've only read a couple of textbooks with discussions of FSK, and I've only seen P(BE) plots and the like focusing on E_b/N_0. The equations they give assume that the symbols are orthogonal.
I think as long as Fd will cause an integer number of cycles in the bit time, you're in business.