Re: [Octal-dev] Fourier Transforms and samples
From: n_nelson
Subject: Re: [Octal-dev] Fourier Transforms and samples
Date: Mon, 22 May 2000 11:38:52 -0700
Steve Mosher wrote:
----
There ought to be sound cards that do this... to boot, clusters are
impractical for home use, not so bad for studio use (good idea,
actually), and horrible for set use. It would be -sweet- to see a
cluster devoted to processing FFTs -- it could be optimized to do
simply that, and a higher load could be placed on it. It would be
more useful to have it perform all your DSP... then again, wouldn't
it be more efficient (and cheaper) to rip apart a PSX2 and use the
sound synthesizer? I still want to see Octal ported to it =). I
think it would be the best platform thus far for such a program...
----
In trying to get up to speed on the Octal DSP situation, I reviewed
many of the links at www.dspdimension.com/html/links.html. In my
opinion, the general approaches divide into two basic themes: (1)
digital processing of sound that emulates analog methods, and (2)
digital sound synthesis using Fourier-description-like methods. This
division derives, on one hand, from the prior analog limitations of
music analysis toward the desired Fourier description (or toward
interesting analog-derived effects), and on the other, from the
essentially Fourier-based generating methods used in, say, musical
scores. A sheet of music identifies the fundamental frequencies,
their start and stop times, amplitudes, harmonic formants (defined
by the instrument selected to play the passage), and the designated
method of playing. So we have DSP effects applied to sampled sound
with some use of Fourier-description approximation, and digital
synthesizer methods using various Fourier-description formats (of
which I expect Octal is one--what is the web site for Octal?). From
my perspective, these are approximations (some perhaps very good to
precise) of (1) the forward Fourier transform (analysis) and then
(2) the reverse transform (synthesis).
Some approaches may not be interested in or related to Fourier
descriptions, but the primary emphasis should be Fourier
descriptions: our awareness of sound is essentially Fourier, because
the ear performs that transform and delivers a Fourier description
to the brain. What we hear, in our brain or awareness, is a Fourier
description, and it is that medium we should concentrate on.
On the DSP analysis side, we never seem to reach high-quality
control over the Fourier description, because of the significant
computational requirement, because the result is only very
approximate (distortion), or because there is no simplification or
further translation toward a music score or common synthesis
formats. On the synthesis side, we have somewhat blocky,
unnatural-sounding tools whose results may sound electronic, or that
cannot address the analysis description because of the significant
gap between the analysis description and the common synthesis
formats/descriptions on which our tools are designed. That is, there
is an undesirable gap between the analysis and synthesis
descriptions, and as that gap is closed, additional tools will be
required to manipulate the eventual, optimal description.
My interest concentrates on the recording studio, where the
real-time issue can be reduced, whereas Steve's interest appears to
be real-time usage. In considering real-time use of pre-analyzed
descriptions, note that our music scores, and our common talk about
music, are a simplification based on fundamentals (the lowest
frequency) plus other indexes and abbreviations. We would then
perform the analysis toward obtaining that simplification, so that a
real-time analysis of, say, an instrument could proceed quickly to
the indexes--e.g., use routines that extract the fundamental
frequency while ignoring the other frequencies--and then use those
indexes to generate a complete sound from the pre-analyzed
description. Once the fundamental frequency is obtained, the
harmonic formant, decay envelopes, and the various other description
components would be generated.
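As a rough sketch of that real-time path (a toy autocorrelation
pitch detector plus harmonic resynthesis; all names, the sample
rate, and the (harmonic, amplitude, decay) description format are
assumptions for illustration, not Octal's actual interface):

```python
import math

RATE = 8000  # sample rate in Hz; chosen only for this sketch

def estimate_fundamental(samples, rate=RATE, fmin=80.0, fmax=1000.0):
    # Autocorrelation pitch detector: find the lag with the strongest
    # self-similarity inside the plausible period range.
    lo = int(rate / fmax)
    hi = int(rate / fmin)
    best_lag, best_score = lo, float("-inf")
    for lag in range(lo, hi + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_score, best_lag = score, lag
    return rate / best_lag

def synthesize(f0, partials, duration, rate=RATE):
    # partials: pre-analyzed description as (harmonic number,
    # amplitude, decay constant) triples -- a hypothetical format.
    n = int(duration * rate)
    out = [0.0] * n
    for h, amp, decay in partials:
        for i in range(n):
            t = i / rate
            out[i] += amp * math.exp(-decay * t) \
                      * math.sin(2 * math.pi * h * f0 * t)
    return out

# A 220 Hz test tone stands in for the live instrument signal.
tone = [math.sin(2 * math.pi * 220.0 * i / RATE) for i in range(2048)]
f0 = estimate_fundamental(tone)
note = synthesize(f0, [(1, 1.0, 3.0), (2, 0.5, 5.0)], 0.1)
```

A real system would want a more robust detector, but the shape is
the point: a fast, narrow analysis step (fundamental only) indexes
into a rich pre-analyzed description for synthesis.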
The recording-studio and real-time approaches are complementary: the
real-time approach is essentially the recording-studio synthesis
portion with a minimal analysis portion. For real-time use we could
do the analysis in the studio to obtain the pre-analyzed
descriptions, and in the studio we will want to use many of the
real-time techniques in the synthesis portion. The primary objective
is a high-quality but maximally simple Fourier description from the
analysis phase that bridges easily to the common techniques of the
synthesis phase. I will detail my general program in the following,
and if it is of any useful merit, I hope proper credit will be
given.
Stephan Sprenger, at www.dspdimension.com/html/pscalestft.html,
details the STFT procedure for obtaining the precise frequencies
that become a key requirement of the hoped-for Fourier description.
What does not seem to be commonly realized--likely because of the
thorough and, on historically low-powered computers, almost
necessary use of the FFT--is that once the primary frequency nodes
have been estimated, as Sprenger illustrates, variations on the
complete Fourier transform algorithm (CFT in the following), which
is commonly considered too slow, can be applied with a computational
requirement that is proportional to the number of frequency nodes
rather than quadratic in the number of samples. And I suspect it
will be more computationally efficient to increase the frequency
precision after an initial STFT determination by using the CFT than
by using increasingly overlapping STFTs. That is, if you know the
local range of the expected frequency, just run a convergent
sequence of individual CFT frequency evaluations.
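That convergent sequence might look like the following sketch: a
single-frequency Fourier coefficient evaluated at arbitrary
fractional frequencies, with a shrinking grid search around the
initial STFT estimate (function names and the grid-shrink strategy
are my own illustration, not a quoted algorithm):

```python
import math

def lcft_bin(samples, freq, rate):
    # Fourier coefficient at one arbitrary (fractional) frequency:
    # cost is O(n) per frequency, independent of any FFT bin grid.
    re = im = 0.0
    for i, s in enumerate(samples):
        ang = 2 * math.pi * freq * i / rate
        re += s * math.cos(ang)
        im -= s * math.sin(ang)
    return complex(re, im)

def refine(samples, f_est, rate, span=2.0, steps=5):
    # Repeatedly evaluate the coefficient magnitude on a shrinking
    # grid around the current estimate and keep the strongest point.
    f = f_est
    for _ in range(steps):
        grid = [f + span * (k / 4 - 0.5) for k in range(5)]
        f = max(grid, key=lambda g: abs(lcft_bin(samples, g, rate)))
        span /= 4
    return f

# A 123.4 Hz tone; pretend the STFT put the peak near 123 Hz.
rate = 1000
tone = [math.sin(2 * math.pi * 123.4 * i / rate) for i in range(500)]
refined = refine(tone, 123.0, rate)
```

Each refinement pass costs only a handful of O(n) evaluations, which
is the sense in which the load follows the frequency nodes rather
than the transform size.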
Even though the FFT (the core algorithm of the STFT) has a lower
computational load (n log n) for a given number of samples n, the
FFT computes _every_ discrete frequency determined by the number of
samples, whereas a CFT can be limited (LCFT in the following) to
look at only _one_ frequency at a time, so that the resulting
additional load is a function of the number of frequency nodes and
_not_ the number of samples.
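A back-of-envelope comparison (operation counts are only scale
estimates, and the node count is a made-up example) shows where the
crossover sits:

```python
import math

n = 65536                    # samples in the analysis window
k = 8                        # frequency nodes actually tracked
fft_ops = n * math.log2(n)   # full-spectrum FFT scale: n log2 n
lcft_ops = k * n             # one O(n) pass per tracked frequency
# The LCFT comes out ahead whenever k < log2(n); here 8 < 16.
```

So the LCFT is not free per frequency--each evaluation is still a
full pass over the samples--but for a modest number of tracked
nodes it undercuts computing the entire spectrum.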
The LCFT gains a per-frequency advantage in that it can be tuned to
an arbitrary fractional frequency, and convergent methods can also
be applied to other quantities such as the exact frequency start
time, the amplitude envelope, and frequency movement.
The output of the FFT is a sequence of Fourier description blocks
(possibly simultaneous sequences of different block lengths), so the
description is dense and highly time-divided, and this FFT format
must be maintained if an inverse FFT is used for synthesis. However,
I expect that most common synthesizers generate individual
frequencies and add them together--in effect an inverse LCFT--rather
than attempting an inverse FFT, again because the load is then a
function of the number of frequencies and not the sample size.
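That additive "inverse LCFT" is only a few lines; the (frequency,
amplitude, phase) partial format here is an assumed minimal
description, not any synthesizer's actual one:

```python
import math

def resynthesize(partials, duration, rate):
    # Generate each frequency independently and sum. The cost grows
    # with the number of partials, not with an FFT block structure.
    n = int(duration * rate)
    out = [0.0] * n
    for freq, amp, phase in partials:
        w = 2 * math.pi * freq / rate
        for i in range(n):
            out[i] += amp * math.sin(w * i + phase)
    return out

# A fundamental plus one quieter octave harmonic, 10 ms at 44.1 kHz.
wave = resynthesize([(440.0, 1.0, 0.0), (880.0, 0.3, 0.0)], 0.01, 44100)
```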
The output of the LCFT will also be more dense and time-divided than
we would like, which calls for a further simplification procedure.
E.g., one LCFT frequency description over a short span of time may
easily be combined with subsequent descriptions into a longer
envelope that becomes a single frequency description. Frequency
harmonics that begin and decay together are identified, and these
may be assembled into sound objects, or notes from a particular
instrument. The coding for this simplification procedure amounts to
a kind of lossy compression, or an AI (pattern recognition) search
for a smaller best fit. This compression then provides the
simplified Fourier description to be used in synthesis.
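A crude stand-in for the first part of that simplification--linking
per-frame (frequency, amplitude) pairs into longer single-frequency
envelopes--might look like this (the frame format, tolerance, and
track structure are all invented for the sketch):

```python
def merge_tracks(frames, freq_tol=1.0):
    # frames: one list of (freq, amp) pairs per time step. A pair
    # joins an existing track when it continues from the previous
    # step and its frequency is within freq_tol; otherwise it starts
    # a new track.
    tracks = []  # each track: {"freq": f, "env": [amps...], "end": t}
    for t, frame in enumerate(frames):
        for freq, amp in frame:
            for tr in tracks:
                if tr["end"] == t and abs(tr["freq"] - freq) <= freq_tol:
                    tr["env"].append(amp)
                    tr["end"] = t + 1
                    break
            else:
                tracks.append({"freq": freq, "env": [amp], "end": t + 1})
    return tracks

# Three short, nearly identical descriptions collapse to one track
# with a three-point amplitude envelope.
frames = [[(440.0, 1.0)], [(440.3, 0.8)], [(440.1, 0.6)]]
tracks = merge_tracks(frames)
```

Grouping the resulting tracks into harmonically related bundles
(notes) would then be the pattern-recognition layer on top.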
To recap: use the FFT to obtain the nodes of approximate frequency,
use the LCFT to obtain the precise descriptions, and use AI
compression to obtain the simplified descriptions. As an additional
note, the entire analysis-synthesis sequence can be wrapped in an
(AI) optimization loop in which the synthesis output (the sample
stream) is subtracted from the input, and convergence factors are
selected for minimum distortion (smallest residual signal), smallest
simplified description, and smallest computational load. Different
kinds of music or sounds, as in the real-time scenario, can also be
identified in the analysis phase to tune/minimize that computational
load.
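The distortion term of that loop reduces to subtract-and-measure; a
minimal sketch (one free parameter, a toy grid of candidates, names
invented here) of picking the description that leaves the smallest
residual:

```python
import math

def synth(amp, freq, n, rate):
    # One-partial resynthesis candidate.
    return [amp * math.sin(2 * math.pi * freq * i / rate)
            for i in range(n)]

def residual_energy(x, y):
    # Distortion measure: energy of the input minus the resynthesis.
    return sum((a - b) ** 2 for a, b in zip(x, y))

rate, freq, n = 8000, 440.0, 256
target = synth(0.7, freq, n, rate)  # stands in for the input stream

# Coarse convergence: keep the candidate amplitude whose output,
# subtracted from the input, leaves the least energy behind.
best = min((a / 10 for a in range(11)),
           key=lambda a: residual_energy(target, synth(a, freq, n, rate)))
```

A real optimizer would trade this residual off against description
size and compute cost, as described above, rather than minimizing
distortion alone.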
Neil Nelson address@hidden