Difference between revisions of "Signal analysis, modeling, processing"

From Helpful
Jump to: navigation, search
m (Practicalities - alternatives, interpretation)
m (Practicalities - alternatives, interpretation)
Line 836: Line 836:
and aside from probably getting a little more error because fewer samples,
and aside from probably getting a little more error because fewer samples,
we must interpret the result completely differently.
we must interpret the result completely differently.
(Note that the more information → more buckets comes basically from the band limited nature: more information to construct means more elements within the bands that can be represented means more elements required to reconstruct it)

Revision as of 22:21, 9 June 2019

Template:Audio and signal processing

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(See also the glossary)


Visualization types

Transforms; the relation between waveforms and frequency content

...often for converting between time domain and frequency domain.

Fourier transform

Note: If you like math, you will prefer other explanations. This one is going more for intuition, context, and footnotes of practice. (Well, ideally)

Basic idea, and some uses

Jean Baptiste Joseph Fourier figured that any periodic vibration can be constructed as a weighed sum of an infinite series of number of harmonically related sinusoids. Where harmonically related means 'all of these sines have a frequency that are multiples of a base frequency.

So Fourier analysis does that: it takes a signal (often a time series), and expresses it as how much of each such periodic component there is.

Mathematically, the result of this analysis can be seen as magnitude and phase for each frequency component.

Magnitude is the amount presence of this frequency
Phase is mostly necessary to reconstruct it accurately, intuitively to shift each part to the right place

When you have both, the fourier-space data is actually entirely equivalent information to what it came from (...to within floating-point errors).

Some applications rely on this equivalence to do something that just happens to be easier to do in one space or the other.

...whereas e.g. a spectrogram visualization throws away the phase data, because it's visualizing how much of each frequency component there is (s just need the magnitudes).

Fourier synthesis is the reverse, taking frequency-space information, and synthesizes the (often time-series) data it represents.

Synthesis is sometimes seen by itself, doing signal generation, such as in mimicking some types of sounds.

More frequently it is combined with Fourier analysis, often to do do filtering that is much easier to express in fourier space.

Fourier implementations can be used to assist some other calculations, often because it is mathematically related to convolutions.

For example, you can do large convolutions more efficiently (than naive convolution implementations), because the convolution theorem implies that element-wise multiplication in the frequency domain is equivalent to convolution in the time domain.

Practicalities - constraints

Bad words: infinite

Infinite sounds bad for real world application.

Not as bad as you may think.

For discrete data, the amount of sinusoids necessary is strictly bound by the input size.

...because discrete data is implicitly band-limited - it cannot represent anything higher.

...which just sidesteps the potential issues, in part because you faced those earlier: If that discrete data is a sampling of something that did contain something higher, you've now either sampled it poorly (possibly introducing some nasty artifacts, primarily aliasing) or you've filtered it out before it got to the sensor.

For example, anything made to digitally record sound will avoid aliasing by doing filtering, taking out frequencies higher than the ADC is looking at. (actually there are some variants of implementation. Historically it might have been a physical filter, these days it's often part of the supersampling that nicer DACs do anyway)

If it didn't do that filtering, the data would still be band limited, but it would probably include a little aliased signal (not very much because most microphones barely respond over ~15kHz, and the world is often fairly quiet in this range anyway).

Bad words: periodic

The word periodic turns out to be the larger problem.

A fundamental property/assumption of the FT is that it essentially assumes the entire signal is a single infinitely repeating chunk (not exactly, but intuitively true enough).

Even for very periodic signals, even a perfect sine wave, perfect periodicity happens only when the wavelength is an exact multiple of the window size. In other words, basically never.

If you intuit the FT as a tuning fork for each output bucket, then it's still mainly the right (closest) bucket that responds the most, but the adjacent few will do so too. This is one reason for spectral leakage.

Another practical problem is that, between the first and last sample in your data, there will usually be a large step in values.

Which is a slope discontinuity just like a large step in the middle of your data would be. It will try to model this as best it can, which it will do with all the frequencies - this is a problem with any pulse or step, which is also spectral leakage. (This is vague description, find the math if you care)

Since this one is not even actually present in your data, it's effectively wrong, though note that there's only so much energy in it.

If just showing the spectrum, you've muddled it a tiny bit. If doing FT space filtering, you've introduced new signal (and exactly how much depends on implementation details).

Another problem is that the longer a sample is, the likelier any real-world signal is to contain something non-periodic. Just because most real signals are not pure and perfectly behaved.

Say, 0.1 seconds from a piece of music is likely to mostly just contain a sustained note, or silence, or something noisy, or something else somewhat specific. A 10 second chunk? Not so much. A 100 second chunk? Basically not.

Sure you can analyze these well enough, in that synthesis will reconstruct the same thing. But the less periodic the content really is, the less intuitive the frequency-space data is to interpret or manipulate.

This is related to that you get no time information. The basic FT gives output that summarizes the overall frequency content of the entire input.

...and phase content that puts things in the right place in time, which is very much central to reconstruction because of the way it combines. In fact, the phase is arguably more information-dense than the ampitudes. It's just so entangled that it is almost impossible to read off anything useful from the phase information.

If you want frequencies over time information, such as a spectrogram[1], you probably want the STFT (short-time FT). This is a controlled tradeoff between time resolution and frequency resolution, by taking the FT of short chunks of signal, which both localizes and reduces the problems mentioned above (e.g. considering music in (overlapping) 0.1 second chunks).

It's not perfect, but it's good enough for many needs. For example, music and other sound analysis is typically done this way.

On 'Band limited'

Your data must be band limited, meaning it must have no frequency content beyond a cutoff band.

With digital samples, this is mostly something you need to think about at the sampling step.

Once you have discrete-interval data, it's implicitly already band-limited. It can no longer represent frequencies beyond its Nyquist frequency.

That doesn't always mean your sampled data is exactly what you wanted.

For example, if the band limit of your recording is lower (because low sample rate) than the frequency of the sound you recorded, and you didn't filter it out, the physically present higher frequencies, they will be present in aliased form.

For sound recording, this is almost always solved by the recording device doing implicit filtering typically as an electronic lowpass (or sometimes via a supersampling ADC).

But in more generic hardware, it's something you need to think about.

The reason behind the requirement is that mathematically, Fourier analysis is guaranteed to converge only for band-limited signals.

Which is mostly just a mathematical property of the general theory related to the word 'infinite' in 'infinite sum of sinusoids'. (Without being band limited, such as for FTs on continuous mathematical functions, it would actually be infinite, or at least become a more complex issue.)

So for band-limited data, a finite sum is enough. In fact, there is a hard bound on the necessary amount of components.

(And for a lot of everyday sound, a modest amount of sums will already be a decent approximation)

For example, a sweater that shows moire patterns on TV is entirely band limited once on TV, but not quite what you intended to record, because the sampling was rougher than what was there. What you would generally prefer in this case is filtering before sampling so that it's fuzzy, instead of distracting.

Practicalities - common solutions/uses/alternatives

On windowing

The way people often use the Short-Time Fourier Transform (STFT), plus window function, will address the 'no time information', 'has to be periodic', 'likely step at the edge', and optionally the 'avoid losing too much information' issues together.

'Windowing' can refer to both taking a block of data at a time, and to applying an envelope within that window, typically called a window function.

In many applications (e.g. spectrograms, because STFT) you do both.

You suppresses the step-at-the-signal-edge discontinuity by suppressing the entire signal at the edge with the mentioned window function (envelope).

If you care to get the most out of your signal (or get representative amplitudes), then you will care that this lowers the signal strength before handing it to the FT.

If you were to partition the data (i.e. no overlap) and apply a windowing function to each, you wold effectively be ignoring part of it, so lose part of the signal. In spectrograms you might not care, but in low-signal-to-noise applications you may wish to overlap these blocks.

You choices:

  • choice of window size is a tradeoff between time resolution and frequency resolution (generic STFT stuff)
  • choice of window function offers a few different tradeoffs - roughly how large the main lobe of the frequency response is and how large the tail is.
  • ideal window overlap depends on the exact shape of the window function.
Exactly how much overlap is (mathematically) ideal depends on the characteristics of the windowing function - described in a few places, see e.g. "spectrum and spectral density estimation by the discrete fourier transform".
Secondarily on the computational resources you care to spend, as it's a diminishing-returns thing.

Data massaging and tradeoffs - what happens when you...
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
...choose different window sizes for analysis?
Interpretation: sound, images, etc.

Practicalities - alternatives, interpretation

Relevant to choice of library functions
On real and complex input/output, and symmetry
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
real    --FT--> complex
complex --iFT--> complex

and, not unusually,

complex --iFT--> real

The last is, in some ways, not special at all, just the same as the one before it but with the assumption that all imaginary output components happen to be zero. This often amounts to throwing them away, which is valid only if they actually were (which, under many controlled circumstances, they weill be, within rounding error of being zero).

So a DFT package will often have a number of variant functions.

The most generic DFT functions often take complex-valued input, and output bins for -f/2 through f/2. This because for complex input, the two parts are not redundant/symmetric.

However, most practical applications are real-valued input. You can turn that in complex-with-zero-imaginary to hand it into the just-mentioned function, but it would do some redundant calculation, and the output would be symmetic around 0.

As such, most packages also have a 'FT on real-valued input' function, which will be a little faster.

As FFTW points out, "In many practical applications, the input data in[i] are purely real numbers, in which case the DFT output satisfies the 'Hermitian' redundancy: out[i] is the conjugate of out[n-i]. It is possible to take advantage of these circumstances in order to achieve roughly a factor of two improvement in both speed and memory usage"

(Some will still give the redundant data, some will return bins for 0..f/2. (verify))

(If you care to look up the math, look for terms like heritian symmetry and conjugate complex symmetry)

TODO: add diagram of real input versus complex input http://stackoverflow.com/questions/6740545/need-help-understanding-fft-output

Similarly, when you have complex data with hermitian redundancy and want to inverse-FT it, the output is complex with zero imaginary component (within rounding error) or, if you (can) ask the libary, real output. This saves (only) a little calculation, some memory, and probably some code on your side.

"So what if you did

real --FT-> complex --iFT-> complex

and you notice the complex output has non-zero imaginary data?"

Well, if you did just that without altering the frequency-space data, then they'll be near zero, within rounding error.

But you probably did more, e.g. implemented a frequency-space filtering, then chances are you might have broken the conjugate symmetry doing so.

How much that actually matters depends on what you were doing.





On shift
On and even/odd-sized input
Output - value scale and units
Output - frequency bins
Output - bin width and position
Output - note on (non)linearity


Spectral leakage, window functions
(Temporal) Aliasing
Picket fence effect
Ringing (Gibbs phenomenon)

More math

Relations between fourier, convolution, and correlation


Multi dimensional FT

Variants and alternatives

See also

Wavelet transforms

Hartley transform

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Similar to the Fourier transform, which

  • transforms real data to real data
  • is its own inverse (aside from scaling)

The FHT (fast discrete HT) is not fundamentally faster to calculate than FFT variants that works on real data, although in specific cases there are more optimizations, so in simpler implementations, the FHT uses slightly less calculation than the FFT.

This (and the 'its own inverse' sometimes being convenient) mean you see the FHT in memory-limited hardware.

For example: http://wiki.openmusiclabs.com/wiki/ArduinoFHT

Related concepts

For a wider overview, you will want to distinguish between the various related terms, which include (roughly ordered from more general/abstract to more applied/discrete):

  • Z-transform
    • something of a generalization of the Discrete-time Fourier transform (DTFT, and not to be confused with the DFT)
    • Takes a discrete real-numbered or complex-numbered time-domain signal and calculates a complex frequency-domain representation (apparently making it a discrete variation of the Laplace transform)
  • Fourier Series
    • Usually refers to the idea that x(t) is the sum of an infinite number of sinusoids
  • Fourier series integral
    • Mathematical basis for Fourier analysis
  • Fourier Analysis
    • x(t) to {ak} (time series to frequencies)
    • (the effective opposite of Fourier Synthesis)
  • Fourier Synthesis
    • {ak} to x(t) (frequencies to time series)
    • (the effective opposite of Fourier Analysis)
    • Note that most Fourier transforms have an inverse to allow (re-)synthesis, e.g. iDFT, iFFT, etc.
  • Fourier Transform(s)
    • Can refer to
      • The result of a Fourier analysis (the frequency-domain information)
      • The method of converting to that domain, and usually back from its results (Fourier analysis, and synthesis)
      • The more mathematical definition of the Continuous Fourier Transform, that transforms a function of a real variable into another, the frequency domain. (in which both are continuous and unbounded(
      • A group of implementations for this transform
  • The Discrete Fourier Transform (DFT)
    • a specific, practicalized version of the (continuous) fourier transform, in that
      • it is finite-domain and for discrete-time functions
      • input function must be discrete - a real- or complex-numbered time series, often from sampling a continuous signal
      • only analyses frequency content required to approximate the given segment (unlike the DTFT)
  • Fast Fourier Transform (FFT)
    • Refers to a number of different algorithms to quickly calculate a DFT
    • ...since the straightforward calculation of DFT takes O(n2), and FFT takes O(n log n)
  • Short-term Fourier Transform STFT
    • Refers to the fourier transform (often specifically analysis) in which windowing is applied (in the forms of continuous theory, and/or discrete application)
    • ...for a controlled tradeoff between frequency resolution and time resolution.

Other related concepts and transforms:

See also

  • Zero Padding Does Not Buy Spectral Resolution, Aug 2004,


  • Maria Elena Angoletta, Fourier Analysis / Part 2: Technicalities, FFT & system analysis, Aug 2004,


  • Modulated complex lapped transform (MCLT)

Other frequency-related transforms

Cepstrum, Cepstral transform

Filtering theory

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Time-domain filters

Frequency-domain filters

Filters and analyses

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Frequency pass and stop filters


Filters out frequencies higher than a cutoff frequency.

Regularly used in analog form, e.g. before sampling sound in sound cards, in DSL filters, etc.

Note that blurring samples works as a lowpass filter, in sound, images, and elsewhere.

Anti-aliasing filters are regularly lowpass filters.

See also Low-pass filter


Filters out frequencies lower than a cutoff frequency.


Also known as band limit filter, notch filter, 'T-notch filter' and also 'band-elimination filter', and 'band-reject filter'.

The band that is filtered out is referred to as the stopband.


The band that is left out is referred to as the passband.

Can be seen as the combination of a high-pass and a low-pass.

Band-related tools


Subband coding

Separate out different frequency bands, to use different coding for each (for reasons of efficient transmission).


Equalization refers to sound processing altering the frequency content/envelope, for correction (e.g. of a recording room's acoustic) or to give sound a particular feel or focus.

Fairly common in software is the graphic equalizer, which adjusts something like 20Hz-20kHz with various sliders, in bands of certain width. There are also hardware equivalents, which may offer feedback detection, to avoid feedback caused by peak sounds (such as vocals) or bad setups (where microphones strongly pick up speaker sound).

Pre-emphasis refers to structurally amplifying some frequencies. For example, vinyl records are mastered with their lower frequencies perhaps 30dB more silent than high frequencies, largely because low frequencies require larger groove size. Mastering with less present low frequencies allows closer grooves and therefore more recording time. Phono pre-amps will equalize the inverse of what was applied to the master, so that they produce approximately what was recorded.

Comb filter

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

A comb filter adds delayed copy of a signal to itself.


More complex applications

(...which are sometimes presented as complete filters)

Noise, SNR, etc.

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Methods for noise reduction / signal (recognition) enhancement assistance :

  • Spectral subtraction (subtracting a pure noise signal's spectrum from te signal's. )
  • time domain neural methods (e.g. direct, Kalman)
  • frequency domain neural methods
  • wavelet shrinkage/tresholding
  • ...many more

Pitch detection, frequency estimation

Envelope detection

Beat/tempo detection

See Descriptions_used_for_sound_and_music#Beat_and_tempo

Dynamic range compression

Dynamic range compression refers to varying amplification, that adapts to loudness.

Music mixing/mastering (where context means it's often just called 'compression'), is often used to make different kinds of sources (instruments, vocals) behave similarly, and to let you e.g. draw attention to one thing and make another background texture.

Yes, just adjusting volume does that, but not always well enough.

Some instruments have a much larger dynamic range, that is, the difference between the loudest part and quiet parts are large - say, with drums. You could adjust the volume according to how it's played -- or you could let dynamic range compression essentially just do that for you.

There are some tricks that make for specific sounds, such as the (apparenty) increased decay time, the way drums don't drown out vocals, the way techno beats have bits of silence for emphasis (arguably a mistake, but we're used to it now). This means tweaking the parameters (e.g. the speed at which it allows changes), and amounts to informed use.

Dynamic compression also has risks, such as making things sound less natural, and that you end up with less detail in your mix than in your recording.

The largest potential problem is blanket uninformed application.

For example, home cinema systems often apply this so that you do not have to keep changing the volume in movies with both silent and loud parts, at expense of some quality (that is, dynamic range).

That's useful, and you can disable it if you care.

The same thing is done in broadcasts, to make a radio/tv station sound both consistent in level and perceptually louder. This has no direct use for us, and we can't choose the quality.

Record producers have caught wind of the concept of "compression == perceptually louder", and since they figure "louder = more noticeable = more sales" (probably true), they force mastering technicians to apply more overall compression. None of that volume variation, psh.

Which is sort of stupid, because radio stations do this anyway. (If not quite as aggressively, because it doesn't sound so good) Essentially, record companies sell all their CDs with effectively lowered quality, on the off chance that a station (and/or DJ) is completely ignorant of these details and this squeezes out slightly more perceptual volume.

The loudness wars this resulted in means that perceptual sparkle mentioned earlier (not easily quantifiable since it is an implication of dynamic range that depends on particular recorded signal/music itself) is reduced, sometimes significantly.

(Interestingly, vinyl versions of music may be better -- old ones because it was before this nonsense, new ones because of their target market excludes broadcast)


Voice coders are now mostly used for distorting vocals and instruments in music. The original idea was to parametrize speech for efficient transmission.

A basic vocoder is an analysis/synthesis system, that puts each each band from a multiband filter through an envelope follower (an analog circuit), resulting in the control signals from that follower. Sending these signals is a simple form of encryption (mostly by obscurity).

A phase vocoder

See also:

Linear Predictive Coding (LPC)

Analysis of the spectral envelope of a digital signal, with the assumption of the basic case of voiced speech, and therefore useful for speech parameter estimation.

e.g. seen in LPC vocoders, often for speech transmission.

Warped LPC (WLPC)

CELP and variants

Code excited linear prediction (CELP), with variants including ACELP, RCELP, LD-CELP (ITU-T G.728) and VSELP, and the older RELP, use source-filter models of speech production.

Spectral whitening

Activity Recognition

Fourier correlation notes

See also:

Unsorted / See also