Signal analysis, modeling, processing

Revision as of 12:51, 29 September 2022
The physical and human aspects of dealing with audio, video, and images

Vision and color perception: objectively describing color · the eyes and the brain · physics, numbers, and (non)linearity · color spaces · references, links, and unsorted stuff

Image: file formats · noise reduction · halftoning, dithering · illuminant correction · Image descriptors · Reverse image search · image feature and contour detection · OCR · Image - unsorted

Video: format notes · encoding notes · On display speed · Screen tearing and vsync

Audio physics and physiology: Basic sound physics · Human hearing, psychoacoustics · Descriptions used for sound and music

Noise stuff: Stray signals and noise · sound-related noise names · electronic non-coupled noise names · electronic coupled noise · ground loop · strategies to avoid coupled noise · Sampling, reproduction, and transmission distortions · (tape) noise reduction

Digital sound and processing: capture, storage, reproduction · on APIs (and latency) · programming and codecs · some glossary · Audio and signal processing - unsorted stuff

Music electronics: device voltage and impedance, audio and otherwise · amps and speakers · basic audio hacks · Simple ADCs and DACs · digital audio · multichannel and surround
On the stage side: microphones · studio and stage notes · Effects · sync

Electronic music: Some history, ways of making noises · Gaming synth

Modular synth (eurorack, mostly): sync · power supply · formats (physical, interconnects)

Unsorted: Visuals DIY · Signal analysis, modeling, processing (some audio, some more generic) · Music fingerprinting and identification

For more, see Category:Audio, video, images

(See also the glossary)


Visualization types

Transforms; the relation between waveforms and frequency content

...often for converting between time domain and frequency domain.

Fourier transform

Note: If you like math, you will prefer other explanations. This one is going more for intuition, context, and footnotes of practice. (Well, ideally)

Basic idea, and some uses

Jean Baptiste Joseph Fourier figured that any periodic vibration can be constructed as a weighted sum of an infinite series of harmonically related sinusoids.

Where 'harmonically related' means 'all of these sines have a frequency that is a multiple of a base frequency'.

So Fourier analysis takes a signal, and expresses it as how much of each such periodic component there is.

Mathematically, the result expresses things in terms of magnitude and phase for each frequency component.

Magnitude is the amount of presence of this particular frequency
Phase is mostly necessary to reconstruct it accurately, intuitively to shift each one sideways as necessary, so that it combines best

When you have both, the Fourier-space data is equivalent information to what it came from, so if rounding errors / floating point errors weren't a thing, you could go back and forth at will.

Some applications rely on this equivalence, often to do parts of their processing in real space or in frequency space, whichever is most convenient to express or calculate that change in. (Others do not use all information anyway. In particular visualisation usually just throws away the phase data (see spectrograms), because it only cares to show the magnitude of each frequency component, not to reconstruct the original precisely.)
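This equivalence is easy to check directly; a minimal sketch using NumPy (the signal length and random data here are arbitrary illustrations):

```python
import numpy as np

# A short arbitrary signal.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)

X = np.fft.fft(x)            # complex frequency-domain representation
magnitude = np.abs(X)        # how much of each frequency component
phase = np.angle(X)          # where each component sits, to make reconstruction line up

# Magnitude and phase together are equivalent information:
# rebuilding the complex spectrum from them and inverting
# recovers x up to floating-point error.
X_rebuilt = magnitude * np.exp(1j * phase)
x_back = np.fft.ifft(X_rebuilt).real

print(np.allclose(x, x_back))  # True
```

Throwing away `phase` before the inverse transform is exactly what a spectrogram-style visualisation does, which is why it cannot reconstruct the original.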

Fourier synthesis is the reverse operation: it takes frequency-space information and synthesizes the (often time-series) data it represents.

Synthesis is sometimes seen by itself, doing signal generation, such as in mimicking some types of sounds.

More frequently, Fourier analysis and Fourier synthesis are combined, often to do filtering that is much easier to express in Fourier space.

Fourier implementations can be used to assist some other calculations, often because the transform is mathematically related to convolution.

For example, you can do large convolutions more efficiently (than naive convolution implementations), because the convolution theorem implies that element-wise multiplication in the frequency domain is equivalent to convolution in the time domain.
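A sketch of this, assuming NumPy: zero-padding both inputs to the full output length makes the FFT's circular convolution match linear convolution:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
h = rng.standard_normal(30)

# Naive (direct) linear convolution, output length len(x) + len(h) - 1
direct = np.convolve(x, h)

# FFT-based: zero-pad both to the full output length, multiply the
# spectra element-wise, transform back. This is the convolution theorem
# in action.
n = len(x) + len(h) - 1
via_fft = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)).real

print(np.allclose(direct, via_fft))  # True
```

For large inputs the FFT route is O(n log n) instead of O(n²), which is why libraries switch to it past some size.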

Practicalities - constraints

Bad words: infinite

Infinite sounds bad for real world application.

Not as bad as you may think.

For discrete-interval data, the amount of sinusoids necessary is strictly bound by the input size.

...because discrete data is implicitly band-limited - it cannot represent anything higher.

If it has a sample rate, it is band limited. Nyquist had something to say about this too.

This sounds like math sidestepping the real issue with a definition, but in practice it just points out that you have already faced (and probably already dealt with) this issue earlier:

If your discrete data is a sampling of something that did contain something higher-frequency, you've either not sampled it at all (e.g. because it was filtered out before it got to the sensor), or have sampled it with some potentially nasty artifacts (primarily aliasing)

For example, devices made to record sound are almost all designed to filter out frequencies higher than people, the medium, and ADCs can deal with. (There are some variants of how - it can be done in a hardware filter (e.g. RC circuit), but these days the ADC is typically doing further filtering via oversampling)

If an ADC didn't do any filtering, the data would still be band limited, but it would probably include aliased signal (not very much, because most microphones barely respond over ~15kHz, and the world is quiet in this range anyway, at least relative to the human-audible frequencies).

Bad words: periodic

The word periodic turns out to be the larger problem.

A fundamental property/assumption of the FT is that it essentially assumes the entire signal is a single infinitely repeating chunk (not exactly, but intuitively true enough).

Even for very periodic signals, even a perfect sine wave, perfect periodicity happens only when the window size is an exact integer multiple of the wavelength. In other words, almost never.

If you intuit the FT as a tuning fork for each output bucket, then it will still assign mostly to the bucket that responds the most, but the adjacent few will do so too. This is one reason for spectral leakage.
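A small NumPy illustration of this (the frequencies and the significance threshold are arbitrary choices): a sine that fits the window exactly occupies a single bin, while one that does not leaks into many:

```python
import numpy as np

N = 256
t = np.arange(N)

# Exactly 8 cycles in the window: all energy lands in one bin.
on_bin = np.sin(2 * np.pi * 8 / N * t)
# 8.5 cycles: the period doesn't fit the window, so energy leaks
# into neighboring (and not-so-neighboring) bins.
off_bin = np.sin(2 * np.pi * 8.5 / N * t)

def significant_bins(x, rel=1e-6):
    """Count bins carrying more than a tiny fraction of the peak magnitude."""
    mag = np.abs(np.fft.rfft(x))
    return int(np.sum(mag > rel * mag.max()))

print(significant_bins(on_bin))   # 1
print(significant_bins(off_bin))  # many
```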

Another practical problem is that, between the first and last sample in your data, there will usually be a large step in values.

Which is a discontinuity, just like a large step in the middle of your data would be. The FT will try to model this as best it can, which it will do using all the frequencies - this is a problem with any pulse or step, and is also spectral leakage. (...yes, this is a vague description. Find the math if you care)

Since these other frequencies are not actually present in your data, it's effectively wrong.

Not much energy leaks, and if you're only showing the spectrum, it just blurs it a tiny bit.

Yet if doing FT space filtering, you've introduced new signal, and exactly how much depends on implementation details and parameters.

Another issue is that the longer a sample is, the likelier any real-world signal is to contain something non-periodic. Just because most real signals are not pure and perfectly behaved.

Say, 0.1 seconds from a piece of music might mostly contain a sustained note, or silence, or something noisy, or something else somewhat specific. A 10 second chunk? Rarely. A 100 second chunk? No.

Sure you can analyze these well enough, in that synthesis will reconstruct the same thing. But the less periodic the content really is, the less intuitive the frequency-space data is to interpret or manipulate.

This is related to the fact that you get no time information. The basic FT gives output that summarizes the overall frequency content of the entire input.

...and phase content that puts things in the right place in time, which is very much central to reconstruction because of the way it combines. In fact, the phase is arguably more information-dense than the amplitudes. It's just so entangled that it is almost impossible to read off anything useful from the phase information.

If you want frequencies over time information, such as a spectrogram[1], you probably want the STFT (short-time FT). This is a controlled tradeoff between time resolution and frequency resolution, by taking the FT of short chunks of signal, which both localizes and reduces the problems mentioned above (e.g. considering music in (overlapping) 0.1 second chunks).

It's not perfect, but it's good enough for many needs. For example, music and other sound analysis is typically done this way.
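A minimal magnitude-STFT sketch, assuming NumPy (window size, hop, and the test tone are illustrative; real code would typically use a library routine such as scipy.signal.stft):

```python
import numpy as np

def stft_mag(x, win_size=256, hop=128):
    """Magnitude STFT: Hann-windowed, overlapping FFTs of short chunks."""
    window = np.hanning(win_size)
    starts = range(0, len(x) - win_size + 1, hop)
    frames = [np.abs(np.fft.rfft(x[s:s + win_size] * window)) for s in starts]
    return np.array(frames)   # shape: (time frames, frequency bins)

# A test tone that changes frequency halfway through:
# a single whole-signal FT would mix both tones together,
# while the STFT separates them in time.
sr = 8000
t = np.arange(sr) / sr
x = np.concatenate([np.sin(2 * np.pi * 440 * t[:sr // 2]),
                    np.sin(2 * np.pi * 880 * t[:sr // 2])])

spec = stft_mag(x)
# the peak frequency bin of the first and last frame differ
print(spec[0].argmax(), spec[-1].argmax())
```

Each bin is sr/win_size wide here (31.25 Hz), which is the frequency-resolution half of the tradeoff; the 128-sample hop is the time-resolution half.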

On 'Band limited'

Your data must be band limited, meaning it must have no frequency content beyond a cutoff band.

With digital samples, this is mostly something you need to think about at the sampling step.

Once you have discrete-interval data, it's implicitly already band-limited. It can no longer represent frequencies beyond its Nyquist frequency.

That doesn't always mean your sampled data is exactly what you wanted.

For example, if the band limit of your recording is lower (because of a low sample rate) than the frequency of the sound you recorded, and you didn't filter out the physically present higher frequencies, they will be present in aliased form.

For sound recording, this is almost always solved by the recording device doing implicit filtering, typically as an electronic lowpass (or sometimes via an oversampling ADC).

But in more generic hardware, it's something you need to think about.
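The aliasing described above is easy to demonstrate; a sketch assuming NumPy (sample rate and tone frequency are arbitrary):

```python
import numpy as np

sr = 1000                      # 1 kHz sample rate -> 500 Hz Nyquist
t = np.arange(sr) / sr

# A 700 Hz tone sampled at 1 kHz cannot be represented;
# it shows up aliased at |sr - 700| = 300 Hz instead.
tone = np.sin(2 * np.pi * 700 * t)
mag = np.abs(np.fft.rfft(tone))
print(mag.argmax())            # 300, not 700
```

Nothing in the sampled data reveals that the original was 700 Hz - which is exactly why the filtering has to happen before sampling.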

The reason behind the requirement is that mathematically, Fourier analysis is guaranteed to converge only for band-limited signals.

Which is mostly just a mathematical property of the general theory related to the word 'infinite' in 'infinite sum of sinusoids'. (Without being band limited, such as for FTs on continuous mathematical functions, it would actually be infinite, or at least become a more complex issue.)

So for band-limited data, a finite sum is enough. In fact, there is a hard bound on the necessary amount of components.

(And for a lot of everyday sound, a modest number of components will already be a decent approximation)

For example, a sweater that shows moire patterns on TV is entirely band limited once on TV, but not quite what you intended to record, because the sampling was rougher than what was there. What you would generally prefer in this case is filtering before sampling so that it's fuzzy, instead of distracting.

Practicalities - common solutions/uses/alternatives

On windowing

The way people often use the Short-Time Fourier Transform (STFT), plus window function, will address the 'no time information', 'has to be periodic', 'likely step at the edge', and optionally the 'avoid losing too much information' issues together.

'Windowing' can refer to both taking a block of data at a time, and to applying an amplitude envelope within that window. That envelope is typically called a window function.

In many applications (e.g. spectrograms, because STFT) you do both.

You suppress the step-at-the-signal-edge discontinuity by suppressing the entire signal at the edge with the mentioned window function (envelope).

If you care to get the most out of your signal (or get representative amplitudes), then you will care that this lowers the signal strength before handing it to the FT.

If you were to partition the data (i.e. no overlap) and apply a windowing function to each block, you would effectively be ignoring part of it, and so lose part of the signal. In spectrograms you might not care, but in low-signal-to-noise applications you may wish to overlap these blocks.

Your choices:

  • choice of window size is a tradeoff between time resolution and frequency resolution (generic STFT stuff)
  • choice of window function offers a few different tradeoffs - roughly how large the main lobe of the frequency response is and how large the tail is.
  • ideal window overlap depends on the exact shape of the window function.
    • Exactly how much overlap is (mathematically) ideal depends on the characteristics of the windowing function - described in a few places, see e.g. "spectrum and spectral density estimation by the discrete fourier transform".
    • Secondarily, on the computational resources you care to spend, as it's a diminishing-returns thing.
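The overlap point can be illustrated with NumPy: at 50% overlap, shifted Hann windows sum to a near-constant envelope, so no stretch of signal is structurally under-weighted (sizes here are arbitrary):

```python
import numpy as np

win_size, hop = 256, 128       # 50% overlap
w = np.hanning(win_size)       # note: np.hanning is the symmetric variant

# Sum the shifted copies of the window over a stretch of signal:
# with the right overlap, the envelopes add up to a (near-)constant.
length = win_size * 4
total = np.zeros(length)
for start in range(0, length - win_size + 1, hop):
    total[start:start + win_size] += w

# Ignore the ramp-up/ramp-down at the very edges.
middle = total[win_size:-win_size]
print(middle.min(), middle.max())   # nearly flat, around 1.0
```

With the periodic Hann variant the sum is exactly constant; the symmetric variant shown here leaves a tiny ripple, which is usually negligible.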
Data massaging and tradeoffs - what happens when you...
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)
...choose different window sizes for analysis?
Interpretation: sound, images, etc.

Practicalities - alternatives, interpretation

Relevant to choice of library functions
On real and complex input/output, and symmetry
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

On shift
On and even/odd-sized input
Output - value scale and units
Output - frequency bins
Output - bin width and position
Output - note on (non)linearity


Spectral leakage, window functions
(Temporal) Aliasing
Picket fence effect
Ringing (Gibbs phenomenon)

More math

Relations between fourier, convolution, and correlation


Multi dimensional FT

Variants and alternatives

See also

Wavelet transforms

Hartley transform

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

Similar to the Fourier transform, except that it

  • transforms real data to real data
  • is its own inverse (aside from scaling)

The FHT (fast discrete HT) is not fundamentally faster to calculate than FFT variants that work on real data, although in specific cases there are more optimizations, so in simpler implementations the FHT uses slightly less calculation than the FFT.

This (and the 'its own inverse' property sometimes being convenient) means you see the FHT in memory-limited hardware.

For example: http://wiki.openmusiclabs.com/wiki/ArduinoFHT
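The real-to-real and self-inverse properties are easy to verify via the FFT, since the DHT of real input is Re(F) - Im(F); a NumPy sketch:

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform of real input, via the FFT:
    H[k] = sum x[n] * cas(2*pi*n*k/N), with cas = cos + sin,
    which equals Re(FFT) - Im(FFT)."""
    F = np.fft.fft(x)
    return F.real - F.imag

x = np.random.default_rng(5).standard_normal(32)
H = dht(x)

# Real in, real out, and its own inverse up to a scale factor of n:
x_back = dht(H) / len(x)
print(np.allclose(x, x_back))  # True
```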

Related concepts

For a wider overview, you will want to distinguish between the various related terms, which include (roughly ordered from more general/abstract to more applied/discrete):

  • Z-transform
    • something of a generalization of the Discrete-time Fourier transform (DTFT, and not to be confused with the DFT)
    • Takes a discrete real-numbered or complex-numbered time-domain signal and calculates a complex frequency-domain representation (apparently making it a discrete variation of the Laplace transform)
  • Fourier Series
    • Usually refers to the idea that x(t) is the sum of an infinite number of sinusoids
  • Fourier series integral
    • Mathematical basis for Fourier analysis
  • Fourier Analysis
    • x(t) to {ak} (time series to frequencies)
    • (the effective opposite of Fourier Synthesis)
  • Fourier Synthesis
    • {ak} to x(t) (frequencies to time series)
    • (the effective opposite of Fourier Analysis)
    • Note that most Fourier transforms have an inverse to allow (re-)synthesis, e.g. iDFT, iFFT, etc.
  • Fourier Transform(s)
    • Can refer to
      • The result of a Fourier analysis (the frequency-domain information)
      • The method of converting to that domain, and usually back from its results (Fourier analysis, and synthesis)
      • The more mathematical definition of the Continuous Fourier Transform, which transforms a function of a real variable into another, in the frequency domain (in which both are continuous and unbounded)
      • A group of implementations for this transform
  • The Discrete Fourier Transform (DFT)
    • a specific, practicalized version of the (continuous) fourier transform, in that
      • it is finite-domain and for discrete-time functions
      • input function must be discrete - a real- or complex-numbered time series, often from sampling a continuous signal
      • only analyses frequency content required to approximate the given segment (unlike the DTFT)
  • Fast Fourier Transform (FFT)
    • Refers to a number of different algorithms to quickly calculate a DFT
    • ...since the straightforward calculation of the DFT takes O(n²), and the FFT takes O(n log n)
  • Short-time Fourier Transform (STFT)
    • Refers to the fourier transform (often specifically analysis) in which windowing is applied (in the forms of continuous theory, and/or discrete application)
    • ...for a controlled tradeoff between frequency resolution and time resolution.
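To make the DFT/FFT relation concrete, here is a naive O(n²) implementation straight from the definition, checked against NumPy's FFT (illustrative only; never use the naive version for real work):

```python
import numpy as np

def naive_dft(x):
    """Direct O(n^2) DFT straight from the definition."""
    n = len(x)
    k = np.arange(n)
    # outer(k, k) builds the k*m exponent table; one row per output bin
    twiddle = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return twiddle @ x

x = np.random.default_rng(2).standard_normal(128)
print(np.allclose(naive_dft(x), np.fft.fft(x)))  # True
```

Same transform, same output - the FFT is purely an algorithmic speedup, not a different mathematical object.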

Other related concepts and transforms:

See also

  • Zero Padding Does Not Buy Spectral Resolution, Aug 2004,


  • Maria Elena Angoletta, Fourier Analysis / Part 2: Technicalities, FFT & system analysis, Aug 2004,


  • Modulated complex lapped transform (MCLT)

Other frequency-related transforms

Cepstrum, Cepstral transform

Filtering theory

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

Time-domain filters

Frequency-domain filters

Filters and analyses

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

Frequency pass and stop filters


Lowpass filter

Filters out frequencies higher than a cutoff frequency.

Regularly used in analog form, e.g. before sampling sound in sound cards, in DSL filters, etc.

Note that blurring samples works as a lowpass filter, in sound, images, and elsewhere.

Anti-aliasing filters are regularly lowpass filters.

See also Low-pass filter
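The blurring-as-lowpass point can be checked numerically; a sketch assuming NumPy, with an arbitrary 8-sample moving average as the 'blur':

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(4096)            # white noise: roughly flat spectrum

# "Blurring" = moving average over 8 neighbors
kernel = np.ones(8) / 8
blurred = np.convolve(x, kernel, mode='same')

def band_energy(sig, lo_frac, hi_frac):
    """Spectral energy between two fractions of the Nyquist frequency."""
    mag2 = np.abs(np.fft.rfft(sig)) ** 2
    n = len(mag2)
    return mag2[int(lo_frac * n):int(hi_frac * n)].sum()

# High-frequency energy drops far more than low-frequency energy.
for name, sig in [("original", x), ("blurred", blurred)]:
    print(name, band_energy(sig, 0.0, 0.1), band_energy(sig, 0.8, 1.0))
```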


Highpass filter

Filters out frequencies lower than a cutoff frequency.


Band-stop filter

Also known as band limit filter, notch filter, 'T-notch filter', 'band-elimination filter', and 'band-reject filter'.

The band that is filtered out is referred to as the stopband.


Band-pass filter

The band that is let through is referred to as the passband.

Can be seen as the combination of a high-pass and a low-pass.

Band-related tools


Subband coding

Separate out different frequency bands, to use different coding for each (for reasons of efficient transmission).


Equalization

Equalization refers to sound processing that alters the frequency content/envelope, for correction (e.g. of a recording room's acoustics) or to give sound a particular feel or focus.

Fairly common in software is the graphic equalizer, which adjusts something like 20Hz-20kHz with various sliders, in bands of certain width. There are also hardware equivalents, which may offer feedback detection, to avoid feedback caused by peak sounds (such as vocals) or bad setups (where microphones strongly pick up speaker sound).

Pre-emphasis refers to structurally amplifying some frequencies. For example, vinyl records are mastered with their lower frequencies perhaps 30dB quieter than high frequencies, largely because low frequencies require larger groove size. Mastering with less present low frequencies allows closer grooves and therefore more recording time. Phono pre-amps will equalize with the inverse of what was applied to the master, so that they produce approximately what was recorded.
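A sketch of the general pre-emphasis idea, assuming NumPy. Note this is a generic first-order filter with an arbitrary coefficient, not the actual (more complex) RIAA curve used for vinyl:

```python
import numpy as np

def pre_emphasis(x, a=0.95):
    """Boost highs relative to lows: y[n] = x[n] - a*x[n-1]."""
    return np.append(x[0], x[1:] - a * x[:-1])

def de_emphasis(y, a=0.95):
    """Exact inverse: a first-order IIR, x[n] = y[n] + a*x[n-1]."""
    x = np.zeros_like(y)
    x[0] = y[0]
    for n in range(1, len(y)):
        x[n] = y[n] + a * x[n - 1]
    return x

# Emphasis followed by matching de-emphasis recovers the original.
sig = np.sin(2 * np.pi * 5 * np.arange(200) / 200)
restored = de_emphasis(pre_emphasis(sig))
print(np.allclose(sig, restored))   # True
```

The same boost-then-inverse pattern appears in FM broadcast and in tape noise reduction; what differs is the exact curve.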

Comb filter

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

A comb filter adds a delayed copy of a signal to itself.
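A minimal feed-forward comb filter in NumPy (delay and length are arbitrary); its spectrum shows the regular notches the name refers to:

```python
import numpy as np

def feedforward_comb(x, delay, gain=1.0):
    """y[n] = x[n] + gain * x[n - delay] (treating pre-signal samples as zero)."""
    y = x.copy()
    y[delay:] += gain * x[:-delay]
    return y

# The impulse response is [1, 0, ..., 0, gain], and its spectrum has
# evenly spaced notches - the "teeth" of the comb.
delay = 8
impulse = np.zeros(64)
impulse[0] = 1.0
h = feedforward_comb(impulse, delay)
mag = np.abs(np.fft.rfft(h))
print(mag.round(2))   # peaks of 2.0, with notches near 0 at regular intervals
```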


More complex applications

(...which are sometimes presented as complete filters)

Noise, SNR, etc.

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

Methods for noise reduction / signal (recognition) enhancement assistance:

  • Spectral subtraction (subtracting a pure noise signal's spectrum from the signal's)
  • time domain neural methods (e.g. direct, Kalman)
  • frequency domain neural methods
  • wavelet shrinkage/thresholding
  • ...many more
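A bare-bones spectral subtraction sketch, assuming NumPy and a single whole-signal FFT (real implementations work on STFT frames and use smarter noise-floor estimates):

```python
import numpy as np

def spectral_subtract(noisy, noise_profile, floor=0.0):
    """Subtract an estimated noise magnitude spectrum; keep the noisy phase."""
    spec = np.fft.rfft(noisy)
    mag = np.abs(spec) - np.abs(np.fft.rfft(noise_profile))
    mag = np.maximum(mag, floor)          # magnitudes can't go negative
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(noisy))

rng = np.random.default_rng(4)
n = 1024
clean = np.sin(2 * np.pi * 50 * np.arange(n) / n)
noise = 0.3 * rng.standard_normal(n)
noise_estimate = 0.3 * rng.standard_normal(n)   # e.g. a "silence-only" recording

denoised = spectral_subtract(clean + noise, noise_estimate)
err_before = np.mean(((clean + noise) - clean) ** 2)
err_after = np.mean((denoised - clean) ** 2)
print(err_after < err_before)
```

Note that the phase stays noisy; spectral subtraction only cleans up magnitudes, which is part of why it leaves characteristic "musical noise" artifacts.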

Pitch detection, frequency estimation

Envelope detection

Beat/tempo detection

See Descriptions_used_for_sound_and_music#Beat_and_tempo

Dynamic range compression

Dynamic range compression refers to varying amplification that adapts to loudness.

In music mixing/mastering (where context means it's often just called 'compression'), it is often used to make different kinds of sources (instruments, vocals) behave similarly, and to let you e.g. draw attention to one thing and make another a background texture.

Yes, just adjusting volume does that, but not always well enough.

Some instruments have a much larger dynamic range, that is, the difference between the loudest parts and quiet parts is large - say, with drums. You could adjust the volume according to how it's played -- or you could let dynamic range compression essentially just do that for you.
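A heavily simplified feed-forward compressor sketch, assuming NumPy; threshold, ratio, and time constants are arbitrary illustrative values:

```python
import numpy as np

def compress(x, threshold=0.5, ratio=4.0, attack=0.01, release=0.1, sr=44100):
    """Envelope follower plus gain reduction above a threshold."""
    a_att = np.exp(-1.0 / (attack * sr))
    a_rel = np.exp(-1.0 / (release * sr))
    env = 0.0
    out = np.empty_like(x)
    for n, s in enumerate(x):
        level = abs(s)
        coeff = a_att if level > env else a_rel     # track up fast, down slowly
        env = coeff * env + (1 - coeff) * level
        if env > threshold:                         # squash the part above threshold
            gain = (threshold + (env - threshold) / ratio) / env
        else:
            gain = 1.0
        out[n] = s * gain
    return out

loud = np.sin(2 * np.pi * 220 * np.arange(4410) / 44100)    # 0.1 s tone, peak 1.0
squashed = compress(loud)
# once the envelope has settled, peaks are clearly reduced
print(np.abs(loud[2205:]).max(), np.abs(squashed[2205:]).max())
```

The attack/release settings are exactly the "speed at which it allows changes" knobs mentioned above; the initial transient passes through at full level because the envelope hasn't caught up yet.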

There are some tricks that make for specific sounds, such as the (apparently) increased decay time, the way drums don't drown out vocals, the way techno beats have bits of silence for emphasis (arguably a mistake, but we're used to it now). This means tweaking the parameters (e.g. the speed at which it allows changes), and amounts to informed use.

Dynamic compression also has risks, such as making things sound less natural, and that you end up with less detail in your mix than in your recording.

The largest potential problem is blanket uninformed application.

For example, home cinema systems often apply this so that you do not have to keep changing the volume in movies with both silent and loud parts, at expense of some quality (that is, dynamic range).

That's useful, and you can disable it if you care.

The same thing is done in broadcasts, to make a radio/tv station sound both consistent in level and perceptually louder. This has no direct use for us, and we can't choose the quality.

Record producers have caught wind of the concept of "compression == perceptually louder", and since they figure "louder = more noticeable = more sales" (probably true), they force mastering technicians to apply more overall compression. None of that volume variation, psh.

Which is sort of stupid, because radio stations do this anyway. (If not quite as aggressively, because it doesn't sound so good) Essentially, record companies sell all their CDs with effectively lowered quality, on the off chance that a station (and/or DJ) is completely ignorant of these details and this squeezes out slightly more perceptual volume.

The loudness war this resulted in means that perceptual sparkle (not easily quantifiable, since it is an implication of dynamic range that depends on the particular recorded signal/music itself) is reduced, sometimes significantly.

(Interestingly, vinyl versions of music may be better -- old ones because it was before this nonsense, new ones because their target market excludes broadcast)


Vocoder

Voice coders are now mostly used for distorting vocals and instruments in music. The original idea was to parametrize speech for efficient transmission.

A basic vocoder is an analysis/synthesis system, that puts each band from a multiband filter through an envelope follower (an analog circuit), resulting in the control signals from that follower. Sending these signals is a simple form of encryption (mostly by obscurity).

A phase vocoder

See also:

Linear Predictive Coding (LPC)

Analysis of the spectral envelope of a digital signal, with the assumption of the basic case of voiced speech, and therefore useful for speech parameter estimation.

e.g. seen in LPC vocoders, often for speech transmission.
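A sketch of the autocorrelation method for LPC, assuming NumPy (the model order and the synthetic test process are illustrative):

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation (Yule-Walker) method."""
    # biased autocorrelation, lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # solve the Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Known AR(2) process: x[n] = 0.9*x[n-1] - 0.5*x[n-2] + noise.
# LPC should recover those coefficients from the signal alone.
rng = np.random.default_rng(6)
e = rng.standard_normal(20000)
x = np.zeros(20000)
for n in range(2, len(x)):
    x[n] = 0.9 * x[n - 1] - 0.5 * x[n - 2] + e[n]

print(lpc(x, 2).round(2))   # close to [0.9, -0.5]
```

For speech, the recovered coefficients describe the spectral envelope (roughly, the vocal tract filter), which is why a handful of them per frame is enough for transmission.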

Warped LPC (WLPC)

CELP and variants

Code excited linear prediction (CELP), with variants including ACELP, RCELP, LD-CELP (ITU-T G.728) and VSELP, and the older RELP, use source-filter models of speech production.

Spectral whitening

Activity Recognition

Fourier correlation notes

See also:

Unsorted / See also