# Signal analysis, modeling, processing


## Transforms; the relation between waveforms and frequency content

...often for converting between time domain and frequency domain.

### Fourier transform

Note: If you like math, you will prefer other explanations. This one is going more for intuition, context, and footnotes of practice. (Well, ideally)

#### Basic idea, and some uses

Jean Baptiste Joseph Fourier figured that any periodic vibration can be constructed as a weighted sum of an infinite series of harmonically related sinusoids.

Where 'harmonically related' means that all of these sines have frequencies that are integer multiples of a base frequency.

So Fourier analysis takes a signal, and expresses it as how much of each such periodic component there is.

Mathematically, the result expresses things in terms of magnitude and phase for each frequency component.

Magnitude is the amount of presence of this particular frequency
Phase is mostly necessary to reconstruct the signal accurately: intuitively, it shifts each component sideways as necessary, so that they combine correctly

When you have both, the fourier-space data is actually equivalent information to what it came from, so if rounding errors / floating point errors weren't a thing, you could go back and forth at will.
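
As a rough illustration of that equivalence, here is a minimal sketch using a naive DFT written with only the standard library. The function names `dft` and `idft` and the test signal are made up for this example; real code would use an FFT library instead.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (analysis)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT (synthesis)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# Hypothetical test signal: 2 cycles of a cosine plus 5 cycles of a weaker sine.
n = 32
signal = [math.cos(2 * math.pi * 2 * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
          for t in range(n)]

spectrum = dft(signal)
magnitudes = [abs(c) for c in spectrum]      # how much of each frequency
phases = [cmath.phase(c) for c in spectrum]  # where each one sits in time

# Going back: synthesis from the full complex spectrum recovers the input.
rebuilt = [c.real for c in idft(spectrum)]
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

With this unnormalized convention, a unit-amplitude sinusoid that lands exactly on a bin shows up with magnitude n/2, and `max_error` is down at floating point rounding level.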

Some applications rely on this equivalence, often to do parts of their processing in real space or in frequency space, whichever is most convenient to express or calculate that change in. (Others do not use all information anyway. In particular visualisation usually just throws away the phase data (see spectrograms), because it only cares to show the magnitude of each frequency component, not to reconstruct the original precisely.)

Fourier synthesis is the reverse operation, taking frequency-space information, and synthesizes the (often time-series) data it represents.

Synthesis is sometimes seen by itself, doing signal generation, such as in mimicking some types of sounds.

More frequently, Fourier analysis and Fourier synthesis are combined, often to do filtering that is much easier to express in Fourier space.

Fourier implementations can be used to assist some other calculations, often because it is mathematically related to convolutions.

For example, you can do large convolutions more efficiently (than naive convolution implementations), because the convolution theorem implies that element-wise multiplication in the frequency domain is equivalent to convolution in the time domain.
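
A small sketch of the convolution theorem at work. The names are hypothetical, and the transform is a naive O(n^2) DFT to stay self-contained, so this shows only the equivalence and not the speedup (which needs a real FFT). Note the zero-padding to the full output length, without which you would get circular rather than linear convolution.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive DFT, or inverse DFT when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def direct_convolve(a, b):
    """Textbook time-domain convolution."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def fourier_convolve(a, b):
    """Convolution via element-wise multiplication in the frequency domain."""
    n = len(a) + len(b) - 1
    # Zero-pad both inputs to the full output length.
    A = dft(list(a) + [0.0] * (n - len(a)))
    B = dft(list(b) + [0.0] * (n - len(b)))
    product = [p * q for p, q in zip(A, B)]
    return [v.real for v in dft(product, inverse=True)]

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, -1.0, 0.25]
direct = direct_convolve(a, b)
via_fourier = fourier_convolve(a, b)
max_difference = max(abs(p - q) for p, q in zip(direct, via_fourier))
```

Both routes give the same six output values, up to rounding.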

#### Practicalities - constraints

Infinite sounds bad for real world application.

Not as bad as you may think.

For discrete-interval data, the amount of sinusoids necessary is strictly bound by the input size.

...because discrete data is implicitly band-limited - it cannot represent anything higher.

This sounds like math sidestepping the real issue with a definition, but in practice it just points out that you already faced (and probably already dealt with) this issue earlier:

If your discrete data is a sampling of something that did contain something higher-frequency, you've either not sampled that content at all (e.g. because it was filtered out before it got to the sensor), or have sampled it with some potentially nasty artifacts (primarily aliasing).

For example, devices made to record sound are almost all designed to filter out higher frequencies than people, the medium, and ADCs can deal with. (There are some variants of how - it can be done with a hardware filter (e.g. an RC circuit), but these days the ADC is typically doing further filtering via oversampling.)

If an ADC didn't do any filtering, the data would still be band limited, but it would probably include aliased signal (not very much, because most microphones barely respond over ~15kHz, and the world is quiet in this range anyway, at least relative to the human-audible frequencies).

The word periodic turns out to be the larger problem.

A fundamental property/assumption of the FT is that it essentially assumes the entire signal is a single infinitely repeating chunk (not exactly, but intuitively true enough).

Even for very periodic signals, even a perfect sine wave, perfect periodicity within the window happens only when the window size is an exact multiple of the wavelength. In other words, basically never.

If you intuit the FT as a tuning fork for each output bucket, then it's still mainly the right (closest) bucket that responds the most, but the adjacent few will do so too. This is one reason for spectral leakage.
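
To make that concrete, a minimal sketch (hypothetical names, naive DFT) comparing a sine that fits the window a whole number of times with one that does not:

```python
import cmath
import math

def magnitude_spectrum(x):
    """Magnitudes of the naive DFT, for bins 0 .. n/2 (enough for real input)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def significant_bins(mags, fraction=0.01):
    """Indices of bins holding more than `fraction` of the peak magnitude."""
    peak = max(mags)
    return [k for k, m in enumerate(mags) if m > fraction * peak]

n = 64
# Exactly 8 cycles fit the window: the implied repetition is seamless.
fits = [math.sin(2 * math.pi * 8.0 * t / n) for t in range(n)]
# 8.5 cycles: the implied repetition has a jump at the window edge.
misfit = [math.sin(2 * math.pi * 8.5 * t / n) for t in range(n)]

bins_fits = significant_bins(magnitude_spectrum(fits))
bins_misfit = significant_bins(magnitude_spectrum(misfit))
```

The first case lights up a single bucket. The second still peaks around buckets 8 and 9, but spreads noticeable energy over many others. The 1% threshold is an arbitrary choice for this example.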

Another practical problem is that, between the first and last sample in your data, there will usually be a large step in values.

This is a discontinuity, just like a large step in the middle of your data would be. The FT will try to model it as best it can, which takes contributions from all frequencies. This is a problem with any pulse or step, and is another source of spectral leakage. (This is a vague description; find the math if you care.)

Since this one is not even actually present in your data, it's effectively wrong, though note that there's only so much energy in it.

If just showing the spectrum, you've muddled it a tiny bit. If doing FT space filtering, you've introduced new signal (and exactly how much depends on implementation details).

Another problem is that the longer a sample is, the likelier any real-world signal is to contain something non-periodic. Just because most real signals are not pure and perfectly behaved.

Say, 0.1 seconds from a piece of music is likely to mostly just contain a sustained note, or silence, or something noisy, or something else somewhat specific. A 10 second chunk? Not so much. A 100 second chunk? Basically not.

Sure you can analyze these well enough, in that synthesis will reconstruct the same thing. But the less periodic the content really is, the less intuitive the frequency-space data is to interpret or manipulate.

This is related to the fact that you get no time information. The basic FT gives output that summarizes the overall frequency content of the entire input.

...and phase content that puts things in the right place in time, which is very much central to reconstruction because of the way it combines. In fact, the phase is arguably more information-dense than the amplitudes. It's just so entangled that it is almost impossible to read off anything useful from the phase information.

If you want frequencies over time information, such as a spectrogram[1], you probably want the STFT (short-time FT). This is a controlled tradeoff between time resolution and frequency resolution, by taking the FT of short chunks of signal, which both localizes and reduces the problems mentioned above (e.g. considering music in (overlapping) 0.1 second chunks).

It's not perfect, but it's good enough for many needs. For example, music and other sound analysis is typically done this way.
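
A bare-bones sketch of the idea, assuming a made-up signal that switches from a low tone to a higher one. It only cuts overlapping frames and takes a naive DFT of each, with no window function applied; `stft_magnitudes` and the other names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the naive DFT, for bins 0 .. n/2."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def stft_magnitudes(signal, frame_size, hop):
    """Magnitude spectra of successive (possibly overlapping) frames."""
    starts = range(0, len(signal) - frame_size + 1, hop)
    return [magnitude_spectrum(signal[s:s + frame_size]) for s in starts]

frame_size = 64
hop = 32  # 50% overlap between neighbouring frames

# Hypothetical test signal: a low tone followed by a higher tone.
low = [math.sin(2 * math.pi * 4 * t / frame_size) for t in range(256)]
high = [math.sin(2 * math.pi * 12 * t / frame_size) for t in range(256)]
signal = low + high

frames = stft_magnitudes(signal, frame_size, hop)
# Strongest bin in each frame: a crude "which frequency, when" readout.
peak_bins = [max(range(len(m)), key=m.__getitem__) for m in frames]
```

A single FT over the whole signal would report both tones with no hint of their order. Here the per-frame peaks move from bin 4 to bin 12 around the middle, and only the one frame that straddles the change is ambiguous.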

##### On 'Band limited'

Your data must be band limited, meaning it must have no frequency content beyond a cutoff band.

With digital samples, this is mostly something you need to think about at the sampling step.

Once you have discrete-interval data, it's implicitly already band-limited. It can no longer represent frequencies beyond its Nyquist frequency.

That doesn't always mean your sampled data is exactly what you wanted.

For example, if the band limit of your recording is lower (because of a low sample rate) than the frequency of a sound you recorded, and you didn't filter those physically present higher frequencies out, they will be present in aliased form.

For sound recording, this is almost always solved by the recording device doing implicit filtering, typically as an electronic lowpass (or sometimes via an oversampling ADC).

But in more generic hardware, it's something you need to think about.

The reason behind the requirement is that mathematically, Fourier analysis is guaranteed to converge only for band-limited signals.

Which is mostly just a mathematical property of the general theory related to the word 'infinite' in 'infinite sum of sinusoids'. (Without being band limited, such as for FTs on continuous mathematical functions, it would actually be infinite, or at least become a more complex issue.)

So for band-limited data, a finite sum is enough. In fact, there is a hard bound on the necessary amount of components.

(And for a lot of everyday sound, a modest amount of sums will already be a decent approximation)

For example, a sweater that shows moire patterns on TV is entirely band limited once on TV, but not quite what you intended to record, because the sampling was rougher than what was there. What you would generally prefer in this case is filtering before sampling so that it's fuzzy, instead of distracting.

#### Practicalities - common solutions/uses/alternatives

##### On windowing

The way people often use the Short-Time Fourier Transform (STFT), plus window function, will address the 'no time information', 'has to be periodic', 'likely step at the edge', and optionally the 'avoid losing too much information' issues together.

'Windowing' can refer to both taking a block of data at a time, and to applying an amplitude envelope within that window. That envelope is typically called a window function.

In many applications (e.g. spectrograms, because STFT) you do both.

You suppress the step-at-the-signal-edge discontinuity by suppressing the entire signal at the edge with the mentioned window function (envelope).

If you care to get the most out of your signal (or get representative amplitudes), then you will care that this lowers the signal strength before handing it to the FT.

If you were to partition the data (i.e. no overlap) and apply a windowing function to each block, you would effectively be ignoring part of it, and so lose part of the signal. In spectrograms you might not care, but in low-signal-to-noise applications you may wish to overlap these blocks.

Your choices:

• choice of window size is a tradeoff between time resolution and frequency resolution (generic STFT stuff)
• choice of window function offers a few different tradeoffs - roughly how large the main lobe of the frequency response is and how large the tail is.
• ideal window overlap depends on the exact shape of the window function.
Exactly how much overlap is (mathematically) ideal depends on the characteristics of the windowing function - described in a few places, see e.g. "spectrum and spectral density estimation by the discrete fourier transform".
Secondarily on the computational resources you care to spend, as it's a diminishing-returns thing.
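
A sketch of both effects, assuming a Hann window (one common choice among many) and 50% overlap. All names are hypothetical and the DFT is the naive one.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def magnitude_spectrum(x):
    """Magnitudes of the naive DFT, for bins 0 .. n/2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

n = 64
window = hann(n)
# 8.5 cycles per window: does not fit, so it leaks.
tone = [math.sin(2 * math.pi * 8.5 * t / n) for t in range(n)]

plain = magnitude_spectrum(tone)
windowed = magnitude_spectrum([s * w for s, w in zip(tone, window)])

# Leakage into a bin far from the tone, relative to each spectrum's own peak.
far_bin = 24
plain_leak = plain[far_bin] / max(plain)
windowed_leak = windowed[far_bin] / max(windowed)

# At 50% overlap, neighbouring Hann windows add up to a constant,
# so every sample gets the same total weight across frames.
overlap_sum = [window[i] + window[i + n // 2] for i in range(n // 2)]
```

The windowed version trades a wider main lobe for a much lower tail, and the constant overlap sum is why 50% is a natural overlap for this particular window. Other window functions have other suitable overlaps.
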
##### Data massaging and tradeoffs - what happens when you...
###### ...cut?
 This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

#### Practicalities - alternatives, interpretation

##### Relevant to choice of library functions
###### On real and complex input/output, and symmetry

### Hartley transform


Similar to the Fourier transform, except that it

• transforms real data to real data
• is its own inverse (aside from scaling)
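
Both properties are easy to check with a naive discrete Hartley transform. This is a minimal O(n^2) sketch with a hypothetical `dht` function and made-up data, not a fast implementation.

```python
import math

def dht(x):
    """Naive O(n^2) discrete Hartley transform, using cas(a) = cos(a) + sin(a)."""
    n = len(x)
    return [sum(x[t] * (math.cos(2 * math.pi * k * t / n)
                        + math.sin(2 * math.pi * k * t / n))
                for t in range(n))
            for k in range(n)]

data = [1.0, 4.0, -2.0, 0.5, 3.0, -1.5, 2.0, 0.0]

transformed = dht(data)  # real in, real out: no complex numbers anywhere
# Applying the same transform again returns the input, scaled by n.
round_trip = [v / len(data) for v in dht(transformed)]
max_error = max(abs(a - b) for a, b in zip(data, round_trip))
```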

The FHT (fast discrete HT) is not fundamentally faster to calculate than FFT variants that work on real data, although in specific cases there are more optimizations, so in simpler implementations, the FHT uses slightly less calculation than the FFT.

This (and the 'its own inverse' sometimes being convenient) means you see the FHT in memory-limited hardware.

For example: http://wiki.openmusiclabs.com/wiki/ArduinoFHT

### Related concepts

For a wider overview, you will want to distinguish between the various related terms, which include (roughly ordered from more general/abstract to more applied/discrete):

• Z-transform
  • something of a generalization of the Discrete-time Fourier transform (DTFT, and not to be confused with the DFT)
  • Takes a discrete real-numbered or complex-numbered time-domain signal and calculates a complex frequency-domain representation (apparently making it a discrete variation of the Laplace transform)
• Fourier Series
  • Usually refers to the idea that x(t) is the sum of an infinite number of sinusoids
• Fourier series integral
  • Mathematical basis for Fourier analysis
• Fourier Analysis
  • x(t) to {ak} (time series to frequencies)
  • (the effective opposite of Fourier Synthesis)
• Fourier Synthesis
  • {ak} to x(t) (frequencies to time series)
  • (the effective opposite of Fourier Analysis)
  • Note that most Fourier transforms have an inverse to allow (re-)synthesis, e.g. iDFT, iFFT, etc.
• Fourier Transform(s)
  • Can refer to
    • The result of a Fourier analysis (the frequency-domain information)
    • The method of converting to that domain, and usually back from its results (Fourier analysis, and synthesis)
    • The more mathematical definition of the Continuous Fourier Transform, which transforms a function of a real variable into another, the frequency domain (in which both are continuous and unbounded)
    • A group of implementations for this transform
• The Discrete Fourier Transform (DFT)
  • a specific, practicalized version of the (continuous) Fourier transform, in that
    • it is finite-domain and for discrete-time functions
    • input function must be discrete - a real- or complex-numbered time series, often from sampling a continuous signal
    • only analyses frequency content required to approximate the given segment (unlike the DTFT)
• Fast Fourier Transform (FFT)
  • Refers to a number of different algorithms to quickly calculate a DFT
  • ...since the straightforward calculation of the DFT takes O(n²), and the FFT takes O(n log n)
• Short-time Fourier Transform (STFT)
  • Refers to the Fourier transform (often specifically analysis) in which windowing is applied (in the forms of continuous theory, and/or discrete application)
  • ...for a controlled tradeoff between frequency resolution and time resolution.
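
To illustrate the DFT/FFT distinction in the list above: the sketch below puts a straightforward O(n²) DFT next to a textbook recursive radix-2 FFT, which is just one of the many FFT algorithms and only handles power-of-two lengths. The function names and the sample data are invented for this example.

```python
import cmath
import math

def dft(x):
    """Straightforward DFT: n output bins, each a sum over n samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

samples = [math.sin(0.3 * t) + 0.25 * math.cos(1.1 * t) for t in range(64)]
slow_result = dft(samples)
fast_result = fft(samples)
max_difference = max(abs(a - b) for a, b in zip(slow_result, fast_result))
```

Both compute the same DFT. The FFT gets there by splitting the problem in half at each level, which is where the log n comes from.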

Other related concepts and transforms:

• Maria Elena Angoletta, Fourier Analysis / Part 2: Technicalities, FFT & system analysis, Aug 2004,

• Discrete-time Fourier transform (DTFT)
• Fractional Fourier transform (FRFT)
• Modified discrete cosine transform (MDCT)
• Modulated complex lapped transform (MCLT)

## Filtering theory


## Filters and analyses


### Frequency pass and stop filters

#### Low-pass

Filters out frequencies higher than a cutoff frequency.

Regularly used in analog form, e.g. before sampling sound in sound cards, in DSL filters, etc.

Note that blurring samples works as a lowpass filter, in sound, images, and elsewhere.

Anti-aliasing filters are regularly lowpass filters.
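
The 'blurring is a lowpass' remark can be checked with a moving average, about the simplest blur there is. In this sketch the averaging wraps around at the ends (an assumption made purely to keep the measurement clean), and the names and test frequencies are made up.

```python
import cmath
import math

def moving_average(x, width):
    """Circular moving average: each output is the mean of `width` samples."""
    n = len(x)
    return [sum(x[(t - j) % n] for j in range(width)) / width for t in range(n)]

def amplitude_at(x, cycles):
    """Amplitude of the component with `cycles` whole periods over the signal."""
    n = len(x)
    total = sum(x[t] * cmath.exp(-2j * math.pi * cycles * t / n) for t in range(n))
    return 2 * abs(total) / n

n = 256
slow, fast = 2, 32  # cycles over the whole signal
signal = [math.sin(2 * math.pi * slow * t / n) + math.sin(2 * math.pi * fast * t / n)
          for t in range(n)]

# Averaging over 8 samples, exactly one period of the fast component.
smoothed = moving_average(signal, 8)

slow_before, fast_before = amplitude_at(signal, slow), amplitude_at(signal, fast)
slow_after, fast_after = amplitude_at(smoothed, slow), amplitude_at(smoothed, fast)
```

The slow component passes almost untouched, while the fast one is averaged away. A moving average is a fairly crude lowpass with a bumpy response in between those two extremes, so treat this as a demonstration rather than a filter design.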

#### High-pass

Filters out frequencies lower than a cutoff frequency.

#### Band-stop

Also known as band limit filter, notch filter, 'T-notch filter' and also 'band-elimination filter', and 'band-reject filter'.

The band that is filtered out is referred to as the stopband.

#### Band-pass

The band that is let through is referred to as the passband.

Can be seen as the combination of a high-pass and a low-pass.

### Band-related tools

#### Subband coding

Separate out different frequency bands, to use different coding for each (for reasons of efficient transmission).

### Equalization

Equalization refers to sound processing altering the frequency content/envelope, for correction (e.g. of a recording room's acoustic) or to give sound a particular feel or focus.

Fairly common in software is the graphic equalizer, which adjusts something like 20Hz-20kHz with various sliders, in bands of certain width. There are also hardware equivalents, which may offer feedback detection, to avoid feedback caused by peak sounds (such as vocals) or bad setups (where microphones strongly pick up speaker sound).

Pre-emphasis refers to structurally amplifying some frequencies. For example, vinyl records are mastered with their lower frequencies perhaps 30dB quieter than high frequencies, largely because low frequencies require larger groove size. Mastering with less present low frequencies allows closer grooves and therefore more recording time. Phono pre-amps will equalize the inverse of what was applied to the master, so that they produce approximately what was recorded.

### Comb filter


A comb filter adds a delayed copy of a signal to itself.
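
A minimal sketch of the feedforward form of that idea. The `gain` parameter and all names are assumptions for the example, and samples before the start of the signal are treated as zero.

```python
import math

def feedforward_comb(x, delay, gain=1.0):
    """y[t] = x[t] + gain * x[t - delay], with silence before the start."""
    return [x[t] + (gain * x[t - delay] if t >= delay else 0.0)
            for t in range(len(x))]

delay = 8
n = 128
# Period of twice the delay: the delayed copy arrives exactly inverted.
cancelled = [math.sin(2 * math.pi * t / (2 * delay)) for t in range(n)]
# Period equal to the delay: the delayed copy arrives exactly in step.
reinforced = [math.sin(2 * math.pi * t / delay) for t in range(n)]

out_cancelled = feedforward_comb(cancelled, delay)
out_reinforced = feedforward_comb(reinforced, delay)

# Skip the first `delay` samples, where the delayed copy has not arrived yet.
peak_cancelled = max(abs(v) for v in out_cancelled[delay:])
peak_reinforced = max(abs(v) for v in out_reinforced[delay:])
```

Frequencies whose delayed copy arrives inverted cancel, and those arriving in step double. Since this repeats at regular frequency intervals, the response looks like the teeth of a comb.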

### More complex applications

(...which are sometimes presented as complete filters)

#### Noise, SNR, etc.


Methods for noise reduction / signal (recognition) enhancement assistance:

• Spectral subtraction (subtracting a pure noise signal's spectrum from the signal's)
• time domain neural methods (e.g. direct, Kalman)
• frequency domain neural methods
• wavelet shrinkage/thresholding
• ...many more
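
As an example of the first item, a deliberately idealized sketch of spectral subtraction. It assumes the noise is a steady hum whose magnitude spectrum can be estimated from a noise-only stretch, and that the wanted tone sits at a different frequency. Real noise is rarely this cooperative, and all names here are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive DFT, or inverse DFT when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def spectral_subtract(noisy, noise_only):
    """Subtract the noise magnitude spectrum, keeping the noisy signal's phase."""
    noisy_spec = dft(noisy)
    noise_mag = [abs(c) for c in dft(noise_only)]
    cleaned = []
    for c, nm in zip(noisy_spec, noise_mag):
        mag = max(abs(c) - nm, 0.0)  # floor at zero: magnitudes cannot go negative
        cleaned.append(cmath.rect(mag, cmath.phase(c)))
    return [v.real for v in dft(cleaned, inverse=True)]

n = 64
wanted = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
hum = [0.5 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
# Noise-only stretch: the same hum, caught at a different phase.
hum_reference = [0.5 * math.sin(2 * math.pi * 10 * t / n + 1.0) for t in range(n)]

noisy = [w + h for w, h in zip(wanted, hum)]
cleaned = spectral_subtract(noisy, hum_reference)

error_before = max(abs(a - b) for a, b in zip(noisy, wanted))
error_after = max(abs(a - b) for a, b in zip(cleaned, wanted))
```

Only magnitudes are subtracted, which is why it does not matter that the reference hum has a different phase. With broadband or changing noise the estimate is never this exact, and the leftovers are what cause the typical 'musical noise' artifacts.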

#### Dynamic range compression

Dynamic range compression refers to varying amplification, that adapts to loudness.

In music mixing/mastering (where context means it's often just called 'compression'), it is often used to make different kinds of sources (instruments, vocals) behave similarly, and to let you e.g. draw attention to one thing and make another a background texture.

Yes, just adjusting volume does that, but not always well enough.

Some instruments have a much larger dynamic range, that is, the difference between the loudest parts and the quiet parts is large - say, with drums. You could adjust the volume according to how it's played -- or you could let dynamic range compression essentially just do that for you.

There are some tricks that make for specific sounds, such as the (apparently) increased decay time, the way drums don't drown out vocals, the way techno beats have bits of silence for emphasis (arguably a mistake, but we're used to it now). This means tweaking the parameters (e.g. the speed at which it allows changes), and amounts to informed use.
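
A rough sketch of what such a compressor does, assuming a simple feed-forward design: an envelope follower tracks the loudness, and a static curve turns that into a gain. The parameter names (`threshold_db`, `ratio`, `attack`, `release`) follow common usage, but the values and the exact structure here are illustrative assumptions, not how any particular product works.

```python
import math

def compress(samples, threshold_db=-6.0, ratio=4.0, attack=0.5, release=0.001):
    """Feed-forward compressor sketch: envelope follower plus a static gain curve."""
    envelope = 0.0
    out = []
    for s in samples:
        level = abs(s)
        # Follow rises quickly (attack) and falls slowly (release).
        coeff = attack if level > envelope else release
        envelope += coeff * (level - envelope)
        level_db = 20.0 * math.log10(max(envelope, 1e-9))
        over_db = level_db - threshold_db
        # Above the threshold, each `ratio` dB of input becomes 1 dB of output.
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        out.append(s * 10.0 ** (gain_db / 20.0))
    return out

# Hypothetical test signal: a quiet passage followed by a loud one.
quiet = [0.1 * math.sin(2 * math.pi * t / 50) for t in range(1000)]
loud = [1.0 * math.sin(2 * math.pi * t / 50) for t in range(1000)]
processed = compress(quiet + loud)

quiet_change = max(abs(a - b) for a, b in zip(processed[:1000], quiet))
loud_peak_in = max(abs(v) for v in loud[500:])
loud_peak_out = max(abs(v) for v in processed[1500:])
```

The quiet passage stays below the threshold and passes through unchanged, while the loud passage is turned down, so the difference between the two shrinks. The `attack` and `release` values are the 'speed at which it allows changes', and most of the characteristic sounds come from how those are set.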

Dynamic compression also has risks, such as making things sound less natural, and that you end up with less detail in your mix than in your recording.

The largest potential problem is blanket uninformed application.

For example, home cinema systems often apply this so that you do not have to keep changing the volume in movies with both silent and loud parts, at expense of some quality (that is, dynamic range).

That's useful, and you can disable it if you care.

The same thing is done in broadcasts, to make a radio/tv station sound both consistent in level and perceptually louder. This has no direct use for us, and we can't choose the quality.

Record producers have caught wind of the concept of "compression == perceptually louder", and since they figure "louder = more noticeable = more sales" (probably true), they force mastering technicians to apply more overall compression. None of that volume variation, psh.

Which is sort of stupid, because radio stations do this anyway. (If not quite as aggressively, because it doesn't sound so good) Essentially, record companies sell all their CDs with effectively lowered quality, on the off chance that a station (and/or DJ) is completely ignorant of these details and this squeezes out slightly more perceptual volume.

The loudness wars this resulted in mean that the perceptual sparkle mentioned earlier (not easily quantifiable, since it is an implication of dynamic range that depends on the particular recorded signal/music itself) is reduced, sometimes significantly.

(Interestingly, vinyl versions of music may be better -- old ones because it was before this nonsense, new ones because their target market excludes broadcast)

### Vocoders

Voice coders are now mostly used for distorting vocals and instruments in music. The original idea was to parametrize speech for efficient transmission.

A basic vocoder is an analysis/synthesis system that puts each band from a multiband filter through an envelope follower (an analog circuit), resulting in the control signals from that follower. Sending these signals is a simple form of encryption (mostly by obscurity).

A phase vocoder

### Linear Predictive Coding (LPC)

Analysis of the spectral envelope of a digital signal, with the assumption of the basic case of voiced speech, and therefore useful for speech parameter estimation.

e.g. seen in LPC vocoders, often for speech transmission.

Warped LPC (WLPC)

### CELP and variants

Code excited linear prediction (CELP), with variants including ACELP, RCELP, LD-CELP (ITU-T G.728) and VSELP, and the older RELP, use source-filter models of speech production.