Difference between revisions of "Signal analysis, modeling, processing"
m (→Bad words: infinite) |
m |
||
Line 1: | Line 1: | ||
− | {{ | + | {{avnotes}} |
− | + | ||
(See also the glossary) | (See also the glossary) |
Revision as of 15:12, 19 June 2019
The physical and human spects dealing with audio, video, and images
Vision and color perception: objectively describing color · the eyes and the brain · physics, numbers, and (non)linearity · color spaces · references, links, and unsorted stuff Audio physics and physiology: Basic sound physics · Human hearing, psychoacoustics · Descriptions used for sound and music Digital sound and processing: capture, storage, reproduction · programming and codescs · some glossary · Audio and signal processing - unsorted stuff Image: file formats · image processing Video: format notes · encoding notes Noise stuff: Stray signals and noise · sound-related noise names · electronic non-coupled noise names · electronic coupled noise · ground loop · strategies to avoid coupled noise · Sampling, reproduction, and transmission distortions Unsorted: Signal analysis, modeling, processing (some audio, some more generic) For more, see Category:Audio, video, images |
(See also the glossary)
Contents
- 1 Visualization types
- 2 Transforms; the relation between waveforms and frequency content
- 2.1 Fourier transform
- 2.1.1 Basic idea, and some uses
- 2.1.2 Practicalities - constraints
- 2.1.3 Practicalities - common solutions/uses/alternatives
- 2.1.4 Practicalities - alternatives, interpretation
- 2.1.5 Error
- 2.1.6 More math
- 2.1.7 Relations between fourier, convolution, and correlation
- 2.1.8 Displaying
- 2.1.9 Multi dimensional FT
- 2.1.10 Variants and alternatives
- 2.2 Wavelet transforms
- 2.3 Hartley transform
- 2.4 Related concepts
- 2.5 See also
- 2.1 Fourier transform
- 3 Other frequency-related transforms
- 4 Filtering theory
- 5 Filters and analyses
- 6 Activity Recognition
- 7 Fourier correlation notes
- 8 Unsorted / See also
Visualization types
Transforms; the relation between waveforms and frequency content
...often for converting between time domain and frequency domain.
Fourier transform
Note: If you like math, you will prefer other explanations. This one is going more for intuition, context, and footnotes of practice. (Well, ideally)
Basic idea, and some uses
Jean Baptiste Joseph Fourier figured that any periodic vibration can be constructed as a weighed sum of an infinite series of number of harmonically related sinusoids.
Where 'harmonically related' means 'all of these sines have a frequency that are multiples of a base frequency.
So Fourier analysis takes a signal, and expresses it as how much of each such periodic component there is.
Mathematically, the result contains magnitude and phase for each frequency component.
- Magnitude is the amount presence of this frequency
- Phase is mostly necessary to reconstruct it accurately, intuitively to shift each part to the right place
When you have both, the fourier-space data is actually equivalent information to what it came from (...to within floating-point errors).
Some applications rely on this equivalence in processing, often to do parts of processing that just happens to be easier to do in one real space or in frequency space. (Others do not use all information anyway. For example, spectrogram visualization throws away the phase data, because it only cares to show the magnitude of each frequency component.)
Fourier synthesis is the reverse, taking frequency-space information, and synthesizes the (often time-series) data it represents.
Synthesis is sometimes seen by itself, doing signal generation, such as in mimicking some types of sounds.
More frequently it is combined with Fourier analysis, often to do do filtering that is much easier to express in fourier space.
Fourier implementations can be used to assist some other calculations,
often because it is mathematically related to convolutions.
For example, you can do large convolutions more efficiently (than naive convolution implementations), because the convolution theorem implies that element-wise multiplication in the frequency domain is equivalent to convolution in the time domain.
Practicalities - constraints
Bad words: infinite
Infinite sounds bad for real world application.
Not as bad as you may think.
For discrete-interval data, the amount of sinusoids necessary is strictly bound by the input size.
...because discrete data is implicitly band-limited - it cannot represent anything higher.
This sounds like math sidestepping the real issue, but in practice it just points out you already faced them earlier: If your discrete data is a sampling of something that did contain something higher-frequency, you've either not sampled it at all (e.g. because it was filtered it out before it got to the sensor), or have sampled it with some potentially nasty artifacts (primarily aliasing)
For example, devices made to record sound are almost all designed to filter out higher frequencies than the medium or ADC can deal with. (There are some variants of how - it can be done in hardware filter (e.g. RC circuit) but these days, the the ADC is typically doing further filtering via supersampling)
If an ADC didn't do any filtering, the data would still be band limited, but it would probably include aliased signal (not very much because most microphones barely respond over ~15kHz, and the world is quiet in this range anyway, at least relative to the human-audble frequencies).
Bad words: periodic
The word periodic turns out to be the larger problem.
A fundamental property/assumption of the FT is that it essentially assumes the entire signal is a single infinitely repeating chunk (not exactly, but intuitively true enough).
Even for very periodic signals, even a perfect sine wave,
perfect periodicity happens only when the wavelength is an exact multiple of the window size. In other words, basically never.
If you intuit the FT as a tuning fork for each output bucket, then it's still mainly the right (closest) bucket that responds the most, but the adjacent few will do so too. This is one reason for spectral leakage.
Another practical problem is that, between the first and last sample in your data, there will usually be a large step in values.
Which is a slope discontinuity just like a large step in the middle of your data would be. It will try to model this as best it can, which it will do with all the frequencies - this is a problem with any pulse or step, which is also spectral leakage. (This is vague description, find the math if you care)
Since this one is not even actually present in your data, it's effectively wrong, though note that there's only so much energy in it.
If just showing the spectrum, you've muddled it a tiny bit. If doing FT space filtering, you've introduced new signal (and exactly how much depends on implementation details).
Another problem is that the longer a sample is, the likelier any real-world signal is to contain something non-periodic.
Just because most real signals are not pure and perfectly behaved.
Say, 0.1 seconds from a piece of music is likely to mostly just contain a sustained note, or silence, or something noisy, or something else somewhat specific. A 10 second chunk? Not so much. A 100 second chunk? Basically not.
Sure you can analyze these well enough, in that synthesis will reconstruct the same thing. But the less periodic the content really is, the less intuitive the frequency-space data is to interpret or manipulate.
This is related to that you get no time information. The basic FT gives output that summarizes the overall frequency content of the entire input.
...and phase content that puts things in the right place in time, which is very much central to reconstruction because of the way it combines. In fact, the phase is arguably more information-dense than the ampitudes. It's just so entangled that it is almost impossible to read off anything useful from the phase information.
If you want frequencies over time information, such as a spectrogram[1], you probably want the STFT (short-time FT).
This is a controlled tradeoff between time resolution and frequency resolution, by taking the FT of short chunks of signal, which both localizes and reduces the problems mentioned above (e.g. considering music in (overlapping) 0.1 second chunks).
- It's not perfect, but it's good enough for many needs. For example, music and other sound analysis is typically done this way.
On 'Band limited'
Your data must be band limited, meaning it must have no frequency content beyond a cutoff band.
With digital samples, this is mostly something you need to think about at the sampling step.
Once you have discrete-interval data, it's implicitly already band-limited. It can no longer represent frequencies beyond its Nyquist frequency.
That doesn't always mean your sampled data is exactly what you wanted.
For example, if the band limit of your recording is lower (because low sample rate) than the frequency of the sound you recorded, and you didn't filter it out, the physically present higher frequencies, they will be present in aliased form.
For sound recording, this is almost always solved by the recording device doing implicit filtering typically as an electronic lowpass (or sometimes via a supersampling ADC).
But in more generic hardware, it's something you need to think about.
The reason behind the requirement is that mathematically, Fourier analysis is guaranteed to converge only for band-limited signals.
Which is mostly just a mathematical property of the general theory related to the word 'infinite' in 'infinite sum of sinusoids'. (Without being band limited, such as for FTs on continuous mathematical functions, it would actually be infinite, or at least become a more complex issue.)
So for band-limited data, a finite sum is enough. In fact, there is a hard bound on the necessary amount of components.
(And for a lot of everyday sound, a modest amount of sums will already be a decent approximation)
For example, a sweater that shows moire patterns on TV is entirely band limited once on TV, but not quite what you intended to record, because the sampling was rougher than what was there. What you would generally prefer in this case is filtering before sampling so that it's fuzzy, instead of distracting.
Practicalities - common solutions/uses/alternatives
On windowing
The way people often use the Short-Time Fourier Transform (STFT), plus window function, will address the 'no time information', 'has to be periodic', 'likely step at the edge', and optionally the 'avoid losing too much information' issues together.
'Windowing' can refer to both taking a block of data at a time, and to applying an envelope within that window, typically called a window function.
In many applications (e.g. spectrograms, because STFT) you do both.
You suppresses the step-at-the-signal-edge discontinuity by suppressing the entire signal at the edge with the mentioned window function (envelope).
If you care to get the most out of your signal (or get representative amplitudes), then you will care that this lowers the signal strength before handing it to the FT.
If you were to partition the data (i.e. no overlap) and apply a windowing function to each, you wold effectively be ignoring part of it, so lose part of the signal. In spectrograms you might not care, but in low-signal-to-noise applications you may wish to overlap these blocks.
You choices:
- choice of window size is a tradeoff between time resolution and frequency resolution (generic STFT stuff)
- choice of window function offers a few different tradeoffs - roughly how large the main lobe of the frequency response is and how large the tail is.
- ideal window overlap depends on the exact shape of the window function.
- Exactly how much overlap is (mathematically) ideal depends on the characteristics of the windowing function - described in a few places, see e.g. "spectrum and spectral density estimation by the discrete fourier transform".
- Secondarily on the computational resources you care to spend, as it's a diminishing-returns thing.
Data massaging and tradeoffs - what happens when you...
...resize?
...cut?
...zero-pad?
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) |
...choose different window sizes for analysis?
Interpretation: sound, images, etc.
Practicalities - alternatives, interpretation
Relevant to choice of library functions
On real and complex input/output, and symmetry
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) |
complex real --FT--> complex complex --iFT--> complex
and, not unusually,
complex --iFT--> real
The last is, in some ways, not special at all, just the same as the one before it but with the assumption that all imaginary output components happen to be zero. This often amounts to throwing them away, which is valid only if they actually were (which, under many controlled circumstances, they weill be, within rounding error of being zero).
So a DFT package will often have a number of variant functions.
The most generic DFT functions often take complex-valued input, and output bins for -f/2 through f/2. This because for complex input, the two parts are not redundant/symmetric.
However, most practical applications are real-valued input. You can turn that in complex-with-zero-imaginary to hand it into the just-mentioned function, but it would do some redundant calculation, and the output would be symmetic around 0.
As such, most packages also have a 'FT on real-valued input' function, which will be a little faster.
As FFTW points out, "In many practical applications, the input data in[i] are purely real numbers, in which case the DFT output satisfies the 'Hermitian' redundancy: out[i] is the conjugate of out[n-i]. It is possible to take advantage of these circumstances in order to achieve roughly a factor of two improvement in both speed and memory usage"
(Some will still give the redundant data, some will return bins for 0..f/2. (verify))
(If you care to look up the math, look for terms like heritian symmetry and conjugate complex symmetry)
TODO: add diagram of real input versus complex input http://stackoverflow.com/questions/6740545/need-help-understanding-fft-output
Similarly, when you have complex data with hermitian redundancy and want to inverse-FT it, the output is complex with zero imaginary component (within rounding error) or, if you (can) ask the libary, real output.
This saves (only) a little calculation, some memory, and probably some code on your side.
"So what if you did
real --FT-> complex --iFT-> complex
and you notice the complex output has non-zero imaginary data?"
Well, if you did just that without altering the frequency-space data, then they'll be near zero, within rounding error.
But you probably did more, e.g. implemented a frequency-space filtering, then chances are you might have broken the conjugate symmetry doing so.
How much that actually matters depends on what you were doing.
https://stackoverflow.com/questions/19863410/conflict-in-imaginary-parts-after-ifft
https://dsp.stackexchange.com/questions/37272/how-to-get-a-real-signal-from-complex-ifft-values
-->
On shift
On and even/odd-sized input
Output - value scale and units
Output - frequency bins
Output - bin width and position
Output - note on (non)linearity
Error
Spectral leakage, window functions
(Temporal) Aliasing
Picket fence effect
Ringing (Gibbs phenomenon)
More math
Relations between fourier, convolution, and correlation
Displaying
Multi dimensional FT
Variants and alternatives
See also
Wavelet transforms
Hartley transform
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) |
Similar to the Fourier transform, which
- transforms real data to real data
- is its own inverse (aside from scaling)
The FHT (fast discrete HT) is not fundamentally faster to calculate than FFT variants that works on real data, although in specific cases there are more optimizations, so in simpler implementations, the FHT uses slightly less calculation than the FFT.
This (and the 'its own inverse' sometimes being convenient) mean you see the FHT in memory-limited hardware.
For example: http://wiki.openmusiclabs.com/wiki/ArduinoFHT
Related concepts
For a wider overview, you will want to distinguish between the various related terms, which include (roughly ordered from more general/abstract to more applied/discrete):
- Z-transform
- something of a generalization of the Discrete-time Fourier transform (DTFT, and not to be confused with the DFT)
- Takes a discrete real-numbered or complex-numbered time-domain signal and calculates a complex frequency-domain representation (apparently making it a discrete variation of the Laplace transform)
- Fourier Series
- Usually refers to the idea that x(t) is the sum of an infinite number of sinusoids
- Fourier series integral
- Mathematical basis for Fourier analysis
- Fourier Analysis
- x(t) to {a_{k}} (time series to frequencies)
- (the effective opposite of Fourier Synthesis)
- Fourier Synthesis
- {a_{k}} to x(t) (frequencies to time series)
- (the effective opposite of Fourier Analysis)
- Note that most Fourier transforms have an inverse to allow (re-)synthesis, e.g. iDFT, iFFT, etc.
- Fourier Transform(s)
- Can refer to
- The result of a Fourier analysis (the frequency-domain information)
- The method of converting to that domain, and usually back from its results (Fourier analysis, and synthesis)
- The more mathematical definition of the Continuous Fourier Transform, that transforms a function of a real variable into another, the frequency domain. (in which both are continuous and unbounded(
- A group of implementations for this transform
- Can refer to
- The Discrete Fourier Transform (DFT)
- a specific, practicalized version of the (continuous) fourier transform, in that
- it is finite-domain and for discrete-time functions
- input function must be discrete - a real- or complex-numbered time series, often from sampling a continuous signal
- only analyses frequency content required to approximate the given segment (unlike the DTFT)
- a specific, practicalized version of the (continuous) fourier transform, in that
- Fast Fourier Transform (FFT)
- Refers to a number of different algorithms to quickly calculate a DFT
- ...since the straightforward calculation of DFT takes O(n^{2}), and FFT takes O(n log n)
- Short-term Fourier Transform STFT
- Refers to the fourier transform (often specifically analysis) in which windowing is applied (in the forms of continuous theory, and/or discrete application)
- ...for a controlled tradeoff between frequency resolution and time resolution.
Other related concepts and transforms:
- Fourier sine transform and Fourier cosine transform [2], special cases of the continuous fourier transform for odd and even functions. Probably most often applied as the discrete sine transform [3] (DST) and discrete cosine transform (DCT)
See also
- Discrete Fourier Transform, Aug 2004, http://astronomy.swin.edu.au/~pbourke/analysis/dft/
- Discrete Fourier Transform and the FFT, Aug 2004, http://www.cage.curtin.edu.au/mechanical/info/vibrations/tut4.htm
- Introduction to DSP - frequency analysis Fourier transform, Aug 2004, http://www.bores.com/courses/intro/freq/3_ft.htm
- Part I: Fourier Transforms and Sampling, http://www.silcom.com/~aludwig/Signal_processing/Signal_processing.htm
- Zero Padding Does Not Buy Spectral Resolution, Aug 2004,
http://zone.ni.com/devzone/conceptd.nsf/webmain/81227DF4C2952C1A86256CA80053F322
- Maria Elena Angoletta, Fourier Analysis / Part 2: Technicalities, FFT & system analysis, Aug 2004,
- FFT Links http://www.fftw.org/links.html
- Discrete-time Fourier transform (DTFT)
- Fractional Fourier transform (FRFT)
- Modified discrete cosine transform (MDCT)
- Modulated complex lapped transform (MCLT)
Cepstrum, Cepstral transform
Filtering theory
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) |
Time-domain filters
Frequency-domain filters
Filters and analyses
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) |
Frequency pass and stop filters
Low-pass
Filters out frequencies higher than a cutoff frequency.
Regularly used in analog form, e.g. before sampling sound in sound cards, in DSL filters, etc.
Note that blurring samples works as a lowpass filter, in sound, images, and elsewhere.
Anti-aliasing filters are regularly lowpass filters.
See also Low-pass filter
High-pass
Filters out frequencies lower than a cutoff frequency.
Band-stop
Also known as band limit filter, notch filter, 'T-notch filter' and also 'band-elimination filter', and 'band-reject filter'.
The band that is filtered out is referred to as the stopband.
Band-pass
The band that is left out is referred to as the passband.
Can be seen as the combination of a high-pass and a low-pass.
Filterbanks
Subband coding
Separate out different frequency bands, to use different coding for each (for reasons of efficient transmission).
Equalization
Equalization refers to sound processing altering the frequency content/envelope, for correction (e.g. of a recording room's acoustic) or to give sound a particular feel or focus.
Fairly common in software is the graphic equalizer, which adjusts something like 20Hz-20kHz with various sliders, in bands of certain width. There are also hardware equivalents, which may offer feedback detection, to avoid feedback caused by peak sounds (such as vocals) or bad setups (where microphones strongly pick up speaker sound).
Pre-emphasis refers to structurally amplifying some frequencies. For example, vinyl records are mastered with their lower frequencies perhaps 30dB more silent than high frequencies, largely because low frequencies require larger groove size. Mastering with less present low frequencies allows closer grooves and therefore more recording time. Phono pre-amps will equalize the inverse of what was applied to the master, so that they produce approximately what was recorded.
Comb filter
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) |
A comb filter adds delayed copy of a signal to itself.
http://en.wikipedia.org/wiki/Comb_filter
More complex applications
(...which are sometimes presented as complete filters)
Noise, SNR, etc.
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) |
Methods for noise reduction / signal (recognition) enhancement assistance :
- Spectral subtraction (subtracting a pure noise signal's spectrum from te signal's. )
- time domain neural methods (e.g. direct, Kalman)
- frequency domain neural methods
- wavelet shrinkage/tresholding
- ...many more
Pitch detection, frequency estimation
Envelope detection
Beat/tempo detection
See Descriptions_used_for_sound_and_music#Beat_and_tempo
Dynamic range compression
Dynamic range compression refers to varying amplification, that adapts to loudness.
Music mixing/mastering (where context means it's often just called 'compression'),
is often used to make different kinds of sources (instruments, vocals) behave similarly,
and to let you e.g. draw attention to one thing and make another background texture.
Yes, just adjusting volume does that, but not always well enough.
Some instruments have a much larger dynamic range, that is, the difference between the loudest part and quiet parts are large - say, with drums. You could adjust the volume according to how it's played -- or you could let dynamic range compression essentially just do that for you.
There are some tricks that make for specific sounds, such as the (apparenty) increased decay time, the way drums don't drown out vocals, the way techno beats have bits of silence for emphasis (arguably a mistake, but we're used to it now). This means tweaking the parameters (e.g. the speed at which it allows changes), and amounts to informed use.
Dynamic compression also has risks, such as making things sound less natural, and that you end up with less detail in your mix than in your recording.
The largest potential problem is blanket uninformed application.
For example, home cinema systems often apply this so that you do not have to keep changing the volume in movies with both silent and loud parts, at expense of some quality (that is, dynamic range).
That's useful, and you can disable it if you care.
The same thing is done in broadcasts, to make a radio/tv station sound both consistent in level and perceptually louder. This has no direct use for us, and we can't choose the quality.
Record producers have caught wind of the concept of "compression == perceptually louder", and since they figure "louder = more noticeable = more sales" (probably true), they force mastering technicians to apply more overall compression. None of that volume variation, psh.
Which is sort of stupid, because radio stations do this anyway. (If not quite as aggressively, because it doesn't sound so good) Essentially, record companies sell all their CDs with effectively lowered quality, on the off chance that a station (and/or DJ) is completely ignorant of these details and this squeezes out slightly more perceptual volume.
The loudness wars this resulted in means that perceptual sparkle mentioned earlier (not easily quantifiable since it is an implication of dynamic range that depends on particular recorded signal/music itself) is reduced, sometimes significantly.
(Interestingly, vinyl versions of music may be better -- old ones because it was before this nonsense, new ones because of their target market excludes broadcast)
Vocoders
Voice coders are now mostly used for distorting vocals and instruments in music. The original idea was to parametrize speech for efficient transmission.
A basic vocoder is an analysis/synthesis system, that puts each each band from a multiband filter through an envelope follower (an analog circuit), resulting in the control signals from that follower.
Sending these signals is a simple form of encryption (mostly by obscurity).
A phase vocoder
See also:
- http://en.wikipedia.org/wiki/Vocoder
- http://en.wikipedia.org/wiki/Envelope_follower
- http://en.wikipedia.org/wiki/Phase_vocoder
Linear Predictive Coding (LPC)
Analysis of the spectral envelope of a digital signal, with the assumption of the basic case of voiced speech, and therefore useful for speech parameter estimation.
e.g. seen in LPC vocoders, often for speech transmission.
Warped LPC (WLPC)
CELP and variants
Code excited linear prediction (CELP), with variants including ACELP, RCELP, LD-CELP (ITU-T G.728) and VSELP, and the older RELP, use source-filter models of speech production.
Spectral whitening
Activity Recognition
Fourier correlation notes
See also: