Difference between revisions of "Signal analysis, modeling, processing"
-->
====Practicalities - common solutions/uses/alternatives====
=====On windowing=====
The way people often use the Short-Time Fourier Transform (STFT), plus a window function, will address the 'no time information', 'has to be periodic', and 'likely step at the edge' issues, and optionally the 'avoid losing too much information' issue, together.
+ | |||
+ | |||
+ | 'Windowing' can refer to both taking a block of data at a time, and to applying an envelope within that window, typically called a [[#Spectral leakage, window functions|window function]]. | ||
+ | |||
+ | In many applications (e.g. spectrograms, because STFT) you do both. | ||
+ | |||
+ | |||
+ | |||
+ | You suppresses the step-at-the-signal-edge discontinuity by suppressing the entire signal at the edge with the mentioned window function (envelope). | ||
+ | |||
+ | If you care to get the most out of your signal (or get representative amplitudes), then you will care that this lowers the signal strength before handing it to the FT. | ||
+ | |||
+ | |||
+ | If you were to partition the data (i.e. no overlap) and apply a windowing function to each, you wold effectively be ignoring part of it, so lose part of the signal. In spectrograms you might not care, but in low-signal-to-noise applications you may wish to '''overlap these blocks'''. | ||
+ | |||
+ | |||
+ | You choices: | ||
+ | * choice of '''window size''' is a tradeoff between time resolution and frequency resolution (generic STFT stuff) | ||
+ | |||
+ | * choice of '''window function''' offers a few different tradeoffs - roughly how large the main lobe of the frequency response is and how large the tail is. | ||
+ | |||
+ | * '''ideal window overlap depends on the exact shape of the window function. | ||
+ | : Exactly how much overlap is (mathematically) ideal depends on the characteristics of the windowing function - described in a few places, see e.g. "{{search|spectrum and spectral density estimation by the discrete fourier transform}}". | ||
+ | : Secondarily on the computational resources you care to spend, as it's a diminishing-returns thing. | ||
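As a small sketch of the window-function tradeoff (using numpy; the off-bin frequency 10.5 and bin 100 are arbitrary, illustrative choices):

```python
import numpy as np

N = 1024
n = np.arange(N)
# a sine that does not complete a whole number of cycles in the window,
# so some spectral leakage is unavoidable
x = np.sin(2 * np.pi * 10.5 * n / N)

spec_rect = np.abs(np.fft.rfft(x))                  # rectangular (no) window
spec_hann = np.abs(np.fft.rfft(x * np.hanning(N)))  # Hann window

# near the peak (bins 10/11) both respond; far from the peak (here bin 100),
# the Hann window leaks much less energy than the rectangular one
print(spec_rect[100], spec_hann[100])
```

The Hann result's main lobe is a little wider (the tradeoff mentioned above), but its tail falls off much faster.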
+ | |||
+ | |||
+ | =====Data massaging and tradeoffs - what happens when you...===== | ||
+ | |||
+ | <!-- | ||
+ | (note: in some of the following, it may help consider each of these in terms of samples, and/''or'' in terms of 0..nyqist frequency) | ||
+ | |||
+ | Sometimes one has much more meaning, or e.g. reveals "okay yes lots of things changed but overall you've just blurred the thing" | ||
+ | |||
+ | --> | ||
+ | |||
+ | ======...resize?====== | ||
+ | <!-- | ||
+ | |||
+ | '''Sizing down original data''' | ||
+ | |||
+ | Consider the example of averaging every two samples together into one, halving the data size. | ||
+ | |||
+ | This is, in itself, blurring the data, effectively a lowpass. | ||
+ | |||
+ | If you care about relation to original units, | ||
+ | then you should also remember you've effectively just halved the sampling rate. | ||
+ | |||
+ | |||
+ | So if you think of the result in terms of 0 .. nyquist | ||
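A minimal numpy sketch of that (the 1000 Hz rate and the two test frequencies are made up for illustration):

```python
import numpy as np

fs = 1000                      # original sampling rate (illustrative)
t = np.arange(1000) / fs
low  = np.sin(2 * np.pi * 10  * t)   # 10 Hz, well below the new Nyquist
high = np.sin(2 * np.pi * 400 * t)   # 400 Hz, close to the original Nyquist

def average_pairs(x):
    # halve the data size by averaging every two samples;
    # the result is effectively sampled at fs/2, so Nyquist is now fs/4
    return x.reshape(-1, 2).mean(axis=1)

# the low tone survives nearly unchanged, the high tone is
# strongly attenuated by this implicit lowpass
print(np.std(average_pairs(low)), np.std(average_pairs(high)))
```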
+ | |||
+ | |||
+ | |||
+ | '''Sizing down fourier-space data''' | ||
+ | |||
+ | If you e.g. bin the frequency space before inverse FT, | ||
+ | this is essentially fourier-space rescaling, | ||
+ | which is somewhat more frequency-accurate than e.g. most image interpolation methods, | ||
+ | and is sometimes preferred in e.g. scientific applications, | ||
+ | being information-truest for pixel/voxel data, | ||
+ | and it being slow doesn't matter. | ||
+ | Note it does introduce problems at the edge. | ||
+ | |||
+ | |||
+ | --> | ||
+ | |||
+ | ======...cut?====== | ||
+ | |||
+ | <!-- | ||
+ | : '''What happens on cut, pad, etc.''' | ||
+ | |||
+ | When you cut a smaller area from your real-space image, | ||
+ | you give it less data, so fewer bins. | ||
+ | |||
+ | In most cases the real-world-meaning of a sample (time, pixel, voxel) does not change, | ||
+ | so the Nyquist point has the same real-world meaning, | ||
+ | so the net effect is that your smaller bit of data is reported in fewer bins. | ||
+ | |||
+ | Taking patches of sound/image data can sometimes be useful, e.g. | ||
+ | : to judge how consistent a part of a signal is in a whole (e.g. find an intermittent beep) | ||
+ | : judge how much frequency content varies (e.g. find which parts of an image are blurry background, or have edges in different directions) | ||
+ | : to bring out weak signal (e.g. CTF determination in electron microscopy - this is effectively a tradeoff that lowers frequency-space precision but makes the weak signal (Thon rings) strong enough to model stably) | ||
+ | --> | ||
+ | |||
+ | ======...zero-pad?====== | ||
{{stub}} | {{stub}} | ||
<!--
Zero padding in one space leads to an increased sampling rate in the other.
It's roughly equivalent to doing a non-padded transform,
then following it up with a sinc interpolation.


'''Zero padding in fourier space''' turns out to be easier to explain.
It's basically just an easy way to get an arbitrary-factor resize.

In terms of frequency-information content, the frequency spread of the distortion is more equal than in many other types of interpolation{{verify}},
so resizing in fourier space sees use e.g. in processing of low-SNR signals, and makes more sense when there are other fourier-space things you were doing anyway (e.g. filtering).

http://users.ece.gatech.edu/~mcclella/courses/ee4078/FFT-interpol.pdf

https://dspguru.com/dsp/howtos/how-to-interpolate-in-time-domain-by-zero-padding-in-frequency-domain/
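A sketch of that fourier-space resize with numpy (factor 2, a band-limited test sine; irfft does the spectrum padding for us, and the factor 2 compensates its length-dependent 1/n normalization):

```python
import numpy as np

N = 64
n = np.arange(N)
x = np.sin(2 * np.pi * 5 * n / N)   # band-limited test signal

# transform, zero-pad the spectrum to the new length via irfft's
# length argument, and rescale for the changed normalization
X = np.fft.rfft(x)
y = np.fft.irfft(X, 2 * N) * 2

# every second sample of the upsampled signal matches the original;
# the in-between samples are effectively sinc-interpolated
print(np.allclose(y[::2], x))
```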
+ | |||
+ | |||
+ | |||
+ | '''Zero padding in real space''' is a little funnier. | ||
+ | |||
+ | That is, it will change how many frequency bins there are in fourier space, | ||
+ | and how they align with your data. | ||
+ | |||
+ | The first leads people to figure it's more resolution. | ||
+ | It's not. Sure it's more bins, but comparing padding to not padding the same data, it's really just interpolating between them. | ||
+ | |||
+ | It may even ''look'' higher-detail, because you're effectively interpolating the frequency-space output, and that nicely rounded shape comes not from the data but from the sinc function (that the interpolation effectively involves). | ||
+ | |||
+ | |||
+ | That said, there are still good reasons to zero-pad. Most common seem to be: | ||
+ | |||
+ | * you can guide the bin (resp. spectral leakage) to be better aligned with specific purpose of yours | ||
+ | : e.g. peak finding if you don't mind multiple FTs to get the best | ||
+ | |||
+ | * when cross-correlating or convoluting, the result is longer, and zero padding gives it space to get stored | ||
+ | : meaning that the actually-circular convolution is isolated, wrapping around to only zeroes and not other data, and effectively an linear convolution. | ||
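That wrap-around point can be shown in a few lines of numpy (toy-sized arrays, purely illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0, 1.0])

# unpadded: multiplying FFTs gives *circular* convolution -
# the tail of the result wraps around into its start
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x))))

# padded to the full output length len(x)+len(h)-1, the wrap-around
# only hits zeros, so the result equals ordinary linear convolution
L = len(x) + len(h) - 1
lin = np.real(np.fft.ifft(np.fft.fft(x, L) * np.fft.fft(h, L)))

print(np.allclose(lin, np.convolve(x, h)))   # the padded version matches
```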
+ | |||
+ | |||
+ | * smoothing in the frequency domain may be handier to | ||
+ | |||
+ | * can be handier to give equal-sized outputs when doing DTFTs | ||
+ | : (coding convenience, mostly) | ||
+ | |||
+ | * allows calculation when FFT implementation only work on power-of-2 inputs | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | http://www.bitweenie.com/listings/fft-zero-padding/ | ||
+ | |||
+ | http://en.wikipedia.org/wiki/Window_function#Spectral_analysis | ||
+ | |||
+ | http://dsp.stackexchange.com/questions/741/why-should-i-zero-pad-a-signal-before-taking-the-fourier-transform | ||
+ | |||
+ | http://math.stackexchange.com/questions/26432/discrete-fourier-transform-effects-of-zero-padding-compared-to-time-domain-inte | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | : '''Temporal versus spatial frequency''' (time sampling versus point sampling) | ||
+ | |||
+ | Most times, you apply FTs to time series, so you can say the frequencies are ''temporal'' frequencies. | ||
+ | |||
+ | |||
+ | When you apply FTs to images, you can say you get '''spatial frequencies'''. | ||
+ | Samples are pixels, so cycles/bins are now per pixel. The first bin is the DC component, the second means '1 period every 2 pixels', etc. | ||
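A small 2D illustration with numpy (a made-up 64-pixel 'image' of vertical stripes, 8 periods across its width):

```python
import numpy as np

N = 64
# vertical stripes: a sine along x, constant along y -
# a spatial frequency of 8 cycles per image width
img = np.tile(np.sin(2 * np.pi * 8 * np.arange(N) / N), (N, 1))

spec = np.abs(np.fft.fft2(img))

# essentially all energy sits at vertical frequency 0 and
# horizontal-frequency bins +8 / -8 (i.e. columns 8 and N-8)
print(np.unravel_index(np.argmax(spec), spec.shape))
```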
+ | |||
+ | |||
+ | It's the same basic idea, but the meaning may be a little more elusive. In part because the pixel size is often not tied to a real-world length/size, and so neither is the bin interval, bin bandwidth, or Nyquist frequency. | ||
+ | |||
+ | There are exceptions: implicitly or explicitly flat things. Consider how microscopy, crystallography, and arguably things like cartography deal with flat projections. | ||
+ | |||
+ | |||
+ | Also, these results are typically messier, because few images are periodic - consider photos. | ||
+ | {{comment|(Even working just locally isn't great - image and video compression tend to work on pixel chunks that are, 16, 8, or 4 pixels wide/high - often square, sometimes rectangular. They're not larger, partly because ringing artifacts (Gibbs phenomenon) would be present in that larger area. Which is visible in JPEGs on sharp diagrams, even with its 8x8 cells)}} | ||
+ | |||
+ | Arguably much of the detail that characterizes the size of things is near the center, and much of the higher frequency content isn't as relevant. | ||
+ | |||
+ | |||
+ | You get amplitudes and phases that seem to sort of frantically scramble to approximate the data (consider how block waves lead to overtone superposition). Yes, it's what it should be doing for the given data, but not necessarily what you wanted out of the analysis. It's rare to see anything clean come out. | ||
+ | |||
+ | |||
+ | --> | ||
+ | |||
+ | ======...choose different window sizes for analysis?====== | ||
+ | <!-- | ||
+ | |||
+ | This mostly indicates the STFT, means taking shorter | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | --> | ||
+ | |||
+ | =====Interpretation: sound, images, etc.===== | ||
+ | <!-- | ||
+ | |||
+ | Sound is regularly sampled, meaning they can be directly related to the physical property of frequency. | ||
+ | |||
+ | |||
+ | |||
+ | Images are funnier. | ||
+ | |||
+ | Fourier analysis can be applied to pixel data just fine. And such relation to physical size exists if it's e.g. an X-ray or something else flat, but less meaningful for things like whole photographs with depth and interesting things in it. | ||
+ | |||
+ | This relates somewhat to [[point sampling]], but more to the fact that as a whole it is simply ''very'' aperiodic. | ||
+ | Similar story for hard-edged diagrams. | ||
+ | : Fourier may be useful to do filtering and reconstruction, but the intermediate frequency-space representation will be less informative than it is for naturally more periodic data. | ||
+ | : The 2D equivalent of the STFT can still be interesting, e.g. to tell where in the image things happen more - and in what direction. | ||
+ | |||
+ | |||
+ | --> | ||
+ | |||
+ | ====Practicalities - alternatives, interpretation==== | ||
+ | |||
+ | =====Relevant to choice of library functions===== | ||
+ | |||
+ | ======On real and complex input/output, and symmetry====== | ||
+ | {{stub}} | ||
<!--
Note that the output of the DFT, i.e. the fourier-space representation, is always complex.

Mathematically you can see these as 2D vectors. Note that the real and imaginary parts do ''not'' correspond to amplitude and phase; they entangle both. Even when just displaying, using only the real part is fairly meaningless; you want the amplitudes (whereas the phase would be in the angle).


It turns out that the math lends itself equally to having real-valued and complex-valued input, so the most generic DFT functions provided tend to cover at least:
-->
======On shift======
<!--
Most DFTs will output the zero/DC frequency at the start of the output (with the negative frequencies wrapped around at the end).

The shifted variant will put it in the middle.
This really just rotates all values over by length(output)/2.

So there is no change in information.


So why do it? Purely practical reasons.

Some filters are slightly easier to express on the shifted version.

For 2D FFTs, the low-frequency content (typically the most important) is just more visible in the middle.
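With numpy this is fftshift/ifftshift; a tiny sketch of the reordering and its reversibility:

```python
import numpy as np

X = np.fft.fft(np.random.randn(8))

# fft output order is: DC, positive frequencies, then negative frequencies
print(np.fft.fftfreq(8))     # [ 0.  0.125  0.25  0.375 -0.5  -0.375 -0.25  -0.125]

# fftshift just rotates the array so DC ends up in the middle
Xs = np.fft.fftshift(X)
print(np.fft.fftshift(np.fft.fftfreq(8)))   # -0.5 up to 0.375, with 0 in the middle

# no information changed; ifftshift undoes it exactly
print(np.allclose(np.fft.ifftshift(Xs), X))
```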
-->
======On even/odd-sized input======
<!--
{{verify}}

Even-sized inputs give bins like (DC, 1, 2, 3, 4 (Nyquist), 3, 2, 1).
Odd-sized inputs give bins like (DC, 1, 2, 3, 3, 2, 1).
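numpy's fftfreq shows those layouts directly (scaled here so the bins read as cycles-per-signal):

```python
import numpy as np

# even size: one lone Nyquist bin (shown as -4 by convention)
print(np.fft.fftfreq(8) * 8)   # [ 0.  1.  2.  3. -4. -3. -2. -1.]

# odd size: no Nyquist bin at all
print(np.fft.fftfreq(7) * 7)   # [ 0.  1.  2.  3. -3. -2. -1.]
```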
+ | |||
+ | While the Nyquist bin is often not useful, it's often easier to work only with even-sized input for other reasons. | ||
+ | |||
+ | In part because there's enough details/variantions about the calculations/output already, and in part because faster are two-powers or factorizable and mostly even))}}. | ||
+ | |||
+ | |||
+ | https://dsp.stackexchange.com/questions/2970/how-to-make-frequency-axis-for-even-and-odd-fft-length | ||
+ | |||
+ | |||
+ | --> | ||
+ | |||
+ | =====Output - value scale and units===== | ||
<!-- | <!-- | ||
− | + | One question people have amount to "what are the units of the bins?" | |
− | |||
− | |||
− | + | Consider that your signal has a total energy, which you are distributing over some amount of bins. | |
+ | [https://en.wikipedia.org/wiki/Parseval%27s_theorem Parseval's theorem] says that the FT is unitary, i.e. its representation conserves energy. | ||
+ | When you only care to compare relative peaks, you don't need to care about that. | ||
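Parseval's relation is easy to check numerically (the 1/N is the unnormalized-DFT convention numpy uses):

```python
import numpy as np

x = np.random.randn(1000)
X = np.fft.fft(x)

# total energy is conserved, up to the DFT's 1/N scaling convention
energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / len(x)

print(np.allclose(energy_time, energy_freq))
```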
− | |||
− | + | ||
− | When you care about the | + | |
+ | |||
+ | When you care about the absolute meaning of values (rather than just relative amplitudes), you will want to know about the following: | ||
* energy accumulates | * energy accumulates | ||
Line 380: | Line 622: | ||
* if you effectively widen the bin's bandwidth (see notes below), you accumulate energy from a wider bandwidth | * if you effectively widen the bin's bandwidth (see notes below), you accumulate energy from a wider bandwidth | ||
: to correct for that (as in, get the average energy per Hz, not the cumulative for the bin), see the notes on spectral density below. | : to correct for that (as in, get the average energy per Hz, not the cumulative for the bin), see the notes on spectral density below. | ||
+ | |||
+ | |||
+ | So if you have fewer/wider bins, that means higher values in a bin. | ||
+ | So wider bins catch more energy and have higher values. | ||
+ | |||
+ | So if you want to ''compare'' FT results, this adds an extra step (or two), | ||
+ | (or a well informed assumption via implementation). | ||
+ | |||
+ | The amplitude means nothing without knowing the width. | ||
+ | Thinking about this can be a little counterintuitive because real-world intuitions can steer you the wrong way. | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | In most implementations (a few will do this for you), to get FFT amplitudes equal to the amplitude signal in the origin domain, you need to normalize (divide) the result by the number of sample points you handed in. (why exactly?{{verify}} | ||
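A numpy sketch of that normalization (amplitude 0.7 and bin 50 are arbitrary; the tone completes whole cycles so there is no leakage to muddy the numbers):

```python
import numpy as np

N = 1000
A = 0.7
n = np.arange(N)
x = A * np.sin(2 * np.pi * 50 * n / N)   # exactly 50 cycles in the window

X = np.abs(np.fft.fft(x))

# the unnormalized DFT sums over N samples, so the tone shows up
# as A*N/2 in bin 50 (the other half sits in the negative-frequency bin)
print(X[50])            # about A * N / 2 = 350
print(2 * X[50] / N)    # about 0.7, the original amplitude
```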
+ | |||
+ | |||
+ | |||
Line 452: | Line 712: | ||
: If you care about real-world interpretation, then you want most of the above corrections, and a reference. | : If you care about real-world interpretation, then you want most of the above corrections, and a reference. | ||
: If you care only about relative amplitudes, and want to do the least amount of extra work, then you can often get away with (informedly) doing none of it. | : If you care only about relative amplitudes, and want to do the least amount of extra work, then you can often get away with (informedly) doing none of it. | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | (Even then there are footnotes. For example, bins may align bad enough for even a pure sine wave to have amplitude split over two bins, that now look to be half the amplitude each. It conserved energy, but may not be what you had in mind) | ||
+ | {{comment|(If spectral leakage is a real problem for your goal, consider alternatives. | ||
+ | Looking for a specific signal may be better done with something parametric, | ||
+ | as detecting pitch often involves autocorrelation, wavelets have better time-domain tradeoffs, etc)}} | ||
Line 538: | Line 810: | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
Line 633: | Line 896: | ||
--> | --> | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
=====Output - note on (non)linearity=====

You can make a similar table for sound.
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
Line 1,194: | Line 1,202: | ||
====Variants and alternatives====
<!--
Variants closer to the DFT include: (For alternatives see sections below)

: an application thing, of chunking your input and applying the FFT to each
-->
Revision as of 15:12, 9 June 2019
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) |
(See also the glossary)
Visualization types
Transforms; the relation between waveforms and frequency content
...often for converting between time domain and frequency domain.
Fourier transform
Note: If you like math, you will prefer other explanations. This one is going more for intuition, context, and footnotes of practice. (Well, ideally)
Basic idea, and some uses
Jean Baptiste Joseph Fourier figured that any periodic vibration can be constructed as a weighted sum of an infinite series of harmonically related sinusoids, where 'harmonically related' means all of these sines have frequencies that are multiples of a base frequency.
So Fourier analysis does that: it takes a signal (often a time series), and expresses it as how much of each such periodic component there is.
Mathematically, the result of this analysis can be seen as magnitude and phase for each frequency component.
- Magnitude is how strongly this frequency is present
- Phase is mostly necessary to reconstruct it accurately, intuitively to shift each part to the right place
When you have both, the fourier-space data is actually entirely equivalent information to what it came from (...to within floating-point errors).
Some applications rely on this equivalence to do something that just happens to be easier to do in one space or the other.
...whereas e.g. a spectrogram visualization throws away the phase data, because it's visualizing how much of each frequency component there is (so it just needs the magnitudes).
Fourier synthesis is the reverse, taking frequency-space information, and synthesizes the (often time-series) data it represents.
Synthesis is sometimes seen by itself, doing signal generation, such as in mimicking some types of sounds.
More frequently it is combined with Fourier analysis, often to do filtering that is much easier to express in fourier space.
Fourier implementations can be used to assist some other calculations,
often because it is mathematically related to convolutions.
For example, you can do large convolutions more efficiently (than naive convolution implementations), because the convolution theorem implies that element-wise multiplication in the frequency domain is equivalent to convolution in the time domain.
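A quick numpy check of that equivalence (padding both inputs to the full output length so the FFT's inherently circular convolution behaves like a linear one; sizes are arbitrary):

```python
import numpy as np

a = np.random.randn(5000)
b = np.random.randn(5000)

# convolution theorem: element-wise multiplication in the frequency
# domain is convolution in the time domain; two FFTs plus an inverse
# FFT is O(n log n), versus O(n^2) for direct convolution
L = len(a) + len(b) - 1
via_fft = np.real(np.fft.ifft(np.fft.fft(a, L) * np.fft.fft(b, L)))

print(np.allclose(via_fft, np.convolve(a, b)))
```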
Practicalities - constraints
Bad words: infinite
Infinite sounds bad for real world application.
Not as bad as you may think.
For discrete data, the amount of sinusoids necessary is strictly bound by the input size.
...because discrete data is implicitly band-limited - it cannot represent anything higher.
...which just sidesteps the potential issues, in part because you faced those earlier: If that discrete data is a sampling of something that did contain something higher, you've now either sampled it poorly (possibly introducing some nasty artifacts, primarily aliasing) or you've filtered it out before it got to the sensor.
For example, anything made to digitally record sound will avoid aliasing by doing filtering, taking out frequencies higher than the ADC is looking at. (actually there are some variants of implementation. Historically it might have been a physical filter, these days it's often part of the supersampling that nicer ADCs do anyway)
If it didn't do that filtering, the data would still be band limited, but it would probably include a little aliased signal (not very much because most microphones barely respond over ~15kHz, and the world is often fairly quiet in this range anyway).
Bad words: periodic
The word periodic turns out to be the larger problem.
A fundamental property/assumption of the FT is that it essentially assumes the entire signal is a single infinitely repeating chunk (not exactly, but intuitively true enough).
Even for very periodic signals, even a perfect sine wave,
perfect periodicity happens only when the window size is an exact whole multiple of the wavelength. In other words, basically never.
If you intuit the FT as a tuning fork for each output bucket, then it's still mainly the right (closest) bucket that responds the most, but the adjacent few will do so too. This is one reason for spectral leakage.
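A small numpy illustration of that leakage (256 samples, frequencies 20 versus 20.5 are arbitrary; the 1.0 threshold is just a convenient cutoff for 'responding'):

```python
import numpy as np

N = 256
n = np.arange(N)

exact  = np.abs(np.fft.rfft(np.sin(2 * np.pi * 20   * n / N)))  # whole cycles fit
offbin = np.abs(np.fft.rfft(np.sin(2 * np.pi * 20.5 * n / N)))  # they don't

# with whole cycles only one bin responds; off-bin, many neighbours
# respond too - that spread is the spectral leakage
print(np.sum(exact > 1.0), np.sum(offbin > 1.0))
```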
Another practical problem is that, between the first and last sample in your data, there will usually be a large step in values.
Which is a discontinuity, just like a large step in the middle of your data would be. The FT will try to model this as best it can, which it will do using all the frequencies - this is a problem with any pulse or step, and is also spectral leakage. (This is a vague description; find the math if you care)
Since this one is not even actually present in your data, it's effectively wrong, though note that there's only so much energy in it.
If just showing the spectrum, you've muddled it a tiny bit. If doing FT space filtering, you've introduced new signal (and exactly how much depends on implementation details).
Another problem is that the longer a sample is, the likelier any real-world signal is to contain something non-periodic.
Just because most real signals are not pure and perfectly behaved.
Say, 0.1 seconds from a piece of music is likely to mostly just contain a sustained note, or silence, or something noisy, or something else somewhat specific. A 10 second chunk? Not so much. A 100 second chunk? Basically not.
Sure you can analyze these well enough, in that synthesis will reconstruct the same thing. But the less periodic the content really is, the less intuitive the frequency-space data is to interpret or manipulate.
This is related to that you get no time information. The basic FT gives output that summarizes the overall frequency content of the entire input.
...and phase content that puts things in the right place in time, which is very much central to reconstruction because of the way it combines. In fact, the phase is arguably more information-dense than the amplitudes. It's just so entangled that it is almost impossible to read off anything useful from the phase information.
If you want frequencies over time information, such as a spectrogram[1], you probably want the STFT (short-time FT).
This is a controlled tradeoff between time resolution and frequency resolution, by taking the FT of short chunks of signal, which both localizes and reduces the problems mentioned above (e.g. considering music in (overlapping) 0.1 second chunks).
- It's not perfect, but it's good enough for many needs. For example, music and other sound analysis is typically done this way.
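A hand-rolled STFT is only a few lines of numpy; this sketch (window size, hop, and the two test tones are all made-up choices) shows the dominant frequency bin moving over time, which the plain FT cannot show:

```python
import numpy as np

def stft_mag(x, win_size=256, hop=128):
    # Hann-windowed, half-overlapping blocks; one magnitude spectrum per block
    win = np.hanning(win_size)
    starts = range(0, len(x) - win_size + 1, hop)
    return np.array([np.abs(np.fft.rfft(x[s:s + win_size] * win)) for s in starts])

fs = 8000
t = np.arange(2 * fs) / fs
# one second of 400 Hz followed by one second of 1200 Hz
x = np.where(t < 1.0, np.sin(2 * np.pi * 400 * t), np.sin(2 * np.pi * 1200 * t))

S = stft_mag(x)
# the dominant bin in an early frame vs a late frame reflects the change
print(np.argmax(S[5]), np.argmax(S[-5]))
```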
On 'Band limited'
Your data must be band limited, meaning it must have no frequency content beyond a cutoff band.
With digital samples, this is mostly something you need to think about at the sampling step.
Once you have discrete-interval data, it's implicitly already band-limited. It can no longer represent frequencies beyond its Nyquist frequency.
That doesn't always mean your sampled data is exactly what you wanted.
For example, if the band limit of your recording is lower (because of a low sample rate) than the frequency of the sound you recorded, and you didn't filter it out, the physically present higher frequencies will show up in aliased form.
For sound recording, this is almost always solved by the recording device doing implicit filtering typically as an electronic lowpass (or sometimes via a supersampling ADC).
But in more generic hardware, it's something you need to think about.
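Aliasing is easy to demonstrate numerically (the 8 kHz rate and 9 kHz tone are made-up numbers): a tone above the sampling rate produces, sample for sample, the same values as a lower one.

```python
import numpy as np

fs = 8000                      # sampling rate
n = np.arange(1000)

# a 9 kHz tone sampled at 8 kHz yields the same samples as a 1 kHz
# tone: the frequency has aliased down by fs, and after sampling
# there is no way to tell the two apart
tone_9k = np.sin(2 * np.pi * 9000 * n / fs)
tone_1k = np.sin(2 * np.pi * 1000 * n / fs)

print(np.allclose(tone_9k, tone_1k))
```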
The reason behind the requirement is that mathematically, Fourier analysis is guaranteed to converge only for band-limited signals.
Which is mostly just a mathematical property of the general theory related to the word 'infinite' in 'infinite sum of sinusoids'. (Without being band limited, such as for FTs on continuous mathematical functions, it would actually be infinite, or at least become a more complex issue.)
So for band-limited data, a finite sum is enough. In fact, there is a hard bound on the necessary amount of components.
(And for a lot of everyday sound, a modest amount of sums will already be a decent approximation)
For example, a sweater that shows moire patterns on TV is entirely band limited once on TV, but not quite what you intended to record, because the sampling was rougher than what was there. What you would generally prefer in this case is filtering before sampling so that it's fuzzy, instead of distracting.
Practicalities - alternatives, interpretation
Relevant to choice of library functions
On real and complex input/output, and symmetry
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) |
 real    --FT-->  complex
 complex --FT-->  complex
 complex --iFT--> complex
and, not unusually,
 complex --iFT--> real
The last is, in some ways, not special at all - just the same as the one before it, but with the assumption that all imaginary output components happen to be zero. This often amounts to throwing them away, which is valid only if they actually were zero (which, under many controlled circumstances, they will be, to within rounding error).
So a DFT package will often have a number of variant functions.
The most generic DFT functions often take complex-valued input, and output bins for -f/2 through f/2. This is because for complex input, the two halves are not redundant/symmetric.
However, most practical applications have real-valued input. You can turn that into complex-with-zero-imaginary to hand it to the just-mentioned function, but it would do some redundant calculation, and the output would be symmetric around 0.
As such, most packages also have a 'FT on real-valued input' function, which will be a little faster.
As FFTW points out, "In many practical applications, the input data in[i] are purely real numbers, in which case the DFT output satisfies the 'Hermitian' redundancy: out[i] is the conjugate of out[n-i]. It is possible to take advantage of these circumstances in order to achieve roughly a factor of two improvement in both speed and memory usage"
(Some will still give the redundant data, some will return bins for 0..f/2. (verify))
(If you care to look up the math, look for terms like Hermitian symmetry and complex conjugate symmetry)
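For example, with numpy (whose `rfft`/`irfft` are 'FT on real-valued input' variants of the kind just mentioned), the redundancy looks like:

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(8)

full = np.fft.fft(x)    # 8 complex bins, covering -f/2..f/2 worth of content
half = np.fft.rfft(x)   # 5 complex bins: 0..f/2, the non-redundant half

# Hermitian redundancy for real input: out[i] == conj(out[n - i])
assert np.allclose(full[1:], np.conj(full[1:][::-1]))
# the 'real input' variant returns exactly the non-redundant first half
assert np.allclose(half, full[:5])
# and the matching inverse reconstructs the real signal from that half
assert np.allclose(np.fft.irfft(half), x)
```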
TODO: add diagram of real input versus complex input http://stackoverflow.com/questions/6740545/need-help-understanding-fft-output
Similarly, when you have complex data with Hermitian redundancy and want to inverse-FT it, the output is complex with zero imaginary component (within rounding error) or, if you (can) ask the library, real output.
This saves (only) a little calculation, some memory, and probably some code on your side.
"So what if you did
real --FT-> complex --iFT-> complex
and you notice the complex output has non-zero imaginary data?"
Well, if you did just that without altering the frequency-space data, then they will be near zero, within rounding error.
But if you did more, e.g. implemented frequency-space filtering, then chances are you broke the conjugate symmetry doing so.
How much that actually matters depends on what you were doing.
https://stackoverflow.com/questions/19863410/conflict-in-imaginary-parts-after-ifft
https://dsp.stackexchange.com/questions/37272/how-to-get-a-real-signal-from-complex-ifft-values
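A minimal numpy illustration of the above: editing only the positive-frequency bins breaks the symmetry and produces genuinely non-zero imaginary output, while mirroring the same edit onto the negative-frequency bins keeps the output real within rounding error.

```python
import numpy as np

x = np.random.default_rng(1).standard_normal(64)

X = np.fft.fft(x)
X[1:32] *= 0.5                 # edit only the positive-frequency bins...
y = np.fft.ifft(X)
# ...which breaks conjugate symmetry: genuinely non-zero imaginary output
assert np.abs(y.imag).max() > 1e-6

X2 = np.fft.fft(x)
X2[1:32] *= 0.5
X2[33:] *= 0.5                 # mirror the edit onto bins 33..63 (= bins 31..1)
y2 = np.fft.ifft(X2)
# symmetry kept: the imaginary part stays at rounding-error level
assert np.abs(y2.imag).max() < 1e-9
```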
-->
On shift
On even/odd-sized input
Output - value scale and units
Output - frequency bins
Output - bin width and position
Output - note on (non)linearity
Error
Spectral leakage, window functions
(Temporal) Aliasing
Picket fence effect
Ringing (Gibbs phenomenon)
More math
Relations between fourier, convolution, and correlation
Displaying
Multi dimensional FT
Variants and alternatives
See also
Wavelet transforms
Hartley transform
Similar to the Fourier transform, but:
- transforms real data to real data
- is its own inverse (aside from scaling)
The FHT (fast discrete HT) is not fundamentally faster to calculate than FFT variants that work on real data, although in specific cases there are more optimizations, so in simpler implementations the FHT uses slightly less calculation than the FFT.
This (and the 'its own inverse' property sometimes being convenient) means you see the FHT in memory-limited hardware.
For example: http://wiki.openmusiclabs.com/wiki/ArduinoFHT
Related concepts
For a wider overview, you will want to distinguish between the various related terms, which include (roughly ordered from more general/abstract to more applied/discrete):
- Z-transform
- something of a generalization of the Discrete-time Fourier transform (DTFT, and not to be confused with the DFT)
- Takes a discrete real-numbered or complex-numbered time-domain signal and calculates a complex frequency-domain representation (apparently making it a discrete variation of the Laplace transform)
- Fourier Series
- Usually refers to the idea that x(t) is the sum of an infinite number of sinusoids
- Fourier series integral
- Mathematical basis for Fourier analysis
- Fourier Analysis
- x(t) to {a_{k}} (time series to frequencies)
- (the effective opposite of Fourier Synthesis)
- Fourier Synthesis
- {a_{k}} to x(t) (frequencies to time series)
- (the effective opposite of Fourier Analysis)
- Note that most Fourier transforms have an inverse to allow (re-)synthesis, e.g. iDFT, iFFT, etc.
- Fourier Transform(s)
- Can refer to
- The result of a Fourier analysis (the frequency-domain information)
- The method of converting to that domain, and usually back from its results (Fourier analysis, and synthesis)
- The more mathematical definition of the Continuous Fourier Transform, which transforms a function of a real variable into another, in the frequency domain (in which both are continuous and unbounded)
- A group of implementations for this transform
- Can refer to
- The Discrete Fourier Transform (DFT)
- a specific, practicalized version of the (continuous) fourier transform, in that
- it is finite-domain and for discrete-time functions
- input function must be discrete - a real- or complex-numbered time series, often from sampling a continuous signal
- only analyses frequency content required to approximate the given segment (unlike the DTFT)
- Fast Fourier Transform (FFT)
- Refers to a number of different algorithms to quickly calculate a DFT
- ...since the straightforward calculation of DFT takes O(n^{2}), and FFT takes O(n log n)
- Short-time Fourier Transform (STFT)
- Refers to the Fourier transform (often specifically analysis) in which windowing is applied (in the forms of continuous theory, and/or discrete application)
- ...for a controlled tradeoff between frequency resolution and time resolution.
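For a concrete sense of what the FFT speeds up, the DFT definition can be evaluated directly. A numpy sketch (fine for small n, hopeless for large n):

```python
import numpy as np

def naive_dft(x):
    """Direct O(n^2) evaluation of the DFT definition."""
    n = len(x)
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) @ x

x = np.random.default_rng(4).standard_normal(64)
# same result as the O(n log n) FFT, just much slower for large n
assert np.allclose(naive_dft(x), np.fft.fft(x))
```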
Other related concepts and transforms:
- Fourier sine transform and Fourier cosine transform, special cases of the continuous Fourier transform for odd and even functions. Probably most often applied as the discrete sine transform (DST) and discrete cosine transform (DCT)
See also
- Discrete Fourier Transform, Aug 2004, http://astronomy.swin.edu.au/~pbourke/analysis/dft/
- Discrete Fourier Transform and the FFT, Aug 2004, http://www.cage.curtin.edu.au/mechanical/info/vibrations/tut4.htm
- Introduction to DSP - frequency analysis Fourier transform, Aug 2004, http://www.bores.com/courses/intro/freq/3_ft.htm
- Part I: Fourier Transforms and Sampling, http://www.silcom.com/~aludwig/Signal_processing/Signal_processing.htm
- Zero Padding Does Not Buy Spectral Resolution, Aug 2004, http://zone.ni.com/devzone/conceptd.nsf/webmain/81227DF4C2952C1A86256CA80053F322
- Maria Elena Angoletta, Fourier Analysis / Part 2: Technicalities, FFT & system analysis, Aug 2004,
- FFT Links http://www.fftw.org/links.html
- Discrete-time Fourier transform (DTFT)
- Fractional Fourier transform (FRFT)
- Modified discrete cosine transform (MDCT)
- Modulated complex lapped transform (MCLT)
Cepstrum, Cepstral transform
Filtering theory
Time-domain filters
Frequency-domain filters
Filters and analyses
Frequency pass and stop filters
Low-pass
Filters out frequencies higher than a cutoff frequency.
Regularly used in analog form, e.g. before sampling sound in sound cards, in DSL filters, etc.
Note that blurring samples works as a lowpass filter, in sound, images, and elsewhere.
Anti-aliasing filters are regularly lowpass filters.
See also Low-pass filter
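A quick numpy illustration of the 'blurring works as a lowpass filter' remark, using a 5-point moving average as the blur (the frequencies and kernel length here are arbitrary choices):

```python
import numpy as np

sr = 1000
t = np.arange(sr) / sr
slow = np.sin(2 * np.pi * 5 * t)     # 5 Hz: well below the blur's cutoff
fast = np.sin(2 * np.pi * 150 * t)   # 150 Hz: should be attenuated

kernel = np.ones(5) / 5              # 5-point moving average ("blur")
out = np.convolve(slow + fast, kernel, mode='same')

inspec = np.abs(np.fft.rfft(slow + fast))
outspec = np.abs(np.fft.rfft(out))
slow_left = outspec[5] / inspec[5]       # close to 1: barely touched
fast_left = outspec[150] / inspec[150]   # clearly attenuated
assert fast_left < 0.5 < slow_left
```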
High-pass
Filters out frequencies lower than a cutoff frequency.
Band-stop
Also known as band-limit filter, band-elimination filter, and band-reject filter; a narrow band-stop is also called a notch filter (or 'T-notch filter').
The band that is filtered out is referred to as the stopband.
Band-pass
The band that is let through is referred to as the passband.
Can be seen as the combination of a high-pass and a low-pass.
Filterbanks
Subband coding
Separate out different frequency bands, to use different coding for each (for reasons of efficient transmission).
Equalization
Equalization refers to sound processing altering the frequency content/envelope, for correction (e.g. of a recording room's acoustics) or to give sound a particular feel or focus.
Fairly common in software is the graphic equalizer, which adjusts something like 20Hz-20kHz with various sliders, in bands of certain width. There are also hardware equivalents, which may offer feedback detection, to avoid feedback caused by peak sounds (such as vocals) or bad setups (where microphones strongly pick up speaker sound).
Pre-emphasis refers to structurally amplifying some frequencies. For example, vinyl records are mastered with their lower frequencies perhaps 30dB quieter than high frequencies, largely because low frequencies require larger groove size. Mastering with less-present low frequencies allows closer grooves and therefore more recording time. Phono pre-amps equalize with the inverse of what was applied to the master (the RIAA curve), so that they produce approximately what was recorded.
Comb filter
A comb filter adds a delayed copy of a signal to itself.
http://en.wikipedia.org/wiki/Comb_filter
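A minimal numpy sketch (the `feedforward_comb` helper is illustrative): adding a 10-sample-delayed copy cancels components whose delayed copy arrives phase-inverted, and doubles components whose copy arrives in phase - those regular notches are the "comb".

```python
import numpy as np

def feedforward_comb(x, delay, gain=1.0):
    """y[n] = x[n] + gain * x[n - delay]: add a delayed copy to the signal."""
    y = x.astype(float).copy()
    y[delay:] += gain * x[:-delay]
    return y

sr = 1000
t = np.arange(sr) / sr
delay = 10
cancelled = np.sin(2 * np.pi * 50 * t)    # 50 Hz: delay is half its period
boosted = np.sin(2 * np.pi * 100 * t)     # 100 Hz: delay is a full period

assert np.abs(feedforward_comb(cancelled, delay)[delay:]).max() < 1e-9
assert np.abs(feedforward_comb(boosted, delay)[delay:]).max() > 1.8
```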
More complex applications
(...which are sometimes presented as complete filters)
Noise, SNR, etc.
Methods for noise reduction / signal (recognition) enhancement assistance:
- spectral subtraction (subtracting a pure noise signal's spectrum from the signal's)
- time-domain methods (e.g. direct, Kalman filtering)
- frequency-domain methods
- wavelet shrinkage/thresholding
- ...many more
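A toy numpy sketch of the first of these, spectral subtraction. Real implementations use overlapping windowed blocks and smarter noise estimates; this only shows the bare idea (subtract an average noise magnitude spectrum, clamp at zero, resynthesize with the noisy phase):

```python
import numpy as np

def spectral_subtraction(noisy, noise_sample, win=256):
    """Toy spectral subtraction on non-overlapping blocks."""
    # average magnitude spectrum of a known noise-only stretch
    noise_mag = np.mean([np.abs(np.fft.rfft(noise_sample[i:i + win]))
                         for i in range(0, len(noise_sample) - win + 1, win)],
                        axis=0)
    out = np.zeros(len(noisy) // win * win)
    for i in range(0, len(out), win):
        spec = np.fft.rfft(noisy[i:i + win])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # no negative magnitudes
        out[i:i + win] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
    return out

rng = np.random.default_rng(2)
sr = 8000
t = np.arange(2 * sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
noise = 0.3 * rng.standard_normal(len(t))

cleaned = spectral_subtraction(tone + noise, noise)
n = len(cleaned)
# the error versus the clean tone shrinks compared to the noisy input
assert np.mean((cleaned - tone[:n]) ** 2) < np.mean(noise[:n] ** 2)
```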
Pitch detection, frequency estimation
Envelope detection
Beat/tempo detection
See Descriptions_used_for_sound_and_music#Beat_and_tempo
Dynamic range compression
Dynamic range compression refers to varying amplification that adapts to loudness.
In music mixing/mastering (where context means it's often just called 'compression'), it is often used to make different kinds of sources (instruments, vocals) behave similarly, and to let you e.g. draw attention to one thing and make another a background texture.
Yes, just adjusting volume does that, but not always well enough.
Some instruments, such as drums, have a much larger dynamic range, that is, a larger difference between their loudest and quietest parts. You could adjust the volume according to how it's played, or you could let dynamic range compression essentially do that for you.
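The bare idea can be sketched as follows, assuming numpy (a toy per-sample version; real compressors smooth the gain over time with attack/release parameters, which is exactly the tweaking mentioned below):

```python
import numpy as np

def compress(x, threshold=0.5, ratio=4.0):
    """Toy dynamic range compressor: above the threshold, the signal's
    level grows `ratio` times slower than the input's."""
    env = np.abs(x)
    gain = np.where(env > threshold,
                    (threshold + (env - threshold) / ratio) / env,
                    1.0)
    return x * gain

x = np.array([0.1, 0.4, 0.8, 1.0, -1.0, 0.2])
y = compress(x)
# values at/below the 0.5 threshold pass unchanged;
# 0.8 becomes 0.575, and +/-1.0 becomes +/-0.625
```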
There are some tricks that make for specific sounds, such as the (apparent) increased decay time, the way drums don't drown out vocals, and the way techno beats have bits of silence for emphasis (arguably a mistake, but we're used to it now). This means tweaking the parameters (e.g. the speed at which it allows changes), and amounts to informed use.
Dynamic compression also has risks, such as making things sound less natural, and that you end up with less detail in your mix than in your recording.
The largest potential problem is blanket uninformed application.
For example, home cinema systems often apply this so that you do not have to keep changing the volume in movies with both silent and loud parts, at the expense of some quality (that is, dynamic range).
That's useful, and you can disable it if you care.
The same thing is done in broadcasts, to make a radio/tv station sound both consistent in level and perceptually louder. This has no direct use for us, and we can't choose the quality.
Record producers have caught wind of the concept of "compression == perceptually louder", and since they figure "louder = more noticeable = more sales" (probably true), they force mastering technicians to apply more overall compression. None of that volume variation, psh.
Which is sort of stupid, because radio stations do this anyway. (If not quite as aggressively, because it doesn't sound so good) Essentially, record companies sell all their CDs with effectively lowered quality, on the off chance that a station (and/or DJ) is completely ignorant of these details and this squeezes out slightly more perceptual volume.
The loudness war this resulted in means that the perceptual sparkle mentioned earlier (not easily quantifiable, since it is an implication of dynamic range that depends on the particular recorded signal/music itself) is reduced, sometimes significantly.
(Interestingly, vinyl versions of music may be better -- old ones because they predate this nonsense, new ones because their target market excludes broadcast)
Vocoders
Voice coders are now mostly used for distorting vocals and instruments in music; the original idea was to parametrize speech for efficient transmission.
A basic vocoder is an analysis/synthesis system that puts each band from a multiband filter through an envelope follower (originally an analog circuit), yielding a control signal per band.
Sending only these control signals is also a simple form of encryption (mostly by obscurity).
A phase vocoder additionally tracks phase in each band, which allows manipulations such as time stretching and pitch shifting.
See also:
- http://en.wikipedia.org/wiki/Vocoder
- http://en.wikipedia.org/wiki/Envelope_follower
- http://en.wikipedia.org/wiki/Phase_vocoder
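The envelope follower mentioned above has a simple digital stand-in: rectify, then low-pass. A numpy sketch (the smoothing length is an arbitrary choice):

```python
import numpy as np

def envelope(x, smooth=200):
    """Digital stand-in for an analog envelope follower:
    rectify (abs), then smooth (moving average as a crude low-pass)."""
    return np.convolve(np.abs(x), np.ones(smooth) / smooth, mode='same')

# a 440 Hz carrier with a triangular loudness contour
sr = 8000
t = np.arange(sr) / sr
amp = np.minimum(t, 1 - t) * 2
env = envelope(amp * np.sin(2 * np.pi * 440 * t))

# the follower tracks the contour; mean of |sin| is 2/pi, hence the rescale
assert abs(env[sr // 2] * np.pi / 2 - 1.0) < 0.1
```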
Linear Predictive Coding (LPC)
Analysis of the spectral envelope of a digital signal, assuming the basic case of voiced speech, and therefore useful for speech parameter estimation.
E.g. seen in LPC vocoders, often for speech transmission.
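The core estimation can be sketched with the autocorrelation method, assuming numpy (real implementations typically use the Levinson-Durbin recursion to solve the same equations faster):

```python
import numpy as np

def lpc(x, order):
    """Estimate LPC coefficients a[1..order], so that
    x[n] ~ a[1]*x[n-1] + ... + a[order]*x[n-order],
    by solving the autocorrelation (Yule-Walker) normal equations."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:][:order + 1]
    # Toeplitz system R a = r[1:order+1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

# a signal that really is a 2nd-order autoregression is recovered closely
rng = np.random.default_rng(3)
x = np.zeros(5000)
e = 0.1 * rng.standard_normal(5000)
for n in range(2, 5000):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]

est = lpc(x, 2)   # close to [1.3, -0.6]
```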
Warped LPC (WLPC)
CELP and variants
Code excited linear prediction (CELP), with variants including ACELP, RCELP, LD-CELP (ITU-T G.728) and VSELP, and the older RELP, use source-filter models of speech production.
Spectral whitening
Activity Recognition
Fourier correlation notes
See also: