Audio and signal processing glossary



Related to frequency bands

Band-limited and time-limited

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

A signal is band-limited if it has no frequency content beyond a cutoff point (frequency content zero above a certain finite frequency).

A signal is time-limited if it exists within limited time (nonzero for a finite length time interval).


Frequency-division multiplexing (FDM)

Wireless telecommunication often applies frequency-division multiplexing, meaning that multiple baseband signals are modulated on different carrier waves.

These signals can be put on the same medium, because they're not in each other's way, and with some filtering and demodulating you can get back the original baseband signal.

This is particularly useful for media where you must share, like longer-distance aerial EM transmission.

Downsides include that it makes relatively inefficient use of the available spectrum.
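As a rough sketch of the FDM idea (not how any real radio does it): two constant baseband levels are put on two different carriers, added together on one shared medium, and recovered by mixing with the matching carrier and averaging. The sample rate, carrier frequencies, and function names are made up for illustration, and averaging over whole carrier periods stands in for a proper lowpass filter.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed for illustration)
N = 8000             # one second of samples, a whole number of carrier periods
CARRIER_A = 1000.0   # Hz, hypothetical carrier for channel A
CARRIER_B = 2000.0   # Hz, hypothetical carrier for channel B


def modulate(level, carrier_hz):
    """Amplitude-modulate a constant baseband level onto a cosine carrier."""
    return [level * math.cos(2 * math.pi * carrier_hz * n / SAMPLE_RATE)
            for n in range(N)]


def demodulate(medium, carrier_hz):
    """Mix with the carrier, then average (a crude lowpass) to get the level back."""
    mixed = [x * math.cos(2 * math.pi * carrier_hz * n / SAMPLE_RATE)
             for n, x in enumerate(medium)]
    # cos^2 averages to 1/2 over whole periods, hence the factor 2.
    return 2 * sum(mixed) / len(mixed)


# Both channels share the medium by simple addition.
medium = [a + b for a, b in zip(modulate(0.3, CARRIER_A),
                                modulate(0.7, CARRIER_B))]

recovered_a = demodulate(medium, CARRIER_A)   # close to 0.3
recovered_b = demodulate(medium, CARRIER_B)   # close to 0.7
print(recovered_a, recovered_b)
```

The other channel drops out because the product of two different carriers averages to zero over whole periods, which is the "not in each other's way" part.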

Non-FDM methods (for comparison)

Time-division multiplexing (TDM) puts different signals on the same medium, interleaving them over time.

Simpler TDM has fixed-order, fixed-length slots, often pre-allocated, and is often used in wired telecommunication. The upside is that it can utilize the medium efficiently within each time division.

On wired, dedicated channels you can also leave them as baseband signals, which can be simpler.
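A minimal sketch of simple TDM with fixed-order, fixed-length slots, assuming every channel always has exactly as much to send as the others. The function names and the slot length are hypothetical.

```python
def tdm_multiplex(channels, slot_len):
    """Interleave equal-length channels into fixed-order, fixed-length slots."""
    length = len(channels[0])
    if any(len(ch) != length for ch in channels) or length % slot_len:
        raise ValueError("channels must have equal length, divisible by slot_len")
    stream = []
    for start in range(0, length, slot_len):
        for ch in channels:                      # fixed order: channel 0, 1, 2, ...
            stream.extend(ch[start:start + slot_len])
    return stream


def tdm_demultiplex(stream, n_channels, slot_len):
    """Undo tdm_multiplex: slot i belongs to channel i modulo n_channels."""
    channels = [[] for _ in range(n_channels)]
    for i in range(0, len(stream), slot_len):
        slot_index = i // slot_len
        channels[slot_index % n_channels].extend(stream[i:i + slot_len])
    return channels


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
stream = tdm_multiplex([a, b], slot_len=2)
print(stream)                          # [1, 2, 10, 20, 3, 4, 30, 40]
print(tdm_demultiplex(stream, 2, 2))   # [[1, 2, 3, 4], [10, 20, 30, 40]]
```

Because the slot order is fixed, the receiver needs no per-slot addressing, only agreement on where a frame starts.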


Packet mode / Statistical TDM is similar to basic TDM but additionally manages/schedules the time slots / packets.

When channels don't always have the same amount to send this will be more efficient use of the medium, at the cost of more management.
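A sketch of the statistical variant, under the assumption that each packet is tagged with its channel number (that tag is the extra management). The scheduling here is a plain round-robin that skips idle channels; real systems use more involved schedulers, and the names are made up.

```python
from collections import deque


def statistical_tdm(queues):
    """Return (channel_id, packet) pairs, giving slots only to channels with data."""
    pending = [deque(q) for q in queues]
    stream = []
    while any(pending):                  # an empty deque counts as False
        for channel_id, queue in enumerate(pending):
            if queue:                    # idle channels use up no slot
                stream.append((channel_id, queue.popleft()))
    return stream


stream = statistical_tdm([["a1", "a2", "a3"], [], ["c1"]])
print(stream)   # [(0, 'a1'), (2, 'c1'), (0, 'a2'), (0, 'a3')]
```

Fixed-slot TDM would have spent nine slots on the same three channels (three rounds of three slots), five of them empty; here four slots suffice, at the price of carrying the channel tag.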

Some systems combine multiplexing methods. For example, local wireless systems (such as WiFi, Bluetooth) combine FDM and TDM, in that they apply TDM methods in assigned bands on the EM spectrum.


There are other variations on the time-based idea, with different upsides and downsides. Common downsides include that:

  • time divisions may go unused (particularly in TDM),
  • there is some implied lag (though dependent on the speed of the channel and the time slot length; in fibre and similarly fast systems it is rarely large enough to matter)
  • the medium is often used in a lowest-common-denominator way, in that
    • all participants have to agree and cooperate
    • it is somewhat easier for one misbehaving participant to disrupt the channel
    • it is hard to move to a new technology on an existing channel (consider WiFi)

Wideband; broadband

A relative term, describing that a spectrum contains a wide range of frequencies, often in relation to a channel's (coherence) bandwidth.


Wideband also has the implied sense of approximately equal response across the bands referred to.

Broadband (ignoring the varied meanings to different fields) sometimes suggests that gain may vary across the frequencies, and that the bands may be split into channels or frequency bins, as it is e.g. in various practical communication channels to separate signals (consider TV signal modulation, internet modem cooperation, etc).



Baseband

Various meanings, often used as an adjective in other terms, including:

  • baseband frequencies often refers to unchanged low frequency content of a signal
e.g. before modulating it for transmission, and so also often means "a directly usable signal"
  • Baseband modulation: typically describes a signal that is not modulated
For example, from the perspective of a TV, which demodulates channels from specific RF frequencies, composite video is the baseband signal you get after such demodulation. Early computers, some VCRs, and some early game consoles would output a composite-video signal, and use an RF modulator to modulate this signal onto a specific TV channel.
  • baseband bandwidth often refers to the bandwidth of the signal, the highest frequency in a band-limited signal, low frequency content (the band-limited result of a lowpass), or specifically the frequency content near 0 Hz;
  • A baseband channel may refer to a medium that will nicely transfer low (baseband) frequencies (e.g. short range wired networks)
....which is worth naming because often enough channels have noise/distortion at particularly low frequencies (because of EMI, because of component behaviour, etc.)



Narrowband

Has different meanings, primarily:

In audio signals, it has a sense of 'not using much of the available spectrum' and suggests a bandpassed signal.

In telecommunication, it refers to a signal not (significantly) exceeding a particular channel's coherence bandwidth, which is an idealization that makes certain theory simpler. Note that this sense does not imply that relatively little of the channel is used.

Coherence bandwidth

Intuitively, the range of frequencies over which a channel's gain / transfer function remains mostly flat/constant.


This matters e.g. when channels are near each other and/or are subject to fading, as in cellular communication.



Transform coding

The transform is often reversible (lossless) in itself but meant for lossy channels, specifically to have losses be less visible/audible via more useful quantization. So it uses knowledge about the typical nature of a signal and the way it is perceived.

For example analog audio and video, especially the baseband variants.


https://en.wikipedia.org/wiki/Transform_coding


Sub-band coding

A form of transform coding that breaks audio (or video) into frequency bands to encode separately.

The point is often that you can treat different bands differently, or that the amount of information in each can matter significantly.


For example, MP3 and many other lossy audio codecs use sub-band coding (combined with acoustic models) to spend the space on the most audible frequencies. It's also useful for lossless variants.
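To make "treat different bands differently" a little more concrete, here is a toy two-band split (pair averages as the low band, pair differences as the high band), with the high band quantized more coarsely. This is only a sketch: real codecs use far more elaborate filter banks and perceptual models, and the step sizes and function names are assumptions chosen for illustration.

```python
def split_bands(samples):
    """Split an even-length signal into a low band and a high band."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]    # pair averages: slow changes
    high = [(a - b) / 2 for a, b in pairs]   # pair differences: fast changes
    return low, high


def merge_bands(low, high):
    """Inverse of split_bands: a = low + high, b = low - high."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


def quantize(values, step):
    """Round each value to the nearest multiple of step."""
    return [round(v / step) * step for v in values]


signal = [0.10, 0.12, 0.50, 0.46, -0.20, -0.22, 0.00, 0.02]
low, high = split_bands(signal)

exact = merge_bands(low, high)                # the split itself loses nothing
lossy = merge_bands(quantize(low, 0.01),      # fine steps for the low band
                    quantize(high, 0.05))     # coarse steps for the high band

worst_error = max(abs(x - y) for x, y in zip(signal, lossy))
print(worst_error)
```

The split itself is reversible; all loss comes from the per-band quantization, which is where a codec would decide how many bits each band deserves.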


http://en.wikipedia.org/wiki/Sub-band_coding

Filter / model system terms

Q factor

Impulse response

Convolution

Dynamic range

As a concept

Dynamic range is a relatively simple idea, yet not all contexts make it directly or equally meaningful.


Loosely speaking, dynamic range refers to the range of amplitudes you can work with (you can transmit and/or store) without that signal being lost in immediately unavoidable noise, distortion, clipping, etc.

It is typically expressed as the ratio between the largest and smallest meaningful levels, usually in dB for practical reasons (the plain ratios are big numbers).
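As a small worked example of that ratio, assuming the levels are amplitudes (so 20·log10 applies; for power ratios it would be 10·log10). The function name is just for illustration.

```python
import math


def dynamic_range_db(largest, smallest):
    """Dynamic range in dB for an amplitude ratio largest / smallest."""
    return 20 * math.log10(largest / smallest)


# An amplitude ratio of 1000:1 is about 60 dB, a ratio of 2:1 about 6 dB.
print(dynamic_range_db(1.0, 0.001))
print(dynamic_range_db(2.0, 1.0))
```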


Dynamic range around music is probably most useful to approximate the abilities of recording, storage, transmission, and reproduction equipment.

For example

AM radio has perhaps ~30dB,
early 78rpm records around 35dB,
FM radio is ~50 dB,
cassette tape maybe 50-60dB, and fancier tape closer to 70-80dB, [1]
vinyl varies and is harder to measure but quoted at 60dB, (to 70dB at best)

Sensors have dynamic range in that they have a point where they would distort/clip/saturate, and a noise floor below which they could not predictably reproduce signal.

E.g. decent microphones may have 70-90dB between acoustic overload, and noise floor. The acoustic overload is typically an absolute level, e.g. in dB SPL, and the noise floor also doesn't move so much, so the dynamic range of what you record also depends on the actual sound levels. (This is a more complex subject that any sound engineer has to read up on)


A critical band in our ears has approximately 30dB of instantaneous dynamic range, i.e. that's what you'll hear within a second or so.

At the most optimistic, we have 60dB of instantaneous-ish range to give (across the range and over seconds).

In practice, this range shifts relatively slowly over our full range (or, if protective mechanisms kicked in, very slowly). That full range is around 120dB: if you take the lowest level at which you hear anything at all and call it 0dB SPL, then around 130-140dB SPL is pain and hearing damage.


So while 60-70dB is enough for decent quality reproduction, that leaves zero flexibility. It'd e.g. be fine to have a final mix with 70dB of dynamic range.

Yet whenever levels could vary at all, e.g. while recording, mixing, processing, interconnecting, or transmitting, you want more leeway for the entirely practical reason that without that, signal would so easily either distort/clip at the ceiling, or fade into the noise floor.

This is one reason it makes sense for DAWs to work at higher bit precision, yet once a track is mastered properly it matters much less.



Caveats and fuzziness

Analog and digital share the somewhat ill-defined nature of "The smallest level at which signal still exists". There is no strict standard to measure or report that, just honest convention.

And that convention, even when honest, may be specific to a medium so not be directly comparable to others.

And this completely ignores noise floors in real setups, which can easily chop off 10-20dB of what you had.


Arguably dynamic range is a little more defined in digital devices than in analog, but mostly just because digital imperfections behave more predictably - usually the worst that happens is rounding error. Analog's limitations can sometimes behave in more varied and interesting ways.


People regularly treat dynamic range as available precision, but that makes an assumption or two that must hold up in your practice for this to be more than a vague approximation.


Dynamic range is sometimes used to describe a signal, for example that mastered music usually uses no more than 60dB of 'detail'. The meaning of that statement, while sometimes useful, is also quite fuzzy.

That is, amplitude range is necessary, but not sufficient, for detail. Nor is there necessarily a direct correlation, given that some media add compression by nature or convention. Plus this may be separately done to the signal in mastering.

It is useful to point out e.g. that for well-mastered music, people are quite happy when they get 60dB of range (a bit more than FM radio). Above that you get diminishing improvements.


For us humans



Optimistically, you could say human hearing has over 100dB of dynamic range.


We can adjust to barely still hear things around 0 dB SPL (we defined SPL that way), and exposure to 130dB SPL may be painful but won't immediately destroy our ears.


So if we wanted a theoretical hifi setup where the entire reproduction chain, from original microphone to eventual amplifier to speaker to ear, is good enough to accurately alternate between reproductions of 'mosquito in the room' and 'jet engine in the next room' without touching the volume dial, we would need over 100dB of dynamic range in all parts.


...there's a lot of footnotes to that.

After a 100dB concert we won't hear small rustling leaves until half an hour after, due to a protective response[2].


If we avoid that protective response, we notice something akin to automatic gain control, adjusting our hearing levels to the loudest sound we heard recently.

Our ears are effectively moving a ~60dB (70dB at best(verify)) band along with the absolute sound levels presented to us, adapting roughly at the scale of seconds.

Which means we could sit back in our chair when listening to classical music that goes from soft instrument to wall of sound and back (the hifi system's noise floor allowing). So the overall dynamic range might be maybe 80-90dB.

It also means that the mastered music, within the scale of a second, need not present us with more than that dynamic range. We just won't hear it.


Actually, our ears' adaptivity has some interesting patterns, which break the simple applicability of dynamic range. A major one is that our adaptivity isn't uniform, and also depends on frequency. The more uniform part is at most 60dB wide, whereas each critical band within our ears probably gives only ~30dB(verify).


tl;dr: we are typically pretty satisfied once we hear around 60dB of range. A bit more can be nicer for subtle productions, in a pragmatic "don't want to touch the volume knob" way.


During DSP and mastering, more is useful for reasons relating to accuracy when you do lots of intermediate steps.


Other notes:

  • Apparently we consciously hear 1dB variation, and can be unconsciously aware of down to 0.2dB or so
  • We prefer moderately loud music - up to a point - which is in fact a trick used by HiFi salespeople.

In discrete sampled data

Dynamic range in digital systems is a somewhat simpler case than that in analog devices, because the theoretic limits are well defined.

(Though people manage to misinterpret them all the same)


You know your smallest and largest number, and the only direct source of error is rounding error, which is predictable.

You can decently characterize sampling (ADC), digital processing, storage, as well as the DAC that produces the sound again.


Storage is often done in 16-bit integer form, sometimes 24-bit integers, sometimes 32-bit floats, and historically in 8-bit bytes.

The higher numbers are mostly there for DAWs and mastering and anything else that wants more headroom during processing.


The theoretic dynamic ranges of integer types are (roughly 6dB per bit):

48dB for 8-bit,
96dB for 16-bit,
144dB for 24-bit integers.

These values are idealized, treat them as approximate.
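The "roughly 6dB per bit" comes from each extra bit doubling the number of levels, and 20·log10(2) being about 6.02. A quick check, taking the ratio between the number of levels and a single step as the idealized range (the function name is made up):

```python
import math


def integer_dynamic_range_db(bits):
    """Idealized range of an integer type: ratio of 2**bits levels to one step."""
    return 20 * math.log10(2 ** bits)


for bits in (8, 16, 24):
    print(bits, round(integer_dynamic_range_db(bits), 1))
# 8 48.2
# 16 96.3
# 24 144.5
```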

In the real world they are both optimistic and a little pessimistic, for different reasons (TODO: list some major ones).



Floating point numbers are a different story. If you'd take the smallest and largest storable values you'd get a ridiculously large decibel range, which is almost worthless for the purpose of indicating resolution.

The reason is that, from a real-number-set-viewpoint, levels are not distributed evenly: they are dense around zero, and sparse around the minimum and maximum.

This can actually be useful for some things, e.g. for storage and simple mixing, e.g. because there is less need for companding than in integer storage.

As a rough indication of floating point dynamic range, the number of bits in the mantissa works decently (23 in single precision, 52 in double precision), though in most practice it is a few bits less than that.
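To put numbers on both views for single precision, a sketch assuming IEEE 754 binary32 and ignoring denormals (the constants are written out by hand rather than taken from a library):

```python
import math

# IEEE 754 single precision, normal numbers only (denormals ignored here).
FLOAT32_MAX = (2 - 2 ** -23) * 2.0 ** 127   # largest finite value
FLOAT32_MIN_NORMAL = 2.0 ** -126            # smallest positive normal value
FLOAT32_MANTISSA_BITS = 23                  # stored mantissa bits

# Largest over smallest storable value: huge, but says little about resolution.
full_span_db = 20 * math.log10(FLOAT32_MAX / FLOAT32_MIN_NORMAL)

# Resolution-style figure: about 6 dB per mantissa bit.
mantissa_db = 20 * math.log10(2 ** FLOAT32_MANTISSA_BITS)

print(round(full_span_db), round(mantissa_db))   # roughly 1529 and 138
```

The second figure is the one comparable to the integer values, since at any given magnitude a float only resolves about that many bits.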





Headroom

Headroom is the amount of amplitude that a system can manage above a particular point.

The reference level should lie some amount below the maximum (usual) signal value; the headroom allows peaks above it to be recorded and transmitted without distortion.

The term has the clearest meaning in systems that actually define a reference level, such as in broadcasting systems (see e.g. [3]), where it is some standardized extra safety that avoids distortion.
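In numbers, headroom is just the distance in dB between the reference level and the maximum level the system can handle. A sketch, assuming both are given as linear amplitudes on the same scale (the names and example levels are hypothetical):

```python
import math


def headroom_db(max_level, reference_level):
    """Headroom in dB between a reference level and the system's maximum."""
    return 20 * math.log10(max_level / reference_level)


# A reference level at one tenth of full scale leaves about 20 dB of headroom.
print(headroom_db(1.0, 0.1))
```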


One of the clearer examples of a lack of headroom is sound cards that do not consider the possibility of multiple signals and/or loud signals (and may have volume sliders that actually amplify for a small part of their range). This may mean the card will clip the sound.

Some well-designed sound cards have the digital headroom and output normalization to deal with that amplification gracefully, while others don't and will simply clip loud music, so unless you know your sound card very well, you may be best off leaving that volume slider at ~80%. If your sound sometimes gets noisy when you have sliders all the way up, this may be the reason. Try amplifying with the master volume control, or with the amp/hifi set.
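A toy illustration of why summing loud signals needs headroom, assuming samples in the range -1.0 to 1.0 and hard clipping at full scale. The mixer function and sample values are made up, and real sound cards and drivers differ in how they handle this.

```python
def mix(signals, gain=1.0, full_scale=1.0):
    """Sum signals sample by sample, apply gain, and hard-clip at full scale."""
    mixed = []
    clipped = False
    for samples in zip(*signals):
        value = gain * sum(samples)
        if abs(value) > full_scale:
            clipped = True
            value = max(-full_scale, min(full_scale, value))
        mixed.append(value)
    return mixed, clipped


loud_a = [0.0, 0.8, -0.8, 0.4]
loud_b = [0.0, 0.7, -0.6, 0.1]

# At full gain the sum exceeds full scale and gets clipped.
mixed_full, clipped_at_full = mix([loud_a, loud_b])

# Leaving room (here, half gain) lets the same sum through undistorted.
mixed_room, clipped_with_room = mix([loud_a, loud_b], gain=0.5)

print(clipped_at_full, clipped_with_room)   # True False
```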


Headroom allows one to have an allowance of amplitude above the average level. It allows for more dynamic range in that, say, if an amplifier can deal with brief percussion hits above the rest of the music, this is generally perceived as a sparkling quality to the music (whereas dynamic compression to reduce those spikes, or distortion from not dealing with it is not).

In recording, you should probably allow for at least 20dB of headroom above the average level you are recording (and preferably a bunch more) so that short spikes will record without distorting. This differs, of course - live sound will be less predictable than a radio station playing music.


Without a real reference level, the headroom you get depends more on the various volume knobs involved. In home systems, it depends on various amplifications you may not know about, and arguably even the speakers.


In recording

Studio stuff

Monitors

Monitors, also known as 'studio monitors' and 'near-field monitors', are there to assist mixing. The issue they address is that if the speakers you use in monitoring your edits have some emphasis, you would likely overcompensate on the general/other speakers.

Depending on who you ask, this means one of two things:

  • Idealists say the monitor should be very truthful frequency-wise and accurate at various volumes, so you can hear what is actually on the track in the way that will make it onto the mix.
  • Pragmatists argue you should just get an average speaker so that you'll mix for how things will sound on the average non-quality system - adjusting for average speaker bias out there. (This has on some occasions led to rather badly adjusted mixes)

A monitor being near-field refers to hearing the directly produced sound and minimal reflected sound, often by sitting fairly close to it.

Headphones, even quality headphones, are rarely fit as monitors.