Audio and signal processing glossary



Related to frequency bands

Band-limited and time-limited

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Frequency-division multiplexing (FDM)

Wireless telecommunication often applies frequency-division multiplexing, meaning that multiple baseband signals are modulated on different carrier waves.

Another, possibly more intuitive way of saying that: each signal exists within a band around its carrier wave. An FM radio station's multiplexed baseband extends to roughly 53-57 kHz, and the modulated station occupies much of a 200kHz-wide channel. Stations are placed 200kHz apart largely to avoid interference, which is why station frequencies jump in 0.2 MHz steps (e.g. 88.1, 88.3, and so on in the US).


These signals can be put on the same medium, because they're not in each other's way, and with some filtering and demodulating you can get back the original baseband signal.

This is particularly useful for media where you must share, like longer-distance aerial EM transmission (radio).

Downsides include relatively inefficient use of the available spectrum.
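As a toy sketch of the idea (plain amplitude modulation in pure Python, not any real radio standard): two baseband tones share one medium by riding on different carriers, and each is recovered by multiplying with its carrier again and lowpass filtering away everything else. The rates, carrier frequencies, and the crude moving-average filter are all illustrative choices.

```python
import math

RATE = 100_000                  # samples per second
N = RATE // 10                  # 0.1 s worth of samples

def tone(freq):
    return [math.sin(2 * math.pi * freq * i / RATE) for i in range(N)]

def modulate(baseband, carrier):
    # double-sideband AM: shifts the baseband into a band around the carrier
    return [s * math.cos(2 * math.pi * carrier * i / RATE)
            for i, s in enumerate(baseband)]

def demodulate(signal, carrier, taps=25):
    # coherent demodulation: mix with the carrier again...
    mixed = [s * math.cos(2 * math.pi * carrier * i / RATE)
             for i, s in enumerate(signal)]
    # ...then a crude centered moving-average lowpass drops the components
    # around 2*carrier and the other channel's frequency-shifted leftovers
    half = taps // 2
    out = []
    for i in range(len(mixed)):
        window = mixed[max(0, i - half):i + half + 1]
        out.append(2 * sum(window) / len(window))   # *2 restores amplitude
    return out

a, b = tone(300), tone(700)                         # two baseband signals
on_air = [x + y for x, y in
          zip(modulate(a, 10_000), modulate(b, 25_000))]   # shared medium

a_back = demodulate(on_air, 10_000)
# away from the edges, the recovered signal tracks the original closely
err = max(abs(x - y) for x, y in zip(a[200:-200], a_back[200:-200]))
```

With a proper filter instead of a moving average the separation is much better; the point here is only that summed, frequency-shifted signals remain separable.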

Non-FDM methods (for comparison)

Time-division multiplexing (TDM) puts different signals on the same medium, interleaving them over time - only one ever transmits.

This is only practical if the shared capacity is enough to serve everyone, and immediate transmission is not required.


Simple TDM has fixed-order, fixed-length slots, often pre-allocated.

The upside is that it can utilize the medium fully (so efficiently) within each time division.


It can be used on the air, but is arguably even more common on wires.

On wired, dedicated channels you can also leave them as baseband signals, which can be simpler.



Packet mode / Statistical TDM is similar to basic TDM but additionally manages/schedules the time slots / packets.

When channels don't always have the same amount to send, this makes more efficient use of the medium, at the cost of more management.
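A toy sketch of both variants (hypothetical helper names, not any particular protocol): fixed-slot TDM needs no addressing because slot position identifies the channel, while the statistical variant skips empty slots at the cost of tagging each unit with an address.

```python
def tdm_mux(channels):
    # fixed-order, fixed-length slots: one unit per channel per frame,
    # sent whether or not the channel has anything to say
    line = []
    for frame in zip(*channels):
        line.extend(frame)
    return line

def tdm_demux(line, n_channels):
    # the receiver recovers channel i purely by slot position
    return [line[i::n_channels] for i in range(n_channels)]

def stat_mux(channels):
    # statistical TDM: send only slots that carry data, tagged with an address
    line = []
    for frame in zip(*channels):
        for ch, unit in enumerate(frame):
            if unit is not None:
                line.append((ch, unit))
    return line

a = ["a0", "a1", "a2"]
b = ["b0", None, "b2"]           # channel b has nothing to send in frame 1
fixed = tdm_mux([a, b])          # frame 1's b-slot goes unused
stat  = stat_mux([a, b])         # shorter, but every unit carries a tag
```

The `None` slot in `fixed` is exactly the "time divisions may go unused" downside mentioned below.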

Some systems combine multiplexing. For example, local wireless systems (such as WiFi and Bluetooth) combine FDM and TDM, in that they apply TDM methods within assigned bands on the EM spectrum.


There are other variations on the time-based idea, with different upsides and downsides. Common downsides include that:

  • time divisions may go unused,
  • there is some implied lag
(dependent on the speed of the channel and the time slot length; in fibre and similarly fast systems it is rarely large enough to matter)
  • the medium is often used in a lowest-common-denominator way, in that
    • all participants have to agree and cooperate
    • it is somewhat easier for one misbehaving participant to disrupt the channel
    • it is hard to move to a new technology on an existing channel (consider WiFi)

Wideband; broadband

A relative term, describing that a spectrum contains a wide range of frequencies, often in relation to a channel's (coherence) bandwidth.


Wideband also has the implied sense of approximately equal response across the bands referred to.

Broadband (ignoring its varied meanings in different fields) sometimes suggests that gain may vary across the frequencies, and that the band may be split into channels or frequency bins, as is done in various practical communication channels to separate signals (consider TV signal modulation, internet modem negotiation, etc.).



Baseband

Various meanings, often used as an adjective in other terms, including:

  • baseband frequencies often refers to unchanged low frequency content of a signal
e.g. before modulating it for transmission, and so also often means "a directly usable signal"
  • Baseband modulation: typically describes a signal that is not modulated
For example, from the perspective of a TV, which demodulates channels from specific RF frequencies, composite video is the baseband signal you get after such demodulation. Early computers, some VCRs, and some early game consoles would output a composite-video signal and use an RF modulator to put this signal on a specific TV channel.
  • baseband bandwidth often refers to the bandwidth of the signal: the highest frequency in a band-limited signal, low frequency content (the band-limited result of a lowpass), or specifically the frequency content near 0 Hz;
  • A baseband channel may refer to a medium that will nicely transfer low (baseband) frequencies (e.g. short range wired networks)
....because often enough channels have noise/distortion particularly in low frequencies (because of EMI, component behaviour, etc.)



Narrowband

Has different meanings, primarily:

In audio signals, it has a sense of 'not using much of the available spectrum' and suggests a bandpassed signal.

In telecommunication, it refers to a signal not (significantly) exceeding a particular channel's coherence bandwidth. This is an idealization that makes certain theory simpler. Note that this sense does not imply that only a small part of the channel is used.

Coherence bandwidth

Intuitively, the range of frequencies over which a channel's gain / transfer function remains mostly flat/constant.


This matters e.g. when channels are near each other and/or subject to fading, as in cellular communication.



Transform coding

Transform coding uses knowledge about the typical nature of a signal, and about the way it is perceived, to re-express it so that quantization can discard the least visible/audible information. The transform itself is often reversible (lossless); the losses come from quantizing the transformed representation. It is typically applied to audio and video signals.


https://en.wikipedia.org/wiki/Transform_coding


Sub-band coding

A form of transform coding that breaks audio (or video) into frequency bands to encode separately.

The point is often that you can treat different bands differently, or that the amount of information in each can differ significantly.


For example, MP3 and many other lossy audio codecs use sub-band coding (combined with psychoacoustic models) to spend the space on the most audible frequencies. It's also useful for lossless variants.


http://en.wikipedia.org/wiki/Sub-band_coding
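A minimal two-band illustration (a Haar-style average/difference split, far simpler than MP3's actual filterbank): the split itself is lossless, and a lossy coder would then quantize the high band, which typically carries less, and less audible, information, more coarsely than the low band.

```python
def split(samples):
    # analysis: per pair of samples, keep the average (low band)
    # and the difference (high band)
    evens, odds = samples[0::2], samples[1::2]
    lows  = [(a + b) / 2 for a, b in zip(evens, odds)]
    highs = [(a - b) / 2 for a, b in zip(evens, odds)]
    return lows, highs

def merge(lows, highs):
    # synthesis: exactly inverts the split
    out = []
    for l, h in zip(lows, highs):
        out.extend([l + h, l - h])
    return out

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
lows, highs = split(x)           # lows: overall shape; highs: fine detail
assert merge(lows, highs) == x   # the transform itself is lossless
```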

Filter / model system terms

Q factor

Impulse response

Convolution

Dynamic range

As a concept

Loosely speaking, dynamic range refers to the minimum and maximum strength that can practically be in your signal.

Often expressed not as two values, but as the distance between them, or rather, the ratio between the largest and smallest useful level.


And because that tends to be a very large ratio, we often express it in decibels (to make it more manageable, but also because of the habits of the fields where the dynamic range concept is useful, e.g. sound, broadcasting, and other electromagnetism, which all tend to use dB anyway).


That is still a relatively abstract idea. Why is it useful?

Depends a little on the field, really.


In general, though, consider that often,

  • the lower side of the range is defined by "so low that, even if you manage to also catch signal, it will be lost in unavoidable noise"
  • the higher side of the range is defined by "so high it saturates and distorts the sensor, or storage medium, or the components that assume the signal on a wire sticks to some standard".





How much do you need?


Dynamic range around music is probably most useful to approximate the abilities of recording, storage, transmission, and reproduction equipment.

For example, for transmission and storage:

  • AM radio has around 30 dB
  • FM radio has around 50 dB
  • vinyl varies and is harder to measure - early 78rpm records were maybe around 35dB, but more recent and decent vinyl is quoted at 60dB to maybe 70dB
  • cassette tape maybe 50-60dB for cheaper/typical tape, and fancier tape maybe closer to 70-80dB [1]


A single critical band in our ears has approximately 30dB of instantaneous dynamic range, i.e. that's what you'll hear within a second or so.

...but moderate-speed adaptation means that that can move around,
...so we actually have more like 60dB of range given at least maybe a dozen seconds to adapt to new levels.
...and, if we count the "I can't hear much after that loud concert" self-protection mechanism, you could argue we have over 100dB given a day to recover



For us humans



Optimistically, you could say human hearing has over 100dB of dynamic range.


We can adjust to barely still hear things around 0 dB SPL (we defined SPL that way), and exposure to 130dB SPL may be painful but won't immediately destroy our ears.


So if we wanted a theoretical hifi setup where the entire reproduction chain, from original microphone to eventual amplifier, speaker, and ear, is good enough to accurately alternate between reproductions of 'mosquito in the room' and 'jet engine in the next room' without touching the volume dial, we would need over 100dB of dynamic range in all parts.


...there's a lot of footnotes to that.

After a 100dB concert we won't hear small rustling leaves until half an hour after, due to a protective response[2].


If we avoid that protective response, we notice something akin to automatic gain control, adjusting our hearing levels to the loudest sound we heard recently.

Our ears are effectively moving a ~60dB (70dB at best(verify)) band along with the absolute sound levels presented to us, adapting roughly at the scale of seconds.

Which means we could sit back in our chair when listening to classical music that goes from soft instrument to wall of sound and back (the hifi system's noise floor allowing). So the overall dynamic range might be maybe 80-90dB.

It also means that the mastered music, within the scale of a second, need not present us with more than that dynamic range. We just won't hear it.


Actually, our ears' adaptivity has some interesting patterns, which break simple applicability of dynamic range. A major one is that the adaptation isn't uniform, and also depends on frequency. The more uniform part is at most 60dB wide, whereas each critical band within our ears probably gives only ~30dB(verify).


tl;dr: we are typically pretty satisfied once we hear around 60dB of range. A bit more can be nicer for subtle productions, in a pragmatic "don't want to touch the volume knob" way.


During DSP and mastering, more is useful for reasons relating to accuracy when you do lots of intermediate steps.


Other notes:

  • Apparently we consciously hear 1dB variation, and can be unconsciously aware of down to 0.2dB or so
  • We prefer moderately loud music - up to a point - which is in fact a trick used by HiFi salespeople.

In discrete sampled data

Dynamic range in digital systems is a somewhat simpler case than that in analog devices, because the theoretic limits are well defined.

(Though people manage to misinterpret them all the same)


You know your smallest and largest number, and the only direct source of error is rounding error, which is predictable.

You can decently characterize sampling (ADC), digital processing, storage, as well as the DAC that produces the sound again.


Storage is often done in 16-bit integer form, sometimes 24-bit integers, sometimes 32-bit floats, and historically in 8-bit integers.

The higher numbers are mostly there for DAWs and mastering and anything else that wants more headroom during processing.


The theoretic dynamic range of integer types is roughly 6dB per bit:

  • 48dB for 8-bit
  • 96dB for 16-bit
  • 144dB for 24-bit integers

These values are idealized, treat them as approximate.

In the real world they are both optimistic and a little pessimistic, for different reasons (TODO: list some major ones).
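The 6dB-per-bit figure is just the ratio between full scale and one quantization step, expressed in dB; a quick check:

```python
import math

def int_dynamic_range_db(bits):
    # ratio of full scale to one quantization step (2**bits levels), in dB
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(bits, round(int_dynamic_range_db(bits), 1))
# 8 bits ≈ 48.2 dB, 16 ≈ 96.3 dB, 24 ≈ 144.5 dB, i.e. ~6.02 dB per bit
```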



Floating point numbers are a different story. If you took the smallest and largest storable values you'd get a ridiculously large decibel range, which is almost worthless for the purpose of indicating resolution.

The reason is that, from a real-number-set-viewpoint, levels are not distributed evenly: they are dense around zero, and sparse around the minimum and maximum.

This can actually be useful for some things, e.g. for storage and simple mixing, because there is less need for companding than with integer storage.

As a rough indication of floating point dynamic range, the number of bits in the mantissa is a decent guide (23 in single precision, 52 in double precision), though in most practice it is a few bits less than that.
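A small demonstration of why mantissa bits, not the min/max exponent, are the meaningful resolution figure: single precision keeps roughly the same relative error whether the value is near full scale or thirty orders of magnitude smaller (here using Python's struct to round doubles to 32-bit floats; the values are made up).

```python
import struct

def to_float32(x):
    # round a Python float (double precision) to single precision and back
    return struct.unpack('f', struct.pack('f', x))[0]

def rel_err(x):
    return abs(to_float32(x) - x) / abs(x)

big, tiny = 0.9876543210123, 0.9876543210123e-30
# both relative errors are on the order of 2**-24 (~144 dB down),
# because the resolution follows the signal level
```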






Spurious-Free Dynamic Range

Headroom

For context: When recording, you often have a

  • practical maximum level - often a point of distortion
  • practical minimum level - often set by some low level of noise you cannot avoid

You turn up the amplification (of whatever you get) to sit somewhere between the two, often somewhat toward the stronger end, because:

  • part of that noise you can gradually rise out of by turning it up, so you want it louder as long as that has no other side effects
  • the high end's point of distortion is fairly sudden, so you very much want to stay below that.


It is a practical side of audio recording that you don't really know how much louder people may get, though.

So while the technical description of headroom is something like "the range of amplitudes a system can still handle above a particular point", the practical one is more like "the space you leave between the average level and the point of distortion, so that things can get that much louder during recording without distorting".


In well controlled speaking environments, 10dB is plenty.

In unpredictable situations, on people with less mic technique, or in live music environments, you might sometimes opt for over 20dB of headroom.


The headroom (in the recording) has the clearest meaning in purely digital audio: the maximum is very well defined, and the response is perfectly linear right up to that point.

This is why dBFS (dB relative to full scale) is useful.
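For illustration, dBFS and headroom in digital audio reduce to simple arithmetic (a sketch; the sample values are made up and normalized so that full scale is 1.0):

```python
import math

def dbfs(level, full_scale=1.0):
    # level relative to full scale, in dB; 0 dBFS is the clipping point
    return 20 * math.log10(abs(level) / full_scale)

samples = [0.02, -0.31, 0.25, -0.18]      # made-up normalized audio
peak = max(abs(s) for s in samples)
headroom_db = -dbfs(peak)                 # distance left before clipping
# the peak of 0.31 sits at about -10.2 dBFS, i.e. ~10 dB of headroom
```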


There are other systems that actually define an absolute reference level, such as broadcasting systems (see e.g. programme level), where a standardized extra safety margin avoids distortion.

...not a lot of other systems do, though. It's a case of 'watch the VU meter I guess'.


Once things are entirely predictable, e.g. when mastering music or voice recordings, then you can smush it right up to the maximum, should you want to. Such masters do not need headroom per se - because they're done.



Also, analog devices in particular may have a range in which they introduce (often mild) non-linearities before anything clips in a noticeable way.

In a lot of recording, and mixing, the headroom you get depends more on the various steps of gains, pads, and volume knobs involved, and since each device could distort, each device needs some headroom.

As such, it is a good habit to leave some headroom at every step of the process, mixing, and arguably even mastering.


Relatedly:

  • This is why gain staging is a thing.
  • This is why sound level indicators are useful, and present on most any recorder -- but those are a topic in themselves, so again, it pays to err on the side of leaving a little headroom.
  • this is why there is a good argument for higher dynamic range in recording, and during some kinds of editing, than you need in playback






In recording

Studio stuff

Monitors

Monitors, also known as 'studio monitors' and 'near-field monitors', are there to assist mixing. The issue they address is that if the speakers you use in monitoring your edits have some emphasis, you would likely overcompensate on the general/other speakers.

Depending on who you ask, this means one of two things:

  • Idealists say the monitor should be very truthful frequency-wise and accurate at various volumes, so you can hear what is actually on the track in the way that will make it onto the mix.
  • Pragmatists argue you should just get an average speaker so that you'll mix for how things will sound on the average non-quality system - adjusting for average speaker bias out there. (This has on some occasions led to rather badly adjusted mixes)

A monitor being near-field refers to hearing mostly the directly produced sound and minimal reflected sound - often by sitting fairly close to it.

Headphones, even quality headphones, are rarely fit as monitors.