Audio and signal processing glossary



Related to frequency bands

Band-limited and time-limited

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

A signal is band-limited if it has no frequency content beyond a cutoff point (frequency content zero above a certain finite frequency).

A signal is time-limited if it exists only within a limited time (is nonzero only within a finite-length time interval).
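To make that a little more concrete, a quick numpy sketch (my own example, not from these notes; the frequencies and sample rate are arbitrary): a signal built only from components below some frequency has essentially zero spectral content above it.

 import numpy as np
 fs = 8000                          # sample rate in Hz (arbitrary for this example)
 t = np.arange(0, 1.0, 1.0 / fs)    # one second of samples
 x = np.sin(2*np.pi*440*t) + 0.5*np.sin(2*np.pi*880*t)   # content only at 440 and 880 Hz
 spectrum = np.abs(np.fft.rfft(x))
 freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
 print("energy above 1 kHz:", spectrum[freqs > 1000].sum())   # ~0, just numerical noise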


Frequency-division multiplexing (FDM)

Wireless telecommunication often applies frequency-division multiplexing, meaning that multiple baseband signals are modulated on different carrier waves.

These signals can be put on the same medium, because they're not in each other's way, and with some filtering and demodulating you can get back the original baseband signal.

This is particularly useful for media where you must share, like longer-distance aerial EM transmission.

Downsides include relatively inefficient use of the available spectrum.
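A rough toy sketch of the FDM idea (my own example, assuming numpy/scipy; the carrier frequencies and filter choices are arbitrary): two baseband signals, each modulated onto its own carrier and summed onto one "medium", with one of them recovered by demodulating and lowpassing.

 import numpy as np
 from scipy.signal import butter, filtfilt
 fs = 48000                                  # sample rate, arbitrary for the example
 t = np.arange(0, 0.1, 1/fs)
 # two baseband signals, band-limited well below the carrier spacing
 sig1 = np.sin(2*np.pi*300*t)
 sig2 = np.sin(2*np.pi*500*t)
 # modulate each onto its own carrier (plain DSB here, for simplicity) and share one medium
 medium = sig1*np.cos(2*np.pi*8000*t) + sig2*np.cos(2*np.pi*16000*t)
 # recover sig1: multiply by its carrier again, then lowpass away everything else
 b_lp, a_lp = butter(4, 2000, btype='low', fs=fs)
 recovered = 2 * filtfilt(b_lp, a_lp, medium * np.cos(2*np.pi*8000*t))
 print("max recovery error:", np.abs(recovered - sig1).max())   # small; mostly filter edge effects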

Non-FDM methods (for comparison)

Time-division multiplexing (TDM) puts different signals on the same medium, interleaving them over time.

Simpler TDM has fixed-order, fixed-length slots, often pre-allocated. It is often used in wired telecommunication. The upside is that it can utilize the medium efficiently within each time division.
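A toy sketch of such fixed-slot TDM (made up for illustration; tdm_mux/tdm_demux are hypothetical helper names, not from these notes):

 # three channels share one line by taking turns, one fixed-length slot each
 def tdm_mux(channels, slot_len):
     """Interleave equal-length channel streams into one stream, slot by slot."""
     stream = []
     for start in range(0, len(channels[0]), slot_len):
         for ch in channels:
             stream.extend(ch[start:start + slot_len])
     return stream
 def tdm_demux(stream, n_channels, slot_len):
     """Undo tdm_mux: pick each channel's slots back out by position."""
     channels = [[] for _ in range(n_channels)]
     frame_len = n_channels * slot_len
     for start in range(0, len(stream), frame_len):
         for i in range(n_channels):
             channels[i].extend(stream[start + i*slot_len : start + (i+1)*slot_len])
     return channels
 chans = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]
 line = tdm_mux(chans, slot_len=2)          # [1,1, 2,2, 3,3, 1,1, 2,2, 3,3]
 assert tdm_demux(line, 3, 2) == chans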

On wired, dedicated channels you can also leave them as baseband signals, which can be simpler.


Packet mode / Statistical TDM is similar to basic TDM but additionally manages/schedules the time slots / packets.

When channels don't always have the same amount to send this will be more efficient use of the medium, at the cost of more management.

Some systems combine multiplexing. For example, local wireless systems (such as WiFi, bluetooth) combine FDM and TDM, in that they apply TDM methods in assigned bands on the EM spectrum.


There are other variations on the time-based idea, with different upsides and downsides. Common downsides include that:

  • time divisions may go unused (particularly in TDM),
  • there is some implied lag (though dependent on the speed of the channel and the time slot length; in fibre and similarly fast systems it is rarely large enough to matter)
  • the medium is often used in a lowest-common-denominator way, in that
    • all participants have to agree and cooperate
    • it is somewhat easier for one misbehaving participant to disrupt the channel
    • it is hard to move to a new technology on an existing channel (consider WiFi)

Wideband; broadband

A relative term, describing that a spectrum contains a wide range of frequencies, often in relation to a channel's (coherence) bandwidth.


Wideband also has the implied sense of approximately equal response across the bands referred to.

Broadband (ignoring the varied meanings it has in different fields) sometimes suggests that gain may vary across the frequencies, and that the band may be split into channels or frequency bins, as is done in various practical communication channels to separate signals (consider TV channel modulation, internet modems, etc.).


See also:

Baseband

Various meanings, often used as an adjective in other terms, including:

  • baseband frequencies often refer to the unchanged low-frequency content of a signal,
e.g. before modulating it for transmission - so it also often means "a directly usable signal"
  • Baseband modulation: typically describes a signal that is not modulated
For example, from the perspective of a TV, which demodulates channels from specific RF frequencies, composite video is the baseband signal you get after such demodulation. Early computers, some VCRs, and some early game consoles would output a composite-video signal, and use an RF modulator to modulate this signal onto a specific TV channel.
  • baseband bandwidth often refers to the bandwidth of the signal: the highest frequency in a band-limited signal, the low-frequency content (the band-limited result of a lowpass), or specifically the frequency content near 0 Hz;
  • A baseband channel may refer to a medium that will nicely transfer low (baseband) frequencies (e.g. short range wired networks)
....because often enough, channels have noise/distortion particularly in the low frequencies (because of EMI, component behaviour, etc.)


See also:

Narrowband

Has different meanings, primarily:

In audio signals, it has a sense of 'not using much of the available spectrum' and suggests a bandpassed signal.

In telecommunication, it refers to a signal not (significantly) exceeding a particular channel's coherence bandwidth - an idealization that makes certain theory simpler. Note that this sense does not imply that little of the channel is used.

Coherence bandwidth

Intuitively, the range of frequencies over which a channel's gain / transfer function remains mostly flat/constant.


This matters e.g. when channels are near each other and/or are subject to fading, as in cellular communication.


See also:

Transform coding

The transform itself is often reversible (lossless), but it is meant for lossy coding: the point is to make losses less visible/audible by enabling more useful quantization, using knowledge about the typical nature of a signal and the way it is perceived.

For example, it is applied to audio and video, especially the baseband variants.
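A minimal sketch of the idea (my own example, using scipy's DCT as the transform; the signal and quantization step are arbitrary): transform, quantize coarsely in the transform domain, then inverse-transform. The transform step itself is reversible; the only loss is the quantization.

 import numpy as np
 from scipy.fft import dct, idct
 rng = np.random.default_rng(0)
 # a smooth-ish signal: most of its energy ends up in the low DCT coefficients
 x = np.cumsum(rng.standard_normal(256)) / 16
 coeffs = dct(x, norm='ortho')           # the (reversible) transform
 step = 0.05
 quantized = np.round(coeffs / step)     # the lossy part: coarse quantization
 decoded = idct(quantized * step, norm='ortho')
 print("nonzero coefficients kept:", np.count_nonzero(quantized), "of", len(x))
 print("max reconstruction error:", np.abs(decoded - x).max())   # on the order of the step size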


https://en.wikipedia.org/wiki/Transform_coding


Sub-band coding

A form of transform coding that breaks audio (or video) into frequency bands to encode separately.

The point is often that you can treat different bands differently, or that the amount of information in each can matter significantly.


For example, MP3 and many other lossy audio codecs use sub-band coding (combined with acoustic models) to spend the space on the most audible frequencies. It's also useful for lossless variants.
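A very rough two-band sketch (my own example; real codecs use proper filter banks plus psychoacoustic models, not a single butterworth split): split around 4 kHz and spend fewer bits on the band with less going on.

 import numpy as np
 from scipy.signal import butter, filtfilt
 fs = 44100
 t = np.arange(0, 0.2, 1/fs)
 x = np.sin(2*np.pi*220*t) + 0.1*np.sin(2*np.pi*9000*t)   # strong low band, weak high band
 # split into two sub-bands around 4 kHz
 b_lo, a_lo = butter(6, 4000, btype='low', fs=fs)
 b_hi, a_hi = butter(6, 4000, btype='high', fs=fs)
 low, high = filtfilt(b_lo, a_lo, x), filtfilt(b_hi, a_hi, x)
 def quantize(band, bits):
     step = 2.0 / (2**bits)            # uniform quantizer over roughly [-1, 1]
     return np.round(band / step) * step
 # spend more bits on the band that carries more (or more audible) content
 decoded = quantize(low, bits=12) + quantize(high, bits=6)
 print("max error vs original:", np.abs(decoded - x).max())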


http://en.wikipedia.org/wiki/Sub-band_coding

Filter / model system terms

Q factor

Impulse response

Convolution

Dynamic range

As a concept

Dynamic range is a relatively simple abstract idea - but not all contexts make it directly meaningful, or equally meaningful.

That idea, loosely speaking, is that dynamic range refers to the span between the minimum and maximum strength that can be in your signal.


In idealized systems, this might not even exist.

In real-world ones, it does. In many cases:

  • the lower side of the range is defined by "so low it is lost in unavoidable noise"
(...or, if you have yet to learn more, some perfectly avoidable noise)
  • the higher side of the range is defined by "so high it saturates and distorts the sensor or storage medium".


Note that in a lot of cases, neither of these descriptions can be pinned down to a precise fraction of a decibel, but it's close enough for most work. (The difference between pessimistic and optimistic figures is moderate, and this does matter when e.g. reading marketing that doesn't mention a testing standard.)

In a practical sense, dynamic range can be a limiting property of the recording hardware, the medium you save on, and any devices in between.




This is then typically expressed as the ratio between the largest and smallest useful level.

It's an easy choice to express dynamic range in dB,

  • because contexts are often sound, broadcasting, or electromagnetism, which tend to use dB anyway,
  • and because the extent of what a system can handle tends to be a large number (a lot of things are pretty comfortable with at least 60dB).
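For concreteness, the arithmetic behind that (amplitude ratios go into dB via 20*log10; the numbers here are made up):

 import math
 def dynamic_range_db(largest, smallest):
     """Dynamic range in dB for an amplitude ratio (20 * log10)."""
     return 20 * math.log10(largest / smallest)
 # e.g. if the noise floor sits at 1 unit and clipping starts at 1000 units:
 print(dynamic_range_db(1000, 1))   # 60.0 dB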



Dynamic range around music is probably most useful to approximate the abilities of recording, storage, transmission, and reproduction equipment.

For example, as transmission and storage:

  • AM radio has around ~30 dB
  • FM radio has around ~50 dB
  • vinyl varies and is harder to measure - early 78rpm records were merely around 35dB, but more recent and decent vinyl is quoted at 60dB (to 70dB at best)
  • cassette tape maybe 50-60dB for cheaper/typical, and fancier tape maybe closer to 70-80dB [1]


Halfway decent microphones may have 70dB, some maybe up to 90dB, between acoustic overload point and noise floor - though that's not their full story at all.

The acoustic overload is typically an absolute level, e.g. in dB SPL, and the noise floor also doesn't move so much, so the dynamic range of what you record also depends on the absolute sound levels. (This is a more complex subject that any sound engineer has to read up on)


A single critical band in our ears has approximately 30dB of instantaneous dynamic range, i.e. that's what you'll hear within a second or so.

At the most optimistic we have 60dB of slightly-less-than-instantaneous range (across the range and over seconds).

In practice, this range shifts relatively slowly over our full range (or, if protective mechanisms kicked in, very slowly). That full range is around 120dB: if you take the lowest level at which you hear anything at all and call it 0dB SPL, then around 130-140dB SPL is pain and hearing damage.


So while 60-70dB is enough for decent quality reproduction, that leaves zero flexibility.

It'd e.g. be fine to have a final mix with 70dB of dynamic range, but in any situation where levels could vary at all - recording, mixing, processing, interconnecting, or transmitting - you want more leeway, for the entirely practical reason that without it, signal would all too easily either distort/clip at the ceiling, or fade into the noise floor.

Headroom is the space between loud signal and distortion from loudness. The space on the lower end is less well defined/described.


This is one reason that higher-bit precision makes a lot more sense within DAWs than in the final master they make.



Caveats and fuzziness

In both analog and digital, "the smallest level at which signal still exists" is somewhat ill-defined.

There is no true standard to measure or report that, just honest convention.

And that convention, even when honest, may be specific to a medium and so not directly comparable to others.

And this completely ignores noise floors in real setups, which can easily chop off 10-20dB of what you had.


Arguably dynamic range is a little more defined in digital devices than in analog, but mostly just because digital imperfections behave more predictably - usually the worst that happens is rounding error. Analog's limitations can sometimes behave in more varied and interesting ways.


People regularly treat dynamic range as available precision, but that makes an assumption or two that must hold up in your practice for this to be more than a vague approximation.


Dynamic range is sometimes used to describe a signal - for example, that mastered music usually uses no more than 60dB of 'detail'. The meaning of that statement, while sometimes useful, is also quite fuzzy.

That is, amplitude range is necessary, but not sufficient, for detail. Nor is there necessarily a direct correlation, given that some media add compression by nature or convention. Plus this may be separately done to the signal in mastering.

It is useful to point out e.g. that for well-mastered music, people are quite happy when they get 60dB of range (a bit more than FM radio). Above that you get diminishing improvements.


For us humans

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


Optimistically, you could say human hearing has over 100dB of dynamic range.


We can adjust to barely still hear things around 0 dB SPL (we defined SPL that way), and exposure to 130dB SPL may be painful but won't immediately destroy our ears.


So if we wanted a theoretical hifi setup where the entire reproduction chain - from original microphone to eventual amplifier, speaker, and ear - is good enough to accurately alternate between reproductions of 'mosquito in the room' and 'jet engine in the next room' without touching the volume dial, you would need over 100dB of dynamic range in all parts.


...there's a lot of footnotes to that.

After a 100dB concert we won't hear small rustling leaves until half an hour later, due to a protective response[2].


If we avoid that protective response, we notice something akin to automatic gain control, adjusting our hearing levels to the loudest sound we heard recently.

Our ears are effectively moving a ~60dB (70dB at best(verify)) band along with the absolute sound levels presented to us, adapting roughly at the scale of seconds.

Which means we could sit back in our chair when listening to classical music that goes from soft instrument to wall of sound and back (the hifi system's noise floor allowing). So the overall dynamic range might be maybe 80-90dB.

It also means that the mastered music, within the scale of a second, need not present us with more than that dynamic range. We just won't hear it.


Actually, our ears' adaptivity has some interesting patterns, which break the simple applicability of dynamic range. A major one is that our adaptivity isn't uniform, and also depends on frequency. The more uniform part is at most 60dB wide, whereas each critical band within our ears probably gives only ~30dB(verify).


tl;dr: we are typically pretty satisfied once we hear around 60dB of range. A bit more can be nicer for subtle productions, in a pragmatic "don't want to touch the volume knob" way.


During DSP and mastering, more is useful for reasons relating to accuracy when you do lots of intermediate steps.


Other notes:

  • Apparently we consciously hear 1dB variation, and can be unconsciously aware of down to 0.2dB or so
  • We prefer moderately loud music - up to a point - which is in fact a trick used by HiFi salespeople.

In discrete sampled data

Dynamic range in digital systems is a somewhat simpler case than that in analog devices, because the theoretic limits are well defined.

(Though people manage to misinterpret them all the same)


You know your smallest and largest number, and the only direct source of error is rounding error, which is predictable.

You can decently characterize sampling (ADC), digital processing, storage, as well as the DAC that produces the sound again.


Storage is often done in 16-bit integer form, sometimes 24-bit integers, sometimes 32-bit float, and historically in 8-bit bytes.

The higher numbers are mostly there for DAWs and mastering and anything else that wants more headroom during processing.


The theoretical dynamic ranges of integer types are (roughly 6dB per bit):

  • 48dB for 8-bit
  • 96dB for 16-bit
  • 144dB for 24-bit integers

These values are idealized, so treat them as approximate.

In the real world they are both optimistic and a little pessimistic, for different reasons (TODO: list some major ones).
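For reference, the idealized figures above come from taking the ratio of full scale to one quantization step as 2^bits, i.e. roughly 6.02 dB per bit; a quick check:

 import math
 for bits in (8, 16, 24):
     dr = 20 * math.log10(2**bits)          # ratio of full scale to one step, in dB
     print(f"{bits:2d}-bit: {dr:.1f} dB")   # ~48.2, ~96.3, ~144.5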



Floating point numbers are a different story. If you took the smallest and largest storable values, you'd get a ridiculously large decibel range, which is almost worthless for the purpose of indicating resolution.

The reason is that, from a real-number-set-viewpoint, levels are not distributed evenly: they are dense around zero, and sparse around the minimum and maximum.

This can actually be useful for some things, e.g. storage and simple mixing, because there is less need for companding than with integer storage.

As a rough indication of floating point dynamic range, the number of bits in the mantissa is a decent indication (23 in single precision, 52 in double precision), though in most practice it works out to a few bits less than that.
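A small illustration of that uneven spacing (my own example; numpy's spacing() gives the gap to the next representable float32 value): the absolute step shrinks near zero, while the relative precision stays roughly constant at about 20*log10(2^23), around 138 dB.

 import numpy as np
 for level in (1e-6, 1e-3, 1.0):
     step = np.spacing(np.float32(level))   # gap to the next representable float32
     print(f"near {level:g}: step {step:.3g}, relative precision {20*np.log10(level/step):.0f} dB")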





Headroom

Headroom is the amount of amplitude that a system can manage above a particular point.

It is a term used when you specifically stay well under that maximum, because you know that hitting that maximum is the point of distortion.


This is a pragmatic point in audio recording, where you simply do not know how loud people may get - but can guess they probably won't get more than 10 or 20dB louder than their typical playing.

In mastering you might theoretically find a way for the total combined loudness to be perfectly predictable (not easily, though), and then you would not need headroom - yet in analog hardware there might be some mild non-linearities before anything actually clips. So it was still a good habit to keep headroom; you can argue about how much, it's mostly a pragmatic thing.


Without a real reference level, the headroom you get depends more on the various volume knobs involved. Without knowing at what volume some part of a chain of devices starts distorting, you won't even know where the ceiling is, and again, it pays to be conservative.


Headroom has the clearest meaning in systems that actually define an absolute reference level, such as in broadcasting systems (see e.g. programme level), where it is some standardized extra safety that avoids distortion.


And of course in purely digital audio, the maximum is very well defined - and the response is perfectly linear up to that point - so at the end of mastering you're done needing headroom and can just scale the result to fit exactly.


That's not to say digital has it easy.

Say, early games would just sum up multiple samples, which if enough things played at once would sound like a demonic screech regardless of what the original was. (And averaging them isn't what you want either - it's generally a little quieter. Plus early games couldn't spare the computing power.)


When OSes introduced a software mixer so that more than one program could play sound, we ran into the same issue.

Some sound cards handled this more cleverly. Say, you could ask them to mix channels for you, which they might then do with a few extra bits so that the sum would never clip. Which is really just headroom, but in a way you didn't have to think about, know about, and which you couldn't forget to do.

(on sound cards that didn't, sound sometimes gets noisy when you have sliders all the way up, and you would be better off leaving a program's volume slider at ~80%)
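A toy sketch of that difference (my own illustration; the data is random noise standing in for audio): naively summing int16 samples wraps around on overflow, while accumulating in a wider type keeps the sum intact, so you can decide afterwards how to bring it back into range.

 import numpy as np
 rng = np.random.default_rng(1)
 # three already-fairly-loud int16 "voices" (toy data)
 voices = (rng.uniform(-0.8, 0.8, size=(3, 1000)) * 32767).astype(np.int16)
 # naive: sum directly in int16 - overflow wraps around, which sounds like harsh garbage
 naive = voices.sum(axis=0, dtype=np.int16)
 # with headroom: accumulate in a wider type so the sum itself never wraps,
 # then decide separately how to bring it back into range (here: divide by the voice count)
 wide = voices.sum(axis=0, dtype=np.int32)
 mixed = (wide // voices.shape[0]).astype(np.int16)
 wrapped = np.count_nonzero(naive.astype(np.int32) != wide)
 print(f"{wrapped} of 1000 samples wrapped around in the naive int16 sum")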


In recording

Studio stuff

Monitors

Monitors, also known as 'studio monitors' and 'near-field monitors', are there to assist mixing. The issue they address is that if the speakers you use to monitor your edits have some emphasis of their own, you would likely compensate for that in the mix, and the result would sound off on general/other speakers.

Depending on who you ask, this means one of two things:

  • Idealists say the monitor should be very truthful frequency-wise and accurate at various volumes, so you can hear what is actually on the track in the way that will make it onto the mix.
  • Pragmatists argue you should just get an average speaker so that you'll mix for how things will sound on the average non-quality system - adjusting for average speaker bias out there. (This has on some occasions led to rather badly adjusted mixes)

A monitor being near-field refers to hearing mostly the directly produced sound and minimal reflected sound - often by sitting fairly close to it.

Headphones, even quality headphones, are rarely fit as monitors.