Digital sample capture, storage, reproduction


Continuous reality and discrete digital form

Analog sound is, by the nature of being variation in air pressure, continuous in value and time: there's a value for any given time, and it varies only smoothly - if looked at in close enough detail.

Digital sampling means discreteness in value and time - which means that there can be steps, discontinuities, and such. These distinctions matter because of the sampling theorem, which says that (and how) we can go between digital and analog, and states under which conditions the process is and isn't lossless - that is, when the two forms are equivalent and when they are not.


Equidistant pressure levels and an equidistant sampling interval describe Pulse Code Modulation (PCM), which is used in places like CDs and in uncompressed audio such as the WAV format.

PCM is common, largely because it is mathematically convenient.
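
To make that concrete, here is a minimal Python sketch (standard library only; the tone and filename are arbitrary choices) that quantizes a sine onto 16-bit equidistant levels at equidistant time intervals and stores it as WAV:

 import math, struct, wave
 
 RATE = 44100       # samples per second (the equidistant time steps)
 FREQ = 440.0       # test tone frequency, in Hz
 AMPLITUDE = 0.5    # fraction of full scale
 
 frames = bytearray()
 for n in range(RATE):  # one second of audio
     value = AMPLITUDE * math.sin(2 * math.pi * FREQ * n / RATE)
     # quantize to one of 2**16 equidistant levels, 16-bit signed little-endian
     frames += struct.pack('<h', int(round(value * 32767)))
 
 with wave.open('tone.wav', 'wb') as w:
     w.setnchannels(1)   # mono
     w.setsampwidth(2)   # 2 bytes = 16 bits per sample
     w.setframerate(RATE)
     w.writeframes(bytes(frames))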


Digitization has some predictable imperfections -- which you can minimize. Usually noted:

  • the limitation of the dynamic range by quantizing (the pressure dimension, helped by the time one)
  • the possibility for frequencies to alias (the time dimension)

(see following sections)



Note that in digital form (in fact in almost all recorded forms) the amplitude loses real-world meaning. There is no real-world pressure level that it corresponds to. We generally just tweak volume to levels acceptable to us.

When dealing with hardware, decibels refer only to ratios of power - more (if amplified) or less (if attenuated). This is why volume indicators (and volume sliders) have their 0dB point way at the top, and why you don't hear much once you go down to perhaps -80dB.
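
Relating those dB figures to the sample values themselves is one line of arithmetic (a minimal sketch; amplitude uses a factor of 20 because power goes with the square of amplitude):

 def db_to_gain(db: float) -> float:
     """Relative level in dB -> linear amplitude multiplier."""
     return 10 ** (db / 20.0)
 
 print(db_to_gain(0))    # 1.0     (unchanged)
 print(db_to_gain(-6))   # ~0.501  (roughly half the amplitude)
 print(db_to_gain(-80))  # 0.0001  (very quiet indeed)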


Things like ReplayGain pretend to explain things in SPL terms, but really work relative to the material itself - which is the point anyway: making everything play just as loudly as everything else corrected the same way.


Capture

Some sampling theory

The Nyquist theorem says (roughly) that if you sample a signal of a particular bandwidth, you must do so at a rate of at least twice that bandwidth to be able to reproduce it later.


Seen from the other direction and somewhat more practically, a digitized signal at a particular sample rate can contain frequency content up to half that sample rate. For example, sound at 44100Hz can represent frequencies up to 22050Hz.


The Nyquist frequency is the highest frequency that can be represented by a discrete-time-sampled signal, which is half the sample rate.
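
A quick way to see that limit (a minimal Python sketch; the frequencies are arbitrary illustration choices): a tone above the Nyquist frequency produces exactly the same samples as a lower-frequency tone, so once sampled they cannot be told apart - which is what aliasing means, and why capture hardware filters before sampling.

 import math
 
 RATE = 44100  # so the Nyquist frequency is 22050 Hz
 
 def sampled(freq_hz: float, n: int) -> float:
     """Sample n of a freq_hz sine, sampled at RATE."""
     return math.sin(2 * math.pi * freq_hz * n / RATE)
 
 # 30000 Hz is above Nyquist; it aliases to 30000 - 44100 = -14100 Hz,
 # i.e. its samples equal those of a phase-inverted 14100 Hz tone:
 for n in range(5):
     print(round(sampled(30000, n), 6), round(-sampled(14100, n), 6))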



Practical choices relating to our ears

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

tl;dr:

  • for music reproduction, ~40kHz sample rate is plenty and 16 bits is enough
  • higher rates and depths have some uses in sound recording and music production - but only for specific purposes


Choice: Sample rate

The precise sampling rate doesn't matter, as long as playback uses the same speed.

That leads to the question "how much is enough?"


The physiology of the human cochlea means that objectively equal amplitudes at different frequencies are heard with varying loudness. Sensitivity starts falling gently above 3kHz, and sharply above 15kHz (it also falls at lower frequencies - gently below 50Hz and sharply below 20Hz or so - but those we get almost for free).

So we hear very little in 16kHz..20kHz, and by 20kHz it has fallen off enough that it becomes harder to test for even intentionally.


Above 10kHz is already not considered very musical. The highest fundamentals stop around 8 or 10 kHz (the highest voices and violins).

Yes, there will be harmonics above that. While not loud, they are certainly there, and it's a good idea to capture whatever still falls within the human-audible range, as we typically interpret that as texture and clarity.

This is why sampling to also reproduce 10kHz..16kHz remains useful even if it's very subtle to us (and if it's loud in that range, more annoying than useful).


Note that even if you increase the sample rate of your sampler, you won't necessarily get the frequencies that Nyquist says you would, because microphones tend not to bother either -- and also because most audio-sampling equipment has an anti-aliasing filter (a softish lowpass around the same area). Check the frequency response, because most audio equipment just blindly lists "up to 20kHz" without actually saying what the response there is. Most mics only bother to get a flat response up to 15kHz or so, and at 20kHz their frequency response tends to have fallen 10 to 20dB, which isn't nothing, but not very much.


What sort of sample rate choices have existed / I'm wondering about that dialog box of options

Historically, the aim was to be generous and play it safe.

While 40kHz sampling is basically enough to store up to 20kHz (see Nyquist), real-world designs of devices like ADCs, DACs and filters (antialiasing or otherwise) require some leeway (and are easier to design with more than minimal leeway).


Apparently, early digital sampling was at rates like 37kHz and 50kHz(verify), but this seems to be more related to the hardware we adapted for this at the time(verify).


So, 44100 Hz is indeed a weirdly specific number.

It turns out it was chosen to combine more easily with TV standards of the time (44100Hz for PAL and black-and-white NTSC, 44056Hz for color NTSC), and presumably became more established by PCM adaptors offering those rates too(verify).

Some early formats used 44056Hz, but seeing this now is rare, and 44.1kHz became more settled, becoming a de facto standard for years, especially once it was adopted by audio CDs.

The idea of 48kHz seems to be motivated by 24fps film. In terms of real recording it might(verify) instead have been established by DAT, which was used in the music industry and offered recording at 48kHz (also 44.1 and 32).

These days, 48kHz is also easier to convert to/from 96kHz should you use it - seen in music production and audiophile areas, though those also often use 88.2kHz and others.

The choice of 32kHz seems to be correlated mostly with radio broadcasting(verify) (in part because AM didn't have the bandwidth for more(verify)).


Actually, the leeway is much less relevant to modern designs, because modern ADCs will supersample at least 2x - not because they need higher frequencies, but to effectively create leeway for their own filter(verify). (Similarly, DACs use oversampling reconstruction[1] to make the filtering after them simpler.)



Choice: Bit depth

Roughly speaking, when we have samples in linear PCM (which is typical), each bit we add to our bit depth gives us another 6dB of range.

Combine that with a typical cochlear critical band, and the ballpark of what we need is at least 10 to 12 bits.
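
The 6dB-per-bit figure is just the ratio between the largest and smallest representable amplitude (a minimal sketch of the arithmetic):

 import math
 
 def dynamic_range_db(bits: int) -> float:
     """Dynamic range of linear PCM: the ratio between full scale
     and one quantization step, expressed in dB."""
     return 20 * math.log10(2 ** bits)
 
 for bits in (8, 12, 16, 24):
     print(bits, round(dynamic_range_db(bits), 1))
 # 8 -> 48.2, 12 -> 72.2, 16 -> 96.3, 24 -> 144.5  (about 6.02 dB per bit)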


And for tightly mixed material this can be enough - FM radio doesn't give much more than that on a good day, and doesn't really need to, given that most pop music has enough compression on it, which effectively reduces how much of the bit depth you actually use.


But if volume dials aren't perfect, then you want a little more. If you want to do things like have a really quiet bit and a really loud bit in the same recording, you want a little more.

The next-higher convenient number for computers is 16, and it corresponds to 96dB, which is plenty for everyday playback - and a lot of recording as well.


Higher bit depths can be useful, but not for the reasons you might think.

Sure, 24 bits might give you 144dB of range, and that's more. It's also more than enough to span everything from perfectly quiet (for us humans) to permanent hearing damage (and probably amplifier damage). So we don't need that for playback.


When recording, however, you never know precisely what will happen. When we have no idea exactly how loud someone may get, we might want to record at a level where the signal can dip reasonably far without falling into the digital noise floor, and get louder reasonably far without distorting. If we reserve maybe 20dB on both sides, 16 bits' 96dB suddenly feels cramped.

Giving us more bits means it is easier to

  • catch loud transients without distortion (...if you know what you're doing),
  • avoid the signal sinking into the digital sampling's noise floor (...if you know what you're doing).

Even when the rest of the setup means we don't even use half the range, this still gives us peace of mind. We can edit later. (Practice is a little messier because there are multiple devices (not least of which the mic) and adjustments involved. Sound engineer is a respectable job, you know...)
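
To put rough numbers on why 16 bits feels cramped for recording (a small sketch; the 20dB margins are just the example figures from above):

 def usable_range_db(bits: int, headroom_db: float, floor_margin_db: float) -> float:
     """What's left of linear PCM's dynamic range after reserving
     headroom above the expected level and a margin above the noise floor."""
     return 6.02 * bits - headroom_db - floor_margin_db
 
 print(usable_range_db(16, 20, 20))  # ~56 dB left - cramped for dynamic material
 print(usable_range_db(24, 20, 20))  # ~104 dB left - comfortable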

Oversampling

Oversampling input

Oversampling, a.k.a. supersampling, refers to fetching more samples than you strictly need, because it turns out to be useful for something.


Lower the amount of sampled noise

One use is to smooth sensor readings when you expect a source of noise, to get a slightly stabler output value. For example, you could sample a temperature sensor 100 times per second and take the average.


The validity of this depends on a bunch of assumptions, and a lot of uses aren't strictly correct - but still useful (see the sketch after this list). Say:

  • that the noise is reasonably random, so it averages closer to zero with repeated reads than with just one
  • that the real-world value won't change within this time (or that any change is something we want to smooth over as well)
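
A minimal simulated sketch of that averaging (TRUE_VALUE, NOISE_SD, and read_sensor are hypothetical stand-ins for a real sensor):

 import random, statistics
 
 TRUE_VALUE = 20.0   # hypothetical "real" temperature
 NOISE_SD = 0.5      # assumed random, zero-mean sensor noise
 
 def read_sensor() -> float:
     """One noisy reading (simulated; stands in for a real ADC read)."""
     return TRUE_VALUE + random.gauss(0, NOISE_SD)
 
 single = read_sensor()
 averaged = statistics.mean(read_sensor() for _ in range(100))
 
 # Averaging N independent reads shrinks the noise by roughly sqrt(N),
 # so 100 reads gives ~10x less spread - provided the assumptions hold.
 print(abs(single - TRUE_VALUE), abs(averaged - TRUE_VALUE))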



Avoid aliasing, and/or easier input filtering

Imitate higher resolution, and lower noise

See also (oversampling)

Oversampling output

Semi-sorted

Sample storage

Mixing and volume

Resampling

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Downsampling and upsampling refer to converting a time series to another time series that represents the same frequencies, but as if it were sampled at a different rate.

This can be to integer multiples or integer fractions of the original sample rate, or to arbitrarily other sampling rates.

You typically want to express the same frequency content, as accurately as possible. This makes it a nontrivial task.
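
In practice you rarely write this from scratch; a minimal sketch (assuming numpy and scipy are available) of the common 44.1kHz-to-48kHz conversion using polyphase filtering:

 import numpy as np
 from scipy.signal import resample_poly
 
 SRC_RATE, DST_RATE = 44100, 48000
 
 # One second of a 1 kHz test tone at the source rate.
 t = np.arange(SRC_RATE) / SRC_RATE
 signal = np.sin(2 * np.pi * 1000 * t)
 
 # Polyphase resampling: upsample by 160, filter, downsample by 147
 # (48000/44100 reduces to 160/147). The filtering is what keeps the
 # frequency content the same and prevents aliasing.
 resampled = resample_poly(signal, up=160, down=147)
 print(len(signal), len(resampled))  # 44100 -> 48000 samples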


'Point sampled'

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

See also

Sampling


Resampling

Other: