Digital sample capture, storage, reproduction



Continuous reality and discrete digital form

Analog sound is, by the nature of being variation in air pressure, continuous in value and time: there's a value for any given time, and it varies only smoothly - if looked at in close enough detail.

Digital sampling means discreteness in value and time - which means that there can be steps, discontinuities, and such. These terms are significant because of the sampling theorem, which says that (and how) we can go between digital and analog, and states under which conditions that process is and isn't lossless, i.e. when the two forms are equivalent and when they are not.


Equidistant pressure levels and an equidistant sampling interval together describe Pulse Code Modulation (PCM), which is used in places like CDs and in uncompressed audio like the WAV format.
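As a rough illustration of "equidistant in time, equidistant in value", here is a minimal sketch of PCM encoding. The function name `pcm_encode`, the -1..1 input range and the test tone are assumptions made for the example, not part of any particular format or library.

```python
import math

def pcm_encode(signal, sample_rate, duration, bits=16):
    """Sample signal(t) (expected range -1..1) at equal intervals, quantize to signed ints."""
    max_level = 2 ** (bits - 1) - 1            # 32767 for 16 bits
    n_samples = int(round(sample_rate * duration))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                    # equidistant steps in time
        value = max(-1.0, min(1.0, signal(t))) # clip anything outside full scale
        samples.append(int(round(value * max_level)))  # equidistant steps in value
    return samples

def tone(t):
    """A 1kHz sine at half of full scale (hypothetical test input)."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)

samples = pcm_encode(tone, sample_rate=44100, duration=0.001)
```

One millisecond at 44.1kHz gives 44 integers, each of which is a step count rather than a pressure value.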

PCM is common, largely because it is mathematically convenient.


Digitization has some predictable imperfections -- which you can minimize. Usually noted:

  • the limitation of the dynamic range by quantizing (the pressure dimension, which the time dimension can help with)
  • the possibility for frequencies to alias (the time dimension)

(see following sections)



Note that in digital form (in fact in almost all recorded forms) the amplitude loses real-world meaning. There is no real-world pressure level that it corresponds to. We generally just tweak volume to levels acceptable to us.

When dealing with hardware, decibels refer only to ratios of power: more (if amplified) or less (if attenuated) than some reference. This is the reason that volume indicators (and volume sliders) tend to have a 0dB point way at the top, and that you don't hear much once you go down to perhaps -80dB.
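To get a feel for what such a relative figure means, decibels can be converted to plain ratios. This is a small sketch using the standard definitions (10·log10 for power, 20·log10 for amplitude); the function names are made up for the example.

```python
import math

def db_to_power_ratio(db):
    """Power ratio that corresponds to a level difference in dB."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Amplitude (voltage, sample value) ratio for the same level difference."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    """Inverse of db_to_amplitude_ratio."""
    return 20 * math.log10(ratio)

for level in (0, -6, -20, -80):
    print(level, "dB -> amplitude x", db_to_amplitude_ratio(level))
```

By this definition, -80dB is a ten-thousandth of the reference amplitude, which is why so little is left to hear at that point on a slider.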


Things like replay gain pretend to explain things in SPL terms, but really work in terms relative to the material itself - which is the point anyway: making everything play just as loudly as anything else corrected the same way.


Capture

Some sampling theory

Practical choices relating to our ears

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

tl;dr:

for music reproduction, ~40kHz sample rate is plenty and 16 bits is enough


Sample rate

The physiology of the human cochlea means objectively-equal amplitudes of different frequencies are heard with varying loudness. Sensitivity starts falling off gently in the 3kHz..15kHz range, then drops sharply. So we hear very little in 16kHz..20kHz, and by 20kHz it has fallen off so much that it's hard to test.

Above 10kHz is already not considered musical (e.g. the highest voice stops at 8kHz, the high violin around 10kHz), but that's about the base tone of these instruments - it's a good idea to capture the overtones within the human-audible range, as we typically interpret that as clarity. This is why 10kHz..16kHz is useful even if it's very subtle to us (and very annoying if loud there).



Historically the aim was to be generous and play it safe. While ~40kHz sampling is in theory enough to store content up to 20kHz (see Nyquist), real-world designs of devices like ADCs, DACs and filters (antialiasing or otherwise) require some leeway (and are easier to design with some more).
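What happens to content above half the sample rate can be sketched with a little arithmetic: a pure tone that is sampled without any antialiasing filter shows up folded back into the 0..rate/2 range. The helper name below is an assumption for illustration.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling, with no antialiasing filter."""
    folded = freq_hz % sample_rate_hz          # sampling can't tell f from f + k*rate
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded       # mirror around half the sample rate
    return folded

for tone_hz in (19000, 21000, 25000, 45000):
    print(tone_hz, "Hz sampled at 40000 Hz looks like", alias_frequency(tone_hz, 40000), "Hz")
```

So at a 40kHz sample rate a 19kHz tone is stored as itself, while a 25kHz tone would land at 15kHz, well within the audible range. This is why the filter in front of the converter matters.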

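The number 44.1kHz was chosen because it combines easily with TV standards (PAL, NTSC): early digital audio was stored on video recorders, and 44.1kHz fits the line and field rates of both.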

48kHz was chosen later, apparently mostly for a bit more leeway for designs. (reasons are unclear to me. It may also relate to TV)

These days, 48kHz is also easier to convert to/from 96kHz (an exact factor of two), should you use that.


Actually, the leeway is basically irrelevant to modern designs, because modern ADCs will supersample at least 2x, not because they need higher frequencies but to effectively create this leeway for their own filter.

(Similarly, DACs use oversampling reconstruction[1] to make filtering after it simpler)
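To put a number on the leeway that oversampling creates for an ADC's input filter: content above (sample rate - highest wanted frequency) folds back into the wanted band, so the filter has to go from "pass" to "fully attenuated" between those two points. A sketch, assuming 20kHz as the highest wanted frequency and a made-up function name:

```python
import math

def antialias_transition_band(sample_rate_hz, highest_wanted_hz=20000):
    """Return (start_hz, end_hz, width_in_octaves) of the band the input filter rolls off in.

    Content above sample_rate - highest_wanted aliases to below highest_wanted,
    so that is where the filter needs to have reached full attenuation.
    """
    start = highest_wanted_hz
    end = sample_rate_hz - highest_wanted_hz
    return start, end, math.log2(end / start)

for rate in (44100, 48000, 88200, 96000):
    print(rate, antialias_transition_band(rate))
```

At 44.1kHz the filter has roughly a quarter of an octave to do its work; sampling at 2x that rate gives it well over an octave, which allows a much gentler (and cheaper) analog filter, with the rest done digitally.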



Bit depth

Using bits for linearly-spaced amplitudes, combined with the range we typically hear within a cochlear critical band, means we'd want at least 10-12 bits.

It'd e.g. be enough for FM radio reproduction of the now-typical mastering of pop music.

Buuut that little is inflexible at best, and if anyone in the process isn't an expert it'd probably touch the quality.


So 16 bits is enough leeway for most any music reproduction. The number 16 comes from computers, but also happens to give enough range for almost every use.
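For a sense of scale, the range that a given number of linearly-spaced bits covers can be estimated as the ratio between full scale and a single quantization step. This is a simplified sketch (it ignores dither, noise shaping and the exact signal used to measure), with a function name invented for the example.

```python
import math

def quantization_range_db(bits):
    """Ratio between full scale and one quantization step, in dB (about 6.02dB per bit)."""
    return 20 * math.log10(2 ** bits)

for bits in (12, 16, 24):
    print(bits, "bits:", round(quantization_range_db(bits), 1), "dB")
```

That works out to roughly 72dB for 12 bits, 96dB for 16 bits and 144dB for 24 bits.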


Higher bit depths are useful only for some fairly specialized things.

For example, in recording, 24 bits rather than 16 bits doesn't give you a better-quality recording of the same thing - but it is practical for the person recording.

It gives you more breathing space in terms of levels - they don't have to be as perfectly tweaked for your medium. There's more headroom to store transients without distortion (if you know what you're doing), and the noise floor can be further away (if you know what you're doing).


Oversampling

Oversampling input

Oversampling, a.k.a. supersampling, refers to taking more samples than you strictly need for a purpose, because the extra samples are useful for other reasons.


For example, you might want 10 temperature samples per second, but actually sample 1000 times per second and average every 100, to get a slightly stabler output value.
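The temperature example can be sketched as follows. The sensor is simulated here with Gaussian noise around a fixed value, which is an assumption for the sake of the demo (real sensor noise need not be uncorrelated); the names are hypothetical.

```python
import random
import statistics

def averaged_readings(raw_readings, block_size):
    """Average each consecutive block of block_size raw readings into one output value."""
    return [statistics.fmean(raw_readings[i:i + block_size])
            for i in range(0, len(raw_readings) - block_size + 1, block_size)]

rng = random.Random(0)                        # fixed seed so the demo is repeatable
true_temperature = 21.5
# One second of simulated sensor output: 1000 noisy readings.
raw = [true_temperature + rng.gauss(0, 0.5) for _ in range(1000)]
smoothed = averaged_readings(raw, 100)        # 10 output values for that second

print("spread of raw readings:     ", statistics.pstdev(raw))
print("spread of averaged readings:", statistics.pstdev(smoothed))
```

For uncorrelated noise, averaging N readings reduces the noise's standard deviation by about a factor of the square root of N, so averaging 100 should give about a tenth of the original spread.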


This is useful for a few different but related reasons.


Avoid aliasing, and/or easier input filtering

Imitate higher resolution, and lower noise

See also

Oversampling output

Oversampling PWM

Upsampling DACs

Semi-sorted

Sample storage

Mixing and volume

Resampling


Downsampling and upsampling refer to converting a time series to another time series that represents the same frequencies, but as if it were sampled at a different rate.

This can be to integer multiples or integer fractions of the original sample rate, or to arbitrary other sampling rates.
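Whether a conversion is a plain integer factor or something less convenient can be checked by reducing the two rates to a fraction. In the common "upsample by L, then downsample by M" approach these would be the two factors; the helper name is an assumption for illustration.

```python
from fractions import Fraction

def resampling_ratio(source_rate, target_rate):
    """Return (up, down): the target rate equals source_rate * up / down, in lowest terms."""
    ratio = Fraction(target_rate, source_rate)
    return ratio.numerator, ratio.denominator

print(resampling_ratio(48000, 96000))   # integer multiple
print(resampling_ratio(96000, 48000))   # integer fraction
print(resampling_ratio(44100, 48000))   # neither
```

Going between 48kHz and 96kHz is a factor of two either way, while 44.1kHz to 48kHz reduces to 160/147, which is one reason that conversion takes more work.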

You typically want to express the same frequency content, as accurately as possible. This makes it a nontrivial task.
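As one illustration of why it is nontrivial, here is a minimal sketch of downsampling by an integer factor. Just dropping samples would alias anything above the new half-rate, so the signal is low-pass filtered first. The windowed-sinc filter, the Hamming window, the 63 taps and the 0.9 safety margin are all choices made for this example, not the one right way to do it.

```python
import math

def lowpass_kernel(cutoff, num_taps=63):
    """Windowed-sinc FIR low-pass; cutoff is a fraction of the sample rate (0..0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]           # normalize to unity gain at DC

def decimate(samples, factor, num_taps=63):
    """Low-pass to just below the new half-rate, then keep every factor-th sample."""
    kernel = lowpass_kernel(0.9 * 0.5 / factor, num_taps)
    half = num_taps // 2
    out = []
    for center in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(kernel):
            idx = center + k - half            # kernel is centered, so no added delay
            if 0 <= idx < len(samples):        # samples outside the input count as zero
                acc += tap * samples[idx]
        out.append(acc)
    return out
```

For example, decimating a 48kHz signal containing 1kHz and 15kHz tones by 2 should leave (nearly) only the 1kHz tone, rather than adding a 9kHz alias. The first and last few output samples are affected by the zero padding at the edges.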


'Point sampled'


See also

Sampling


Resampling

Other: