Sound programming, sound coding, sound codecs

From Helpful

conversion

  • sox (command line tool)
  • libaudiofile [2]


Sample rate conversion:

  • libsamplerate [3] (a.k.a. Secret Rabbit Code)
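To make concrete what sample rate conversion does, here is a minimal pure-Python sketch using linear interpolation. This is not how libsamplerate's better converters work (those use band-limited interpolation), and it has no anti-aliasing filter, so treat it only as an illustration. The function name `resample_linear` is made up for this example.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono list of numbers by linear interpolation.

    Low quality: no low-pass filtering, so downsampling will alias.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate   # fractional position in the source
        i0 = min(int(pos), last)        # sample at or before pos
        i1 = min(i0 + 1, last)          # next sample, clamped at the end
        frac = pos - i0
        out.append(samples[i0] * (1.0 - frac) + samples[i1] * frac)
    return out


# Doubling the rate inserts midpoints between the original samples
upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], 2, 4)
# Halving the rate just picks every second sample (and aliases on real audio)
downsampled = resample_linear([0, 1, 2, 3, 4, 5, 6, 7], 4, 2)
print(upsampled, downsampled)
```

For real work you would want a proper converter such as the libraries listed above.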


General/wider purpose audio programming


Helpers, codecs simple and complex

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)



The following table is meant as a compact overview of the approaches, algorithms, and codecs that are out there, across all fields of application, and what they basically do.

Note:

  • 'Music' usually means a focus on quality
  • 'Speech' usually means a focus on low latency and space-efficient coding, for speech only


Columns: Name · Used for · Techniques, bitrates · Notes · Links
AAC usually refers to AAC-LC (defined in MPEG-2, MPEG-4) Music Also known as MPEG-2 NBC(verify).
Can be seen as improvement over MP3
See also MPEG-4 HE-AAC.
[4]
AC-3 / Dolby Digital Known as AC-3 (often seen without the dash), Dolby Digital (sometimes DD), ATSC A/52
Its early introduction (around 1991(verify), before MP3 became common) and 5.1 support made it a common choice for storing 5.1 audio. Being supported (alongside DTS) in digital interconnects like TOSLINK helped too.

Quality and limitations are comparable to MP2, MP3(verify), including some trouble with impulses and high-frequency stereo(verify)
Best above 192kbps(verify)
Cf. its successor, Dolby Digital Plus
See also Electronics_project_notes/Audio_notes_-_multichannel_and_surround#AC-3

[5]
AC-4 e.g. one of the various codecs allowed in ATSC 3.0 broadcast [6]
ADX Has an ADPCM variant and MPEG-2 variant [7]
AIFF-C, AIFC Compressed variant[8] of AIFF (or, on macOS, possibly uncompressed but little-endian instead)
AIFF IFF-based format used for audio
AMBE Speech Proprietary [9]
AMR, AMR-NB Patented.
AMR-WB Speech Patented. See also G.722.2 [10]
AMR-WB+ Patented.
Apple Lossless (ALAC) Lossless Originally proprietary (open-sourced by Apple in 2011). Made for iOS and iTunes, which also chose not to support FLAC. tl;dr: Apple people prefer it, purely for ease. [11]
apt-X Lossless variant exists. [12]
ATRAC Family of codecs [13]
CELP and variants speech Can be a codec in itself, but now usually understood as a group of variants (ACELP, RCELP, LD-CELP, VSELP, others), or used as a part of some codec (QCELP, many), though the lines between these can be blurry.
CELT general, low latency royalty-free, open standard. Basically merged into Opus now [14]
CVSD, CVSDM Speech [15]


Dolby Digital Plus a.k.a. DD+, DDP. E-AC-3 (Enhanced AC-3), EC-3 [16]


DSS (Digital Speech Standard) [17]
DTS Patented.
Seen on DVDs, comparable to AC3(verify), also in that it needs a comparatively high bitrate to sound decent.(verify)
[18]
DTS-HD (DTS++) Extension of DTS [19]
DRA [20]
Dolby TrueHD Lossless Uses (and expands on) Meridian Lossless Packing [21]


EVRC Speech Used in CDMA2000 [22]
EVRC-B Speech Used in CDMA2000 [23]
FLAC Lossless [24]
GSM-EFR, GSM 06.60 Speech [25]
GSM-FR, GSM 06.10 Speech [26]
GSM-HR, GSM 06.20 [27]
HILN Speech [28]
iLBC (Internet Low Bit Rate Codec) [29]
Impala Used in FORscene [30]
iSAC (Internet Speech Audio Codec) Speech Proprietary [31]
ITU-T G.721 (Superseded by G.726) Speech ADPCM at 32 kbit/s.
ITU-T G.723 (Superseded by G.726) Speech ADPCM at 24 and 40 kbit/s [32]
ITU-T G.723.1 Speech [33]
ITU-T G.726 Speech ADPCM at 16, 24, 32, and 40 kbit/s (meant to supersede G.721, and G.723) [34]
ITU-T G.711 Speech A-Law/mu-Law PCM at 64kbit/s [35]
ITU-T G.718 Speech 8 to 32 kbit/s [36]
ITU-T G.719 Speech 32 to 128 kbit/s for speech and decent-quality music [37]
ITU-T G.722 Speech SB-ADPCM at 48, 56 and 64 kbit/s [38]
ITU-T G.722.1 Speech (24 and 32 kbit/s) [39]
ITU-T G.722.2 Speech Often refers to AMR-WB (Adaptive Multi-Rate Wideband) [40]
ITU-T G.728 Speech 16kbit/s, LD-CELP [41]


ITU-T G.729 Speech CS-ACELP [42]
ITU-T G.729.1 Speech [43]
Monkey's Audio (APE) Lossless [44]
MPEG-1(/MPEG-2) Layer I, Layer II, Layer III (MP3) Music You usually want Layer III
(MPEG-2 extends options somewhat)
Sometimes has trouble with impulses, stereo high frequency content (regardless of bitrate)(verify)
[45] [46] [47]


MPEG-1(/MPEG-2) Layer II Music (MPEG-2 extends options somewhat)
MPEG-1(/MPEG-2) Layer III (MP3) Music (MPEG-2 extends options somewhat)
MPEG-4 ALS Lossless [48]
MPEG-4 DST [49]
MPEG-4 HVXC Speech [50]
MPEG-4 HE-AAC [51]
MPEG-4 SLS [52]
MPEG-4 Structured Audio
NIST (niche?) [53]
OSQ Lossless (proprietary, so generally Steinberg-only) [54]
OptimFROG Lossless (proprietary lossless format) [55]
MusePack (MPC) Music [56]
NeXT/Sun (.au) Originally headerless (8000 Hz 8-bit μ-law PCM data), later with header, varied bit depth, and various lossy compression options [57]
SVOPC Speech Designed to deal with packet loss. Previously used by Skype (where it has been replaced with SILK) [58]
SILK Speech Used e.g. by Skype. Patented, royalty-free uses possible. [59]
VMR-WB Speech [60]


TTA (True Audio) Lossless [61]
Truespeech Speech Proprietary [62]


QDesign/Ravesound Previously known as LBPack. Used by older Quicktime variants [63]


WAVPack Lossless [64]
LPAC, LTAC Lossless Lossless Predictive/Transform Audio Compression. Has mostly become MPEG-4 Audio Lossless Coding instead. [65] [66]
Meridian Lossless Packing (MLP) Lossless Used in DVD-Audio, and in HD DVD, Blu-Ray through Dolby TrueHD [67]
Opus Music/general patent-free open standard. Successor to Vorbis, more applicable to lower latency(verify) [68] [69]


RTAudio Speech Proprietary [70]


RealAudio Speech, music [71]
SHN (Shorten) Lossless [72]


Siren 7, Siren 14, Siren 22 Speech Patented (royalty free use possible). See also G.722.1 [73]
Speex speech Beats various older speech codecs at low-bitrate speech [74] [75]
TwinVQ Proprietary (Yamaha, NTT) [76]


SMV (Selectable Mode Vocoder) Speech Used in CDMA2000 [77]
Vorbis Music/general patent-free open standard [78] [79]
VOX (Dialogic ADPCM) 4-bit ADPCM, often 8000Hz sampling, less commonly 6000Hz [80]


WMA Music Proprietary format.
There are variants on basic WMA targeting higher quality audio (WMA Professional), voice coding (WMA Voice), lossless coding (WMA Lossless). Hardware players tend to not support these.
Optional DRM.
Music quality comparable to Vorbis, AAC(verify)
[81]



Lossless codecs tend to compress to perhaps half of what the 44.1 kHz stereo PCM samples would take. There is variation, but not very much.
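As a rough worked example of what "half of PCM" means in numbers, the following sketch computes the raw bitrate of CD-style audio and applies the ballpark ratio from above. The 0.5 ratio is only that rough estimate, not a measured value, and the function name is made up for illustration.

```python
def pcm_bitrate(sample_rate_hz, channels, bits_per_sample):
    """Raw (uncompressed) PCM bitrate in bits per second."""
    return sample_rate_hz * channels * bits_per_sample


cd_bps = pcm_bitrate(44100, 2, 16)        # CD-style audio: 44.1 kHz, stereo, 16-bit
cd_bytes_per_minute = cd_bps * 60 // 8    # storage needed per minute of raw PCM

LOSSLESS_RATIO = 0.5                      # assumption: the rough "about half" figure
lossless_bps = cd_bps * LOSSLESS_RATIO

print("raw PCM: %.1f kbit/s, %.1f MB/min" % (cd_bps / 1000, cd_bytes_per_minute / 1e6))
print("lossless, roughly: %.1f kbit/s" % (lossless_bps / 1000))
```

So raw CD-style PCM is about 1411 kbit/s (roughly 10.6 MB per minute), and a lossless codec would typically land somewhere around 700 kbit/s.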




libraries:

  • libsamplerate (and wrappers, like pysamplerate)


file format reading:

  • libaudiofile (AIFF, AIFC, WAVE, and NeXT/Sun) [82]
  • various MPEG libraries, primarily for MPEG-1 (which includes MP3), sometimes MPEG-2

And see also:


Perceptual quality measures:

  • MOS - Mean Opinion Score [83]
  • PEAQ [84]
  • PSQM (ITU-T P.861) [85]
  • PESQ (ITU-T P.862) [86]
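Of these, MOS is the simplest to illustrate: it is the arithmetic mean of listener ratings, conventionally on a 1 (bad) to 5 (excellent) scale. The ratings below are made-up example data, and the function name is just for illustration. PEAQ, PSQM, and PESQ are algorithmic models that try to predict such scores and are much more involved.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings on the usual 1..5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings are expected to be between 1 and 5")
    return mean(ratings)


example_ratings = [4, 5, 3, 4, 4]   # hypothetical listening-test results
mos = mean_opinion_score(example_ratings)
print("MOS:", mos)
```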




File formats

MP3

WAV and friends

The WAVE format (often called WAV, after the file extension) is a specific use of RIFF.

WAV is usually raw, uncompressed linear PCM, but it can also store ADPCM, μ-law (verify).


As a fairly simple file format carrying raw data, it is useful as an interchange format.
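Since WAV is RIFF, reading it mostly means walking a list of chunks, each with a four-byte ID and a 32-bit little-endian size. The sketch below builds a tiny WAV in memory with Python's standard `wave` module, then lists its chunks and decodes the format tag in the `fmt ` chunk. The helper names and the short tag table are made up for this example, and the tag table lists only a few common values.

```python
import io
import struct
import wave

# A few common WAVE format tags (not a complete list)
FORMAT_TAGS = {0x0001: "PCM", 0x0002: "MS ADPCM", 0x0006: "A-law",
               0x0007: "mu-law", 0x0011: "IMA ADPCM"}


def make_test_wav():
    """Build a tiny mono 16-bit 8000 Hz WAV in memory (illustration data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(struct.pack("<4h", 0, 1000, -1000, 0))
    return buf.getvalue()


def walk_chunks(data):
    """Yield (chunk_id, payload) for each top-level chunk of a RIFF/WAVE file."""
    riff, _size, form = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)    # chunks are padded to an even length


def describe(data):
    """Return (chunk ids, (format name, channels, sample rate, bits per sample))."""
    ids, fmt = [], None
    for chunk_id, payload in walk_chunks(data):
        ids.append(chunk_id.decode("ascii"))
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", payload, 0)
            fmt = (FORMAT_TAGS.get(tag, hex(tag)), channels, rate, bits)
    return ids, fmt


chunk_ids, fmt_info = describe(make_test_wav())
print(chunk_ids, fmt_info)
```

Walking chunks like this, rather than assuming a fixed 44-byte header, matters in practice because files may carry extra chunks before the `data` chunk.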


W64

BWF

BWF is an extension of WAV that is used e.g. by (non-linear) digital recorders.

BWF files still use the .WAV extension, and are backwards compatible because the format mostly just adds metadata in extension chunks.

It adds things like timecodes.
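As a sketch of how such a timecode can be read: the BWF `bext` chunk stores a time reference as a 64-bit sample count since midnight, split into a low and a high 32-bit word. Assuming you have already extracted those two words and know the sample rate, converting to a clock time is plain arithmetic. The function names here are made up for illustration, and the example values are invented.

```python
def combine_time_reference(low_word, high_word):
    """Join the two 32-bit halves into one 64-bit sample count."""
    return (high_word << 32) | low_word


def time_reference_to_clock(time_reference, sample_rate_hz):
    """Convert samples-since-midnight to 'HH:MM:SS +N samples'."""
    total_seconds, leftover_samples = divmod(time_reference, sample_rate_hz)
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return "%02d:%02d:%02d +%d samples" % (hours, minutes, seconds, leftover_samples)


# Invented example: 1 h 1 min 1 s plus 12 samples, at 48 kHz
ref = combine_time_reference(48000 * 3661 + 12, 0)
clock = time_reference_to_clock(ref, 48000)
print(clock)
```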


See ITU-R Recommendation BS.1352-3, Annex 1.

https://en.wikipedia.org/wiki/Broadcast_Wave_Format


RF64

Like BWF, but removes WAV/BWF's 4GB size limit, and allows more channels(verify).

Used in broadcasting, and audio archiving.

https://en.wikipedia.org/wiki/RF64
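To get a feel for when the 4GB limit bites: it comes from RIFF's 32-bit size fields, so the maximum recording length depends on how many bytes per second the audio takes. The sketch below ignores header and metadata overhead, so real limits are slightly lower. The function name is made up for illustration.

```python
def max_duration_seconds(sample_rate_hz, channels, bits_per_sample,
                         size_limit_bytes=2**32):
    """Approximate longest PCM recording that fits under a byte limit."""
    bytes_per_second = sample_rate_hz * channels * (bits_per_sample // 8)
    return size_limit_bytes / bytes_per_second


cd_hours = max_duration_seconds(44100, 2, 16) / 3600         # CD-style stereo
surround_minutes = max_duration_seconds(96000, 6, 24) / 60   # 96 kHz, 24-bit, 6 channels

print("44.1 kHz 16-bit stereo: about %.2f hours" % cd_hours)
print("96 kHz 24-bit 6-channel: about %.1f minutes" % surround_minutes)
```

CD-style stereo fits several hours, but high-rate multichannel material hits the limit in well under an hour, which is why broadcast and archiving work needs RF64.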