Sound programming, sound coding, sound codecs
conversion
- sox (command line tool)
- libsndfile [1]
- libaudiofile [2]
Sample rate conersion:
- libsamplerate [3] (a.k.a. Secret Rabbit Code)
General/wider purpose audio programming
- Nyquist
- C++ Library for Audio and Music (CLAM)
- CREATE Signal Library
- Max, MSP(, Jitter)
- Synz
- Open Sound World
- SndObj
- Create Signal Library (CSL)
- Synthesis Toolkit (STK)
Helpers, codecs simple and complex
See e.g.:
- http://en.wikipedia.org/wiki/List_of_codecs#Audio_codecs
- http://en.wikipedia.org/wiki/Comparison_of_audio_codecs
The following table is meant as a smaller table to give an overview of the appoaches, algoritms, and codecs that are out there, from all fields of application, what they basically do
Note:
- 'Speech' usually
- focuses on space-efficient coding
- focuses on low delay in encoding and decoding
- assumes relatively isolated speech
- 'Music' usually means focus on
- close (perceptual) reproduction of the original
Name | Used for | Techniques, bitrates Notes |
Links |
---|---|---|---|
AAC usually refers to AAC-LC (defined in MPEG-2, MPEG-4) | Music | Also known as MPEG-2 NBC(verify). Can be seen as improvement over MP3 See also MPEG-4 HE-AAC. |
[4] |
AC-3 / Dolby Digital | Known as AC-3 (often seen without the dash), Dolby Digital (sometimes DD), ATSC A/52 Its early introduction (1986, before MP3) and 5.1 support made it a common choice for that to store 5.1. It being supported (alongside DTS) in digital interconnects like TOSLINK helped too Quality/limitations is comparable to MP2, MP3 (verify), including that it has some trouble with impulses, high frequency stereo(verify) |
[5] | |
AC-4 | e.g. one of the various codecs allowed in ATSC 3.0 broadcast | [6] | |
ADX | Has an ADPCM variant and MPEG-2 variant | [7] | |
AIFF-C, AIFC | Compressed variant[8] of AIFF (or, on macOS, possibly uncompressed but little-endian instead) | ||
AIFF | IFF-based format used for audio | ||
AMBE | Speech | Proprietary | [9] |
AMR, AMR-NB | Patented. | ||
AMR-WB | Speech | Patented. See also G.722.2 | [10] |
AMR-WB+ | Patented. | ||
Apple Lossless (ALAC) | Lossless | Proprietary. Made for iOS and iTunes, which also chose not to support FLAC. tl;dr: Apple people prefer it, purely for ease. | [11] |
apt-X | Lossless variant exists. | [12] | |
ATRAC | Family of codecs | [13] | |
CELP and variants | speech | Can be a codec in itself, but now usually understood as a group of variants (ACELP, RCELP, LD-CELP, VSELP, others), or used as a part of some codec (QCELP, many), though the lines between these can be blurry. | |
CELT | general, low latency | royalty-free, open standard. Basically merged into Opus now | [14] |
CVSD, CVSDM | Speech | [15]
| |
Dolby Digital Plus | a.k.a. DD+, DDP. E-AC-3 (Enhanced AC-3), EC-3 | [16]
| |
DSS (Digital Speech Standard) | [17] | ||
DTS | Patented. Seen on DVDs, comparable to AC3(verify), also in that it needs a comparatively high bitrate to sound decent.(verify) |
[18] | |
DTS-HD (DTS++) | Extension of DTS | [19] | |
DRA | [20] | ||
Dolby TrueHD | Lossless | Uses (and expands on) Meridian Lossless Packing | [21]
|
EVRC | Speech | Used in CMDA2000 | [22] |
EVRC-B | Speech | Used in CMDA2000 | [23] |
FLAC | Lossless | [24] | |
GSM-EFR, GSM 06.60 | Speech | [25] | |
GSM-FR, GSM 06.10 | Speech | [26] | |
GSM-HR, GSM 06.20 | [27] | ||
HILN | Speech | [28] | |
iLBC (Internet Low Bit Rate Codec) | [29] | ||
Impala | Used in FORscene | [30] | |
iSAC (Internet Speech Audio Codec) | Speech | Proprietary | [31] |
ITU-T G.721 (Superceded by G.726) | Speech | ADPCM at 32 kbit/s. | |
ITU-T G.723 (Superceded by G.726) | Speech | ADPCM at 24 and 40 kbit/s | [32] |
ITU-T G.723.1 | Speech | [33] | |
ITU-T G.726 | Speech | ADPCM at 16, 24, 32, and 40 kbit/s (meant to supersede G.721, and G.723) | [34] |
ITU-T G.711 | Speech | A-Law/mu-Law PCM at 64kbit/s | [35] |
ITU-T G.718 | Speech | 32 to 128 kbit/s for speech and decent-quality music | [36] |
ITU-T G.719 | Speech | [37] | |
ITU-T G.722 | Speech | SB-ADPCM at 48, 56 and 64 kbit/s | [38] |
ITU-T G.722.1 | Speech | (24 and 32 kbit/s) | [39] |
ITU-T G.722.2 | Speech | Often refers to AMR-WB (Adaptive Multi-Rate Wideband) | [40] |
ITU-T G.728 | Speech | 16kbit/s, LD-CELP | [41]
|
ITU-T G.729 | Speech | CS-ACELP | [42] |
ITU-T G.729.1 | Speech | [43] | |
Monkey's Audio (APE) | Lossless | [44] | |
MPEG-1(/MPEG-2) Layer I, Layer II, Layer III (MP3) | Music | You usually want Layer III (MPEG-2 extends options somewhat) Sometimes has trouble with impulses, stereo high frequency content (regardless of bitrate)(verify) |
[45] [46] [47]
|
MPEG-1(/MPEG-2) Layer II | Music | (MPEG-2 extends options somewhat) | |
MPEG-1(/MPEG-2) Layer III (MP3) | Music | (MPEG-2 extends options somewhat) | |
MPEG-4 ALS | Lossless | [48] | |
MPEG-4 DST | [49] | ||
MPEG-4 HVXC | Speech | [50] | |
MPEG-4 HE-AAC | [51] | ||
MPEG-4 SLS | [52] | ||
MPEG-4 Structured Audio | |||
NIST | (niche?) | [53] | |
OSQ | Lossless | (proprietary, so generally Steinberg-only) | [54] |
OptimFROG | Lossless | (proprietary lossless format) | [55] |
MusePack (MPC) | Music | [56] | |
NeXT/Sun (.au) | Originally headerless (8000 Hz 8-bit μ-law PCM data), later with header, varied bith depth, and various lossy compression options | [57] | |
SVOPC | Speech | Designed to deal with packet loss. Previously used by Skype (where it has been replaced with SILK) | [58] |
SILK | Speech | Used e.g. by skype. Patented, royalty free uses possible. | [59] |
VMR-WB | Speech | [60]
| |
TTA (True Audio) | Lossless | [61] | |
Truespeech | Speech | Proprietary | [62]
|
QDesign/Ravesound | Previously known as LBPack. Used by older Quicktime variants | [63]
| |
WAVPack | Lossless | [64] | |
LPAC, LTAC | Lossless | Lossless Predictive/Transform Audio Compression. Has mostly become MPEG-4 Audio Lossless Coding instead. | [65] [66] |
Meridian Lossless Packing (MLP) | Lossless | Used in DVD-Audio, and in HD DVD, Blu-Ray through Dolby TrueHD | [67] |
Opus | Music/general | patent-free open standard. Successor to Vorbis, more applicable to lower latency(verify) | [68] [69]
|
RTAudio | Speech | Proprietary | [70]
|
RealAudio | Speech, music | [71] | |
SHN (Shorten) | Lossless | [72]
| |
Siren 7, Siren 14, Siren 22 | Speech | Patented (royalty free use possible). See also G.722.1 | [73] |
Speex | speech | Beats various older speech codecs at low-bitrate speech | [74] [75] |
TwinVQ | Proprietary (Yamaha, NTT) | [76]
| |
SMV (Selectable Mode Vocoder) | Speech | Used in CMDA2000 | [77] |
Vorbis | Music/general | patent-free open standard | [78] [79] |
VOX (Dialogic ADPCM) | 4-bit ADPCM, often 8000Hz sampling, less commonly 6000Hz | [80]
| |
WMA | Music | Proprietary format. There are variants on basic WMA targeting higher quality audio (WMA Professional), voice coding (WMA Voice), lossless coding (WMA Lossless). Hardware players tend to not support these. Optional DRM. Music quality comparable to Ogg, AAC(verify) |
[81]
|
Lossless codecs tend to compress to perhaps half of what the 44kHz Stereo PCM samples would take.
There is variation, but not very much.
libraries:
- libsamplerate (and wrappers, like pysamplerate)
file format reading:
- libaudiofile (AIFF, AIFC, WAVE, and NeXT/Sun) [82]
- various mpeg libraries primarily for MPEG1 (which includes MP3), sometimes MPEG2
And see also:
Perceptive quality measures:
File formats
MP3
WAV and friends
The WAVE format (often WAV due to the file extension) is a specific use of RIFF
WAV is usually raw, uncompressed linear PCM - but it can also store ADPCM, uLaw (verify)(verify).
As a fairly simple file format carrying raw data, it is useful as an interchange format.
W64
BWF
BWF is an extension of WAV that is used e.g. by (non-linear) digital recorders.
BWF files still use the .WAV extension, and is backwards compatible because it mostly just adding metadata in extension chunks.
It adds things like timecodes
ITU recommendation ITU-R BS.1352-3, Annex 1.
https://en.wikipedia.org/wiki/Broadcast_Wave_Format
RF64
Like BWF, but breaks WAV/BWF's 4GB limitation, and allows more channels(verify).
Used in broadcasting, and audio archiving.
https://en.wikipedia.org/wiki/RF64