Sound programming, sound coding, sound codecs

This page is in a collection about both human and automatic dealings with audio, video, and images, including

Audio physics and physiology

Digital sound and processing



Stray signals and noise

For more, see Category:Audio, video, images


  • sox (command line tool)
  • libsndfile [1]
  • libaudiofile [2]

Sample rate conversion:

  • libsamplerate [3] (a.k.a. Secret Rabbit Code)
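
To give a rough idea of what a sample rate converter does, here is a minimal sketch using plain linear interpolation. The function name `resample_linear` is made up for this illustration and is not libsamplerate's API. Linear interpolation is the crudest useful approach: it applies no anti-aliasing filter when downsampling, so real converters (libsamplerate's sinc-based modes, for example) will sound noticeably better.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence by linear interpolation (illustrative only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    # Number of output samples that covers the same duration as the input.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    # How far we move through the input for each output sample.
    step = src_rate / dst_rate
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            # At or past the last input sample: hold the final value.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


# Doubling the rate inserts a midpoint between each pair of samples.
print(resample_linear([0.0, 1.0, 2.0, 3.0], 4, 8))
# [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
```

Note that halving the rate with this function simply picks every other sample, which is exactly the case where the missing low-pass filter causes aliasing.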

General/wider purpose audio programming

Helpers, File formats, codecs simple and complex

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

See e.g.:

The following table is meant as a compact overview of the approaches, algorithms, and codecs that are out there, from all fields of application, and what they basically do.


  • 'Music' usually means focus on quality,
  • 'Speech' usually focuses on low latency and space-efficient coding, for only speech

Name Used for Techniques, bitrates

AAC(-LC) (MPEG-2, MPEG-4) Music Also known as MPEG-2 NBC(verify). See also MPEG-4 HE-AAC.
Can be seen as improvement over MP3
AC3 / DD (Dolby Digital) Known as AC3, Dolby Digital, ATSC A/52
See also DD+
Best above 192kbps(verify)
Comparable quality/limitations to MP2, MP3 (verify)
Has trouble with impulses, high frequency stereo(verify)
ADX Has an ADPCM variant and MPEG-2 variant [6]
AMBE Speech Proprietary [7]
AMR, AMR-NB Patented.
AMR-WB Speech Patented. See also G.722.2 [8]
AMR-WB+ Patented.
Apple Lossless (ALAC) Lossless [9]
apt-X Lossless variant exists. [10]
ATRAC Family of codecs [11]
CELP and variants speech (often) Can be a codec in itself, but now usually understood as a group of variants (ACELP, RCELP, LD-CELP, VSELP, others), or used as a part of some codec (QCELP, many), though the lines between these can be blurry.
CELT [12]
CVSD, CVSDM Speech [13]

DD+, E-AC-3 (Dolby Digital Plus, Enhanced AC-3) [14]
DSS (Digital Speech Standard) [15]
DTS Patented.
Seen on DVDs, comparable to AC3(verify), also in that it needs a comparatively high bitrate to sound decent.(verify)
DTS-HD (DTS++) Extension of DTS [17]
DRA [18]
Dolby TrueHD Lossless Uses (and expands on) Meridian Lossless Packing [19]

EVRC Speech Used in CDMA2000 [20]
EVRC-B Speech Used in CDMA2000 [21]

GSM-EFR, GSM 06.60 Speech [22]
GSM-FR, GSM 06.10 Speech [23]
GSM-HR, GSM 06.20 [24]
HILN Speech [25]
iLBC (Internet Low Bit Rate Codec) [26]
Impala Used in FORscene [27]
iSAC (Internet Speech Audio Codec) Speech Proprietary [28]
ITU-T G.721 (Superseded by G.726) Speech ADPCM at 32 kbit/s.
ITU-T G.723 (Superseded by G.726) Speech ADPCM at 24 and 40 kbit/s [29]
ITU-T G.723.1 Speech [30]
ITU-T G.726 Speech ADPCM at 16, 24, 32, and 40 kbit/s (meant to supersede G.721, and G.723) [31]
ITU-T G.711 Speech A-Law/mu-Law PCM at 64kbit/s [32]
ITU-T G.718 Speech 8 to 32 kbit/s [33]
ITU-T G.719 Speech 32 to 128 kbit/s for speech and decent-quality music [34]
ITU-T G.722 Speech SB-ADPCM at 48, 56 and 64 kbit/s [35]
ITU-T G.722.1 Speech (24 and 32 kbit/s) [36]
ITU-T G.722.2 Speech Often refers to AMR-WB (Adaptive Multi-Rate Wideband) [37]
ITU-T G.728 Speech 16kbit/s, LD-CELP [38]

ITU-T G.729 Speech CS-ACELP [39]
ITU-T G.729.1 Speech [40]
MPEG-1(/MPEG-2) Layer I, Layer II, Layer III (MP3) Music You usually want Layer III
(MPEG-2 extends options somewhat)
Sometimes has trouble with impulses, stereo high frequency content (regardless of bitrate)(verify)
[41] [42] [43]

MPEG-1(/MPEG-2) Layer II Music (MPEG-2 extends options somewhat)
MPEG-1(/MPEG-2) Layer III (MP3) Music (MPEG-2 extends options somewhat)
MPEG-4 ALS [44]
MPEG-4 DST [45]
MPEG-4 HVXC Speech [46]
MPEG-4 HE-AAC [47]
MPEG-4 SLS [48]
MPEG-4 Structured Audio
OSQ Lossless [49]
OptimFROG [50]
MusePack (MPC) Music [51]
SVOPC Speech Designed to deal with packet loss. Previously used by Skype (where it has been replaced with SILK) [52]
SILK Speech Used by Skype. Patented; royalty-free use possible. [53]
VMR-WB Speech [54]

TTA (True Audio) Lossless [55]
Truespeech Speech Proprietary [56]

FLAC Lossless [57]
Monkey's Audio Lossless [58]
QDesign/Ravesound Previously known as LBPack. Used by older QuickTime variants [59]

WAVPack Lossless [60]
LPAC, LTAC Lossless Lossless Predictive/Transform Audio Compression. Has mostly become MPEG-4 Audio Lossless Coding instead. [61] [62]
Meridian Lossless Packing (MLP) Lossless Used in DVD-Audio, and in HD DVD and Blu-ray through Dolby TrueHD [63]
SHN (Shorten) Lossless [64]
RTAudio Speech Proprietary [65]

RealAudio Speech, music [66]
Siren 7, Siren 14, Siren 22 Speech Patented (royalty free use possible). See also G.722.1 [67]
TwinVQ Proprietary (Yamaha, NTT) [68]

SMV (Selectable Mode Vocoder) Speech Used in CDMA2000 [69]
VOX (Dialogic ADPCM) 4-bit ADPCM, often 8000Hz sampling, less commonly 6000Hz [70]

WMA Music Proprietary format.
There are variants on basic WMA targeting higher quality audio (WMA Professional), voice coding (WMA Voice), lossless coding (WMA Lossless). Hardware players tend to not support these.
Optional DRM.
Music quality comparable to Ogg Vorbis, AAC(verify)

Speex speech Beats various older speech codecs at low-bitrate speech [72] [73]
Vorbis [74] Music [75]
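
The bitrates listed for the PCM and ADPCM style entries above (G.711, G.726, VOX) follow directly from sample rate times bits per sample. A small sketch of that arithmetic follows. The 8 kHz telephony sample rate and the 2 to 5 bits per sample for G.726 are not stated in the table and are filled in here as assumptions; the helper name `fixed_rate_kbps` is made up for this example.

```python
def fixed_rate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate in kbit/s of a codec that spends a fixed number of bits per sample."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# G.711: 8-bit A-law/mu-law samples, assuming 8 kHz sampling.
print(fixed_rate_kbps(8000, 8))                             # 64.0

# G.726: ADPCM, assuming 2, 3, 4 or 5 bits per sample at 8 kHz.
print([fixed_rate_kbps(8000, b) for b in (2, 3, 4, 5)])     # [16.0, 24.0, 32.0, 40.0]

# VOX (Dialogic ADPCM): 4 bits per sample at 8000 Hz or 6000 Hz.
print([fixed_rate_kbps(rate, 4) for rate in (8000, 6000)])  # [32.0, 24.0]
```

This only applies to codecs with a fixed number of bits per sample; the CELP-family and transform codecs in the table do not map onto sample rate this simply.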

Lossless codecs tend to compress to roughly half the size of the equivalent 44.1 kHz stereo PCM data. There is variation, but not a great deal.
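
To put numbers on that, here is a back-of-the-envelope sketch. It assumes 16-bit samples (CD-style audio, which the text above does not state) and uses the rough "about half" figure as a default ratio; both function names are made up for this example.

```python
def pcm_bytes(duration_s, sample_rate_hz=44100, channels=2, bits_per_sample=16):
    """Size in bytes of raw (headerless) PCM audio of the given duration."""
    bytes_per_frame = channels * bits_per_sample // 8
    return round(duration_s * sample_rate_hz) * bytes_per_frame


def estimated_lossless_bytes(duration_s, ratio=0.5):
    """Very rough lossless size estimate; ratio=0.5 is only a rule of thumb."""
    return round(pcm_bytes(duration_s) * ratio)


one_minute = pcm_bytes(60)
print(one_minute)                     # 10584000 bytes, about 10.1 MiB
print(one_minute * 8 / 60 / 1000)     # 1411.2 kbit/s
print(estimated_lossless_bytes(60))   # 5292000 bytes, about 5 MiB
```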


  • libsamplerate (and wrappers, like pysamplerate)

file format reading:

  • libaudiofile (AIFF, AIFC, WAVE, and NeXT/Sun) [76]
  • various MPEG libraries, primarily for MPEG-1 (which includes MP3), sometimes MPEG-2
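
As a small illustration of what file format reading gives you, the sketch below uses Python's standard `wave` module rather than any of the C libraries listed above, and it only handles WAVE files. The helper names `describe_wav` and `make_silent_wav` are made up for this example; the second one exists only so the example has something to read without needing a file on disk.

```python
import io
import wave


def describe_wav(fileobj):
    """Return basic format information from a WAVE file or file-like object."""
    with wave.open(fileobj, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "frames": frames,
            "duration_s": frames / rate,
        }


def make_silent_wav(n_frames=8000, sample_rate_hz=8000):
    """Build a mono 16-bit WAVE file of silence in memory (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames)
    buf.seek(0)  # rewind so the reader starts at the header
    return buf


print(describe_wav(make_silent_wav()))
```

`describe_wav` also accepts a filename, since `wave.open` takes either a path or a file-like object.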

And see also:

Perceptual quality measures:

  • MOS - Mean Opinion Score [77]
  • PEAQ [78]
  • PSQM (ITU-T P.861) [79]
  • PESQ (ITU-T P.862) [80]
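
Of these, MOS is the simplest to illustrate: it is the arithmetic mean of listener ratings, conventionally on a 1 (bad) to 5 (excellent) scale. The objective measures (PSQM, PESQ, PEAQ) are models that try to predict such scores without a listening panel, and are far too involved to sketch here. The ratings below are invented purely for illustration, and the function name is likewise made up.

```python
def mean_opinion_score(ratings):
    """Arithmetic mean of listener ratings on the usual 1..5 scale."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r!r} is outside the 1..5 scale")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from five listeners for one audio sample.
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```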