Sound programming, sound coding, sound codecs

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

See e.g.:

The following table is meant as a compact overview of the approaches, algorithms, and codecs that are out there, from all fields of application, and what they basically do.

Note:

'Music' usually means a focus on quality.
'Speech' usually means a focus on space-efficient coding and low latency, and often assumes relatively isolated speech.

Name	Used for	Techniques, bitrates, notes	Links
AAC usually refers to AAC-LC (defined in MPEG-2, MPEG-4)	Music	Also known as MPEG-2 NBC(verify). Can be seen as an improvement over MP3. See also MPEG-4 HE-AAC.	[4]
AC-3 / Dolby Digital		Known as AC-3 (often seen without the dash), Dolby Digital (sometimes DD), and ATSC A/52. Its early introduction (early 1990s, around the time of MP3) and 5.1 support made it a common choice for storing 5.1 audio. Being supported (alongside DTS) in digital interconnects like TOSLINK helped too. Quality and limitations are comparable to MP2 and MP3(verify), including some trouble with impulses and high-frequency stereo(verify). Best above 192 kbps(verify). Cf. its successor, Dolby Digital Plus. See also Electronics_project_notes/Audio_notes_-_multichannel_and_surround#AC-3	[5]
AC-4		e.g. one of the various codecs allowed in ATSC 3.0 broadcast	[6]
ADX		Has an ADPCM variant and MPEG-2 variant	[7]
AIFF-C, AIFC		Compressed variant[8] of AIFF (or, on macOS, possibly uncompressed but little-endian instead)
AIFF		IFF-based format used for audio
AMBE	Speech	Proprietary	[9]
AMR, AMR-NB		Patented.
AMR-WB	Speech	Patented. See also G.722.2	[10]
AMR-WB+		Patented.
Apple Lossless (ALAC)	Lossless	Originally proprietary; Apple released it as open source in 2011. Made for iOS and iTunes, which also chose not to support FLAC. tl;dr: Apple people prefer it, purely for ease.	[11]
apt-X		Lossless variant exists.	[12]
ATRAC		Family of codecs	[13]
CELP and variants	speech	Can be a codec in itself, but now usually understood as a group of variants (ACELP, RCELP, LD-CELP, VSELP, others), or used as a part of some codec (QCELP, many), though the lines between these can be blurry.
CELT	general, low latency	royalty-free, open standard. Basically merged into Opus now	[14]
CVSD, CVSDM	Speech		[15]
Dolby Digital Plus		a.k.a. DD+, DDP. E-AC-3 (Enhanced AC-3), EC-3	[16]
DSS (Digital Speech Standard)			[17]
DTS		Patented. Seen on DVDs, comparable to AC-3(verify), also in that it needs a comparatively high bitrate to sound decent(verify).	[18]
DTS-HD (DTS++)		Extension of DTS	[19]
DRA			[20]
Dolby TrueHD	Lossless	Uses (and expands on) Meridian Lossless Packing	[21]
EVRC	Speech	Used in CDMA2000	[22]
EVRC-B	Speech	Used in CDMA2000	[23]
FLAC	Lossless		[24]
GSM-EFR, GSM 06.60	Speech		[25]
GSM-FR, GSM 06.10	Speech		[26]
GSM-HR, GSM 06.20			[27]
HILN	Speech		[28]
iLBC (Internet Low Bit Rate Codec)			[29]
Impala		Used in FORscene	[30]
iSAC (Internet Speech Audio Codec)	Speech	Proprietary	[31]
ITU-T G.721 (Superseded by G.726)	Speech	ADPCM at 32 kbit/s.
ITU-T G.723 (Superseded by G.726)	Speech	ADPCM at 24 and 40 kbit/s	[32]
ITU-T G.723.1	Speech		[33]
ITU-T G.726	Speech	ADPCM at 16, 24, 32, and 40 kbit/s (meant to supersede G.721, and G.723)	[34]
ITU-T G.711	Speech	A-Law/mu-Law PCM at 64kbit/s	[35]
ITU-T G.718	Speech	8 to 32 kbit/s(verify)	[36]
ITU-T G.719	Speech	32 to 128 kbit/s for speech and decent-quality music(verify)	[37]
ITU-T G.722	Speech	SB-ADPCM at 48, 56 and 64 kbit/s	[38]
ITU-T G.722.1	Speech	(24 and 32 kbit/s)	[39]
ITU-T G.722.2	Speech	Often refers to AMR-WB (Adaptive Multi-Rate Wideband)	[40]
ITU-T G.728	Speech	16kbit/s, LD-CELP	[41]
ITU-T G.729	Speech	CS-ACELP	[42]
ITU-T G.729.1	Speech		[43]
Monkey's Audio (APE)	Lossless		[44]
MPEG-1(/MPEG-2) Layer I, Layer II, Layer III (MP3)	Music	You usually want Layer III. (MPEG-2 extends options somewhat.) Sometimes has trouble with impulses and stereo high-frequency content, regardless of bitrate(verify).	[45] [46] [47]
MPEG-1(/MPEG-2) Layer II	Music	(MPEG-2 extends options somewhat)
MPEG-1(/MPEG-2) Layer III (MP3)	Music	(MPEG-2 extends options somewhat)
MPEG-4 ALS	Lossless		[48]
MPEG-4 DST			[49]
MPEG-4 HVXC	Speech		[50]
MPEG-4 HE-AAC			[51]
MPEG-4 SLS			[52]
MPEG-4 Structured Audio
NIST		(niche?)	[53]
OSQ	Lossless	(proprietary, so generally Steinberg-only)	[54]
OptimFROG	Lossless	(proprietary lossless format)	[55]
MusePack (MPC)	Music		[56]
NeXT/Sun (.au)		Originally headerless (8000 Hz 8-bit μ-law PCM data), later with header, varied bit depth, and various lossy compression options	[57]
SVOPC	Speech	Designed to deal with packet loss. Previously used by Skype (where it has been replaced with SILK)	[58]
SILK	Speech	Used e.g. by Skype. Patented, royalty-free uses possible.	[59]
VMR-WB	Speech		[60]
TTA (True Audio)	Lossless		[61]
Truespeech	Speech	Proprietary	[62]
QDesign/Ravesound		Previously known as LBPack. Used by older Quicktime variants	[63]
WAVPack	Lossless		[64]
LPAC, LTAC	Lossless	Lossless Predictive/Transform Audio Compression. Has mostly become MPEG-4 Audio Lossless Coding instead.	[65] [66]
Meridian Lossless Packing (MLP)	Lossless	Used in DVD-Audio, and in HD DVD, Blu-Ray through Dolby TrueHD	[67]
Opus	Music/general	patent-free open standard. Successor to Vorbis, more applicable to lower latency(verify)	[68] [69]
RTAudio	Speech	Proprietary	[70]
RealAudio	Speech, music		[71]
SHN (Shorten)	Lossless		[72]
Siren 7, Siren 14, Siren 22	Speech	Patented (royalty free use possible). See also G.722.1	[73]
Speex	speech	Beats various older speech codecs at low-bitrate speech	[74] [75]
TwinVQ		Proprietary (Yamaha, NTT)	[76]
SMV (Selectable Mode Vocoder)	Speech	Used in CDMA2000	[77]
Vorbis	Music/general	patent-free open standard	[78] [79]
VOX (Dialogic ADPCM)		4-bit ADPCM, often 8000Hz sampling, less commonly 6000Hz	[80]
WMA	Music	Proprietary format. There are variants on basic WMA targeting higher quality audio (WMA Professional), voice coding (WMA Voice), and lossless coding (WMA Lossless). Hardware players tend to not support these. Optional DRM. Music quality comparable to Ogg Vorbis, AAC(verify)	[81]

Lossless codecs tend to compress to perhaps half the size that the raw 44.1 kHz stereo PCM samples would take. There is variation between codecs, but not very much.
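To put a number on that rule of thumb, here is a minimal sketch that works out the raw PCM data rate and a rough lossless size estimate. The 16-bit sample depth, the four-minute track length, and the 0.5 ratio are assumptions for illustration, not measurements.

```python
def pcm_bytes_per_second(sample_rate_hz, channels, bits_per_sample):
    """Raw linear PCM data rate in bytes per second."""
    return sample_rate_hz * channels * bits_per_sample // 8


# CD-style audio: 44.1 kHz, stereo, 16 bits per sample (bit depth is an assumption)
rate = pcm_bytes_per_second(44100, 2, 16)

track_seconds = 4 * 60          # hypothetical four-minute track
raw_bytes = rate * track_seconds

ASSUMED_RATIO = 0.5             # the "perhaps half" rule of thumb, not a measured value
estimated_lossless_bytes = int(raw_bytes * ASSUMED_RATIO)

print(f"raw PCM rate:      {rate * 8 / 1000:.1f} kbit/s")
print(f"raw track size:    {raw_bytes / 1e6:.1f} MB")
print(f"lossless estimate: {estimated_lossless_bytes / 1e6:.1f} MB")
```

Real ratios depend on the material: dense, noisy recordings compress worse than sparse or quiet ones.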

libraries:

libsamplerate (and wrappers, like pysamplerate)

file format reading:

libaudiofile (AIFF, AIFC, WAVE, and NeXT/Sun) [82]
various MPEG libraries, primarily for MPEG-1 (which includes MP3), sometimes MPEG-2

And see also:

http://www.cnpbagwell.com/audio.html

Perceptual quality measures:

MOS - Mean Opinion Score [83]
PEAQ [84]
PSQM (ITU-T P.861) [85]
PESQ (ITU-T P.862) [86]
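Of these, MOS is the simplest to illustrate: it is the arithmetic mean of listener ratings, commonly on a scale from 1 (bad) to 5 (excellent). The sketch below assumes that five-point scale; the ratings are made up, and the function name is hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Arithmetic mean of listener ratings on an assumed 1..5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


ratings = [4, 5, 3, 4, 4]       # made-up listener scores for one test clip
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f}")
```

The other entries in the list are algorithmic measures that try to predict this kind of subjective score from the signals themselves.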

File formats

MP3

WAV and friends

The WAVE format (often WAV due to the file extension) is a specific use of RIFF

WAV is usually raw, uncompressed linear PCM, but its format tag also allows other encodings, such as ADPCM and μ-law.

As a fairly simple file format carrying raw data, it is useful as an interchange format.
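As an illustration of the RIFF structure, the following sketch writes a tiny WAV in memory with Python's standard `wave` module and then walks its chunks by hand. The helper name `read_chunks` and the sample values are made up for this example; the format tag table lists only a few common values.

```python
import io
import struct
import wave

# Build a tiny mono, 16-bit, 8000 Hz WAV in memory
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(struct.pack("<4h", 0, 1000, 0, -1000))
data = buf.getvalue()


def read_chunks(data):
    """Return (riff_size, {chunk_id: payload}) for a RIFF/WAVE byte string."""
    riff_id, riff_size, form = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = {}
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks[chunk_id] = data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)   # chunks are padded to an even length
    return riff_size, chunks


riff_size, chunks = read_chunks(data)
fmt_tag, channels, rate, byte_rate, block_align, bits = struct.unpack_from(
    "<HHIIHH", chunks[b"fmt "])

# A few well-known format tags; anything else is reported by number
FORMAT_NAMES = {1: "linear PCM", 2: "Microsoft ADPCM", 6: "A-law", 7: "mu-law"}
print(FORMAT_NAMES.get(fmt_tag, f"tag {fmt_tag}"), channels, rate, bits)
print("data bytes:", len(chunks[b"data"]))
```

The sizes are 32-bit little-endian fields, which is where the 4GB ceiling of plain WAV comes from.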

W64

BWF

BWF is an extension of WAV that is used e.g. by (non-linear) digital recorders.

BWF files still use the .WAV extension, and are backwards compatible because the format mostly just adds metadata in extension chunks.

It adds things like timecodes.

See ITU-R recommendation BS.1352-3, Annex 1.

https://en.wikipedia.org/wiki/Broadcast_Wave_Format

RF64

Like BWF, but breaks WAV/BWF's 4GB limitation, and allows more channels(verify).

Used in broadcasting, and audio archiving.
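To see why the 4GB limit matters in these fields, here is a rough sketch of how much audio fits under a 32-bit size field. It ignores header overhead, and the two example formats are assumptions chosen for illustration.

```python
MAX_RIFF_BYTES = 2**32 - 1      # largest value a 32-bit unsigned size field can hold


def max_duration_hours(sample_rate_hz, channels, bits_per_sample):
    """Approximate hours of linear PCM that fit in MAX_RIFF_BYTES (headers ignored)."""
    bytes_per_second = sample_rate_hz * channels * bits_per_sample // 8
    return MAX_RIFF_BYTES / bytes_per_second / 3600


cd_hours = max_duration_hours(44100, 2, 16)        # CD-style stereo
surround_hours = max_duration_hours(96000, 6, 24)  # hypothetical 6-channel, 24-bit, 96 kHz

print(f"44.1 kHz / 16-bit / stereo: {cd_hours:.2f} h")
print(f"96 kHz / 24-bit / 6 ch:     {surround_hours * 60:.0f} min")
```

Long multichannel recordings at high sample rates hit the ceiling in well under an hour, which is the case a 64-bit variant addresses.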

https://en.wikipedia.org/wiki/RF64
