Descriptions used for sound and music

From Helpful
Revision as of 17:14, 24 June 2018 by Helpful (Talk | contribs)


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Physical and/or fairly well studied effects


Attenuation: a difference in the energy (amplitude) of a signal.

Attenuation in the widest sense refers to the concept in physics where loss of energy (i.e. amplitude reduction) occurs in a medium (be it electronic equipment, a wall affecting your wifi signal, or what happens when you hear yourself chew).

Attenuation is often measured in decibels.

In some contexts it is decibels per unit length, for example to specify expected signal loss in electrical wiring, or in sound insulation.

In electrical signal transmission, it can refer to problems with analog transmission over larger distances, and can be related to the expected SNR (though there are more aspects to both signal and noise in transmission).

Physical attenuation often also varies with frequency, in which case you can make a graph, or give an average in the most relevant frequency region.

For example,

  • attenuation is the major reason we hear our own voice differently on recordings: we hear a good part of the lower frequencies through our body, while others only hear us through air (another reason is that some frequencies make it more directly to our ears)
  • microphones with stands made just of hard materials throughout are likely to pick up the vibrations of the things they stand on, which anything or anyone not in direct contact won't hear
  • materials used for sound insulation can be seen as bandstop filters (often relatively narrowband)
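As a quick sketch of the decibel measure mentioned above: it is a ratio on a logarithmic scale, with a factor of 10 for power and 20 for amplitude (since power goes with amplitude squared). The function names here are illustrative:

```python
import math

def power_attenuation_db(p_in: float, p_out: float) -> float:
    """Attenuation in dB given input and output signal power."""
    return 10 * math.log10(p_in / p_out)

def amplitude_attenuation_db(a_in: float, a_out: float) -> float:
    """Attenuation in dB given input and output amplitude."""
    return 20 * math.log10(a_in / a_out)

# Halving the amplitude amounts to roughly 6 dB of attenuation:
print(round(amplitude_attenuation_db(1.0, 0.5), 1))  # 6.0
```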

See also:

Tone versus noise content

Reflection, absorption, echo, reverb

Sound hitting a hard surface will be reflected.

Larger rooms are likely to have mostly hard surfaces (and also to have noticeable reverb).

An echo is an easily identifiable and usually quite singular copy of a sound, arriving later because it was reflected.

The delay is a significant aspect. Near walls it is minimal, and you may easily receive more energy from reflections than directly from the source. (Also note that localization is not affected terribly much.)

When many echoes combine to be blurred and hard to identify, this is called reverb.
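To get a feel for the delays involved, a minimal sketch, assuming sound travels at roughly 343 m/s in air:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def reflection_delay_s(distance_to_surface_m: float) -> float:
    """Round-trip delay of a reflection off a surface at the given distance."""
    return 2 * distance_to_surface_m / SPEED_OF_SOUND

# A wall 1 m away reflects within milliseconds (blends with the direct sound),
# while a wall ~17 m away takes ~0.1 s, long enough to hear as a distinct echo:
print(round(reflection_delay_s(1.0), 4))   # 0.0058
print(round(reflection_delay_s(17.0), 2))  # 0.1
```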

Free and diffuse field

These describe environments rather than qualities, but are best mentioned alongside reflection and absorption, as rooms are almost always diffuse fields with reverb of some sort.

A diffuse field describes an environment with a high number of reflections, with reverberating repeats that arrive from all directions, in a more or less uniform way. (can also be used to refer to EM, e.g. light)

A free field refers to an environment without boundaries or reflections, which is therefore reverb-free.

Rooms in general are diffuse fields. Note that empty rooms, cathedrals, and the like have more noticeable reverb than furnished rooms, as there are few soft objects to absorb sound.

Anechoic chambers are rooms that attempt to remove all echo and reverb, to simulate a space with only the source in question, and at the same time have the environment act as a free field. It is very common to see porous, wedge-shaped sound absorbers.



Amplitude modulation (a.k.a. tremolo)

Frequency modulation (a.k.a. vibrato)

Amplitude envelope (attack, decay, sustain, release)

(also in terms of attention)
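A minimal sketch of these three concepts as sample-at-time-t functions (all names and parameter values here are illustrative, not from any particular synth):

```python
import math

def tremolo(t, freq=440.0, rate=5.0, depth=0.3):
    """Sine carrier with amplitude modulation (tremolo)."""
    return (1.0 + depth * math.sin(2 * math.pi * rate * t)) * math.sin(2 * math.pi * freq * t)

def vibrato(t, freq=440.0, rate=5.0, depth_hz=10.0):
    """Sine carrier with frequency modulation (vibrato), deviating +/- depth_hz."""
    return math.sin(2 * math.pi * freq * t
                    + (depth_hz / rate) * math.sin(2 * math.pi * rate * t))

def adsr(t, attack=0.02, decay=0.1, sustain=0.6, release=0.3, note_len=0.5):
    """Amplitude envelope at time t (seconds) for a note held note_len seconds."""
    if t < attack:
        return t / attack                                    # attack: ramp up
    if t < attack + decay:
        return 1.0 - (1.0 - sustain) * (t - attack) / decay  # decay: down to sustain
    if t < note_len:
        return sustain                                       # sustain: hold
    if t < note_len + release:
        return sustain * (1.0 - (t - note_len) / release)    # release: fade out
    return 0.0
```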

Harmonic content

Beat and tempo

(Terms like beat and tempo are not very strictly defined or very consistently used. Learn the concepts, and read between people's lines.)

Not necessarily a hard-hitting thing, tempo, or 'the beat', is what we intuit into our foot-tapping when playing and listening to music. It is often a common interval, a basic grid for musical events.

Meter is a rhythmic structure (often within a bar or a few, whereas tempo was classically for a piece), and groove is more the rhythmic feel that meter/tempo has.

For DJs mixing techno songs, it can be equated with the onsets of the loudest beats. For other music it is a more complex thing. The extremes are somewhat interesting to mention, if only to point out that at some point it becomes so ill defined that even humans can't make anything of it.

The tempo of most music lies within 40-200 beats per minute (BPM), with a median varying with music style, but often somewhere around 105 BPM.

You can also use measures per minute (or bars per minute, though that would be the same acronym), which is used in e.g. ballroom dancing as it is a more practical measure of dancing speed than BPM.
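The conversion is simple division by the number of beats in a bar (the time signature's upper number, for these purposes); a sketch:

```python
def measures_per_minute(bpm: float, beats_per_bar: int = 4) -> float:
    """Convert beats per minute to measures (bars) per minute."""
    return bpm / beats_per_bar

def beat_period_s(bpm: float) -> float:
    """Seconds between beats at a given tempo."""
    return 60.0 / bpm

# 120 BPM in 4/4 is 30 measures per minute, with a beat every half second:
print(measures_per_minute(120, 4), beat_period_s(120))  # 30.0 0.5
```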

For context: an onset is primarily the start of a sound, specifically its sudden increase in amplitude. It is primarily an analytical concept, and research into human judgment of onsets is ongoing.

Onset detection is one way to go about computational beat/tempo detection, but is not robust for all music types.
Onsets don't always match the perception of tempo: consider e.g. blues with guitars, where fast strumming would easily make algorithms decide on a tempo a factor higher than most humans would.
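A crude illustration of energy-based onset detection. This is a toy sketch on a synthetic signal; real detectors typically use spectral flux and adaptive thresholds:

```python
import math

def frame_energies(samples, frame=250):
    """Sum of squared samples per non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame])
            for i in range(0, len(samples) - frame, frame)]

def energy_onsets(samples, frame=250, ratio=2.0):
    """Frame indices where energy jumps by more than `ratio` over the previous frame."""
    e = frame_energies(samples, frame)
    return [i for i in range(1, len(e)) if e[i] > ratio * e[i - 1] + 1e-9]

# Synthetic check: half a second of silence, then a 440 Hz tone.
sr = 8000
sig = [0.0] * 4000 + [math.sin(2 * math.pi * 440 * t / sr) for t in range(4000)]
print(energy_onsets(sig))  # [16] -- the frame where the tone starts
```

Fast strumming, as in the blues example above, would produce many such jumps, which is exactly how an algorithm can land on a tempo a factor above human judgment.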

Musical key

Less studied, less well defined, and/or more perceptual qualities

Humans are quick to recognize and follow other properties, typically better than algorithmic approaches do. These include:


Timbre often appears in lists of sound qualities, but it is very subjective and has been used as a catch-all term: generally it means something like "whatever qualities allow us to distinguish two sounds that are similar in pitch and amplitude".

A large factor in this is the harmonic/overtone structure, but a lot more gets shoved in.

tonal contours/tracks (ridges in the spectrogram)

(particularly when continuous and followable)

Spectral envelope; its changes


Some different sounds / categories


There are various typologies of sounds, but many are very subjective in that they are not unambiguously resolvable to signal properties -- they are often somewhat forced.


  • continuous harmonic sounds, such as sines and other simple predictable signals
  • continuous noise (unpredictable in the time domain)
  • impulses (short lived)

Pulses, noises, and tones could be seen as simple extremes in a continuum, where various in-betweens can be described, such as:

  • tonal pulses / wavelets
  • tonal/narrow-band noise
  • pulsed noise bursts
  • chirp
  • various real-world noises, such as
    • rustle noise [1]
    • babble noise

You can argue about the perceptual use of these categories as they do not distinguish the same way we do.
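For illustration, the pure extremes are easy to synthesize. A sketch; sample rate, length, and frequencies are arbitrary choices:

```python
import math
import random

sr = 8000   # sample rate (Hz)
n = 2000    # a quarter second of samples

# continuous harmonic sound: a plain sine
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(n)]

# continuous noise: unpredictable in the time domain
noise = [random.uniform(-1.0, 1.0) for _ in range(n)]

# impulse: short-lived, here a single nonzero sample
impulse = [1.0 if t == 0 else 0.0 for t in range(n)]

# chirp, one of the in-betweens: a tone sweeping 200 -> 2000 Hz
f0, f1 = 200.0, 2000.0
k = (f1 - f0) / (n / sr)   # sweep rate, Hz per second
chirp = [math.sin(2 * math.pi * (f0 * (t / sr) + 0.5 * k * (t / sr) ** 2))
         for t in range(n)]
```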

Some useful-to-know music theory

On fingerprinting

Analysis and/or fingerprinting

See also
[1] Cano et al. (2002) "A review of algorithms for audio fingerprinting"
[2] Wood (2005), "On techniques for content-based visual annotation to aid intra-track music navigation"

Software and ideas

This list focuses on software and ideas that a project of yours may have some hope of using. There are more (see links below) that are purely licensed services.

Acoustid notes


Acoustid is the overall project.

Chromaprint is the fingerprinting part. The standalone fingerprinter is called fpcalc (which hashes the start of a file).

Used by MusicBrainz (based on submission, e.g. via Picard, Jaikoz, or anything else that uses the API), making it interesting for music identification and tagging.


  • The client is LGPL
  • the server is MIT-licensed
  • the data is Creative Commons Attribution-ShareAlike (verify)

See also: details on how it works


Things you can do offline:

  • calculate a chromaprint
    mostly meaningful for lookup in the acoustid database

API calls (limited to 3 requests/second):

  • look up chromaprint to acoustid track
    returns a list of (acoustid track ID, certainty)
    optionally further metadata (essentially adds the next call)
  • look up metadata for acoustid track
    used either when you didn't ask for that metadata in the chromaprint lookup,
    or when you have previously resolved to a track and e.g. want to see whether its name or release details have changed since then
  • list AcoustIDs by MBID

  • submit acoustid
    with or even without your file's tags; basically for statistics of what's out there (the AcoustID fingerprinter is a program that makes this simpler)
    you can wait for it to be processed (by default it returns when added to the queue, which is usually a few seconds of work)
    (requires registration, mostly for statistics and quality filtering)
  • get status of submitted acoustid(s)
    for when you submitted without waiting, but want to know
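A sketch of the chromaprint-to-track lookup, assuming the standalone fpcalc tool is installed and using AcoustID's v2 web service endpoint; you would need your own application key, and the helper names here are illustrative:

```python
import json
import subprocess
import urllib.parse

API_KEY = "your-application-key"  # register your application at acoustid.org

def fpcalc_fingerprint(path):
    """Run fpcalc on a file and parse its JSON output (duration + fingerprint)."""
    out = subprocess.run(["fpcalc", "-json", path],
                         capture_output=True, text=True, check=True).stdout
    data = json.loads(out)
    return data["duration"], data["fingerprint"]

def lookup_url(duration, fingerprint, meta="recordings"):
    """Build an AcoustID v2 lookup URL. Mind the ~3 requests/second limit."""
    params = urllib.parse.urlencode({
        "client": API_KEY,
        "duration": str(int(duration)),
        "fingerprint": fingerprint,
        "meta": meta,  # ask for MB metadata up front, or omit and resolve later
    })
    return "https://api.acoustid.org/v2/lookup?" + params
```

Fetching that URL returns JSON with a list of (acoustid track ID, score) results, plus recording metadata when requested.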

To understand the meaning and usefulness of AcoustIDs, you probably want to think from the perspective of MusicBrainz's and acoustid's data models.


  • a recording is acoustically unique (a specific recording/mix)
  • a release is something you can buy (CD, LP, single, re-releases, etc.)
  • a release's tracks also have identifiers, to be able to tie recordings into releases; a recording is often present as tracks on multiple releases
So MusicBrainz has identifiers for recordings, tracks, and releases (note that a track is on exactly one release).

Acoustid's model centers around tracks (in MusicBrainz's model that would be a recording).

Different enumerated things:

  • tracks (acoustid track ID, a uuid)
  • recordings
  • fingerprints (fingerprint ID, basically enumerating unique fingerprint submissions)

For example, at the time of writing (details have changed since), one acoustid track:

  • had five (fingerprint, duration) pairs that people had submitted and which were assigned to this acoustid track
  • had an ID that, while a UUID, is unrelated to MB's UUIDs
  • was linked to two musicbrainz recordings
    the first being a (musicbrainz) track on one (musicbrainz) release
    the second being five different musicbrainz tracks, each on their own musicbrainz release
    in this case all with the same names, though that's not always exactly so

All this mostly matters because you can ask acoustid for MB details, and you have to decide to what degree you want to resolve this.

E.g. when tagging, you might choose to combine this with musicbrainz's metadata to see what release the combined set fits into best, by looking at other tag details. (Picard does a simple form of this)

When you are building a music player and just want to look up the artist and title text, you can ignore the structure of the MB details you get back.

See also:

Echoprint notes


Echonest is the whole company.

Echoprint is the fingerprint, produced by its acoustic code generator (codegen), which was open-sourced in 2011. Their metadata storage/search server is also available.

Echonest's data is owned by them but publicly available - the license basically says "if you use our data and add to it, you must give us your additions".

They also had a lot of metadata and fingerprints, and a public service to look up songs from ~20 seconds of audio, which would often work even on microphone recordings.

In late 2014 (basically because Spotify had bought Echonest), Echonest closed that service.

Codegen is still available (being MIT-licensed code) and you can build your own service from their components, but you can no longer use their data or their hosted lookup.

The Echo Nest

See also:

pHash notes

A few perceptual-hash algorithms, for image, video, and audio.

Audioscout is based on the audio hash.

See also:



fdmf's fingerprinter

Combination of fingerprinter and lookup client. Available as source.

Fingerprinter is based on [3]

Fingerprinter license: GPL3

Client lookup license: "Basically you can do pretty much whatever you want as long as it's not for profit."

See also:

Fooid notes


Fooid is a fairly simple FOSS music fingerprinting library, allowing fuzzy comparisons between songs.

It seems to work pretty well for near-duplicate detection.

While still available, it seems defunct now (the website has been dead for a while).

MusicIP, MusicDNS, AmpliFIND


Proprietary, and its latest lifecycle seems to be a licensed B2B service without a public interface.

The company was first called Predixis, was later and best known as MusicIP (~2000), died in 2008, relaunched as AmpliFIND (verify) Music Services (in 2009?), and sold its intellectual property to Gracenote (2011? 2006?).

Probably most known for the MusicDNS service (which was at some point rebranded as AmpliFIND (verify)), which mostly consists of:

  • their servers, which do the comparison and return a PUID (Portable Unique IDentifier) on close-enough matches
  • a client library, which generates an acoustic summary and queries using it

The acoustic part is proprietary (The MusicDNS client library implements Open Fingerprinting Architecture, but this is only about the querying).

When an acoustic query to their databases matches something closely enough, a PUID is returned, which seems to be a randomly generated identifier (not a fingerprint).

Relatable TRM


Used by MusicBrainz for a while, which found it useful for finding duplicates, but its lookup had problems with collisions and with scaling (meaning its server was unreliably slow), and Relatable did not seem to want to invest in it, so its use in MusicBrainz was replaced.


See also



Moodbar notes

Assigns a single color to fragments within music, to produce a color-over-time summary that indicates the sort of sound.

Mostly a CLI tool that reads audio files (using gstreamer) and outputs a file that essentially contains a simplified spectrogram.

Apparently the .mood generator's implementation:

  • mainly just maps energy in low, medium, and high frequency bands to blue, green, and red values.
  • always outputs 1000 fragments, which means
    it is useful to tell apart parts of songs
    visual detail can be misleading if the time length is significantly different
    it is not that useful for rhythmic detail, for similar reasons

Something else renders said .mood file into an image, e.g. Amarok, Clementine, Exaile, gjay (sometimes with some post-processing).

The file contains r,g,b uint8 for each of the (filesize/3) fragments.
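Reading such a file is then trivial; a sketch (the helper name is illustrative):

```python
import struct

def read_mood(path):
    """Read a .mood file: consecutive (r, g, b) uint8 triplets, one per fragment."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % 3 != 0:
        raise ValueError("size is not a multiple of 3, probably not a .mood file")
    return [struct.unpack_from("3B", data, i) for i in range(0, len(data), 3)]
```

A well-formed file from the generator described above should yield 1000 (r, g, b) tuples.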

See also: