Music fingerprinting and identification


Analysis and/or fingerprinting

See also
[1] Cano et al. (2002), "A review of algorithms for audio fingerprinting"
[2] Wood (2005), "On techniques for content-based visual annotation to aid intra-track music navigation"

Software and ideas

This list focuses on software and ideas that a project of yours may have some hope of using. There are more (see links below) that are purely licensed services.

Acoustid notes

Acoustid is the overall project.

Chromaprint is the fingerprinting part. The standalone fingerprinter is called fpcalc (which by default fingerprints only the start of a file, roughly the first two minutes).


  • the client is LGPL
  • the server is MIT-licensed
  • the data is Creative Commons Attribution-ShareAlike (verify)

Used e.g. by MusicBrainz (based on submission, e.g. via Picard, Jaikoz, or anything else that uses the API), making it interesting for music identification and tagging.
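As a sketch of how a project might call Chromaprint's fpcalc from Python (the `-json` and `-length` options exist in recent fpcalc builds, but check your version; the helper names here are made up):

```python
import json
import shutil
import subprocess

def fpcalc_command(path, seconds=120):
    # -json: machine-readable output; -length: how many seconds to fingerprint
    return ["fpcalc", "-json", "-length", str(seconds), path]

def fingerprint(path, seconds=120):
    """Returns (duration, fingerprint string), or None if fpcalc isn't installed."""
    if shutil.which("fpcalc") is None:
        return None
    out = subprocess.run(fpcalc_command(path, seconds),
                         capture_output=True, check=True, text=True).stdout
    data = json.loads(out)
    return data["duration"], data["fingerprint"]
```

The fingerprint string this produces is what you would submit to, or look up against, the Acoustid service.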

See also:

Echoprint notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

tl;dr: pointless (for consumers)

The Echo Nest (often written Echonest) is the company.

Echoprint is a fingerprint-like thing, produced by its acoustic code generator (codegen), which was open sourced in 2011.

Their metadata storage/searching server is also available.

Echonest's data is owned by them, but publicly available - the license basically says "if you use our data and add to it, you must give us your additions".

They also have a lot of metadata and fingerprints.

However, their service -- looking up songs from ~20 seconds of audio -- was closed in late 2014, basically because Spotify had bought The Echo Nest.

You can still look at and use their data, and codegen is still available (being MIT-licensed code), but you would have to build your own database/search service from these components.


See also:

pHash notes

pHash offers a few perceptual hashing algorithms, for image, video, and audio.

Audioscout is based on the audio algorithm.

See also:



fdmf's fingerprinter

Combination of fingerprinter and lookup client. Available as source.

  • the fingerprinter is based on [2]
  • fingerprinter license: GPLv3
  • client lookup: "Basically you can do pretty much whatever you want as long as it's not for profit."

See also:

Fooid notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Fooid is a fairly simple FOSS music fingerprinting library. It's mostly a simplified spectrogram, allowing fuzzy comparisons between songs, and is pretty decent at near-duplicate detection.

While still available, it seems defunct now (its website has been dead for a while).

foosic seems related?

What a signature represents

To summarize: libfooid

  • takes the first 90 seconds (skipping silence at the start)
  • resamples to 8 kHz mono (reduces the influence of sample-rate differences, high-frequency noise, and some encoder peculiarities)
  • does an FFT
  • sorts the result into 16 Bark bands

Since these strengths are about to be packed into few bits (namely 2 bits each), they are first rescaled so that the most typical variation will be relatively distinguishing (based on a bunch of real-world music).

Per frame, the information you end up with is:

  • the strength in each of 16 Bark bands (2 bits each),
  • which spectral line was dominant in this frame(verify).
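The steps above can be sketched roughly like this (pure Python for clarity; the band edges and the quantization scale are illustrative approximations, not libfooid's actual tables, and a real implementation would use a proper FFT):

```python
import cmath

# Approximate Bark-band edges in Hz up to the 4 kHz Nyquist of the 8 kHz signal.
# 17 edges give 16 bands; these values are for illustration only.
BAND_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
              1270, 1480, 1720, 2000, 2320, 2700, 4000]

def dft_magnitudes(frame):
    """Naive DFT magnitude spectrum (slow; stands in for a real FFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def frame_features(frame, sample_rate=8000):
    """One frame -> (16 band strengths quantized to 2 bits, dominant line index)."""
    mags = dft_magnitudes(frame)
    hz_per_bin = sample_rate / len(frame)
    bands = [0.0] * 16
    for k, m in enumerate(mags):
        hz = k * hz_per_bin
        for b in range(16):
            if BAND_EDGES[b] <= hz < BAND_EDGES[b + 1]:
                bands[b] += m
                break
    # Quantize each band to 0..3 relative to the frame peak (libfooid's actual
    # rescaling is tuned on real-world music instead).
    peak = max(bands) or 1.0
    fit = [min(3, int(4 * v / peak)) for v in bands]
    dom = max(range(len(mags)), key=mags.__getitem__)  # dominant spectral line
    return fit, dom
```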

Fingerprint and matching

A full fingerprint is 424 bytes (printable as 848 hex characters), consisting of

  • A 10-byte header, recording
    • version (little-endian 2-byte integer, should currently be zero)
    • song length in hundredths of seconds (little-endian 4-byte integer)
    • average fit (little-endian 2-byte integer)
    • average dominant line (little-endian 2-byte integer)
  • 414 bytes of data: 87 frames' worth (each frame totals 38 bits, so the last six bits of those 414 bytes are unused). For each frame, it stores:
    • fit: a 2-bit value for each of the 16 Bark bands
    • dom: a 6-bit value denoting the dominant spectral line

Fit and dom are non-physical units on a fixed scale (a different one for the averages in the header), so that they are directly comparable between fingerprints.
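Assuming the layout described above, the header can be unpacked with Python's struct module (the field names, and treating the fields as unsigned, are my own choices):

```python
import struct

# < = little-endian; H = 2-byte unsigned, I = 4-byte unsigned -> 10 bytes total
HEADER = struct.Struct("<HIHH")

def split_fingerprint(fp_bytes):
    """Split a 424-byte fooid fingerprint into header fields and raw frame data."""
    if len(fp_bytes) != 424:
        raise ValueError("expected 424 bytes")
    version, length_cs, avg_fit, avg_dom = HEADER.unpack_from(fp_bytes)
    return {
        "version": version,      # should currently be 0
        "length_cs": length_cs,  # song length in hundredths of a second
        "avg_fit": avg_fit,
        "avg_dom": avg_dom,
        "frames": fp_bytes[HEADER.size:],  # 414 bytes: 87 frames x 38 bits, 6 bits spare
    }
```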

The header is useful in itself for discarding likely negatives: if two fingerprints have a significantly different length, average fit, or average dominant line, they are not the same song (with some false-negative rate depending on what counts as 'significantly').

You can:

  • do some mildly fuzzy indexing to select only those that have any hope of matching
  • quickly discard candidates based on just the header values
  • get a fairly exact comparison value by decoding the fingerprint data and comparing those values too
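A cheap header-only prescreen along the lines of the second bullet might look like this (the tolerance values are illustrative guesses, not libfooid's; you would tune them against real data):

```python
def could_match(a, b, tolerances=(150, 40, 6)):
    """a, b: (length_cs, avg_fit, avg_dom) header triples.
    Returns False when the two songs are almost certainly different,
    so the expensive frame-by-frame comparison can be skipped."""
    return all(abs(x - y) <= tol
               for (x, y), tol in zip(zip(a, b), tolerances))
```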

With the detailed comparison, which yields a 0.0..1.0 value, it seems that(verify):

  • >0.95 means it's likely the same song
  • <0.35 means it's likely a different song
  • in between means it could be a remix, something in a similar style, or just an accidental match in some detail (e.g. a long same-instrument intro)
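An illustrative detailed comparison over decoded frames (libfooid's actual scoring weights fit and dom differently; this just shows where a 0.0..1.0 score and the thresholds above come in):

```python
def similarity(frames_a, frames_b):
    """frames_*: lists of (fit, dom), where fit is 16 values in 0..3
    and dom is 0..63. Returns the fraction of matching quantized values."""
    hits = total = 0
    for (fit_a, dom_a), (fit_b, dom_b) in zip(frames_a, frames_b):
        hits += sum(x == y for x, y in zip(fit_a, fit_b)) + (dom_a == dom_b)
        total += 17  # 16 fit values + 1 dom value per frame
    return hits / total

def verdict(score):
    if score > 0.95:
        return "likely the same song"
    if score < 0.35:
        return "likely a different song"
    return "possibly related (remix, similar style, or coincidence)"
```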

See also

  • forks on github

MusicIP, MusicDNS, AmpliFIND

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Proprietary, and its latest incarnation seems to be a licensed B2B service without a public interface.

The company was first called Predixis, was later and best known as MusicIP (~2000), died in 2008, relaunched as AmpliFIND(verify) Music Services (in 2009?), and sold its intellectual property to Gracenote (2011? 2006?).

Probably most known for the MusicDNS service (at some point rebranded as AmpliFIND(verify)), which mostly consists of:

  • their servers, which do the comparison and return a PUID (Portable Unique IDentifier) on close-enough matches
  • a client library, which generates an acoustic summary and queries with it

When an acoustic query matches their database closely enough, a PUID is returned; this seems to be a randomly generated identifier (not a fingerprint).

All the interesting parts are proprietary. The MusicDNS client library implements the 'Open Fingerprinting Architecture', but this only covers the querying, which is fairly useless without the acoustic analysis, the lookup method, or the data.

Relatable TRM


Used by MusicBrainz for a while, which found it useful for finding duplicates, but its lookup had problems with collisions and with scaling (meaning its server was unreliably slow), and Relatable did not seem to want to invest in it, so its use in MusicBrainz was replaced.


See also