Music fingerprinting and identification



Analysis and/or fingerprinting

See also

http://en.wikipedia.org/wiki/Acoustic_fingerprint
[1] Cano et al. (2002), "A review of algorithms for audio fingerprinting"
[2] Wood (2005), "On techniques for content-based visual annotation to aid intra-track music navigation"


Software and ideas

This list focuses on software and ideas that a project of yours may have some hope of using. There are more (see links below) that are purely licensed services.


Acoustid notes

Acoustid is the overall project.

Chromaprint is the fingerprinting part. The standalone fingerprinter is called fpcalc (which hashes the start of a file).

Licenses[1]:

the client is LGPL
the server is MIT-licensed
the data is Creative Commons Attribution-ShareAlike (verify)


Used e.g. by MusicBrainz (based on submission, e.g. via Picard, Jaikoz, or anything else that uses the API), making it interesting for music identification and tagging.
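
For a rough idea of what using it looks like from code: a minimal sketch (Python) that shells out to fpcalc and picks out the duration and fingerprint from its default KEY=VALUE output. The exact keys and flags depend on your Chromaprint version, so treat the parsing details as assumptions to check.

  # Minimal sketch: run fpcalc on a file and pick out its duration and fingerprint.
  # Assumes fpcalc is on the PATH and prints its default KEY=VALUE output
  # (newer builds also have JSON output; check your version).
  import subprocess

  def chromaprint_of(path):
      out = subprocess.run(["fpcalc", path],
                           capture_output=True, text=True, check=True).stdout
      info = dict(line.split("=", 1) for line in out.splitlines() if "=" in line)
      return int(info["DURATION"]), info["FINGERPRINT"]

To actually identify something, you would then send that duration and fingerprint (plus your application's API key) to the AcoustID lookup web service.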



See also:

Echoprint notes

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

tl;dr: pointless (for consumers)


Echonest is the company.

Echoprint is a fingerprint-like thing, produced by its acoustic code generator (codegen), which was open sourced in 2011.

Their metadata storage/searching server is also available.

Echonest's data is owned by them, but publicly available - the license basically says "if you use our data and add to it, you must give us your additions".

They also have a lot of metadata and fingerprints.


However, their service -- looking up songs from ~20 seconds of audio -- was closed in late 2014, basically because Spotify had bought the Echo Nest.

You can still look at their metadata, you can still use their data, and codegen is still available (being MIT-licensed code), so you would have to build your own database/search service from their components.
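
If you do want to play with the codegen part, the rough shape is: run the binary on (part of) a file and it prints JSON containing the code string. A hedged sketch; the binary name matches the open-sourced echoprint-codegen project, but the JSON field names below are from memory and should be checked against your build.

  # Sketch: run the echoprint codegen binary and pull out the code string.
  # Assumes you built echoprint-codegen yourself; "code" as the field name is
  # an assumption to verify against your build's actual output.
  import json
  import subprocess

  def echoprint_code(path, start_sec=10, duration_sec=30):
      out = subprocess.run(["echoprint-codegen", path, str(start_sec), str(duration_sec)],
                           capture_output=True, text=True, check=True).stdout
      results = json.loads(out)     # codegen prints a JSON list, one entry per input file
      return results[0]["code"]     # compressed code string, to store/index yourself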


See also:

The Echo Nest

pHash notes

A few perceptual-hash algorithms, for image, video, and audio. See http://www.phash.org/docs/design.html

Audioscout (see below) is based on the audio hash.

See also:


Audioscout

See also:

fdmf

http://www.w140.com/audio/


last.fm's fingerprinter

Combination of fingerprinter and lookup client. Available as source.

The fingerprinter is based on [2].


Fingerprinter license: GPLv3

Client lookup: "Basically you can do pretty much whatever you want as long as it's not for profit."

See also:

Fooid notes

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Fooid is a fairly simple FOSS music fingerprinting library. Its fingerprint is mostly a simplified spectrogram, allowing fuzzy comparisons between songs, and it is pretty decent at near-duplicate detection.


While still available, it seems defunct now (the website has been dead for a while).

foosic seems related?



What a signature represents

To summarize: libfooid

takes the first 90 seconds (skipping silence at the start)
resamples to 8 kHz mono (helps reduce the influence of sample-rate differences, high-frequency noise, and some encoder peculiarities)
does an FFT
sorts that into 16 Bark bands


Since these strengths are about to be packed into very few bits (namely 2 bits each), they are first rescaled so that the most typical variation (based on a bunch of real-world music) will be relatively distinguishing.

Per frame, the information you end up with is:

  • the strength in each of 16 Bark bands (2 bits each),
  • which spectral line was dominant in this frame(verify).
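
As a very rough illustration of that per-frame analysis (this is not libfooid's actual code; the band edges and quantization thresholds here are made up for the example):

  # Illustrative only: per-frame band strengths and dominant line, roughly in the
  # shape libfooid produces. Band edges and 2-bit thresholds are invented here.
  import numpy as np

  BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                   1270, 1480, 1720, 2000, 2320, 2700, 4000]   # 16 roughly Bark-like bands up to 4 kHz

  def analyze_frame(frame, sample_rate=8000):
      # frame: 1-D array of 8 kHz mono samples for one analysis window
      spectrum = np.abs(np.fft.rfft(frame)) ** 2               # power spectrum
      freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
      band_power = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                             for lo, hi in zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:])])
      scaled = band_power / (band_power.max() + 1e-12)         # rescale before quantizing
      fit = np.digitize(scaled, [0.25, 0.5, 0.75])             # sixteen 2-bit values (0..3)
      dom = int(np.argmax(spectrum))                           # dominant spectral line (stored in 6 bits)
      return fit, dom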

Fingerprint and matching

A full fingerprint is 424 bytes (848 characters when printed as hex), consisting of

  • A 10-byte header, recording
    • version (little-endian 2-byte integer, should currently be zero)
    • song length in hundredths of a second (little-endian 4-byte integer)
    • average fit (little-endian 2-byte integer)
    • average dominant line (little-endian 2-byte integer)
  • 414 bytes of data: 87 frames worth of data (each frame totals 38 bits, so the last six bits of those 414 bytes are unused). For each frame, it stores:
    • fit: a 2-bit value for each of the 16 Bark bands
    • dom: a 6-bit value denoting the dominant spectral line

Fit and dom are non-physical units on a fixed scale (the header averages use a different one), so that they are directly comparable between fingerprints.
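
Given that layout, unpacking the header is straightforward; a minimal sketch assuming you have the raw 424 bytes (if you have the hex form, bytes.fromhex() it first):

  # Sketch: unpack the 10-byte fooid header described above.
  import struct

  def parse_fooid_header(fingerprint: bytes):
      # version (uint16), length in hundredths of a second (uint32),
      # average fit (uint16), average dominant line (uint16), all little-endian
      version, length_cs, avg_fit, avg_dom = struct.unpack_from("<HIHH", fingerprint, 0)
      return {
          "version": version,                  # should currently be 0
          "length_seconds": length_cs / 100.0,
          "avg_fit": avg_fit,
          "avg_dom": avg_dom,
      }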


The header is useful in itself for discarding likely negatives - if two things have a significantly different length, average fit, or average line, they're not going to be the same song (with a false-negative rate that depends on how strict 'significantly' is).

You can:

  • do some mildly fuzzy indexing to select only those that have any hope of matching
  • quickly discard potentials based on just the header values
  • get a fairly exact comparison value by decoding the fingerprint data and comparing those values too.


With the detailed comparison, which yields a 0.0..1.0 value, it seems that(verify):

  • >0.95 means it's likely the same song
  • <0.35 means it's likely a different song
  • anything in between could be a remix, something in a similar style, or just an accidental match in some detail (e.g. a long same-instrument intro)
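
Putting that together, a two-stage comparison might look like the sketch below (reusing parse_fooid_header from the earlier sketch). The header tolerances and the decode_frames() helper are made up for illustration; only the 0.95 / 0.35 interpretation comes from the notes above.

  # Sketch of the two-stage comparison: cheap header check, then detailed compare.
  def probably_same_song(fp_a: bytes, fp_b: bytes) -> str:
      a, b = parse_fooid_header(fp_a), parse_fooid_header(fp_b)

      # stage 1: discard obvious non-matches on header values alone
      # (these tolerances are invented for the example)
      if (abs(a["length_seconds"] - b["length_seconds"]) > 5
              or abs(a["avg_fit"] - b["avg_fit"]) > 50
              or abs(a["avg_dom"] - b["avg_dom"]) > 10):
          return "likely a different song (header mismatch)"

      # stage 2: detailed comparison of decoded frame data.
      # decode_frames() is a hypothetical helper that unpacks the 414 data bytes
      # into 87 (fit_values, dom) pairs.
      frames_a, frames_b = decode_frames(fp_a), decode_frames(fp_b)
      matches = sum(fa == fb for fa, fb in zip(frames_a, frames_b))
      score = matches / len(frames_a)          # crude 0.0..1.0 similarity

      if score > 0.95:
          return "likely the same song"
      if score < 0.35:
          return "likely a different song"
      return "could be a remix, a similar style, or an accidental partial match"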

MusicIP, MusicDNS, AmpliFIND

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Proprietary, and in its latest lifecycle it seems to be a licensed B2B service without a public interface.


The company was first called Predixis, was later and best known as MusicIP (~2000), died in 2008, relaunched as AmpliFIND(verify) Music Services (in 2009?), and sold its intellectual property to Gracenote (2011? 2006?).


Probably best known for the MusicDNS service (which was at some point rebranded as AmpliFIND(verify)), which mostly consists of:

  • their servers - which do the comparison, and return a PUID (Portable Unique IDentifier) on a close-enough match
  • a client library - which generates an acoustic summary and queries using it

When an acoustic query to their databases matches something closely enough, a PUID is returned, which seems to be a randomly generated identifier (not a fingerprint).

All the interesting parts are proprietary. The MusicDNS client library implements the 'Open Fingerprinting Architecture', but this only covers the querying, which is sort of useless without the acoustical analysis, lookup method, or the data.

Relatable TRM

Proprietary.

Used by MusicBrainz for a while, which found it useful for finding duplicates, but its lookup had problems with collisions and with scaling (meaning its server was unreliably slow), and Relatable did not seem to want to invest in it, so MusicBrainz replaced it.


http://www.relatable.com/tech/trm.html


MusicURI

See also