Music fingerprinting and identification


Analysis and/or fingerprinting

See also
[1] Cano et al. (2002), "A review of algorithms for audio fingerprinting"
[2] Wood (2005), "On techniques for content-based visual annotation to aid intra-track music navigation"

Software and ideas

This list focuses on software and ideas that a project of yours may have some hope of using. There are more (see links below) that are purely licensed services.

Acoustid notes

Acoustid is the overall project.

Chromaprint is the fingerprinting part. The standalone fingerprinter is called fpcalc (which by default fingerprints only the start of a file, roughly the first two minutes).


  • the client is LGPL
  • the server is MIT-licensed
  • the data is Creative Commons Attribution-ShareAlike (verify)

Used e.g. by MusicBrainz (based on submission, e.g. via Picard, Jaikoz, or anything else that uses the API), making it interesting for music identification and tagging.
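As a sketch of how a project might call Chromaprint's fpcalc from Python (the `-json` and `-length` options exist in recent fpcalc builds, but check your version; the helper names here are made up):

```python
import json
import shutil
import subprocess

def fpcalc_command(path, seconds=120):
    # -json: machine-readable output; -length: how many seconds to fingerprint
    return ["fpcalc", "-json", "-length", str(seconds), path]

def fingerprint(path, seconds=120):
    """Returns (duration, fingerprint string), or None if fpcalc isn't installed."""
    if shutil.which("fpcalc") is None:
        return None
    out = subprocess.run(fpcalc_command(path, seconds),
                         capture_output=True, check=True, text=True).stdout
    data = json.loads(out)
    return data["duration"], data["fingerprint"]
```

The fingerprint string this produces is what you would submit to, or look up against, the Acoustid service.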

See also:

Echoprint notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

tl;dr: pointless (for consumers)

The Echo Nest (often written Echonest) is the company.

Echoprint is a fingerprint-like thing, produced by its acoustic code generator (codegen), which was open sourced in 2011.

Their metadata storage/searching server is also available.

Echonest's data is owned by them, but publicly available - the license basically says "if you use our data and add to it, you must give us your additions".

They also have a lot of metadata and fingerprints.

However, their service -- looking up songs from ~20 seconds of audio -- was closed in late 2014, basically because Spotify had bought The Echo Nest.

You can still look at and use their data, and codegen is still available (being MIT-licensed code), but you would have to build your own database/search service from these components.


See also:

pHash notes

pHash offers a few perceptual hashing algorithms, for image, video, and audio.

Audioscout is based on the audio algorithm.

See also:



fdmf's fingerprinter

Combination of fingerprinter and lookup client. Available as source.

  • the fingerprinter is based on [2]
  • fingerprinter license: GPLv3
  • client lookup: "Basically you can do pretty much whatever you want as long as it's not for profit."

See also:

Fooid notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Fooid is a fairly simple FOSS music fingerprinting library. It's mostly a simplified spectrogram, allowing fuzzy comparisons between songs, and is pretty decent at near-duplicate detection.

While still available, it seems defunct now (its website has been dead for a while).

foosic seems related?

What a signature represents

To summarize: libfooid

  • takes the first 90 seconds (skipping silence at the start)
  • resamples to 8 kHz mono (reduces the influence of sample-rate differences, high-frequency noise, and some encoder peculiarities)
  • does an FFT
  • sorts the result into 16 Bark bands

Since these strengths are about to be packed into few bits (namely 2 bits each), they are first rescaled so that the most typical variation will be relatively distinguishing (based on a bunch of real-world music).

Per frame, the information you end up with is:

  • the strength in each of 16 Bark bands (2 bits each),
  • which spectral line was dominant in this frame(verify).
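The steps above can be sketched roughly like this (pure Python for clarity; the band edges and the quantization scale are illustrative approximations, not libfooid's actual tables, and a real implementation would use a proper FFT):

```python
import cmath

# Approximate Bark-band edges in Hz up to the 4 kHz Nyquist of the 8 kHz signal.
# 17 edges give 16 bands; these values are for illustration only.
BAND_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
              1270, 1480, 1720, 2000, 2320, 2700, 4000]

def dft_magnitudes(frame):
    """Naive DFT magnitude spectrum (slow; stands in for a real FFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def frame_features(frame, sample_rate=8000):
    """One frame -> (16 band strengths quantized to 2 bits, dominant line index)."""
    mags = dft_magnitudes(frame)
    hz_per_bin = sample_rate / len(frame)
    bands = [0.0] * 16
    for k, m in enumerate(mags):
        hz = k * hz_per_bin
        for b in range(16):
            if BAND_EDGES[b] <= hz < BAND_EDGES[b + 1]:
                bands[b] += m
                break
    # Quantize each band to 0..3 relative to the frame peak (libfooid's actual
    # rescaling is tuned on real-world music instead).
    peak = max(bands) or 1.0
    fit = [min(3, int(4 * v / peak)) for v in bands]
    dom = max(range(len(mags)), key=mags.__getitem__)  # dominant spectral line
    return fit, dom
```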

Fingerprint and matching

A full fingerprint is 424 bytes (printable as 848 hex characters), consisting of

  • A 10-byte header, recording
    • version (little-endian 2-byte integer, should currently be zero)
    • song length in hundredths of seconds (little-endian 4-byte integer)
    • average fit (little-endian 2-byte integer)
    • average dominant line (little-endian 2-byte integer)
  • 414 bytes of data: 87 frames' worth (each frame totals 38 bits, so the last six bits of those 414 bytes are unused). For each frame, it stores:
    • fit: a 2-bit value for each of the 16 Bark bands
    • dom: a 6-bit value denoting the dominant spectral line

Fit and dom are non-physical units on a fixed scale (a different one for the averages in the header), so that they are directly comparable between fingerprints.
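Assuming the layout described above, the header can be unpacked with Python's struct module (the field names, and treating the fields as unsigned, are my own choices):

```python
import struct

# < = little-endian; H = 2-byte unsigned, I = 4-byte unsigned -> 10 bytes total
HEADER = struct.Struct("<HIHH")

def split_fingerprint(fp_bytes):
    """Split a 424-byte fooid fingerprint into header fields and raw frame data."""
    if len(fp_bytes) != 424:
        raise ValueError("expected 424 bytes")
    version, length_cs, avg_fit, avg_dom = HEADER.unpack_from(fp_bytes)
    return {
        "version": version,      # should currently be 0
        "length_cs": length_cs,  # song length in hundredths of a second
        "avg_fit": avg_fit,
        "avg_dom": avg_dom,
        "frames": fp_bytes[HEADER.size:],  # 414 bytes: 87 frames x 38 bits, 6 bits spare
    }
```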

The header is useful in itself for discarding likely negatives: if two fingerprints have a significantly different length, average fit, or average dominant line, they are not the same song (with some false-negative rate depending on what counts as 'significantly').

You can:

  • do some mildly fuzzy indexing to select only those that have any hope of matching
  • quickly discard candidates based on just the header values
  • get a fairly exact comparison value by decoding the fingerprint data and comparing those values too
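A cheap header-only prescreen along the lines of the second bullet might look like this (the tolerance values are illustrative guesses, not libfooid's; you would tune them against real data):

```python
def could_match(a, b, tolerances=(150, 40, 6)):
    """a, b: (length_cs, avg_fit, avg_dom) header triples.
    Returns False when the two songs are almost certainly different,
    so the expensive frame-by-frame comparison can be skipped."""
    return all(abs(x - y) <= tol
               for (x, y), tol in zip(zip(a, b), tolerances))
```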

With the detailed comparison, which yields a 0.0..1.0 value, it seems that(verify):

  • >0.95 means it's likely the same song
  • <0.35 means it's likely a different song
  • in between means it could be a remix, something in a similar style, or just an accidental match in some detail (e.g. a long same-instrument intro)
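An illustrative detailed comparison over decoded frames (libfooid's actual scoring weights fit and dom differently; this just shows where a 0.0..1.0 score and the thresholds above come in):

```python
def similarity(frames_a, frames_b):
    """frames_*: lists of (fit, dom), where fit is 16 values in 0..3
    and dom is 0..63. Returns the fraction of matching quantized values."""
    hits = total = 0
    for (fit_a, dom_a), (fit_b, dom_b) in zip(frames_a, frames_b):
        hits += sum(x == y for x, y in zip(fit_a, fit_b)) + (dom_a == dom_b)
        total += 17  # 16 fit values + 1 dom value per frame
    return hits / total

def verdict(score):
    if score > 0.95:
        return "likely the same song"
    if score < 0.35:
        return "likely a different song"
    return "possibly related (remix, similar style, or coincidence)"
```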

See also

  • forks on github

MusicIP, MusicDNS, AmpliFIND

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Proprietary, and its latest incarnation seems to be a licensed B2B service without a public interface.

The company was first called Predixis, was later and best known as MusicIP (~2000), died in 2008, relaunched as AmpliFIND(verify) Music Services (in 2009?), and sold its intellectual property to Gracenote (2011? 2006?).

Probably most known for the MusicDNS service (at some point rebranded as AmpliFIND(verify)), which mostly consists of:

  • their servers, which do the comparison and return a PUID (Portable Unique IDentifier) on close-enough matches
  • a client library, which generates an acoustic summary and queries with it

When an acoustic query matches their database closely enough, a PUID is returned; this seems to be a randomly generated identifier (not a fingerprint).

All the interesting parts are proprietary. The MusicDNS client library implements the 'Open Fingerprinting Architecture', but this only covers the querying, which is fairly useless without the acoustic analysis, the lookup method, or the data.

Relatable TRM


Used by MusicBrainz for a while, which found it useful for finding duplicates, but its lookup had problems with collisions and with scaling (meaning its server was unreliably slow), and Relatable did not seem to want to invest in it, so its use in MusicBrainz was replaced.


See also