Playlist file notes

These are primarily notes.
They won't be complete in any sense.
They exist to contain fragments of useful information.


M3U

File extension: .m3u and .m3u8

MIME type: audio/x-mpegurl

Plain text file, line-based.

Simple example:

Greatest Hits\Example.ogg

Extended-m3u example:

#EXTM3U
#EXTINF:123,Sample Artist - Sample title
#EXTINF:321,Example Artist - Example title
Greatest Hits\Example.ogg

The number in the EXTINF line is the length in seconds.


.m3u8 is a file extension that indicates use of UTF-8 encoding -- the encoding of regular .m3u files is unspecified, but historically was frequently Latin1.


You can put an #EXTINF line before each path/URL to add length and title. When you use #EXTINF, the file must start with an #EXTM3U line.

Many players save EXTINF-style M3U, but some seem to refuse (e.g. foobar2000).
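
Reading the format described above can be sketched roughly as follows. This is a minimal reader, not a complete one: it assumes each #EXTINF line applies to the next path/URL line, skips any other line starting with #, and leaves decoding to the caller (for example UTF-8 for .m3u8, and Latin1 as a guess for .m3u). The name parse_m3u is made up for this example.

```python
def parse_m3u(text):
    """Return a list of (path, length_seconds, title) tuples.

    length_seconds and title are None for entries without an #EXTINF line.
    """
    entries = []
    length, title = None, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<seconds>,<title>" describes the next path/URL line
            seconds, _, title = line[len("#EXTINF:"):].partition(",")
            try:
                length = int(seconds)
            except ValueError:
                length = None  # tolerate a malformed length
        elif line.startswith("#"):
            continue  # the #EXTM3U header, or another directive/comment
        else:
            entries.append((line, length, title))
            length, title = None, None
    return entries


sample = (
    "#EXTM3U\n"
    "#EXTINF:321,Example Artist - Example title\n"
    "Greatest Hits\\Example.ogg\n"
)
entries = parse_m3u(sample)
```

A plain (non-extended) playlist goes through the same function; its entries simply come back with None for length and title.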



PLS

File extension: .pls

MIME type: audio/x-scpls

Plain text file, ini-like.


Title1=Remote MP3
Title2=Radio stream
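
Because the format is ini-like, Python's configparser can read it. The sketch below assumes the commonly seen layout: a [playlist] section, numbered FileN/TitleN/LengthN keys, and a NumberOfEntries count. The URLs in the sample are placeholders and parse_pls is a made-up name.

```python
import configparser


def parse_pls(text):
    """Return a list of (path, title, length_seconds) tuples from PLS text."""
    # interpolation=None so that '%' in URLs is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["playlist"]
    count = section.getint("NumberOfEntries", fallback=0)
    entries = []
    for i in range(1, count + 1):
        path = section.get(f"File{i}")
        if path is None:
            continue  # tolerate gaps in the numbering
        title = section.get(f"Title{i}")
        length = section.getint(f"Length{i}", fallback=-1)
        entries.append((path, title, length))
    return entries


# Placeholder URLs, only for illustration
sample_pls = """[playlist]
NumberOfEntries=2
File1=http://example.com/remote.mp3
Title1=Remote MP3
Length1=123
File2=http://example.com:8000/stream
Title2=Radio stream
Length2=-1
Version=2
"""
pls_entries = parse_pls(sample_pls)
```

configparser treats key names case-insensitively by default, which helps with files that write the keys in a different case.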


XSPF (XML Shareable Playlist Format)

File extension: .xspf

MIME type: application/xspf+xml



<?xml version="1.0" encoding="UTF-8"?>
<playlist version="1" xmlns="">
      <title>Foo - Bar</title>
      <title>Radio stream</title>

The most minimal form is a list of tracks that each have only a location. You can tack further information onto the playlist and onto the tracks.
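
A minimal playlist of that kind can be written and read back with the standard library's ElementTree, roughly as below. The namespace URI is the one I believe the XSPF spec uses (treat it as an assumption and check the spec), the track URLs are placeholders, and build_xspf / read_xspf are made-up names.

```python
import xml.etree.ElementTree as ET

# Assumed XSPF namespace; verify against the spec
XSPF_NS = "http://xspf.org/ns/0/"


def build_xspf(tracks):
    """Build XSPF text from (location, title) pairs; title may be None."""
    ET.register_namespace("", XSPF_NS)  # serialize as the default namespace
    playlist = ET.Element(f"{{{XSPF_NS}}}playlist", {"version": "1"})
    track_list = ET.SubElement(playlist, f"{{{XSPF_NS}}}trackList")
    for location, title in tracks:
        track = ET.SubElement(track_list, f"{{{XSPF_NS}}}track")
        ET.SubElement(track, f"{{{XSPF_NS}}}location").text = location
        if title:
            ET.SubElement(track, f"{{{XSPF_NS}}}title").text = title
    return ET.tostring(playlist, encoding="unicode")


def read_xspf(xml_text):
    """Return (location, title) pairs; title is None where absent."""
    ns = {"x": XSPF_NS}
    root = ET.fromstring(xml_text)
    return [
        (track.findtext("x:location", namespaces=ns),
         track.findtext("x:title", namespaces=ns))
        for track in root.findall("x:trackList/x:track", ns)
    ]


# Placeholder URLs, only for illustration
tracks = [
    ("http://example.com/foo.mp3", "Foo - Bar"),
    ("http://example.com:8000/stream", "Radio stream"),
]
xspf_text = build_xspf(tracks)
```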



ASX

File extension: .asx

MIME type: video/x-ms-asf

XML-based. Can be considered a metafile specific to Microsoft's media streaming setup.

Simplified example:

<asx version="3.0">
  <title>Example Stream</title>
    <title>Short Announcement</title>
    <ref href="" />
    <title>Example radio</title>
    <ref href="" />

Can be useful to point Windows-style players to RTSP, MMS, or HTTP streams (video, audio).



RAM (RealMedia)

File extension: .ram, sometimes .rpm (verify)

Functions much like ASX, but specific to RealMedia.

A plain-text one-per-line format, containing one or more URL references to RealMedia files (typically http://, rtsp://, sometimes file://).


SMIL

File extension: .smil

MIME type: application/smil+xml

A presentation description that can combine text, images (including SVG), audio, video, and other SMIL, and add timing and other playback control. Players that support SMIL include QuickTime, RealPlayer, Windows Media Player, and others.

There are a few cases of SMIL being used for music playlists, but not many. My guess is that it seems weird to support only a small part of a standard in order to (ab)use it just for playlists, and to use a presentation description as an information carrier.

SMIL sees other related uses, e.g. pointing to streams, in podcasting, and such.


  <audio src="" dur="12s" />
  <audio src="" dur="86400s" />


.fpl (foobar2000)

These playlists store a path to the file to play, along with much of the metadata as foobar itself had it.

These are structured for fast loading into your own instance of foobar again, and are not made for exchange.

The format is tied to foobar's internals. It has changed before and may do so again, and no specs are available. You can only really assume that foobar itself knows how to handle these files.

As such, the below is not in any way definitive, and probably won't ever be. These are my findings on how my .fpl files worked, and they are here only as hints for your own deciphering attempts.

Overall structure

In the widest sense we have three things:

  • 16 bytes that seem to be file magic (e1 a0 9c 91 f8 3c 77 42 85 2c 3b cc 14 01 d3 f2) for this playlist format
  • a list of strings (the unique strings used, likely to avoid a lot of repetition in the stored file)
  • a bunch of playlist entries (spans the rest of the file)

List of strings

The list of strings starts with a 4-byte int (note: all ints are little-endian) that stores the byte length of the whole string block. The block itself is just a bunch of back-to-back 0x00-terminated strings (which seem to be encoded in UTF-8). These strings are later referenced by their byte offset within this block (probably to allow a strcpy() with minimal pointer arithmetic).
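
As a sketch of the above, the following checks the magic, slices out the string block, and looks up a string by offset. It assumes the length field counts only the block itself (not its own 4 bytes) and that the block directly follows the 16 magic bytes; both match my reading of the description but are unverified. The function names are made up.

```python
import struct

# The 16 bytes listed above as apparent file magic
FPL_MAGIC = bytes.fromhex("e1a09c91f83c7742852c3bcc1401d3f2")


def read_string_block(data):
    """Return (string_block, offset_where_the_playlist_block_starts)."""
    if data[:16] != FPL_MAGIC:
        raise ValueError("not an .fpl file, or a version with different magic")
    (size,) = struct.unpack_from("<I", data, 16)  # little-endian 4-byte length
    start = 20
    return data[start:start + size], start + size


def string_at(block, offset):
    """Decode the 0x00-terminated string that starts at offset in the block."""
    end = block.index(b"\x00", offset)
    return block[offset:end].decode("utf-8", errors="replace")


# Synthetic data for illustration, not taken from a real playlist
demo_block = b"album\x00artist\x00"
demo_file = FPL_MAGIC + struct.pack("<I", len(demo_block)) + demo_block
block, playlist_offset = read_string_block(demo_file)
```

Looking strings up lazily by offset, instead of splitting the block up front, keeps the offsets stored in the entries directly usable.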

Playlist block

The playlist block is the interesting one.

It starts with a 4-byte int indicating the number of entries/songs in the playlist, followed by that many entries.

Each entry

  • starts with 56 bytes of header,
  • followed by a variable-length key-value store (with offsets into the string block mentioned earlier), which has
    • its own header (the amount of entries in each of the two following sections, plus a value I don't understand)
    • non-interleaved section (allows for case of multiple values per key)
    • interleaved section (a simple sequence of key,value pairs)
(I'm not sure these are the clearest names here. I use them because some other code did.)

Header (I've seen code with different offsets, suggesting that some things were added, and that this header has changed size over time)

byte offset     type      purpose
 0..3            int      ?    (often 1)
 4..7            int      filename  (offset within string block)
 8..11           int      sub song index
12..15           int      file size, in bytes
16..19           int?     ?    (often 0)
20..23           int?     ?    (often large)
24..27           int?     ?    (often large)
28..35           double   song duration (in seconds) 
36..39           float    album replay gain, adjustment
40..43           float    track replay gain, adjustment 
44..47           float    album replay gain, peak value
48..51           float    track replay gain, peak value
52..55           int      amount of 4-byte units in key-value section
                          (arguably is part of that section, not this header. Details.)
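
The table above maps fairly directly onto a struct format string. The sketch below assumes this 56-byte layout and unsigned ints throughout (the table only says 'int'). The field names, including the unknownNN ones, are my own labels, not anything from foobar.

```python
import struct
from collections import namedtuple

# 7 ints (0..27), 1 double (28..35), 4 floats (36..51), 1 int (52..55)
# '<' means little-endian with no padding between fields
ENTRY_HEADER = struct.Struct("<7Id4fI")

EntryHeader = namedtuple("EntryHeader", [
    "unknown0", "filename_offset", "subsong_index", "file_size",
    "unknown16", "unknown20", "unknown24",
    "duration",                    # seconds, stored as a double
    "album_gain", "track_gain",    # replay gain adjustments
    "album_peak", "track_peak",    # replay gain peak values
    "kv_units",                    # amount of 4-byte units in the key-value section
])


def parse_entry_header(data, offset=0):
    """Unpack one 56-byte entry header starting at offset."""
    return EntryHeader._make(ENTRY_HEADER.unpack_from(data, offset))


# Synthetic values for illustration, not taken from a real playlist
raw_header = ENTRY_HEADER.pack(1, 52, 0, 4096, 0, 123456, 654321,
                               183.5, -6.5, -7.25, 0.5, 1.0, 45)
header = parse_entry_header(raw_header)
```

Since the header seems to have changed size over time, it is worth checking a few parsed values for plausibility (for example duration and file size) before trusting this layout on a given file.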

Key-value bit

The section as a whole starts with three 4-byte ints:

  • amount of noninterleaved keys
  • amount of interleaved keys
  • unknown. low-valued integer, presumably some amount

The non-interleaved section seems intended to be read into two lists, keys and values.

It starts with

  • integer pairs (list_index, string_offset) that seem to mean 'this list_index is meant for this key (by string_offset)'. Presumably, when list_index increments by more than one, the previous key has multiple values(verify). From the pairs alone you don't know the size of the list you end up with(verify), but that count is stored immediately after the pairs, so 2*(amount of noninterleaved keys) integers later.

It seems to continue with

  • an integer that indicates the amount of values that follow(verify)
  • that many integers, offsets into the string block for all the values

The interleaved section seems to be just pairs of integers, both offsets into the string block.

For example, consider that we've just started parsing the key-value part (starting immediately after the 'amount of 4-byte units that follow' field, which in this case had value 45) and got the integers:

8, 8, 26, 0, 52, 1, 58, 2, 65, 3, 2650, 5, 28425, 6, 75, 7, 81, 8, 87, 9, 40311, 40328, 40328, 41434,
41525, 22895, 40515, 41580, 841, 212, 40539, 224, 230, 234, 243, 249, 258, 260, 271, 277, 1786, 314,
39864, 332, 2775

For readability, say that we've looked up the strings: (technically we only know which are strings slightly later, but this is here to illustrate file structure, not code)

8, 8,  26, 
0, 'album', 1, 'artist', 2, 'band', 3, 'comment', 5, 'date', 6, 'genre', 7, 'title', 
8, 'tracknumber', 9, 'MyAlbumName', 'MyArtistName', 'MyArtistName', 'MyComment1',
'MyComment2', '2005', 'MyGenreName', 'MyTitle', '5', 
'bitrate', '128',  'codec', 'MP3',  'encoding', 'lossy',  'channels', '2',
'samplerate', '44100', 'mp3_stereo_mode', 'joint stereo',  'codec_profile', 'CBR',
'tagtype', 'id3v2|id3v1'


  • 8 pairs of non-interleaved key information
  • 8 interleaved pairs
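  • 26 - don't know (verify). In this example it happens to equal 2*8 + 1 + 9, the number of integers in the non-interleaved section, which may or may not be coincidence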
  • non-interleaved pairs:
    • album goes to key/val position 0
    • artist to position 1
    • band to position 2
    • comment to position 3, and also to position 4 (...the latter because the next position is 5)
    • date to position 5
    • genre to position 6
    • title to position 7
    • tracknumber to position 8
    • 9 to indicate 9 values that follow, according to the 9 positions we just got the keys for (depending on how you want to parse, this can be very useful information before you start reading in the key information - and you can calculate its position trivially based on the amount of non-interleaved keys)
    • that many strings, which match in position to the key positions
  • interleaved part:
    • 8 pairs of strings (starting at 'bitrate') that are keys and values
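
The walk-through above can be sketched in code as follows, run on the same 45 integers. It follows my reading of the layout (three header ints, the key pairs, a value count, the values, then the interleaved pairs) and keeps everything as string-block offsets without resolving them to strings. parse_key_values is a made-up name.

```python
def parse_key_values(ints):
    """Split a key-value section (a list of ints) into its two parts.

    Returns (multi, pairs): multi is a list of (key_offset, [value_offsets]),
    pairs is a list of (key_offset, value_offset).
    """
    n_keys, n_interleaved, _unknown = ints[0:3]
    pos = 3

    # Non-interleaved keys: (list_index, key_offset) pairs
    key_positions = []
    for _ in range(n_keys):
        key_positions.append((ints[pos], ints[pos + 1]))
        pos += 2

    # Then the value count, followed by that many value offsets
    n_values = ints[pos]
    pos += 1
    values = ints[pos:pos + n_values]
    pos += n_values

    # A key owns the values from its list_index up to the next key's list_index
    multi = []
    for i, (start, key) in enumerate(key_positions):
        end = key_positions[i + 1][0] if i + 1 < len(key_positions) else n_values
        multi.append((key, values[start:end]))

    # Interleaved section: plain (key_offset, value_offset) pairs
    pairs = []
    for _ in range(n_interleaved):
        pairs.append((ints[pos], ints[pos + 1]))
        pos += 2
    return multi, pairs


# The integers from the example above
EXAMPLE = [
    8, 8, 26, 0, 52, 1, 58, 2, 65, 3, 2650, 5, 28425, 6, 75, 7, 81, 8, 87,
    9, 40311, 40328, 40328, 41434, 41525, 22895, 40515, 41580, 841,
    212, 40539, 224, 230, 234, 243, 249, 258, 260, 271, 277, 1786,
    314, 39864, 332, 2775,
]
multi, pairs = parse_key_values(EXAMPLE)
```

In the result, the key at offset 2650 ('comment' in the example) ends up with two value offsets, and the other non-interleaved keys with one each.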

Thanks to the following, and others: