Image file format notes

The physical and human aspects dealing with audio, video, and images

Vision and color perception: objectively describing color · the eyes and the brain · physics, numbers, and (non)linearity · color spaces · references, links, and unsorted stuff

Audio physics and physiology: Basic sound physics · Human hearing, psychoacoustics · Descriptions used for sound and music

Digital sound and processing: capture, storage, reproduction · programming and codecs · some glossary · Audio and signal processing - unsorted stuff

Image: file formats · image processing

Video: format notes · encoding notes

Noise stuff: Stray signals and noise · sound-related noise names · electronic non-coupled noise names · electronic coupled noise · ground loop · strategies to avoid coupled noise · Sampling, reproduction, and transmission distortions

Unsorted: Signal analysis, modeling, processing (some audio, some more generic)

For more, see Category:Audio, video, images

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Method / limitations:

  • JPEG basically stores DCT coefficients for 8x8 pixel tiles, so it is best for things that resemble gradients, which is why it's better at photos than at line art, text, and other sharp edges
Compression comes mainly from quantizing those coefficients and Huffman-coding them (some more from some image-aware cleverness)



Notes on JPEG file structure

At one point I wanted to detect whether a JPEG was lossless or not, so wanted to parse the basic file structure (without decoding the image).

A JPEG file consists of a number of segments.

All segments start with FF and a marker byte.

Some markers imply a fixed-structure, fixed-size segment, including:

  • D8 - Start Of Image (SOI). No data follows.
Probably all JPEG files start with the SOI
  • D9 - End Of Image (EOI). No data follows.
Some JPEGs have been seen without EOI, but typically it's there.
  • D0 through D7 - restarts. No data follows.
  • DD - sets restart interval (DRI). Unlike the above it does carry a size field (always 4), followed by a 2-byte restart interval.

Most other segments mention their size, and the structure is:

  • 0xFF (1 byte)
  • marker (1 byte)
  • datasize (2 bytes, big-endian)
...which refers to the whole payload, including these two size-indicating bytes
  • data (byte length as indicated by datasize, minus two (the size of the datasize field))

Some common markers:

  • 0xE0 - APP0, most typically used to store some generic file metadata (e.g. JPEG version)
  • 0xE1 - APP1, most typically used to store EXIF metadata
  • 0xC0 - Start Of Frame for baseline JPEG
  • 0xC2 - Start Of Frame for progressive JPEG
  • 0xC4 - huffman tables
  • 0xDB - quantization tables
  • 0xFE - comment
  • 0xDA - start of scan

Some specific notes:

  • Start Of Frame:
These also mention image size, channels, bits per channel
The two Start Of Frame markers mentioned above are by far the most common, but the standard defines thirteen (C0-C3, C5-C7, C9-CB, CD-CF), covering extended, lossless, and arithmetic-coded variants.
Typical JFIF files store 8 bits per channel, as YCbCr (or a single channel for grayscale) rather than as RGB
There are 12-bit JPEGs, part of the spec but only supported by some specialist software (e.g. medical applications).
You can omit the chrominance data for a grayscale JPG (verify)
you can use a paletted mode
  • APP0
Some webpages suggest the JPEG header is always the SOI followed by APP0 (i.e. FF D8 FF E0, you may recognize it as ÿØÿà). JFIF does demand both its presence and this position, so probably 99%+ of JPEGs have an APP0 there. File-structure-wise it is technically optional, so a truly generic parser might parse the segments as they appear.
  • quantization tables (0xdb)
Most JPEG compressors store two tables, one for Y, and one for both Cr and Cb.
Digital cameras are more likely to store three.
each segment may contain one or more quantization tables. Typically all tables are in the same segment, but there are cases where they are stored separately.
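
Counting the tables inside one 0xDB segment is a matter of stepping over them. This sketch assumes the usual table layout (one byte holding precision in the high nibble and table id in the low nibble, followed by 64 values of one or two bytes each), and takes the segment payload without its size field. The function name is hypothetical.

```python
def list_quantization_tables(payload):
    """Return (table_id, bits_per_value) for each table in a DQT payload."""
    tables = []
    pos = 0
    while pos < len(payload):
        precision = payload[pos] >> 4    # 0 means 8-bit values, 1 means 16-bit
        table_id = payload[pos] & 0x0F
        size = 64 * (2 if precision else 1)
        if pos + 1 + size > len(payload):
            raise ValueError("truncated quantization table")
        tables.append((table_id, 16 if precision else 8))
        pos += 1 + size
    return tables
```

Summing the results over all 0xDB segments in a file gives the two-versus-three table count mentioned above.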


JPEG originally only referred to a compression method, and separately a file container format.

It did not fully specify how to treat a compressed stream as an image, though. For this and a few other reasons, JFIF (JPEG File Interchange Format) was created as a better-defined file format programs could more easily support.

What we now call JPEG files are basically always JFIFs, because that's what anyone sane would want.

Revisions of JPEG/JFIF span from roughly 1991 to 2016 (verify)


PNG is a true color, lossless compression format aimed at sharp contrast, which also compresses photographic images decently. Allows an alpha channel.

On quality and size:

  • for diagram-style images, compression is comparable with GIF, and is better quality than JPEG
if you don't need animation, it's generally preferable over GIF
if you need the transparency channel, it's preferable over GIF and JPEG
  • for gradient/photographic images it compresses worse compared to medium-quality ('good enough') JPEG, and comparable to JPEG at highest settings (sometimes larger than JPEG, because JPEG can fudge over fine noise, while PNG necessarily preserves it)
For web content, smaller size can be more important than quality, which is a tradeoff you can't make with PNG, and you'll probably still want JPEG.

All web browsers now support PNG (IE was the last to solve a list of related bugs, but has decent support since IE7), and operating systems have widely accepted it now (e.g. for icons).

Method / limitations:

  • lossless compression
  • paletted, greyscale, or RGB
  • No animation in the standard (see APNG, also MNG, but they are not widely supported yet and may not be anytime soon)
  • Many specific formats, with different features/support in older browsers (see e.g. [1])

Compressors include

Lossy (smart palette quantization)


PNG structure notes


File structure is broadly:

  • Eight-byte header: 137 80 78 71 13 10 26 10
  • an IHDR chunk
  • content chunks
  • an IEND chunk

A chunk is:

  • 4-byte uint - data field's length (of payload, so excluding length, type, and crc fields)
  • 4-byte uint - type
  • data bytes
  • 4-byte CRC
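
A minimal sketch of walking that chunk structure follows. It assumes the integers are big-endian and that the CRC is a CRC-32 over the type and data fields (not the length), which is how I understand the PNG spec. `iter_png_chunks` is an illustrative name, not an existing API.

```python
import struct
import zlib

PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])

def iter_png_chunks(data):
    """Yield (type, payload, crc_ok) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and data fields.
        crc_ok = zlib.crc32(ctype + payload) == crc
        yield ctype.decode("ascii"), payload, crc_ok
        pos += 12 + length
        if ctype == b"IEND":
            break
```

Because every chunk states its own length, a reader can skip chunks it does not understand without knowing anything about their contents.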

On types:

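Chunk types are four ASCII letters, so read as text, but the case of each letter (one specific bit of each byte) conveys further information,
including whether the chunk is critical or ancillary.
This allows extension of the format while still working with older readers and/or editors, because there are rules about how to treat unknown chunks, based on whether they are critical / ancillary and safe-to-copy / unsafe-to-copy.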
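
As a sketch of how those property bits can be read: in the PNG spec as I understand it, each of the four letters of a chunk type carries one flag in its case (lowercase means the bit is set). The function and key names here are made up for illustration.

```python
def chunk_type_properties(chunk_type):
    """Decode the four case-encoded flags of a PNG chunk type such as 'tEXt'."""
    if len(chunk_type) != 4 or not (chunk_type.isascii() and chunk_type.isalpha()):
        raise ValueError("chunk type must be four ASCII letters")
    return {
        "ancillary": chunk_type[0].islower(),         # uppercase means critical
        "private": chunk_type[1].islower(),           # uppercase means public
        "reserved_bit_set": chunk_type[2].islower(),  # expected to be uppercase
        "safe_to_copy": chunk_type[3].islower(),      # matters to editors
    }
```

For example, IHDR decodes as critical and public, while tEXt decodes as ancillary and safe to copy, so an editor that does not understand tEXt can still carry it over into a modified file.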


Animated PNG (APNG) is an unofficial extension to PNG. Its files are backwards compatible, so PNG decoders that don't know about APNG will still open them (and just show the first frame).

APNG is supported by a number of browsers and some image editors, but not widely enough to use it professionally.


MNG (Multiple-image Network Graphics) is a close relation of PNG (written by the same team).

It allows animation, with the intent of making it a possible replacement for GIF on the web and in other areas.

Not widely adopted. While various software has adopted it (without us really noticing), web browsers generally haven't.


TIFF has grown over time in terms of what it can store. Version 6 of the TIFF spec (~1992; the document also encompasses previous versions) is the version most things use today. It allows things like varied bit depths, color spaces, and compression methods, which also makes it a useful lossless format.

TIFF is a container format. Its basic structure is blocks of metadata (IFDs), which contain tags describing the format of the image data that they also point to.

Since IFDs can store relatively flexible sets of tag-value pairs, and pointers to data within the file, and chain to more IFDs, this allows for one or more images (e.g. raw lossless image data plus a lossy JPG thumbnail).

It also means you can easily tell simpler decoders "just ignore the parts you don't understand"

Notes on TIFF file structure

A TIFF file starts with: a header, then an initial IFD.

The 8-byte header is

  • 2 bytes indicating the endianness used in the IFDs
namely "II" (0x49 0x49) for little-endian (referring to Intel) and "MM" (0x4D 0x4D) for big-endian (Motorola)
  • 2 bytes: the 16-bit integer with value 42, in the file's endianness
  • int32: absolute offset of the first IFD

So TIFFs start with either

0x49 0x49 0x2A 0x00
0x4D 0x4D 0x00 0x2A
also, the first IFD is typically immediately after the header, so its offset, stored in the next four bytes, is typically 8
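
A small sketch of reading that 8-byte header with Python's struct module. The function name is hypothetical, and the returned endianness is given as a struct prefix ("<" or ">") purely for convenience in later unpacking.

```python
import struct

def parse_tiff_header(data):
    """Return (struct endianness prefix, offset of first IFD)."""
    order = bytes(data[:2])
    if order == b"II":
        endian = "<"   # little-endian
    elif order == b"MM":
        endian = ">"   # big-endian
    else:
        raise ValueError("not a TIFF byte-order mark: %r" % order)
    magic, first_ifd = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("expected 42, got %d" % magic)
    return endian, first_ifd
```

Note that the value 42 has to be read in the endianness that the first two bytes announce, which is why the two possible starts listed above differ in their third and fourth bytes.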

IFDs (basically 'a block of metadata') are:

  • int16: amount of entries in this IFD
  • each entry is 12 bytes, namely:
    • 2-byte tag (the thing TIFF is named for), see also TIFF tag reference
    • 2-byte type
    • 4 bytes: amount of values in this entry (frequently 1, but depends on tag)
    • int32: data
      • for datatypes storing in <=4 bytes, the data itself ('inlined')
      • otherwise the absolute offset to value
  • int32: absolute offset of the next IFD (0 if it's the last)
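
The IFD layout above translates fairly directly into code. This sketch (with a made-up function name) returns each entry's last four bytes uninterpreted, since whether they hold the value itself or an offset depends on the type and count, which a real reader would have to work out per entry.

```python
import struct

def read_ifd(data, offset, endian):
    """Return (entries, next_ifd_offset) for the IFD at the given offset.

    endian is "<" or ">". Each entry is (tag, type, count, raw 4 bytes).
    """
    (count,) = struct.unpack_from(endian + "H", data, offset)
    entries = []
    for i in range(count):
        tag, typ, n, raw = struct.unpack_from(
            endian + "HHI4s", data, offset + 2 + 12 * i)
        entries.append((tag, typ, n, raw))
    (next_ifd,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return entries, next_ifd
```

Following next_ifd until it is 0 walks the whole chain of standard IFDs in a file.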

Many IFDs detail an image, so typically point to a large chunk of data, and contain enough tags to interpret that data.

IFDs can also point to further IFDs. This ability to chain IFDs allows for multiple images and, well, anything else.

The simplest TIFFs have just one data block, and one IFD pointing to it, and give just enough detail on how to interpret that data (pixel data, compression, etc).

Larger data chunks that are pointed to can be anywhere in the file, though seem to often come right after the IFD that refers to them(verify), just because that's easier to manage by the TIFF writer ('write IFD and its data, then we also immediately know the offset the next IFD can come in on').

Beyond an IFD chaining to the next TIFF-standard IFD, you also have properties that can point to private IFDs (or to non-TIFF structures - at which point it's a custom use of TIFF, with data that standard TIFF parsers will, as per the standard, ignore).

This is all perfectly valid because the offsets in the standard parts just don't point to said custom data blocks. But it can only be fully understood by a reader for this custom format.

Digital cameras' raw files (Canon's CRW and CR2, Nikon's NEF, Pentax's PEF, Samsung's PEF, Sony's SR2 and ARW, Fuji's RAF) may e.g. have an IFD and data for EXIF, IPTC, XMP, GPS, a tiny thumbnail, a medium thumbnail, and camera-raw data (thumbnails are often JPEG, raw often does not fit standard TIFF image types). Usually only the company that made it knows quite everything about it.

While most of these are based on TIFF structure, they are not expected to be opened in generic TIFF viewers/editors, so can deviate somewhat. (Some do not, strictly speaking, conform to TIFF6, e.g. omitting some tags that TIFFs should have. This may be more of a technical point - it may still be readable.)

(...which is one reason for DNG, which is a more open standard (also based on TIFF))

BigTIFF refers to a variant that uses 64-bit offsets, meaning files can be larger than 4GB.

Not formally part of the standard, but present in libtiff for a while now (and note it's not too hard to adapt other implementations).


  • The header mentions 43 instead of 42 (and has two extra fields you're probably not going to ever use)
  • IFD and tag structures are larger, but follow the same rules
note that since the 'data' field is larger, more value types are now inlined
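
For comparison with the classic header, here is a sketch of reading a BigTIFF header. It assumes the two extra fields are a 16-bit offset size (expected to be 8) and a 16-bit zero, followed by a 64-bit offset to the first IFD, which is my reading of the BigTIFF design notes. The function name is hypothetical.

```python
import struct

def parse_bigtiff_header(data):
    """Return (struct endianness prefix, offset of first IFD) for BigTIFF."""
    endian = {b"II": "<", b"MM": ">"}.get(bytes(data[:2]))
    if endian is None:
        raise ValueError("not a TIFF byte-order mark")
    magic, offset_size, zero, first_ifd = struct.unpack(
        endian + "HHHQ", data[2:16])
    if magic != 43:
        raise ValueError("expected 43 for BigTIFF, got %d" % magic)
    if offset_size != 8 or zero != 0:
        raise ValueError("unexpected BigTIFF header fields")
    return endian, first_ifd
```

A reader that supports both variants can check the 16-bit value after the byte-order mark first, and dispatch on 42 versus 43.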

Notes on compression



Method / limitations:

  • Paletted, with a maximum of 256 colors, so best for line art and grayscale images, not for photo quality
    • (technically you can use many frames and many palette updates to render one image in way more colors, but it's inefficient and not all decoders like doing this)
  • Two versions, GIF87a and GIF89a, the latter adding frame delays, which makes it the more interesting one for showing animation
  • Uses LZW - which was patented in 1985 and expired in 2003 or 2004 (varying with country)
in theory you could skip compression but this makes for rather large files
so this spurred development of PNG. PNG now seems the handier choice for most things, except animation (GIF is still the only widely supported image format that does this).

Notes on GIF file structure

Broad structure

For the structure and possible order of related chunks, see diagrams like that on [2] - can be interesting when writing a parser.

  • The GIF file header consists of:
    • The six-byte file header, either GIF87a or GIF89a (as ASCII text)
    • the logical screen descriptor (LSD), which is seven bytes, and which must be present, so is effectively part of the file header (see next section for details)

  • A sequence of one of the following three:
    • an application extension (see below)
    • a comment extension
    • stuff that you can see, which itself can consist of:
      • An optional GCE (Graphics Control Extension, see below) before one of the following two:
        • Image Descriptor + optional local color table + image data
        • (optional) Plain Text Extension (rarely used)
  • A 0x3B byte (; character) as a GIF file trailer.

Notes on extension blocks:

  • The Netscape extension block is common for animated GIFs (though not required)
    • This extension may only be placed directly after the LSD
    • It only really contains the loop instruction
  • The Graphic Control Extension (GCE) is very common in animated GIFs, as it mentions each frame's duration (and also its transparency)
  • Comment extension - used for attribution, 'created by this-and-that software', and such
  • Plain text extension mentions text to be rendered. Rarely used, and a number of decoders simply ignore it.


The Logical Screen Descriptor consists of:

  • 2 bytes: logical screen width (LSB-order 16-bit int)
  • 2 bytes: logical screen height (LSB-order 16-bit int)
  • 1 byte with bit packing:
    • 1 bit: whether the global color table is present (and follows). Most of the below is only relevant if it is.
    • 3 bits: color resolution - "Number of bits per primary color available to the original image". 000 means 1, 001 means 2, ..., 111 means 8
    • 1 bit: sort flag - whether the colors in the global color table are sorted in order of decreasing use, which can help the decoder
    • 3 bits: size of color table - which will have 2^(value+1) entries (of three bytes each)
  • 1 byte: which color is the background color (only meaningful if there is a global color table)
  • 1 byte: aspect ratio. Most images use 0, meaning using the storage's ratio (square pixels). If nonzero, the spec says the ratio should be calculated like (byte-value + 15) / 64
  • optional global colour table, if mentioned by the LSD (almost every animated GIF has this)
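
A sketch of unpacking the file header and LSD as described above. The dictionary keys are made up for illustration, and the color table size assumes three bytes (R, G, B) per entry.

```python
import struct

def parse_gif_header(data):
    """Parse the 6-byte signature and the 7-byte Logical Screen Descriptor."""
    signature = bytes(data[:6])
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    width, height, packed, bg_index, aspect = struct.unpack("<HHBBB", data[6:13])
    has_gct = bool(packed & 0x80)
    gct_entries = 2 ** ((packed & 0x07) + 1)
    return {
        "version": signature[3:].decode("ascii"),
        "width": width,
        "height": height,
        "has_global_color_table": has_gct,
        "color_resolution_bits": ((packed >> 4) & 0x07) + 1,
        "sorted": bool(packed & 0x08),
        # Three bytes (R, G, B) per entry; the table starts at offset 13.
        "global_color_table_bytes": 3 * gct_entries if has_gct else 0,
        "background_index": bg_index,
        # 0 means no aspect ratio information is given.
        "pixel_aspect_ratio": (aspect + 15) / 64 if aspect else None,
    }
```

The first block after the header then starts at offset 13 plus global_color_table_bytes.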

Extension blocks consist of:

  • ! character (0x21 byte)
  • Extension label, a single byte to distinguish between GCE, comment extension, plain text extension, application extension
  • data (many first include a size, but comment doesn't)
  • 0x00 byte
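
In the GIF89a spec as I understand it, the data part is stored as a series of sub-blocks, each prefixed by a one-byte size, and the closing 0x00 byte is simply a sub-block of size zero. A sketch of reading them (the function name is hypothetical):

```python
def read_sub_blocks(data, pos):
    """Read size-prefixed sub-blocks starting at pos.

    Returns (concatenated payload, offset just past the 0x00 terminator).
    """
    pieces = []
    while True:
        size = data[pos]
        pos += 1
        if size == 0:
            break  # block terminator
        pieces.append(bytes(data[pos:pos + size]))
        pos += size
    return b"".join(pieces), pos
```

The same routine also lets a parser skip extensions it does not care about, because it only needs the sizes and not the meaning of the contents.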

Image Descriptor + optional local color table + actual image data consists of:

  • optional Graphics Control Extension (GCE)
    • has local palette? (if not, the global one will be used)
    • has transparent color?
    • user input required? - usually ignored
    • duration, in hundredths of a second
    • removal controls (verify) (do nothing, replace with previous image, replace with background color)
  • image data block (OR plain text block, for text to be rendered, but these are generally not supported, and so ignored)
    • , character (0x2C byte) signifies start of Image Descriptor
    • 2 bytes: left position on logical screen (LSB-order 16-bit int)
    • 2 bytes: top position on logical screen (LSB-order 16-bit int)
    • 2 bytes: frame width (LSB-order 16-bit int)
    • 2 bytes: frame height (LSB-order 16-bit int)
    • 1 byte: packed
      • 1: local color table present/follows
      • 1: interlaced?
      • 1: sort flag (see notes on global color table)
      • 2: reserved
      • 3: size of local color table
    • optional local color table
    • image data
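
A sketch of unpacking the Image Descriptor fields listed above. Key names are made up for illustration, and the packed byte is assumed to be laid out from the most significant bit down, in the order listed.

```python
import struct

def parse_image_descriptor(data, pos):
    """Parse the 10-byte Image Descriptor that starts at pos."""
    if data[pos] != 0x2C:
        raise ValueError("expected ',' (0x2C) at offset %d" % pos)
    left, top, width, height, packed = struct.unpack_from("<HHHHB", data, pos + 1)
    has_lct = bool(packed & 0x80)
    return {
        "left": left,
        "top": top,
        "width": width,
        "height": height,
        "has_local_color_table": has_lct,
        "interlaced": bool(packed & 0x40),
        "sorted": bool(packed & 0x20),
        "local_color_table_entries": 2 ** ((packed & 0x07) + 1) if has_lct else 0,
        "next_offset": pos + 10,  # where the color table or image data starts
    }
```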


Other notes

(Animated) GIF rendering

The LSD defines a logical image. You can consider it the canvas that individual frames draw on. Individual frames need only draw onto part of it (optimized animated GIFs use that fact). Also, each frame may have its own distinct disposal method (specified by the GCE that is probably before it).

Because of transparency, background, frame disposal, and the partial-frame-update detail, animated GIFs should be rendered incrementally for them to display correctly. Viewers do this, but this can be a significant detail when programmatically reading (and efficiently writing) GIF files.

You can sometimes do without that, if all frames update all of the logical screen, and have no transparency (but that often means the GIF is larger than necessary).

Because most real-world GIF rendering is done onto a true color target, you can have a true-color GIF, by updating it in small frames (at worst 256 pixels at a time, each with their own palette) with zero delay. This is rarely done, because there's rarely any point.

Roughly speaking, rendering is done by:

  • Allocating the logical screen
  • For each frame:
    • update only non-transparent regions (each frame may have transparency and may have a unique color index for it)
      • ...using the local color table if present, or the global color table if not
    • dispose as specified by the GCE (usually "don't do anything", but not always)
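
The steps above can be sketched as follows, using plain lists so that nothing beyond the standard library is needed. The frame dictionaries are a made-up representation: each is assumed to have already been decoded to color indices, with "palette" being whichever color table applies to it. Disposal values follow the GCE's numbering as I understand it (0 or 1: leave as-is, 2: restore to background, 3: restore to previous).

```python
def render_frames(screen_w, screen_h, frames, background=None):
    """Composite frames onto one canvas, returning what is shown per frame."""
    canvas = [[background] * screen_w for _ in range(screen_h)]
    shown = []
    for frame in frames:
        left, top = frame["left"], frame["top"]
        width, height = frame["width"], frame["height"]
        before = [row[:] for row in canvas]  # kept in case of disposal 3
        for y in range(height):
            for x in range(width):
                index = frame["pixels"][y * width + x]
                if index == frame.get("transparent"):
                    continue  # transparent pixels leave the canvas untouched
                canvas[top + y][left + x] = frame["palette"][index]
        shown.append([row[:] for row in canvas])
        disposal = frame.get("disposal", 0)
        if disposal == 2:
            # Clear only this frame's region back to the background.
            for y in range(height):
                for x in range(width):
                    canvas[top + y][left + x] = background
        elif disposal == 3:
            canvas = before
    return shown
```

In practice many viewers treat 'restore to background' as 'restore to transparent', so what background should be here is a choice for the caller.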

Individual/overall palette

For an animated GIF, each frame can have its own optimized palette.

In theory, adaptive palette choice makes each individual frame as close to the original as possible, but this is only useful for slow-updating things, for example diagram sequences.

For short-interval, video-like GIF (probably the more common case), the same independent adaptiveness means every frame may have the same input color be somewhat different, which can look like continuous flickering.

You can instead calculate a palette based on all the input frames and use it for all frames. This means fewer colors and more change, but the transitions will be smooth. Of course, this only works for shortish sequences that don't change scene too much - the more colors, the worse it'll look. (Note that it does mean you can omit the local palette)
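
A deliberately naive sketch of the shared-palette idea: count colors over all frames, keep the most frequent ones, and map every pixel to its nearest palette entry. Real encoders use better quantizers (e.g. median cut or octree), so treat this only as an illustration; both function names are made up, and frames are assumed to be lists of (r, g, b) tuples.

```python
from collections import Counter

def shared_palette(frames, max_colors=256):
    """Pick the most frequent colors over all frames (popularity method)."""
    counts = Counter()
    for frame in frames:
        counts.update(frame)
    return [color for color, _ in counts.most_common(max_colors)]

def nearest_index(color, palette):
    """Index of the palette entry closest to color (squared RGB distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])),
    )
```

Since every frame then indexes into the same table, it can be stored once as the global color table.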


GIFs that aren't GIFs

GIF was originally popular primarily because it was the only standard for animation, supported early and widely.

The recent internet has started using 'GIF' to mean 'any looping animation', which now occasionally are actually HTML5 video tags pointing at H.264 video and/or WebM video.

This may be done by companies to save bandwidth.

They seem to consider GIF a brand name, to attract people with. (when they feel the need to label it a GIF, it probably isn't)

Purists may twitch, and point out that the restrictions of GIF are part of the charm, that throwing a refined video compressor at the problem means not everything is equal and people will probably end up making larger files, that CPU use is higher, that browser policies around video tags may break things more easily (such policies being the main reason you want the attributes autoplay loop playsinline muted for iOS) and seem to change every now and then, that proper fallbacks take much more knowledge, and that older browsers may not support video tags while supporting GIF well, whereas GIFs have just worked everywhere since the nineties.

Yet in popular and functional use, you can do exactly the same, they will often look better and/or take less bandwidth, so there's certainly something to be said for it.


WebP can be seen as

  • an improvement on JPEG (~25% smaller at similar perceptual quality)
  • now with some PNG-like features (~25% smaller when lossless)

Supports alpha channel.

Animation was added later (though support was not immediate), also making it an alternative over GIF (different quality/size choices).

Created by Google for the web, it's mainly read by browsers, though support in some (e.g. Firefox and Edge) came much later, and Safari came last (it seemed to plan it for a while [3], and supports it since version 14). For older browsers you want to use it in a <picture> with alternatives, or with a JS polyfill, or whatnot.

A bunch of image editors also open it.

Related to VP8 and WebM


PBM, PNM, etc.


These files store uncompressed per-pixel values, in either ASCII or a simple binary form. They are very easy for programs to write and read, and arguably very CPU-efficient as there is no decompression to do. They are of course not space-efficient, and they contain almost no metadata.

They are probably most interesting as an intermediate pixel format, as there are converters to/from many formats, which means you can rely on external conversion instead of dozens of specific libraries. Seen used in quick-'n'-dirty experiments, in scripts, and such.

PNM is a collective name for:

  • PBM ('portable bit map'), two-color (1 bit per pixel)
  • PGM ('portable gray map') (typically 8 bits per pixel, 16-bit also seen in later versions(verify))
  • PPM ('portable pixel map'), color in the form of r,g,b triplets (typically 8 bits per channel, 16 also seen in later versions(verify))

They are often implicitly BT.709 coded, with gamma applied. Some programs allow specifying/assuming non-gamma'd form too.

The 16-bit variants imply some endianness decisions for the binary forms - which isn't standardized, though MSB-first seems to be the usual choice (verify)

PNM files start with two-byte magic, in ASCII

  • P1 - PBM, ASCII
  • P2 - PGM, ASCII
  • P3 - PPM, ASCII
  • P4 - PBM, binary
  • P5 - PGM, binary
  • P6 - PPM, binary
  • P7 - PAM, binary
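
To illustrate how little there is to these formats, here is a sketch that writes a binary PPM (P6) and reads back the header of P1 through P6 files. It assumes 8-bit samples when writing, assumes a single whitespace byte separates the header from the pixel data, and does not handle PAM (P7), whose header looks different. Both function names are hypothetical.

```python
def write_ppm(width, height, rgb_bytes):
    """Return a binary PPM (P6) file for 8-bit RGB data in row-major order."""
    if len(rgb_bytes) != width * height * 3:
        raise ValueError("pixel data does not match the given size")
    return b"P6\n%d %d\n255\n" % (width, height) + bytes(rgb_bytes)

def read_pnm_header(data):
    """Return (magic, width, height, maxval, offset of pixel data)."""
    magic = data[:2].decode("ascii", "replace")
    if magic not in ("P1", "P2", "P3", "P4", "P5", "P6"):
        raise ValueError("not a PBM/PGM/PPM file")
    wanted = 2 if magic in ("P1", "P4") else 3   # PBM has no maxval field
    values = []
    pos = 2
    while len(values) < wanted:
        # Skip whitespace, and comments running from '#' to end of line.
        while data[pos:pos + 1].isspace() or data[pos:pos + 1] == b"#":
            if data[pos:pos + 1] == b"#":
                while data[pos:pos + 1] not in (b"\n", b""):
                    pos += 1
            else:
                pos += 1
        start = pos
        while data[pos:pos + 1].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("malformed header")
        values.append(int(data[start:pos]))
    width, height = values[0], values[1]
    maxval = values[2] if wanted == 3 else None
    return magic, width, height, maxval, pos + 1
```

Writing is short enough that scripts often do it inline and hand the result to an external converter.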

PAM is a generalization of PNM, which adds depth and tuple type to the basic metadata, and only uses binary form.

The netPBM package deals with:

  • PNM (see above), and
  • PAM, a more general version of pbm-style formats, which can store all PNM formats, and some more options.


DjVu is a free and open file format. It has different codecs for bitonal and gradient images, and applies well to high-resolution scanned text, manuscripts, magazines, line art, and such, for which it reaches noticeably higher compression than most general-purpose/photo image formats (and/or, at the same size, better legibility than various other formats).

It can optionally contain a text layer (often OCRed).

Layers and types

Document and document pages are layered/composited (see also ITU-T T.44), and each layer can be coded in a different way. In some cases, those layers may have come from different color channels from the same original image.

DjVu's coders are probably at their best on two-tone images, such as text. The separation between background layer(s), foreground layer(s), and masks helps - it allows e.g. highly compressible two-color coding for the text and a different choice for, say, the paper texture (for scanned text).

Images and documents

DjVu images are valid single-page DjVu documents.

Multi-page DjVu files are either Bundled (into a single file), or Indirect (where the main document is an index pointing to individual DjVu document files, which is handier when serving individual pages sparsely, a little less handy for sending things around).



Command examples


Viewing, converting

  • Browser plugins
  • Separate viewers WinDjView and MacDjView [4]
  • There are also various viewers for PDAs, phones and such


  • CGI on-the-fly conversion
  • If you have a to-PDF printer such as Adobe's, PrimoPDF, doPDF(?) or similar, you can create PDFs using any DjVu viewer (or anything else) that can print.

  • WMF (Windows MetaFile): Primarily calls into Windows's GDI[5] library
  • EMF: more commands than WMF
  • EMF+: calls into GDI+ (newer version of GDI)
  • .wmz is gzipped wmf
  • .emz is gzipped emf

Since they mostly call an API, they are basically executables, and are easier to support on Windows than elsewhere. For the same reason, there have been security exploits of the API using these image files.

The format is not very common, though various programs can still export it, and it's seen in contexts like clipart libraries.

The ability to load/convert on non-windows:

  • libwmf allows rasterization, and has some command line conversions, like wmf2eps and wmf2svg
  • openoffice imports it
  • UniConvertor [6]
  • inkscape? (depends on wmf2svg from libwmf?)


RAW photo formats



LDF: Bi-tonal format

WBMP (Wireless Bitmap) - monochrome, 1 bit per pixel data.

JPEG 2000