Difference between revisions of "Image file format notes"


Latest revision as of 13:38, 24 October 2021


These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.

JPEG

Basically stores DCT factors for 8x8 pixel tiles, so best for things that resemble gradients, which is why it's better at photos than at line art, text, and other sharp edges

Compression comes mainly from quantizing those factors, which makes them well compressible using (lossless) Huffman coding. It's also assisted by working in a color space that leads to middling values(verify).


JPEG in practice is a little more ad-hoc than you'd think.

The original standard has a lot of variants that are basically research versions that have never been used in practice, meaning that implementing the original standard is impractical. Or involves patents that people wanted to avoid(verify).

At the same time, there are sort-of-proprietary extensions that have become de facto standards because they're not too difficult to support, and some that are just specific-purpose. There are also niche uses, like the DICOM standard (used around medical imaging) allowing embedded JPEG frames in modes that bare JPEG files rarely if ever use (e.g. lossless).

There are multiple ways of doing some of the same things, such as lossless compression. The original lossless format was a late and not-really-standard(verify) addition in 1993, but later there is also JPEG-LS, and JPEG 2000 (if you even consider JPEG 2000 the same format). But then some decoders may not decode any of them.

And some completely non-standard things.


So even when many implementations may be more widely capable, programs seem to try to write JPEGs according to the original core JPEG(verify), to produce images that can be read anywhere.


On JPEG, JIF, JFIF, etc.

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

In the early days, JPEG referred mostly to a bunch of compression methods, and JIF to its basic container(verify).

JIF (JPEG Interchange Format[1]) came in the original standard, which was complex to implement, and also did not fully specify how to treat pixels as an image (e.g. not specifying color space, aspect ratio, or subsampling registration(verify)), and some parts of that standard may have involved patents(verify).


For this and a few other reasons, JFIF (= JPEG FIF = JPEG File Interchange Format) was created both

  • to reduce that standard to the more useful set of methods, making a simpler, easier to implement overall format (removing the more fanciful and rarely used encoding modes)
  • and to extend it with some metadata to make it better defined

At byte level it works basically the same (segments and markers), the main difference is what you put in there. What we now call JPEG files are basically always JFIFs, because that's what anyone sane would want.


Exif, being a more generic "store metadata around sound or image formats" can be seen as another container, an alternative mostly to JFIF (in that it also addresses the shortcomings of JIF) that is not directly compatible with JFIF.


Exif is occasionally used to store JPEG data, but Exif is a wider container format, also allowing TIFF for uncompressed images (and also extending to audio - RIFF PCM, IMA-ADPCM).

This seems not very common. While it can store more metadata, it's extra steps to view.


Note that exif-style metadata is a smaller part of exif, and is frequently seen embedded within standard (non-Exif) JPEGs, in APP1.

Revisions


Revisions of JPEG/JFIF span from roughly 1991 to 1993ish for the original format, and up to now if you count all possible extensions(verify)


Initial versions

JFIF 1.00 - first version (not released?)
JFIF 1.01 - December 10, 1991
JFIF 1.02 [2] - September 1, 1992
not too many changes. e.g. added an optional JFXX segment, capable of storing a compressed thumbnail image

...soon standardized into

ISO 10918

and also

ITU-T T.81, T.83, T.84, T.86, T.871
ECMA TR/98 (JFIF)



The original versions were largely by C-Cube Microsystems[3] (now defunct), but it was probably the Independent JPEG Group (IJG)'s implementation that helped popularize it.

IJG developed it further, and submitted extensions to ISO and ITU-T (verify)

IJG's libjpeg versioning[4]:

  • version 1 (1991)
  • version 2 (1991)
  • version 3 (1992)
  • version 4 (1992)
  • version 4a (1993)
  • version 5 (1994)
  • version 5a (1994)
  • version 5b (1995)
  • version 6 (1995)
  • version 6a (1996)
  • version 6b (1998)
  • version 7 (2009)
  • version 8 (2010)
  • version 8a (2010)
  • version 8b (2010)
  • version 8c (2011)
  • version 8d (2012)
  • version 9 (2013)
  • version 9a (2014)
  • version 9b (2016)
  • version 9c (2018)
  • version 9d (2020)

Versions since 7, and particularly 8 and 9, include entirely new methods, some never standardized, so use of most of these will make files that many JPEG readers (any not based on a recent IJG libjpeg) will not be able to read.


Then there are

JPEG XT [5]

intends to add extensions (like higher bit depth, alpha channel, lossless) in a backwards compatible way (file structure wise, not necessarily decodable?(verify))
ISO 18477 (verify)

JPEG LS [6]

lossless / near lossless coding.
different and better than the original lossless JPEG coding, and faster than JPEG 2000 at comparable compression levels
ISO 14495

JPEG 2000 [7]

ISO 15444

JPEG XR [8]

seems to be about more dynamic range in the pixel values?
ISO 29199


...which are based on JPEG at best, and so extended that a basic JPEG library will probably not decode them unless these specifications are specifically implemented.




Notes on JPEG file structure


At one point I wanted to detect whether a (not very standard) JPEG dataset was lossless or not, so wanted to parse the basic file structure (without decoding the image). These are notes from that.


A JPEG file consists of a number of segments.


All segments start with a FF byte and a marker byte to say what type it is.

Some segments have no data, and will have no size field. This includes SOI, RST0 through RST7 (D0..D7), EOI, and more.


Most other segments mention their size, and their structure is typically:

  • 0xFF (1 byte)
  • marker (1 byte)
  • datasize (2 bytes, big-endian)
...size of data and these two size-indicating bytes
  • data (byte length as indicated by datasize, minus two (the size of the datasize field))


Probably the most notable exception is Start of Scan, which does store a size, but it's the size of what you could call a header, because in the file this is directly followed by one image worth of data, and you cannot know its length without decoding it, scanning ahead for the next marker (within this data, an FF byte is only ever followed by a stuffed 00 or by an RSTn marker byte), or guessing that it's the rest of the file minus the EOI.
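
The segment layout described above can be walked without decoding any image data. The following is a minimal sketch, not a complete parser: the function name list_segments and the choice to stop at the first SOS are assumptions made for illustration, and it only handles the standalone markers mentioned in these notes plus TEM (01).

```python
import struct

# Markers that stand alone: no size field, no payload.
# TEM (01), RST0..RST7 (D0..D7), SOI (D8), EOI (D9).
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def list_segments(data):
    """Return (marker, offset, payload_length) tuples, up to and including SOS.

    Stops at SOS because the scan data after its header has no size field.
    """
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:
            # An extra FF is a fill byte before the real marker; skip it.
            pos += 1
            continue
        if marker in STANDALONE:
            segments.append((marker, pos, 0))
            pos += 2
            if marker == 0xD9:  # EOI
                break
            continue
        if pos + 4 > len(data):
            raise ValueError("truncated segment header")
        # The size is big-endian and counts its own two bytes.
        (size,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if size < 2:
            raise ValueError(f"impossible segment size at offset {pos}")
        segments.append((marker, pos, size - 2))
        if marker == 0xDA:  # SOS: scan data follows, length unknown
            break
        pos += 2 + size
    return segments
```

Calling this on the bytes of a typical file should give a list starting with D8 and E0 (or E1) and ending with DA; anything after that needs either a decoder or a scan for the next marker.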



A fairly minimal JPEG file will look something like:

SOI (D8) - start of image
APP0 (E0) - storing version (will be 1.00, 1.01, or 1.02), and density, and an optional thumbnail (seems quite rarely used)
a SOF variant (SOF0..SOF15 sit in C0..CF, though C4, C8, and CC in that range are not SOF markers)
but probably SOF0 (baseline sequential, huffman) or SOF2 (progressive, huffman)
stores image size, channels, bits per channel
DQT (DB) - quantization tables, one or more (and can come before SOF)
DHT (C4) - huffman tables (DHT), one or more
SOS (DA) - start of scan
compressed image data following the SOS
EOI (D9) - end of image

There's a bunch more markers defined, but aside from some more APP0..APP15 metadata and a few things like COM, most are not used very often.


Some notes:

  • APP0
some webpages suggest the JPEG header is FF D8 FF E0 (you may recognize it as ÿØÿà), which is the SOI and the first two bytes of APP0.
JFIF does demand both its presence and this position, so probably almost all JPEGs have an APP0 there.
  • There are actually two standard(verify) APP0 variants, one with JFIF as identifier, and one with JFXX
JFIF stores density and optional uncompressed thumbnail
JFXX stores optional thumbnail (which can be compressed, basically as a simplified embedded JPEG)
  • There are a bunch of other APP0..APP15 uses, and then with varied identifiers [9], mostly for a lot of brand/device specific metadata, things like ICC profiles, or indicating specific types of data
In the wild apparently more commonly:
APP1 (E1), most typically used to store Exif metadata (TIFF based data), sometimes XMP metadata (XML) (verify)
APP14 (EE), used by Adobe to mention some color transform stuff
APP13 (ED), used by Adobe for TIFF-style(verify) tags (IPTC?)
APP12 (EC), used by some older cameras(verify)
  • You also see COM (FE), comment
limited to ~64k (via the segment size)
any amount can appear(verify), but since this is generally treated as an arbitrary-text field (the standard doesn't really say), various taggers may add to existing COM tags / merge COM tags.


  • SOF
apparently
mostly SOF0 (baseline sequential, huffman)
occasionally SOF2 (progressive, huffman)
rarely SOF1 (extended sequential, huffman) or others, perhaps SOF9 (extended sequential, arithmetic)(verify), (many other of the 16 possible are defined, a few are used in niche cases, and most never[10])
(...in one large sample of recent images, two thirds were C0, a third was C2, only a handful were C1s, and nothing else)
SOF0 and SOF1 are decoded the same way, and the only difference is that baseline restricts the amount of huffman and quantization tables to 2 each, extended to 4 each.(verify)
lossless is SOF3 (original?), or possibly SOF7, SOF11, or SOF15 (verify)
  • Some JPEGs have been seen without EOI, but typically it's there.
  • quantization tables (0xdb)
Most JPEG compressors store two tables, one for Y, and one for both Cr and Cb.
Digital cameras are more likely to store three.
each segment may contain one or more quantization tables. Typically all tables are in the same segment, but there are cases where they are stored separately.
  • Adobe CMYK JPEGs are not standard JPEGs
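
For the original goal (telling lossless from lossy without decoding), looking at the SOF marker and the first six payload bytes is probably enough. A sketch under assumptions: describe_sof is a made-up helper name, the payload is taken to be the segment data after the size field, and the lossless set (SOF3, SOF7, SOF11, SOF15) follows the marker table in ITU-T T.81 rather than anything verified against files in the wild.

```python
import struct

# Descriptions for the variants mentioned in these notes; others get a generic name.
SOF_NAMES = {
    0xC0: "baseline sequential, Huffman",
    0xC1: "extended sequential, Huffman",
    0xC2: "progressive, Huffman",
    0xC3: "lossless, Huffman",
    0xC9: "extended sequential, arithmetic",
}
LOSSLESS_SOF = {0xC3, 0xC7, 0xCB, 0xCF}  # SOF3, SOF7, SOF11, SOF15
NOT_SOF = {0xC4, 0xC8, 0xCC}             # DHT, JPG (reserved), DAC


def describe_sof(marker, payload):
    """Summarize a SOF segment from its marker byte and payload bytes."""
    if not 0xC0 <= marker <= 0xCF or marker in NOT_SOF:
        raise ValueError(f"0x{marker:02X} is not a SOF marker")
    # Payload starts with: sample precision, height, width, component count.
    precision, height, width, components = struct.unpack(">BHHB", payload[:6])
    return {
        "kind": SOF_NAMES.get(marker, f"SOF{marker - 0xC0}"),
        "lossless": marker in LOSSLESS_SOF,
        "bits_per_channel": precision,
        "width": width,
        "height": height,
        "channels": components,
    }
```

Note the field order: height comes before width in the segment.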




MJPEG


Motion JPEG is essentially a stream of separate JPEG images.

It's not really a video format, being I-frame-only, and not even specifying a framerate. Say, IP cameras presumably just push out frames as they appear.

It's also not a standard itself, but uses in specific contexts are typically documented, e.g.

Microsoft documents how they use it in AVI files
Apple documents how they use it in QuickTime files
RFC 2435 documents how it is used in RTP streams
web browsers tend to support it
(is this wrapped in MIME?[11])

See also

https://en.wikipedia.org/wiki/Motion_JPEG
https://www.loc.gov/preservation/digital/formats/fdd/fdd000063.shtml


Motion JPEG 2000 is more standard, but also rather more complex, being based on MP4/Quicktime, which also makes the choice for a more efficient video codec easier.

https://en.wikipedia.org/wiki/Motion_JPEG_2000



PNG


A true-color, lossless compression format aimed at sharp-contrast, diagram-style images, but it also compresses photographic images decently. Allows an alpha channel.


On quality and size:

  • for diagram-style images, compression is comparable with GIF, and is better quality than JPEG
if you don't need animation, it's generally preferable over GIF
if you need the transparency channel, it's preferable over GIF and JPEG
  • for gradient/photographic images it compresses worse compared to medium-quality ('good enough') JPEG, and comparable to JPEG at highest settings (sometimes larger than JPEG, because JPEG can fudge over fine noise, while PNG necessarily preserves it)
For web content, smaller size can be more important than quality, which is a tradeoff you can't make with PNG, and you'll probably still want JPEG.

All web browsers now support PNG (IE was the last to solve a list of related bugs, but has decent support since IE7), and operating systems have widely accepted it now (e.g. for icons).


Method / limitations:

  • lossless compression
  • paletted, greyscale, or RGB
  • No animation in the standard (see APNG, also MNG, but they are not widely supported yet and may not be anytime soon)
  • Many specific formats, with different features/support in older browsers (see e.g. [12])


Compressors include

Lossy (smart palette quantization)




PNG structure notes


File structure is broadly:

  • Eight-byte header: 137 80 78 71 13 10 26 10
  • an IHDR chunk
  • content chunks
  • an IEND chunk


A chunk is:

  • 4-byte uint - data field's length (of payload, so excluding length, type, and crc fields)
  • 4-byte uint - type
  • data bytes
  • 4-byte CRC


On types:

the four bytes that make up type are typically ASCII characters (e.g. IHDR, cHRM, gAMA, tRNS, PLTE, hIST, pHYs, IDAT, tEXt, tIME, IEND)
though the case of each letter (bit 5 of each byte) is used to convey further information, including whether the chunk is critical or ancillary
Types allow extension of the format with older readers and/or editors, having rules about known / unknown safe-to-copy / unsafe-to-copy critical / ancillary chunks
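
The header and chunk layout above is simple enough to walk in a few lines. A minimal sketch, assuming the whole file is in memory; iter_chunks is a made-up name, and the only type bits it looks at are the first letter's case (ancillary) and the fourth letter's case (safe to copy). The CRC is computed over the type and data fields, not the length.

```python
import struct
import zlib

PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])


def iter_chunks(data):
    """Yield (type, payload, crc_ok, ancillary, safe_to_copy) for each chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos < len(data):
        # Length covers the payload only, not the length/type/CRC fields.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        crc_ok = zlib.crc32(ctype + payload) == crc
        ancillary = bool(ctype[0] & 0x20)     # lowercase first letter
        safe_to_copy = bool(ctype[3] & 0x20)  # lowercase fourth letter
        yield ctype.decode("ascii"), payload, crc_ok, ancillary, safe_to_copy
        pos += 12 + length
        if ctype == b"IEND":
            break
```

On a truncated file the struct.unpack calls will raise struct.error, which is probably what you want from a checker.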

APNG

Animated PNG is an unofficial extension to PNG, which produces images backwards compatible with PNG, so that classical PNG decoders will just decode the first frame.

APNG is supported by many browsers and some image editors. It is arguably not supported quite widely enough to use as a generic format, but it has its specific uses.


MNG


MNG (Multiple-image Network Graphics) is a close relation of PNG (written by the same team).

It allows animation, which was meant to let it replace GIF on the web and in other areas.

Not widely adopted. While various software has adopted it (without us really noticing), web browsers generally haven't.


TIFF



TIFF has grown over time in terms of what it can store.

Version 6 of the TIFF specs (~1992, and the document encompasses previous versions), the version most things today use, allows things like varied bit depth, color spaces, compression, also making it a useful lossless format.

TIFF is a container format, and by itself has blocks of metadata (IFDs), containing tags that detail the format of the image blocks those IFDs also point to.



Notes on TIFF file structure

A TIFF file starts with: a header, then an initial IFD.

The 8-byte header is

  • 2 bytes indicating the endianness used in the IFDs
namely "II" (0x49 0x49) for little-endian (referring to Intel) and "MM" (0x4D 0x4D) for big-endian (referring to Motorola)
  • 2 bytes: the 16-bit integer with value 42, in the file's endianness
  • int32: absolute offset of the first IFD


So TIFFs start with either

0x49 0x49 0x2A 0x00
0x4D 0x4D 0x00 0x2A
people looking for file magic may like to know the first IFD is typically immediately after the header, so its offset, stored in the next four bytes, is typically 8
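
Reading that header takes little code. A minimal sketch for classic (non-BigTIFF) files; read_tiff_header is a made-up name, and the returned endianness is expressed as a struct format prefix so it can be reused when parsing IFDs.

```python
import struct


def read_tiff_header(data):
    """Return (struct endianness prefix, offset of first IFD) for a classic TIFF."""
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF byte-order mark")
    # The magic number and offset are stored in the file's own endianness.
    magic, first_ifd = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError(f"unexpected magic {magic}, expected 42")
    return endian, first_ifd
```

For example, read_tiff_header(b"II\x2a\x00\x08\x00\x00\x00") gives ("<", 8).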


IFDs (basically 'a block of metadata') are:

  • int16: amount of entries in this IFD
  • each entry is 12 bytes, namely:
    • 2-byte tag (the thing TIFF is named for), see also TIFF tag reference
    • 2-byte type
    • 4 bytes: amount of values in this entry (frequently 1, but depends on tag)
    • int32: data
      • for datatypes storing in <=4 bytes, the data itself ('inlined')
      • otherwise the absolute offset to value
  • int32: absolute offset of the next IFD (0 if it's the last)
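
The IFD layout above translates fairly directly into code. A sketch under assumptions: read_ifd is a made-up name, endian is a struct prefix ("<" or ">") taken from the file header, and the type-size table lists the twelve TIFF 6 field types. It leaves the 4-byte data field raw and only reports whether the value should be inlined there.

```python
import struct

# Bytes per value for TIFF 6 field types 1..12
# (BYTE, ASCII, SHORT, LONG, RATIONAL, SBYTE, UNDEFINED,
#  SSHORT, SLONG, SRATIONAL, FLOAT, DOUBLE).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
              7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def read_ifd(data, offset, endian):
    """Return (entries, next_ifd_offset) for the IFD at the given offset.

    Each entry is (tag, type, count, raw_data_field, inline).
    """
    (entry_count,) = struct.unpack(endian + "H", data[offset:offset + 2])
    entries = []
    pos = offset + 2
    for _ in range(entry_count):
        tag, ftype, count = struct.unpack(endian + "HHI", data[pos:pos + 8])
        raw = data[pos + 8:pos + 12]
        size = TYPE_SIZES.get(ftype)
        # Values that fit in 4 bytes live in the field itself,
        # otherwise the field holds an absolute offset.
        inline = size is not None and size * count <= 4
        entries.append((tag, ftype, count, raw, inline))
        pos += 12
    (next_ifd,) = struct.unpack(endian + "I", data[pos:pos + 4])
    return entries, next_ifd
```

To follow the chain, call it again with next_ifd until that comes back as 0.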


Since IFDs can store relatively flexible sets of tag-value pairs, and pointers to data within the file, and chain to more IFDs, this allows for one or more images (e.g. raw lossless image data plus a lossy JPG thumbnail).

It also means you can easily tell simpler decoders "just ignore the parts you don't understand"


Many IFDs detail an image, so typically point to a large chunk of data, and contain enough tags to interpret that data.


IFDs can also point to further IFDs. This ability to chain IFDs allows for multiple images and, well, anything else you care to (ab)use it for.


The simplest TIFFs have just one data block, one IFD pointing to it, and give just enough detail on how to interpret that data (pixel format, compression, etc).


Things can be ordered arbitrarily as long as all the offsets are correct. Sometimes there is reason to order things in a specific way (e.g. all metadata together to scan through things with fewer seeks), but generally it's whatever was easiest for the TIFF writer code, which is why an IFD and its data are often adjacent.


Beyond an IFD chaining to the next TIFF-standard IFD, you also have properties that can point to private IFDs (or to non-TIFF structures - at which point it's entirely custom use of TIFF, with data that standard TIFF parsers will ignore because it's not an IFD and/or a tag known to them).

This is perfectly valid in terms of file structure, though such files can only be fully understood by a reader for this custom format.

For example, digital cameras' raw files (Canon's CRW and CR2, Nikon's NEF, Pentax's PEF, Samsung's PEF, Sony's SR2 and ARW, Fuji's RAF) may e.g. have an IFD and data for EXIF, IPTC, XMP, GPS, a tiny thumbnail, a medium thumbnail, and camera-raw data (thumbnails are often JPEG, raw often does not fit standard TIFF image types so is often custom). Usually only the company that made it knows quite everything about it. (Some do not, strictly speaking, conform to TIFF6, e.g. omitting some tags that TIFFs are required to have. This may be more of a technical point - it may still be readable.)

(...which is one reason for DNG, which is a more shared standard (also based on TIFF))



BigTIFF refers to a variant that uses 64-bit offsets, meaning files can be larger than 4GB.

Not formally part of the standard, but present in libtiff for a while now (and it's not too hard to adapt in other implementations).

Differences:

  • The header mentions 43 instead of 42 (and has two extra fields you're probably not going to ever use)
  • IFD and tag structures are larger, but follow the same rules
note that since the 'data' field is larger, more value types are now inlined

Notes on compression


GIF

Method / limitations:

  • Paletted, with a maximum of 256 colors, so best for line art and grayscale images, not for photo quality
    • (technically you can use many frames and many palette updates to render one image in way more colors, but it's inefficient and not all decoders like doing this)
  • Two versions, GIF87a and GIF89a, the latter adding delays, so being the more interesting one for showing animation
  • Uses LZW - which was patented in 1985 and expired in 2003 or 2004 (varying with country)
in theory you could skip compression but this makes for rather large files
so this spurred development of PNG. PNG now seems the handier choice for most things, except animation (GIF is still the only widely supported image format that does this).


Notes on GIF file structure

Broad structure

For the structure and possible order of related chunks, see diagrams like that on [13] - can be interesting when writing a parser.


  • The GIF file header consists of:
    • The six-byte file header, either GIF89a or GIF87a
    • the logical screen descriptor (LSD), which is seven bytes, and which must be present, so is effectively part of the file header (see next section for details)


  • A sequence of blocks, each being one of the following three:
    • an application extension (see below)
    • a comment extension
    • stuff that you can see, which itself can consist of:
      • An optional GCE (Graphics Control Extension, see below) before one of the following two:
        • Image Descriptor + optional local color table + image data
        • (optional) Plain Text Extension (rarely used)
  • A 0x3B byte (; character) as a GIF file trailer.


Notes on extension blocks:

  • The Netscape extension block is common for animated GIFs (though not required)
    • This extension may only be placed directly after the LSD
    • It only really contains the loop instruction
  • The Graphic Control Extension (GCE) is very common in animated GIFs, as it mentions each frame's duration (and also its transparency)
  • Comment extension - used for attribution, 'created by this-and-that software', and such
  • Plain text extension mentions text to be rendered. Rarely used, and a number of decoders simply ignore it.

Details

The Logical Screen Descriptor consists of:

  • 2 bytes: logical screen width (LSB-order 16-bit int)
  • 2 bytes: logical screen height (LSB-order 16-bit int)
  • 1 byte with bit packing:
    • 1 bit: whether the global color table is present (and follows). Most of the below is only relevant if it is.
    • 3 bits: color resolution - "Number of bits per primary color available to the original image". 000 means 1, 001 means 2, ..., 111 means 8
    • 1 bit: sort flag - whether the colors in the global color table are sorted in order of decreasing use, which can help the decoder
    • 3 bits: size of color table - which will have 2^(value+1) entries, of 3 bytes (R, G, B) each
  • 1 byte: which color is the background color (only meaningful if there is a global color table)
  • 1 byte: aspect ratio. Most images use 0, meaning using the storage's ratio (square pixels). If nonzero, the spec says the ratio should be calculated like (byte-value + 15) / 64
  • optional global colour table, if mentioned by the LSD (almost every animated GIF has this)
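
The bit packing in the Logical Screen Descriptor is the fiddly part, so here is a minimal sketch of unpacking it. The function name read_logical_screen and the returned dictionary keys are made up for illustration; it assumes the data starts at the beginning of the file, so the LSD sits at bytes 6 through 12.

```python
import struct


def read_logical_screen(data):
    """Parse the 6-byte signature and the 7-byte Logical Screen Descriptor."""
    signature = data[:6]
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF signature")
    # Two little-endian 16-bit ints, then three single bytes.
    width, height, packed, background, aspect = struct.unpack("<HHBBB", data[6:13])
    has_gct = bool(packed & 0x80)
    gct_entries = 2 ** ((packed & 0x07) + 1)
    return {
        "version": signature[3:].decode("ascii"),
        "width": width,
        "height": height,
        "has_global_color_table": has_gct,
        "color_resolution_bits": ((packed >> 4) & 0x07) + 1,
        "sorted": bool(packed & 0x08),
        # Three bytes (R, G, B) per entry; zero if the table is absent.
        "global_color_table_bytes": 3 * gct_entries if has_gct else 0,
        "background_index": background,
        # Zero means no aspect ratio information was given.
        "pixel_aspect_ratio": (aspect + 15) / 64 if aspect else None,
    }
```

The global color table, if present, starts at byte 13 and runs for global_color_table_bytes.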


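Extension blocks consist of:

  • ! character (0x21 byte)
  • Extension label, a single byte to distinguish between GCE, comment extension, plain text extension, application extension
  • data, as one or more size-prefixed sub-blocks (most start with a fixed-size block of fields, the comment extension goes straight to its text)
  • 0x00 byte (a zero-size sub-block, acting as terminator)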


Image Descriptor + optional local color table + actual image data consists of:

  • optional Graphics Control Extension (GCE)
    • has transparent color? (and if so, which color index)
    • user input required? - usually ignored
    • duration, in hundredths of a second
    • removal controls (verify) (do nothing, replace with previous image, replace with background color)
  • image data block (OR plain text block, for text to be rendered, but these are generally not supported, and so ignored)
    • , character (0x2C byte) signifies start of Image Descriptor
    • 2 bytes: left position on logical screen (LSB-order 16-bit int)
    • 2 bytes: top position on logical screen (LSB-order 16-bit int)
    • 2 bytes: frame width (LSB-order 16-bit int)
    • 2 bytes: frame height (LSB-order 16-bit int)
    • 1 byte: packed
      • 1 bit: local color table present/follows (if not, the global one will be used)
      • 1 bit: interlaced?
      • 1 bit: sort flag (see notes on global color table)
      • 2 bits: reserved
      • 3 bits: size of local color table
    • optional local color table
    • image data
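
Putting the block structure together, the following sketch walks a GIF from header to trailer and counts image descriptors, which is a cheap way to tell animated from still GIFs. The names skip_sub_blocks and count_gif_frames are made up; the code assumes a well-formed file held in memory and will raise IndexError on truncated data.

```python
def skip_sub_blocks(data, pos):
    """Skip size-prefixed sub-blocks; return the position after the terminator."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:
            return pos
        pos += size


def count_gif_frames(data):
    """Count image descriptors (frames) without decoding any pixel data."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF signature")
    packed = data[10]  # packed byte of the Logical Screen Descriptor
    pos = 13           # 6-byte signature + 7-byte LSD
    if packed & 0x80:  # skip the global color table
        pos += 3 * 2 ** ((packed & 0x07) + 1)
    frames = 0
    while True:
        block = data[pos]
        if block == 0x3B:    # trailer
            return frames
        if block == 0x21:    # extension: label byte, then sub-blocks
            pos = skip_sub_blocks(data, pos + 2)
        elif block == 0x2C:  # image descriptor
            frames += 1
            packed = data[pos + 9]
            pos += 10
            if packed & 0x80:  # skip the local color table
                pos += 3 * 2 ** ((packed & 0x07) + 1)
            pos += 1           # LZW minimum code size byte
            pos = skip_sub_blocks(data, pos)
        else:
            raise ValueError(f"unknown block 0x{block:02X} at offset {pos}")
```

Because image data is also stored as sub-blocks, the same skipping helper works for extensions and for pixel data.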


Other notes

(Animated) GIF rendering

The LSD defines a logical image, which you can consider the canvas that individual frames draw on, and they may choose to only draw onto part of it (optimized animated GIFs may well use this). Also, each frame may have its own distinct disposal method (specified by the GCE that is probably before it).

Because of transparency, background, frame disposal, and the partial-frame-update detail, animated GIFs can only really be rendered incrementally. Viewers do this, but it can be a significant detail for code that reads (and efficiently writes) GIF files. Rendering can be correct without being incremental if all frames update all of the logical screen and have no transparency (but that often means the GIF is larger than necessary).


Because most real-world GIF rendering is done onto a true color target, you can make a true-color GIF: an animated GIF with zero delays that updates the canvas in small frames (at worst 256 pixels at a time, each with their own palette). This is rarely done, mostly because there's more suitable image formats you can be using.


Roughly speaking, rendering is done by:

  • Allocating the logical screen
  • For each frame:
    • update only non-transparent regions (each frame may have transparency and may have a unique color index for it)
      • ...using the local color table if present, or the global color table if not
    • dispose as specified by the GCE (usually "don't do anything", but not always)
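
The steps above can be sketched on plain Python lists. This is an illustration, not a decoder: render_frames and the frame dictionary keys (left, top, width, height, pixels, palette, transparent, disposal) are all made up, pixels are assumed to be already LZW-decoded color indices, and disposal uses the GCE numbering (0 or 1 leave the canvas alone, 2 restores the frame's area to the background, 3 restores what was there before the frame).

```python
def render_frames(screen_w, screen_h, background, frames):
    """Yield the full canvas (rows of RGB tuples) as shown after each frame."""
    canvas = [[background] * screen_w for _ in range(screen_h)]
    for frame in frames:
        left, top = frame["left"], frame["top"]
        width, height = frame["width"], frame["height"]
        disposal = frame.get("disposal", 0)
        # Disposal 3 needs the canvas as it was before this frame was drawn.
        saved = [row[:] for row in canvas] if disposal == 3 else None
        for y in range(height):
            for x in range(width):
                index = frame["pixels"][y * width + x]
                if index == frame.get("transparent"):
                    continue  # transparent pixels leave the canvas untouched
                canvas[top + y][left + x] = frame["palette"][index]
        yield [row[:] for row in canvas]
        if disposal == 2:
            for y in range(height):
                for x in range(width):
                    canvas[top + y][left + x] = background
        elif disposal == 3:
            canvas = saved
```

Each frame's palette here stands for the local color table if the frame has one, or the global one if not. Real viewers often treat "restore to background" as "clear to transparent", which this sketch does not try to model.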


Individual/overall palette

For an animated GIF, each frame can have its own optimized palette.

In theory, adaptive palette choice makes each individual frame as close to the original as possible, but this is only useful for slow-updating things, for example diagram sequences.

For short-interval, video-like GIF (probably the more common case), the same independent adaptiveness means every frame may have the same input color be somewhat different, which can look like continuous flickering.

You can instead calculate a palette based on all the input frames and use it for all frames. This means fewer colors and more change, but the transitions will be smooth. Of course, this only works for shortish sequences that don't change scene too much - the more colors, the worse it'll look. (Note that it does mean you can omit the local palette)



GIFs that aren't GIFs

GIF was originally popular because it was the only standard for animation, supported on the web early and widely.


Various sites have started using 'GIF' to mean "looping animation without (or with) audio", which are now frequently HTML5 video tags pointing at VP8 video, H.264 video, and/or WebM video.

This may be done by companies to save bandwidth on larger/longer animations, and/or because they consider GIF to be a brand name to attract people with. (when they feel the need to label it as GIF, it more likely isn't). If it's for bandwidth, then such sites are likely to convert actual GIFs to video.

Purists may twitch at this, and point out that

  • it's confusing
  • the restrictions of GIF are part of the charm
  • throwing a refined video compressor at the problem means people will probably make larger files, because they are unaware of this
  • there are fewer places you can easily embed them
  • browsers will have a heavier job
  • the mixed support for the varied formats it may be means it's harder to save and reuse
  • browser policies around video tags (e.g. being the main reason you want autoplay loop playsinline muted for iOS) may break things more frequently, and break without warning (Apple have changed that multiple times)
  • proper fallbacks to other formats take more knowledge or support fewer browsers, and older browsers may not support them while supporting GIF well, whereas GIFs have just worked everywhere since the nineties

And, of course, it didn't take long before we had "gifs" that were H.264 videos of hard to encode properly ordered dithering that few gif producers have used since the nineties. Because who's keeping track of what we're throwing at what walls?


Yet in popular and functional use, you can do exactly the same, they will often look better, and usually work, so hey.


https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/replace-animated-gifs-with-video

WebP

Can be seen as

  • an improvement on JPEG
~25% smaller when lossy at similar perceptual quality
  • now with some PNG-like features
~25% smaller when lossless
Supports alpha channel.


Created by Google for the web. Mainly read by browsers, though support in some (e.g. Firefox, Edge, Safari) came later, and particularly Apple (Safari) seemed to specifically avoid support for years, so you should probably use it in an HTML5 <picture> with fallbacks, or a JS polyfill, or such.

Animation was added later. Support was not immediate but is now there [14], which also makes it an alternative to GIF (with different quality/size tradeoffs).


A bunch of image editors also open it.


Related to VP8 and WebM

See also:

AVIF

A new (~2019) open, royalty-free image format (based on the AV1 video codec).


Comparable in idea to WebP.

Not widely supported yet[15], but expanding.


See also https://en.wikipedia.org/wiki/AV1#AV1_Image_File_Format_(AVIF)

PBM, PNM, etc.

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


These files store uncompressed per-pixel values, either as ASCII numbers or in a simple binary form. They are very easy for programs to write and read, and arguably CPU-efficient as there is no decompression to do. They are not space-efficient, and can contain almost no metadata.

They are interesting

in simple scripts, in quick-'n'-dirty experiments
as a universal intermediate format, e.g. as input to compressors, so that those don't have to support all other formats to be able to handle them
as an intermediate format to hand around between programs you duct-tape together
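
To illustrate how little there is to writing these, here is a minimal sketch in Python that builds an ASCII PGM (P2) and a binary PPM (P6) in memory. The function names are made up for this example, and it assumes 8-bit values (maxval 255) only.

```python
def pgm_ascii(width, height, values):
    """Build an ASCII PGM (P2) from a flat, row-by-row list of 0..255 gray values."""
    values = list(values)
    if len(values) != width * height:
        raise ValueError("expected width*height gray values")
    # One text line per image row (any whitespace layout would be valid)
    rows = [
        " ".join(str(v) for v in values[y * width:(y + 1) * width])
        for y in range(height)
    ]
    # Header is: magic, width, height, maxval, separated by whitespace
    return "P2\n%d %d\n255\n%s\n" % (width, height, "\n".join(rows))


def ppm_binary(width, height, pixels):
    """Build a binary PPM (P6) from a row-by-row list of (r, g, b) tuples, 0..255 each."""
    data = bytes(channel for pixel in pixels for channel in pixel)
    if len(data) != width * height * 3:
        raise ValueError("expected width*height RGB triplets")
    # Same kind of header, followed by the raw bytes
    return b"P6\n%d %d\n255\n" % (width, height) + data
```

Writing the result to a file (text mode for the P2 string, binary mode for the P6 bytes) gives something most image viewers and netpbm-style tools should open.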


PNM is a collective name for:

  • PBM ('portable bit map'), two-color (1 bit per pixel)
  • PGM ('portable gray map') (typically 8 bits per pixel, 16-bit also seen in later versions(verify))
  • PPM ('portable pixel map'), color in the form of r,g,b triplets (typically 8 bits per channel, 16 also seen in later versions(verify))

They are often implicitly BT.709 coded, with gamma applied. Some programs allow specifying/assuming non-gamma'd form too.

The 16-bit variants imply some endianness decisions for the binary forms - which not every program agrees on, though MSB-first (big-endian) is what the Netpbm tools use.


PNM files start with two-byte magic, in ASCII

  • P1 - PBM, ASCII
  • P2 - PGM, ASCII
  • P3 - PPM, ASCII
  • P4 - PBM, binary
  • P5 - PGM, binary
  • P6 - PPM, binary
  • P7 - PAM, binary
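
As a rough sketch of reading that magic and the header that follows it, the Python below handles P1 through P6 and only identifies P7 (PAM has a different, keyword-based header). The names PNM_MAGIC, _read_ints and read_pnm_header are invented for this example, and it assumes the usual header layout: whitespace-separated width, height and (except for PBM) maxval, with optional # comments, then a single whitespace byte before the pixel data.

```python
PNM_MAGIC = {
    b"P1": ("PBM", "ascii"),
    b"P2": ("PGM", "ascii"),
    b"P3": ("PPM", "ascii"),
    b"P4": ("PBM", "binary"),
    b"P5": ("PGM", "binary"),
    b"P6": ("PPM", "binary"),
    b"P7": ("PAM", "binary"),
}


def _read_ints(data, pos, count):
    """Read `count` decimal integers starting at pos, skipping whitespace and # comments."""
    numbers = []
    while len(numbers) < count:
        # Skip whitespace, and comments that run to the end of the line
        while pos < len(data) and (data[pos:pos + 1].isspace() or data[pos:pos + 1] == b"#"):
            if data[pos:pos + 1] == b"#":
                while pos < len(data) and data[pos:pos + 1] not in b"\r\n":
                    pos += 1
            else:
                pos += 1
        start = pos
        while pos < len(data) and data[pos:pos + 1].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("expected a number in the PNM header")
        numbers.append(int(data[start:pos]))
    return numbers, pos


def read_pnm_header(data):
    """Parse the header of P1..P6 data (bytes) into a dict."""
    magic = data[:2]
    if magic not in PNM_MAGIC:
        raise ValueError("not a PNM/PAM file")
    kind, encoding = PNM_MAGIC[magic]
    if kind == "PAM":
        raise NotImplementedError("PAM header is not handled in this sketch")
    # PBM has no maxval field; its samples are always 0 or 1
    numbers, pos = _read_ints(data, 2, 2 if kind == "PBM" else 3)
    return {
        "kind": kind,
        "encoding": encoding,
        "width": numbers[0],
        "height": numbers[1],
        "maxval": 1 if kind == "PBM" else numbers[2],
        # Exactly one whitespace byte separates the header from the pixel data
        "data_offset": pos + 1,
    }
```

For the binary forms, data[header["data_offset"]:] is then the raw pixel data; for the ASCII forms the rest is more whitespace-separated numbers.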


PAM is a generalization of PNM, which adds depth and tuple type to the basic metadata, and only uses binary form.


The Netpbm package deals with:

  • PNM (see above), and
  • PAM, a more general version of pbm-style formats, which can store all PNM formats, and some more options.



See also:

DjVu

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

DjVu is a free and open file format.

It has different codecs for bitonal and gradient images, and applies well to high-resolution scanned text, manuscripts, magazines, line art, and such, for which it reaches noticeably higher compression than most general-purpose/photo image formats (and/or, at the same size, better legibility than various other formats).

It can optionally contain a text layer (often OCRed).

This combination of things makes it a useful format for archiving images of books.


Layers and types

Document and document pages are layered/composited (see also ITU-T T.44), and each layer can be coded in a different way. In some cases, those layers may have come from different color channels from the same original image.

DjVu's coders are probably best at two-tone images, such as text. The separation between background layer(s), foreground layer(s), and masks helps - it allows e.g. highly compressible two-color coding for the text and a different choice for, say, the texture of a book's paper.


Images and documents

DjVu images are valid single-page DjVu documents.

Multi-page DjVu files are either Bundled (into a single file), or Indirect (where the main document is an index pointing to individual DjVu document files, which is handier when serving individual pages sparsely, a little less handy for sending things around).



Creating

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Command examples

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Viewing, converting

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
  • Browser plugins
  • Separate viewers WinDjView and MacDjView [16]
  • There are also various viewers for PDAs, phones and such


Conversion:

  • CGI on-the-fly conversion
  • If you have a to-PDF printer such as Adobe's, PrimoPDF, doPDF(?) or similar, you can create PDFs using any DjVu viewer (or anything else) that can print.


See also

See also:

WMF, EMF, EMF+

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
  • WMF (Windows MetaFile): Primarily calls into Windows's GDI[17] library
  • EMF: more commands than WMF
  • EMF+: calls into GDI+ (newer version of GDI)
  • .wmz is gzipped wmf
  • .emz is gzipped emf
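
Since .wmz and .emz are just gzip around the metafile, getting the plain .wmf or .emf back takes nothing more than a gzip library. A minimal Python sketch (the function name is made up for this example):

```python
import gzip


def unpack_metafile(compressed):
    """Return the WMF/EMF bytes stored inside .wmz/.emz data."""
    # gzip data starts with the bytes 1f 8b; checking that gives a clearer error
    if compressed[:2] != b"\x1f\x8b":
        raise ValueError("not gzip data, so probably not a .wmz/.emz file")
    return gzip.decompress(compressed)
```

For files, something like unpack_metafile(open("drawing.wmz", "rb").read()) gives the bytes to write out as drawing.wmf (the filenames here are placeholders).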


Since they are mostly calls (with arguments) into a known drawing API, they can be seen as code that will be executed, and there have been several security exploits of the API using these image files.

The format is not very common, though various programs can still export it, and it's seen in contexts like clipart libraries.


The ability to load/convert on non-windows:

  • libwmf allows rasterization, and has some command line conversions, like wmf2eps and wmf2svg
  • openoffice imports it
  • UniConvertor [18]
  • inkscape? (depends on wmf2svg from libwmf?)


See also:

RAW photo formats

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


There seems to be a different RAW format for each camera manufacturer, some more recognizable than others, most only handled fully by bundled software, and some of the more professional image/photo tools.

There's also Adobe's DNG ('Digital NeGative'), meant to be a more portable format than the manufacturer specific ones. It looks like it was never quite as easy as it should have been, and so has not caught on in a big way.

Many are readable through conversion libraries (like libraw or dcraw), though often not with full understanding/preservation of metadata.


Most of these formats are a variation on TIFF (structure-wise, not pixel-data-wise), but tend to be more restricted than general TIFF (presumably because TIFF is such a flexible format that a full parser is annoying to write).
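
A quick way to see that TIFF-like structure is to look at the first eight bytes. The sketch below checks for the classic TIFF header: a byte-order mark (II for little-endian, MM for big-endian), the number 42, and the offset of the first IFD. The function name is invented for this example, and a None result does not prove a file isn't TIFF-based, since some RAW variants deviate from the standard header.

```python
import struct


def tiff_header(data):
    """Return (byte_order, first_ifd_offset) for a classic TIFF header, else None."""
    if len(data) < 8:
        return None
    # "II" means little-endian fields follow, "MM" means big-endian
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        return None
    # 16-bit magic number (42), then 32-bit offset of the first IFD
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        return None
    return ("little" if order == "<" else "big", ifd_offset)
```

Calling this on the first bytes of a RAW file is only a first hint; walking the IFDs to find the actual image data is where the manufacturer-specific parts start.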

Aside from possibly storing image data in somewhat unusual ways, they may also not be strictly compliant TIFFs (e.g. Sony's ARW does not store ImageLength, which TIFF6 requires).

See also:

Unsorted

LDF: Bi-tonal format

WBMP (Wireless Bitmap) - monochrome, 1 bit per pixel data.

JPEG 2000