Image file format notes



📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.

JPEG

Basically stores DCT coefficients for 8x8 pixel blocks, so it works best for things that resemble smooth gradients, which is why it's better at photos than at line art, text, and other sharp edges.

Compression comes mainly from quantizing those factors, which makes them well compressible using (lossless) Huffman coding. It's also assisted by working in a color space that leads to middling values(verify).
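To make the "quantizing makes it compressible" point concrete, here is a minimal sketch in plain Python. It is not how any real encoder is written: the naive DCT loop, the single uniform quantization step of 16 (real JPEG uses a per-frequency table), and the made-up gradient tile are all assumptions chosen for illustration.

```python
import math

N = 8

def dct_2d(block):
    """Naive 2D DCT-II of an 8x8 block (list of 8 rows of 8 numbers)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step=16):
    """Uniform quantization (assumption: one step size for all frequencies)."""
    return [[round(value / step) for value in row] for row in coeffs]

# A smooth gradient tile, level-shifted by 128 so values centre around zero.
gradient = [[8 * x + 4 * y - 128 for x in range(N)] for y in range(N)]
quantized = quantize(dct_2d(gradient))
zero_count = sum(value == 0 for row in quantized for value in row)
print(zero_count, "of 64 quantized coefficients are zero")
```

Most of the 64 values end up zero for a tile like this, and long runs of zeros are exactly what the later entropy coding stage handles well. A tile with a sharp edge would leave many more nonzero values.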


JPEG in practice is a little more ad-hoc than you'd think.

The original standard has a lot of variants that are basically research versions that have never been used in practice, or that involve patents people wanted to avoid(verify), meaning that implementing the original standard fully is sort of a waste of time.

At the same time, there are sort-of-proprietary extensions that have become de facto standards because they're not too difficult to support.

As well as niche uses, like the DICOM standard (used around medical imaging) allowing embedding of the sort of JPEG frames that bare JPEG would rarely if ever use (e.g. lossless).


There are often multiple ways of doing some of the same things, such as lossless compression. The original lossless format was a late and not-really-standard(verify) addition in 1993, but later there is also JPEG-LS (and JPEG 2000, though that's probably better considered an entirely distinct format).

And some completely non-standard things.

But then, some decoders may not decode any of these variations. Actually, a good number of common decoders are more widely capable, but programs seem to write JPEGs according to the original core JPEG(verify), to produce images that can be read anywhere.


On JPEG, JIF, JFIF, etc.

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In the early days, JPEG referred mostly to a bunch of compression methods, and JIF to its basic container(verify).

JIF (JPEG Interchange Format[1]) came in the original standard, but managed to not fully specify how to treat pixels as an image (e.g. not specifying color space, aspect ratio, or subsampling registration(verify)). Also, some parts of that standard may have involved patents(verify).


For this and a few other reasons, JFIF (= JPEG FIF = JPEG File Interchange Format) was created to update JPEG (in the general sense) with a more clearly defined container and more metadata, and to remove some methods nobody wanted or used.

This made it a little simpler to implement and a little more portable.

At byte level (segments and markers) it works basically the same, the main difference is some clearer restrictions on what you can put in there and how.


What we call JPEG files pretty much always follow JFIF, which is why JPEG files sometimes have a .jfif extension. It's technically more correct than using .jpeg (which technically could also be JIF), but in a way almost no one cares about.


Exif is a different container format, also meant as a generic "store metadata around image or sound" container.

It's effectively another alternative that also addresses the shortcomings of JIF, but came a few years after JFIF (Exif 1.0 dates to 1995)(verify). It's structured in a similar way (e.g. main data in APP1 instead of APP0), and you can write a parser to deal with both, but it effectively defines some new uses of blocks (including "here's an embedded TIFF stream", which seems to be how it extends to audio - RIFF PCM, IMA-ADPCM(verify)) that older JFIF readers probably won't understand.

To complicate things further, "Exif metadata" refers to metadata that you see in Exif files, but also within other files, including JFIFs (using an extra APP1(verify)).


Revisions

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Revisions of JPEG/JFIF span from roughly 1991 to 1993ish for the original format, and up to now if you count all possible extensions(verify).


Initial versions

JFIF 1.00 - first version (not released?)
JFIF 1.01 - December 10, 1991
JFIF 1.02 [2] - September 1, 1992
not too many changes. e.g. added an optional JFXX segment, capable of storing a compressed thumbnail image

...soon standardized into

ISO 10918

and also

ITU-T T.81, T.83, T.84, T.86, T.871
ECMA TR/98 (JFIF)



The original versions were largely by C-Cube Microsystems[3] (now defunct), but it was probably the Independent JPEG Group (IJG)'s implementation that helped popularize it.

IJG developed it further, and submitted extensions to ISO and ITU-T (verify)

IJG's libjpeg versioning[4]:

  • version 1 (1991)
  • version 2 (1991)
  • version 3 (1992)
  • version 4 (1992)
  • version 4a (1993)
  • version 5 (1994)
  • version 5a (1994)
  • version 5b (1995)
  • version 6 (1995)
  • version 6a (1996)
  • version 6b (1998)


  • version 7 (2009)
  • version 8 (2010)
  • version 8a (2010)
  • version 8b (2010)
  • version 8c (2011)
  • version 8d (2012)
  • version 9 (2013)
  • version 9a (2014)
  • version 9b (2016)
  • version 9c (2018)
  • version 9d (2020)


Versions since 7, and particularly 8 and 9, include entirely new methods, some never standardized, so effectively proprietary. Note that JPEG readers that are not based on a recent libjpeg from IJG may not be able to read files that use them.

So 6b is the latest standard version, 9d is the latest non-standard version.



Different but somewhat related

JPEG XT [5]

intends to add extensions (like higher bit depth, alpha channel, lossless) in a backwards compatible way (file structure wise, not necessarily decodable?(verify))
ISO 18477 (verify)

JPEG LS [6]

lossless / near lossless coding.
different from and more efficient than the original lossless JPEG coding (and apparently faster than JPEG 2000 at comparable compression levels)
ISO 14495

JPEG 2000 [7]

ISO 15444

JPEG XR [8]

seems to be about more dynamic range in the pixel values?
ISO 29199


...which are based on JPEG at best, and so extended that a basic JPEG library will probably not decode them unless these specifications are specifically implemented.



Notes on JPEG file structure

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

At one point I wanted to detect whether a (not very standard) JPEG dataset was lossless or not, so needed to parse the basic file structure (without decoding the image). These are notes from that.


A JFIF style JPEG file consists of a number of segments.


All segments start with a FF byte, and a marker byte to say what type it is.


A few segments have no data, and will have no size field.

This includes SOI, RST0 through RST7 (D0..D7), EOI, and more.


Most other segments mention their size, and their structure is typically:

  • 0xFF (1 byte)
  • marker (1 byte)
  • datasize (2 bytes, big-endian)
...size of data and these two size-indicating bytes
  • data (byte length as indicated by datasize, minus two (the size of the datasize field))


Probably the most notable (and basically only?) exception to "fixed size or mentions their size" is Start of Scan. It does store a size, but that is only the size of what you could call its own header. What follows is a bytestream that is one scan's worth of compressed data (for baseline files, the whole image), and its length is not stored anywhere. You have to decode it, or at least scan it for the next real marker (within this data a literal FF byte is followed by a stuffed 00, and RSTn markers may appear) -- though you can often guess that it's the rest of the file, minus the often-but-not-always-there EOI segment.



A fairly minimal JPEG file will have a sequence of segments like:

  • SOI (D8) - start of image
  • APP0 (E0), including
version (will be 1.00, 1.01, or 1.02)
pixel density
an optional thumbnail (seems rarely used)
  • a SOF variant (SOFn markers sit in the C0..CF range, except C4, C8, and CC, which are DHT, JPG, and DAC)
usually either SOF0 (baseline sequential, huffman) or SOF2 (progressive, huffman)
stores image size, channels, bits per channel
  • DQT (DB) - quantization tables, one or more (and can come before SOF)
  • DHT (C4) - huffman tables, one or more
  • SOS (DA) - start of scan
  • compressed image data following the SOS
  • EOI (D9) - end of image

There's a bunch more markers defined, but aside from some more APP0..APP15 metadata and a few things like COM, most are not used very often.
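The segment layout described above is enough to list a file's segments without decoding anything. A minimal sketch, assuming the whole file is in memory as bytes; the function name list_segments and the synthetic sample bytes are made up for illustration, and the walk deliberately stops at SOS because the compressed data after it has no stored length.

```python
import struct

# Markers that stand alone, without a size field: TEM, RST0..RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))

def list_segments(data):
    """Yield (marker, offset, payload_length) for each segment up to SOS."""
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected 0xFF at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in STANDALONE:
            yield marker, pos, 0
            pos += 2
            continue
        # The 2-byte big-endian size counts itself plus the payload.
        (size,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, pos, size - 2
        if marker == 0xDA:          # SOS: compressed data follows, stop here
            return
        pos += 2 + size

# Synthetic example (structure only, not a decodable image):
# SOI, a JFIF-style APP0 with a 14-byte payload, an empty SOS, some data, EOI.
sample = (b"\xFF\xD8"
          + b"\xFF\xE0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
          + b"\xFF\xDA" + struct.pack(">H", 2) + b"\x00\x01\x02"
          + b"\xFF\xD9")
segments = list(list_segments(sample))
for marker, offset, length in segments:
    print("FF %02X at offset %d, %d payload bytes" % (marker, offset, length))
```

On a real file you would read the bytes with open(path, "rb").read() and look at which SOF and APPn markers turn up.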


Some notes:

  • APP0
some webpages suggest the JPEG header is FF D8 FF E0 (you may recognize it as ÿØÿà), which is the SOI and the first two bytes of APP0.
JFIF does demand both its presence and this position, so probably almost all JPEGs have an APP0 there.
  • There are actually two standard(verify) APP0 variants, one with JFIF as identifier, and one with JFXX
JFIF stores density and optional uncompressed thumbnail
JFXX stores optional thumbnail (which can be compressed, basically as a simplified embedded JPEG)
  • There are a bunch of other APP0..APP15 uses, and then with varied identifiers [9], mostly for a lot of brand/device specific metadata, things like ICC profiles, or indicating specific types of data
In the wild apparently more commonly:
APP1 (E1), most typically used to store Exif metadata (TIFF based data), sometimes XMP metadata (XML) (verify)
APP14 (EE), used by Adobe to mention some color transform stuff
APP13 (ED), used by Adobe for TIFF-style(verify) tags (IPTC?)
APP12 (EC), used by some older cameras(verify)
  • You also see COM (FE), comment
limited to ~64k (via the segment size)
any amount can appear(verify), but since this is generally treated as an arbitrary-text field (the standard doesn't really say), various taggers may add to existing COM tags / merge COM tags.


  • SOF
apparently
mostly SOF0 (baseline sequential, huffman)
occasionally SOF2 (progressive, huffman)
rarely SOF1 (extended sequential, huffman) or others, perhaps SOF9 (extended sequential, arithmetic)(verify) (many others of the 16 possible values are defined, a few are used in niche cases, and most never[10])
(...in one large sample of recent images, two thirds were C0, a third was C2, only a handful were C1s, and nothing else)
SOF0 and SOF1 are decoded the same way, and the only difference is that baseline restricts the number of huffman and quantization tables to 2 each, extended to 4 each.(verify)
lossless is SOF3 (huffman), or less commonly SOF7 (differential, huffman), SOF11 (arithmetic), or SOF15 (differential, arithmetic)
  • Some JPEGs have been seen without EOI, but typically it's there.
  • quantization tables (0xdb)
Most JPEG compressors store two tables, one for Y, and one for both Cr and Cb.
Digital cameras are more likely to store three.
each segment may contain one or more quantization tables. Typically all tables are in the same segment, but there are cases where they are stored separately.
  • Adobe CMYK JPEGs are not standard JPEGs
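For the "is this lossless?" question, the SOF marker value plus the first six payload bytes are enough. A small sketch; parse_sof and is_sof are hypothetical helper names, and the payload layout used (precision, height, width, component count) is the start of the SOF segment as I understand the standard, so treat it as an assumption to check against the spec.

```python
import struct

# The lossless processes among the SOF markers: SOF3, SOF7, SOF11, SOF15.
LOSSLESS_SOF = {0xC3, 0xC7, 0xCB, 0xCF}
# In the C0..CF range but not frame headers: DHT, JPG (reserved), DAC.
NOT_SOF = {0xC4, 0xC8, 0xCC}

def is_sof(marker):
    return 0xC0 <= marker <= 0xCF and marker not in NOT_SOF

def parse_sof(marker, payload):
    """Read the fixed part of a SOF payload (the bytes after the size field)."""
    if not is_sof(marker):
        raise ValueError("marker %02X is not a SOF marker" % marker)
    precision, height, width, components = struct.unpack(">BHHB", payload[:6])
    return {
        "lossless": marker in LOSSLESS_SOF,
        "progressive": marker in {0xC2, 0xC6, 0xCA, 0xCE},
        "arithmetic": marker >= 0xC9,
        "bits_per_channel": precision,
        "width": width,
        "height": height,
        "channels": components,
    }

# Example: a made-up SOF3 payload, 16-bit greyscale, 256 wide and 512 high.
info = parse_sof(0xC3, struct.pack(">BHHB", 16, 512, 256, 1))
print(info)
```

This only looks at the classic JPEG markers. JPEG-LS and JPEG 2000 files are a different story and would not be recognized here.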



MJPEG

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Motion JPEG is essentially a stream of separate JPEG images.

Say, IP cameras presumably just push out frames as they appear.


It's not really a video format, being I-frame-only (every frame is coded independently), and not even specifying a framerate.


It's also not a standard itself, but uses in specific contexts are typically documented, e.g.

Microsoft documents how they use it in AVI files
Apple documents how they use it in QuickTime files
RFC 2435 documents how it is used in RTP streams
web browsers tend to support it
(over HTTP this is typically wrapped in MIME, as a multipart/x-mixed-replace response with one JPEG per part[11])

See also

https://en.wikipedia.org/wiki/Motion_JPEG
https://www.loc.gov/preservation/digital/formats/fdd/fdd000063.shtml


Motion JPEG 2000 is more standard, but also rather more complex, being based on MP4/Quicktime, which also makes the choice for a more efficient video codec easier.

https://en.wikipedia.org/wiki/Motion_JPEG_2000



PNG

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

A true-color, lossless compression format aimed at sharp contrast diagram style images, but also compresses photographic images decently. Allows an alpha channel.


On quality and size:

  • for diagram-style images, compression is comparable with GIF, and is better quality than JPEG
if you don't need animation, it's generally preferable over GIF
if you need the transparency channel, it's preferable over GIF and JPEG
  • for gradient/photographic images it compresses worse compared to medium-quality ('good enough') JPEG, and comparable to JPEG at highest settings (sometimes larger than JPEG, because JPEG can fudge over fine noise, while PNG necessarily preserves it)
For web content, smaller size can be more important than quality, which is a tradeoff you can't make with PNG, and you'll probably still want JPEG.

All web browsers now support PNG (IE was the last to solve a list of related bugs, but has decent support since IE7), and operating systems have widely accepted it now (e.g. for icons).


Method / limitations:

  • lossless compression
  • paletted, greyscale, or RGB
  • No animation in the standard (see APNG, also MNG, but they are not widely supported yet and may not be anytime soon)
  • Many specific formats, with different features/support in older browsers (see e.g. [12])


Compressors include

Lossy (smart palette quantization)



PNG structure notes

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

File structure is broadly:

  • Eight-byte header: 137 80 78 71 13 10 26 10
  • an IHDR chunk
  • content chunks
  • an IEND chunk


A chunk is:

  • 4-byte uint - data field's length (of payload, so excluding length, type, and crc fields)
  • 4-byte uint - type
  • data bytes
  • 4-byte CRC


On types:

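the four bytes that make up the type are typically ASCII letters (e.g. IHDR, cHRM, gAMA, tRNS, PLTE, hIST, pHYs, IDAT, tEXt, tIME, IEND)
the case of each letter (bit 5 of each byte) conveys further information: first letter uppercase means critical, lowercase means ancillary; second letter uppercase means public, lowercase means private; the third is reserved (uppercase); fourth letter lowercase means safe to copy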
Types allow extension of the format with older readers and/or editors, having rules about known / unknown safe-to-copy / unsafe-to-copy critical / ancillary chunks
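The chunk layout above maps fairly directly onto code. A minimal sketch using only the standard library; iter_chunks, chunk_properties and make_chunk are made-up helper names, and the tiny PNG built at the end is a synthetic 1x1 greyscale example used only to have something to walk.

```python
import struct
import zlib

PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])

def iter_chunks(data):
    """Yield (type, payload, crc_ok) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and payload, not the length field.
        yield ctype, payload, zlib.crc32(ctype + payload) == crc
        pos += 12 + length

def chunk_properties(ctype):
    """Decode the case bits (bit 5) of a 4-byte chunk type."""
    return {
        "ancillary": bool(ctype[0] & 0x20),
        "private": bool(ctype[1] & 0x20),
        "safe_to_copy": bool(ctype[3] & 0x20),
    }

def make_chunk(ctype, payload):
    """Build one chunk: length, type, payload, CRC."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

# Synthetic 1x1, 8-bit greyscale PNG: IHDR, one IDAT, IEND.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")   # filter byte 0, then one black pixel
png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))
chunks = list(iter_chunks(png))
for ctype, payload, crc_ok in chunks:
    print(ctype.decode("ascii"), len(payload), crc_ok, chunk_properties(ctype))
```

An editor that meets an unknown chunk can use these bits to decide whether it may keep it after changing the image: unknown critical chunks mean it cannot safely process the file at all, while unknown ancillary chunks can be carried over only if they are marked safe to copy.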

APNG

Animated PNG is an unofficial extension to PNG, which produces images backwards compatible with PNG, so that classical PNG decoders will just decode the first frame.

APNG is supported by many browsers and some image editors. That is arguably not quite widely enough to use it as a generic format, but it has its specific uses.

MNG

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

MNG (Multiple-image Network Graphics) is a close relation of PNG (written by the same team).

It allows animation, which was meant to let it replace GIF on the web and in other areas.

Not widely adopted. While various software has adopted it (without us really noticing), web browsers generally haven't.

TIFF

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

TIFF (Tagged Image File Format) is a container format that stores images, and sometimes a little more.


TIFF has grown over time in terms of what it can store.

Version 6 of the TIFF specs (~1992, and the document encompasses previous versions), the version most things today use, allows things like varied bit depth, color spaces, compression, also making it a useful lossless format.

TIFF is a container format, and by itself has blocks of metadata (IFDs), containing tags detailing the format of image blocks those also point to.



libtiff

The 4.x are more current - functionally mostly the same as 3.x but fixed a number of 3.x flaws

the API and behaviour is so similar that most code can move from 3 to 4 with little to no changes.


The package naming can be interesting -- because it doesn't correspond to internal version.

The soname for 3.x has 4, the soname for 4.x has a 5,
so e.g. in debian/ubuntu, tiff 3 is package libtiff4, tiff 4 is in package libtiff5.

Confusing, yes.


Notes on TIFF file structure

A TIFF file starts with: a header, then an initial IFD.

The 8-byte header is

  • 2 bytes indicating the endianness used in the IFDs
namely "II" (0x49 0x49) for little-endian (referring to Intel) and "MM" (0x4D 0x4D) for big-endian (referring to Motorola)
  • 2 bytes: the 16-bit integer with value 42, in the file's endianness
  • int32: absolute offset of the first IFD


People looking for file magic:

  • TIFFs start with either 0x49 0x49 0x2A 0x00 or 0x4D 0x4D 0x00 0x2A
  • the first IFD is typically immediately after the header, so its offset, stored in the next four bytes, is typically 8


IFDs (basically 'a block of metadata') are:

  • int16: amount of entries in this IFD
  • each entry is 12 bytes, namely:
    • 2-byte tag (the thing TIFF is named for), see also TIFF tag reference
    • 2-byte type
    • 4 bytes: amount of values in this entry (frequently 1, but depends on tag)
    • int32: data
      • for datatypes storing in <=4 bytes, the data itself ('inlined')
      • otherwise the absolute offset to value
  • int32: absolute offset of the next IFD (0 if it's the last)
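The header and IFD layout can be walked with a few struct calls. A minimal sketch for classic (not Big) TIFF, assuming the file is in memory; read_ifds is a made-up name, values are left as the raw 4 data bytes rather than interpreted per type, and the guard against revisiting an offset is my own addition to avoid looping on a broken file.

```python
import struct

def read_ifds(data):
    """Return a list of IFDs, each a list of (tag, type, count, raw_4_bytes)."""
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF byte-order mark")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (magic %d)" % magic)
    ifds, seen = [], set()
    while offset != 0 and offset not in seen:
        seen.add(offset)
        (count,) = struct.unpack(endian + "H", data[offset:offset + 2])
        entries = []
        for i in range(count):
            start = offset + 2 + 12 * i
            tag, typ, n = struct.unpack(endian + "HHI", data[start:start + 8])
            # Either the inlined value(s) or an absolute offset to them.
            entries.append((tag, typ, n, data[start + 8:start + 12]))
        ifds.append(entries)
        end = offset + 2 + 12 * count
        (offset,) = struct.unpack(endian + "I", data[end:end + 4])
    return ifds

# Synthetic little-endian file: header, then one IFD holding a single entry,
# tag 256 (ImageWidth), type 3 (SHORT), count 1, value 640 inlined.
sample = (b"II" + struct.pack("<HI", 42, 8)
          + struct.pack("<H", 1)
          + struct.pack("<HHI", 256, 3, 1) + struct.pack("<HH", 640, 0)
          + struct.pack("<I", 0))
ifds = read_ifds(sample)
print(ifds)
```

Interpreting the data field properly means looking up the byte size of each type, multiplying by the count, and following the offset when that exceeds 4 bytes, which is left out here.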


Notes:

  • Chaining IFDs allows for one or more images (e.g. raw lossless image data plus a lossy JPG thumbnail).
  • one of the things you can store in IFDs is pointers to data within the file
  • The last two also mean you can easily tell simpler decoders "just ignore the parts you don't understand" (quite similar to TLV)


  • Many IFDs detail an image, so typically point to a large chunk of data, and contain enough tags to interpret that data.
The simplest TIFFs have just one data block, and one IFD.


  • The IFD chaining allows for multiple images and, well, anything else you care to (ab)use it for.
  • ...the "custom pointer to just data" more so, particularly when that is IFD-like itself
...or to non-TIFF data structures - at which point it's entirely custom use of TIFF, with data that standard TIFF parsers will ignore because it's not an IFD and/or a tag known to them.
This is perfectly valid in terms of file structure. And no standard TIFF reader will read it.
  • Many digital cameras' raw files (Canon's CRW and CR2, Nikon's NEF, Pentax's PEF, Samsung's SRW, Sony's SR2 and ARW, Fuji's RAF) are in fact mostly TIFFs, with an IFD and data for EXIF, IPTC, XMP, GPS, a tiny thumbnail, a medium thumbnail, and camera-raw data (thumbnails are often JPEG, raw often does not fit standard TIFF image types so is often custom).
Usually only the company that made it knows quite everything about it.
(Some of these formats do not strictly conform to TIFF6, e.g. omitting some tags that TIFFs are required to have. This may be more of a technical point - it may still be readable anyway, and/or you may never want to read it as a standard TIFF because you'll miss some interpretation.)
(...which is one reason for DNG, which is a more shared standard (also based on TIFF))
  • Things can also be ordered arbitrarily, as long as all the offsets are correct.

Sometimes there is reason to order things in a specific way (e.g. all metadata together to scan through things with fewer seeks), but generally it's whatever was easiest for the TIFF writer code, which is why an IFD and its data are often adjacent.

So it's almost a filesystem (with IFDs acting like directory entry metadata), and it might contain data you can't easily tell is unused.


  • BigTIFF refers to a variant that uses the same logic as classical TIFF, but with slightly different field sizes.
It differs in that it uses 64-bit offsets, meaning files can be larger than 4GiB.
Not formally part of the standard, but present in libtiff for a while now (and it's not too hard to adapt in other implementations).
Differences:
The header mentions 43 instead of 42 (and has two extra fields you're probably not going to ever use)
note that since the 'data' field is larger, more value types are now inlined
IFD and tag structures are larger, but follow the same rules
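Telling the two apart only takes the first few bytes. A small sketch; tiff_flavour is a made-up name, and the meaning of BigTIFF's two extra header fields (an offset size that is always 8, then a constant 0) is how I remember the BigTIFF design notes, so treat that as an assumption.

```python
import struct

def tiff_flavour(data):
    """Return ("TIFF" or "BigTIFF", first_ifd_offset), or None if neither."""
    endian = {b"II": "<", b"MM": ">"}.get(data[:2])
    if endian is None:
        return None
    (magic,) = struct.unpack(endian + "H", data[2:4])
    if magic == 42:
        (first_ifd,) = struct.unpack(endian + "I", data[4:8])
        return ("TIFF", first_ifd)
    if magic == 43:
        # Assumed layout: 2-byte offset size, 2-byte zero, 8-byte IFD offset.
        offset_size, zero, first_ifd = struct.unpack(endian + "HHQ", data[4:16])
        if offset_size == 8 and zero == 0:
            return ("BigTIFF", first_ifd)
    return None

classic = tiff_flavour(b"II" + struct.pack("<HI", 42, 8))
big = tiff_flavour(b"MM" + struct.pack(">HHHQ", 43, 8, 0, 16))
print(classic, big)
```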

Notes on compression

GIF

Method / limitations:

  • Paletted, with a maximum of 256 colors, so best for line art and grayscale images, not for photo quality
    • (technically you can use many frames and many palette updates to render one image in way more colors, but it's inefficient and not all decoders like doing this)
  • Two versions, GIF87a and GIF89a, the latter adding (among other things) per-frame delays, so being the more interesting one for showing animation
  • Uses LZW - which was patented in 1985 and expired in 2003 or 2004 (varying with country)
in theory you could skip compression but this makes for rather large files
so this spurred development of PNG. PNG now seems the handier choice for most things, except animation (GIF is still the only widely supported image format that does this).



Notes on GIF file structure

Broad structure

For the structure and possible order of related chunks, see diagrams like that on [13] - can be interesting when writing a parser.

  • The GIF file header consists of:
    • The six-byte file header, either GIF89a or GIF87a
    • the logical screen descriptor (LSD), which is seven bytes, and which must be present, so is effectively part of the file header (see next section for details)


  • A sequence of one of the following three:
    • an application extension (see below)
    • a comment extension
    • stuff that you can see, which itself can consist of:
      • An optional GCE (Graphics Control Extension, see below) before one of the following two:
        • Image Descriptor + optional local color table + image data
        • (optional) Plain Text Extension (rarely used)
  • A 0x3B byte (; character) as a GIF file trailer.



Notes on extension blocks

  • The Netscape extension block is common for animated GIFs (though not required)
    • This extension may only be placed directly after the LSD
    • It only really contains the loop instruction
  • The Graphic Control Extension (GCE) is very common in animated GIFs, as it mentions each frame's duration (and also its transparency)
  • Comment extension - used for attribution, 'created by this-and-that software', and such
  • Plain text extension mentions text to be rendered. Rarely used, and a number of decoders simply ignore it.

Details

The Logical Screen Descriptor consists of:

  • 2 bytes: logical screen width (LSB-order 16-bit int)
  • 2 bytes: logical screen height (LSB-order 16-bit int)
  • 1 byte, bit packing like:
    • 1 bit: whether the global color table is present (and follows). Most of the below is only relevant if it is.
    • 3 bits: color resolution - "Number of bits per primary color available to the original image". 000 means 1, 001 means 2, ..., 111 means 8
    • 1 bit: sort flag - whether the colors in the global color table are sorted in order of decreasing use, which can help the decoder
    • 3 bits: size of color table - the table has 2^(value+1) entries of 3 bytes (RGB) each, so takes 3·2^(value+1) bytes
  • 1 byte: which color is the background color (only meaningful if there is a global color table)
  • 1 byte: aspect ratio. Most images use 0, meaning using the storage's ratio (square pixels). If nonzero, the spec says the ratio should be calculated like (byte-value + 15) / 64
  • optional global colour table, if mentioned by the LSD (almost every animated GIF has this)
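The bit packing is easier to follow in code. A minimal sketch that reads the 13 bytes of file header plus LSD; parse_gif_header is a made-up name, and the sample header describes an imaginary 320x200 GIF with a 256-entry global color table.

```python
import struct

def parse_gif_header(data):
    """Parse the 6-byte signature and the 7-byte Logical Screen Descriptor."""
    signature = data[:6]
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF signature")
    width, height, packed, background, aspect = struct.unpack("<HHBBB", data[6:13])
    has_gct = bool(packed & 0x80)
    gct_entries = 2 ** ((packed & 0x07) + 1)
    return {
        "version": signature[3:].decode("ascii"),
        "width": width,
        "height": height,
        "has_global_color_table": has_gct,
        "color_resolution_bits": ((packed >> 4) & 0x07) + 1,
        "sorted": bool(packed & 0x08),
        # Each color table entry is 3 bytes (R, G, B).
        "global_color_table_bytes": 3 * gct_entries if has_gct else 0,
        "background_index": background,
        # 0 means "no aspect information", otherwise (value + 15) / 64.
        "pixel_aspect_ratio": (aspect + 15) / 64 if aspect else None,
    }

sample = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0xF7, 0, 0)
header = parse_gif_header(sample)
print(header)
```

The first block after the header starts at byte 13 plus global_color_table_bytes.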


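Extension blocks consist of:

  • byte: 0x21 (! character)
  • byte: extension label
distinguishes between e.g. GCE, comment extension, plain text extension, application extension
  • data, as one or more sub-blocks that each start with a one-byte length
some extensions (e.g. GCE, application) start with a fixed-size block of their own fields; comment is just text sub-blocks
  • byte: 0x00 (effectively a zero-length sub-block, acting as terminator)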


Image Descriptor + optional local color table + actual image data consists of:

  • optional Graphics Control Extension (GCE)
    • removal controls, a.k.a. disposal method (unspecified, leave in place, restore to background color, restore to previous image)
    • user input required? - usually ignored
    • has transparent color? (and if so, which color index)
    • duration, in hundredths of a second
  • image data block (OR plain text block, for text to be rendered, but these are generally not supported, and so ignored)
    • , character (0x2C byte) signifies start of Image Descriptor
    • 2 bytes: left position on logical screen (LSB-order 16-bit int)
    • 2 bytes: top position on logical screen (LSB-order 16-bit int)
    • 2 bytes: frame width (LSB-order 16-bit int)
    • 2 bytes: frame height (LSB-order 16-bit int)
    • 1 byte: packed
      • 1 bit: local color table present/follows (if not, the global one will be used)
      • 1 bit: interlaced?
      • 1 bit: sort flag (see notes on global color table)
      • 2 bits: reserved
      • 3 bits: size of local color table
    • optional local color table
    • image data (one byte of LZW minimum code size, then length-prefixed sub-blocks)
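Putting the block structure together, you can walk a GIF and collect per-frame geometry and GCE settings without doing any LZW decoding. A sketch under a few assumptions: the whole file is in memory, walk_gif and read_sub_blocks are made-up names, and the sample's image data bytes are placeholders that are skipped rather than decoded.

```python
import struct

def read_sub_blocks(data, pos):
    """Join length-prefixed sub-blocks; return (bytes, position after the 0x00)."""
    parts = []
    while data[pos] != 0:
        size = data[pos]
        parts.append(data[pos + 1:pos + 1 + size])
        pos += 1 + size
    return b"".join(parts), pos + 1

def walk_gif(data):
    """Return one dict per frame, with the GCE (if any) that preceded it."""
    packed = data[10]
    pos = 13
    if packed & 0x80:                        # skip the global color table
        pos += 3 * 2 ** ((packed & 0x07) + 1)
    frames, gce = [], None
    while data[pos] != 0x3B:                 # 0x3B is the trailer
        if data[pos] == 0x21:                # extension block
            label = data[pos + 1]
            body, pos = read_sub_blocks(data, pos + 2)
            if label == 0xF9:                # Graphics Control Extension
                flags, delay, index = struct.unpack("<BHB", body[:4])
                gce = {"disposal": (flags >> 2) & 0x07,
                       "delay_cs": delay,
                       "transparent_index": index if flags & 0x01 else None}
        elif data[pos] == 0x2C:              # image descriptor
            left, top, width, height, ipacked = struct.unpack(
                "<HHHHB", data[pos + 1:pos + 10])
            pos += 10
            if ipacked & 0x80:               # skip the local color table
                pos += 3 * 2 ** ((ipacked & 0x07) + 1)
            pos += 1                         # LZW minimum code size
            _, pos = read_sub_blocks(data, pos)
            frames.append({"left": left, "top": top, "width": width,
                           "height": height, "gce": gce})
            gce = None
        else:
            raise ValueError("unexpected byte at offset %d" % pos)
    return frames

# Synthetic file: 2x2 screen, 2-entry global table, one GCE, one frame.
sample = (b"GIF89a" + struct.pack("<HHBBB", 2, 2, 0x80, 0, 0) + bytes(6)
          + b"\x21\xF9\x04" + struct.pack("<BHB", 0x09, 10, 1) + b"\x00"
          + b"\x2C" + struct.pack("<HHHHB", 0, 0, 2, 2, 0)
          + b"\x02" + b"\x02\x4C\x01" + b"\x00"   # placeholder image data
          + b"\x3B")
frames = walk_gif(sample)
print(frames)
```

Counting the frames and summing delay_cs is often all you need, for example to tell animated GIFs from still ones.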

Other notes

(Animated) GIF rendering

The LSD defines a logical image, which you can consider the canvas that subsequent frames draw on.

Those may choose to only draw onto part of that - optimized animated GIFs may well use this, though many also just draw over the entire image (even if that's often larger than necessary).

Technically, each frame may have its own distinct disposal method (which specifies what to do with that frame's area before the next frame is drawn), specified by the GCE that is probably before it.

Because of transparency, background, frame disposal, and the partial-frame-update detail, animated GIFs can only really be rendered incrementally.

Viewers do this, but this can be a significant detail to code that reads GIF files, more so to code that wants to write smaller GIFs.

While GIFs are 256-color, you can have a true-color GIF, in that most real-world GIF rendering is done onto a true color target anyway, and a GIF can update a frame in many individual partial renders, each with zero delay and its own palette. This is rarely done, mostly because this is inefficient, and there are more suitable image formats you can be using.


Roughly speaking, rendering is done by:

  • Allocating the logical screen
  • For each frame:
    • update only non-transparent regions (each frame may have transparency and may have a unique color index for it)
      • ...using the local color table if present, or the global color table if not
    • dispose as specified by the GCE (usually "don't do anything", but not always)
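The steps above can be sketched as a small compositing loop. This assumes frames have already been LZW-decoded into rows of palette indices, which is the hard part and is not shown; the dict keys (left, top, pixels, palette, transparent, disposal) and the function name render_frames are made up for this sketch, and "background" is modelled as None, roughly how browsers treat it as transparent.

```python
def render_frames(screen_w, screen_h, frames, background=None):
    """Composite decoded frames onto one canvas; return what is shown each step."""
    canvas = [[background] * screen_w for _ in range(screen_h)]
    shown = []
    for frame in frames:
        before = [row[:] for row in canvas]      # kept for "restore to previous"
        palette = frame["palette"]
        for y, row in enumerate(frame["pixels"]):
            for x, index in enumerate(row):
                if index != frame.get("transparent"):
                    canvas[frame["top"] + y][frame["left"] + x] = palette[index]
        shown.append([row[:] for row in canvas])
        disposal = frame.get("disposal", 0)
        if disposal == 2:                        # restore area to background
            for y in range(len(frame["pixels"])):
                for x in range(len(frame["pixels"][0])):
                    canvas[frame["top"] + y][frame["left"] + x] = background
        elif disposal == 3:                      # restore to previous canvas
            canvas = before
        # disposal 0 and 1: leave the canvas as drawn
    return shown

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
frames = [
    {"left": 0, "top": 0, "pixels": [[0, 0]], "palette": [RED], "disposal": 1},
    {"left": 1, "top": 0, "pixels": [[1]], "palette": [(0, 0, 0), BLUE],
     "disposal": 2},
    {"left": 0, "top": 0, "pixels": [[0, 1]], "palette": [GREEN, (9, 9, 9)],
     "transparent": 1, "disposal": 0},
]
shown = render_frames(2, 1, frames)
print(shown)
```

The third frame shows why order matters: its second pixel is transparent, so what you see there depends on what the second frame's disposal left behind.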



Individual/overall palette

For an animated GIF, each frame can have its own optimized palette.

In theory, adaptive palette choice makes each individual frame as close to the original as possible, but this is only useful for slow-updating things, for example diagram sequences.

For short-interval, video-like GIF (probably the more common case), the same independent adaptiveness means every frame may have the same input color be somewhat different, which can look like continuous flickering.

As such, it may look better to calculate an overall palette based on all the input frames and use it for all frames. This means fewer colors, and more change from the original, but the animation will look smoother. Of course, this only works for shortish sequences that don't change scene too much - the more colors, the worse it'll look. (Note that it does mean you can omit the local palette)


GIFs that aren't GIFs

For over twenty years, GIF was popular because it was the only standard thing you could use to get animation, supported in web browsers early and widely and quite reliably.


Around the 2020s, sites have started to use 'GIF' to mean "looping animation, without (or with) audio", which are now frequently HTML5 video tags pointing at H.264 (MP4) or VP8 (WebM) video (sometimes multiple, in a fallback list, to help it play on more browsers) instead of GIF files.

This may be done by companies to save bandwidth on larger/longer animations (modern video codecs frequently produce smaller files if the content is indeed more videoey than it is nineties-cliparty).

But also, they seem to consider GIF to be a brand name, to attract people with. (when they feel the need to label it as GIF, it more likely isn't).

Such sites are likely to convert actual GIFs to video.


Purists may twitch at this, point out that it's confusing, that the restrictions of GIF are part of the charm, that throwing a refined video compressor at the problem means people will probably make larger files because they are unaware of this, that there are fewer places you can embed them, that browsers will have a heavier job, that the mixed support for the varied formats it may be means it's harder to save and reuse, that browser policies around video tags (e.g. the reason you want the magic incantation of autoplay loop playsinline muted for iOS) may break things more frequently and without warning (Apple have changed that multiple times - it may be wrong advice by now), that proper fallbacks to other formats take more knowledge or support fewer browsers, and that older browsers may not support them while supporting GIF well, whereas GIFs have just worked perfectly everywhere since the damn nineties.

My favourite so far is a GIF of an image using ordered dithering (ew), that was re-encoded in H.264 which doesn't deal with dithering well at all and looks terrible, because high pixel-to-pixel contrast is one of the last things that video codecs are made for. But who's keeping track of what we're throwing at what walls?


Yet in popular and functional use, you can do exactly the same, they will often look better, and usually work, so hey.


https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/replace-animated-gifs-with-video

WebP

Can be seen as

  • an improvement on JPEG
~25% smaller when lossy at similar perceptual quality
  • with some PNG-like features
~25% smaller when lossless
supports alpha channel.


Created by Google for the web. Mainly read by browsers, though support in some (e.g. Firefox, Edge, Safari) came later. Apple (Safari) in particular seemed to specifically avoid support for many years (and still only supports it under macOS 11.6 or later), so you should probably use it in an HTML5 <picture> with fallbacks, or with a JS polyfill, or such.

Animation was added later. Support was not immediate but is now there [14], also making it an alternative to GIF (different quality/size choices).
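The <picture> fallback idea can be sketched as a small template helper. This is only an illustration; picture_tag and the file names are hypothetical, and a real page would likely write the markup directly.

```python
from html import escape


def picture_tag(webp_url, fallback_url, alt):
    """Return <picture> markup offering WebP first, with a plain <img> fallback."""
    return (
        "<picture>\n"
        # Browsers that understand image/webp pick this source...
        f'  <source srcset="{escape(webp_url)}" type="image/webp">\n'
        # ...and everything else falls through to the ordinary <img>.
        f'  <img src="{escape(fallback_url)}" alt="{escape(alt)}">\n'
        "</picture>"
    )


if __name__ == "__main__":
    print(picture_tag("cat.webp", "cat.jpg", "A cat"))
```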

A bunch of image editors also open it.

Related to VP8 and WebM
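To make the lossy / lossless / alpha / animation variants a little more concrete: WebP is a RIFF container, and the first chunk's name tells you which flavour you have. A minimal sketch, assuming the chunk layout from the WebP container documentation (webp_kind is a made-up name, and this reads only the first few bytes, nothing more):

```python
def webp_kind(data):
    """Classify WebP data from its leading bytes; returns None if not WebP."""
    # Layout: 'RIFF', 4-byte size, 'WEBP', then the first chunk's FourCC.
    if len(data) < 16 or data[0:4] != b"RIFF" or data[8:12] != b"WEBP":
        return None
    fourcc = data[12:16]
    if fourcc == b"VP8 ":
        return {"coding": "lossy"}
    if fourcc == b"VP8L":
        return {"coding": "lossless"}
    if fourcc == b"VP8X":
        # Extended format: a flags byte follows the 8-byte chunk header.
        if len(data) < 21:
            return {"coding": "extended"}
        flags = data[20]
        return {
            "coding": "extended",
            "alpha": bool(flags & 0x10),
            "animated": bool(flags & 0x02),
        }
    return {"coding": "unknown"}
```

Usage would be something like webp_kind(open(path, "rb").read(32)).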



AVIF

A new (~2019) open, royalty-free image format (based on the AV1 video codec).


Comparable in idea to WebP.

Not widely supported yet[15], but expanding.


See also


HEIF

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


A container for images and image sequences, specified as MPEG-H Part 12.

Because it's basically an MPEG-style container, it can contain various other things, so is not really a singular format.

It can contain JPEG images, and images or video coded with H.264, HEVC, or AV1 (so apparently AVIF can be considered a specific case of HEIF?)
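One way to see the "same container, different contents" point is to look at the brands in the leading ftyp box, which is how ISO-BMFF-style files announce what they are. A minimal sketch, assuming the file starts with an ftyp box of ordinary (32-bit) size; ftyp_brands is a made-up helper, and the brand names in the usage note (heic, avif, mif1) are ones I know from the HEIF/AVIF specs rather than anything from the notes above.

```python
import struct


def ftyp_brands(data):
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box, or None."""
    if len(data) < 16:
        return None
    # Box header: 4-byte big-endian size, then the 4-byte box type.
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp" or size < 16 or size > len(data):
        return None
    major = data[8:12].decode("ascii", "replace")
    # Bytes 12..15 are the minor version; compatible brands follow, 4 bytes each.
    compatible = [
        data[i:i + 4].decode("ascii", "replace") for i in range(16, size - 3, 4)
    ]
    return major, compatible
```

An AVIF file would typically report avif as major brand, an HEVC-coded HEIF file heic, often with mif1 among the compatible brands.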


https://en.wikipedia.org/wiki/High_Efficiency_Image_File_Format

PBM, PNM, etc.

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


These files store uncompressed per-pixel values, either as ASCII numbers or in a simple binary form.

They are very easy for programs to write, and read, and arguably CPU-efficient as there is no decompression to do.

They are also some of the most space-inefficient, and can contain almost no metadata.

They are interesting

in simple scripts, in quick-'n'-dirty experiments
as an intermediate format to hand around between programs you duct-tape together
as a universal-ish intermediate format, e.g. as input to compressors, so that you can avoid those programs having to link with a dozen specific image reading libraries and the potential headaches of each.
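To show how little there is to writing one, here is a minimal sketch that produces 8-bit RGB PPM data in both the ASCII and the binary form (magic values P3 and P6, as listed further down). The function names are made up, and it assumes a maxval of 255.

```python
def ppm_ascii(width, height, pixels):
    """Return the text of an ASCII PPM; pixels is a row-major list of (r, g, b)."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    lines = ["P3", f"{width} {height}", "255"]
    # One pixel per line keeps the lines short.
    lines += [f"{r} {g} {b}" for r, g, b in pixels]
    return "\n".join(lines) + "\n"


def ppm_binary(width, height, rgb):
    """Return the bytes of a binary PPM; rgb is width*height*3 raw bytes."""
    if len(rgb) != width * height * 3:
        raise ValueError("byte count does not match width * height * 3")
    # Text header, a single newline, then the raw samples.
    return b"P6\n%d %d\n255\n" % (width, height) + bytes(rgb)
```

Writing the result to a file (text mode for the first, "wb" for the second) gives something most image viewers will open. Note the size: 3 bytes per pixel in binary, and up to 12 characters per pixel in ASCII.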


PNM is a collective name for:

  • PBM ('portable bit map'), two-color (1 bit per pixel)
  • PGM ('portable gray map') (typically 8 bits per pixel, 16-bit also seen in later versions(verify))
  • PPM ('portable pixel map'), color in the form of r,g,b triplets (typically 8 bits per channel, 16 also seen in later versions(verify))

They are often implicitly BT.709 coded, with gamma applied. Some programs allow specifying/assuming non-gamma'd form too.

The 16-bit variants imply some endianness decisions for the binary forms, which isn't standardized, though MSB-first (big-endian) is what Netpbm itself uses.
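A minimal sketch of writing a 16-bit binary PGM with the most significant byte first, which is the order the Netpbm format descriptions use. Other software may assume otherwise, so treat the byte order as an assumption to check against whatever reads the file; pgm16_binary is a made-up name.

```python
import struct


def pgm16_binary(width, height, samples):
    """Return the bytes of a binary PGM with 16-bit samples, MSB first."""
    if len(samples) != width * height:
        raise ValueError("sample count does not match width * height")
    header = b"P5\n%d %d\n65535\n" % (width, height)
    # '>' forces big-endian, 'H' is an unsigned 16-bit value.
    return header + struct.pack(">%dH" % len(samples), *samples)
```

Swapping ">" for "<" would give the little-endian variant that some programs apparently expect.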


PNM files start with two-byte magic, in ASCII

  • P1 - PBM, ASCII
  • P2 - PGM, ASCII
  • P3 - PPM, ASCII
  • P4 - PBM, binary
  • P5 - PGM, binary
  • P6 - PPM, binary
  • P7 - PAM, binary
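Reading the header is about as simple. A minimal sketch for P1 through P6 (PAM's P7 header looks different and is not handled here), assuming whitespace-separated decimal numbers and '#' comments in the header; the names are made up.

```python
PNM_MAGIC = {
    b"P1": ("PBM", "ascii"), b"P2": ("PGM", "ascii"), b"P3": ("PPM", "ascii"),
    b"P4": ("PBM", "binary"), b"P5": ("PGM", "binary"), b"P6": ("PPM", "binary"),
}


def read_pnm_header(data):
    """Return (format, coding, numbers, raster_offset) for P1..P6 data."""
    magic = bytes(data[:2])
    if magic not in PNM_MAGIC:
        raise ValueError("not a P1..P6 file (P7/PAM has a different header)")
    fmt, coding = PNM_MAGIC[magic]
    wanted = 2 if fmt == "PBM" else 3  # PBM has width and height but no maxval
    numbers, pos = [], 2
    while len(numbers) < wanted:
        # Skip whitespace, and '#' comments up to the end of their line.
        while pos < len(data) and (
            data[pos:pos + 1].isspace() or data[pos:pos + 1] == b"#"
        ):
            if data[pos:pos + 1] == b"#":
                while pos < len(data) and data[pos:pos + 1] not in (b"\n", b"\r"):
                    pos += 1
            else:
                pos += 1
        start = pos
        while pos < len(data) and data[pos:pos + 1].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("malformed header")
        numbers.append(int(data[start:pos]))
    # A single whitespace byte separates the header from the pixel data.
    return fmt, coding, numbers, pos + 1
```

The numbers are width, height and (except for PBM) maxval; for the binary forms, data[raster_offset:] is the raw pixel data.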


PAM ('portable arbitrary map') is a generalization of PNM, which adds depth and tuple type to the basic metadata, and only uses binary form.
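For comparison, a PAM header spells its fields out by name. A minimal sketch, assuming the header keywords from the Netpbm PAM description (WIDTH, HEIGHT, DEPTH, MAXVAL, TUPLTYPE, ENDHDR); pam_bytes is a made-up name.

```python
def pam_bytes(width, height, depth, maxval, tupltype, raster):
    """Return the bytes of a PAM file; raster holds the raw sample bytes."""
    # Samples take one byte up to maxval 255, two bytes above that.
    bytes_per_sample = 1 if maxval < 256 else 2
    if len(raster) != width * height * depth * bytes_per_sample:
        raise ValueError("raster size does not match the header fields")
    header = (
        "P7\n"
        f"WIDTH {width}\n"
        f"HEIGHT {height}\n"
        f"DEPTH {depth}\n"
        f"MAXVAL {maxval}\n"
        f"TUPLTYPE {tupltype}\n"
        "ENDHDR\n"
    ).encode("ascii")
    return header + bytes(raster)
```

For example, depth 4 with tuple type RGB_ALPHA stores an alpha channel, which the plain PNM formats cannot.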


The Netpbm package deals with:

  • PNM (see above), and
  • PAM, a more general version of PBM-style formats, which can store everything the PNM formats can, plus some more options.



DjVu

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

DjVu is a free and open file format.

It has different codecs for bitonal and gradient images, and applies well to high-resolution scanned text, manuscripts, magazines, line art, and such, for which it reaches noticeably higher compression than most general-purpose/photo image formats (and/or, at the same size, better legibility than various other formats).

It can optionally contain a text layer (often OCRed).

This combination of things makes it a useful format for archiving images of books.


See also:


Layers and types

Documents and document pages are layered/composited (see also ITU-T T.44), and each layer can be coded in a different way. In some cases, those layers may have come from different color channels of the same original image.

DjVu's coders are probably best at two-tone images, such as text. The separation between background layer(s), foreground layer(s), and masks helps: it allows e.g. highly compressible two-color coding for the text and a different choice for, say, the texture of a book's paper.


Images and documents

DjVu images are valid single-page DjVu documents.

Multi-page DjVu files are either Bundled (into a single file), or Indirect (where the main document is an index pointing to individual DjVu document files, which is handier when serving individual pages sparsely, a little less handy for sending things around).
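You can tell these apart from the first bytes of the file. A minimal sketch, under the assumptions (from my reading of the DjVu3 format description and DjVuLibre, so verify before relying on it) that files start with AT&TFORM, that the form type is DJVU for a single page and DJVM for multi-page, and that the high bit of the first byte of the DIRM chunk marks a bundled document. djvu_kind is a made-up name.

```python
def djvu_kind(data):
    """Guess the DjVu document type from leading bytes; None if not DjVu."""
    # Layout: 'AT&T', 'FORM', 4-byte size, then the 4-byte form type.
    if len(data) < 16 or data[:8] != b"AT&TFORM":
        return None
    form = data[12:16]
    if form == b"DJVU":
        return "single page"
    if form != b"DJVM":
        return "other"
    # Multi-page: the directory chunk (DIRM) is expected to come first.
    # Its first data byte sits after the 4-byte name and 4-byte length.
    if len(data) > 24 and data[16:20] == b"DIRM":
        return "bundled" if data[24] & 0x80 else "indirect"
    return "multi-page"
```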



Creating

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Command examples

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Viewing, converting

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
  • Browser plugins
  • Separate viewers WinDjView and MacDjView [16]
  • There are also various viewers for PDAs, phones and such


Conversion:

  • CGI on-the-fly conversion
  • If you have a to-PDF printer such as Adobe's, PrimoPDF, doPDF(?) or similar, you can create PDFs using any DjVu viewer (or anything else) that can print.

WMF, EMF, EMF+

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
  • WMF (Windows MetaFile): Primarily calls into Windows's GDI[17] library
  • EMF: more commands than WMF
  • EMF+: calls into GDI+ (newer version of GDI)
  • .wmz is gzipped wmf
  • .emz is gzipped emf
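Since the compressed variants are plain gzip, unpacking them needs nothing format-specific. A minimal sketch (unpack_metafile is a made-up name):

```python
import gzip


def unpack_metafile(compressed):
    """Return the WMF/EMF bytes stored inside .wmz/.emz data."""
    # gzip streams start with the bytes 1f 8b; anything else is not a .wmz/.emz.
    if compressed[:2] != b"\x1f\x8b":
        raise ValueError("not gzip data")
    return gzip.decompress(compressed)
```

Usage would be along the lines of open("out.wmf", "wb").write(unpack_metafile(open("in.wmz", "rb").read())).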


Since they are mostly arguments to a known drawing API, they can be seen as code that will be executed, and there have been several security exploits of the API using these image files.

The format is not very common, though various programs can still export it, and it's seen in contexts like clipart libraries.


Options to load/convert on non-Windows systems:

  • libwmf allows rasterization, and has some command line conversions, like wmf2eps and wmf2svg
  • OpenOffice imports it
  • UniConvertor [18]
  • inkscape? (depends on wmf2svg from libwmf?)


RAW photo formats

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


There seems to be a different RAW format for each camera manufacturer, some more recognizable than others, most only handled fully by bundled software, and some of the more professional image/photo tools.

There's also Adobe's DNG ('Digital NeGative'), meant to be a more portable format than the manufacturer specific ones. It looks like it was never quite as easy as it should have been, and so has not caught on in a big way.

Many are readable through conversion libraries (like libraw or dcraw), though often not with full understanding/preservation of metadata.


Most of these formats are a variation on TIFF (structure-wise, not pixel-data-wise), but tend to be more restricted than general TIFF (presumably because TIFF is such a flexible format that a full parser is annoying to write).
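The TIFF-style structure is easy to poke at. A minimal sketch that reads the TIFF header and lists the tag numbers in the first IFD, assuming a standard header (byte order mark, magic number 42, offset to the first IFD). Some RAW variants may use a different magic number, in which case this simply rejects them. tiff_first_ifd_tags is a made-up name.

```python
import struct


def tiff_first_ifd_tags(data):
    """Return the list of tag numbers in the first IFD of TIFF-structured data."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("no TIFF byte order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a standard TIFF magic number")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = []
    for i in range(count):
        # Each IFD entry is 12 bytes: tag, type, count, value/offset.
        entry = ifd_offset + 2 + 12 * i
        (tag,) = struct.unpack(endian + "H", data[entry:entry + 2])
        tags.append(tag)
    return tags
```

In baseline TIFF, tag 256 is ImageWidth and tag 257 is ImageLength, so checking whether 257 is in the returned list is a quick way to spot one kind of non-compliance.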

Aside from possibly storing image data in somewhat unusual ways, they may also not strictly be compliant TIFFs (e.g. Sony's ARW does not store ImageLength, which TIFF6 requires).

Unsorted

LDF: Bi-tonal format

WBMP (Wireless Bitmap) - monochrome, 1 bit per pixel data.

JPEG 2000