Video format notes

From Helpful
Jump to: navigation, search
The physical and human spects dealing with audio, video, and images

Vision and color perception: objectively describing color · the eyes and the brain · physics, numbers, and (non)linearity · color spaces · references, links, and unsorted stuff

Audio physics and physiology: Basic sound physics · Human hearing, psychoacoustics · Descriptions used for sound and music

Digital sound and processing: capture, storage, reproduction · programming and codescs · some glossary · Audio and signal processing - unsorted stuff

Image: file formats · image processing

Video: format notes · encoding notes

Noise stuff: Stray signals and noise · sound-related noise names · electronic non-coupled noise names · electronic coupled noise · ground loop · strategies to avoid coupled noise · Sampling, reproduction, and transmission distortions

Unsorted: Signal analysis, modeling, processing (some audio, some more generic)

For more, see Category:Audio, video, images

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.

This page is mostly about storage of video, and variation therein. It also touches on some video capture.

For Notes on encoding video, see that.


Digital video (files, streaming)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

This is meant primarily as a technical overview of the codecs in common and/or current use (with some historical relations where they are interesting, or just easy to find), without too many details; there are just too many old and specialist codecs and details that are not interesting to most readers.

Note that some players hand off reading/parsing file formats to libraries, while others do it themselves.

For example, VLC does a lot of work itself, particularly using its own decoders. This puts it in control, allowing it to be more robust to somewhat broken files, and more CPU-efficient in some cases. At the same time, it won't play unusual files as it won't perfectly imitate other common implementations, and it won't be quite as quick to use codecs it doesn't know about; in these cases, players that hand off the work to other things (such as mplayerc) will work better.

Container formats

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Containers are file types that usually allow various streams of various types and using various codecs

Some relatively general-purpose container formats include:

AVI (Audio Video Interleave)

A (a RIFF derivative; see also IFF), and fairly common, though not ideal for MPEG-4 video tracks, VBR MP3 audio tracks, and some other things, as this older format does not really allow this without some hacks, that may be convention but not standard. Many AVIs in the wild violate the AVI standard - but play fine on most (computer) players.


  • Files with the .divx extension are usually AVIs (...containing DivX video)
  • Google Video (.gvi) files use MPEG-4 ASP and MP3 in a mild variant on AVI container [1] (and do not really exist anymore)

MKV (Matroska Video)

An open standard, preferred by some, as it a fairly well designed many-stream format, and because it allows subtitle embedding, meaning you avoid hassle related to external subtitle files.


Ogg is a container format - see also Ogg notes - and an open standard.

Extension is usually .ogg, or .ogm, though .ogv, .oga, and .ogx are also seen.

Note that initially, ogg often impied Ogg Vorbis: Ogg containers containing Vorbis audio data.

Ogg Media (.ogm) is an extension of Ogg, which supports subtitle tracks, audio tracks, and some other things that make it more practical than AVI, and put it alongside things like Matroska.

Ogg Media is not really necessary and will probably not be developed, in favour of letting Matroska become a wider, more useful container format.(verify)


See MPEG notes


A number of container formats support only a limited number of codecs (sometimes just one), particularly if they are proprietary and/or specific-purpose.

Such container formats include:

  • Flash video (.flv) [2]
  • NUT (.nut), a competitor to avi/ogg/matroska [3]
  • Quicktime files (.mov) are containers, though without extensions to quicktime, they support relatively few codecs. In recent versions, MPEG-4 was added.
  • ASF (Advanced Systems Format), a proprietary format from Microsoft, most commonly storing wma and wmv content, and sees little other use in practice (partly because of patents and active legal protecting). [4]
  • RealMedia (.rm)
  • DivX Media Format (.dmf)

Fairly specific-purpose:

  • Digital Picture Exchange (.dpx) [6]
  • Material Exchange Format (.mxf) [7]
  • Smacker (.smk), used in some video games [8]
  • Bink (.bik), used in some video games [9]
  • ratDVD


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

tl;dr: data in MPEG-2 PS, some restrictions, and some DVD-specific metadata/layout around it.

A VIDEO_TS directory with VOB, IFO, and BUP files are, in a fashion, a container format as they are the DVD-Video way of laying out:

  • metadata about steam data (chapters, languages of tracks, angles, etc.)
  • Video streams (usually MPEG-2, sometimes MPEG-1)
  • Audio streams (AC-3, MPEG-1 Layer II (MP2), PCM, or DTS)
  • Subtitle streams (bitmap images)

(note: The AUDIO_TS directory is used by DVD-Audio discs, which are fairly rare. On DVD-Video discs, this directory is empty, and the audio you hear is one of the streams in the VOBs.)

IFO stores metadata for the streams inside the VOB files (e.g. chapters; subtitles and audio tracks). BUP files are simply an exact backup copy of the IFO files (to have a fallback for a scratched DVD).

VOB files are containers based on MPEG-2 PS, and store the audio, video, and image tracks.

VOB files are segmented in files no larger than 1GB, which was a design decision meant to avoid problems with filesystem's file size limits (since the size of a DVD was larger than many filesystems at the time could deal with).

DVD players are basic computers in that they run a virtual machine. DVD-Video discs with menus run bytecode on that, although most such code is pretty trivial if you consider the potentially flexibility of the VM -- there are a few DVD games, playable by any DVD player.

See also:

Stream identifiers (FourCCs and others)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

When container formats can store more than one video codec, they want to be able to indicate the format (codec) used in each stream.

For example:

  • AVI uses FourCCs, a sequence of four bytes used in AVI and a few others - usually four printable ASCII characters
  • MPEG containers mostly just contains MPEG video (...but there're a bunch of details to that)
  • Matroska (mkv) uses another system, CodecID, a flexible-length string.
  • Ogg doesn't have an identifier system, instead asking all available codecs whether they can play the data given to them (initially just the first frame from a stream).

Video codecs

H.26x family (related to MPEG and ITU standards. H.something is the ITU name):

  • H.261, a format made for videoconferencing over ISDN. Came before the more widely used H.263 [10]
  • H.262, which is identical to part of the MPEG-2 standard
  • H.263: for videoconferencing (seen used in H.323).
    • See also [11]
    • Also the base of various other codecs, including:
      • VIVO 1.0, 2.0, I263 and other h263(+) variants
      • Early RealVideo
      • Sorenson (including early Flash video)
        • Sorenson 1 (SVQ1, svq1, svqi)
        • Sorenson Spark (Used in Flash 6, 7 and later for video)
        • (Sorenson 3 (SVQ3) was apparently based on a H.264 draft instead)
  • H.264, a.k.a. MPEG-4 AVC, MPEG-4 Part 10
    • FourCC depends on the encoder (not too settled?).
      • ffmpeg/mencoder: FMP4 (which it also uses for MPEG-4 ASP, i.e. DivX and such. It seems this is mostly meant to send these files to ffdshow(verify), but not all players understand that)
      • Apple: avc1
      • Various: H264, h264 (verify)
      • Some: x264 (verify)

  • MPEG-4 part 2, a.k.a. MPEG-4 ASP
    • DivX, XviD, and many versions, variants, and derivatives
    • FourCC: [12] mentions 3IV2, 3iv2, BLZ0, DIGI, DIV1, div1, DIVX, divx, DX50, dx50, DXGM, EM4A, EPHV, FMP4, fmp4, FVFW, HDX4, hdx4, M4CC, M4S2, m4s2, MP4S, mp4s, MP4V, mp4v, MVXM, RMP4, SEDG, SMP4, UMP4, WV1F, XVID, XviD, xvid, XVIX

(See also MPEG4)

  • WebM
    • VP8 or VP9, plus Vorbis or Opus, in Matroska
started by Google after acquiring On2
supported by all modern browsers (like H.264)
open, also royalty-free (unlike some parts of MPEG4)
Quality is quite comparable to H.264

On2 (Duck and TrueMotion also refer to the same company):

  • VP3 (FourCC: VP30, VP31, VP32): [13]. Roughly in the same class as MPEG4 ASP. Open sourced.
  • VP4 (FourCC: VP40) [14]
  • VP5 (FourCC: VP50): [15] [16]
  • VP6 (FourCC: VP60, VP61, VP62): Used for some broadcasting [17] [18]
  • VP7 (FourCC: VP70, VP71, VP72): A competitor for MPEG-4 [19] [20]

Xiph's Theora codec is based on (and better than) On2's VP3 [21]

Flash video [22] (preferred first, will play list) (verify)

    • in Flash 6: Sorenson Spark (based on H.263)
    • in Flash 7: Sorenson Spark
    • in Flash 8: VP6, Sorenson Spark [23]
    • in Flash 9: H.264, VP6, Sorenson Spark (and understands MP4, M4V, M4A, 3GP and MOV containers)
    • in Flash 10: (verify)

RealVideo uses different names internally and publicly, some of which are confusable:

  • RealVideo (FourCC RV10, RV13) (based on H.263)
  • RealVideo G2 (fourCC rv20) used in version 6 (and 7?) (based on H.263)
  • RealVideo 3 (FourCC rv30) used in version 8 (apparently based on a draft of H.264)
  • RealVideo 4 (FourCC RV40, and also UNDF) is the internal name/number for the codec used in version 9. Version 10 is the same format, but the encoder is a little better.
  • The H.263-based versions (up to and including 7) were not very impressive, but versions 9 and 10 are quite decent. All are proprietary and generally only play on RealPlayer itself, unless you use something like Real Alternative.


  • Windows Media Video [24], often in .wmv files (which are asf containers)
    • version 7 (FourCC: WMV1) (based on MPEG-4 part 2)
    • version 8 (FourCC: WMV2)
    • version 9 (FourCC: WMV3)
  • RTVideo [25]
  • VC-1 [26]


  • Quicktime [27]
    • 1: simple graphics, and RPZA video [28]
    • 5: Added Sorenson Video 3 (H.263 based)
    • 6: MPEG-2, MPEG-4 Part 2 support. Later versions also added Pixlet [29] [30]
    • 7: H.264/MPEG-4 AVC, better general MPEG-4 support
  • Internal formats like 'Intermediate Codec' [31] and ProRes [32]

Dirac [33] is a new, royalty-free codec from the BBC, and is apparently comparable to H.264(verify).

Older formats:

  • Flic (.fli, .flc), primarily video-only files used in Autodesk Animator [34]
  • Intel Indeo:
    • Indeo 2 (FourCC: RT21) [36]
    • Indeo 3 (FourCC: IV31 for 3.1, IV32 for 3.2) [37]
    • Indeo 4 (FourCC: IV40, also IV41 for 4.1) [38]
    • Indeo 5.0 (FourCC: IV50) [39]
  • MJPEG is mostly just a sequence of JPEG images (FourCC: AVDJ, AVID, AVRn, dmb1, MJPG, mjpa, mjpb). [40] [41] (There are also some variations on this theme)
  • Various RLE-like formats, used primarily for very simple animations


  • Uncompressed Raw YUV [42]
  • Compressed YUV, e.g.
    • HuffYUV (lossless, and easily over 20GB/hour)
  • RawRGB (FourCC: 'raw ', sometimes 0x00000000) [43]
  • Hardware formats: (verify)
    • AVID
    • VCR2
    • ASV2

See also:

Pixel/color formats (and their relation to codecs)

Streaming, streaming support protocols

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

See Streaming audio and video


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Hardsubs are a jargon term that refers to subtitles that are mastered to be directly part of the video. They have no special status, mask any actual video. This avoids all support issues, and usually looks good, but they give no choice of language, or whether to display the subtitles or not.

Softsubs refer to separate subtitle data, historically often as a separate file with the same name and a different extension, and more recently as a part of container formats which support multiple streams (such as MKV), which can also store multiple different subtitles (e.g. languages) at once.

There are a number of formats, and not all file extensions are very obvious. Particularly things like .sub and .txt may be one of various formats.

Formats include:

  • Plain-text:
    • Various, but most importantly:
    • SRT (SubRip [44] [45]) - very simple (but not too well standardized, and some advanced features are not well handled by some players/editors)
  • Subtitle editors' internal formats (text, binary, xml, other), some of which became more widely used:
    • SSA (SubStation Alpha, software of the same name) [46] [47]
    • ASS (Advanced Substation Alpha, aegisub) - an extension of SSA ([48])
    • AS5 (ASS version 5) [49]
    • JacoSub [50] [51]
    • XombieSub [52]
    • AQTitle(verify) [53]
    • Turbotitler (old) [54]
  • Image-based: (avoids font problems, larger)
    • VOBsub (.sub and .idx) are subtitle streams as used by DVD (verify)
    • MicroDVD (.sub), specific to the MicroDVD player [55]
    • PRS (Pre-rendered subtitles) stores (PNG) images [56]
  • XML-based
    • USF (Universal Subtitle Format) [57] [58], and XML format that is not very common outside of Matroska containers
    • SSF (Structured Subtitles Format) is a newer XML-based format (apparently with no current major interest or support [59])
  • Other/unsorted (and other internal formats):
    • SAMI (.smi) [60], often used in Korea
    • DVD-based/derived (CC, SPU, VobSub)
    • Karaoke formats (.lrc, .vkt, )
    • MPSub (.sub), a format internal to mplayer [61]
    • MPEG-4 Timed Text [62]
    • Power DivX (.psb) [63]
    • ViPlay Subtitle File (.vsf)
    • Phoenix Japanimation Society (.pjs) [64] (old(verify))
    • Subsonic (.sub) [65]
    • ZeroG (.zeg) [66]
    • Adove Encore (.txt) [67]
    • MPL2 [68]
    • VPlayer [69]
    • Sasami Script (.s2k)
    • SubViewer (verify)
    • RT (verify)
    • DVB (verify)
    • Teletext (verify)
    • LRC[70] (Lyrics, meant for lyrics/karaoke on audio)
    • TTML [71]
    • ARIB (Association of Radio Industries and Businesses)

Editors and other utilities:

Player support

Frame rate, analog TV format, and related

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(I'm not sure about all this - there is so much fuzzy and/or conflicting information out there)

Frame rates

Movies on media like DVD come in different frame rates. This does not matter to computer playability, so unless you are compressing or converting video, you probably want to ignore this completely.

Common rates

Some of the more common rates seem to be:

rate common uses / suggests source also referred to as approx
24 (exactly) used to shoot most film, and used in most cinema projection film
24000/1001fps usually an intermediate in conversion from film to NTSC color 'NTSC film' 23.976,
25 (exactly) Speed of rasters transferred (not shown frames) in broadcasts such as PAL (except PAL M) and SECAM. 'PAL video',
30000/1001 the speed of rasters transferred (not shown frames) in (interlaced) broadcasts such as NTSC M (the most common NTSC form) and also also PAL M. Pre-1953 NTSC broadcasts was exactly 30.0fps)}} 'NTSC video' 29.97
30 (exactly) Apparently the black and white variant of NTSC was exactly 30, and 30000/1001 was the hack upon that (verify). Exactly-30fps content is relatively rare(verify), because it's either pre-1953 NTSC TV, or modern digital things that just chose this(verify).

50 (exactly) Can refer to 50 frame per second progressive, or 25 frame per second interlaced that is being played (and possibly deinterlaced) as its 50 contained fields per second (as e.g. in PAL and SECAM TV ((except PAL M)) 'PAL film', 'PAL video', 'PAL field rate'
60000/1001 (verify) The field rate of NTSC color. Can refers to NTSC color TV that is transferring interlaced rasters. 'NTSC field rate'

These are the most common, but other rates than these exist. For example, there is double rate and quad rate NTSC and PAL (~60fps, ~120fps; 50fps, 100fps), often used for editing, or e.g. as intermediates when converting interlaced material.

A framerate hints at the source of the video (24 is typically film, ~25 is often PAL broadcast, 30000/1001 is typically NTSC broadcast) and/or the way it is played (e.g. 50 and 60000/1001 usually means analog TV content, and possibly interlaced). There's a bunch of cases where you can't be sure, because there are some common conversions, e.g. 24fps film converted to 25fps and 29.97fps for broadcast and DVDs. And be careful of assumptions about interlaced/progressive/telecined. Note also that DVDs can mix these.

Movies still mostly uses 24fps, primarily because we're used to it. It looks calmer, and we associate 24fps with higher quality, partly because historically higher framerates remind us of home video and its associated cheapness (also with technical uses like interlaced sports broadcasts).

(Of course, these associations are also entangled with camerawork and other aspects, so it's not quite that simple. It's partly silly, because TV broadcast of 24fps material necessarily involved some less-than-perfect conversions)

On 1001 and approximations

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

NTSC made things interesting.

NTSC existed since 1941 was then exactly 30fps, had no color, and fit in 4.5MHz of usable radio frequency.

In 1953 it was replaced by what we now call NTSC color. Because people wanted it to be backwards compatible with the black and white televisions of the time, allowing one transmission to serve both, it was designed to send separate luminance (black and white) roughly as before, and separate chrominance information.

...but in the same frequency band, so it would have to overlap the luminance. Minimizing the interference between the two, with the method they chose, meant some math that required that columns, rows, or frames per second had to be shifted a little.

They chose to settle columns and rows to 525 and 286, and fudge frames per second, meaning that that number is

4500000Hz / (525*286)

which happens to simplify to


Which is approximately


So, NTSC color broadcast is never 30fps, it's always 30000/1001.

The approximation 29.97 is inaccurate, though by so little that getting this wrong that it only becomes visually apparent after an hour or two.

It can matter when processing content where NTSC is involved somehow, such as going between PAL and NTSC hardware/broadcast, so matters to professional transcoding. For typical-youtube-length videos you won't notice.

(Similarly, 23.976 for 24000/1001 (happens in film-to-NTSC conversion) is also slightly off, 23.97 and 23.98 more so. (verify)

(Since the same trick wasn't necessary for PAL, PAL is always 25fps precisely.)

(Also, film to PAL is often just played that ~4% faster.)

In most cases, the difference between the fraction and its approximation is tiny, but will be noticeable over time, in timecode (and probably audio sync, depending a little on how the audio is synced).

You can fix NTSC's timecode issue with Drop-Frame (DF) timecode.

Timecode counts in whole frames, so from its perspective the difference between 29.97fps instead of 30fps is .03 missing frames per second - i.e. (30*60*60 - 29.97*60*60=) 108 frames per hour.

Drop Frame Timecode (DF) skips two of the frames from just its own count (these frames do not exist), from nine out of every ten minutes. This happens to work out as exactly (2 frames * (6*9) applicable minutes =) 108.

Common constraints

When video is made for (analog) broadcast, it is very much constrained by that standard's color and frame coding, and more significantly in this context, its framerate.

When video is made for film, it is limited by projector ability, which has historically mostly been 24fps.

When it is made for NTSC or PAL broadcast, it is usually 30000/1001 or 25, respectively.

Computers will usually play anything, since they are not tied to a specific rate. Even though monitors have a refresh rate, almost all common video will be slower than, meaning you'll see all frames anyway. Computer playing is often synchronized to the audio rate).

Common conversions

Conversion from 24fps film to 25fps PAL broadcast rate can be done by playing the frames and audio faster by a factor 25/24, either letting the pitch be higher (as will happen in analog systems) or the same pitch by using digital filters (optional in digital conversion).

Few people will notice the ~4% difference in speed, pitch or video length.

This does not work for NTSC as the difference is ~25%. Instead, telecining is often used, though it also makes film on TV a little jumpier for the 60Hz/30fps countries (including the US).

arguments for 60fps / 60Hz (e.g. gaming)

On reproduction that flashes
When it doesn't flash -- the fps part

Interlacing, telecining and such

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Progressive means that each frame is drawn fully on screen, and that frames are drawn in simple sequence.

Seems very obvious. Best understood in contrast with interlacing and telecining (pixel packing can also matter somewhat).

Constraints in aether bandwidth and framerates are the main reasons for interlacing and telecining. Computers are not as constrained in those two aspects as broadcasting standards and CRT displays are, and as such, other factors (such as the compression codecs) tend to control the nature of video that is digitally available. This is one reason that much of it is progressive.


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Interlacing comes from a time where CRT screens were new, TV broadcast was already a thing, and engineers worked to squeeze more video out of fixed bandwidth.

Interlacing refers to a method where at you update every other line of physical scanlines, and then the ones you left out; repeat.

For example, given 25 full-screen rasters per second, you use half of it, then half of it, for 50fps half-updates (PAL frame rates, because I'm as biased as anyone).

It considers that TV phosphors that are relatively slow to fade out their color, and that the human visual system is less sensitive to flickering details than to flickering areas, and so this happens to be one of the least-noticeable ways of updating the screen at what looks like twice the speed, while TVs that didn't know about this would still do an entirely sensible thing too. So it was genuinely clever, given the constraints of the time.

The broadcaster had the option of doing either. Conceptually, they could now:

  • send 50fps of images that are each half vertical resolution
which looks faster, so great for things like sports broadcasts
  • send a 25fps source, essentially building up each full frame from two halves
gives more details
and would be the logical choice e.g. when the source is 24fps film

Note that at lower level, you could consider

both to be 25 full-screen rasters per second,
both to be 50 updates per second,

and the only real difference to what you end up seeing is what they contain image-wise.

Why do it, and why not?

The main reason to do interfacing is to add speed, or the option for speed, within fixed bandwidth which was also already settled in how frames were transferred.

Which was absolutely true for analog TV - not so much since.

There is even something to be said for interlacing when you don't have such constraints. It's a simple compression-like scheme, that works well enough for video content that shows mostly large-ish details and is predictable enough. You can have deinterlacing make specific assumptions to estimate the original - an estimate that will be better than just doubling the lines, yet smaller than the original data. Advanced de-interlacing algorithms, supported by newer and faster hardware (and making assumptions that are true for most video), can bring quality back to levels that are surprisingly near the original, for most but not all content.

One good reason against interlacing is that it is inherently lossy.

Another is that when digital video compression is involved, you should probably leave such details up to the compressor. In fact, most deal poorly with interlaced data (unless they are expecting it).

While with SDTV broadcast, interlacing was just the way it worked for archaic reasons, with HDTV it's an option. HDTV does see use of interlaced programs, and HD camcorders still choose to do it, and for exactly the same reasons of saving bandwidth (now network / storage, instead of RF).

But it's less pressing, because while it does save space/bandwidth, it's not half, and there's compression anyway.

Also, when your job involves editing and/or compressing video, interlacing means extra steps, extra processing, and extra chance of specific kinds of artefacts.

Some more gritty

Note that depending on how (and where in the chain) you look at the content, you could refer to interlaced content's rate either by the tranferred rasters or the transferred frames - the transferred framerate or the shown framerate (30000/1001 or 60000/1001 for NTSC, 25 or 50 for PAL). This leads to some confusion.

From this perspective of displaying things, each image that is received/read contains two frames, shot at slightly different times, which is updated on the display at different times.

This is not the only relevant view. Digital video will often not see interlaced video as having a doubled framerate, but instead have each captured frame contain two (chronologically adjacent) half updates. This makes sense as this is all the data that is displayed and avoids having to spend a lot of bandwidth -- but note that this basically stores interlaced video in progressive frames, which looks worse than progressive original on TV because while you're showing the same content, you're doing so at half the update speed again, which also means the interlacing lines are more apparent once captured than on TV.

For this reason, de-interlacing is often a good idea, which actually refers to a few different processes that make the video look better.

Interlacing in general and in analog sytems general happens at the cost of display artifacts under particular conditions: while interlacing is usually not particularly visible on TVs, specific types of resulting problems are brought out by things such as fast pans (particularly horizontal movement), sharp contrast and fine detail (such as small text with serifs, computer output to TV in general), shirts with small stripes (which can be lessened with some pre-broadcast processing).

Interlacing is also one of a few reasons that a TV recording of a movie will look a little less detailed than the same thing on a (progressive) DVD, even when/though DVDs use the same resolution as TVs (other reasons for the difference are that TV is analog, and involves various lossy steps coming to your TV from, eventually, the original film material).

For notes on telecined content, see below. For now, note also that interlacing is a technique applied at the playing end (involves taking an image and playing it as two frames), while telecining just generates extra frames initially, which play progressively (or have inverse telecine applied to it; see details).

Note that when a storage format and player are aware of interlacing, it can be used smartly again. For example, DVDs may mix progressive, telecined, and interlaced behaviour. The content is marked, and the player will display it accordingly. Interlaced DVD content is stored in the one-image-is-two-frames way, and the player will display it on the TV in the double-framerate-half-updates way described above.

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Deinterlacing takes interlaced material and produces a progressive-scan result.

Often applied to reduce interlacing's visual artifacts, particularly jagged edges (a.k.a. saw tooth edge distortion, mice teeth, combing, serrations), less noticeable, for display or for (re)compression (as lossy video compression that isn't counting on interlaced video deals with it quite badly).

Note that in some cases deinterlacing reconstructs the original. In most cases it is an inherently lossy process, in that it throws away data and isn't exactly reversible - but may be worth it perceptually.

There are also smarter and dumber ways of doing deinterlacing (most of those detailed below are the simple and relatively dumb variants), and the best choice depends on

  • whether you are doing it for display or storage/(re)compression
  • whether you displaying on a CRT (phosphor and scanline) or something else with output restrictions, or a digital display (which tends to have few restrictions)
  • the nature of the video before interlacing (are the two fields camptured at different times or not?)

You may like to know that:

  • Analog TV has to adhere to broadcast standards from the 1930s and is interlaced
    • ...but whether the two fields in a frame are from different times varies
    • in PAL countries most films are not shown interlaced - the two fields come from the same film frame (the 25/24 discrepancy is fixed by speeding up the film that much)
    • in NTSC countries, film is likely to be telecined (similar to interlacing)
    • sports and such is taken at 50/60fps and sent as that many fields (so shown as that many interlaced half-frames), as it looks smoother.
  • different types of camcorders may store half-height interlaced video, or may store progressive frames (depends on goal an quality).
  • TV capture cards usually take the TV signal as-is, so tend to return still-interlaced video (verify)

For the examples below, consider two video sources transmitted through interlaced broadcast:

  • video A: a film at 25 frames per second. Two two fields complement each other for the exact original frame
  • video B: sports footage, or video from an interlacing camcorder. The 25 frames per video second has its two fields from video taken 1/50th of a second apart.

The text below will mix 'field per second' and 'frame per second' (and sometimes fps for progressive, where there is no difference), so pay attention and correct me if I get it wrong :)


Weaving creates progressive frames by showing both fields at the same time, still interlaced.

This can also be described as 'doing nothing', other than copying each line from the alternate frame. It's simple to do, and it's fast. Technically it retains all the video data, but in a way that doesn't look good in various practice.

Weaving video A means we reconstruct the video at its original resolution and frame rate, and can show it as such digitally.

Weaving video B means we construct a 25fps video with jagged lines (when there is fast movement). On digital display this tends to be more noticeable than on CRTs, so for some purposes you would be better off with any deinterlacing that tries for 50fps output. In particular video compression, as most codecs don't deal so well compressing such one-pixel details, so this will typically give lower-quality encodes (for same-sized output).

Digital capture from interlaced media (e.g. a TV capture card) will often capture in a way that effectively weaves frames, which is why you get the jagged-movement effect if the video is never deinterlaced.


When displaying video on a device with too little processing power, an easier-to-code and faster method is discarding every second field (a.k.a. single field mode) and drawing lines from the other twice (line doubling), to get output that has the same size and same framerate as the original.

On video A we throw away half the vertical detail (and produce 25fps video).

On video B we throw away half the vertical detail and half the time material as well (and produce 25fps video).

Compare with bob. (basically, discard is half a bob)


You can use all the data by blending (a.k.a. averaging, field combining) both fields from a frame into a single output frame.

For video A, you would produce an (unecessarily) blurred version of the original. Weave may be preferable in various cases.

For video B you would create a 25 frame per second version with less jagging, but motion will have a sort of ghosting to it. Better than weave in that the jagging isn't so obvious, but everything will be blurred.

(Note that sizing down video has a similar effect, and can even be used as a quick and dirty way of deinterlacing if software does not offer any deinterlacing options at all)


Bob, Bobbing (also 'progressive scan', but that is an ambiguous term) refers to taking both fields from the frame and displaying them in sequence.

Video A would become line-doubled 50 frames per second. Stationary objects will seem to bob up and down a little, hence the name. You're also doubling the amount of storage/bandwidth necessary for this 25fps video while reducing its vertical detail.

Video B would be shown with its frames at its natural fluid 50 frames per second (note that most other methods would create 25 frame per second output). (note that if you take the wrong lines first, you'll put the in the wrong order and the video looks a bit twitchy)

Bob-and-weave and optional cleverness

Bob and weave refers to the combination of bob and weave.


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Telecine is a portmanteau of 'television' and 'cinema' and refers to the process of converting video between these -- though its methods are more widely applicable than that.

It is perhaps most often mentioned inthe context of frame-rate conversions from film (often 24fps) to NTSC television (30fps, adding intermediate frames) or PAL television (25 frames/s, often by playing the frames at 25fps and the audio a little faster)

Frame rate conversion from 24fps film to 30000/1001 broadcast NTSC is usually done using 2:3 pulldown, which uses a constant pattern of interlace-like half-updating frames and some duplication to end up with ~6 additional intermediate frames per second. Since you end up with 30 frames for a second, which look approximately the same as the 24fps original, the audio speed can stay the same. It still means leaving some frame content on-screen longer than others, which is visible in some types of scenes. For example, a slow smooth camera pan would be shown with a slight judder after telecining.

Telecining is inversible, in that you can calculate the original frames from a stream of telecined frames (though when telecined content is spliced this may mean a dropped frame or two).

When the display is not bound to a 30fps refresh rate, you could apply inverse telecine to yield the original 24fps content - which can make sense whenever you can play that back at that rate, since it removes the judder. Inverse telecine is also useful when (re)compressing telecined video, since many codecs don't like the interlace-like effects of telecining and may deal with it badly.

Hard telecining' refers to storing telecined content e.g. 30000/1001 for NTSC generated from 24fps film content, so that the given content can be played (progressively) on NTSC equipment. The upside is that the framerate is correct and the player doesn't have to anything fancy, the downside is that it usually has negative effect on quality.

Soft telecining refers to storing video using the original framerate (e.g. 24000/1001 or 24fps film) and flagging the content to be telecined, so that the eventual player (e.g. box-top DVD players) will telecine it to 30000/1001 on the fly. NTSC DVDs with content originating from 24fps film usually store 24000/1001fps(verify) progressive, with pulldown flags set so that the DVD player will play it as 30000/1001fps.

See also:

Mixed content

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Note that (particularly digitally stored) video may contain mixes of different frame rates in a single video, and may mix progressive and telecined, progressive and interlaced, and sometimes even all three.

You generally want to figure out whether a video is progressive, interlaced or telecined. One way to do this is using a player that allows per-frame advancing (such as mplayer). Make sure it's not applying filters to fix interlacing/telecining, find a scene with movement (preferably horizontal movement/panning), and see whether there are interlace-style jaggies.

  • If there are none, it is progressive (or possibly already deinterlaced by the player)
  • If there are in every frame, it is interlaced
  • If there are in only some frames, it is telecined (two out of five in 2:3 24-to-30fps telecine).

Note that things like credits may be different (apparently often telecined on DVDs).

While telecining uses regular patterns of extra frames, splicing after telecining means the video will usually not follow that pattern around a splice, meaning that inverse telecine may not be able to decode all original frames. This is often the cause of encoders/players complain about a few skipped and/or duplicate frames in a movie's worth of frames, and you can ignore this - hardware players do the same.

See also



Deinterlacing, telecining:


(Analog) TV formats

There are a number of variants on NTSC, PAL and SECAM that may make TVs from different countries incompatible. NTSC is used in North America and part of South America (mostly NTSC M), and Japan (NTSC J).

PAL is used in most of Europe, part of South America, part of Africa, and Asia. SECAM is used in a few European countries, part of Africa, and Russia.

PAL M (used in Brazil) is an odd one out, being incompatible with other PAL standards, and instead resembling NTSC M - in fact being compatible in the monochrome part of the NTSC M signal.

CRT TVs often support just one of these, as it would be more complex to receive and convert/display more than one, and few people would care for this feature.

It should be noted that the actual broadcast signal imagery uses more lines than are shown on the screen. Of the video lines, there are fewer that are the raster, the imagery that will be shown on screen.

  • 525-scanline video (mostly NTSC) has 486 in the raster, and many show/capture only 480(verify)
  • 625-scanline video (mostly PAL) has 576 in the raster

The non-raster lines historically were the CRT's vertical blanking interval (VBI), but now often contains things like teletext, closed captioning, station identifiers, timecodes, sometimes even things like content ratings and copy protection information (note: not the same as the broadcast flag in digital television).

Video recording/capture will often strip the VBI, so it is unlikely that you will even have to deal with it. Some devices, like the TiVo, will use the information (e.g. respect copy protection) but do not record it (as lines of video, anyway).

Devices exist to add and alter the information here.

PAL ↔ NTSC conversion

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Note that various DVD players do this, although others do not, and neither the fact that they do or that they don't is necessarily advertized very clearly.

PAL to NTSC conversion consists of:

  • Reducing 625 lines to 525 lines
  • creating ~5 more frames per second

NTSC to PAL conversion consists of:

  • increasing 525 to 625 lines
  • removing ~5 frames per second

In general, the simplest method, that cheaper on-the-fly conversion devices often use, is to duplicate/omit lines/frames.

This tends to not be the best looking solution.

Linear interpolation (of frames or lines) can offer smoother-looking motion and fewer artifacts, but are more computationally expensive, and have further requirements - such as working on deinterlaced content.

Fancier methods can use things like motion estimation (similar to fancy methods of deinterlacing)

Digital / HD broadcasting

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

ATSC in the US, DVB in Europe

See also:

See also


On types and groups of frames

In MPEG-like codecs (DivX/Xvid, H.264, and more), there is a choice to encode each frame as a/an...

  • I-Frame (inter-frame)
    • a frame you can decode purely from its own data
    • larger than predictive (B- and F-)frames when B/F can predict differences to adjacent frames fairly well (e.g. motion and such), which is usually. When predictive frames don't work well, such as camera cuts, -frames are preferable.
    • having regular I-frames helps faster seeking, because seeking generally means "look for the most recent I-frame, and start decoding all video until you reach the requested frame"
  • P-frame (predictive)
    • uses information from previous frame
    • ...which is typically less information than a complete I-frame, so makes for better compression
  • B-frame (bidirectional predictive)
    • uses information both from previous and next frame
    • which ends up being less information than forward-only prediction
    • does better on slow motion(verify)
    • more complex to decode

There also used to be D-frames, in MPEG-1, which were basically low-quality easy-to-decode I-frames, which allowed fast preview while seeking.

A GOP (group of pictures) is a smallish group of adjacent frames that belong together in some sense.

A GOP starts with an I-frame, and may contain P-frames, B-frames, and technically can also contain I-frames (a GOP is not defined by position of I-frames, but by being marked as a new GOP in the stream. That said, multiple-I-Frame GOPs are not common)(verify).

Most encoders use I-frames (so start a new GOP)...

  • when it makes sense for the content, such as on scene cuts where a predictive frame would be lower quality
  • once every so often, to guarantee seeking is always fastish by always having a nearby I-frame to restart decoding at
  • other frame-type restrictions. For example, encoders are typically limited to use at most 16 B-frames in a row, and at most 16 P-frames in a row (which gets more interesting when you mix I, P, and B-frames)

To make things more interesting, the frame store order, the decode order, and the display order can all be different. This is largely because (and when) a B-frame can by its nature not be decoded without having the two adjacent frames already decoded, so they are usually stored surrounded by I or P [72]

You could also have GOP-less video, or perhaps better named size-1-GOP, where the video that consists entirely of I-frames. This is much larger for the same video, but means all seeking is pretty immediate, so is nice when editing video, and when you want fast frame-by-frame inspection in both directions. (for example if you want to study animation)

A closed GOP is one that can be decoded completely without need of another GOP. This is contrasted with an open GOP, which ends in a B-frame so which needs to look into the next GOP's first frame (an I-frame).

Open GOPs make for slightly more efficient coding, because you're using a little bit more predictive coding.

Closed GOPs, again, make for slightly easier decoding.

And easier decoding in general means easier/faster frame-by-frame editing, faster playing in reverse, faster DVD angle switching(verify). (It also seems to be compatible with more players (divx5, ffdshow), though that is now probably an outdated statement)

Note that various players, when seeking, don't go for the requested frame but the nearest Iframe, meaning the seeking will always go in larger jumps, and never to specific frames.

There are guidelines for standardized formats and hardware players, usually restricting to relatively small GOP sizes (on the order of 5 to 12 frames).

For example, the specs for DVDs apparently says 12 frames max, which is about half a second worth of video, in part to guarantee seeking in reasonable steps.

Allowing large GOPs (say, 300 frames, ~10 seconds) makes for slightly more efficient coding in the same amount of bytes (because you have a few more predictive frames where you would otherwise force I-frames), but it's a diminishing-returns thing.

Keep in mind that encoders often have multiple different influences on GOP / frame type choice, so handing in GOP options may influence its strategies, more than force anything specifically.

To inspect what frame types a file has:

  • People mention ffprobe -show_frames but this seems to be a deprecated option.(verify)
  • libavcodec has a vstats option, which writes a file in the current directory with frame statistics about the input file. For example:
mplayer -vo null -nosound -speed 100.0 -lavdopts vstats input.avi

(Without -speed it seems to play at the video rate, and there's probably a way around that better than a -speed that it probably won't reach)

or ffmpeg:

ffmpeg -i input.avi -vf showinfo -f null -

Some notes on aspect ratio

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Display Aspect Ratio (DAR) means "this ought to be shown at this ratio". Example: 16:9. This is information that some files can store to tell the player to do so (or that standards can imply for all files of a type).

DAR matters allows more arbitrary aspect ratio in the actual pixel dimensions - for example, SVCDs are 480x480 (NTSC) or 480x576 (PAL/SECAM) -- but store content meant for 4:3 or 16:9 display. Which is one way to store more vertical than horizontal detail. SVCD players will rescale this to some resolution that has the proper aspect ratio at play time (usually just fitting it in the largest non-cropping size for the given TV resolution(verify)).

This works because MPEG can store aspect ration information, so hardware players and most software players listen to and use it. Not all (software) players understand it in MPEG4 files yet, though.

AVI (and any content stored in it, including MPEG4) does not support it -- but the opendml extension that does allow it is now fairly commonly used. Not all players know about opendml, though most that matter do.

When encoding, the best choice depends on what you want to play things on. The most compatible way is rescaling so that square pixels would play correctly. However, this usually means smallish resolution changes which can look like mild blurs .

Some notes on frame rates

See also Video#Frame_rate.2C_analog_TV_format.2C_and_related for basic notes on frame rate.

Resolution names/references

References like 480i and 720p became more commonly used in the era commonly known as HD (now), partly just because it's brief and accurate.

These references are not often seen alongside monitor resolutions, perhaps because "720p" and "1080p HD" is easier to market when you don't realize that's about as good as a decade-old monitor. Except on a bigger screen (for TV image quality, the pixels make much less difference than the new digital content/broadcast standards that came along with them).

References such as 480i and 720p refer to the vertical pixel size and whether the video is interlaced or progressive.

The common vertical resolutions:

  • 480 (for NTSC compatibility)
    • 480i or 480p
  • 576 (for PAL compatibility)
    • 576i or 576p
  • 720
    • always 720p; 720i does not exist as a standard
    • 1280x720 (sometimes 960x720)
  • 1080 (HD)
    • 1080i or 1080p
    • usually 1920x1080

There are some other newish resolutions, many related to content for laptops/LCDs, monitor/TV hybrids, widescreen variations, and such.

HD TV broadcasts are often either 1080i or 720p. While 1080i has greater horizontal resolution (1920x1080 versus 1280x720), 720p does not have interlace artifacts and may look smoother.

The 480 and 576 variants usually refer to content from/for (analog) TVs, so often refer to more specific formats used in broadcast.

  • 576 often refers to PAL, more specifically:
    • analogue broadcast TV, PAL - specifically 576i, and then often specifically 576i50
    • EDTV PAL is progressive, 576p
  • 480 often refers to NTSC, more specifically:
    • analogue broadcast TV, NTSC - specifically 480i, and then often specifically 480i60
    • EDTV NTSC is progressive, 480p
  • 486 active lines seems to refer to older NTSC - it now usually has 480 active lines

There is more variation with various extensions - widescreen, extra resolution as in e.g. PALPlus, and such.

Sometimes the frame rate is also added, such as 720p50 - which usually refers to the display frequency applicable.

In cases like 480i60 and 576i50 you know this probably refers to content from/for NTSC and PAL TV broadcast.

See also:

On TV horizontal pixel resolution

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

For analogue TV, pixel-per-line resolution is not really set in stone. Because of the way the signal is used, anything above ~500 or so looks good enough.

  • Cheaper NTSC CRTs couldn't really display more than 640, cheaper PAL CRTs (verify)
  • 720 was treated as a maximum (fancy editing systems of the time supported it)
  • 704 is a sort of de facto assumption of the average that TVs tend to display(verify), and is also what EDTV uses (704x576 for PAL and 704×480 for NTSC)

As such,

  • NTSC can be 720x480 or 704x480, or 640x480,
  • PAL can be 720x576 or 704x576,

depending a little on context.

On digital broadcast, a stream has a well-defined pixel resolution, but since the displays are more capable, they are typically quite flexible in terms of resolution and frame rate.

Relevant acronyms here include

  • ATSC (digital broadcasting in the US, replacing analog NTSC)
  • DVB (digital broadcasting in Europe, replacing analog PAL)
  • EDTV (sort of halfway between basic digital broadcast and HDTV)
  • HDTV

More resolutions

See e.g. the image on

Screen and pixel ratios

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Video capture hardware

Video editing hardware

Webcam/frame-oriented software

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(What you may want to look for in) more-than-webcam software

Mainly for editing

Mainly for conversion

Some specific tools

See also