Video format notes
| The physical and human aspects dealing with audio, video, and images
Video: format notes · encoding notes
Noise stuff: Stray signals and noise · sound-related noise names · electronic non-coupled noise names · electronic coupled noise · ground loop · strategies to avoid coupled noise · Sampling, reproduction, and transmission distortions
Unsorted: Signal analysis, modeling, processing (some audio, some more generic)
For more, see Category:Audio, video, images
These are primarily notes.
They won't be complete in any sense.
They exist to contain fragments of useful information.
This page is mostly about storage of video, and variation therein. It also touches on some video capture.
For notes on encoding video, see the separate encoding notes.
- 1 Digital video (files, streaming)
- 1.1 Container formats
- 1.2 The wide concept that is MPEG
- 1.3 Video codecs
- 1.4 Pixel/color formats (and their relation to codecs)
- 1.5 Streaming, streaming support protocols
- 1.6 Subtitles
- 2 Frame rate, analog TV format, and related
- 2.1 Frame rates
- 2.2 Interlacing, telecining and such
- 2.3 (Analog) TV formats
- 2.4 Digital / HD broadcasting
- 2.5 See also
- 3 Semi-sorted
- 3.1 On types and groups of frames
- 3.2 Some notes on aspect ratio
- 3.3 Some notes on frame rates
- 3.4 Resolution names/references
- 3.5 Screen and pixel ratios
- 3.6 Video capture hardware
- 3.7 Video editing hardware
Digital video (files, streaming)
|This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)|
This is meant primarily as a technical overview of the codecs in common and/or current use (with some historical relations where they are interesting, or just easy to find), without too many details; there are just too many old and specialist codecs and details that are not interesting to most readers.
Note that some players hand off reading/parsing file formats to libraries, while others do it themselves.
For example, VLC does a lot of work itself, particularly using its own decoders. This puts it in control, allowing it to be more robust to somewhat broken files, and more CPU-efficient in some cases. At the same time, it won't play some unusual files, as it won't perfectly imitate other common implementations, and it won't pick up codecs it doesn't know about as quickly; in these cases, players that hand off the work to other components (such as mplayerc) will work better.
Containers are file formats that usually allow multiple streams of various types, each using one of various codecs.
Some relatively general-purpose container formats include:
AVI (Audio Video Interleave)
A RIFF derivative (see also IFF), and fairly common, though not ideal for MPEG-4 video tracks, VBR MP3 audio tracks, and some other things, as this old format does not really allow these without hacks that have since become conventions. Many AVIs in the wild violate the AVI standard - but play fine on most (computer) players.
- Files with the .divx extension are usually AVIs (...containing DivX video)
- Google Video (.gvi) files use MPEG-4 ASP and MP3 in a mild variant of the AVI container (and do not really exist anymore)
MKV (Matroska Video)
An open standard, preferred by some, as it is a fairly well-designed many-stream format, and because it allows subtitle embedding, meaning you avoid hassle related to external subtitle files.
Ogg is an open standard.
Extension is usually .ogg, or .ogm, though .ogv, .oga, and .ogx are also seen.
Note that initially, ogg often implied Ogg Vorbis: Ogg containers containing Vorbis audio data.
See also Ogg notes.
Ogg Media (.ogm) is a somewhat hackish extension (of Ogg), which supports subtitle tracks, audio tracks, and some other things that make it more practical than AVI, and put it alongside things like Matroska.
Ogg Media is not really necessary and will probably not be developed further, in favour of letting Matroska become a wider, more useful container format.(verify)
MPEG-1, -2, and -4 (MPEG in general) support a limited number of stream types, whether quite specifically settled (such as in DVD VOBs) or less so (various MPEG video in the wild)
- MPEG-PS (more specific encapsulation, used in DVD, HD-DVD, more) 
- MPEG-1 Program Stream (PS)
- MPEG-2 Program Stream (PS)
- MPEG-TS (more specific encapsulation, used in DVB) 
- MPEG-2 Transport Stream (TS)
- MPEG-4: MPEG-4 Container (MP4)
- DVD-Video's VOB is sometimes referred to as an MPEG container; it's a simpler variant of an MPEG-2 PS with some DVD-specific uses and restrictions
A number of container formats support only a limited number of codecs (sometimes just one), particularly if they are proprietary and/or specific-purpose.
Such container formats include:
- Flash video (.flv) 
- NUT (.nut), a competitor to avi/ogg/matroska 
- Quicktime files (.mov) are containers, though without extensions to quicktime, they support relatively few codecs. In recent versions, MPEG-4 was added.
- ASF (Advanced Systems Format), a proprietary format from Microsoft, most commonly storing WMA and WMV content; it sees little other use in practice (partly because of patents and active legal protection).
- RealMedia (.rm)
- DivX Media Format (.dmf)
tl;dr: data in MPEG-2 PS, some restrictions, and some DVD-specific metadata/layout around it.
A VIDEO_TS directory with VOB, IFO, and BUP files is, in a fashion, a container format, as it is the DVD-Video way of laying out:
- metadata about stream data (chapters, languages of tracks, angles, etc.)
- Video streams (usually MPEG-2, sometimes MPEG-1)
- Audio streams (AC-3, MPEG-1 Layer II (MP2), PCM, or DTS)
- Subtitle streams (bitmap images)
(note: The AUDIO_TS directory is used by DVD-Audio discs, which are fairly rare. On DVD-Video discs, this directory is empty, and the audio you hear is one of the streams in the VOBs.)
IFO stores metadata for the streams inside the VOB files (e.g. chapters; subtitles and audio tracks). BUP files are simply an exact backup copy of the IFO files (to have a fallback for a scratched DVD).
VOB files are containers based on MPEG-2 PS, and store the audio, video, and image tracks.
VOB files are segmented into files no larger than 1GB, a design decision meant to avoid problems with filesystems' file size limits (since a DVD held more than many filesystems at the time could deal with in one file).
DVD players are basic computers in that they run a virtual machine. DVD-Video discs with menus run bytecode on that, although most such code is pretty trivial if you consider the potential flexibility of the VM -- there are even a few DVD games, playable on any DVD player.
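As a small illustration of the MPEG-2 PS layout that VOBs are based on: PS data is organized around start codes, the three bytes 00 00 01 followed by a code byte (0xBA for a pack header, 0xE0-0xEF for video streams, 0xC0-0xDF for MPEG audio). A minimal sketch (the function name and sample data are mine; real parsers need more care, e.g. around packet lengths):

```python
def find_start_codes(data: bytes) -> list[int]:
    """List the code bytes following each 00 00 01 start-code prefix.

    In MPEG-PS, 0xBA marks a pack header, 0xBB a system header,
    0xE0-0xEF video streams, and 0xC0-0xDF MPEG audio streams.
    """
    codes = []
    i = data.find(b'\x00\x00\x01')
    while i != -1 and i + 3 < len(data):
        codes.append(data[i + 3])
        i = data.find(b'\x00\x00\x01', i + 3)
    return codes

# A fabricated example: a pack header followed by a video stream packet
sample = b'\x00\x00\x01\xba' + b'\x00' * 10 + b'\x00\x00\x01\xe0'
print([hex(c) for c in find_start_codes(sample)])  # ['0xba', '0xe0']
```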
Stream identifiers (FourCCs and others)
When container formats can store more than one video codec, they want to be able to indicate the format (codec) used in each stream.
- AVI uses FourCCs, a sequence of four bytes - usually four printable ASCII characters - used in AVI and a few other formats
- MPEG containers mostly just contain MPEG video (...but there are a bunch of details to that)
- Matroska (mkv) uses another system, CodecID, a flexible-length string.
- Ogg doesn't have an identifier system, instead asking all available codecs whether they can play the data given to them (initially just the first frame from a stream).
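For example, a FourCC is just four bytes (usually printable ASCII), and an AVI file announces itself with RIFF magic in its first twelve bytes. A minimal sketch (the function names are mine):

```python
import struct

def looks_like_avi(header: bytes) -> bool:
    """Check the first 12 bytes: 'RIFF', a little-endian size, then 'AVI '."""
    if len(header) < 12:
        return False
    tag, _size, form = struct.unpack('<4sI4s', header[:12])
    return tag == b'RIFF' and form == b'AVI '

def fourcc_to_str(code: bytes) -> str:
    """Render a FourCC's four bytes as text (e.g. b'XVID' -> 'XVID')."""
    return code.decode('ascii', errors='replace')

print(looks_like_avi(b'RIFF' + (1000).to_bytes(4, 'little') + b'AVI '))  # True
print(fourcc_to_str(b'XVID'))  # XVID
```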
The wide concept that is MPEG
MPEG can refer to one of three formats, MPEG-1, MPEG-2, and MPEG-4 (3 was skipped to avoid confusion with MP3, which is actually short for MPEG-1 layer 3), formats that can store video and/or audio streams, and a little more.
MPEG-1 and MPEG-2 see use in DVDs and some related earlier formats (such as VCDs, and variants).
They are relatively simple, meaning that hardware players were easier to get right, but also that these are not the most flexible formats.
Using them now tends to mean picking a relatively high bitrate (for a given quality), which is acceptable on DVDs, since they have plenty of space to (almost invariably) store a single movie.
These are not a very efficient choice when space is scarcer, though.
MPEG-4 is a standard with many parts.
People usually use it to refer to
- the standard as a whole
- the MP4 container format
- one of the two largely distinct video codecs in the standard
Those two are:
- MPEG-4 ASP (defined in MPEG-4 Part 2); implementations include:
- MS MPEG4 (v1, v2, v3), (primarily used in ASFs, and not strictly compliant) (FourCC: MP42, MP43, DIV3, WMV7, WMV8, AP41, COL1) 
- DivX ;-), DivX (initially hack of MS MPEG4 v3 to allow use in AVIs. DivX was later commercially developed)
- Xvid (succeeded OpenDivX)
- 3ivx (v1, v2) (FourCC: 3IV1, 3IV2)
- Nero Digital, mostly an internal format
- MPEG-4 AVC, a.k.a. H.264 ('Advanced Video Coding', defined in MPEG-4 Part 10 as well as ITU-T H.264); implementations include:
To see how much wider MPEG-4 is, see e.g. 
Note that the codecs only describe how video should be stored and decoded, so different encoders exist that do things differently - e.g. spend more time to squeeze more quality out of the same format.
Such alternatives should work on all players that comply to the respective part of MPEG-4, though in some cases, they also involve file format details which may prevent them from being playable on fully standard players.
H.26x family (related to MPEG and ITU standards. H.something is the ITU name):
- H.261, a format made for videoconferencing over ISDN. Came before the more widely used H.263 
- H.262, which is identical to part of the MPEG-2 standard
- H.263: for videoconferencing (seen used in H.323). Also the base of various other codecs, including:
- VIVO 1.0, 2.0, I263 and other h263(+) variants
- Early RealVideo
- Sorenson (including early Flash video)
- Sorenson 1 (SVQ1, svq1, svqi)
- Sorenson Spark (Used in Flash 6, 7 and later for video)
- (Sorenson 3 (SVQ3) was apparently based on a H.264 draft instead)
- See also 
- H.264, a.k.a. MPEG-4 AVC, MPEG-4 Part 10
- MPEG-4 part 2, a.k.a. MPEG-4 ASP
- DivX, XviD, and many versions, variants, and derivatives
- FourCCs seen include: 3IV2, 3iv2, BLZ0, DIGI, DIV1, div1, DIVX, divx, DX50, dx50, DXGM, EM4A, EPHV, FMP4, fmp4, FVFW, HDX4, hdx4, M4CC, M4S2, m4s2, MP4S, mp4s, MP4V, mp4v, MVXM, RMP4, SEDG, SMP4, UMP4, WV1F, XVID, XviD, xvid, XVIX
(See also MPEG4)
- VP8 or VP9, plus Vorbis or Opus, in Matroska
- started by Google after acquiring On2
- supported by all modern browsers (like H.264)
- open, also royalty-free (unlike some parts of MPEG4)
- Quality is quite comparable to H.264
On2 (Duck and TrueMotion also refer to the same company):
- VP3 (FourCC: VP30, VP31, VP32): roughly in the same class as MPEG-4 ASP. Open sourced.
- VP4 (FourCC: VP40) 
- VP5 (FourCC: VP50):  
- VP6 (FourCC: VP60, VP61, VP62): Used for some broadcasting  
- VP7 (FourCC: VP70, VP71, VP72): A competitor for MPEG-4  
Xiph's Theora codec is based on (and better than) On2's VP3 
RealVideo uses different names internally and publicly, some of which are confusable:
- RealVideo (FourCC RV10, RV13) (based on H.263)
- RealVideo G2 (fourCC rv20) used in version 6 (and 7?) (based on H.263)
- RealVideo 3 (FourCC rv30) used in version 8 (apparently based on a draft of H.264)
- RealVideo 4 (FourCC RV40, and also UNDF) is the internal name/number for the codec used in version 9. Version 10 is the same format, but the encoder is a little better.
- The H.263-based versions (up to and including 7) were not very impressive, but versions 9 and 10 are quite decent. All are proprietary and generally only play on RealPlayer itself, unless you use something like Real Alternative.
- Windows Media Video , often in .wmv files (which are asf containers)
- version 7 (FourCC: WMV1) (based on MPEG-4 part 2)
- version 8 (FourCC: WMV2)
- version 9 (FourCC: WMV3)
- RTVideo 
- VC-1 
- Quicktime 
- Internal formats like 'Intermediate Codec'  and ProRes 
- Flic (.fli, .flc), primarily video-only files used in Autodesk Animator 
- Cinepak 
- Intel Indeo:
- MJPEG is mostly just a sequence of JPEG images (FourCC: AVDJ, AVID, AVRn, dmb1, MJPG, mjpa, mjpb).   (There are also some variations on this theme)
- Various RLE-like formats, used primarily for very simple animations
- Uncompressed Raw YUV 
- Compressed YUV, e.g.
- HuffYUV (lossless, and easily over 20GB/hour)
- RawRGB (FourCC: 'raw ', sometimes 0x00000000) 
- Hardware formats: (verify)
Pixel/color formats (and their relation to codecs)
Streaming, streaming support protocols
Hardsubs is a jargon term for subtitles that are mastered to be directly part of the video. They have no special status and mask the underlying video. This avoids all support issues, and usually looks good, but gives no choice of language, or of whether to display the subtitles at all.
Softsubs refer to separate subtitle data, historically often as a separate file with the same name and a different extension, and more recently as a part of container formats which support multiple streams (such as MKV), which can also store multiple different subtitles (e.g. languages) at once.
There are a number of formats, and not all file extensions are very obvious. Particularly things like .sub and .txt may be one of various formats.
- Subtitle editors' internal formats (text, binary, xml, other), some of which became more widely used:
- Image-based: (avoids font problems, larger)
- Other/unsorted (and other internal formats):
- SAMI (.smi) , often used in Korea
- DVD-based/derived (CC, SPU, VobSub)
- Karaoke formats (.lrc, .vkt, )
- MPSub (.sub), a format internal to mplayer 
- MPEG-4 Timed Text 
- Power DivX (.psb) 
- ViPlay Subtitle File (.vsf)
- Phoenix Japanimation Society (.pjs)  (old(verify))
- Subsonic (.sub) 
- ZeroG (.zeg) 
- Adobe Encore (.txt) 
- MPL2 
- VPlayer 
- Sasami Script (.s2k)
- SubViewer (verify)
- RT (verify)
- DVB (verify)
- Teletext (verify)
- LRC (Lyrics, meant for lyrics/karaoke on audio)
- TTML 
- ARIB (Association of Radio Industries and Businesses)
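As an aside on how simple most text-based subtitle formats are: SRT, for instance, uses timestamps like 00:01:02,500 (note the comma before the milliseconds). A quick sketch of parsing one such timestamp (the function name is mine):

```python
import re

def parse_srt_time(ts: str) -> float:
    """Convert an SRT timestamp like '00:01:02,500' to seconds."""
    m = re.fullmatch(r'(\d+):([0-5]\d):([0-5]\d),(\d{3})', ts)
    if not m:
        raise ValueError(f'not an SRT timestamp: {ts!r}')
    h, mi, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000

print(parse_srt_time('00:01:02,500'))  # 62.5
```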
Editors and other utilities:
- Aegisub (ASS, SRT, SSA, exports to them and PRS)
- Subtitle Editor (linux)
- VisualSubSync (SRT, ASS, SSA)
- KSubtile (linux, SRT)
- Subtitle workshop
- Subtitle Processor
- SubRip rips subtitles from DVD format (into SRT(verify))
- SubEdit (player)
(I'm not sure about all this - there is so much fuzzy and/or conflicting information out there)
Movies on media like DVD come in different frame rates. This does not matter to computer playability, so unless you are compressing or converting video, you probably want to ignore this completely.
Some of the more common rates seem to be:
|rate||common uses / suggests source||also referred to as||approx|
|24 (exactly)||used to shoot most film, and used in most cinema projection||'film'||24|
|24000/1001||usually an intermediate in conversion from film to NTSC color||'NTSC film'||23.976|
|25 (exactly)||speed of rasters transferred (not shown frames) in broadcasts such as PAL (except PAL M) and SECAM||'PAL video'||25|
|30000/1001||speed of rasters transferred (not shown frames) in (interlaced) broadcasts such as NTSC M (the most common NTSC form) and also PAL M (pre-1953 NTSC broadcasts were exactly 30.0fps)||'NTSC video'||29.97|
|30 (exactly)||apparently the black-and-white variant of NTSC was exactly 30, and 30000/1001 was the hack upon that(verify). Exactly-30fps content is relatively rare(verify), because it's either pre-1953 NTSC TV, or modern digital things that just chose this(verify)||||30|
|50 (exactly)||can refer to 50 frames per second progressive, or 25 frames per second interlaced being played (and possibly deinterlaced) as its 50 contained fields per second (as e.g. in PAL and SECAM TV, except PAL M)||'PAL film', 'PAL video', 'PAL field rate'||50|
|60000/1001 (verify)||the field rate of NTSC color; can refer to NTSC color TV that is transferring interlaced rasters||'NTSC field rate'||59.94|
These are the most common, but other rates than these exist. For example, there is double rate and quad rate NTSC and PAL (~60fps, ~120fps; 50fps, 100fps), often used for editing, or e.g. as intermediates when converting interlaced material.
A framerate hints at the source of the video (24 is typically film, ~25 is often PAL broadcast, 30000/1001 is typically NTSC broadcast) and/or the way it is played (e.g. 50 and 60000/1001 usually means analog TV content, and possibly interlaced). There's a bunch of cases where you can't be sure, because there are some common conversions, e.g. 24fps film converted to 25fps and 29.97fps for broadcast and DVDs. And be careful of assumptions about interlaced/progressive/telecined. Note also that DVDs can mix these.
Movies still mostly use 24fps, primarily because we're used to it. It looks calmer, and we associate 24fps with higher quality, partly because historically higher framerates remind us of home video and its associated cheapness (and of technical uses like interlaced sports broadcasts).
(Of course, these associations are also entangled with camerawork and other aspects, so it's not quite that simple. It's partly silly, because TV broadcast of 24fps material necessarily involved some less-than-perfect conversions)
On 1001 and approximations
NTSC made things interesting.
It had existed since 1941, was exactly 30fps, had no color, and fit in 4.5MHz of usable radio frequency.
In 1953 it was replaced by what we now call NTSC color. Because people wanted it to be backwards compatible with the black and white televisions of the time, allowing one transmission to serve both, it was designed to send separate luminance (black and white) roughly as before, and separate chrominance information.
...but in the same frequency band, so it basically had to overlap the luminance. Minimizing the interference between the two with the method they chose meant some math requiring that the number of columns, rows, or frames per second had to be shifted a little.
They chose to settle columns and rows at 525 and 286, and fudge the frames per second, meaning that number is
4500000Hz / (525*286)
which happens to simplify to
30000/1001
which is approximately
29.97
So, NTSC color broadcast is never 30fps, it's always 30000/1001.
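You can verify that simplification with exact fractions:

```python
from fractions import Fraction

# NTSC color frame rate: 4.5 MHz divided by 525 * 286
rate = Fraction(4_500_000, 525 * 286)
print(rate)         # 30000/1001
print(float(rate))  # ~29.97003
```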
The approximation 29.97 is inaccurate, though by so little that getting this wrong only shows after an hour or two.
It can matter when processing content where NTSC is involved somehow, such as going between PAL and NTSC hardware/broadcast, so matters to professional transcoding. For typical-youtube-length videos you won't notice.
(Similarly, 23.976 for 24000/1001 (which happens in film-to-NTSC conversion) is also slightly off, and 23.97 and 23.98 more so.(verify))
(Since the same trick wasn't necessary for PAL, PAL is always 25fps precisely.)
(Also, film to PAL is often just played that ~4% faster.)
In most cases, the difference between the fraction and its approximation is tiny, but will be noticeable over time, in timecode (and probably audio sync, depending a little on how the audio is synced).
You can fix NTSC's timecode issue with Drop-Frame (DF) timecode.
Timecode counts in whole frames, so from its perspective the difference between 29.97fps instead of 30fps is .03 missing frames per second - i.e. (30*60*60 - 29.97*60*60=) 108 frames per hour.
Drop Frame Timecode (DF) skips two frame numbers from just its own count (the frames themselves still exist) at the start of nine out of every ten minutes. This happens to work out to exactly (2 frames * (6*9) applicable minutes =) 108 per hour.
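That frame-number-to-timecode mapping can be sketched as follows (the function name and structure are mine; NTSC DF timecode conventionally uses ';' as the last separator):

```python
def df_timecode(frame: int) -> str:
    """Label a frame of 30000/1001 video with drop-frame timecode.

    Per ten minutes there are 17982 actual frames: 1800 in the first
    minute (nothing dropped) and 1798 in each of the other nine
    (timecode numbers ;00 and ;01 are skipped; no frames are dropped).
    """
    fp10 = 17982           # frames per 10-minute block
    fpm = 1798             # frames per drop minute
    tens, rest = divmod(frame, fp10)
    skipped = 18 * tens    # 2 numbers * 9 minutes per full 10-minute block
    if rest >= 1800:       # past the first (non-drop) minute of this block
        skipped += 2 * (1 + (rest - 1800) // fpm)
    n = frame + skipped    # nominal 30fps frame count
    f = n % 30
    s = (n // 30) % 60
    m = (n // 1800) % 60
    h = n // 108000
    return f'{h:02d}:{m:02d}:{s:02d};{f:02d}'

print(df_timecode(0))          # 00:00:00;00
print(df_timecode(1800))       # 00:01:00;02  (;00 and ;01 were skipped)
print(df_timecode(6 * 17982))  # 01:00:00;00  (the 108 skipped numbers per hour)
```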
When video is made for (analog) broadcast, it is very much constrained by that standard's color and frame coding, and more significantly in this context, its framerate.
When video is made for film, it is limited by projector ability, which has historically mostly been 24fps.
When it is made for NTSC or PAL broadcast, it is usually 30000/1001 or 25, respectively.
Computers will usually play anything, since they are not tied to a specific rate. Even though monitors have a refresh rate, almost all common video will be slower than that, meaning you'll see all frames anyway. (Computer playback is often synchronized to the audio rate.)
Conversion from 24fps film to 25fps PAL broadcast rate can be done by playing the frames and audio faster by a factor 25/24, either letting the pitch be higher (as will happen in analog systems) or the same pitch by using digital filters (optional in digital conversion).
Few people will notice the ~4% difference in speed, pitch or video length.
This does not work for NTSC as the difference is ~25%. Instead, telecining is often used, though it also makes film on TV a little jumpier for the 60Hz/30fps countries (including the US).
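The PAL speedup mentioned above is easy to quantify: playing 24fps material at 25fps shortens the film a bit and, without correction, raises pitch by under one semitone:

```python
import math

speedup = 25 / 24  # film frames played at PAL broadcast rate
print(f'{(speedup - 1) * 100:.2f}% faster')           # 4.17% faster
print(f'{120 / speedup:.1f} min for a 120 min film')  # 115.2 min
print(f'{12 * math.log2(speedup):.2f} semitones up')  # 0.71 semitones up
```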
60fps (e.g. gaming)
120Hz, 144Hz, 156Hz, etc.
Interlacing, telecining and such
Progressive means that each frame is drawn fully on screen, and that frames are drawn in simple sequence.
This seems very obvious; it is best understood in contrast with interlacing and telecining (pixel packing can also matter somewhat).
Constraints in aether bandwidth and framerates are the main reasons for interlacing and telecining. Computers are not as constrained in those two aspects as broadcasting standards and CRT displays are, and as such, other factors (such as the compression codecs) tend to control the nature of video that is digitally available. This is one reason that much of it is progressive.
Interlacing comes from a time where CRT screens were new, TV broadcast was already a thing, and engineers worked to squeeze more video out of fixed bandwidth.
Interlacing refers to a method where you update every other physical scanline, and then the ones you left out; repeat.
For example, given 25 full-screen rasters per second, you use one half, then the other half, for 50 half-updates per second (PAL frame rates, because I'm as biased as anyone).
It exploits the facts that TV phosphors are relatively slow to fade out their color, and that the human visual system is less sensitive to flickering details than to flickering areas; this happens to be one of the least-noticeable ways of updating the screen at what looks like twice the speed, while TVs that didn't know about this would still do an entirely sensible thing. So it was genuinely clever, given the constraints of the time.
The broadcaster had the option of doing either. Conceptually, they could now:
- send 50fps of images that are each half vertical resolution
- which looks faster, so great for things like sports broadcasts
- send a 25fps source, essentially building up each full frame from two halves
- gives more details
- and would be the logical choice e.g. when the source is 24fps film
Note that at lower level, you could consider
- both to be 25 full-screen rasters per second,
- both to be 50 updates per second,
and the only real difference to what you end up seeing is what they contain image-wise.
Why do it, and why not?
The main reason to do interlacing is to add speed, or the option for speed, within a fixed bandwidth whose frame transfer was also already settled.
Which was absolutely true for analog TV - not so much since.
There is even something to be said for interlacing when you don't have such constraints. It's a simple compression-like scheme, that works well enough for video content that shows mostly large-ish details and is predictable enough. You can have deinterlacing make specific assumptions to estimate the original - an estimate that will be better than just doubling the lines, yet smaller than the original data. Advanced de-interlacing algorithms, supported by newer and faster hardware (and making assumptions that are true for most video), can bring quality back to levels that are surprisingly near the original, for most but not all content.
One good reason against interlacing is that it is inherently lossy.
Another is that when digital video compression is involved, you should probably leave such details up to the compressor. In fact, most deal poorly with interlaced data (unless they are expecting it).
While with SDTV broadcast, interlacing was just the way it worked for archaic reasons, with HDTV it's an option. HDTV does see use of interlaced programs, and HD camcorders still choose to do it, and for exactly the same reasons of saving bandwidth (now network / storage, instead of RF).
But it's less pressing, because while it does save space/bandwidth, it's not half, and there's compression anyway.
Also, when your job involves editing and/or compressing video, interlacing means extra steps, extra processing, and extra chance of specific kinds of artefacts.
Some more gritty details
Note that depending on how (and where in the chain) you look at the content, you could refer to interlaced content's rate either by the transferred rasters or the shown half-updates - the transferred framerate or the shown framerate (30000/1001 or 60000/1001 for NTSC, 25 or 50 for PAL). This leads to some confusion.
From the perspective of displaying things, each image that is received/read contains two fields, shot at slightly different times, which are updated on the display at different times.
This is not the only relevant view. Digital video will often not treat interlaced video as having a doubled framerate, but instead have each captured frame contain two (chronologically adjacent) half-updates. This makes sense, as it is all the data that is displayed and avoids spending extra bandwidth -- but note that this basically stores interlaced video in progressive frames. Played as-is, that looks worse than the progressive original on TV: you're showing the same content at half the update speed again, and the interlacing lines are more apparent once captured than on TV.
For this reason, de-interlacing is often a good idea - a term that actually refers to a few different processes that make the video look better.
Interlacing, in general and in analog systems in particular, comes at the cost of display artifacts under particular conditions: while interlacing is usually not very visible on TVs, specific problems are brought out by things such as fast pans (particularly horizontal movement), sharp contrast and fine detail (such as small text with serifs, and computer output to TV in general), and shirts with small stripes (which can be lessened with some pre-broadcast processing).
Interlacing is also one of a few reasons that a TV recording of a movie will look a little less detailed than the same thing on a (progressive) DVD, even when/though DVDs use the same resolution as TVs (other reasons for the difference are that TV is analog, and involves various lossy steps coming to your TV from, eventually, the original film material).
For notes on telecined content, see below. For now, note also that interlacing is a technique applied at the playing end (it involves taking an image and playing it as two half-frames), while telecining generates extra frames at creation time, which play progressively (or have inverse telecine applied; see details below).
Note that when a storage format and player are aware of interlacing, it can be used smartly again. For example, DVDs may mix progressive, telecined, and interlaced behaviour. The content is marked, and the player will display it accordingly. Interlaced DVD content is stored in the one-image-is-two-frames way, and the player will display it on the TV in the double-framerate-half-updates way described above.
Deinterlacing takes interlaced material and produces a progressive-scan result.
It is often applied to reduce interlacing's visual artifacts, particularly jagged edges (a.k.a. sawtooth edge distortion, mice teeth, combing, serrations), making them less noticeable, whether for display or for (re)compression (as lossy video compression that isn't counting on interlaced video deals with it quite badly).
Note that in some cases deinterlacing reconstructs the original. In most cases it is an inherently lossy process, in that it throws away data and isn't exactly reversible - but may be worth it perceptually.
There are also smarter and dumber ways of doing deinterlacing (most of those detailed below are the simple and relatively dumb variants), and the best choice depends on
- whether you are doing it for display or storage/(re)compression
- whether you are displaying on a CRT (phosphor and scanline) or something else with output restrictions, or on a digital display (which tends to have few restrictions)
- the nature of the video before interlacing (were the two fields captured at different times or not?)
You may like to know that:
- Analog TV has to adhere to broadcast standards from the 1930s and is interlaced
- ...but whether the two fields in a frame are from different times varies
- in PAL countries most films are not shown interlaced - the two fields come from the same film frame (the 25/24 discrepancy is fixed by speeding up the film that much)
- in NTSC countries, film is likely to be telecined (similar to interlacing)
- sports and such are taken at 50/60fps and sent as that many fields (so shown as that many interlaced half-frames), as it looks smoother.
- different types of camcorders may store half-height interlaced video, or may store progressive frames (depends on goal and quality).
- TV capture cards usually take the TV signal as-is, so tend to return still-interlaced video (verify)
For the examples below, consider two video sources transmitted through interlaced broadcast:
- video A: a film at 25 frames per second. The two fields complement each other for the exact original frame
- video B: sports footage, or video from an interlacing camcorder. Each of the 25 frames per second has its two fields taken 1/50th of a second apart.
The text below will mix 'field per second' and 'frame per second' (and sometimes fps for progressive, where there is no difference), so pay attention and correct me if I get it wrong :)
Weaving creates progressive frames by showing both fields at the same time, still interlaced.
This can also be described as 'doing nothing', other than copying each line from the alternate field. It's simple to do, and it's fast. Technically it retains all the video data, but in a way that doesn't look good in various practical cases.
Weaving video A means we reconstruct the video at its original resolution and frame rate, and can show it as such digitally.
Weaving video B means we construct a 25fps video with jagged lines (where there is fast movement). On digital displays this tends to be more noticeable than on CRTs, so for some purposes you would be better off with any deinterlacing that tries for 50fps output. This goes in particular for video compression: most codecs don't deal well with compressing such one-pixel details, so this will typically give lower-quality encodes (for same-sized output).
Digital capture from interlaced media (e.g. a TV capture card) will often capture in a way that effectively weaves frames, which is why you get the jagged-movement effect if the video is never deinterlaced.
When displaying video on a device with too little processing power, an easier-to-code and faster method is discarding every second field (a.k.a. single-field mode) and drawing each line of the other twice (line doubling), to get output with the same size and framerate as the original.
On video A we throw away half the vertical detail (and produce 25fps video).
On video B we throw away half the vertical detail and half the temporal detail as well (and produce 25fps video).
Compare with bob. (basically, discard is half a bob)
You can use all the data by blending (a.k.a. averaging, field combining) both fields from a frame into a single output frame.
For video A, you would produce an (unnecessarily) blurred version of the original. Weave may be preferable in various cases.
For video B you would create a 25 frame per second version with less jagging, but motion will have a sort of ghosting to it. Better than weave in that the jagging isn't so obvious, but everything will be blurred.
(Note that sizing down video has a similar effect, and can even be used as a quick and dirty way of deinterlacing if software does not offer any deinterlacing options at all)
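A minimal sketch of blending, assuming grayscale pixel values in plain lists:

```python
def blend_fields(top_field, bottom_field):
    """Average the two fields line by line, then line-double back to full
    height: combing disappears, but motion turns into ghosting/blur."""
    out = []
    for t_row, b_row in zip(top_field, bottom_field):
        avg = [(t + b) / 2 for t, b in zip(t_row, b_row)]
        out.append(avg)
        out.append(avg)
    return out

print(blend_fields([[10, 10]], [[20, 20]]))  # [[15.0, 15.0], [15.0, 15.0]]
```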
Bob, Bobbing (also 'progressive scan', but that is an ambiguous term) refers to taking both fields from the frame and displaying them in sequence.
Video A would become line-doubled 50 frames per second. Stationary objects will seem to bob up and down a little, hence the name. You're also doubling the amount of storage/bandwidth necessary for this 25fps video while reducing its vertical detail.
Video B would be shown with its frames at its natural fluid 50 frames per second (note that most other methods would create 25 frame per second output). (note that if you take the wrong field first, you'll put the fields in the wrong order and the video will look a bit twitchy)
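Bob in the same sketch form: each field becomes its own line-doubled output frame, doubling the frame rate.

```python
def bob(top_field, bottom_field, top_first=True):
    """Show each field as its own line-doubled frame, in field order,
    doubling the frame rate. Getting the field order wrong makes the
    output look twitchy."""
    order = [top_field, bottom_field] if top_first else [bottom_field, top_field]
    frames = []
    for field in order:
        frame = []
        for row in field:
            frame.append(row)
            frame.append(row)  # line doubling
        frames.append(frame)
    return frames

print(len(bob([[1]], [[2]])))  # 2 output frames per input frame
```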
Bob-and-weave and optional cleverness
Bob and weave refers to combining bob and weave, typically adaptively: weave the parts of the image without motion (keeping full detail), bob the parts with motion (avoiding combing).
Telecine is a portmanteau of 'television' and 'cinema' and refers to the process of converting video between these -- though its methods are more widely applicable than that.
It is perhaps most often mentioned in the context of frame-rate conversions from film (often 24fps) to NTSC television (30fps, adding intermediate frames) or PAL television (25 frames/s, often by playing the frames at 25fps and the audio a little faster).
Frame rate conversion from 24fps film to 30000/1001 broadcast NTSC is usually done using 2:3 pulldown, which uses a constant pattern of interlace-like half-updating frames and some duplication to end up with ~6 additional intermediate frames per second. Since you end up with 30 frames for a second, which look approximately the same as the 24fps original, the audio speed can stay the same. It still means leaving some frame content on-screen longer than others, which is visible in some types of scenes. For example, a slow smooth camera pan would be shown with a slight judder after telecining.
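The 2:3 pulldown pattern itself can be sketched as follows (frames modeled as opaque labels, output frames as (top field source, bottom field source) pairs; a hypothetical helper that assumes the input length is a multiple of 4):

```python
def pulldown_2_3(film_frames):
    """2:3 pulldown sketch: each group of four film frames (A,B,C,D) becomes
    five video frames by repeating fields. Input length is assumed to be a
    multiple of 4; 24 film frames in -> 30 video frames out."""
    out = []
    for i in range(0, len(film_frames), 4):
        a, b, c, d = film_frames[i:i + 4]
        out += [
            (a, a),  # frame 1: both fields from A
            (b, b),  # frame 2: both fields from B
            (b, c),  # frame 3: mixed frame (B top, C bottom) - shows combing
            (c, d),  # frame 4: mixed frame (C top, D bottom) - shows combing
            (d, d),  # frame 5: both fields from D
        ]
    return out

print(len(pulldown_2_3(list('ABCDEFGH'))))  # 8 film frames -> 10 video frames
```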
Telecining is invertible, in that you can calculate the original frames from a stream of telecined frames (though when telecined content is spliced this may mean a dropped frame or two).
When the display is not bound to a 30fps refresh rate, you could apply inverse telecine to yield the original 24fps content - which can make sense whenever you can play that back at that rate, since it removes the judder. Inverse telecine is also useful when (re)compressing telecined video, since many codecs don't like the interlace-like effects of telecining and may deal with it badly.
Hard telecining refers to storing telecined content, e.g. 30000/1001 for NTSC generated from 24fps film content, so that the given content can be played (progressively) on NTSC equipment. The upside is that the framerate is correct and the player doesn't have to do anything fancy; the downside is that it usually has a negative effect on quality.
Soft telecining refers to storing video using the original framerate (e.g. 24000/1001 or 24fps film) and flagging the content to be telecined, so that the eventual player (e.g. set-top DVD players) will telecine it to 30000/1001 on the fly. NTSC DVDs with content originating from 24fps film usually store 24000/1001fps(verify) progressive, with pulldown flags set so that the DVD player will play it as 30000/1001fps.
Note that (particularly digitally stored) video may contain mixes of different frame rates in a single video, and may mix progressive and telecined, progressive and interlaced, and sometimes even all three.
You generally want to figure out whether a video is progressive, interlaced or telecined. One way to do this is using a player that allows per-frame advancing (such as mplayer). Make sure it's not applying filters to fix interlacing/telecining, find a scene with movement (preferably horizontal movement/panning), and see whether there are interlace-style jaggies.
- If there are none, it is progressive (or possibly already deinterlaced by the player)
- If there are in every frame, it is interlaced
- If there are in only some frames, it is telecined (two out of five in 2:3 24-to-30fps telecine).
Note that things like credits may be different (apparently often telecined on DVDs).
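The frame-stepping heuristic above can be sketched as a toy classifier over per-frame "looks combed" judgments (the threshold here is an illustrative assumption):

```python
def classify_scan_type(combed_flags):
    """Toy classifier mirroring the frame-stepping test: no frames combed ->
    progressive, all combed -> interlaced, roughly 2 in 5 combed -> 2:3
    telecined. The 0.1 tolerance is an illustrative guess."""
    if not any(combed_flags):
        return 'progressive'
    if all(combed_flags):
        return 'interlaced'
    ratio = sum(combed_flags) / len(combed_flags)
    if abs(ratio - 2 / 5) < 0.1:
        return 'telecined'
    return 'mixed or spliced'

print(classify_scan_type([False, False, True, True, False] * 4))  # telecined
```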
While telecining uses regular patterns of extra frames, splicing after telecining means the video will usually not follow that pattern around a splice, meaning that inverse telecine may not be able to decode all original frames. This is often why encoders/players complain about a few skipped and/or duplicate frames in a movie's worth of frames, and you can ignore this - hardware players do the same.
(Analog) TV formats
There are a number of variants on NTSC, PAL and SECAM that may make TVs from different countries incompatible. NTSC is used in North America and part of South America (mostly NTSC M), and Japan (NTSC J).
PAL is used in most of Europe, part of South America, part of Africa, and Asia. SECAM is used in a few European countries, part of Africa, and Russia.
PAL M (used in Brazil) is an odd one out, being incompatible with other PAL standards, and instead resembling NTSC M - in fact being compatible in the monochrome part of the NTSC M signal.
CRT TVs often support just one of these, as it would be more complex to receive and convert/display more than one, and few people would care for this feature.
It should be noted that the actual broadcast signal uses more lines than are shown on the screen. Only some of those video lines form the raster, the imagery that is actually shown on screen.
- 525-scanline video (mostly NTSC) has 486 in the raster, and many show/capture only 480(verify)
- 625-scanline video (mostly PAL) has 576 in the raster
The non-raster lines historically were the CRT's vertical blanking interval (VBI), but they now often contain things like teletext, closed captioning, station identifiers, timecodes, and sometimes even things like content ratings and copy protection information (note: not the same as the broadcast flag in digital television).
Video recording/capture will often strip the VBI, so it is unlikely that you will even have to deal with it. Some devices, like the TiVo, will use the information (e.g. respect copy protection) but do not record it (as lines of video, anyway).
Devices exist to add and alter the information here.
PAL ↔ NTSC conversion
Note that various DVD players do this, although others do not, and neither fact is necessarily advertised very clearly.
PAL to NTSC conversion consists of:
- Reducing 625 lines to 525 lines
- creating ~5 more frames per second
NTSC to PAL conversion consists of:
- increasing 525 to 625 lines
- removing ~5 frames per second
In general, the simplest method, which cheaper on-the-fly conversion devices often use, is to duplicate/omit lines/frames. This tends not to be the best-looking solution.
Linear interpolation (of frames or lines) can offer smoother-looking motion and fewer artifacts, but is more computationally expensive, and has further requirements - such as working on deinterlaced content.
Fancier methods can use things like motion estimation (similar to fancy methods of deinterlacing)
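The duplicate/omit approach can be sketched as nearest-neighbor resampling over frames (or scanlines):

```python
def resample_nearest(items, target_count):
    """Cheapest possible rate conversion: pick items (frames or scanlines) at
    evenly spaced source positions, which duplicates some when upsampling and
    drops some when downsampling."""
    step = len(items) / target_count
    return [items[int(i * step)] for i in range(target_count)]

frames_25 = list(range(25))
print(len(resample_nearest(frames_25, 30)))          # 30: ~5 frames duplicated
print(len(resample_nearest(list(range(625)), 525)))  # 525: ~100 lines dropped
```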
Digital / HD broadcasting
ATSC in the US, DVB in Europe
On types and groups of frames
In MPEG-like codecs (DivX/Xvid, H.264, and more), there is a choice to encode each frame as a/an...
- I-frame (intra-frame)
- a frame you can decode purely from its own data
- larger than predictive (P- and B-)frames when prediction of differences to adjacent frames works fairly well (e.g. motion and such), which is usually the case. I-frames are preferable where predictive frames don't work so well, such as camera cuts; encoders will typically choose I-frames for those.
- having regular I-frames helps faster seeking, because seeking generally means "look for the most recent I-frame, decoding all video until you reach the requested frame"
- P-frame (predictive)
- uses (motion prediction) information from previous frame
- ...which is typically less information than a complete I-frame, so makes for better compression
- B-frame (bidirectional predictive)
- uses (motion prediction) information both from previous and next frame
- which ends up being less information than forward-only prediction, and does better on slow motion
- more complex to decode
There also used to be D-frames, in MPEG-1, which were basically low-quality easy-to-decode I-frames, which allowed fast preview while seeking.
A GOP (group of pictures) is a smallish group of adjacent frames that belong together in some sense.
A GOP starts with an I-frame, and may contain P-frames, B-frames, and technically can also contain I-frames (a GOP is not defined by position of I-frames, but by being marked a new GOP in the stream. That said, multiple-I-Frame GOPs are not common)(verify).
Most encoders use I-frames (so start a new GOP)...
- when it makes sense for the content, such as on scene cuts where a predictive frame would be low-quality
- once every so often, to guarantee seeking is always fastish by always having a nearby I-frame to restart decoding at
- because of other frame-type restrictions. For example, encoders are typically limited to using at most 16 B-frames in a row, and at most 16 P-frames in a row (which gets more interesting when you mix I-, P-, and B-frames)
You could also have GOP-less (/size-1-GOP) video, that is, video that consists entirely of I-frames. This is very inefficient use of space, but means all seeking is pretty immediate, so is nice when editing video and when you want fast frame-by-frame inspection in both directions.
A closed GOP is one that can be decoded completely without needing another GOP. This is contrasted with an open GOP, which ends in B-frames that need to look into the next GOP's first frame (an I-frame).
Open GOPs make for slightly more efficient coding (simply because you're using a little bit more predictive coding). Closed GOPs make for easier/faster frame-by-frame editing, faster playing in reverse, faster DVD angle switching(verify). (It also seems to be compatible with more players (divx5, ffdshow), though that may now be an outdated statement)
A small GOP size (something like 5 or 10 frames) means seeking is faster, since the nearest I-frame is closer, so fewer frames need to be decoded to get to the one you wanted.
For a player that only ever seeks to the nearest I-frame (presumably because decoding forward to the exact frame would make seeking pause), more I-frames mean more accurate seeking. For example, the specs for DVDs apparently say 12 frames max, which is about half a second worth of video.
Allowing large GOPs (say, 300 frames, ~10 seconds) makes for slightly more efficient coding in the same amount of bytes (because you have a few more predictive frames where you would otherwise force I-frames), but it's a diminishing-returns thing.
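The seeking behavior described above can be sketched as: find the most recent I-frame at or before the target, then decode forward from there.

```python
def seek_to(frame_types, target_index):
    """Find the most recent I-frame at or before the target, and how many
    extra frames must be decoded from it to reach the requested frame."""
    start = max(i for i in range(target_index + 1) if frame_types[i] == 'I')
    return start, target_index - start

gop_stream = list('IPPBPPIPPBPP')   # two GOPs of six frames each
print(seek_to(gop_stream, 9))       # (6, 3): restart at index 6, decode 3 more
```

Smaller GOPs shrink that second number (decode work per seek) at the cost of coding efficiency.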
When handing options to encoders, note there are often multiple interacting features, policies, strategies and such. Specifying just GOP size may not be enough for what you want.
To inspect what frame types a file has:
- ffprobe -show_frames lists per-frame details, including pict_type (the frame type).
- libavcodec has a vstats option, which writes a file in the current directory with frame statistics about the input file. For example:
mplayer -vo null -nosound -speed 100.0 -lavdopts vstats input.avi
(Without -speed it seems to play at the video rate, though, and there's probably a way around that better than -speed)
Some notes on aspect ratio
Display Aspect Ratio (DAR) means "this ought to be shown at this ratio". Example: 16:9. This is information that some files can store to tell the player to do so (or that standards can imply for all files of a type).
DAR allows more arbitrary aspect ratios in the actual pixel dimensions - for example, SVCDs are 480x480 (NTSC) or 480x576 (PAL/SECAM) -- but store content meant for 4:3 or 16:9 display, which is one way to store more vertical than horizontal detail. SVCD players will rescale this to some resolution that has the proper aspect ratio at play time (usually just fitting it in the largest non-cropping size for the given TV resolution(verify)).
This works because MPEG can store aspect ratio information, so hardware players and most software players listen to and use it. Not all (software) players understand it in MPEG4 files yet, though.
AVI (and any content stored in it, including MPEG4) does not support it -- but the opendml extension that does allow it is now fairly commonly used. Not all players know about opendml, though most that matter do.
When encoding, the best choice depends on what you want to play things on. The most compatible way is rescaling so that square pixels would play correctly. However, this usually means smallish resolution changes, which can look like mild blur.
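As a worked example of the rescaling arithmetic (a hypothetical helper that keeps the stored line count and computes a square-pixel width from the DAR):

```python
def square_pixel_width(stored_lines, dar_w, dar_h):
    """Width at which 'stored_lines' scanlines display with square pixels
    at the given display aspect ratio (keeping the stored line count)."""
    return round(stored_lines * dar_w / dar_h)

# SVCD PAL content is stored at 480x576 but flagged 4:3,
# so a square-pixel display would show it at about 768x576:
print(square_pixel_width(576, 4, 3))  # 768
```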
Some notes on frame rates
See also Video#Frame_rate.2C_analog_TV_format.2C_and_related for basic notes on frame rate.
References like 480i and 720p became more commonly used in the era commonly known as HD (now), partly just because it's brief and accurate.
These references are not often seen alongside monitor resolutions, perhaps because "720p" and "1080p HD" are easier to market when buyers don't realize that is about as good as a decade-old monitor, except on a bigger screen (for TV image quality, the pixels matter much less than the new digital content/broadcast standards that came along with them).
References such as 480i and 720p refer to the vertical pixel size and whether the video is interlaced or progressive.
The common vertical resolutions:
- 480 (for NTSC compatibility)
- 480i or 480p
- 576 (for PAL compatibility)
- 576i or 576p
- 720 (HD)
- always 720p; 720i does not exist as a standard
- 1280x720 (sometimes 960x720)
- 1080 (HD)
- 1080i or 1080p
- usually 1920x1080
There are some other newish resolutions, many related to content for laptops/LCDs, monitor/TV hybrids, widescreen variations, and such.
HD TV broadcasts are often either 1080i or 720p. While 1080i has greater horizontal resolution (1920x1080 versus 1280x720), 720p does not have interlace artifacts and may look smoother.
The 480 and 576 variants usually refer to content from/for (analog) TVs, so often refer to more specific formats used in broadcast.
- 576 often refers to PAL, more specifically:
- analogue broadcast TV, PAL - specifically 576i, and then often specifically 576i50
- EDTV PAL is progressive, 576p
- 480 often refers to NTSC, more specifically:
- analogue broadcast TV, NTSC - specifically 480i, and then often specifically 480i60
- EDTV NTSC is progressive, 480p
- 486 active lines seems to refer to older NTSC - it now usually has 480 active lines
There is more variation with various extensions - widescreen, extra resolution as in e.g. PALPlus, and such.
Sometimes the frame rate is also added, such as 720p50 - which usually refers to the display frequency applicable.
In cases like 480i60 and 576i50 you know this probably refers to content from/for NTSC and PAL TV broadcast.
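A small sketch of parsing such references (the naming convention is informal, so this parser is an illustrative assumption):

```python
import re

def parse_scan_ref(ref):
    """Parse a reference like '480i60' or '720p' into
    (lines, scan type, rate or None). Hypothetical helper."""
    m = re.fullmatch(r'(\d+)([ip])(\d+)?', ref)
    if m is None:
        raise ValueError('not a scan reference: %r' % ref)
    lines, scan, rate = m.groups()
    scan_type = 'interlaced' if scan == 'i' else 'progressive'
    return int(lines), scan_type, int(rate) if rate else None

print(parse_scan_ref('480i60'))  # (480, 'interlaced', 60)
print(parse_scan_ref('720p'))    # (720, 'progressive', None)
```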
On TV horizontal pixel resolution
For analogue TV, pixel-per-line resolution is not really set in stone. Because of the way the signal is used, anything above ~500 or so looks good enough.
- Cheaper NTSC CRTs couldn't really display more than 640, cheaper PAL CRTs (verify)
- 720 was treated as a maximum (fancy editing systems of the time supported it)
- 704 is a sort of de facto assumption of the average that TVs tend to display(verify), and is also what EDTV uses (704x576 for PAL and 704×480 for NTSC)
- NTSC can be 720x480, 704x480, or 640x480
- PAL can be 720x576 or 704x576
depending a little on context.
On digital broadcast, a stream has a well-defined pixel resolution, but since the displays are more capable, they are typically quite flexible in terms of resolution and frame rate.
Relevant acronyms here include
- ATSC (digital broadcasting in the US, replacing analog NTSC)
- DVB (digital broadcasting in Europe, replacing analog PAL)
- EDTV (sort of halfway between basic digital broadcast and HDTV)
See e.g. the image on http://en.wikipedia.org/wiki/Display_resolution
Screen and pixel ratios
Video capture hardware
Video editing hardware
(What you may want to look for in) more-than-webcam software
Mainly for editing
Mainly for conversion
Some specific tools