Notes on encoding video
The below focuses on mencoder, and on ffmpeg,
specifically their CLI arguments
- When mentioning options, the first (e.g. subq=7) is the mencoder form, the second (e.g. -subq 7) is the ffmpeg form. (Note that sometimes there are multiple ways to call the same encoder (e.g. x264 executable parameters) and sometimes the parameters are a little more extended, but that'd just be endless...)
See also Video for some more general technical notes related to video files.
Most of the code is in libraries dealing with codecs, containers, conversions, etc.
ffmpeg is a relatively thin CLI around it.
A lot of other video-related projects (e.g. mencoder, VLC)
use it for much of what it can do,
some of which use it fairly transparently (see e.g. mencoder -ovc lavc and -lavcopts, which are mostly passed through verbatim),
and some of which augment it with other things (see e.g. this for VLC).
Notes on...
"Give me the best options"
There are perhaps four main interests:
- video quality,
- time spent encoding,
- eventual file size (or (average) bitrate given a fixed length), and
- whether it should play everywhere without hiccups (codec availability, predictable decode resource spikes) - for certain values of everywhere:
- on something minimal (set-top box, Chromebook, Raspberry Pi - many have some video decoding support, but it is limited),
- or on a decently powered media center
- or maybe just on your extra fancy overclocked PC
These are potentially all at odds with each other,
and default encoder settings tend to be biased to the 'plays on most hardware' end,
at the cost of some quality and/or space.
...meaning that for specific cases, you can make more suitable tradeoffs.
But only so much, and also it depends.
People tend to develop a few general tactics, such as
- for video I'll archive (e.g. my project renders), I can spare an extra hour if it makes a significant difference in size
- Renders for clients or YouTube - throw more bitrate at it than actually necessary. Clients like the idea of highest quality, and may well recode it themselves too.
- additionally, you could opt for some simpler codecs which render faster at somewhat higher size
- (I've seen people throw a factor ten more than necessary at renders, "just to be safe". I noticed because it was then completely impossible to play on a Raspberry Pi until I recoded it down to the bitrate that it probably originally came from)
You can have, or introduce, a lot more constraints or tradeoffs. Consider:
- If you want to be sure something plays on a hardware DVD/DivX player (which are now quite oldschool, and were never all that common), there are some detailed quality-squeezing options to avoid - options that may lead to smaller encodes but also lead to spikes in required calculation (and/or bitrate), which limited-power hardware can't deal with. You don't have to worry about this at all on computers (except perhaps for high-bitrate HD content).
- Encoding for a standard DVD-Video sets a specific codec, a bitrate limit, a size limit, so there is relatively little to choose
- and some loss in quality may be unavoidable
- in video editing jobs, you want seeking to be fast.
- Animators who study movement (skipping back and forth between frames) will also love you for this.
- ...because with all space-efficient codecs, seeking back means seeking back to the most recent complete frame and decoding all the differences-to-the-previous-frame from there.
- this is one reason there are some 'editor codecs', which are low-compression (they may not use predictive frames at all - which makes files several times larger but seeking a lot snappier)
- video streaming is served by simpler encode and decode/playing (lower latency is easier to achieve)
- on a LAN, e.g. on fixed stage setups, you can even throw lots of bandwidth at it because that one cable can carry it anyway
- on the internet, you probably want something that acts bandwidth-and-CPU-capped.
- you probably want a second or two of latency, in that this delay is much more acceptable than stuttering
- complexity of decode - While VLC on a powerful computer plays almost anything, do you also want to play it on smartphones, tablets, a decade-old set-top box? This implies some extra constraints to avoid decoding problems. (This often implies simpler encodes, which will need higher bitrates for the same quality)
- Doing a quick recode from an unusual codec to something that will play on a simple player, and can be thrown away afterwards.
- This often means that you want to keep quality and don't care about size much, and encode time is only limited by your patience. Using a fairly high bitrate is easiest.
- how long do you want the encode to take?
- If it can be twice the size, you can often shave 30% off the time just by making it not try so hard
- Squeezing out the last few possible percents of objective-quality-per-space can take hours more work.
- most people are not bothered about disk use these days
- Encoding a movie to a fixed size (such as a DVD), and having it look as good as possible
- You're probably willing to spend more encoding time when it means a noticeable quality improvement
- ...which means you probably want most of the try-harder options
- bothering about a few details can help squeeze out a little more (but at some point becomes a little futile)
- is the input video noisy?
- You'll probably need more bitrate for the quality you'd usually expect
- ...and is it anime or other simpler shading?
- That's usually low framerate, meaning you don't need as much bitrate
- plus you may be able to get away with noise reduction (where in photographic video that will quickly look plasticky or plain ugly)
- Does the video contain interlaced or telecined content?
- You probably want to tell the codec this, or if you can't, convert it to progressive before handing it to the encoder
- ...because giving interlaced/telecined content to a codec that assumes input is always progressive means you lose quality trying to fix all this line-to-line tearing (it's just weird high frequency detail). If this happens, no other encoding options (that are not "deinterlace this") will help you.
- This is relevant to most analog-TV captures, many DVD rips, and some sources of digital video.
There are some further options that may sometimes help, but may have no effect and/or even may cause problems.
For example, in some cases noise shaping and/or noise reduction may let the encoder focus on details rather than on noise -- but overdoing it can lead to blurriness and very visible artifacts.
On ffmpeg/avconv
If you're like me, you thought that the Ubuntu packaging meant that libav / avconv was a rebrand/replacement of ffmpeg.
This is because Ubuntu's packager is in the libav camp - in reality libav is a fork that originated in developer drama - some understandable, some justified, some not so much, and ending in childish fallout.
As things currently are, ffmpeg and libav are distinct projects.
Both ffmpeg and libav are actively developed, both share a large codebase, both still implement mostly the same features and APIs, there is cross-pollination, and they tend to adopt most of each other's code.
The optimist may say the competition has spurred both development and code cleanup, though minor divergence in the detail is occasionally a pain, for developers and users alike.
The two will probably stay nearly identical, though it's unclear to what degree they will be kept in sync in the long run, or when (or whether) they may reconcile.
Some reading:
- http://blog.pkh.me/p/13-the-ffmpeg-libav-situation.html
- https://github.com/haasn/mpvhq-old/wiki/FFmpeg-versus-Libav
- the forums, if you have popcorn ready
Codec choices
bitrate
If you care about the best tradeoff between size and quality, this depends on the content and some personal preference.
A given video has its own level of complexity, which varies throughout.
Modern codecs usually spend their bits more efficiently, but you usually have few options.
Once you choose a codec, bitrate is the primary constraint on quality.
Choose a bitrate that is too low for the given content, and no amount of clever options will help you preserve quality. (too high and you're just wasting space)
To give an idea of bitrate that video may need - and of variations with codecs:
- encoding from DVD, SDTV (a.k.a. pre-HD TV) (~500kpix)
- in DivX/XviD (and other variants of MPEG4 ASP) you can get decent quality with 700 to 1000kbit/s.
- in H.264 (MPEG4 AVC), the same content can often be compressed in ~600kbit/s at comparable quality. People regularly opt for somewhat higher bitrates to get nicer quality without worry.
- standard DVD-Video discs must use MPEG-2, but hold at least 4.3GB (DVD5 discs). For 90 minutes you can spend 6000kbit/s on average. This seems like a lot, but MPEG-2 doesn't code as efficiently as newer codecs and often needs at least 3000kbit/s to get consistently decent quality, and 6000kbit/s or more on some complex scenes.
- HD content
- the number of pixels is a few times higher than SD.
- Yes, they are more redundant, but it also matters that there are multiples more of them.
- As such, people often opt for H.264, because it scales a little better.
- 720p is youtubish at 1000kbps, decent at 2000kbps
- Complex shiny 720p or 1080i/p video may need on the order of 4000kbit/s
When quality is more important than file size, you could just throw a large bitrate at it.
Encoders will easily fit quality in a larger bitrate (and may spend less time since they don't have to work as hard).
For either Xvid or H.264, <2500kbit for SD content and <10000kbit for HD content will usually look quite good.
If size matters, then spending a little more time on getting similar quality from half that bitrate sounds like a good idea. You may find yourself doing test encodes just to see whether a particular bitrate (plus options) looks good enough.
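The size/bitrate arithmetic used above (e.g. the DVD figure) is easy to redo for your own targets. A minimal Python sketch, assuming decimal units (1GB = 10^9 bytes) and an illustrative 192kbit/s audio track; the function names, the audio rate, and the overhead parameter are this sketch's own assumptions, not something any encoder provides:

```python
def video_kbps_for_size(size_gb, minutes, audio_kbps=192.0, overhead=0.0):
    """Average video bitrate (kbit/s) that fits a target size.

    size_gb    -- target size in decimal gigabytes (1 GB = 1e9 bytes)
    minutes    -- running time
    audio_kbps -- assumed audio bitrate, subtracted from the budget
    overhead   -- assumed fraction lost to container overhead (0.0 to 1.0)
    """
    total_kbit = size_gb * 1e9 * 8 / 1000 * (1.0 - overhead)
    seconds = minutes * 60
    return total_kbit / seconds - audio_kbps


def size_mb_for_bitrate(total_kbps, minutes):
    """Approximate size in decimal megabytes for a given average bitrate."""
    return total_kbps * 1000 * minutes * 60 / 8 / 1e6


if __name__ == "__main__":
    # A 4.3GB disc holding 90 minutes
    print(round(video_kbps_for_size(4.3, 90)))
    # 90 minutes at 800kbit/s total
    print(round(size_mb_for_bitrate(800, 90)))
```

For 4.3GB and 90 minutes this leaves roughly 6200kbit/s for video, in line with the ~6000kbit/s mentioned above; 90 minutes at 800kbit/s comes to about 540MB.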
rate control
Bitrate roughly means "the space spent on encoding a given length of video (or audio)," and is typically an amount per second of video.
Rate control determines how it is spent.
- Constant Bitrate (CBR) means 'this is how much to spend per frame'. It still varies, but little.
- Variable Bitrate (VBR) means 'vary bitrate in reaction to content complexity'. There are multiple ways how.
(Note also that this is one of the major influences on how video can stutter (predictability of decoder resource use))
There are generally four major variants of the CBR/VBR choice:
- CBR (one-pass)
- typically means "spend this much per frame, just do your best". You have to pick a bitrate
- ...that is high enough for good quality throughout the video - or be okay with the complex parts encoding poorly.
- ...or pick it higher so that the most complex parts will be okay (and be okay with spending a bit much on the simple parts)
- The size of the result is picked_bitrate*running_time, to within a small error.
- Quality-per-size ratio will be lower than with VBR. But simpler to encode, so convenient for e.g. streaming.
- one-pass aim-for-bitrate VBR - given a target bitrate (and often a maximum), try to spend the target bitrate, but spike up to the maximum bitrate when it seems good for quality.
- Resulting file size can be guessed, though input complexity will still vary
- You can constrain this. Note that the more you do this, the more this becomes like CBR
- tends to increase and decrease bitrate within a timespan of seconds (not always ideal)
- requires you to make a good guess of the bitrate necessary for each video (takes some intuition training)
- multiple-pass aim-for-bitrate VBR
- given a target bitrate (and often a maximum), use one (or more) encoding passes to figure out how to spread those bits around for the most consistent quality
- can vary the bitrate more quickly than the above - and for better reasons.
- average bitrate will often end up closer to the requested rate
- quality-based VBR (one-pass)
- you ask for a particular quality per frame.
- Easily creates bitrate spikes on complex content. You don't really know the resulting filesize beforehand.
- An easy alternative to CBR when you want high-quality and don't mind spending a bit more space, in a more justified way than just throwing a large bitrate at it
- can make sense for streaming encodes
- There is a further distinction between whether the quantizer is fixed or not:
- constant quantizer (CQP, for Constant Quantizer Parameter)
- Similar degree of compression is applied to all frames, regardless of content. Bitrate will vary because contents do.
- you can ignore CQP, because CRF does something similar, and usually does it better
- constant ratefactor (CRF)
- will vary QP - around your given target but spend more on still frames and less on motion
- ...which tends to give the impression of better quality, even if PSNR and such wouldn't agree
Other notes:
- The resources required by the player are easiest to predict with CBR - either the bitrate is too high or it isn't.
- (note that certain 'try harder' options are also part of this trouble)
- VBR variants with strong bitrate spikes (e.g. n-pass, quality-based VBR) may stutter on underpowered hardware
- which isn't as relevant for SD content anymore, but is for HD
- and you can control this with e.g. some constraints
- When your most important factor is:
- target size: multi-pass VBR
- quality guarantee: quality-based VBR
- realtime encoding: CBR is easiest, though either one-pass VBR variant may code more efficiently
- ...but this often defaults to CBR, because it often takes less CPU, and it is easier to guarantee any content can be handled without stuttering (by the encoder and decoder).
- ABR, Average BitRate is VBR that tries to end up using the given average bitrate
- Which can mean the one-pass and multi-pass variants, depending on context.
- Some people are quite consistent with the term, I treat it as ambiguous and avoid it.
- for quality-based VBR, the scale of the value handed
- is different between x264, and CQP in MPEG1/2/4
- does not have a direct relation to bitrate (and observed behaviour has previously changed in development)
- Example: -x264encopts crf=23, -crf 23
- Lower value is better quality.
- Currently, for DVD/SDTV resolution, 26 is probably comparable to your average downloaded movie (~700kbit/s?), 22 is significantly better (~1.4Mbit/s?), 18 is near-lossless (~3Mbit/s?)
- values are technically floating-point, but integers are exact enough for most people
- Quality estimations are often more mathematical than visual.
- In particular, noise in the source video has a very real effect
- ...though you wouldn't always agree with the quality judgment even without noise.
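To make the "target size means multi-pass, quality guarantee means quality-based" choice concrete, here is a small Python sketch that only builds ffmpeg argument lists (it does not run them). It assumes a reasonably recent ffmpeg built with libx264 and uses the newer -c:v spelling of -vcodec; the helper names and file names are made up for illustration:

```python
import os


def crf_command(src, dst, crf=23):
    """Quality-based one-pass encode: pick a quality, accept whatever size results."""
    return ["ffmpeg", "-i", src, "-c:v", "libx264", "-crf", str(crf), dst]


def two_pass_commands(src, dst, video_kbps):
    """Two-pass aim-for-bitrate encode.

    Pass 1 only gathers statistics, so its audio is dropped (-an) and its
    video output is discarded (null muxer writing to the null device).
    """
    common = ["ffmpeg", "-y", "-i", src,
              "-c:v", "libx264", "-b:v", f"{video_kbps}k"]
    first = common + ["-pass", "1", "-an", "-f", "null", os.devnull]
    second = common + ["-pass", "2", dst]
    return [first, second]


if __name__ == "__main__":
    print(" ".join(crf_command("input.mpg", "output.mkv", 22)))
    for cmd in two_pass_commands("input.mpg", "output.mkv", 800):
        print(" ".join(cmd))
```

Each list can be handed to subprocess.run() as-is. The two passes must be run in order, from the same directory, because the second pass reads the statistics file that the first one writes.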
On video that plays everywhere
Define everywhere.
- Codec-wise
For example, if you want to be sure video plays on every computer, including one that hasn't been updated for a decade and has the most plain installation (no extra codecs, no VLC), then you have to resort to old versions of some common codec. (MPEG-2 may be a decent bet - but you'd need a considerably larger bitrate for comparable quality)
Hardware players have few options. A set-top DVD player may play DivX/XviD. Only recent stuff will attempt H.264. But in neither case will they play all videos - you often need to observe standards compliance and avoid bitrate spikes.
- When playing on computers you can get away with caring a lot less
(On standards compliance: In particular MPEG4 ASP has seen many implementations, including early MSMPEG4, DivX, Xvid, and more. Some encoders and some decoders aren't very compliant, so there are always options you should avoid if you want it to play on this sort of hardware. For H.264 things are simpler; the main worry is resource draw.)
When you're encoding to play on a decently powerful computer, and can count on a relatively recent and updated OS (and particularly when you can tell people to install VLC, and/or mplayerc and a codec pack), then you can more or less encode however and to whatever you want.
- Resource-wise
Decoding video takes a variable amount of resources for each frame, and so the resource draw varies over time.
This is technically true even for CBR, but that case is pretty predictable (and there may be specs that guarantee playability).
With VBR, the resource draw of decoding is higher than CBR in general, and also correlates strongly with bitrate. If the decoder is not fast enough to do the work for a frame in real-time, it will stutter, drop frames, or do other ugly things.
When encoding for players with limited resources (DVD players that do divx and/or H.264, old computers repurposed for movie watching, and very-high-resolution HD even on modern computers), you can add some constraints, to help ensure it will play with more limited resources. This comes at a cost - the same quality will take more space.
H.264 has made the tradeoffs somewhat more explicit, through its levels (see e.g. [1]) and profiles.
On video editing
Progressive and not
Interlaced is useful for TV broadcast, and little else.
Encoding often wants progressive video.
If your source is not progressive, you want to make it that.
If it comes from a DVD it may be almost any mix of telecined, interlaced, and possibly progressive content, all sliced together.
If the video you hand in is interlaced (such as video from TV capture cards, which usually place two adjacent fields into one progressive frame, because that's how they receive it), or is telecined (which, roughly, is framerate adjustment by doing interlacing only occasionally - common on NTSC movie DVDs), then the frames being fed to the codec will easily show very sharp line-by-line differences vertically, particularly in high motion scenes. Codecs that assume progressive input will spend a lot of space on what from that perspective is video detail.
So you usually want to decode the content into progressive frames. Yes, de-interlacing is a slightly lossy process, but not as bad as you think, and much better than handing interlaced frames to a codec that assumes progressive frames.
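To see why woven interlaced frames look like expensive detail to a progressive codec, here is a toy Python sketch. It uses plain lists rather than real video; the frame size, the moving edge, and the helper names are all made up for illustration, and the "detail" measure is just a crude stand-in for what an encoder would have to spend bits on:

```python
def frame_with_edge(width, height, edge_x):
    """A frame that is dark left of edge_x and bright from edge_x onwards."""
    return [[0 if x < edge_x else 255 for x in range(width)]
            for _ in range(height)]


def weave(frame_t0, frame_t1):
    """Interlaced capture: even rows from one moment, odd rows from the next."""
    return [frame_t0[y] if y % 2 == 0 else frame_t1[y]
            for y in range(len(frame_t0))]


def vertical_detail(frame):
    """Mean absolute difference between vertically adjacent pixels."""
    total = sum(abs(a - b)
                for upper, lower in zip(frame, frame[1:])
                for a, b in zip(upper, lower))
    return total / ((len(frame) - 1) * len(frame[0]))


if __name__ == "__main__":
    progressive = frame_with_edge(16, 8, 4)
    # The edge moved 4 pixels to the right between the two fields.
    woven = weave(frame_with_edge(16, 8, 4), frame_with_edge(16, 8, 8))
    print(vertical_detail(progressive))
    print(vertical_detail(woven))
```

The progressive frame has no vertical variation at all, while the woven frame alternates on every line wherever something moved, which is exactly the line-by-line pattern described above.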
Telling the thing what you want
Mencoder and ffmpeg
Note: for the purposes of these notes, the avconv and ffmpeg commands take largely the same arguments (avconv is libav's counterpart of the ffmpeg command - see the notes on the fork above).
You can see both mencoder and ffmpeg consist largely of:
- a bunch of optional video filtering and other processing
- calls to libraries handling the specific codec you are writing
The libraries both use overlap a lot, so the result is often similar or identical, but the parameters to each command are different.
Because of this, most of this page mentions both argument styles.
- Note that some arguments may not apply to the codec/library you are using. When in doubt, look at the docs.
For example, to do a conversion to DivX-style MPEG, aiming for 800kbps:
avconv -i input.mpg -vcodec mpeg4 -b:v 800k output.avi
mencoder input.mpg -oac copy -ovc lavc -lavcopts vbitrate=800000 -o output.avi
Note that these two tools have different defaults for other options, so the output will probably not look identical.
for divx/xvid
Bitrate has a default, but not a smart one, so you probably want to specify it.
Order of magnitude: For much DVD/TV-sized video (~500Kpixels), 800kbit/s is okayish with a fast encode, and fairly decent when you use all basic try-harder options.
Try-harder options:
The basic improvement that you almost always want (cheap and noticeable) is at least:
trell:mbd=2
It seems many people look through the docs for the 'gives decent improvement at moderate cost' notes, and most settle on a set like:
trell:mbd=2:mv0:v4mv:cbp:dia=2:predia=2:last_pred=3:cmp=2:precmp=2:subcmp=2:vmax_b_frames=2:vb_strategy=1
Some people like to add preme, and some play with qns. It's an endless game of fine tuning, worth it for a few cases and less so in others.
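When experimenting with option sets like the one above, it can help to treat the -lavcopts string as data, so you can add, change, or drop single options without retyping the lot. A minimal Python sketch (the helper names are this sketch's own; it only knows the colon-separated name or name=value syntax, not what any option means):

```python
def parse_lavcopts(text):
    """Split a mencoder -lavcopts string into a dict; bare flags map to True."""
    options = {}
    for item in text.split(":"):
        if not item:
            continue
        name, sep, value = item.partition("=")
        options[name] = value if sep else True
    return options


def format_lavcopts(options):
    """Inverse of parse_lavcopts: join a dict back into name[=value] items."""
    parts = []
    for name, value in options.items():
        parts.append(name if value is True else f"{name}={value}")
    return ":".join(parts)


if __name__ == "__main__":
    opts = parse_lavcopts("trell:mbd=2:mv0:v4mv:cbp:last_pred=3")
    opts["last_pred"] = "2"      # change one value
    opts.pop("cbp")              # drop one flag
    opts["vbitrate"] = "800"     # add the bitrate
    print(format_lavcopts(opts))
```

Dictionaries keep insertion order in current Python versions, so the string comes back in the order you built it, which makes it easy to compare against what you typed earlier.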
For the below:
- that's the mencoder and ffmpeg/avconv options respectively (TODO: add again)
- the mentioned values are biased to give better-than-naive-default quality, while avoiding unreasonable speed/quality tradeoffs
The basic 'do more work for more quality' options:
- trell, -trellis 1 - do more work looking for choices that minimize quantization errors. Somewhat slower and noticeably better encodes, and one of the easiest ways to lessen the blocky look. (TODO: check whether this is on by default)
- TODO: (verify) that these are identical
- cbp, -flags cbp - related to block decisions. Small quality gain at a small speed cost, so generally worth it. (combines with trell - considers both bitrate and distortion(verify))
- note: cbp seems deprecated in ffmpeg, figure out(verify)
- mbd=2, -mbd rd - control how the encoder decides the macroblock mode
- 0 (default) means 'use method specified by mbcmp', 1 means 'try all and optimize for size', 2 means 'try all and optimize for quality' (rate distortion). 0 (simple in ffmpeg) is fastest, while 2 (rd in ffmpeg) and 1 (bits in ffmpeg) tend to be decent tradeoffs.
- Use of mbcmp, precmp, subcmp, cmp, and also qpel will override the method specified by mbd (verify)
Motion estimation related:
- mv0, -flags mv0 - macroblock decision tries more options. Small cost, small gain.
- v4mv, -flags mv4 - allow 4 motion vectors per macroblock (in MPEG4). Small quality gain, small speed cost. Seems to combine well with mbd 1 and 2.
- cmp=2 subcmp=2 precmp=2, -cmp satd -subcmp satd -precmp satd
- comparison function for motion estimation searches, respectively for full-pel, sub-pel, and pre-pass
- People seem to like 2 / satd
- dia=2 predia=2, -dia_size 2 -pre_dia_size 2
- motion detection diamond size and shape.
- 1 is default
- 2 looks further/harder so is slower, and does better in relatively few situations. (There are also some options that make for faster, lower quality encodes)
- last_pred=2, -last_pred 2 - control how many motion predictors from the previous frame are used. Default is 0
- you can choose 1, 2, or 3 for slower encodes and often better quality.
- People seem to argue whether 3 is worth the extra time, over 2
- preme=2, -preme 2
- when to do a motion estimation pre-pass. 2 means always, the default 1 means only after i-frames. Has fairly little effect.
- qpel: use quarter-pixel motion estimation. Doesn't really help for lowish bitrates, though may help a bit for higher bitrates.(verify)
- some hardware players do not support this. For compatibility, leave it off.
You can fix the quantizer -- but it's not really VBR as you still have to decide the target bitrate(verify)
You'll want to know about:
- vqmin= and vqmax= (ffmpeg: -qmin and -qmax) seem to clamp the quantizer in a range
- in other words, you can use a higher vqmin to lower the quality and CPU use, or use a lower vqmax to try to force higher quality
- 2 is the lowest you would use; 1 is not worth the higher bitrate
- vqscale=, -qscale - seems to be a shorthand for setting both vqmin and vqmax to the same value (verify), i.e. fixed quantizer. Allowing no variation at all seems to make little sense (average within a frame will often be better than constant within a frame)
There is no CRF behaviour available.
Other interesting options:
- threads=auto, -threads 0 (or a number. Default is 1) - More threads makes encodes faster on multicore CPUs, by parallelizing calculation of motion estimation. Hurts that estimation's quality a little bit, while making encodes noticeably faster.
- turbo - sets a bunch of options for a fast, lower-quality encode. Useful for the first pass in 2-pass ABR encodes, where the encoding is only there to estimate complexity
- Exact details seem to vary and may have changed over time. It does something like setting subq=1, frameref=1, setting the simplest/fastest options for cmp, dia/predia, disables qpel, mv4, trellis, cbp, mv0, and noise shaping/reduction.
- ffmpeg seems to have no equivalent, though you could just manually set all these.
- Depending on the present noise and other graininess, whether you have smooth or frame animation (e.g. cartoons, anime), photographic film or cel-like look, and how the specific codec deals with these things, you may wish to experiment with:
- qns=2, -qns 2 - Noise shaping, which can hide ringing artifacts. Can help perceptual quality (even though PSNR measurements will be lower). 2 seems a good value. Should be used on top of trellis. Slow, not necessarily worth the bother, and can sometimes look worse.
- nr=200, -nr 200 - Noise reduction. Can sometimes improve perceptual quality by lessening general noise, but aggressive values (say, nr=400) may just look like an ugly selective plastic-everything blur. Avoid if not necessary.
for (lib)x264
(Values below biased towards slower, better-quality encoding without going overboard)
Further detail options
The two basic quality-for-speed tradeoffs are subq and frameref.
- frameref=4, -refs 4 - How many adjacent frames to base decisions on.
- Defaults to 1. For typical (stabilized-)camera-based video, using 2 and 3 can give noticeable improvements at acceptable time tradeoffs.
- For things like cleaned cel animation, anime, and anything else that is largely or usually very still / repeats large chunks between frames, you may see improvement up to 6.
- More means slower encode. How much depends on other options as well.
- More may also hurt CABAC coding efficiency.
- More means more memory required by the decoder
- ...particularly the last can mean it may not play on all hardware decoders. H.264 levels relate to this. To be relatively safe, use at most 5 for SD resolution video, 4 for HD.
- subq=6, -subq 6 - sub-pixel motion estimation quality.
- Range is 1 (fast & bad) through 9 (slow, better quality for same bitrate, but hardly worth the time).
- 1, 2, 3 are lower quality and not much faster
- ~4 and 5 are often the default
- ~6 or 7 are noticeably slower than 4 or 5 but you will still notice the quality difference (...mostly when bframes>0).
- There's little quality gain for 8 or 9
- I've seen the default mentioned as 7, 6, and 5, which is also roughly the most sensible zone.
- Interacts with frameref somewhat, in that more references combines with this option to encode slower. For higher frameref the quality increase levels off quickly, meaning that large frameref combined with large subq is rarely worth the extra time.
Also interesting:
- -x264encopts cabac, -coder 1: CABAC does data compression better than the older CAVLC. Default is usually CABAC anyway.
- You probably only use CAVLC (-x264encopts nocabac, -coder 0) when you want compliance to Baseline
- me=umh, -me_method umh - motion estimation type.
- The default, me=hex, is good.
- Encoder nerds seem to like me=umh because it occasionally does better, but it is noticeably slower. How much slower seems to mostly be correlated to frameref. (how much better also varies with that, and of course the video content). You may want to decide based on your value of frameref.
- mixed_refs: cleverer reference search. Generally gives improvements (when frameref is ≥2) and doesn't give a large speed dent.
- bframes=3, -bf 3 - max b-frame amount between I or P frames (see description above)
- As noted above, you probably want to use vb_strategy=1 , -b-strategy 1
- The encoder chooses when to use these, and it rarely uses more than 3.
- When you want to comply with Baseline, this should be 0
- b_pyramid, -flags2 bpyramid - Allow B-frames as prediction reference(verify)
- Allows better quality with slightly slower encoding and decoding. Usually worth it.
- rather-old decoders don't support this
- Only has an effect when b-frame amount is ≥2 (verify)
- weight_b, -flags2 wpred - more analysis in prediction from B-frames(verify).
- Useful, cheap, so you should use it.
- Only has an effect when b-frame amount is ≥2 (verify)
- weight_p, -flags2 wpredp - weighted prediction for P-frames. Slightly better compression, and helps coding efficiency(/quality) of fades, and not much else. The encoder itself doesn't use this much. Small speed hit, often little (sometimes no) effect. Options: 0 (off), 1 (simple), or 2 (smarter, slower). Adobe Flash's video player before 10.1 had a bug that meant use of 2 caused errors.
- threads=auto, -threads 0 - automatically choose amount of threads/cores to use. Similar story to xvid's: encoding speed scales well, hurts quality a tiny bit(verify). Default value is 1. You can hand in an integer.
- partitions=all, -partitions parti4x4,parti8x8,partp4x4,partp8x8,partb8x8
- basically "be more thorough about prediction, not just what usually works well." Sometimes does better on complex or fast movement.
- An "if you've got the time, sure" option, although the default seems to only exclude a single non-general-purpose option(verify).
- 8x8dct, -flags2 dct8x8
- Allows 8x8 as well as 4x4 DCT for macroblocks. Similar conclusion to previous item.
- In x264: this one is specifically High profile, not Main or Baseline
The ffmpeg docs mention the following three option sets:
- high quality: subq=6 partitions=all 8x8dct me=umh frameref=5 bframes=3 b_pyramid weight_b
- decent quality: subq=5 8x8dct frameref=2 bframes=3 b_pyramid weight_b
- fastish encode: subq=4 bframes=2 b_pyramid weight_b
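The option sets above are written space-separated, while mencoder wants them colon-joined in a single -x264encopts argument. A small Python sketch that builds such a command line (it only builds the argument list and does not run it; the set names "high", "decent", and "fast", the helper name, and the 1000kbit/s default are this sketch's own choices):

```python
# Option sets as listed above (mencoder names). Bare names are on/off flags.
X264_OPTION_SETS = {
    "high": ["subq=6", "partitions=all", "8x8dct", "me=umh",
             "frameref=5", "bframes=3", "b_pyramid", "weight_b"],
    "decent": ["subq=5", "8x8dct", "frameref=2", "bframes=3",
               "b_pyramid", "weight_b"],
    "fast": ["subq=4", "bframes=2", "b_pyramid", "weight_b"],
}


def mencoder_x264_command(src, dst, quality="decent", video_kbps=1000):
    """Build a mencoder argument list using one of the option sets above."""
    opts = [f"bitrate={video_kbps}"] + X264_OPTION_SETS[quality]
    return ["mencoder", src, "-oac", "copy", "-ovc", "x264",
            "-x264encopts", ":".join(opts), "-o", dst]


if __name__ == "__main__":
    print(" ".join(mencoder_x264_command("input.mpg", "output.avi", "fast")))
```

Keeping the sets in one table makes it easy to do the same short test encode with each and compare time, size, and looks.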
There are many more options, but for many of them the default is the best option, or their effect is too minor. If you're really really interested, go read manuals and forums.
Considering profiles and levels
There are quite a few profiles, some of which are practical (fast switching between server streams), some targeted at camcorders, professional editing, or mastering uses, and there's the Scalable set targeted at videoconferencing.
The more basic set of profiles includes the following:
- Baseline, Constrained Baseline (BP, CBP)
- intended use: video conferencing, low-cost mobile. In practice, things like iPods
- Constrained baseline is the set of features shared between Baseline, Main, and High
- Baseline: CBP plus some robustness, low-delay details
- CAVLC (no CABAC): nocabac, -coder 0
- No bframes: bframes=0, -bf 0
- No weighted P-frame prediction: weightp=0, -wpredp 0
- No 8x8 DCT: no8x8dct, -flags2 -wpred-dct8x8
- nointerlaced
- qp>0
- Main (MP)
- Intended use: (DVB) SDTV
- CABAC: -coder 1
- no8x8dct, -flags2 -wpred-dct8x8
- qp>0
- High (HiP)
- Intended use: (DVB) HDTV, BluRay storage
- CABAC: -coder 1
- high qp>0
Notes:
- Mobile devices of different speeds can often comfortably decode Baseline and sometimes Main, but typically not High.(verify)
- One of your choices is between Baseline for wide playability, and anything fancier which uses CABAC for an almost immediate ~20% added coding efficiency.
- The H.264 levels basically let devices certify they have enough temporary space and throughput to let it support a certain bitrate and resolution, and (effectively) -frameref choice.
- Smartphones and media players can be described as supporting some level. For example, AppleTV does Main profile 720p at level 3.1. General-purpose computers are usually a level above what you need.
- CAVLC (Context-adaptive variable-length coding) is supported in all H.264 profiles
- CABAC (Context-adaptive binary arithmetic coding).
- better quality than CAVLC at same bitrate
- takes more CPU at decode time
- Supported in Main profiles and higher (computer decoders understand it, not all hardware does)
- (don't confuse profiles with ffmpeg's presets)
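The Baseline restrictions listed above can be turned into a quick sanity check on the options you intend to use. A minimal Python sketch, assuming settings are kept in a dict keyed loosely after the mencoder option names; the dict layout and function name are hypothetical, and this only checks the handful of constraints listed above, so it is not a real conformance test:

```python
# Values that Baseline requires, per the list above.
BASELINE_RULES = {
    "cabac": False,       # CAVLC only
    "bframes": 0,         # no B-frames
    "weightp": 0,         # no weighted P-frame prediction
    "8x8dct": False,      # 4x4 transform only
    "interlaced": False,
}


def baseline_violations(settings):
    """Return the names of settings that conflict with the rules above.

    Settings that are not mentioned are assumed to be left at an
    acceptable value, which may not be true for your encoder's defaults.
    """
    return sorted(name for name, required in BASELINE_RULES.items()
                  if name in settings and settings[name] != required)


if __name__ == "__main__":
    wanted = {"cabac": True, "bframes": 3, "8x8dct": False, "subq": 6}
    print(baseline_violations(wanted))
```

An empty list means none of the listed constraints are violated; anything returned is an option to change (or a reason to target Main or High instead).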
some filters
Rescale filter
Scaling down means less detail. Resizes between similar resolutions (e.g. 10% difference) will mostly have the effect of a mild lowpass/blur, so while they may compress better it won't look much better. Sometimes cropping or letterboxing is a better idea.
When you want a smaller file, or half the resolution, or when target size/bitrate is a hard constraint, then resizing can be worth it, because encoding artifacts (from too low a bitrate) tend to be more visible than a resolution difference, as long as the resolution is still decent.
You can also specify the interpolation method (-sws option), though the default bicubic is often the best choice.
http://www.mplayerhq.hu/DOCS/HTML/en/menc-feat-rescale.html
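As a rough sketch of picking a downscaled size: the helper below (the name scale_arg is made up for this example) keeps the aspect ratio for a chosen target width and rounds the height to the nearest multiple of 8. The multiple of 8 is an assumption that suits block-based codecs, not a hard rule, and the calculation assumes square pixels. It only prints the scale=w:h filter argument, which you would pass to -vf.

```shell
# scale_arg SRC_W SRC_H TARGET_W
# Prints a scale=W:H filter argument that keeps the source aspect ratio
# and rounds the height to the nearest multiple of 8.
# (Hypothetical helper; assumes square pixels and integer sizes.)
scale_arg() {
    src_w=$1
    src_h=$2
    target_w=$3
    # exact height for this width, truncated to an integer
    h=$(( target_w * src_h / src_w ))
    # round to the nearest multiple of 8
    h=$(( (h + 4) / 8 * 8 ))
    echo "scale=${target_w}:${h}"
}

scale_arg 1920 800 1280    # prints scale=1280:536
```

For anamorphic sources (e.g. DVD) you would have to account for the pixel aspect ratio as well, which this sketch ignores.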
Cropping filter
You may wish to crop off things like letterboxes. If a letterbox doesn't start on a macroblock edge, that will look like a hard transition to black and the codec will spend more size on it than you would care about.
For digitized stuff you may wish to crop off TV/VCR non-frame overscan noise and such.
Due to codec macroblocks, height and width should usually be a multiple of 4 or 8. Specific devices can want specific resolutions, but PC playback rarely cares.
Other filters
There are quite a few filters available, though most are not useful in everyday cases. To get a list of those available in your installation, run mencoder -vf help.
Some of the more useful filters include those for deinterlacing, (inverse) telecine, post-processing, and de-noising, and some specific things like creating black bands for subtitles to go in. In a few cases, the same functionality can also be done by the video codec (for example, mpeg4 has ** functionality)
Use of multiple filters chains them - so order matters.
For example, to apply inverse telecine to content that may partially be progressive video, you can use -vf pullup,softskip or -vf softpulldown,ivtc=1. See [2] for more details.
harddup is interesting to mention. Some containers allow a 'the next frame is the same as this' flag, which saves space. However, this will not always play fine. The decoder might skip these and use the next stored frame, meaning it plays too fast and the audio lags behind. (These synchronization problems are apparently more likely to happen in MPEG formats)
The safer alternative is to just hand the same frame to the encoder again, to be compressed. This will take a little more space (though usually relatively little) and avoid causing the described audio/video synchronization problem.
libavcodec options worth mentioning
(...generally mentioning both the mencoder and ffmpeg argument names)
Notes:
- libavcodec shares a bunch of options between multiple encoders - in particular between Xvid and H.264 (both being part of MPEG4)
- In ffmpeg (probably mencoder too), the details in the man page may lag behind the encoder, so when in doubt, trust what ffmpeg -h says over what man ffmpeg says.
(See Video_format_notes#On_types_and_groups_of_frames for some technical background)
- -lavcopts keyint=60, -g 60 - maximum GOP size (basically "after how many non-Iframes do we force an I-frame")
- Something like 10 is good for fast seeking (though forces iframes when the content doesn't call for it)
- Something like 250 spends very few iframes unnecessarily (though can be much slower to seek)
- I've seen low defaults like 12 (possibly to comply with something?) and high defaults like 250
- I would recommend no higher than -g 90 or so - above 60 or so the space difference is negligible and the seekability difference is not.
- for fast seekability / frame-inspectable, you can force -g 1 (iframe-only)
- -lavcopts vmax_b_frames=2, -bf 2 - maximum amount of B-frames in a row
- essentially controls the choice between P- and B-frames whenever there's no call for I-frames
- encoder's choice is always adaptive
- strategy varies with codec; x264 uses at most 2 or 3 at a time, while you can easily get Xvid to generate runs of 16 (the maximum)
- at least 1 or 2 typically helps efficient use of space (fewer unnecessary I-frames)
- For a lot of real-world content, more than 2 B-frames doesn't actually help much
- Relatively still content (such as some anime) may benefit from 3.
- I've seen defaults mentioned as 0 or 2 or 3 (varying with codec?)
- When you use 2 or higher, you probably want to look at setting -b_strategy to 1 (or 2), particularly for Xvid
- B-frames make decoding a little slower. This is one reason that H.264 Baseline profile compliance requires you do not use them.
- 0 can also be better for slightly better compatibility (...with slow hardware and old software)(verify)
- vb_strategy=1, -b_strategy 1: encoder's strategy in I/P/B-frame choice
- 0 - use maximum number of B-frames possible (default). In Xvid this uses them even where they're not the best choice(verify), so when you set vmax_b_frames/-bf value over 2 or so you probably do not want this default
- 1 - Avoid B-frames in high motion scenes, which is better for overall quality in such scenes. (can be further tuned with b_sensitivity) Its choice is a little crude, so sometimes you want:
- 2 - try to find optimal frame-type sequence, for more efficient use of space. Significantly slower than the other options, and the gains are often tiny, so only useful when you have hard size constraints and really wish to squeeze out the most quality. (Can be further tuned with brd_scale)
For example, in Xvid...
- -bf 16 -g 16 might give:
IBBBBBBBBBBBBBBBB
- -bf 16 -g 250 might give:
IBBBBBBBBBBBBBBBBPBBBBBBBBBBBBBBBBPBBBB...
- -bf 1 might give:
IBIBIBPBPBPBPBPI...
In H.264,
TODO
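Since keyint / -g counts frames rather than seconds, what a given value means for seeking depends on the frame rate. A minimal sketch of that arithmetic, with a made-up helper name (gop_size) and assuming an integer frame rate:

```shell
# gop_size FPS MAX_SECONDS
# Prints the keyint / -g value that forces a keyframe at least every
# MAX_SECONDS seconds of video.
# (Hypothetical helper; integer frame rates only.)
gop_size() {
    echo $(( $1 * $2 ))
}

echo "-g $(gop_size 25 2)"    # prints: -g 50
```

So the -g 60 to -g 90 range suggested above comes down to a keyframe every two to four seconds or so at common frame rates.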
Handbrake
A fairly easy to use transcoder, mostly focused on MPEG4.
Most presets code to something that specific hardware likes, often combining H.264 video and AAC audio in an MPEG-4 container.
...but you can play with the options to do MP3 audio, Xvid-style video (though apparently not the advanced settings), use a MKV container, and more.
Bitrate is in the Video tab, detailed try-harder settings in the Advanced tab, Audio stuff under 'Audio'.
for xvid
In the Video tab, 'Video codec' dropdown, 'MPEG-4 (FFmpeg)' refers to MPEG-4 ASP. There is exactly one given preset that uses it, 'Legacy / Classic'
for x264
In the Video tab, 'Video codec' dropdown, 'H.264 (x264)' is what you want -- which is also the default (in all given presets except 'Legacy / Classic' presets)
'Regular/High profile' preset is basically the slow-and-good-quality setting, 'Regular/Normal' a somewhat faster variant.
Tricks, commands, option notes
Images from movie
- mplayer/mencoder
mplayer -nosound -vo png:z=4 infile
Where:
- you can also use jpeg, pnm, tga, or gif89a for an animated gif. See the mencoder man page for options for each file format, which may include quality options and the directory to save files to.
- 4, for png, is moderately fast and low compression (1-9 scale)
- To extract one out of so many frames, add -vf framestep=5 (for one out of five). framestep still decodes all the frames it passes over, which is slower than you might wish
- ...If a selection of keyframes will do, you could get an image per so-many seconds (or the closest keyframe) using -sstep 1 to skip a second for each extracted frame. The timestep may be irregular, and I seem to remember getting a few bad frames(verify).
- ffmpeg
ffmpeg -i infile -an -f image2 filename%04d.png
- ffmpeg understands %d and %[0-4]d. When extracting single frames you can omit that.
- start at second position: -ss 180
- extract every so many frames(verify): -r 1/5
- exit after some amount of frames: -frames 5
- -sameq ?
See also image2 demuxer for details.
For thumbnailing: try a start position and a single frame, e.g. -ss 180 and -frames 1 (mencoder) / -vframes 1 (ffmpeg)
mencoder seems to use filenames like 00000001.jpg, 00000002.jpg, etc. You can't control the filename, but you can control the directory it goes to, by adding :outdir=/tmp/path to the -vo options (works on jpeg, png, and pnm outputs).
In ffmpeg you can control the filename.
Movie from images
ffmpeg
Something like:
ffmpeg -r 10 -f image2 -pattern_type glob -i "*.png" -vcodec mpeg4 -b:v 2000k out.mp4
Alternatives to input specification:
- -i movie_%1d.tif is shorter, but fails on gaps, and is annoying if there is no strict pattern.
- (There is also the deprecated -i '%*.tif') (verify)
- For some documentation: https://www.ffmpeg.org/ffmpeg-formats.html#image2-1
Notes:
- will determine image filetype (based on extension(verify))
- You may want a lower framerate, e.g. -r 2
- academic users: when your input is sharp rendered things rather than photographic images, you may e.g. prefer forcing iframes (via one-sized GOPs)
mencoder
Something like:
mencoder "mf://*.jpg" -mf fps=10 -o movie.avi -ovc lavc -lavcopts vcodec=mjpeg
TODO: actually try
Alternatives:
- mf://@stills.txt
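If you go the list-file route, a small helper can generate the list. This is a sketch that assumes mf://@file wants one image filename per line; the function name write_stills_list is made up here, and the order is simply the shell's glob (lexical) order, so it assumes zero-padded frame numbers.

```shell
# write_stills_list DIR LISTFILE
# Writes the names of DIR/*.jpg to LISTFILE, one per line, in glob order.
# (Hypothetical helper; adjust the extension to your images.)
write_stills_list() {
    dir=$1
    list=$2
    : > "$list"                      # start with an empty list file
    for f in "$dir"/*.jpg; do
        [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
        printf '%s\n' "$f" >> "$list"
    done
}
```

After something like write_stills_list frames stills.txt you can delete or reorder lines in stills.txt by hand before handing it to mencoder as mf://@stills.txt.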
GIFs
You probably want a palette best for the image set, which requires a pass to generate.
Look at palettegen, e.g. like: https://stackoverflow.com/questions/34552247/how-to-use-palettegen-and-paletteuse-filters-with-ffmpeg-for-image-sequences
or other people's version (sometimes more parametrized)
In my case I wanted a tweaked stopmotion, which amounts to following the 'Images from movie' section above, deleting some frames, and then following the 'Movie from images' section.
Screen capture
https://trac.ffmpeg.org/wiki/Capture/Desktop
Note that you can use image2 as output as well, meaning individual files.
letterbox detection
To help discover how the black bars around the video should be cropped:
mplayer -vo null -vf cropdetect dvd:// -dvd-device DVD.ISO
The cropdetect filter may play safe, rounding the sizes to a multiple of 16 for compatibility with most compressors, which means that you may still see a thin black border.
You can play with the values (they are width:height:xoffset:yoffset). Most codecs will also deal with other sizes, but may not necessarily do so most efficiently.
It may pay off to crop a little more than that, so you may want to play with the setting it suggested, e.g.
mplayer -vf crop=688:384:16:96 dvd:// -dvd-device DVD.ISO
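When tweaking the crop by hand, the arithmetic is easy to get wrong. A small sketch (crop_arg is a made-up helper name) that rounds a wanted size down to multiples of 16 and centers the window, assuming the black bars are symmetric:

```shell
# crop_arg SRC_W SRC_H WANT_W WANT_H
# Rounds the wanted size down to multiples of 16 and centers the crop
# window in the source frame. Prints crop=W:H:X:Y.
# (Hypothetical helper; assumes symmetric borders.)
crop_arg() {
    src_w=$1
    src_h=$2
    w=$(( $3 / 16 * 16 ))
    h=$(( $4 / 16 * 16 ))
    x=$(( (src_w - w) / 2 ))
    y=$(( (src_h - h) / 2 ))
    echo "crop=${w}:${h}:${x}:${y}"
}

crop_arg 720 576 700 390    # prints crop=688:384:16:96
```

For a 720x576 source this happens to reproduce the values in the example above. If the bars are not symmetric, take the offsets from cropdetect instead.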
specifying time positions, sections, and such
Useful for frame capture, for example when you want to extract certain sections, make thumbnails that skip intros, and whatnot.
- mplayer/mencoder
You can seek to a start position in seconds, with optional minutes and hours, for example -ss 56 (position in seconds) and -ss 01:02:56 (one hour, two minutes, 56 seconds in).
And stop before the end with either
- -endpos time (note: actually amount of played time, not end position in video. For example, -ss 60 -endpos 60 goes from 0:01:00 to 0:02:00)
- -frames n (to stop after n frames)
- ffmpeg
- -ss 180: the same in mencoder and ffmpeg, see above
- -vframes n: stop after n frames
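Because -endpos is a duration rather than a position, scripts tend to need a little time arithmetic. A minimal sketch with two made-up helpers, assuming whole seconds:

```shell
# to_hms SECONDS
# Prints HH:MM:SS, e.g. for use with -ss.
to_hms() {
    printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# endpos_for START END
# Prints the -endpos value (a duration in seconds) needed to stop at
# position END when playback starts at position START.
endpos_for() {
    echo $(( $2 - $1 ))
}

to_hms 3776          # prints 01:02:56
endpos_for 60 120    # prints 60
```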
Add/fix an index (seekability)
- mplayer/mencoder
When there is no avi index or it is invalid, many players will either not allow seeking or take quite a bit of time building one before playing the video.
You can make mplayer calculate an index before it starts playing using -idx, or force recalculation with -forceidx, in case the index doesn't seem correct even though the file claims to have one, for example because it fails to seek properly or has audio/video syncing problems (note that that can have many other causes too).
You can also write a new file with a new index, which doesn't take very long.
mencoder -forceidx -oac copy -ovc copy inputfile -o outputfile
Multiple inputs
- ffmpeg and multiple sources
You can use multiple inputs, and select from multiple streams from each input.
For example, a DVD source with two soundtracks might show (the 0 before the dot referring to input 0):
Stream #0.0[0x1e0]: Video: mpeg2video (Main), yuv420p, 720x576 [PAR 16:15 DAR 4:3], 8000 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc
Stream #0.1[0x80]: Audio: ac3, 48000 Hz, stereo, s16, 192 kb/s
Stream #0.2[0x81]: Audio: ac3, 48000 Hz, stereo, s16, 192 kb/s
Stream #0.3[0x20]: Subtitle: dvdsub
To pick out the video stream and the second audio stream, you can do
-map 0:0 -map 0:2
....which in the encode debug will show:
Stream mapping:
Stream #0.0 -> #0.0
Stream #0.2 -> #0.1
You can also combine streams from multiple inputs (e.g. audio from a separate file).
You can also generate multi-stream outputs, but I haven't looked into that.
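A sketch of the 'audio from a separate file' case. The function names (run, mux_external_audio) and the DRY_RUN switch are made up for this example; the -map stream specifiers (0:v, 1:a) assume a reasonably recent ffmpeg, and -c copy assumes the output container accepts both streams as they are.

```shell
# run CMD...
# Executes the command, or only prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# mux_external_audio VIDEO AUDIO OUT
# Takes the video from the first input and the audio from the second,
# copying both without re-encoding.
mux_external_audio() {
    run ffmpeg -i "$1" -i "$2" -map 0:v -map 1:a -c copy "$3"
}
```

Setting DRY_RUN=1 first lets you check the command line before running it for real.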
Extracting audio
As an audio-only file
Half the time the reason is to have the audio for easy inclusion in some project (so a raw format works best).
- ffmpeg (wav)
ffmpeg -i video.mkv -acodec pcm_s16le -ac 2 audio.wav
- ffmpeg (mp3, stereo, ~192kbps VBR)
ffmpeg -i video.mkv -acodec libmp3lame -ac 2 -q:a 2 audio.mp3
See [3] for more VBR details
Note that if you want to send to a pipe (or a filename with an unusual extension), you'll need to specify a file format/muxer (see this list), like -f wav, -f mp3, -f ogg or such.
- mplayer/mencoder (wav)
mplayer -vo null -vc null -ao pcm:file=/data/outfile.wav -srate 44100 -noframedrop infile
While sometimes you want the audio as precisely as possible (see e.g. next section), this isn't always a generically playable format, so you may want to convert to something you like (e.g. MP3 via output codec lame, or just go via wav and encode yourself)
Notes:
- You sometimes want to control the bitrate, amount of channels, etc.
- -srate is optional but may be useful to convert from relatively unusual rates (you may want to avoid it if the input is within 44100..48000).
- -vc null (or dummy) means video isn't decoded
- -vo null discards the video (may be redundant, may be necessary for the chaining)(verify)
- -noframedrop may be redundant, given no video output (verify)
Just the audio stream, as-is
Given an MPEG4 input, you can create an MPEG4 audio-only file by copying just the audio stream to a new container, with something like:
- ffmpeg (original)
ffmpeg -i my_video.mp4 -c copy -map 0:a output_audio.mp4
Flash encoding
In libavcodec, the flv vcodec refers to Sorenson.
Note this is the older and lower-quality variant of flash video. These days you want to use H.264 - see below this section
ffmpeg -i input.avi -vcodec flv -acodec libmp3lame -b 800k -ab 96k -f flv output.flv
or:
mencoder -of lavf -ovc lavc -oac lavc \
  -lavcopts vcodec=flv:vbitrate=800:acodec=libmp3lame:abitrate=96 \
  inputfile -o outputfile.flv
Notes:
- Audio in Flash is usually MP3 with a few extra restrictions. The most important is that sampling rate should be 11025Hz, 22050Hz or 44100Hz. If it's something different, (e.g. 48kHz) you should resample it
- ffmpeg example: -ar 44100
- mencoder example -af lavcresample=22050 -srate 22050
- Bitrate depends on what you want to do. Ballpark:
- 640x480 and TV/DVD resolution might be doable at ~500 to 800kbps
- 720p might be doable at around 2000kbps.
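If you script this, picking the resampling target can be automated. A sketch (flash_rate is a made-up helper name) that picks the highest of the three allowed rates that does not exceed the input rate; choosing that one rather than, say, always 44100 is my assumption, not a Flash requirement:

```shell
# flash_rate INPUT_HZ
# Prints the highest Flash-compatible sampling rate that does not exceed
# the input rate, or 11025 when the input is lower than that.
# (Hypothetical helper.)
flash_rate() {
    if [ "$1" -ge 44100 ]; then
        echo 44100
    elif [ "$1" -ge 22050 ]; then
        echo 22050
    else
        echo 11025
    fi
}

echo "-ar $(flash_rate 48000)"    # prints: -ar 44100
```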
Recent Flash versions...
- added H.264 video (since Flash 9)
- added AAC audio (since Flash 9)
- added Speex audio (since Flash 10)
- understands MPEG4 containers (since Flash 9).
- Uses .f4v extension (in the case of video). When you use H.264 or AAC, this container is recommended.
H.264 works for Flash video, for Flash-less devices (iPhone, iPad), and in HTML5 compliant browsers, which is making it the new web favorite.
As of this writing, FFmpeg does not support directly writing an F4V container ((verify) - probably about some of the metadata, since it can certainly use mpeg-4 containers), so you'll have to use the older .flv container for now (which apparently is a little more restrictive). Example:
ffmpeg -i input.avi -vcodec libx264 -vpre hq -vpre main -ar 44100 -ab 96k -ac 2 -f flv output.flv
When you want to support many devices (particularly phones and other mobile devices) without using multiple streams, you have to stick with a bunch of restrictions.
In particular, some mobile devices can decode Main profile in realtime, but in others you can only guarantee that for Baseline(verify) (and that's with hardware assistance), so using fancier features that make for more efficient quality-per-space may make video choppy on such platforms.
Yes, Baseline is going to be considerably larger for the same quality.
Some more technical notes
See also Video for some more general technical information
Unsorted notes
lavc vcodecs
⌛ This hasn't been updated for a while, so could be outdated (particularly if it's about something that evolves constantly, such as software or research).
Ordered very roughly from more to less interesting:
- MPEG4 AVC
- libx264 - x264 H.264/AVC MPEG-4 Part 10
- MPEG4 ASP
- mpeg4 - MPEG-4 (DivX 4/5)
- libxvid - Xvid MPEG-4 Part 2 (ASP)
- msmpeg4 - DivX 3
- msmpeg4v2 - MS MPEG4v2 (pre-standard)
- libtheora - Theora
- flv - Sorenson's H.263 variant used in Flash video (note: recent Flash supports H.264 formats too)
- mpeg1video - MPEG-1 video
- mpeg2video - MPEG-2 video
- h263 - H.263
- h263p - H.263+
- h261 - H.261
- svq1 - Apple Sorenson Video 1 (H.263-based)
- rv10 - an old RealVideo codec (H.263-based)
- dvvideo - Sony Digital Video
- huffyuv - HuffYUV
- ffvhuff - nonstandard 20% smaller HuffYUV using YV12
- ffv1 - FFmpeg's lossless video codec
- ljpeg - Lossless JPEG
- mjpeg - Motion JPEG
- snow - experimental wavelet-based codec (from FFmpeg)
- roqvideo - ID Software RoQ Video
- wmv1 - Windows Media Video, version 1 (AKA WMV7)
- wmv2 - Windows Media Video, version 2 (AKA WMV8)
- asv1 - ASUS Video v1
- asv2 - ASUS Video v2
libavcodec audio-codec options, and other audio notes
⌛ This hasn't been updated for a while, so could be outdated (particularly if it's about something that evolves constantly, such as software or research).
Commonly used audio codecs are:
- MP3 (gives good quality in limited bitrates)
- MP2 (for compatibility, and it's simpler+faster than MP3. Also lavc's default(verify))
In some cases you are constrained to specific codecs. For example, you could use AC3 for DVDs, AAC for videos meant to be played on a PSP[4], or 3GPP-specific codecs, or want some feature not available in all codecs (e.g. more channels than stereo, losslessness).
List of acodecs from the man page (may be a little outdated):
- copy - uses the input stream as-is (may not be possible in the given container)
- MP3:
- libmp3lame - MPEG-1 audio layer 3 (MP3) using LAME (not to be confused with -oac mp3lame)
- mp3 is deprecated, use libmp3lame now
- If you use mencoder, it seems that using -oac mp3lame + -lameopts gives you more configurability than -oac lavc + acodec=libmp3lame (verify)
- libfaac - AAC (Advanced Audio Coding) using FAAC
- ac3 - AC-3 Dolby Digital
- mp2 - MPEG-1 audio layer 2 (MP2), useful for DVDs and such
- vorbis - Ogg Vorbis
- pcm_* and adpcm_* - PCM and ADPCM formats, various specific variants
- libamr_nb - 3GPP Adaptive Multi-Rate (AMR) narrow-band
- libamr_wb - 3GPP Adaptive Multi-Rate (AMR) wide-band
- wmav1 - Windows Media Audio v1
- wmav2 - Windows Media Audio v2
- flac - Free Lossless Audio Codec (FLAC)
- g726 - G.726 ADPCM
- roq_dpcm - Id Software RoQ DPCM
- sonic - experimental simple lossy codec
- sonicls - experimental simple lossless codec
You probably want to specify a bitrate; defaults may well be overly conservative.
For example, to re-encode only audio:
mencoder movie.wmv -ovc copy -oac lavc -lavcopts acodec=libmp3lame:abitrate=96 -o movie.avi
ffmpeg -i movie.wmv -vcodec copy -acodec libmp3lame -ab 96k movie.avi
When you use -oac mp3lame (instead of -lavcopts acodec=libmp3lame), you get more control over encoding options (using -lameopts). For example:
mencoder movie.wmv -o movie.avi -ovc lavc -oac mp3lame -lameopts preset=medium
Unsorted mplayer/mencoder notes
File/container options
The output file format -- usually little more than a container -- is regularly left as the default .avi, which is fine if you're not doing any fancy multiplexing, multi-tracking, or embedding. (If you want to specify the file format explicitly, use something like -of lavf and -lavfopts format=avi, or rather alternatives like mkv, mp4, or one of the specific-purpose ones (like?).)
There are a few details to container formats, what the file can contain (such as alternative audio streams and subtitles), what sort of conventional abuse exists (very common in AVI), which formats are standard-supported and which formats can be shoved in but won't be played by (only-)compliant players.
There are also some details to specific combinations when encoding (see for example MPEG's harddup details).
Encoder/codec choice
Mplayer gets a lot of functionality from using FFmpeg, or more specifically libavcodec (lavc for short). (lavc is developed by the ffmpeg team, and ffmpeg itself is another front-end to libavcodec).
In an overall conversion, some things are done with mencoder-specific code, some with lavc (or another encoder choice), and some can be done with either.
For example, there is an mplayer-internal way to encode xvid, and an ffmpeg way. Similarly, there are multiple ways to use mp3 as an audio codec (some were removed to avoid confusion), and multiple ways to mux together streams. In some cases, you may wish to use some specialized tool (for example for complicated muxing) instead of mencoder.
There are two main choices to make when encoding, three if you're picky:
- choice of library for the output video codec (-ovc)
- choice of library for the output audio codec (-oac)
- output (container) format (-of)
To see the options available in your version/installation, run mencoder -ovc help -oac help -of help.
You can also leave a stream alone by using -ovc copy (for video) or -oac copy (for audio). This passes through that stream, so obviously also doesn't combine with filters, and is useful to ends like taking out a single stream or muxing streams into containers.
Note that libavcodec with the mpeg4 vcodec will by default set the FourCC FMP4, which is not as widely recognized as some other FourCCs. A better supported value is DX50 (DivX 5), which should be compatible with more MPEG4-capable players. You can set -ffourcc DX50 on the command line (or as a default in your mencoder config).
Simple example
To give a simple recoding example:
mencoder movie.wmv -o movie.avi -ovc lavc -oac lavc
This will
- convert from Microsoft WMV, detected from the input file you gave it
- into a new AVI container (default, and there is no specific -of set)
The choice of lavc as the encoding library for both audio and video, with no further options, means this case relies on configured defaults, which means that the output AVI will most likely contain DivX video and MP2 audio.
If you want to make specific codec choices and make specific quality options (usually of the 'spend longer to make a better quality output' sort), you pass them in via -lavcopts.
Most of the variation and choice lies in the options to libavcodec, which are not detailed by the basic -oac help functionality because to mencoder, lavc is just one of the libraries you can plug in.
The mencoder man page does however spend a lot of text on lavc. See man mencoder and look for the lavcopts section. To skip to that section while viewing the man page, type: /\(\-lavcopts) (or you can just scroll there).
See also
- http://www.mplayerhq.hu/DOCS/HTML/en/ and particularly:
AVCHD and MTS
AVCHD is a format common to consumer HD camcorders (and roughly amounts to a specific H.264 profile, at relatively high bitrate, and AC3 or raw PCM for sound)
They are spanned into smaller files (to avoid size issues on memory cards formatted to e.g. FAT32).
That spanning is done at byte level without any regard to the content. That is, reading and seeking within spanned MTS files requires some of the metadata files around it.
Importing these files as separate video files will mostly work, only dropping a few video frames that are incomplete near the edges. But that's probably not what you want.
Most of these cameras have a transfer tool, that will piece it together during that transfer.
You probably want to use such tools if you have more than one recording on there -- but it's useful to know that if you copied the card contents before reading this, you can typically just concatenate these files in sequence, yielding a proper single file without said issues(verify). It might still be missing some other structuring, but avoids dropping audio/video content.
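The concatenation itself is nothing more than cat. A sketch with a made-up helper name (join_spans); the file names in the usage line are hypothetical, and you should check that the order you pass matches the recording order:

```shell
# join_spans OUT PART...
# Concatenates the parts byte for byte, in the order given, into OUT.
# (Hypothetical helper; a thin wrapper around cat.)
join_spans() {
    out=$1
    shift
    cat "$@" > "$out"
}
```

Usage would be something like join_spans joined.mts 00000.MTS 00001.MTS 00002.MTS, and only for files that belong to the same recording.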