Image processing notes

Revision as of 19:01, 21 August 2021 by Helpful (Talk | contribs)
The physical and human aspects of dealing with audio, video, and images


For more, see Category:Audio, video, images

A lot of this is experiment and work in progress, and very little of it has been tested to academic or even pragmatic standards. Don't trust any of it without testing it yourself.

It's also biased to Python, because I like rapid prototyping. I can always write fast code later.


Noise reduction

gaussian blur

(or other simple interpolating blurs)


Pros:

  • Simple. Fairly fast.
  • does not introduce spurious detail

Cons:

  • indiscriminately removes (high-)frequency content, a.k.a. "smears everything"
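As a minimal numpy/scipy sketch of the tradeoff (the test image and sigma value are just illustrative choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0                      # a bright square on black
noisy = clean + rng.normal(0.0, 0.2, clean.shape)

# sigma sets the blur radius: more sigma means less noise, but also less detail
denoised = gaussian_filter(noisy, sigma=2.0)
```

The noise variance drops, but so does everything else at that spatial frequency, including the square's edges.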

median filtering


Pros:

  • Simple. Not quite as fast as you'd think.
  • rejects outliers; the best example is rejecting salt-and-pepper noise
  • preserves edges better than e.g. linear interpolation

Cons:

  • can remove high-frequency signal
  • the edge preservation depends on some conditions, so doesn't always happen. The mix can look odd.
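The salt-and-pepper case, as a small scipy sketch (the image and filter size are just illustrative):

```python
import numpy as np
from scipy.ndimage import median_filter

img = np.full((32, 32), 0.5)
img[5, 5] = 1.0    # "salt" outlier
img[20, 20] = 0.0  # "pepper" outlier

# each output pixel is the median of its 3x3 neighbourhood,
# so isolated outliers are rejected completely rather than spread around
out = median_filter(img, size=3)
```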

total variation denoising

Varies the amount of blur by the amount of variation near the pixel.

Which means it mostly lessens noise in otherwise flat regions, while leaving spikes and edges mostly intact.


Pros:

  • tends to look more detailed than a basic mean filter, particularly on sharp images

Cons:

  • can't really tell what real edges are; for subtler images the result can be much like a mean filter

See also:

bilateral denoise

Reduce noise while preserving edges.

Averages pixels based on their spatial closeness and radiometric similarity, and potentially other metrics. Like total-variation denoising in that it easily preserves edges, yet it is often more true to the photographic original.
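A naive pure-numpy sketch of the idea (function name and parameter defaults are mine; real implementations, e.g. in OpenCV or skimage, are much faster):

```python
import numpy as np

def bilateral(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Naive bilateral filter: average of neighbours weighted by BOTH
    spatial closeness and similarity in value, so edges survive."""
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img, dtype=float)
    norm = np.zeros_like(img, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy : radius + dy + img.shape[0],
                          radius + dx : radius + dx + img.shape[1]]
            # spatial weight falls off with distance,
            # range weight falls off with value difference
            w = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2)) \
              * np.exp(-((shifted - img) ** 2) / (2 * sigma_r ** 2))
            out += w * shifted
            norm += w
    return out / norm
```

Pixels across a strong edge differ a lot in value, get near-zero range weight, and so barely contribute - which is exactly the edge-preserving behaviour.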

Playing with:

non-local means denoising

See also:

Anisotropic diffusion

See also:

Wiener filter

See also:

Halftoning, dithering

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Ordered dithering

Ordered dithering uses regular patterns to imitate gradients.

This will often have noticeable grid and/or crosshatch patterns.
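A minimal sketch using the classic 4x4 Bayer matrix (the function name is mine):

```python
import numpy as np

# 4x4 Bayer matrix: thresholds 0..15 arranged so that nearby pixels
# get very different thresholds, imitating gradients with a fixed pattern
BAYER4 = np.array([[ 0,  8,  2, 10],
                   [12,  4, 14,  6],
                   [ 3, 11,  1,  9],
                   [15,  7, 13,  5]])

def ordered_dither(gray):
    """gray: float image in 0..1; returns a 0/1 image."""
    h, w = gray.shape
    # tile the matrix over the image and compare per pixel
    thresholds = (BAYER4[np.arange(h)[:, None] % 4,
                         np.arange(w)[None, :] % 4] + 0.5) / 16.0
    return (gray > thresholds).astype(np.uint8)
```

A flat 50% gray comes out as an exact checkerboard-like pattern with half the pixels on - and that regularity is also where the visible crosshatch look comes from.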

Floyd-Steinberg dithering
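The standard error-diffusion scheme, as a slow-but-clear sketch (function name is mine):

```python
import numpy as np

def floyd_steinberg(gray):
    """1-bit error diffusion: quantize each pixel, then push the rounding
    error onto not-yet-visited neighbours with weights 7/16, 3/16, 5/16, 1/16."""
    img = gray.astype(float).copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            img[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16      # right
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16  # below-left
                img[y + 1, x] += err * 5 / 16          # below
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16  # below-right
    return img
```

Because the quantization error is carried forward rather than discarded, average intensity is approximately preserved, without the fixed grid patterns of ordered dithering.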

Jarvis, Judice, and Ninke dithering

Stucki dithering

Atkinson dithering

Burkes dithering

Sierra dithering

Color conversion

...because some spaces are much more sensible to work in, due to linear distance being closer to perceptual distance than in the more standard spaces.

Automatic illuminant correction

The intent is usually to take out any mild tint that the illuminant has, or to correct a camera's mis-estimation of the illuminant.

In other words, mainly white balance correction, a.k.a. gray balance.

Color correction often comes down to:

  • estimating what the illuminant probably was
photographers who care about color accuracy tend to use a gray card to have a known-absolute reference.
without one, it's based on assumptions
and incorrect assumptions can introduce unnatural tinting in the process
  • applying chromatic adaptation so that the estimated illuminant effectively becomes a given illuminant, such as D65 (mid-day sun), or just numerically equalizing channels.

There are some more specific cases you could focus on, such as cases where you know color filters were used, or faded photographs, where you could consider the dyes in use - their relative fading is usually documented.

Gray world

The idea: in a well balanced image, the average color is a neutral gray.

So we scale channels to make the average gray, typically implemented by making the averages of the red, green, and blue channels the same, often with a basic linear gain.

The results vary a little with the color space you do this in.

Gray world makes decent sense when large patches are expected to be neutral, which is e.g. true in darker photographs.


Pros:

  • Simple
  • works well at removing the illuminant's tint on images with a lot of white, a lot of dark areas, and/or a lot of each color - or a photograph which had such neutrality but had a tint applied to it

Cons:

  • only correct for images that had a roughly white illuminant, and where you want to fully negate any tint
e.g. images intentionally taken under red light will effectively just ramp up green and blue - of which there was very little - so the result likely comes out quite unnatural
can be tempered by limiting that gain
  • effectively uses all of the image equally for measurement
an assumption that is flawed to varying degrees
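The channel-gain version described above, as a small numpy sketch (function name is mine; this assumes 0..255 RGB and works in whatever space you hand it):

```python
import numpy as np

def gray_world(rgb):
    """Scale each channel so its mean matches the overall mean,
    i.e. force the image's average color to be neutral gray."""
    rgb = rgb.astype(float)
    means = rgb.reshape(-1, 3).mean(axis=0)   # per-channel averages
    gain = means.mean() / means               # gain that equalizes them
    return np.clip(rgb * gain, 0, 255)
```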

Auto levels

The idea: The brightest color should be white, the darkest color should be black.

So: rescale each channel's histogram to span the full range, accepting that some (e.g. 1%) will lie outside and be truncated.


Pros:

  • the brightest color becomes white, which on outside photos is often good enough
  • less sensitive to single-color use than gray world

Cons:

  • suffers from similar problems to gray world
  • not sensitive to how small the brightest area is. For example, a bright window in the background means a white wall will be ignored.
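A per-channel percentile stretch, as a sketch (function name and the 1% clip default are mine):

```python
import numpy as np

def auto_levels(rgb, clip=1.0):
    """Stretch each channel between its low/high percentiles;
    the clip% at each end is allowed to saturate."""
    rgb = rgb.astype(float)
    out = np.empty_like(rgb)
    for c in range(3):
        lo, hi = np.percentile(rgb[..., c], [clip, 100 - clip])
        out[..., c] = np.clip((rgb[..., c] - lo) / max(hi - lo, 1e-6), 0, 1) * 255
    return out
```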


Retinex

Retinex is in itself a wider theory dealing with various color constancy effects, also dealing with local context and some human interpretation.

In the context of whole-image color correction, it mainly says that perceived white tends towards the strongest cone signal.

This is essentially a gentler form of gray world, referring to the overall effects rather than a single spot.

It roughly means that the maximum within each channel should be the same. While RGB doesn't correspond precisely to eye cones, it's close enough to work well.

The correction could be implemented as just (linear) gain on each channel to make the maxima the same, though in practice, using a very-high-percentile point, to ignore a few outlier pixels, is more robust.


Pros:

  • simple idea, simple code
  • better behaved than gray world, in that it avoids many larger color shifts

Cons:

  • sometimes too cautious, e.g. does little on overexposed images
  • because it still has the underlying assumption that the illuminant must be white-ish, it breaks
    • on scenes that had such an illuminant but no near-illuminant color in them
    • where the illuminant is colored, e.g. underwater photography
can be tempered by limiting the difference between the gains applied - because usually significantly different gains make no sense
...though that in combination with the percentile logic can have some odd side effects
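The "white patch" reading of this, with the percentile robustness mentioned above (function name and the 99th-percentile default are mine):

```python
import numpy as np

def white_patch(rgb, pct=99):
    """Gain each channel so its high-percentile value maps to white.
    Using a percentile instead of the true max ignores a few outlier pixels."""
    rgb = rgb.astype(float)
    high = np.percentile(rgb.reshape(-1, 3), pct, axis=0)
    return np.clip(rgb * (255.0 / np.maximum(high, 1e-6)), 0, 255)
```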

Gray world and retinex

"Combining Gray World and Retinex Theory for Automatic White Balance in Digital Photography" argues that combining the two makes sense.

Which requires a little trickery, as linear correction alone cannot satisfy both criteria at once.

Robust Automatic White Balance

Essentially a variant of gray world that is selective about the areas it uses, primarily looking for nearly-white parts, so e.g. isn't distracted by the average of the colored parts.


Pros:

  • doesn't make as many mistakes as plain gray world
  • outside photos usually have such near-whites, so this makes sense for them

Cons:

  • images may not have representative near-whites
  • selecting the areas to use turns out to be harder than it sounds, depending on how robust you want it to be.

J Huo et al., "Robust Automatic White Balance Algorithm using Gray Color Points in Images"

More reading

D Nikitenko et al., "Applicability Of White-Balancing Algorithms to Restoring Faded Colour Slides: An Empirical Evaluation"

A Rizzi et al., "A new algorithm for unsupervised global and local color correction"

D Cheng et al., "Illuminant Estimation for Color Constancy: Why spatial domain methods work and the role of the color distribution"

Multiple related images

Median of each pixel along a set of images

...emphasizing the consistent/still areas, which is typically what you would consider the background.

The common example is "in a still scene with some tourists wandering about, take a couple dozen photos over a minute, and median them", because most of the pixels will be very stable, and the people moving about will be outliers. (note that anyone who was sitting in one spot will probably become blurry, because they'll be a composite from multiple photos)
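That example, as a tiny numpy sketch (the synthetic "tourist" is a single bright pixel per frame):

```python
import numpy as np

rng = np.random.default_rng(1)
background = np.full((20, 20), 0.5)

frames = []
for i in range(9):
    f = background + rng.normal(0, 0.01, background.shape)  # sensor noise
    f[i, i] = 1.0              # a "tourist" in a different spot each frame
    frames.append(f)

# per-pixel median over the stack: the outlier frame loses the vote
clean = np.median(np.stack(frames), axis=0)
```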


Differential image

Often refers to keeping track of a longer-term average, and subtracting individual frames from it.

This takes out everything that's been there consistently (...lately), and highlights details in areas with movement.

One example is stationary traffic videos, focusing mainly on the cars, because it easily removes entirely-static things like the roads, signs, and lane detail, and also static-on-the-scale-of-minutes things such as lighting gradients.
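The running-average-and-subtract idea, sketched (function name and the alpha value are mine):

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Exponential running average: the 'longer-term' estimate of the scene."""
    return (1.0 - alpha) * background + alpha * frame

static_scene = np.full((16, 16), 0.4)
bg = np.zeros((16, 16))
for _ in range(100):                 # let the average settle on the static scene
    bg = update_background(bg, static_scene)

frame = static_scene.copy()
frame[8, 8] = 1.0                    # something moved into view
diff = np.abs(frame - bg)            # static parts cancel, movement stands out
```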


Superresolution

There are various distinct things called superresolution.

From the information-theoretical view, you can split these into:

Using multiple exposures

Learning-based Super Resolution

Scaling down well

Optical/diffractive superresolution

Plays with the diffraction limit of the optics of a system

See e.g.

See also

HDR and exposure fusion

The eyes are good at adapting locally to the amount of light, e.g. seeing details in a dark room even when there's also a bright window in our view - in part because our eyes have different areas and a logarithmic response, and also because we're used to exploiting these specifics, intuitively.

Film and digital sensors aren't good at this, both because it's much easier to build sensors with a linear response and a single overall exposure, and because that makes sense for fast response, wide applicability, and capturing what's there accurately. But yeah, they suck at the window scene - they would probably adjust to the bright window, which washes out the dark bits (or adjust to the dark detail, and have one mighty overexposed window), and there's no obvious way to cheat to imitate our eyes.

High Dynamic Range roughly imitates our eyes, by cheating a bit.

You take images with different exposures (e.g. window-nice-and-dark-washed-out, details-in-dark-and-window-way-overexposed), and synthesize an image that has detail in both areas, roughly by locally weighting the image that seems to give more detail.

Exposure fusion has only the 'piece together more detail' goal.

HDR has more steps, producing an immediate result that has more dynamic range than monitors can show.

When the purpose is human viewing, this is often done by being nonlinear the same way our eyes are - and often a little more aggressively than our own eyes would be, which tends to have side effects that look like halos/blooming, and some areas having unnatural contrast.

Another purpose is reproduction, e.g. in 3D rendering, which preserves HDR throughout enough of the pipeline to make an informed decision about use of the range - which tends to mean you don't wash away details in the darkest or lightest areas. Some of these techniques are now common because they help things look good for relatively little extra processing. Some are fancy and GPU-intense. It's a spectrum. See e.g. HDR rendering

Motion descriptors

Object tracking

Whole-image descriptors

Describes all of the image, as opposed to describing specific features you found.

...there is natural overlap with dense descriptors, which analyse a whole image in independent patches.

Color descriptors


A color histogram was used in development, but for many uses this is too high-dimensional.

The color space is fixed in each descriptor, because not doing so would hurt interoperability.

Scalable Color Descriptor (SCD)

Defined in HSV

Color Structure Descriptor (CSD)

Defined in HMMD

See also:

Color Layout Descriptor (CLD)

YCbCr space

Dominant colors

Color quantization
Merged histogram

Texture descriptors

Texture is harder to quantify than you may expect.

Some methods are much easier to apply to constrained single purposes, e.g. medical imaging, than to arbitrary images.

Some things work better in lab conditions (e.g. recognizing known textures), some work well enough to e.g. recognize differences between areas in a picture, but robustly (e.g. scale-, rotation-, and lighting-invariant) labeling textures is hard.


  • HTD (Homogeneous Texture Descriptor)
    idea: use Fourier analysis to get basic frequency and direction information
    in frequency space (2D FFT amplitudes), divide into 5 octave-style rings and 6 directions, making for 30 bins, plus two more: overall mean and stdev
  • EHD (Edge Histogram Descriptor)
    idea: detect which direction detected edges go, and make a histogram of that; useful for overall comparisons.
  • TBD (Texture Browsing Descriptor)
    idea: indicators of perceptual directionality, regularity, and coarseness of a texture
    most dominant texture orientation (0 meaning none, 1..6 meaning 0..150 degrees in steps of 30)
    second most dominant texture orientation (0 meaning none, 1..6 meaning 0..150 degrees in steps of 30) (optional)
    regularity in first orientation
    regularity in second orientation (optional)
    implementation based on a bank of orientation- and scale-tuned Gabor filters.




Image entropy

Can mean

  • the overall entropy of a whole block
  • for each pixel, the entropy of the values around it

Roughly an estimate of local contrast / texture.
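The whole-block variant, as a sketch (function name is mine; this is plain Shannon entropy over the value histogram):

```python
import numpy as np

def entropy(block, bins=256):
    """Shannon entropy of a block's value histogram, in bits.
    0 for a constant block, up to log2(bins) for a uniform spread."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                       # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```

The per-pixel variant is the same calculation run over a sliding neighbourhood.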

Image moments


Contour detection


Feature detection and description

Related tasks

Template matching

Classical features

The classical set of features are (a subset of) things that happen at few-pixel scale:

  • Points -
  • Blobs - smooth areas that won't (necessarily) be detected by point detection. Their approximate centers may also be considered interest points
  • Edges -
    • a relatively one-dimensional feature, though with a direction
  • Corners - Detects things like intersections and ends of sharp lines
    • a relatively two-dimensional kind of feature
  • Ridges -
  • Interest point - could be said to be any of the above, and anything else you can describe clearly enough
    • preferably has a clear definition
    • has a well-defined position
    • preferably quite reproducible, that is, stable under relatively minor image alterations such as scale, rotation, translation, brightness.
    • useful in their direct image context - corners, endpoints, intersections
  • Region of interest
    • any subrange (1D), area (2D), volume (3D), etc. identified for a purpose.
    • Also often in an annotative sense, not necessarily a machine-proffered one

See also:

Edge detection

  • Canny [1]
  • Differential [2]
  • Canny-Deriche [3]
  • Prewitt [4]
  • Roberts Cross operator [5]
  • Sobel [6]
  • Scharr operator - variation on Sobel that tries to deal better with rotation [Sobel_operator#Alternative_operators]


  • Marr-Hildreth [7]

Playing with (mostly python)

  • PIL has ImageFilter.FIND_EDGES (convolution-based)

Interest point / corner detection

Blob detection

Laplacian of Gaussian (LoG)
Difference of Gaussians (DoG)
Determinant of Hessian (DoH)
MSER (Maximally Stable Extremal Regions)

Detects covariant regions: connected areas that stay stable over a range of gray-level thresholds.

Primarily a region/blob detector. Sensitive to blur.

Decent performance.

See also:


Principal curvature-based region detector

Harris affine

Hessian affine

Dense descriptors

Dense meaning it describes the whole image a patch at a time. As opposed to sparse, meaning for selective areas (often features).

The distinction can be subtle - dense may just mean we don't assume that we can reliably select good features/areas to study.

Any overall descriptor used locally

...color, texture, or such.

Lets you

  • describe the variation of said descriptors within an image
  • focus on areas where things are happening

Image gradient

At each point in an image, you can calculate which way the local gradient points - essentially a vector.

In theory this is based on the local derivative; in practice it's a discrete differentiation operator, such as Sobel or Prewitt (or other kernel-style things) - actually quite akin to edge detection that isn't particularly tuned to a single direction (as some are).

Kernel-based methods tend to work on at least 3x3 pixel areas, though may be larger depending on application.
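For example, with scipy's Sobel operator (the step-edge image is just illustrative):

```python
import numpy as np
from scipy.ndimage import sobel

img = np.zeros((16, 16))
img[:, 8:] = 1.0                   # vertical step edge

gx = sobel(img, axis=1)            # horizontal derivative
gy = sobel(img, axis=0)            # vertical derivative
magnitude = np.hypot(gx, gy)       # gradient strength per pixel
direction = np.arctan2(gy, gx)     # gradient direction per pixel
```

The magnitude peaks on the edge and is zero in the flat regions; the direction there points horizontally, across the edge.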

Histogram of Oriented Gradients (HOG)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Refers to the general idea of locally detecting gradients, which is a concept used by a whole family of algorithms.

And to a fairly specific use, doing this for the entire image, on fixed-size, small cells (e.g. 8x8 pixel).

For each cell, we build a histogram of how much (magnitude-wise) its parts point in each direction (e.g. the 8 basic compass directions) - with some footnotes, like bleeding into adjacent bins to be resistant to aliasing.
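The per-cell histogram step, sketched in numpy (function name is mine; real implementations add the bin-bleeding and block normalization mentioned around here):

```python
import numpy as np

def cell_hog(cell_gx, cell_gy, n_bins=8):
    """Magnitude-weighted histogram of gradient orientations for one cell.
    cell_gx, cell_gy: the gradient components within the cell."""
    mag = np.hypot(cell_gx, cell_gy)
    ang = np.arctan2(cell_gy, cell_gx) % np.pi      # unsigned orientation, 0..pi
    bins = (ang / np.pi * n_bins).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())      # accumulate magnitude per bin
    return hist
```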

This may well be the first step in something else, e.g. detection of certain objects by training on results.

Due to being based on differences (plus some normalization), it is fairly resistant to illumination differences.

It is somewhat sensitive to orientation. Due to its nature it's not too hard to make it less sensitive, though by that time you may find SIFT more interesting.


  • R-HOG: rectangular (typically square) cells
  • C-HOG: circular cells
  • Fourier HOG: rotation-invariant

See also:


See also:

Sparse/local descriptors

Sparse meaning it describes local areas, and is selective about which parts, as opposed to doing so for the whole image.

Feature description for things like image comparison is based on the idea that considering all points in an image for description is infeasible, so informative points are chosen instead. The challenge then becomes choosing highly informative and stable points.

SIFT (Scale-Invariant Feature Transform)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Read up on local gradients, particularly HOG.

SIFT continues that idea by analyzing the area around an already chosen point of interest -- often after deciding the rotation and scale of the patch it will be analysing based on local content.(verify)

SIFT is often a first step in something else, such as object recognition (often bag-of-words style), or aligning similar images in cooperation with RANSAC.

See also:

  • uses color information, giving more stable features around color contrast
  • uses PCA instead of the gradient histogram, making its output more compact
  • GSIFT adds global context to each keypoint (verify)
  • features are robust to more affine transforms(verify)
  • See also SURF - has a similar goal but uses different methods for most steps
  • See also SPIN, RIFT (but SIFT usually performs better(verify))
  • See also FIND, MIFT

GLOH (Gradient Location and Orientation Histogram)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


See also:

SURF (Speeded Up Robust Features)

faster than SIFT, performs similarly

See also

LESH (Local Energy based Shape Histogram)

FAST (Features from Accelerated Segment Test)

Mainly a feature detector

E Rosten, T Drummond (2006) "Machine learning for high-speed corner detection"


A Oliva, A Torralba (2001) "Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope" [13]

BRIEF (Binary Robust Independent Elementary Features)

M Calonder et al. (2010) "BRIEF: Binary robust independent elementary features"

ORB (Oriented FAST and Rotated BRIEF)

Offered as an efficient alternative to SIFT (and SURF), and also not patented.

See also:



K Mikolajczyk, C Schmid (2005) "A Performance Evaluation of Local Descriptors"

Combining descriptors

Indexing descriptors and/or making descriptors more compact, for retrieval systems and/or fingerprint-style descriptors - often meaning a useful lower-dimensional representation.


Fisher vector

Vector of Locally Aggregated Descriptors (VLAD)


  • Structure tensor

Scale space

Scale space is a concept that makes detection of things work at multiple/varied scales.

Roughly speaking, it's a series of images lowpassed to different degrees, in part because that makes detected coordinates work on each image.

In practice the images may also be scaled down (which implies a lowpass), if the algorithm it supports deals with that more easily (e.g. always looks at few-pixel scale and can't tweak how many). Note that scaledown and lowpass are not identical: a gaussian filter is fairly ideal in terms of frequency information (which is why scale space is often specifically gaussian scale space), while scaledowns can introduce some spurious, jagged-like information (varying with the scaledown method). So in some cases the scaledown happens after filtering.

Motivations include:

  • Most current feature recognition works on a small scale (and often in terms of pixels). We'd like to also detect larger objects, without doing complex compositional things.
  • When we look at a scene, the fact that we recognize objects means we look at it at different scales.
e.g. from a distance we might identify the house, close-up we'll look at the door.
  • "when you squint", or see from a distance, or zoom out, that's essentially a lowpass

It turns out that anything you can do via differentials (such as common feature detectors - edge, ridge, corner, etc.) can be done without a rescale.
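The blur-then-downsample pyramid variant, sketched (function name and sigma are mine):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(img, levels=4):
    """Each level: blur (the lowpass) then take every second pixel
    (the scaledown), so the lowpass happens before the subsampling."""
    out = [img.astype(float)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(out[-1], sigma=1.0)
        out.append(blurred[::2, ::2])
    return out
```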

See also:

Stroke Width Transform (SWT)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

For each pixel, finds the likeliest stroke width containing that pixel. Somewhat aware of direction, and often part of letter detection.

Uses edge and gradient map.


Pros:

  • not tied to detecting text of a specific size; can deal with rotation and skew
  • not overly sensitive to background gradients

Cons:

  • slow (because of the intermediate maps)
  • tends to assume hard contrast (and may assume text is much darker)

Hough transform

Finds imperfect versions of regular features like lines (first version did only lines), circles, ellipses. Essentially votes in a feature space.
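The line case, as a bare-bones numpy sketch (function name is mine): each edge pixel votes for every (rho, theta) line it could lie on, and real lines show up as peaks in the accumulator.

```python
import numpy as np

def hough_lines(edge_img, n_theta=180):
    """Vote in (rho, theta) space; peaks correspond to lines.
    rho is offset by the diagonal length so indices are non-negative."""
    h, w = edge_img.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    ys, xs = np.nonzero(edge_img)
    for theta_i, theta in enumerate(thetas):
        # normal form of a line: rho = x*cos(theta) + y*sin(theta)
        rhos = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        np.add.at(acc[:, theta_i], rhos, 1)
    return acc, thetas
```

Because votes accumulate, the line can be dashed, noisy, or partially occluded and still produce a clear peak - the "imperfect versions" part.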

Kernel-based Hough transform (KHT)


Transforms mostly used to support others

Morphological image processing

See also:

Whole-image transforms

Gamma compression as a perceptive estimator

bandpass, blur, median

For color analysis we often want to focus on the larger blobs and ignore small details (though in some cases they fall away in the statistics anyway).

Variance image

Of a single image: each pixel is defined by the variance in a nearby block of pixels. Good at finding sharp detail and ignoring gradients.

(Sometimes refers to variance of a pixel within a stack of related images)
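The single-image version, via the identity var = E[x^2] - E[x]^2 over a local window (function name and window size are mine):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def variance_image(gray, size=5):
    """Local variance per pixel: mean of squares minus square of mean,
    each computed over a size x size neighbourhood."""
    g = gray.astype(float)
    return uniform_filter(g * g, size) - uniform_filter(g, size) ** 2
```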


Fourier transform


Difference of Gaussians (DoG)

DoG (Difference of Gaussians) takes an image, makes two gaussian-lowpass-filtered results with different sigmas, and subtracts them from each other.

This is much like bandpass, in that it preserves the spatial information in a range relating to the sigmas/radii.

Often mentioned in the context of edge detection. In that case, there may be further tweaking in the sigmas, and further steps in cleaning and counting zero crossings.

Compare Laplacian of Gaussian.
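The operation itself is two lines with scipy (the sigmas here are just illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(3)
img = rng.random((64, 64))

# subtracting a wider blur from a narrower one keeps only the
# spatial frequencies between the two cutoffs - a bandpass
dog = gaussian_filter(img, sigma=1.0) - gaussian_filter(img, sigma=3.0)
```

Since both blurs preserve the overall level, the result is roughly zero-mean, with the remaining energy concentrated in the band between the two sigmas.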

See also:

Laplacian of Gaussian (LoG)

The Laplacian reacts to local rapid change. Since this makes it very sensitive to noise, it is often seen with some smoothing first, e.g. in the form of the LoG (Laplacian of Gaussian)

Determinant of Hessian (DoH)

Radon transform

Not-specifically-image processing

...that find use here, often generic signal processing.


Kalman filter

Nontrivial goals

Particularly those that stand or fall by their assumptions.

Edge-aware transforms

Image registration

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Image registration[14] is the fancy name for "aligning two nearly identical images"

In some cases, e.g. in various astronomy, the problem can be well constrained, so you can get a lot of use out of assuming that only a moderate amount of translation (and implied edge cropping) happens, and no scaling and no (or almost no) rotation.

Which is relatively simple and controlled. This is often done with something like cross-correlation and often phase correlation.
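The translation-only case via phase correlation, sketched in pure numpy (function name is mine; this assumes a clean circular shift, while real images need windowing and subpixel refinement):

```python
import numpy as np

def phase_correlate(a, b):
    """Estimate the integer (dy, dx) such that np.roll(a, (dy, dx),
    axis=(0, 1)) best matches b. Assumes pure translation."""
    cross = np.conj(np.fft.fft2(a)) * np.fft.fft2(b)
    cross /= np.abs(cross) + 1e-12          # keep only the phase difference
    corr = np.fft.ifft2(cross).real         # sharp peak at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    if dy > h // 2:                         # map wrapped indices to signed shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return dy, dx
```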

It gets more complex if you want to solve for cropping, rotation, uniform and/or non-uniform scale - typically on top of translation. The combination often means you need an iterative approach, and note that this is not a convex problem -- there are potentially local minima that that may not the optimal point, or plain nonsense, so simplex-type solutions will not always work without some guiding assumptions / context-informed filtering.

For example,

  • a bunch of photos from a tripod will usually see no worse than a few-pixel shift, and possibly a tiny rotation.
handheld sees more of both
  • internet-reposts see a bunch of rescaling and cropping (though rarely rotation)
  • a series of successive frames from an electron microscope may see a shift in the stage
and sometimes of parts of the sample (e.g. in cryo-EM in reaction to the beam)
...yet usually a shift-only, constrained-within-a-few-pixels solution already goes a long way

See also:


Image similarity

Near-duplicate detection

Degree of image similarity

See also:

Image segmentation

Image segmentation takes an image and partitions its pixels into segments, where pixels with the same label share useful characteristics - typically to isolate main/foreground objects for further analysis, and/or to deal with textures.

Quick shift



Object detection

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

'Object detection' tends to refer to detecting anything more complex than a point, edge, blob, or corner.

Recent work has also started to consider the compositional nature of objects.

Image segmentation splits an image into regions. Depending on the task this can be or help object detection, be/help texture detection, ignore background, separate objects/textures to help process each individually, etc.

Retrieval systems


Adaptive thresholding

Your most basic thresholding into a boolean image, using a single global threshold value, will only do what you wish on clean examples.

No matter how cleverly that value is chosen, it implicitly assumes that the image's values are bimodal. Consider its histogram - it'd have to have one blob for background, and one blob for the object.

A good example of something that will mess up that assumption is lighting that adds an overall gradient to the image (a gradient less pronounced than the detail, that is. With a gradient more pronounced than the detail, you probably don't want to )

Informed massaging beforehand can help, of course.

Dynamic thresholding, a.k.a. adaptive thresholding, is basically such massaging as part of the thresholding.

It often amounts to calculating a threshold per pixel, based on the local neighbourhood. Even "is the pixel some offset above the local average" already works a lot better, and seems to be the common implementation.
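That local-average version, sketched (function name, window size, and offset are mine):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_threshold(gray, size=15, offset=0.02):
    """A pixel is foreground if it sits `offset` above its local mean,
    so a slow background gradient cancels out of the comparison."""
    local_mean = uniform_filter(gray.astype(float), size=size)
    return gray > local_mean + offset
```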

There's a good example at


Halide

Halide is image processing in a declarative way, splitting an algorithm from its execution+optimization, which can be really handy when you want graphics pipeline optimization without having to spend hours at low level - which may turn out to be more platform-specific than you thought.

You write code against its API (C++, but there are bindings for other languages) and it uses LLVM(verify) for compilation to varied platforms/environments, like x86/SSE, ARM v7/NEON, CUDA, OpenCL, OpenGL, and some more specific experimental ones.

See also

Image summaries and analysis


Co-occurrence matrices

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Studies pixel-level texture.

Often of grayscale images, then called GLCM (Grey Level Co-occurrence Matrix).

With some data massaging it is useful for various other things.

Studying only pairs of adjacent pixels may sound like it'd be very noise-sensitive, but it works better than you may think, in part because most images are large enough to have a lot of pairs.

That said, there are some limitations and tweaks you should understand before you can apply this robustly.

(And there is value in pre-processing, but note that e.g. blur (and size down, because it's also effectively a lowpass) will mostly just move things towards the main diagonal -- which is related to consistent areas and gradients, which are already pronounced to start with for most images)

The output is a symmetric n-by-n matrix, where

  • each value in the matrix has counted how often the two values (indicated by the row and column of its location) co-occur.
  • n is the quantization level - usually 256 since you would typically run this on 8-bit images.
though quantizing to fewer levels is useful in some applications
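A minimal numpy construction of that matrix (function name is mine; skimage's graycomatrix is the more featureful equivalent):

```python
import numpy as np

def glcm(gray, dx=1, dy=0, levels=4):
    """Count how often value pairs co-occur at offset (dx, dy).
    gray must be an integer image already quantized to `levels` values."""
    m = np.zeros((levels, levels), dtype=int)
    h, w = gray.shape
    # the two overlapping views: each pixel and its offset neighbour
    a = gray[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = gray[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    return m + m.T                      # symmetric: count both directions
```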

There are some overall image properties you can calculate from that matrix, including:

  • Contrast (in nearby pixels)
The sum-of-squares variance
  • Homogeneity (of nearby pixels)
Which is basically the same as 'to what degree are the matrix values in the diagonal'
also a good indication of the amount of pixel-level noise
  • Entropy (within nearby pixels)
  • Energy, a.k.a. uniformity
sum of squares
Value range: 0..1, 1 for constant image
  • Correlation
...of nearby pixels, meaning (verify)
  • The number of bins that are filled can also give you a decent indication of how clean an image is.
...e.g. giving you some indication of whether it has a single solid background color, and where it is on the spectrum of two-level, paletted, clip-art with little-blur, badly JPEG-compressed diagram, photo.
  • Directionality of an image, if you calculate values for different directions and see variance of such values.

Further notes:

  • Direction of the pixel pairs can matter for images with significant noise, high-frequency content, and/or extreme regularity.
Usually the implementation allows you to give either a x,y pixel offset (for e.g. 'take a pixel, and the pixel one to the right and one down') or an angle.
  • Some implementations also allow calculation for various distances. This is quite similar to running the calculation over resized images, but may be somewhat faster.

See also:


Detecting manipulation

Error level analysis

Principal Component Analysis

Demosaic analysis

See also