Image - unsorted

From Helpful
The physical and human aspects of dealing with audio, video, and images

Vision and color perception: objectively describing color · the eyes and the brain · physics, numbers, and (non)linearity · color spaces · references, links, and unsorted stuff

Image: file formats · noise reduction · halftoning, dithering · illuminant correction · Image descriptors · Reverse image search · image feature and contour detection · OCR · Image - unsorted

Video: format notes · encoding notes · On display speed

Audio physics and physiology: Basic sound physics · Human hearing, psychoacoustics · Descriptions used for sound and music

Noise stuff: Stray signals and noise · sound-related noise names · electronic non-coupled noise names · electronic coupled noise · ground loop · strategies to avoid coupled noise · Sampling, reproduction, and transmission distortions · (tape) noise reduction

Digital sound and processing: capture, storage, reproduction · on APIs (and latency) · programming and codecs · some glossary · Audio and signal processing - unsorted stuff

Music electronics: device voltage and impedance, audio and otherwise · amps and speakers · basic audio hacks · Simple ADCs and DACs · digital audio · multichannel and surround
On the stage side: microphones · studio and stage notes · Effects · sync

Electronic music: Some history, ways of making noises · Gaming synth

Modular synth (eurorack, mostly): sync · power supply · formats (physical, interconnects)

Unsorted: Visuals DIY · Signal analysis, modeling, processing (some audio, some more generic) · Music fingerprinting and identification

For more, see Category:Audio, video, images

A lot of this is experiment and work in progress, and very little of it has been tested to academic or even pragmatic standards. Don't trust any of it without testing it yourself.

It's also biased to Python, because I like rapid prototyping. I can always write fast code later.

Color conversion

...because some spaces are much more sensible to work in, because linear distance in them is closer to perceptual distance than it is in the more standard spaces.

Multiple related images

Median of pixel along set of images

...emphasizing the consistent/still areas, which is typically what you would consider the background.

The common example is "in a still scene with some tourists wandering about, take a couple dozen photos over a minute, and median them", because most of the pixels will be very stable, and the people moving about will be outliers. (Note that anyone who was sitting in one spot will probably become blurry, because they'll be a composite from multiple photos.)

(image: median of many exposures removes the moving people)
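As a sketch of the idea in numpy (synthetic data standing in for a registered photo stack):

```python
import numpy as np

# A stack of 25 "photos" of a mostly-static scene:
rng = np.random.default_rng(0)
stack = np.stack([100 + rng.normal(0, 1, (4, 4)) for _ in range(25)])

# A "tourist" passes through a couple of frames:
stack[3, 1, 1] = 255
stack[7, 2, 2] = 255

# The per-pixel median ignores those outliers and keeps the stable background
background = np.median(stack, axis=0)
```

For real photos you would first register (align) the images - see the image registration section below.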

Differential image

Often refers to keeping track of a longer-term average, and subtracting individual frames from it.

This takes out everything that's been there consistently (...lately), and highlights detail in areas with movement.

One example is stationary traffic video, focusing mainly on the cars: it easily removes entirely static things like the road, signs, and lane detail, and also things that are static on the scale of minutes, such as lighting gradients.
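A minimal sketch of that "longer-term average" idea, here as an exponential moving average (one of several reasonable background models):

```python
import numpy as np

def running_difference(frames, alpha=0.05):
    """Yield |frame - running average| for each frame.
    alpha sets how quickly the background model adapts to change."""
    background = None
    for frame in frames:
        frame = np.asarray(frame, dtype=np.float64)
        if background is None:
            background = frame.copy()
        # exponential moving average as the slowly-adapting background
        background = (1 - alpha) * background + alpha * frame
        yield np.abs(frame - background)
```

Static pixels end up near zero in the difference; anything moving stands out.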


Superresolution

There are various distinct things called superresolution.

From the information-theoretical view, you can split these into:

Using multiple exposures

Learning-based Super Resolution

Scaling down well

Optical/diffractive superresolution

Plays with the diffraction limit of the optics of a system

See e.g.

See also

HDR and exposure fusion

The eyes are good at adapting locally to the amount of light, e.g. seeing detail in a dark room even when there's also a bright window in our view - in part because our eyes adapt in different areas and have a logarithmic response, and also because we're intuitively used to exploiting these specifics.

Film and digital sensors aren't good at this, both because it's much easier to create a linear response to overall lighting, and because that makes sense for fast response, wide applicability, and capturing what's there accurately. But yes, they suck at the window scene: they would probably adjust to the bright window, which washes out the dark bits (or adjust to the dark detail and have one mighty overexposed window), and they have no obvious way to cheat to imitate our eyes.

High Dynamic Range (HDR) techniques roughly imitate our eyes, by cheating a bit.

You take images with different exposures (e.g. window-nice-but-dark-areas-washed-out, details-in-dark-but-window-way-overexposed) and synthesize an image that has detail in both areas, roughly by locally weighting whichever image seems to give more detail.

Exposure fusion has only the 'piece together more detail' goal.
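A minimal single-scale sketch of that weighting idea. Real exposure fusion (e.g. Mertens et al.) adds contrast and saturation terms plus multiresolution blending; the "well-exposedness" weight below is just the recognizable core, for grayscale values in [0, 1]:

```python
import numpy as np

def fuse_exposures(images, sigma=0.2):
    # Weight each pixel by how close it is to mid-gray (0.5),
    # then take the weighted average over the exposure stack.
    stack = np.stack([np.asarray(im, dtype=np.float64) for im in images])
    weights = np.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))
    weights /= weights.sum(axis=0) + 1e-12   # normalize per pixel
    return (weights * stack).sum(axis=0)
```

Wherever one exposure is well-exposed and the other is not, the fused result leans heavily on the well-exposed one.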

HDR has more steps, producing an intermediate result that has more dynamic range than monitors can show.

When the purpose is human viewing, this is often done by being nonlinear in the same way our eyes are - and often a little more aggressively than our own eyes would be, which tends to have side effects that look like halos/blooming and unnatural contrast in some areas.

Another purpose is reproduction, e.g. in 3D rendering, which preserves HDR throughout enough of the pipeline to make an informed decision about use of the range - which tends to mean you don't wash away detail in the darkest or lightest areas. Some of these techniques are now common because they help things look good for relatively little extra processing; some are fancy and GPU-intense. It's a spectrum. See e.g. HDR rendering

Motion descriptors

Object tracking

Transforms mostly used to support others

Morphological image processing

See also:

Whole-image transforms

Gamma compression as a perceptive estimator

bandpass, blur, median

For color analysis we often want to focus on the larger blobs and ignore small details (though in some cases those fall away in the statistics anyway).

Variance image

Of a single image: each pixel is defined by the variance in a nearby block of pixels. Good at finding sharp detail and ignoring gradients.

(Sometimes refers to variance of a pixel within a stack of related images)
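A quick numpy-only sketch of the single-image version (valid-mode, so the output shrinks by the window size minus one):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_variance(img, size=5):
    # variance within each size-by-size neighbourhood
    windows = sliding_window_view(np.asarray(img, dtype=np.float64), (size, size))
    return windows.var(axis=(-2, -1))
```

A smooth gradient yields a low, roughly constant variance, while a sharp edge spikes - matching the "finds sharp details, ignores gradients" description above.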


Fourier transform


Difference of Gaussians (DoG)

DoG (Difference of Gaussians) takes an image, makes two Gaussian-lowpass-filtered results with different sigmas, and subtracts one from the other.

This is much like bandpass, in that it preserves the spatial information in a range relating to the sigmas/radii.

Often mentioned in the context of edge detection. In that case, there may be further tweaking in the sigmas, and further steps in cleaning and counting zero crossings.

Compare Laplacian of Gaussian.
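A self-contained sketch, building the separable Gaussian with plain numpy convolutions (in practice you'd probably reach for scipy.ndimage.gaussian_filter instead):

```python
import numpy as np

def gaussian_blur(img, sigma):
    # separable Gaussian, kernel truncated at 3 sigma
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # note: np.convolve zero-pads, so expect artifacts near the borders
    out = np.apply_along_axis(np.convolve, 1, np.asarray(img, float), kernel, mode='same')
    return np.apply_along_axis(np.convolve, 0, out, kernel, mode='same')

def difference_of_gaussians(img, sigma1=1.0, sigma2=2.0):
    # band-pass: keeps structure between the two blur scales
    return gaussian_blur(img, sigma1) - gaussian_blur(img, sigma2)
```

Flat areas and smooth gradients mostly cancel out; the response concentrates around edges and detail at scales between the two sigmas.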

See also:

Laplacian of Gaussian (LoG)

The Laplacian reacts to local rapid change. Since this makes it very sensitive to noise, it is often combined with some smoothing first, e.g. in the form of the LoG (Laplacian of Gaussian).

Determinant of Hessian (DoH)

Radon transform

Not-specifically-image processing

...that find use here, often generic signal processing.


Kalman filter

Nontrivial goals

Particularly those that stand or fall by their assumptions.

Edge-aware transforms

Image registration

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

Image registration[1] is the fancy name for "aligning two nearly identical images".

In some cases, e.g. various astronomy, the problem can be well constrained, so you can get a lot of use out of assuming that only a moderate amount of translation (and implied edge cropping) happens, and no scaling and no (or almost no) rotation.

That case is relatively simple and controlled. It is often done with something like cross-correlation, and often specifically phase correlation.
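For the translation-only case, phase correlation is a few lines of FFT. A minimal sketch (real implementations add windowing and subpixel peak interpolation):

```python
import numpy as np

def phase_correlation_shift(ref, moved):
    # Normalized cross-power spectrum; its inverse FFT peaks at the shift.
    R = np.fft.fft2(moved) * np.conj(np.fft.fft2(ref))
    R /= np.abs(R) + 1e-12
    corr = np.fft.ifft2(R).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape), dtype=float)
    # FFT shifts wrap around, so map large positive shifts to negative ones
    for axis, n in enumerate(corr.shape):
        if peak[axis] > n // 2:
            peak[axis] -= n
    return peak
```

Because it works on the whole spectrum at once, it's robust to noise and uniform lighting changes, but it only recovers translation.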

It gets more complex if you want to solve for cropping, rotation, and uniform and/or non-uniform scaling - typically on top of translation. The combination often means you need an iterative approach, and note that this is not a convex problem: there are potentially local minima that are not the optimal point, or plain nonsense, so simplex-type solvers will not always work without some guiding assumptions / context-informed filtering.

For example,

  • a bunch of photos from a tripod will usually see no worse than a few-pixel shift, and possibly a tiny rotation; handheld, more of both
  • internet reposts see a bunch of rescaling and cropping (though rarely rotation)
  • a series of successive frames from an electron microscope may see a shift in the stage, and sometimes of parts of the sample (e.g. in cryo-EM, in reaction to the beam) - yet usually a shift-only, constrained-within-a-few-pixels solution already goes a long way

See also:


Image similarity

Near-duplicate detection

Degree of image similarity

See also:

Image segmentation

Image segmentation takes an image and partitions its pixels into segments, where pixels with the same label share useful characteristics - typically to isolate main/foreground objects for further analysis, and/or to deal with textures.

Quick shift



Object detection

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

'Object detection' tends to refer to detecting anything more complex than a point, edge, blob, or corner.

Recent work has also started to consider the compositional nature of objects.

Image segmentation splits an image into regions. Depending on the task, this can amount to or assist object detection, texture detection, ignoring background, or separating objects/textures so each can be processed individually.

Retrieval systems


Adaptive thresholding

Your most basic thresholding into a boolean image, using a single global threshold value, will only do what you want on clean examples.

No matter how cleverly that value is chosen, it implicitly assumes that the image's values are bimodal. Consider its histogram: it would have to have one blob for background and one blob for the object.

A good example of something that breaks that assumption is lighting that adds an overall gradient to the image (a gradient less pronounced than the detail, that is; with a gradient more pronounced than the detail, you probably don't want to threshold at all).

Informed massaging beforehand can help, of course.

Dynamic thresholding, a.k.a. adaptive thresholding, is basically such massaging done as part of the thresholding itself.

It often amounts to calculating a threshold per pixel, based on its local neighbourhood. Even "is the pixel some offset above the local average" already works a lot better, and seems to be the common implementation.
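That common implementation, sketched with numpy only (OpenCV's cv2.adaptiveThreshold and scipy.ndimage do the same much faster):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def adaptive_threshold(img, size=15, offset=10):
    # a pixel is foreground if it is `offset` above its local mean
    img = np.asarray(img, dtype=np.float64)
    pad = size // 2
    padded = np.pad(img, pad, mode='reflect')
    local_mean = sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))
    return img > local_mean + offset
```

A lighting gradient mostly cancels against the local mean, so detail survives where a global threshold would drown one end of the gradient.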

There's a good example at


Halide does image processing in a declarative way, splitting the algorithm from its execution/optimization, which can be really handy when you want graphics-pipeline optimization without spending hours at a low level that may turn out to be more platform-specific than you thought.

You write code against its API (C++, but there are bindings for other languages), and it uses LLVM(verify) for compilation to varied platforms/environments, like x86/SSE, ARM v7/NEON, CUDA, OpenCL, OpenGL, and some more specific experimental ones.

See also