Image - unsorted
Latest revision as of 00:45, 21 April 2024
A lot of this is experiment and work in progress, and very little of it has been tested to academic or even pragmatic standards. Don't trust any of it without testing it yourself.
It's also biased to Python, because I like rapid prototyping. I can always write fast code later.
Color conversion
...because some color spaces are much more sensible to work in: in a space like CIE Lab, Euclidean distance is much closer to perceptual distance than it is in more standard spaces like sRGB.
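A minimal pure-Python sketch of that conversion, sRGB → CIE Lab for a single pixel (D65 white point). In practice you'd use a library (e.g. skimage.color.rgb2lab) rather than this; it's just here to show the steps involved.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in 0..1) to CIE Lab (D65)."""
    # 1. undo the sRGB gamma to get linear light
    def linearize(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # 2. linear RGB -> XYZ (standard sRGB matrix, D65)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    # 3. XYZ -> Lab
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    xn, yn, zn = 0.9505, 1.0, 1.089   # D65 reference white
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

White maps to L≈100, a≈b≈0 and black to L=0, which is a quick sanity check on any implementation.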
Median of pixel along set of images
...emphasizing the consistent/still areas, which is typically what you would consider the background.
The common example is "in a still scene with some tourists wandering about, have your camera take a dozen or two photos over a minute, and median them", because most of the pixels will be very stable, and the people moving about will be outliers. (note that anyone sitting in one spot will probably become blurry, because they'll be a composite from multiple photos)
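The operation itself is tiny. A sketch using plain lists as grayscale frames (a real implementation would use numpy's median over the stack axis):

```python
from statistics import median

def median_stack(frames):
    """Per-pixel median across a stack of equally-sized grayscale frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]
```

A pixel that a "tourist" briefly occupies in one frame out of three falls away as an outlier; the stable background value wins.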
Differential image
Often refers to keeping track of a longer-term average, and subtracting individual frames from it.
This takes out everything that's been there consistently (...lately), and highlights detail in areas with movement.
One example is stationary traffic video, focusing mainly on the cars: it easily removes entirely static things like the roads, signs, and lane detail, and also things static on the order of minutes, such as lighting gradients.
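A minimal sketch of that idea: keep an exponential moving average as the background model, and take per-frame absolute differences against it. The alpha value is an assumption here; real systems tune it to how fast the scene changes.

```python
def running_background(frames, alpha=0.05):
    """For each grayscale frame, return its absolute difference against a
    longer-term running average; static areas go to ~0, movement stands out."""
    bg = [row[:] for row in frames[0]]  # initialize background with first frame
    diffs = []
    for f in frames:
        diff = [[abs(p - b) for p, b in zip(prow, brow)]
                for prow, brow in zip(f, bg)]
        diffs.append(diff)
        # drift the longer-term average toward the current frame
        bg = [[(1 - alpha) * b + alpha * p for p, b in zip(prow, brow)]
              for prow, brow in zip(f, bg)]
    return diffs, bg
```

A pixel that suddenly changes shows a large difference at first, which then decays as the background model absorbs it.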
Superresolution
There are various distinct things called superresolution.
From the information-theoretical view, you can split these into:
Using multiple exposures
Learning-based superresolution
Scaling down well
Optical/diffractive superresolution
Plays with the diffraction limit of the optics of a system
See also
- https://en.wikipedia.org/wiki/Super-resolution_imaging
- https://newatlas.com/super-resolution-weizmann-institute/23486/
- https://en.wikipedia.org/wiki/Super-resolution_microscopy
HDR and exposure fusion
Our eyes are good at adapting locally to the amount of light, e.g. seeing detail in a dark room even when there's also a bright window in our view - in part because our eyes have a roughly logarithmic response(verify), in part because there are different areas with different sensitivity, and also because we're used to exploiting these specifics intuitively.
Film and digital sensors aren't good at this. It's much easier to create sensors with an overall linear response, and that also makes sense for fast response, wide applicability, and capturing what's there accurately. That global response means they will do badly at a scene like a bright window in an otherwise dark room: they would probably adjust to the bright window, which washes out the dark bits, with no obvious way to cheat and imitate our eyes. (Or adjust to the dark detail, and have one mightily overexposed window.) Our eyes aren't actually much better, but they adapt somewhat more locally, and somewhat more smoothly over time.
High Dynamic Range techniques roughly imitate our eyes, by cheating a bit.
HDR photography takes images with different exposure (e.g. window-nice-and-dark-washed-out, details-in-dark-and-window-way-overexposed), and synthesizes an image that has detail in both areas, roughly by locally using detail from the image that seems to give more detail.
Exposure fusion has only the 'piece together more detail' goal.
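A deliberately naive sketch of exposure fusion on grayscale values in 0..1: weight each exposure per pixel by how well-exposed it is (a Gaussian around mid-gray), then take the weighted average. Real implementations (e.g. the Mertens method, as in OpenCV's MergeMertens) also weight by contrast and saturation and blend over an image pyramid to avoid seams; this only shows the core idea.

```python
import math

def fuse_exposures(frames, sigma=0.2):
    """Per-pixel weighted average of grayscale exposures (values 0..1),
    trusting well-exposed pixels (near 0.5) the most."""
    h, w = len(frames[0]), len(frames[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            wsum = vsum = 0.0
            for f in frames:
                v = f[y][x]
                # well-exposedness weight: peaks at mid-gray, falls off at extremes
                weight = math.exp(-((v - 0.5) ** 2) / (2 * sigma ** 2))
                wsum += weight
                vsum += weight * v
            out[y][x] = vsum / wsum
    return out
```

Given one nearly-black exposure and one well-exposed one, the fused pixel lands close to the well-exposed value rather than their plain average.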
High Dynamic Range (HDR) goes further: it tries to create an image with more dynamic range than any of its input images,
and (via some more complex steps) to produce an image with more dynamic range than monitors can show - one whose brightness you could change after the fact, to e.g. focus on detail in that window, or in the room.
When the purpose is human viewing, this is often done by being nonlinear the same way our eyes are - and often a little more aggressively than our eyes would be, which tends to produce side effects that look like halos/blooming, and some areas with unnatural contrast.
Another purpose is reproduction, e.g. HDR rendering in 3D games. For most of us this doesn't show more dynamic range (most monitors can't), but it preserves more lighting range through enough of the pipeline to make an informed decision about use of that range - which tends to mean less washing out of the darkest and lightest areas (and makes those "going from dark to light" scenes look more realistic), and means you can program such effects without having to cheat heavily.
Some of these techniques are now common because they help things look good for relatively little extra processing. Some are much fancier and GPU-intense. It's a spectrum. See e.g. HDR rendering
Motion descriptors
Object tracking
Transforms mostly used to support others
Morphological image processing
See also:
- http://en.wikipedia.org/wiki/Mathematical_morphology
- http://en.wikipedia.org/wiki/Erosion_(morphology)
- http://en.wikipedia.org/wiki/Dilation_(morphology)
- http://en.wikipedia.org/wiki/Topological_skeleton
- http://en.wikipedia.org/wiki/Morphological_Gradient
- http://en.wikipedia.org/wiki/Watershed_(algorithm)
- http://www.esiee.fr/~info/tw/
- http://cmm.ensmp.fr/~beucher/wtshed.html
- http://en.wikipedia.org/wiki/Grassfire_Transform
- http://toyhouse.cc/profiles/blogs/object-skeleton-grassfire-transform
- http://www.sccs.swarthmore.edu/users/02/jill/grassfire/grassfirexform.html
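To make the basic morphological operations concrete, here's a pure-Python sketch of binary erosion and dilation with a 3x3 square structuring element (border handling and the structuring element are simplifying assumptions; real code would use e.g. scipy.ndimage or skimage.morphology):

```python
def binary_erode(img):
    """Erode a binary image (lists of 0/1): a pixel survives only if its
    entire 3x3 neighbourhood is set.  Border pixels are treated as background."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(img[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

def binary_dilate(img):
    """Dilate: a pixel is set if any in-range 3x3 neighbour is set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(img[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                                if 0 <= y + dy < h and 0 <= x + dx < w))
    return out
```

Composing them gives the classic operations: erode-then-dilate is an opening (removes small specks), dilate-then-erode is a closing (fills small holes).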
Whole-image transforms
Gamma compression as a perceptive estimator
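My reading of this heading: plain gamma compression (v to the power 1/2.2) is a cheap stand-in for perceptual lightness, because it lands reasonably close to CIE L* over most of the range. A sketch comparing the two (the 2.2 exponent is the usual display-gamma assumption):

```python
def gamma_lightness(y):
    """Rough perceptual-lightness estimate of linear luminance y (0..1)
    via plain gamma compression."""
    return y ** (1 / 2.2)

def cie_lightness(y):
    """CIE L*, rescaled to 0..1, for the same luminance - for comparison."""
    f = y ** (1 / 3) if y > (6 / 29) ** 3 else y / (3 * (6 / 29) ** 2) + 4 / 29
    return (116 * f - 16) / 100
```

For 18% gray, gamma compression gives roughly 0.46 where L* gives roughly 0.50 - close enough when you just need a cheap estimator rather than a calibrated one.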
bandpass, blur, median
For color analysis we often want to focus on the larger blobs and ignore small details (though in some cases those fall away in the statistics anyway).
Variance image
Of a single image: each pixel is defined by the variance in a nearby block of pixels. Good at finding sharp detail and ignoring gradients.
(Sometimes refers to variance of a pixel within a stack of related images)
http://siddhantahuja.wordpress.com/2009/06/08/compute-variance-map-of-an-image/
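A direct (and slow - O(blocksize) per pixel) sketch of that single-image version; a real implementation would use integral images or a library filter:

```python
from statistics import pvariance

def variance_image(img, radius=1):
    """Each output pixel is the variance of the (2*radius+1)^2 block
    around it; near borders, whatever part of the block is in range is used."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            block = [img[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = pvariance(block)
    return out
```

Flat areas (and, with a suitable radius, smooth gradients) give values near zero, while sharp edges and texture light up.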
Convolution
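For reference, a bare-bones 2D convolution ('valid' mode, so the output shrinks by the kernel size minus one in each dimension; note the kernel flip that distinguishes convolution from correlation):

```python
def convolve2d(img, kernel):
    """'Valid' 2D convolution of a grayscale image (lists of lists) with a
    small kernel.  The kernel is flipped, as true convolution requires."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(img[y + j][x + i] * kernel[kh - 1 - j][kw - 1 - i]
                            for j in range(kh) for i in range(kw))
    return out
```

Most of the transforms below (gaussian blur, DoG, Gabor, and so on) are just particular choices of kernel fed into this operation.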
Fourier transform
Gabor
https://en.wikipedia.org/wiki/Gabor_filter
Difference of Gaussians (DoG)
DoG (Difference of Gaussians) takes an image, makes two gaussian-lowpass-filtered results with different sigmas, and subtracts them from each other.
This is much like bandpass, in that it preserves the spatial information in a range relating to the sigmas/radii.
Often mentioned in the context of edge detection.
In that case, there may be further tweaking in the sigmas, and further steps in cleaning and counting zero crossings.
Compare Laplacian of Gaussian.
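A 1D sketch of DoG (the 2D case is the same idea per axis); the sigmas and the clamped border handling are arbitrary choices here:

```python
import math

def gaussian_kernel(sigma, radius):
    """1D gaussian kernel, normalized to sum to 1."""
    k = [math.exp(-(i * i) / (2 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur1d(signal, sigma, radius):
    """Gaussian-blur a 1D signal, clamping indices at the borders."""
    k = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, kv in enumerate(k):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += kv * signal[idx]
        out.append(acc)
    return out

def dog(signal, sigma1=1.0, sigma2=2.0, radius=6):
    """Difference of Gaussians: subtract the wider blur from the narrower
    one, keeping detail in the band between the two scales."""
    b1 = blur1d(signal, sigma1, radius)
    b2 = blur1d(signal, sigma2, radius)
    return [a - b for a, b in zip(b1, b2)]
```

On a constant signal the result is zero everywhere (both blurs preserve it), while a step edge produces a response around the edge - which is why it comes up in edge detection.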
See also:
Laplacian of Gaussian (LoG)
The Laplacian reacts to local rapid change. Since this makes it very sensitive to noise, it is often seen with some smoothing first, e.g. in the form of the LoG (Laplacian of Gaussian)
Determinant of Hessian (DoH)
Radon transform
Not-specifically-image processing
...that find use here, often generic signal processing.
RANSAC
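RANSAC in a nutshell, as a minimal line-fitting sketch: repeatedly fit a model to a tiny random sample and keep the model most of the data agrees with, so outliers simply get outvoted. The iteration count and inlier threshold are assumptions you'd tune to your data; real uses fit homographies, fundamental matrices, etc. the same way.

```python
import random

def ransac_line(points, iters=200, threshold=0.5, seed=0):
    """Fit y = a*x + b to 2D points containing outliers: repeatedly take a
    line through two random points, keep the one with the most inliers."""
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_inliers = -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical sample pair; skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = sum(1 for x, y in points if abs(a * x + b - y) < threshold)
        if inliers > best_inliers:
            best_inliers = inliers
            best = (a, b)
    return best
```

A common refinement is to re-fit (e.g. least squares) on the winning consensus set afterward.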
Kalman filter
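A scalar sketch to show the shape of the thing - predict, then blend in each measurement weighted by relative uncertainty. The noise parameters q and r are assumptions; tracking applications use the full vector/matrix form with a motion model.

```python
def kalman_1d(measurements, q=1e-4, r=0.5, x0=None, p0=1.0):
    """Scalar Kalman filter estimating a (nearly) constant value from noisy
    measurements.  q: process noise variance, r: measurement noise variance."""
    x = measurements[0] if x0 is None else x0
    p = p0
    estimates = []
    for z in measurements:
        p = p + q                # predict: uncertainty grows a little
        k = p / (p + r)          # Kalman gain: how much to trust the measurement
        x = x + k * (z - x)      # update the estimate toward the measurement
        p = (1 - k) * p          # uncertainty shrinks after the update
        estimates.append(x)
    return estimates
```

With small q the gain shrinks over time, so the estimate settles down instead of chasing every noisy measurement.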
Nontrivial goals
Particularly those that stand or fall by their assumptions.
Edge-aware transforms
Image registration
Image registration[1] is the fancy name for "aligning two nearly identical images"
In some cases, e.g. much of astronomy, the problem can be well constrained,
so you can get a lot of use out of assuming that only a moderate amount of translation (and implied edge cropping) happens,
and no scaling and no (or almost no) rotation.
Which is relatively simple and controlled. This is often done with something like cross-correlation and often phase correlation.
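The translation-only case is simple enough to sketch by brute force in 1D (think of one image row): slide one signal along the other and keep the shift with the highest correlation. The same idea extends to 2D, and FFT-based phase correlation computes effectively the same answer much faster.

```python
def best_shift(a, b, max_shift=5):
    """Return the integer shift s maximizing the cross-correlation
    sum(a[i] * b[i + s]) over the overlap; if b is a delayed copy of a
    (b[i] == a[i - s_true]), this recovers s_true."""
    n = len(a)
    best, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = sum(a[i] * b[i + s]
                    for i in range(max(0, -s), min(n, n - s)))
        if score > best_score:
            best, best_score = s, score
    return best
```

This is O(n * shifts); the FFT route is what makes it practical for whole images.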
It gets more complex if you want to solve for cropping, rotation, uniform and/or non-uniform scale - typically on top of translation.
The combination often means you need an iterative approach - and note that this is not a convex problem: there are potentially local minima that may not be the optimal point, or plain nonsense, so simplex-type solutions will not always work without some guiding assumptions / context-informed filtering.
For example,
- a bunch of photos from a tripod will usually see no worse than a few-pixel shift, and possibly a tiny rotation.
- handheld more of both
- internet-reposts see a bunch of rescaling and cropping (though rarely rotation)
- a series of successive frames from an electron microscope may see a shift in the stage
- and sometimes of parts of the sample (e.g. in cryo-EM in reaction to the beam)
- ...yet usually a shift-only, constrained-within-a-few-pixels solution already goes a long way
See also:
- "An FFT-based technique for translation, rotation and scale-invariant image registration"
- "An IDL/ENVI implementation of the FFT-based algorithm for automatic image registration"
- "Image Registration Using Adaptive Polar Transform"
Translation-only:
Image similarity
Near-duplicate detection
Degree of image similarity
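One cheap, classic approach is a perceptual hash such as the average hash: shrink the image to a tiny grid, threshold each cell against the mean, and compare fingerprints by Hamming distance. A sketch (real-world variants like pHash/dHash are more robust to edits):

```python
def average_hash(img, hash_size=8):
    """Downscale a grayscale image (lists of lists) to hash_size x hash_size
    by block-averaging, then threshold each cell against the overall mean.
    The resulting bit list survives rescaling and mild brightness changes."""
    h, w = len(img), len(img[0])
    cells = []
    for by in range(hash_size):
        for bx in range(hash_size):
            ys = range(by * h // hash_size, (by + 1) * h // hash_size)
            xs = range(bx * w // hash_size, (bx + 1) * w // hash_size)
            vals = [img[y][x] for y in ys for x in xs]
            cells.append(sum(vals) / len(vals))
    mean = sum(cells) / len(cells)
    return [int(c > mean) for c in cells]

def hamming(h1, h2):
    """Number of differing bits; a small distance suggests near-duplicates."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))
```

Because the bits are relative to the image's own mean, a uniformly brightened copy hashes identically, while a genuinely different image lands far away.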
See also:
- "High-Confidence Near-Duplicate Image Detection"
- "Near Duplicate Image Detection: min-Hash and tf-idf Weighting"
- http://www.ee.columbia.edu/ln/dvmm/researchProjects/FeatureExtraction/NearDuplicateByParts/INDDetection.html
- http://www.ee.columbia.edu/ln/dvmm/researchProjects/FeatureExtraction/NearDuplicateDetection/NearDuplicateDetection.htm
- http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p208-lv.pdf
Image segmentation
Image segmentation takes an image and partitions its pixels into segments, where pixels with the same label share useful characteristics - typically to isolate main/foreground objects for further analysis, and/or to deal with textures.
Quick shift
SLIC
Felzenszwalb
Object detection
'Object detection' tends to refer to detecting anything more complex than a point, edge, blob, or corner.
Recent work has also started to consider the compositional nature of objects.
Image segmentation splits an image into regions.
Depending on the task this can be or help object detection, be/help texture detection, ignore background, separate objects/textures to help process each individually, etc.
Retrieval systems
Semi-sorted
Adaptive thresholding
Your most basic thresholding into a boolean image, using a single global threshold value, will do what you wish only on clean examples.
No matter how cleverly that value is chosen, it implicitly assumes that the image's values are bimodal. Consider its histogram - it would have to have one blob for background, one blob for the object.
A good example of something that will mess up that assumption is lighting that adds an overall gradient to the image (a gradient less pronounced than the detail, that is; with a gradient more pronounced than the detail, thresholding alone probably isn't what you want).
Informed massaging beforehand can help, of course.
Dynamic thresholding, a.k.a. adaptive thresholding, is basically such massaging as part of the thresholding.
It often amounts to calculating a threshold per pixel, based on the local neighbourhood. Even "is the pixel some offset above the local average" already works a lot better, and seems to be the common implementation.
There's a good example at
https://scikit-image.org/docs/0.12.x/auto_examples/segmentation/plot_threshold_adaptive.html
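A direct sketch of that "offset above the local average" variant (slow - real implementations use integral images or library filters like scikit-image's threshold_local; the radius and offset are values you'd tune):

```python
def adaptive_threshold(img, radius=3, offset=0):
    """Threshold each pixel against the mean of its local neighbourhood
    plus an offset, instead of one global value.  This survives the smooth
    lighting gradients that break global thresholding."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            block = [img[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(block) / len(block)
            out[y][x] = int(img[y][x] > local_mean + offset)
    return out
```

On an image that is a smooth gradient plus one bright dot, a global threshold either keeps half the gradient or loses the dot; the local version picks out just the dot.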
Optical flow
https://en.wikipedia.org/wiki/Optical_flow
Halide
Halide does image processing in a declarative way, splitting an algorithm from its execution/optimization. This can be really handy when you want graphics-pipeline optimization without having to spend hours at a low level - work which may turn out to be more platform-specific than you thought anyway.
You write code against its API (C++, but there are bindings for other languages), and it uses LLVM(verify) to compile to varied platforms/environments, like x86/SSE, ARM v7/NEON, CUDA, OpenCL, OpenGL, and some more specific experimental ones.
See also