Image - unsorted
Latest revision as of 00:45, 21 April 2024
A lot of this is experiment and work in progress, and very little of it has been tested to academic or even pragmatic standards. Don't trust any of it without testing it yourself.
It's also biased to Python, because I like rapid prototyping. I can always write fast code later.
Color conversion
...because some color spaces are much more sensible to work in: in a space like CIE Lab, Euclidean distance is much closer to perceptual distance than it is in more standard spaces like sRGB.
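A minimal pure-Python sketch of that conversion, sRGB → CIE Lab for a single pixel (D65 white point). In practice you'd use a library (e.g. skimage.color.rgb2lab) rather than this; it's just here to show the steps involved.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in 0..1) to CIE Lab (D65)."""
    # 1. undo the sRGB gamma to get linear light
    def linearize(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # 2. linear RGB -> XYZ (standard sRGB matrix, D65)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    # 3. XYZ -> Lab
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    xn, yn, zn = 0.9505, 1.0, 1.089   # D65 reference white
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

White maps to L≈100, a≈b≈0 and black to L=0, which is a quick sanity check on any implementation.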
Median of pixel along set of images
...emphasizing the consistent/still areas, which is typically what you would consider the background.
The common example is "in a still scene with some tourists wandering about, have your camera take a dozen or two photos over a minute, and median them", because most of the pixels will be very stable, and the people moving about will be outliers. (note that anyone sitting in one spot will probably become blurry, because they'll be a composite from multiple photos)
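The operation itself is tiny. A sketch using plain lists as grayscale frames (a real implementation would use numpy's median over the stack axis):

```python
from statistics import median

def median_stack(frames):
    """Per-pixel median across a stack of equally-sized grayscale frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]
```

A pixel that a "tourist" briefly occupies in one frame out of three falls away as an outlier; the stable background value wins.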
Differential image
Often refers to keeping track of a longer-term average, and subtracting individual frames from it.
This takes out everything that's been there consistently (...lately), and highlights detail in areas with movement.
One example is stationary traffic video, focusing mainly on the cars: it easily removes entirely static things like the roads, signs, and lane detail, and also things static on the order of minutes, such as lighting gradients.
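A minimal sketch of that idea: keep an exponential moving average as the background model, and take per-frame absolute differences against it. The alpha value is an assumption here; real systems tune it to how fast the scene changes.

```python
def running_background(frames, alpha=0.05):
    """For each grayscale frame, return its absolute difference against a
    longer-term running average; static areas go to ~0, movement stands out."""
    bg = [row[:] for row in frames[0]]  # initialize background with first frame
    diffs = []
    for f in frames:
        diff = [[abs(p - b) for p, b in zip(prow, brow)]
                for prow, brow in zip(f, bg)]
        diffs.append(diff)
        # drift the longer-term average toward the current frame
        bg = [[(1 - alpha) * b + alpha * p for p, b in zip(prow, brow)]
              for prow, brow in zip(f, bg)]
    return diffs, bg
```

A pixel that suddenly changes shows a large difference at first, which then decays as the background model absorbs it.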
Superresolution
There are various distinct things called superresolution.
From the information-theoretical view, you can split these into:
Using multiple exposures
Learning-based superresolution
Scaling down well
Optical/diffractive superresolution
Plays with the diffraction limit of the optics of a system
See also
- https://en.wikipedia.org/wiki/Super-resolution_imaging
- https://newatlas.com/super-resolution-weizmann-institute/23486/
- https://en.wikipedia.org/wiki/Super-resolution_microscopy
HDR and exposure fusion
Our eyes are good at adapting locally to the amount of light, e.g. seeing detail in a dark room even when there's also a bright window in our view - in part because our eyes have a roughly logarithmic response(verify), in part because there are different areas with different sensitivity, and also because we're used to exploiting these specifics intuitively.
Film and digital sensors aren't good at this. It's much easier to create sensors with an overall linear response, and that also makes sense for fast response, wide applicability, and capturing what's there accurately. That global response means they will do badly at a scene like a bright window in an otherwise dark room: they would probably adjust to the bright window, which washes out the dark bits, with no obvious way to cheat and imitate our eyes. (Or adjust to the dark detail, and have one mightily overexposed window.) Our eyes aren't actually much better, but they adapt somewhat more locally, and somewhat more smoothly over time.
High Dynamic Range techniques roughly imitate our eyes, by cheating a bit.
HDR photography takes images with different exposure (e.g. window-nice-and-dark-washed-out, details-in-dark-and-window-way-overexposed), and synthesizes an image that has detail in both areas, roughly by locally using detail from the image that seems to give more detail.
Exposure fusion has only the 'piece together more detail' goal.
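A deliberately naive sketch of exposure fusion on grayscale values in 0..1: weight each exposure per pixel by how well-exposed it is (a Gaussian around mid-gray), then take the weighted average. Real implementations (e.g. the Mertens method, as in OpenCV's MergeMertens) also weight by contrast and saturation and blend over an image pyramid to avoid seams; this only shows the core idea.

```python
import math

def fuse_exposures(frames, sigma=0.2):
    """Per-pixel weighted average of grayscale exposures (values 0..1),
    trusting well-exposed pixels (near 0.5) the most."""
    h, w = len(frames[0]), len(frames[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            wsum = vsum = 0.0
            for f in frames:
                v = f[y][x]
                # well-exposedness weight: peaks at mid-gray, falls off at extremes
                weight = math.exp(-((v - 0.5) ** 2) / (2 * sigma ** 2))
                wsum += weight
                vsum += weight * v
            out[y][x] = vsum / wsum
    return out
```

Given one nearly-black exposure and one well-exposed one, the fused pixel lands close to the well-exposed value rather than their plain average.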
High Dynamic Range (HDR) goes further: it tries to create an image with more dynamic range than any of its input images,
and (via some more complex steps) to produce an image with more dynamic range than monitors can show - one whose brightness you could change after the fact, to e.g. focus on detail in that window, or in the room.
When the purpose is human viewing, this is often done by being nonlinear the same way our eyes are - and often a little more aggressively than our eyes would be, which tends to produce side effects that look like halos/blooming, and some areas with unnatural contrast.
Another purpose is reproduction, e.g. HDR rendering in 3D games. For most of us this doesn't show more dynamic range (most monitors can't), but it preserves more lighting range through enough of the pipeline to make an informed decision about use of that range - which tends to mean less washing out of the darkest and lightest areas (and makes those "going from dark to light" scenes look more realistic), and means you can program such effects without having to cheat heavily.
Some of these techniques are now common because they help things look good for relatively little extra processing. Some are much fancier and GPU-intense. It's a spectrum. See e.g. HDR rendering
Motion descriptors
Object tracking
Transforms mostly used to support others
Morphological image processing
See also:
- http://en.wikipedia.org/wiki/Mathematical_morphology
- http://en.wikipedia.org/wiki/Erosion_(morphology)
- http://en.wikipedia.org/wiki/Dilation_(morphology)
- http://en.wikipedia.org/wiki/Topological_skeleton
- http://en.wikipedia.org/wiki/Morphological_Gradient
- http://en.wikipedia.org/wiki/Watershed_(algorithm)
- http://www.esiee.fr/~info/tw/
- http://cmm.ensmp.fr/~beucher/wtshed.html
- http://en.wikipedia.org/wiki/Grassfire_Transform
- http://toyhouse.cc/profiles/blogs/object-skeleton-grassfire-transform
- http://www.sccs.swarthmore.edu/users/02/jill/grassfire/grassfirexform.html
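To make the basic morphological operations concrete, here's a pure-Python sketch of binary erosion and dilation with a 3x3 square structuring element (border handling and the structuring element are simplifying assumptions; real code would use e.g. scipy.ndimage or skimage.morphology):

```python
def binary_erode(img):
    """Erode a binary image (lists of 0/1): a pixel survives only if its
    entire 3x3 neighbourhood is set.  Border pixels are treated as background."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(img[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

def binary_dilate(img):
    """Dilate: a pixel is set if any in-range 3x3 neighbour is set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(img[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                                if 0 <= y + dy < h and 0 <= x + dx < w))
    return out
```

Composing them gives the classic operations: erode-then-dilate is an opening (removes small specks), dilate-then-erode is a closing (fills small holes).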
Whole-image transforms
Gamma compression as a perceptive estimator
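My reading of this heading: plain gamma compression (v to the power 1/2.2) is a cheap stand-in for perceptual lightness, because it lands reasonably close to CIE L* over most of the range. A sketch comparing the two (the 2.2 exponent is the usual display-gamma assumption):

```python
def gamma_lightness(y):
    """Rough perceptual-lightness estimate of linear luminance y (0..1)
    via plain gamma compression."""
    return y ** (1 / 2.2)

def cie_lightness(y):
    """CIE L*, rescaled to 0..1, for the same luminance - for comparison."""
    f = y ** (1 / 3) if y > (6 / 29) ** 3 else y / (3 * (6 / 29) ** 2) + 4 / 29
    return (116 * f - 16) / 100
```

For 18% gray, gamma compression gives roughly 0.46 where L* gives roughly 0.50 - close enough when you just need a cheap estimator rather than a calibrated one.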
bandpass, blur, median
For color analysis we often want to focus on the larger blobs and ignore small details (though in some cases those fall away in the statistics anyway).
Variance image
Of a single image: each pixel is defined by the variance in a nearby block of pixels. Good at finding sharp detail and ignoring gradients.
(Sometimes refers to variance of a pixel within a stack of related images)
http://siddhantahuja.wordpress.com/2009/06/08/compute-variance-map-of-an-image/
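A direct (and slow - O(blocksize) per pixel) sketch of that single-image version; a real implementation would use integral images or a library filter:

```python
from statistics import pvariance

def variance_image(img, radius=1):
    """Each output pixel is the variance of the (2*radius+1)^2 block
    around it; near borders, whatever part of the block is in range is used."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            block = [img[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = pvariance(block)
    return out
```

Flat areas (and, with a suitable radius, smooth gradients) give values near zero, while sharp edges and texture light up.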
Convolution
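For reference, a bare-bones 2D convolution ('valid' mode, so the output shrinks by the kernel size minus one in each dimension; note the kernel flip that distinguishes convolution from correlation):

```python
def convolve2d(img, kernel):
    """'Valid' 2D convolution of a grayscale image (lists of lists) with a
    small kernel.  The kernel is flipped, as true convolution requires."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(img[y + j][x + i] * kernel[kh - 1 - j][kw - 1 - i]
                            for j in range(kh) for i in range(kw))
    return out
```

Most of the transforms below (gaussian blur, DoG, Gabor, and so on) are just particular choices of kernel fed into this operation.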
Fourier transform
Gabor
https://en.wikipedia.org/wiki/Gabor_filter
Difference of Gaussians (DoG)
DoG (Difference of Gaussians) takes an image, makes two gaussian-lowpass-filtered results with different sigmas, and subtracts them from each other.
This is much like bandpass, in that it preserves the spatial information in a range relating to the sigmas/radii.
Often mentioned in the context of edge detection.
In that case, there may be further tweaking in the sigmas, and further steps in cleaning and counting zero crossings.
Compare Laplacian of Gaussian.
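A 1D sketch of DoG (the 2D case is the same idea per axis); the sigmas and the clamped border handling are arbitrary choices here:

```python
import math

def gaussian_kernel(sigma, radius):
    """1D gaussian kernel, normalized to sum to 1."""
    k = [math.exp(-(i * i) / (2 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur1d(signal, sigma, radius):
    """Gaussian-blur a 1D signal, clamping indices at the borders."""
    k = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, kv in enumerate(k):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += kv * signal[idx]
        out.append(acc)
    return out

def dog(signal, sigma1=1.0, sigma2=2.0, radius=6):
    """Difference of Gaussians: subtract the wider blur from the narrower
    one, keeping detail in the band between the two scales."""
    b1 = blur1d(signal, sigma1, radius)
    b2 = blur1d(signal, sigma2, radius)
    return [a - b for a, b in zip(b1, b2)]
```

On a constant signal the result is zero everywhere (both blurs preserve it), while a step edge produces a response around the edge - which is why it comes up in edge detection.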
See also:
Laplacian of Gaussian (LoG)
The Laplacian reacts to local rapid change. Since this makes it very sensitive to noise, it is often seen with some smoothing first, e.g. in the form of the LoG (Laplacian of Gaussian)
Determinant of Hessian (DoH)
Radon transform
Not-specifically-image processing
...that find use here, often generic signal processing.
RANSAC
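RANSAC in a nutshell, as a minimal line-fitting sketch: repeatedly fit a model to a tiny random sample and keep the model most of the data agrees with, so outliers simply get outvoted. The iteration count and inlier threshold are assumptions you'd tune to your data; real uses fit homographies, fundamental matrices, etc. the same way.

```python
import random

def ransac_line(points, iters=200, threshold=0.5, seed=0):
    """Fit y = a*x + b to 2D points containing outliers: repeatedly take a
    line through two random points, keep the one with the most inliers."""
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_inliers = -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical sample pair; skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = sum(1 for x, y in points if abs(a * x + b - y) < threshold)
        if inliers > best_inliers:
            best_inliers = inliers
            best = (a, b)
    return best
```

A common refinement is to re-fit (e.g. least squares) on the winning consensus set afterward.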
Kalman filter
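A scalar sketch to show the shape of the thing - predict, then blend in each measurement weighted by relative uncertainty. The noise parameters q and r are assumptions; tracking applications use the full vector/matrix form with a motion model.

```python
def kalman_1d(measurements, q=1e-4, r=0.5, x0=None, p0=1.0):
    """Scalar Kalman filter estimating a (nearly) constant value from noisy
    measurements.  q: process noise variance, r: measurement noise variance."""
    x = measurements[0] if x0 is None else x0
    p = p0
    estimates = []
    for z in measurements:
        p = p + q                # predict: uncertainty grows a little
        k = p / (p + r)          # Kalman gain: how much to trust the measurement
        x = x + k * (z - x)      # update the estimate toward the measurement
        p = (1 - k) * p          # uncertainty shrinks after the update
        estimates.append(x)
    return estimates
```

With small q the gain shrinks over time, so the estimate settles down instead of chasing every noisy measurement.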
Nontrivial goals
Particularly those that stand or fall by their assumptions.
Edge-aware transforms
Image registration
Image registration[1] is the fancy name for "aligning two nearly identical images"
In some cases, e.g. much of astronomy, the problem can be well constrained,
so you can get a lot of use out of assuming that only a moderate amount of translation (and implied edge cropping) happens,
and no scaling and no (or almost no) rotation.
Which is relatively simple and controlled. This is often done with something like cross-correlation and often phase correlation.
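The translation-only case is simple enough to sketch by brute force in 1D (think of one image row): slide one signal along the other and keep the shift with the highest correlation. The same idea extends to 2D, and FFT-based phase correlation computes effectively the same answer much faster.

```python
def best_shift(a, b, max_shift=5):
    """Return the integer shift s maximizing the cross-correlation
    sum(a[i] * b[i + s]) over the overlap; if b is a delayed copy of a
    (b[i] == a[i - s_true]), this recovers s_true."""
    n = len(a)
    best, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = sum(a[i] * b[i + s]
                    for i in range(max(0, -s), min(n, n - s)))
        if score > best_score:
            best, best_score = s, score
    return best
```

This is O(n * shifts); the FFT route is what makes it practical for whole images.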
It gets more complex if you want to solve for cropping, rotation, uniform and/or non-uniform scale - typically on top of translation.
The combination often means you need an iterative approach - and note that this is not a convex problem: there are potentially local minima that may not be the optimal point, or plain nonsense, so simplex-type solutions will not always work without some guiding assumptions / context-informed filtering.
For example,
- a bunch of photos from a tripod will usually see no worse than a few-pixel shift, and possibly a tiny rotation.
- handheld more of both
- internet-reposts see a bunch of rescaling and cropping (though rarely rotation)
- a series of successive frames from an electron microscope may see a shift in the stage
- and sometimes of parts of the sample (e.g. in cryo-EM in reaction to the beam)
- ...yet usually a shift-only, constrained-within-a-few-pixels solution already goes a long way
See also:
- "An FFT-based technique for translation, rotation and scale-invariant image registration"
- "An IDL/ENVI implementation of the FFT-based algorithm for automatic image registration"
- "Image Registration Using Adaptive Polar Transform"
Translation-only:
Image similarity
Near-duplicate detection
Degree of image similarity
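One cheap, classic approach is a perceptual hash such as the average hash: shrink the image to a tiny grid, threshold each cell against the mean, and compare fingerprints by Hamming distance. A sketch (real-world variants like pHash/dHash are more robust to edits):

```python
def average_hash(img, hash_size=8):
    """Downscale a grayscale image (lists of lists) to hash_size x hash_size
    by block-averaging, then threshold each cell against the overall mean.
    The resulting bit list survives rescaling and mild brightness changes."""
    h, w = len(img), len(img[0])
    cells = []
    for by in range(hash_size):
        for bx in range(hash_size):
            ys = range(by * h // hash_size, (by + 1) * h // hash_size)
            xs = range(bx * w // hash_size, (bx + 1) * w // hash_size)
            vals = [img[y][x] for y in ys for x in xs]
            cells.append(sum(vals) / len(vals))
    mean = sum(cells) / len(cells)
    return [int(c > mean) for c in cells]

def hamming(h1, h2):
    """Number of differing bits; a small distance suggests near-duplicates."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))
```

Because the bits are relative to the image's own mean, a uniformly brightened copy hashes identically, while a genuinely different image lands far away.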
See also:
- "High-Confidence Near-Duplicate Image Detection"
- "Near Duplicate Image Detection: min-Hash and tf-idf Weighting"
- http://www.ee.columbia.edu/ln/dvmm/researchProjects/FeatureExtraction/NearDuplicateByParts/INDDetection.html
- http://www.ee.columbia.edu/ln/dvmm/researchProjects/FeatureExtraction/NearDuplicateDetection/NearDuplicateDetection.htm
- http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p208-lv.pdf
Image segmentation
Image segmentation takes an image and partitions its pixels into segments, where pixels with the same label share useful characteristics - typically to isolate main/foreground objects for further analysis, and/or to deal with textures.
Quick shift
SLIC
Felzenszwalb
Object detection
'Object detection' tends to refer to detecting anything more complex than a point, edge, blob, or corner.
Recent work has also started to consider the compositional nature of objects.
Image segmentation splits an image into regions.
Depending on the task this can be or help object detection, be/help texture detection, ignore background, separate objects/textures to help process each individually, etc.
Retrieval systems
Semi-sorted
Adaptive thresholding
Your most basic thresholding into a boolean image, using a single global threshold value, will do what you wish only on clean examples.
No matter how cleverly that value is chosen, it implicitly assumes that the image's values are bimodal. Consider its histogram - it would have to have one blob for background, one blob for the object.
A good example of something that will mess up that assumption is lighting that adds an overall gradient to the image (a gradient less pronounced than the detail, that is; with a gradient more pronounced than the detail, thresholding alone probably isn't what you want).
Informed massaging beforehand can help, of course.
Dynamic thresholding, a.k.a. adaptive thresholding, is basically such massaging as part of the thresholding.
It often amounts to calculating a threshold per pixel, based on the local neighbourhood. Even "is the pixel some offset above the local average" already works a lot better, and seems to be the common implementation.
There's a good example at
https://scikit-image.org/docs/0.12.x/auto_examples/segmentation/plot_threshold_adaptive.html
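A direct sketch of that "offset above the local average" variant (slow - real implementations use integral images or library filters like scikit-image's threshold_local; the radius and offset are values you'd tune):

```python
def adaptive_threshold(img, radius=3, offset=0):
    """Threshold each pixel against the mean of its local neighbourhood
    plus an offset, instead of one global value.  This survives the smooth
    lighting gradients that break global thresholding."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            block = [img[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(block) / len(block)
            out[y][x] = int(img[y][x] > local_mean + offset)
    return out
```

On an image that is a smooth gradient plus one bright dot, a global threshold either keeps half the gradient or loses the dot; the local version picks out just the dot.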
Optical flow
https://en.wikipedia.org/wiki/Optical_flow
Halide
Halide does image processing in a declarative way, splitting an algorithm from its execution/optimization. This can be really handy when you want graphics-pipeline optimization without having to spend hours at a low level - work which may turn out to be more platform-specific than you thought anyway.
You write code against its API (C++, but there are bindings for other languages), and it uses LLVM(verify) to compile to varied platforms/environments, like x86/SSE, ARM v7/NEON, CUDA, OpenCL, OpenGL, and some more specific experimental ones.
See also