Image feature and contour detection
This article/section is a stub — probably a pile of half-sorted notes, probably a first version, not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)
Feature detection and description
Related tasks
Template matching
Classical features
The classical set of features is (a subset of) things that happen at few-pixel scale:
- Points - distinctive individual locations, such as spots or local intensity extrema
- Blobs - smooth areas that won't (necessarily) be detected by point detection. Their approximate centers may also be considered interest points
- Edges - places where image intensity changes sharply, often boundaries between regions
- a relatively one-dimensional feature, though with a direction
- Corners - things like intersections and the ends of sharp lines
- a relatively two-dimensional kind of feature
- Ridges - elongated structures, such as the centerlines of thick lines
- Interest point - could be said to be any of the above, and anything else you can describe clearly enough
- preferably has a clear definition
- has a well-defined position
- preferably quite reproducible, that is, stable under relatively minor image alterations such as scale, rotation, translation, and brightness changes
- useful in their direct image context - corners, endpoints, intersections
- Region of interest
- any subrange (1D), area (2D), volume (3D), etc. identified for a purpose.
- Also often used in an annotative sense - marked by a person for some purpose, not necessarily machine-proffered
See also:
- http://en.wikipedia.org/wiki/Feature_detection_(computer_vision)
- http://en.wikipedia.org/wiki/Corner_detection
- http://en.wikipedia.org/wiki/Edge_detection
- http://en.wikipedia.org/wiki/Blob_detection
- http://en.wikipedia.org/wiki/Ridge_detection
- http://en.wikipedia.org/wiki/Interest_point_detection
- http://en.wikipedia.org/wiki/Region_of_interest
Edge detection
- Canny [1]
- Differential [2]
- Canny-Deriche [3]
- Prewitt [4]
- Roberts Cross operator [5]
- Sobel [6]
- Scharr operator - variation on Sobel that aims for better rotational symmetry - http://en.wikipedia.org/wiki/Sobel_operator#Alternative_operators
Old:
- Marr-Hildreth [7]
Playing with (mostly python)
- PIL has ImageFilter.FIND_EDGES (convolution-based)
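For example, a minimal sketch using Pillow (the image content here is constructed in code just for illustration):

```python
from PIL import Image, ImageFilter

# Build a small grayscale test image with a vertical step edge:
# left half black, right half white.
img = Image.new('L', (16, 16), 0)
for y in range(16):
    for x in range(8, 16):
        img.putpixel((x, y), 255)

# FIND_EDGES is a fixed 3x3 convolution kernel (Laplacian-like),
# so the response is strong right at the step and zero in flat areas.
edges = img.filter(ImageFilter.FIND_EDGES)
```

On a real image you would typically convert to grayscale first (img.convert('L')), since the filter is applied per channel.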
Interest point / corner detection
Blob detection
Laplacian of Gaussian (LoG)
Difference of Gaussians (DoG)
Determinant of Hessian (DoH)
MSER (Maximally Stable Extremal Regions)
Detects covariant regions: connected areas that stay stable over a wide range of gray-level thresholds.
Primarily a region/blob detector. Sensitive to blur.
Decent performance.
PCBR
Principal curvature-based region detector
https://en.wikipedia.org/wiki/Principal_curvature-based_region_detector
Harris affine
https://en.wikipedia.org/wiki/Harris_affine_region_detector
Hessian affine
https://en.wikipedia.org/wiki/Hessian_affine_region_detector
Dense descriptors
Dense meaning it describes the whole image, a patch at a time. As opposed to sparse, meaning describing only selected areas (often features).
The distinction can be subtle - dense may just mean we don't assume we can reliably select good features/areas to study.
Any overall descriptor used locally
...color, texture, or such.
This lets you
- describe the variation of such descriptors within an image
- focus on areas where things are happening
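As a toy illustration of the dense idea (the function name and the choice of descriptor are mine; a real system would compute color histograms, texture measures, or similar per patch):

```python
import numpy as np

def dense_patch_descriptors(img, patch=8):
    """Dense descriptor sketch: one tiny 'descriptor' (here just the
    mean intensity) per fixed-size patch, covering the whole image."""
    h, w = img.shape
    ph, pw = h // patch, w // patch
    blocks = img[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    return blocks.mean(axis=(1, 3))   # shape: (patches_down, patches_across)

grid = dense_patch_descriptors(np.arange(64.0).reshape(8, 8), patch=4)
```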
Image gradient
At each point in an image, you can calculate which way the local intensity is changing most - essentially a vector.
In theory this is based on the local derivative; in practice it's a discrete differentiation operator, such as Sobel or Prewitt (or other kernel-style things) - actually quite akin to edge detection that isn't tuned to a single direction (as some are).
Kernel-based methods tend to work on at least 3x3 pixel areas, though the area may be larger depending on application.
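A minimal sketch of computing the gradient with a Sobel kernel (plain numpy, 'valid' borders only; function names are mine):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def correlate2d(img, kernel):
    # minimal 'valid'-mode 2D correlation, enough for a 3x3 kernel demo
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * kernel).sum()
    return out

def image_gradient(img):
    img = img.astype(float)
    gx = correlate2d(img, SOBEL_X)      # horizontal change
    gy = correlate2d(img, SOBEL_X.T)    # vertical change
    return np.hypot(gx, gy), np.arctan2(gy, gx)   # magnitude, direction

# a vertical step edge: magnitude peaks near the step, direction is along +x
img = np.zeros((10, 10))
img[:, 5:] = 1.0
mag, ang = image_gradient(img)
```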
https://en.wikipedia.org/wiki/Image_gradient
Histogram of Oriented Gradients (HOG)
Refers to the general idea of locally detecting gradients, which is a concept used by a whole family of algorithms.
And to a fairly specific use, doing this for the entire image, on fixed-size, small cells (e.g. 8x8 pixel).
For each cell, we can build a histogram of how much (magnitude-wise) its parts point in each direction (e.g. the 8 basic compass directions) -- with some refinements, such as spreading votes into adjacent bins, to be more resistant to aliasing.
This may well be the first step in something else, e.g. detection of certain objects by training on results.
Due to being based on differences (plus some normalization), it is fairly resistant to illumination differences.
It is somewhat sensitive to orientation. Due to its nature it's not too hard to make it less sensitive, though by that time you may find SIFT more interesting.
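A stripped-down sketch of the per-cell histogram step (no block normalization or bin interpolation; function and variable names are mine):

```python
import numpy as np

def hog_cells(img, cell=8, bins=8):
    """Minimal HOG sketch: per-cell histograms of gradient orientation,
    weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi           # unsigned gradients, 0..pi
    binned = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    out = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = binned[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            out[i, j] = np.bincount(b.ravel(), weights=m.ravel(),
                                    minlength=bins)
    return out

# a vertical step edge: the horizontal-gradient bin dominates in the
# cells that contain the step
img = np.zeros((32, 32))
img[:, 16:] = 1.0
cells = hog_cells(img)     # shape (4, 4, 8)
```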
Variations:
- R-HOG: rectangular (typically square)
- C-HOG: circular
- Fourier HOG
- Rotation invariant
- A Kanezaki, et al. "Mirror Reflection Invariant HOG Descriptors for Object Detection" (MI-HOG)
Gist
See also:
- A Oliva, A Torralba (2006), "Building the gist of a scene: the role of global image features in recognition"
Sparse/local descriptors
Sparse meaning it describes local areas, and is selective about which parts, as opposed to doing so for the whole image.
Feature description for things like image comparison is based on the idea that considering all points in an image for description is infeasible, so informative points are chosen instead. The challenge then becomes choosing highly informative and stable points.
SIFT (Scale-Invariant Feature Transform)
(patented; the patent expired in March 2020)
Read up on local gradients, particularly HOG.
SIFT continues that idea by analyzing the area around an already chosen point of interest -- often after deciding the rotation and scale of the patch it will be analysing, based on local content.(verify)
SIFT is often a first step in something else, such as object recognition (often bag-of-words style), or aligning similar images in cooperation with RANSAC.
See also:
- Playing with SIFT (mostly python stuff):
- D Lowe, (1999) "Object recognition from local scale-invariant features"
- A Abdel-Hakim, A Farag (2006), "CSIFT: A SIFT descriptor with color invariant characteristics"
- uses color information, giving more stable features around color contrast
- Y Ke, R Sukthankar (2004), "PCA-SIFT: A more distinctive representation for local image descriptors"
- uses PCA instead of the gradient histogram, and its output is more compact
- E Mortensen, H Deng, L Shapiro (2005), "A SIFT descriptor with global context"
- GSIFT adds global context to each keypoint (verify)
- J Morel, G Yu (2009), "ASIFT: A new framework for fully affine invariant image comparison"
- features are robust to more affine transforms(verify)
- R Ma, J Chen, Z Su (2010), "MI-SIFT: mirror and inversion invariant generalization for SIFT descriptor"
- W Zhao et al (2013), "Flip-invariant SIFT for copy and object detection" (F-SIFT)
- J Wu et al. (2013), "A Comparative Study of SIFT and its Variants"
- See also SURF - has a similar goal but uses different methods for most steps
- See also SPIN, RIFT (but SIFT usually performs better(verify))
- See also FIND, MIFT
GLOH (Gradient Location and Orientation Histogram)
(patented)
See also:
- http://en.wikipedia.org/wiki/GLOH
- K Mikolajczyk (2005), "A performance evaluation of local descriptors"
SURF (Speeded Up Robust Features)
Faster than SIFT, with similar performance.
See also
- http://en.wikipedia.org/wiki/SURF
- H Bay et al. (2006) "Surf: Speeded up robust features"
LESH (Local Energy based Shape Histogram)
http://en.wikipedia.org/wiki/LESH
FAST (Features from Accelerated Segment Test)
Mainly a feature detector
E Rosten, T Drummond (2006) "Machine learning for high-speed corner detection"
GIST
A Oliva, A Torralba (2001) "Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope" [13]
BRIEF (Binary Robust Independent Elementary Features)
M Calonder et al. (2010) "BRIEF: Binary robust independent elementary features"
ORB (Oriented FAST and Rotated BRIEF)
Offered as an efficient alternative to SIFT (and SURF), and also not patented.
See also:
- E Rublee et al. (2011) "ORB: an efficient alternative to SIFT or SURF"
- https://gilscvblog.com/2013/10/04/a-tutorial-on-binary-descriptors-part-3-the-orb-descriptor/
- http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_orb/py_orb.html
YAPE
Unsorted (local descriptors)
K Mikolajczyk, C Schmid (2005) "A Performance Evaluation of Local Descriptors"
Combining descriptors
Indexing descriptors and/or making them more compact - for retrieval systems and/or fingerprint-style use, often meaning a useful lower-dimensional representation.
Bag-of-features
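A minimal sketch of the idea (the codebook here is random for illustration; in practice it would come from k-means over descriptors from many images):

```python
import numpy as np

def bag_of_features(descriptors, codebook):
    """Assign each local descriptor to its nearest 'visual word'
    (codebook row), and return the histogram of word counts."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)              # nearest codebook entry per descriptor
    return np.bincount(words, minlength=len(codebook))

rng = np.random.default_rng(0)
codebook = rng.random((16, 128))       # 16 visual words, SIFT-sized (128-d)
descriptors = rng.random((100, 128))   # descriptors from one image
hist = bag_of_features(descriptors, codebook)
```

The resulting fixed-length histogram can be compared or indexed regardless of how many keypoints each image produced.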
Fisher vector
Vector of Locally Aggregated Descriptors (VLAD)
Unsorted (descriptors)
- Structure tensor
Scale space
Scale space is a concept that makes detection of things work at multiple/varied scales.
Roughly speaking, it's a series of images lowpassed to different degrees, in part because that keeps detected coordinates comparable between the images.
In practice images may also be scaled down (which implies a lowpass), if the algorithm being supported deals with that more easily (e.g. always looks at few-pixel scale and can't be tuned to look wider).
Note that scaledown and lowpass are not identical. A gaussian filter is fairly ideal in terms of frequency content (which is why scale space is often specifically gaussian scale space), while scaledowns can introduce spurious, jagged-looking information (varying with the scaledown method).
So in some cases the scaledown happens after filtering.
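A small sketch of a gaussian scale space in plain numpy (kernel truncated at 3 sigma, 'same' borders; function names are mine):

```python
import numpy as np

def gaussian_blur(img, sigma):
    # separable gaussian: filter rows, then columns
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img.astype(float), k, mode='same')
    return np.apply_along_axis(np.convolve, 0, rows, k, mode='same')

def gaussian_scale_space(img, sigmas=(1, 2, 4, 8)):
    # a stack of progressively more lowpassed copies of the image
    return [gaussian_blur(img, s) for s in sigmas]

# subtracting adjacent levels gives a Difference of Gaussians (DoG) stack
img = np.zeros((33, 33))
img[16, 16] = 1.0          # a single bright point
levels = gaussian_scale_space(img)
```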
Motivations include:
- Most current feature recognition works on a small scale (and often in terms of pixels). We'd like to also detect larger objects, without doing complex compositional things.
- When we look at a scene, the fact that we recognize objects means we are looking at different scales.
- e.g. from a distance we might identify the house; close-up we'll look at the door
- "when you squint", or see from a distance, or zoom out, that's essentially a lowpass
It turns out that anything that you can do via differentials (such as common feature detectors (edge, ridge, corner, etc.)) can be done without a rescale.
Stroke Width Transform (SWT)
For each pixel, finds the likeliest stroke width containing that pixel. Somewhat aware of direction, and often part of letter detection.
Uses an edge map and a gradient map.
Pro:
- not tied to detecting text of a specific size, can deal with rotation and skew
- not overly sensitive to background gradients
Con:
- slow (because of the intermediate maps)
- Tends to assume hard contrast (and may assume text is much darker)
Hough transform
Finds imperfect instances of parametric shapes such as lines (the first version did only lines), circles, and ellipses. Essentially, each edge pixel votes for all shape parameters consistent with it, and peaks in that parameter space indicate shapes.
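A minimal line-only sketch of that voting (accumulator over (rho, theta); names are mine):

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Minimal line Hough transform. Each edge pixel votes for every
    (rho, theta) line through it; peaks in the accumulator are likely lines.
    Lines are parameterized as rho = x*cos(theta) + y*sin(theta)."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))          # max possible |rho|
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag, n_theta), dtype=int)
    ys, xs = np.nonzero(edges)
    for ti, t in enumerate(thetas):
        rhos = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(int) + diag
        np.add.at(acc, (rhos, ti), 1)            # one vote per edge pixel
    return acc, thetas, diag

# a horizontal row of edge pixels should give one strong peak
edges = np.zeros((50, 50), dtype=bool)
edges[20, :] = True
acc, thetas, diag = hough_lines(edges)
rho_i, theta_i = np.unravel_index(acc.argmax(), acc.shape)
```

Here all 50 edge pixels agree on one (rho, theta) cell, so the peak vote count equals the number of pixels on the line.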
http://en.wikipedia.org/wiki/Hough_transform
Kernel-based Hough transform (KHT)
Contour detection