Image feature and contour detection

From Helpful

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Feature detection and description

Related tasks

Template matching

Classical features

The classical features are (a subset of) things that happen at a few-pixel scale:

  • Points -
  • Blobs - smooth areas that won't (necessarily) be detected by point detection. Their approximate centers may also be considered interest points
  • Edges -
    • a relatively one-dimensional feature, though with a direction
  • Corners - things like intersections and the ends of sharp lines
    • a relatively two-dimensional kind of feature
  • Ridges -
  • Interest point - could be said to be any of the above, and anything else you can describe clearly enough
    • preferably has a clear definition
    • has a well-defined position
    • preferably quite reproducible, that is, stable under relatively minor image alterations such as changes in scale, rotation, translation, and brightness.
    • useful in their direct image context - corners, endpoints, intersections
  • Region of interest
    • any subrange (1D), area (2D), volume (3D), etc. identified for a purpose.
    • Also often in an annotative sense, not necessarily a machine-proffered one

See also:

Edge detection

  • Canny [1]
  • Differential [2]
  • Canny-Deriche [3]
  • Prewitt [4]
  • Roberts Cross operator [5]
  • Sobel [6]
  • Scharr operator - variation on Sobel that tries to deal better with rotation [Sobel_operator#Alternative_operators]


  • Marr-Hildreth [7]

Playing with (mostly python)

  • PIL has ImageFilter.FIND_EDGES (convolution-based)
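With Pillow itself, the quick experiment is roughly `Image.open(path).convert("L").filter(ImageFilter.FIND_EDGES)`. To see what such a convolution-based edge filter does, here is a minimal numpy sketch, assuming a 2D grayscale array. The Laplacian-style kernel (8 in the middle, -1 around it) is an illustrative choice and is not guaranteed to be exactly what FIND_EDGES uses internally; `filter3x3` and `find_edges` are hypothetical helper names.

```python
import numpy as np

# Laplacian-style 3x3 kernel (an illustrative choice): it sums to zero,
# so flat regions give 0 and only local intensity changes respond.
EDGE_KERNEL = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=float)

def filter3x3(img, kernel):
    """Slide a 3x3 kernel over img (correlation; identical to convolution for this
    symmetric kernel). Borders are edge-replicated, so the output keeps img's shape."""
    img = np.asarray(img, dtype=float)
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def find_edges(img):
    """Hypothetical helper: absolute kernel response, large where intensity changes."""
    return np.abs(filter3x3(img, EDGE_KERNEL))
```

On a vertical step from 0 to 10, only the two columns on either side of the step respond; everything else stays 0.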

Interest point / corner detection

Blob detection

Laplacian of Gaussian (LoG)
Difference of Gaussians (DoG)
Determinant of Hessian (DoH)
MSER (Maximally Stable Extremal Regions)

Detects covariant regions: connected areas of the image that stay stable over a range of gray-level thresholds.

Primarily a region/blob detector. Sensitive to blur.

Decent performance.

See also:


Principal curvature-based region detector

Harris affine

Hessian affine

Dense descriptors

Dense means it describes the whole image, a patch at a time, as opposed to sparse, which describes only selected areas (often around features).

The distinction can be subtle: dense may just mean we don't assume we can reliably select good features/areas to study.

Any overall descriptor used locally

...color, texture, or such.

Lets you

  • describe the variation of said descriptors within an image
  • focus on areas where things are happening
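A minimal sketch of dense description, assuming a grayscale numpy array: cut the image into fixed-size cells and give every cell a tiny descriptor. Here that is just the mean and standard deviation of intensity, standing in for real color or texture descriptors; `dense_cell_descriptors` and the cell size are illustrative assumptions.

```python
import numpy as np

def dense_cell_descriptors(img, cell=8):
    """Describe every cell x cell patch of a grayscale image with (mean, std) of intensity.
    Returns shape (rows, cols, 2); pixels that don't fill a whole cell are dropped."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    # reshape so that axes 1 and 3 run over the pixels within one cell
    patches = img[:rows * cell, :cols * cell].reshape(rows, cell, cols, cell)
    mean = patches.mean(axis=(1, 3))
    std = patches.std(axis=(1, 3))   # crude "how much is going on here" measure
    return np.stack([mean, std], axis=-1)
```

Looking at where the per-cell standard deviation is high is one crude way to focus on the areas where things are happening.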

Image gradient

At each point in an image, you can calculate which way the local gradient points (and how strongly), which is essentially a vector.

In theory this is based on the local derivative; in practice it is computed with a discrete differentiation operator such as Sobel or Prewitt (or other kernel-style operators). This is quite akin to edge detection that isn't particularly tuned to a single direction (as some edge detectors are).

Kernel-based methods tend to work on at least 3x3 pixel areas, though may be larger depending on application.
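A minimal sketch of a per-pixel gradient with the 3x3 Sobel kernels, assuming a 2D grayscale numpy array. The kernels are applied as correlation, so a positive gx means intensity increases to the right; `sobel_gradient` is a hypothetical helper name.

```python
import numpy as np

def sobel_gradient(img):
    """Per-pixel gradient of a 2D grayscale image via 3x3 Sobel kernels.
    Returns (gx, gy, magnitude, direction); direction is in radians, 0 = towards +x
    (right) and pi/2 = towards +y (down, since row indices grow downward)."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    # x derivative: right column minus left column, rows weighted 1-2-1
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # y derivative: bottom row minus top row, columns weighted 1-2-1
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy, np.hypot(gx, gy), np.arctan2(gy, gx)
```

For a left-to-right ramp that rises by 1 per pixel, the interior gx is 8 (the 1-2-1 weights sum to 4, times a difference of 2) and the direction is 0.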

Histogram of Oriented Gradients (HOG)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Refers to the general idea of locally detecting gradients, which is a concept used by a whole family of algorithms.

It also refers to a fairly specific use: doing this for the entire image, on small fixed-size cells (e.g. 8x8 pixels).

For each cell, we build a histogram of how much gradient (weighted by magnitude) points in each direction (e.g. the 8 basic compass directions), with some refinements such as letting votes bleed into adjacent bins to make it resistant to aliasing.
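A simplified sketch of per-cell orientation histograms, assuming a grayscale numpy array. Assumptions: gradients come from `np.gradient` (central differences), orientations are signed (0 to 2π) so 8 bins line up with the compass directions, each vote is split linearly between the two nearest bins, and normalization is plain per-cell L2 rather than the overlapping-block normalization that the usual HOG uses. Function names are hypothetical.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=8):
    """Magnitude-weighted orientation histogram per cell; shape (rows, cols, bins)."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                       # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)     # signed direction, 0 .. 2*pi
    pos = ang / (2 * np.pi) * bins                  # bin centers at 0, 1, ..., bins-1
    lo = np.floor(pos).astype(int)
    frac = pos - lo                                 # how far towards the next bin
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            m, b, f = mag[sl].ravel(), lo[sl].ravel(), frac[sl].ravel()
            np.add.at(hist[r, c], b % bins, m * (1 - f))    # share for the lower bin
            np.add.at(hist[r, c], (b + 1) % bins, m * f)    # rest to the next bin (wraps around)
    return hist

def normalize_cells(hist, eps=1e-6):
    """L2-normalize each cell's histogram, removing overall contrast."""
    return hist / (np.linalg.norm(hist, axis=-1, keepdims=True) + eps)
```

Since gradients ignore a constant brightness offset and normalization removes a contrast factor, `3 * img + 50` gives the same normalized histograms as `img`, which is the illumination resistance mentioned below.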

This may well be the first step in something else, e.g. detection of certain objects by training on results.

Due to being based on differences (plus some normalization), it is fairly resistant to illumination differences.

It is somewhat sensitive to orientation. Due to its nature it's not too hard to make it less sensitive, though by that point you may find SIFT more interesting.


  • R-HOG: rectangular (typically square)
  • C-HOG: circular
  • Fourier HOG
Rotation invariant

See also:


See also:

Sparse/local descriptors

Sparse means it describes local areas and is selective about which ones, as opposed to describing the whole image.

Feature description for things like image comparison is based on the idea that considering all points in an image for description is infeasible, so informative points are chosen instead. The challenge then becomes choosing highly informative and stable points.

SIFT (Scale-Invariant Feature Transform)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Read up on local gradients, particularly HOG.

SIFT continues that idea by analyzing the area around an already chosen point of interest, often after deciding the rotation and scale of the patch it will analyse based on local content (verify).
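A simplified sketch of the rotation-deciding step, assuming a grayscale patch has already been cut out around the keypoint: build a magnitude-weighted, Gaussian-windowed orientation histogram and take its peak as the patch's reference direction. The 36 bins follow how SIFT is commonly described; the window width and the name `dominant_orientation` are assumptions. Real SIFT also interpolates the peak and may keep secondary peaks as extra keypoints.

```python
import numpy as np

def dominant_orientation(patch, bins=36, sigma=None):
    """Reference direction (radians, 0..2*pi) of a patch: the center of the peak bin
    of a magnitude- and Gaussian-weighted gradient orientation histogram."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    h, w = patch.shape
    if sigma is None:
        sigma = 0.5 * min(h, w)                      # assumed window width
    yy, xx = np.mgrid[0:h, 0:w]
    # pixels near the patch center count more than those at its edge
    weight = np.exp(-((yy - (h - 1) / 2) ** 2 + (xx - (w - 1) / 2) ** 2) / (2 * sigma ** 2))
    idx = np.floor(ang / (2 * np.pi) * bins).astype(int) % bins
    hist = np.bincount(idx.ravel(), weights=(mag * weight).ravel(), minlength=bins)
    peak = int(np.argmax(hist))
    return (peak + 0.5) * 2 * np.pi / bins
```

A descriptor computed relative to this angle (i.e. after rotating the patch by minus this angle) no longer depends on how the patch was rotated in the image.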

SIFT is often a first step in something else, such as object recognition (often bag-of-words style), and is used together with RANSAC to align similar images.
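A minimal sketch of the RANSAC side of alignment, assuming matched keypoint coordinates (e.g. from SIFT matching) are already given as two (N, 2) arrays, some of the matches wrong. It repeatedly fits an affine transform to 3 random matches, keeps the hypothesis the most matches agree with, and refits on those inliers. The affine model, pixel threshold, and iteration count are illustrative choices (homographies are also common); the function names are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src (N,2) onto dst (N,2); returns 2x3."""
    A = np.hstack([src, np.ones((len(src), 1))])    # rows: [x, y, 1]
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)     # (3, 2)
    return X.T

def apply_affine(M, pts):
    return pts @ M[:, :2].T + M[:, 2]

def ransac_affine(src, dst, iters=500, thresh=2.0, seed=0):
    """Robustly estimate an affine map from putative matches.
    Returns (2x3 matrix, boolean inlier mask)."""
    rng = np.random.default_rng(seed)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), 3, replace=False)   # 3 pairs determine an affine map
        M = fit_affine(src[sample], dst[sample])
        err = np.linalg.norm(apply_affine(M, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # final estimate uses every match that agreed with the best hypothesis
    return fit_affine(src[best], dst[best]), best
```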

See also:

uses color information, giving more stable features around color contrast
uses PCA instead of the gradient histogram, and its output is more compact
GSIFT adds global context to each keypoint (verify)
features are robust to more affine transforms(verify)
  • See also SURF - has a similar goal but uses different methods for most steps
  • See also SPIN, RIFT (but SIFT usually performs better(verify))
  • See also FIND, MIFT

GLOH (Gradient Location and Orientation Histogram)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


See also:

SURF (Speeded Up Robust Features)

faster than SIFT, performs similarly

See also

LESH (Local Energy based Shape Histogram)

FAST (Features from Accelerated Segment Test)

Mainly a feature detector

E Rosten, T Drummond (2006) "Machine learning for high-speed corner detection"
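A sketch of the plain segment test that FAST is built on, assuming a grayscale numpy array: a pixel is a corner if at least n contiguous pixels on the 16-pixel circle of radius 3 around it are all brighter than the center plus t, or all darker than the center minus t. The cited paper's contribution is making this fast (a learned decision tree that rejects most pixels after a few comparisons); this version checks every pixel and does no non-maximum suppression. t=20 and n=9 are illustrative values; the function names are hypothetical.

```python
import numpy as np

# The 16 pixels of a Bresenham circle of radius 3, in order around the circle, as (dx, dy)
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t=20, n=9):
    """Segment test at (x, y); assumes the pixel is at least 3 pixels from the border."""
    center = float(img[y, x])
    ring = np.array([float(img[y + dy, x + dx]) for dx, dy in CIRCLE])
    for mask in (ring > center + t, ring < center - t):
        doubled = np.concatenate([mask, mask])      # lets runs wrap around the circle
        run = best = 0
        for v in doubled:
            run = run + 1 if v else 0
            best = max(best, run)
        if min(best, 16) >= n:
            return True
    return False

def fast_corners(img, t=20, n=9):
    h, w = img.shape
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, t, n)]
```

On a bright square, the pixel at its corner passes (11 contiguous darker circle pixels), while a pixel in the middle of a straight edge does not (only 7).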


A Oliva, A Torralba (2001) "Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope" [13]

BRIEF (Binary Robust Independent Elementary Features)

M Calonder et al. (2010) "BRIEF: Binary robust independent elementary features"
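A sketch of the BRIEF idea, assuming a grayscale image that has already been smoothed: a fixed set of random pixel-pair offsets around the keypoint, one bit per pair recording which pixel is darker, with descriptors compared by Hamming distance. The uniform sampling, 31-pixel patch, and 256 bits are illustrative choices (the paper compares several sampling patterns); the function names are hypothetical.

```python
import numpy as np

def make_brief_pairs(n_bits=256, patch=31, seed=0):
    """Random pairs of offsets inside the patch, drawn once and reused for every keypoint.
    Columns: dx1, dy1, dx2, dy2."""
    rng = np.random.default_rng(seed)
    r = patch // 2
    return rng.integers(-r, r + 1, size=(n_bits, 4))

def brief_descriptor(img, x, y, pairs):
    """One bit per pair: 1 if the first pixel is darker than the second.
    Assumes (x, y) is at least patch//2 pixels away from the border."""
    p1 = img[y + pairs[:, 1], x + pairs[:, 0]]
    p2 = img[y + pairs[:, 3], x + pairs[:, 2]]
    return (p1 < p2).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))
```

Because each bit only depends on which of two pixels is darker, brightness and contrast changes leave the descriptor unchanged; rotation does not, which is what ORB's rotated variant addresses.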

ORB (Oriented FAST and Rotated BRIEF)

Offered as an efficient alternative to SIFT (and SURF), and also not patented.

See also:


Unsorted (local descriptors)

K Mikolajczyk, C Schmid (2005) "A Performance Evaluation of Local Descriptors"

Combining descriptors

Indexing descriptors and/or making them more compact, for retrieval systems and/or fingerprint-style descriptors. This often means finding a useful lower-dimensional representation.


Fisher vector

Vector of Locally Aggregated Descriptors (VLAD)
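A minimal sketch of VLAD aggregation, assuming the visual-word centers were learned beforehand (typically by k-means over many training descriptors). Each local descriptor is assigned to its nearest center, the residuals (descriptor minus center) are summed per center, and the concatenation is L2-normalized into one fixed-length vector per image. Variants often add further normalization (e.g. a signed square root), left out here; `vlad` is a hypothetical name.

```python
import numpy as np

def vlad(descriptors, centers):
    """Aggregate local descriptors (N, D) against centers (K, D) into one K*D vector."""
    d = np.asarray(descriptors, float)
    c = np.asarray(centers, float)
    dists = ((d[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)   # (N, K) squared distances
    nearest = dists.argmin(axis=1)
    v = np.zeros_like(c)
    np.add.at(v, nearest, d - c[nearest])    # sum residuals per assigned center
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

The output length depends only on K and D, not on how many descriptors an image produced, which is what makes it usable for indexing.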

Unsorted (descriptors)

  • Structure tensor
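The structure tensor makes the edge/corner distinction from the classical features list concrete. A sketch, assuming a grayscale numpy array and using a box window where a Gaussian window is more usual: per pixel, average the gradient products Ix², Iy², IxIy over a neighborhood and take the eigenvalues of [[Ixx, Ixy], [Ixy, Iyy]]. Both near zero means flat, one large means an edge (one-dimensional), both large means a corner (two-dimensional). Harris-style corner measures are built on the same matrix. Function names are hypothetical.

```python
import numpy as np

def box_blur(a, r=2):
    """Average over a (2r+1)x(2r+1) window, edge-replicated, same shape as a."""
    p = np.pad(a, r, mode="edge")
    h, w = a.shape
    out = np.zeros_like(a, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def structure_tensor_eigs(img, r=2):
    """Per-pixel eigenvalues (l1 >= l2) of the windowed structure tensor."""
    img = np.asarray(img, float)
    gy, gx = np.gradient(img)
    jxx, jyy, jxy = box_blur(gx * gx, r), box_blur(gy * gy, r), box_blur(gx * gy, r)
    # eigenvalues of the symmetric 2x2 matrix [[jxx, jxy], [jxy, jyy]]
    half_tr = (jxx + jyy) / 2
    disc = np.sqrt(((jxx - jyy) / 2) ** 2 + jxy ** 2)
    return half_tr + disc, half_tr - disc
```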

Scale space

Scale space is a concept that makes detection work at multiple/varied scales.

Roughly speaking, it's a series of images lowpassed to different degrees, all kept at the same size, in part because that way detected coordinates are valid in each image.

In practice the images can also be scaled down (which implies a lowpass), if the algorithm it's supporting deals with that more easily (e.g. one that always looks at a few-pixel scale and can't tweak how many pixels). Note that scaling down and lowpassing are not identical. A Gaussian filter is fairly ideal in terms of frequency behavior (which is why scale space is often specifically Gaussian scale space), while a scaledown can introduce spurious, jagged-looking detail (how much varies with the scaledown method). So in some cases the scaledown happens after filtering.

Motivations include:

  • Most current feature recognition works on a small scale (and often in terms of pixels). We'd like to also detect larger objects, without doing complex compositional things.
  • When we look at a scene, recognizing objects means we look at it at different scales, e.g. from a distance we might identify the house, close-up we'll look at the door.
  • "when you squint", or see from a distance, or zoom out, that's essentially a lowpass

It turns out that anything you can do via differentials (such as the common edge, ridge, and corner detectors) can be done without a rescale.
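A minimal sketch of a Gaussian scale space, assuming a grayscale numpy array: the same image blurred with increasing sigma, every level kept at full size, plus the differences between adjacent levels (the Difference of Gaussians used for blob detection). The starting sigma, the sqrt(2) step, the number of levels, and the function names are illustrative choices.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    r = max(1, int(3 * sigma + 0.5))         # cover about +-3 sigma
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur (rows, then columns), edge-replicated, same output size."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    out = np.asarray(img, float)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (r, r)
        p = np.pad(out, pad, mode="edge")
        n = out.shape[axis]
        out = sum(w * np.take(p, np.arange(i, i + n), axis=axis) for i, w in enumerate(k))
    return out

def scale_space(img, sigma0=1.0, levels=5, step=2 ** 0.5):
    """Same-size images, each lowpassed more than the last."""
    sigmas = [sigma0 * step ** i for i in range(levels)]
    return sigmas, [gaussian_blur(img, s) for s in sigmas]

def difference_of_gaussians(stack):
    """Adjacent differences; blob detectors look for extrema across space and scale."""
    return [b - a for a, b in zip(stack, stack[1:])]
```

Because every level has the original size, a detection at (x, y) in any level refers to the same place in the original image.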

See also:

Stroke Width Transform (SWT)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

For each pixel, finds the likeliest stroke width containing that pixel. Somewhat aware of direction, and often part of letter detection.

Uses an edge map and a gradient map.


  • not tied to detecting text of a specific size, can deal with rotation and skew
  • not overly sensitive to background gradients


  • slow (because of the intermediate maps)
  • Tends to assume hard contrast (and may assume text is much darker)

Hough transform

Finds imperfect instances of regular shapes such as lines (the first version did only lines), circles, and ellipses. It essentially has each point vote in a parameter space of possible shapes.
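A sketch of the classic line version, assuming edge points are already given as (x, y) pairs: each point votes for every line x*cos(theta) + y*sin(theta) = rho passing through it, and peaks in the (rho, theta) accumulator are lines supported by many points. The 1-degree angle resolution, rounding rho to whole pixels, and the function names are illustrative choices.

```python
import numpy as np

def hough_lines(points, shape, n_theta=180):
    """Accumulate votes in (rho, theta) space; returns (accumulator, thetas, rho offset)."""
    h, w = shape
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)   # 0 .. <180 degrees
    diag = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)          # rho in [-diag, diag]
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in points:
        rhos = np.round(x * cos_t + y * sin_t).astype(int) + diag
        acc[rhos, np.arange(n_theta)] += 1                      # one vote per theta
    return acc, thetas, diag

def strongest_line(acc, thetas, diag):
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return i - diag, thetas[j]   # (rho, theta)
```

Note that (rho, theta) and (-rho, theta + 180 degrees) describe the same line, so a near-vertical line can peak near theta = 0 or, with negative rho, near theta = 180 degrees.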

Kernel-based Hough transform (KHT)

Contour detection