Image descriptors


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

Image summaries and analysis


Co-occurrence matrices


Studies pixel-level texture.

Often computed on grayscale images, in which case it is called a GLCM (Grey Level Co-occurrence Matrix).

With some data massaging it is useful for various other things.

Studying only pairs of adjacent pixels may sound very noise-sensitive, but it works better than you might think, in part because most images are large enough to supply a lot of pairs.

That said, there are some limitations and tweaks you should understand before you can apply this robustly.

(There is value in pre-processing, but note that e.g. blur (and sizing down, which is also effectively a lowpass) will mostly just move mass towards the main diagonal -- the diagonal reflects consistent areas and gradients, which are already pronounced in most images to start with)

The output is a symmetric n-by-n matrix, where

  • each cell counts how often the two values indicated by its row and column co-occur.
  • n is the quantization level - usually 256, since you would typically run this on 8-bit images, though quantizing to fewer levels is useful in some applications.
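As a sketch of the mechanics in Python/numpy (the function name, and the restriction to non-negative offsets, are simplifications of mine):

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=256, symmetric=True, normalize=False):
    """Co-occurrence matrix for pixel pairs at offset (dx, dy), dx,dy >= 0.
    img: 2D integer array with values in range(levels)."""
    img = np.asarray(img)
    h, w = img.shape
    # pair each pixel with the one dx to the right and dy down
    a = img[:h - dy, :w - dx].ravel()
    b = img[dy:, dx:].ravel()
    m = np.zeros((levels, levels))
    np.add.at(m, (a, b), 1)          # histogram the (value, value) pairs
    if symmetric:
        m = m + m.T                  # count each pair in both orders
    if normalize:
        m = m / m.sum()              # turn counts into joint probabilities
    return m
```

Real implementations (e.g. scikit-image's graycomatrix) also take angles and multiple distances; the above is just the core counting step.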

There are some overall image properties you can calculate from that matrix, including:

  • Contrast (of nearby pixels)
the difference-weighted variance: sum over i,j of (i-j)^2 p(i,j), for a normalized matrix p
  • Homogeneity (of nearby pixels)
sum of p(i,j) / (1 + (i-j)^2), which is basically 'to what degree are the matrix values on the diagonal'
also a good indication of the amount of pixel-level noise
  • Entropy (within nearby pixels)
-sum of p(i,j) log p(i,j)
  • Energy, a.k.a. uniformity
sum of squares: sum of p(i,j)^2
Value range: 0..1, exactly 1 for a constant image
  • Correlation
...of nearby pixels: sum of (i - mu_i)(j - mu_j) p(i,j) / (sigma_i sigma_j), i.e. how linearly each pixel's value depends on its neighbor's (range -1..1)
  • The number of bins that are filled can also give you a decent indication of how clean an image is,
e.g. giving you some indication of whether it has a single solid background color, and where it sits on the spectrum of two-level, paletted, clip-art with little blur, badly JPEG-compressed diagram, photo.
  • Directionality of an image, if you calculate values for different directions and look at the variance of those values.
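A sketch of computing several of these from a co-occurrence matrix, assuming the standard Haralick-style definitions (the function and key names are mine):

```python
import numpy as np

def glcm_props(m):
    """Scalar summaries of a co-occurrence matrix (normalized internally)."""
    p = np.asarray(m, dtype=float)
    p = p / p.sum()                       # joint probabilities
    n = p.shape[0]
    i, j = np.indices((n, n))
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    var_i = ((i - mu_i) ** 2 * p).sum()
    var_j = ((j - mu_j) ** 2 * p).sum()
    nz = p[p > 0]                         # avoid log(0) in the entropy sum
    denom = np.sqrt(var_i * var_j)
    return {
        'contrast':    ((i - j) ** 2 * p).sum(),
        'homogeneity': (p / (1.0 + (i - j) ** 2)).sum(),
        'entropy':     -(nz * np.log2(nz)).sum(),
        'energy':      (p ** 2).sum(),
        # a constant image has zero variance; treat correlation as 1 there
        'correlation': ((i - mu_i) * (j - mu_j) * p).sum() / denom if denom > 0 else 1.0,
    }
```

Note that naming varies between libraries (e.g. some call sum-of-squares 'ASM' and its square root 'energy'), so check definitions before comparing numbers across tools.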

Further notes:

  • Direction of the pixel pairs can matter for images with significant noise, high-frequency content, and/or extreme regularity.
Usually the implementation lets you give either an x,y pixel offset (e.g. 'take a pixel, and the pixel one to the right and one down') or an angle.
  • Some implementations also allow calculation for various distances. This is quite similar to running the calculation over resized images, but may be somewhat faster.

See also:

Unsorted (summaries)

Detecting manipulation

Error level analysis

Principal Component Analysis

Demosaic analysis

See also

Whole-image descriptors

Describes all of the image, as opposed to describing specific features you found.

...there is natural overlap with dense descriptors, which analyse a whole image in independent patches.

Color descriptors

MPEG-7 (color stuff)

A color histogram was used during development, but for many uses it is too high-dimensional.

The color space is fixed in each descriptor, because leaving it open would hurt interoperability.

Scalable Color Descriptor (SCD)

Defined in HSV

Color Structure Descriptor (CSD)

Defined in HMMD

See also:

Color Layout Descriptor (CLD)

YCbCr space

Dominant colors

Color quantization
Merged histogram


Image entropy

Can mean

  • the overall entropy of a whole block
  • for each pixel, the entropy of the values around it

Roughly an estimate of local contrast / texture.
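A naive sketch of the per-pixel variant (brute force, fine for small images; windows are clipped at the borders):

```python
import numpy as np

def local_entropy(img, radius=1):
    """Per-pixel Shannon entropy (bits) of the values in a square
    neighborhood of the given radius, clipped at image borders."""
    img = np.asarray(img)
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            win = img[max(0, y - radius):y + radius + 1,
                      max(0, x - radius):x + radius + 1]
            _, counts = np.unique(win, return_counts=True)
            p = counts / counts.sum()             # value frequencies in window
            out[y, x] = -(p * np.log2(p)).sum()   # Shannon entropy of those
    return out
```

Flat areas come out near 0, textured or noisy areas higher, which is why this works as a rough local contrast/texture map.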

Image moments


Texture descriptors

Texture is harder to quantify than you may expect.

Some methods are much easier to apply to constrained single purposes, e.g. medical imaging, than to arbitrary images.

Some things work well in lab conditions (e.g. recognizing known textures), and some work well enough to e.g. tell apart areas within a picture, but robustly labeling textures (e.g. scale-, rotation-, and lighting-invariant) is hard.

Because most images contain a mix of textures, you may want to do texture segmentation first and apply texture description only within each segment, or you will get really muddy descriptions.

MPEG-7 (texture stuff)

HTD (Homogeneous Texture Descriptor)
  • idea: use Fourier analysis to get basic frequency and direction information
  • in frequency space (2D FFT amplitudes), divide into 5 octave-style rings and 6 directions, making for 30 bins
  • plus two more: overall mean and stdev
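A loose illustration of that ring-and-wedge binning (the actual MPEG-7 extraction is specified via Gabor-style frequency-domain responses; the edge placement and DC handling here are approximations of mine):

```python
import numpy as np

def ring_wedge_energy(img, rings=5, wedges=6):
    """Sum 2D FFT magnitudes into octave-spaced radial rings and
    30-degree angular wedges, giving rings*wedges texture bins."""
    img = np.asarray(img, dtype=float)
    f = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = f.shape
    y, x = np.indices((h, w))
    dy, dx = y - h // 2, x - w // 2
    r = np.hypot(dy, dx)
    theta = np.mod(np.arctan2(dy, dx), np.pi)      # directions fold over 180 degrees
    # octave-style ring edges: rmax/2^rings, ..., rmax/2, rmax
    edges = r.max() / 2.0 ** np.arange(rings, -1, -1)
    ring = np.digitize(r, edges) - 1               # -1 = inside the lowest edge (near DC)
    wedge = np.minimum((theta // (np.pi / wedges)).astype(int), wedges - 1)
    out = np.zeros((rings, wedges))
    keep = (ring >= 0) & (ring < rings)            # drops the DC area and far corners
    np.add.at(out, (ring[keep], wedge[keep]), f[keep])
    return out
```

A horizontal sinusoid, for instance, puts almost all its energy into a single ring of the 0-degree wedge.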
EHD (Edge Histogram Descriptor)
  • idea: detect which direction detected edges go and make a histogram of that; useful for overall comparisons.
  • resample to a comparable size
  • divide the image into 16 parts (4 by 4); within each part:
do 5 specific edge detections (horizontal, vertical, both diagonals, and isotropic) with 2x2 pixel kernels
at each position, take the strongest of those responses (if above a threshold)
sum the counts of such strongest responses per each of the 16 larger parts
  • makes for 16*5 = 80 bins, each quantized to 3 bits, so 240 bits
  • extension: also combine the counts
per whole image (1)
per row (4 of them)
per column (4 of them)
per 2x2 group (5 of them - the four non-overlapping corners, and the center)
for (1+4+4+5)*5 = 70 bins more
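A rough sketch of that extraction (the 2x2 kernels follow the commonly cited MPEG-7 EHD filters, but the threshold value and the non-overlapping block handling are simplifications of mine):

```python
import numpy as np

S2 = 2 ** 0.5
# the five 2x2 EHD filters: vertical, horizontal, 45°, 135°, non-directional
KERNELS = np.array([
    [[1, -1], [1, -1]],        # vertical edge
    [[1, 1], [-1, -1]],        # horizontal edge
    [[S2, 0], [0, -S2]],       # 45-degree diagonal
    [[0, S2], [-S2, 0]],       # 135-degree diagonal
    [[2, -2], [-2, 2]],        # non-directional (isotropic)
])

def edge_histogram(img, grid=4, threshold=10.0):
    """80-bin edge histogram: for each cell of a grid x grid partition,
    count which of the 5 filters responds strongest per 2x2 block."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    hist = np.zeros((grid, grid, 5))
    ch, cw = h // grid, w // grid
    for gy in range(grid):
        for gx in range(grid):
            cell = img[gy * ch:(gy + 1) * ch, gx * cw:(gx + 1) * cw]
            # slide over non-overlapping 2x2 blocks within the cell
            for y in range(0, ch - 1, 2):
                for x in range(0, cw - 1, 2):
                    block = cell[y:y + 2, x:x + 2]
                    resp = np.abs((KERNELS * block).sum(axis=(1, 2)))
                    k = int(resp.argmax())
                    if resp[k] > threshold:
                        hist[gy, gx, k] += 1
    return hist.reshape(-1)    # 4*4*5 = 80 bins
```

The real descriptor then quantizes each (normalized) bin to 3 bits; that step is omitted here.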
TBD (Texture Browsing Descriptor)
  • idea: indicators of perceptual directionality, regularity, and coarseness of a texture
  • most dominant texture orientation (0 meaning none, 1..6 meaning 0..150 degrees in steps of 30)
  • second most dominant texture orientation (same encoding; optional)
  • regularity in the first orientation
  • regularity in the second orientation (optional)
  • implementation based on a bank of orientation- and scale-tuned Gabor filters.
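For reference, such a Gabor kernel is just a Gaussian envelope times a sinusoid along one orientation; a minimal sketch (the parameter defaults are arbitrary choices of mine, not values from the standard):

```python
import numpy as np

def gabor_kernel(size=15, theta=0.0, wavelength=6.0, sigma=3.0, gamma=0.5):
    """Real (cosine-phase) Gabor kernel: a Gaussian envelope times a
    sinusoid along orientation theta. A bank of these over several
    thetas and scales gives orientation- and scale-tuned responses."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)
```

Convolving an image with each kernel in the bank and summarizing response magnitudes per orientation gives the kind of dominant-orientation and regularity numbers TBD encodes.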


Unsorted (texture)