Image descriptors

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Image summaries and analysis

Histograms

Co-occurrence matrices

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Studies pixel-level texture.

Often of grayscale images, then called GLCM (Grey Level Co-occurrence Matrix).

With some data massaging it is useful for various other things.

Studying only pairs of adjacent pixels may sound like it'd be very noise-sensitive, but works better than you may think in part because most images are large enough to have a lot of pairs.

That said, there are some limitations and tweaks you should understand before you can apply this robustly.

(There is value in pre-processing, but note that blurring (and also downsizing, which is effectively a lowpass too) will mostly just move counts towards the main diagonal. The diagonal relates to consistent areas and gradients, which are already pronounced in most images to start with.)

The output is an n-by-n matrix (symmetric if each pair is counted in both orders, which is the usual choice), where

each value in the matrix counts how often the two values (indicated by its row and column) co-occur.
n is the number of quantization levels - usually 256, since you would typically run this on 8-bit images.

though quantizing to fewer levels is useful in some applications
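As a minimal sketch of that counting, assuming the image is a plain list of rows of integer levels: the function name `glcm` and its parameters are made up for this illustration, not taken from any particular library.

```python
def glcm(image, dx=1, dy=0, levels=256, symmetric=True):
    """Count co-occurring value pairs at pixel offset (dx, dy)."""
    height, width = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                a, b = image[y][x], image[y2][x2]
                matrix[a][b] += 1
                if symmetric:
                    matrix[b][a] += 1  # count the pair in both orders
    return matrix

# Tiny image already quantized to 4 levels
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(image, dx=1, dy=0, levels=4)
for row in counts:
    print(row)
```

Most of the counts land on the diagonal here, because this small image consists mostly of flat areas.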

There are some overall image properties you can calculate from that matrix, including:

Contrast (in nearby pixels)

A sum of squared value differences: each pair (i,j) contributes (i-j)^2, weighted by how often it occurs

Homogeneity (of nearby pixels)

Which is basically the same as 'to what degree are the matrix values on or near the main diagonal'

also a good indication of the amount of pixel-level noise

Entropy (within nearby pixels)

Energy, a.k.a. uniformity

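sum of the squared values of the normalized matrix (some libraries call this ASM, and use 'energy' for its square root)

Value range: 0..1, 1 for constant image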

Correlation

...of nearby pixels, meaning (verify)

The number of bins that are filled can also give you a decent indication of how clean an image is.

...e.g. giving you some indication of whether it has a single solid background color, and where it is on the spectrum of two-level, paletted, clip-art with little blur, badly JPEG-compressed diagram, photo.

Directionality of an image: calculate the above values for different directions, and look at how much they vary between directions.
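The properties above can be sketched roughly as follows, starting from a count matrix. These are the commonly used textbook formulas; exact definitions differ a little between libraries, and the function name, the dictionary keys, and the fallback value for correlation are choices made for this illustration.

```python
import math

def glcm_features(counts):
    """Summary statistics of a co-occurrence count matrix (list of lists)."""
    n = len(counts)
    total = sum(sum(row) for row in counts)
    p = [[c / total for c in row] for row in counts]  # normalize to probabilities
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]

    mean_i = sum(i * v for i, j, v in cells)
    mean_j = sum(j * v for i, j, v in cells)
    std_i = math.sqrt(sum((i - mean_i) ** 2 * v for i, j, v in cells))
    std_j = math.sqrt(sum((j - mean_j) ** 2 * v for i, j, v in cells))
    covariance = sum((i - mean_i) * (j - mean_j) * v for i, j, v in cells)

    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "energy": sum(v * v for i, j, v in cells),
        "entropy": sum(v * math.log2(1 / v) for i, j, v in cells if v > 0),
        # Undefined for a constant image (zero spread); 1.0 is a choice made here.
        "correlation": covariance / (std_i * std_j) if std_i * std_j > 0 else 1.0,
        "filled_fraction": sum(1 for i, j, v in cells if v > 0) / (n * n),
    }

# Symmetric horizontal-neighbour counts of a small 4-level image
counts = [[4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]]
features = glcm_features(counts)
flat = glcm_features([[0, 0], [0, 4]])  # constant image: a single filled bin
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

For the constant image, contrast and entropy come out as 0, and homogeneity and energy as 1, which matches the value range mentioned above.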

Further notes:

Direction of the pixel pairs can matter for images with significant noise, high-frequency content, and/or extreme regularity.

Usually the implementation allows you to give either an x,y pixel offset (e.g. 'take a pixel, and the pixel one to the right and one down') or an angle.

Some implementations also allow calculation for various distances. This is quite similar to running the calculation over resized images, but may be somewhat faster.
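To illustrate why direction and distance matter for very regular images, here is a small sketch that computes contrast for a few offsets on a horizontally striped test image. The helper names, the sparse `Counter` representation, and the mapping from angle names to offsets (with y pointing down) are assumptions of this example.

```python
from collections import Counter
from statistics import pvariance

def pair_counts(image, dx, dy):
    """Sparse co-occurrence counts: {(value_a, value_b): count} for one offset."""
    height, width = len(image), len(image[0])
    counts = Counter()
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                counts[(image[y][x], image[y2][x2])] += 1
    return counts

def contrast(counts):
    """Squared value difference, averaged over all counted pairs."""
    total = sum(counts.values())
    return sum(n * (a - b) ** 2 for (a, b), n in counts.items()) / total

# Horizontal stripes, one pixel tall: rows alternate between levels 0 and 3
stripes = [[0 if y % 2 == 0 else 3] * 6 for y in range(6)]

offsets = {"0 deg": (1, 0), "45 deg": (1, -1), "90 deg": (0, -1), "135 deg": (-1, -1)}
per_direction = {name: contrast(pair_counts(stripes, dx, dy))
                 for name, (dx, dy) in offsets.items()}
directionality = pvariance(per_direction.values())  # spread between directions

two_rows_up = contrast(pair_counts(stripes, 0, -2))  # same direction, distance 2
print(per_direction, directionality, two_rows_up)
```

Along the stripes the contrast is zero and across them it is maximal, so the spread between directions is large. At distance 2 the vertical contrast drops back to zero, because the offset happens to match the stripe period.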

Unsorted (summaries)

Detecting manipulation

Error level analysis

Principal Component Analysis

Demosaic analysis

Whole-image descriptors

Describes all of the image, as opposed to describing specific features you found.

...there is natural overlap with dense descriptors, which analyse a whole image in independent patches.

Color descriptors

MPEG-7 (color stuff)

A color histogram was used in development, but for many uses this is too high-dimensional.

Color space is settled in each descriptor, because not doing so would hurt interoperability.

Scalable Color Descriptor (SCD)

Defined in HSV

Color Structure Descriptor (CSD)

Defined in HMMD

Color Layout Descriptor (CLD)

YCbCr space

Dominant colors

Color quantization

Clustering

Merged histogram

Unsorted

Image entropy

Can mean

the overall entropy of a whole block
for each pixel, the entropy of the values around it

Roughly an estimate of local contrast / texture.
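Both meanings can be sketched with Shannon entropy over value counts. This is a slow pure-Python illustration; the function names and the square window with a `radius` parameter are assumptions of this example.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy in bits of the value distribution in a list."""
    total = len(values)
    return sum((n / total) * math.log2(total / n) for n in Counter(values).values())

def global_entropy(image):
    """Entropy of the whole block at once."""
    return shannon_entropy([v for row in image for v in row])

def local_entropy(image, radius=1):
    """For each pixel, entropy of the values in the square window around it."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[yy][xx]
                      for yy in range(max(0, y - radius), min(height, y + radius + 1))
                      for xx in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(shannon_entropy(window))
        result.append(row)
    return result

# Flat on the left, speckled on the right
image = [
    [10, 10, 10, 10],
    [10, 10, 10, 200],
    [10, 10, 200, 10],
    [10, 10, 10, 200],
]
overall = global_entropy(image)
local = local_entropy(image, radius=1)
print(round(overall, 3))
print([[round(v, 2) for v in row] for row in local])
```

The local values are zero in the flat area and rise towards the speckled side, which is the 'local contrast / texture' reading mentioned above.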

Image moments

Gist

Texture descriptors

Texture is harder to quantify than you may expect.

Some methods are much easier to apply to constrained, single-purpose cases (e.g. medical imaging) than to arbitrary images.

Some things work better in lab conditions (e.g. recognizing known textures), and some work well enough to e.g. recognize differences between areas in a picture, but labeling textures robustly (e.g. scale-, rotation-, and lighting-invariant) is hard.

Because most images will contain a mix of textures, you may want to do texture segmentation first and describe texture only within each segment, or you will get really muddy descriptions.

MPEG-7 (texture stuff)

HTD (Homogeneous Texture Descriptor)

idea: use fourier analysis to get basic frequency and direction information

in frequency space (2D FFT amplitudes), divide into 5 octave-style rings, and 6 directions

making for 30 bins

plus two: overall mean and stdev

EHD (Edge Histogram Descriptor)

idea: detect which direction detected edges go, make a histogram of that; useful for overall comparisons.

Roughly

resample to comparable size

divide image into 16 parts (4 by 4).

Within each part...

do 5 specific edge detections (horizontal, vertical, both diagonals, and non-directional) with 2x2 pixel kernels

at each pixel, find the strongest of those responses, and count it only if it is above a threshold

sum those counts, per edge type, within each of the 16 larger parts

makes for 16*5 = 80 bins, each quantized to 3 bits, so 240 bits
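A simplified sketch of those steps, for a grayscale image given as a list of rows. The kernel coefficients are the ones commonly quoted for EHD and the threshold value is an assumed default; the diagonal labels, function name, and parameters are this example's own. It skips the resampling, normalization, and 3-bit quantization, so it stops at raw counts.

```python
import math

ROOT2 = math.sqrt(2)
# 2x2 kernels as (top-left, top-right, bottom-left, bottom-right)
KERNELS = {
    "vertical":        (1, -1, 1, -1),
    "horizontal":      (1, 1, -1, -1),
    "diagonal_45":     (ROOT2, 0, 0, -ROOT2),
    "diagonal_135":    (0, ROOT2, -ROOT2, 0),
    "non_directional": (2, -2, -2, 2),
}
EDGE_TYPES = list(KERNELS)

def edge_histogram(image, grid=4, threshold=11):
    """Per grid cell, count which edge type responds strongest at each pixel."""
    height, width = len(image), len(image[0])
    histogram = [[0] * len(EDGE_TYPES) for _ in range(grid * grid)]
    for y in range(height - 1):
        for x in range(width - 1):
            block = (image[y][x], image[y][x + 1],
                     image[y + 1][x], image[y + 1][x + 1])
            responses = [abs(sum(k * v for k, v in zip(KERNELS[name], block)))
                         for name in EDGE_TYPES]
            strongest = max(range(len(EDGE_TYPES)), key=responses.__getitem__)
            if responses[strongest] > threshold:
                cell = (y * grid // height) * grid + (x * grid // width)
                histogram[cell][strongest] += 1
    # Flatten to 16 cells * 5 edge types = 80 bins
    return [count for cell in histogram for count in cell]

# 8x8 test image: dark left half, bright right half, so one vertical edge
image = [[0] * 4 + [255] * 4 for _ in range(8)]
bins = edge_histogram(image)
print(len(bins), sum(bins))
```

All detected edges in this test image end up in the 'vertical' bins of the second column of cells, as you would expect.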

extension: combine

global

per row (4 of them)

per column (4 of them)

for 2x2 (5 of them - the four non-overlapping corners, and the center)

...so (1+4+4+5)*5 = 70 bins more
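The grouping in that extension can be sketched as below, starting from the 80 local bins (16 cells in row-major order, 5 edge types each). Summing is an assumption here; an implementation may average or normalize instead. The function and constant names are made up for this example.

```python
GRID = 4        # 4 by 4 cells
EDGE_TYPES = 5  # bins per cell

def extended_bins(local_bins):
    """Derive the global and semi-global bins from the 80 local bins."""
    cells = [local_bins[i * EDGE_TYPES:(i + 1) * EDGE_TYPES]
             for i in range(GRID * GRID)]

    def merge(cell_indices):
        # Sum each edge type over the given group of cells
        return [sum(cells[c][e] for c in cell_indices) for e in range(EDGE_TYPES)]

    def block(top, left):
        # Indices of the 2x2 group of cells starting at (top, left)
        return [(top + r) * GRID + (left + c) for r in range(2) for c in range(2)]

    groups = [list(range(GRID * GRID))]                                   # global
    groups += [[r * GRID + c for c in range(GRID)] for r in range(GRID)]  # rows
    groups += [[r * GRID + c for r in range(GRID)] for c in range(GRID)]  # columns
    groups += [block(0, 0), block(0, 2), block(2, 0), block(2, 2),        # corners
               block(1, 1)]                                               # center
    return [value for group in groups for value in merge(group)]

local_bins = [1] * 80  # placeholder data, not from a real image
extra = extended_bins(local_bins)
print(len(extra))
```

With the 80 local bins included, that makes for 150 bins in total.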

TBD (Texture Browsing Descriptor)

idea: indicators of perceptual directionality, regularity, and coarseness of a texture

most dominant texture orientation (0 meaning none, 1..6 meaning 0..150 degrees in steps of 30)

second most dominant texture orientation (0 meaning none, 1..6 meaning 0..150 degrees in steps of 30) (optional)

regularity in first orientation

regularity in second orientation (optional)

coarseness

Implementation based on bank of orientation- and scale-tuned Gabor filters.