Image descriptors: Difference between revisions
mNo edit summary |
|||
(2 intermediate revisions by the same user not shown) | |||
Line 296: | Line 296: | ||
--> | --> | ||
====MPEG-7 | ====MPEG-7 color descriptors==== | ||
A color histogram was used in development, but for many uses this is too high-dimensional. | A color histogram was used in development, but for many uses this is too high-dimensional. | ||
A color space is settled in each descriptor, because not doing so would hurt interoperability. | |||
* '''Scalable Color Descriptor (SCD)''' | |||
:: Defined in HSV | |||
* '''Color Structure Descriptor (CSD)''' | |||
:: Defined in [[HMMD]] | |||
* '''Color Layout Descriptor (CLD)''' | |||
:: YCbCr space | |||
See also: | See also: | ||
* A Buturovic (2005), {{search|MPEG 7 Color Structure Descriptor for visual information retrieval project VizIR}} | * A Buturovic (2005), {{search|MPEG 7 Color Structure Descriptor for visual information retrieval project VizIR}} | ||
====Dominant colors==== | ====Dominant colors==== |
Latest revision as of 17:38, 6 May 2024
Image summaries and analysis
Histograms
Co-occurrence matrices
Studies pixel-level texture.
Often of grayscale images, then called GLCM (Grey Level Co-occurrence Matrix).
With some data massaging it is useful for various other things.
Studying only pairs of adjacent pixels may sound like it'd be very noise-sensitive, but works better than you may think in part because most images are large enough to have a lot of pairs.
That said, there are some limitations and tweaks you should understand before you can apply this robustly.
(And there is value in pre-processing, but note that e.g. blur (and size down, because it's also effectively a lowpass) will mostly just move things towards the main diagonal -- which is related to consistent areas and gradients, which are already pronounced to start with for most images)
The output is a symmetric n-by-n matrix, where
- each value in the matrix has counted how often the two values (indicated by the row and column of its location) co-occur.
- n is the quantization level - usually 256 since you would typically run this on 8-bit images.
- though quantizing to fewer levels is useful in some applications
There are some overall image properties you can calculate from that matrix, including:
- Contrast (in nearby pixels)
- The sum-of-squares variance
- Homogeneity (of nearby pixels)
- Which is basically the same as 'to what degree are the matrix values in the diagonal'
- also a good indication of the amount of pixel-level noise
- Entropy (within nearby pixels)
- Energy, a.k.a. uniformity
- sum of squares
- Value range: 0..1, 1 for constant image
- Correlation
- ...of nearby pixels, meaning (verify)
- The amount of bins that are filled can also give you a decent indication of how clean an image is.
- ...e.g. giving you some indication of whether it has a single solid background color, and where it is on the spectrum of two-level, paletted, clip-art with little-blur, badly JPEG-compressed diagram, photo.
- Directionality of an image, if you calculate values for different directions and see variance of such values.
Further notes:
- Direction of the pixel pairs can matter for images with significant noise, high-frequency content, and/or extreme regularity.
- Usually the implementation allows you to give either a x,y pixel offset (for e.g. 'take a pixel, and the pixel one to the right and one down') or an angle.
- Some implementations also allow calculation for various distances. This is quite similar to running the calculation over resized images, but may be somewhat faster.
See also:
Unsorted (summaries)
Detecting manipulation
Error level analysis
Principal Component Analysis
Demosaic analysis
See also
Whole-image descriptors
Describes all of the image, as opposed to describing specific features you found.
...there is natural overlap with dense descriptors, which analyse a whole image in independent patchs.
Color descriptors
MPEG-7 color descriptors
A color histogram was used in development, but for many uses this is too high-dimensional.
A color space is settled in each descriptor, because not doing so would hurt interoperability.
- Scalable Color Descriptor (SCD)
- Defined in HSV
- Color Structure Descriptor (CSD)
- Defined in HMMD
- Color Layout Descriptor (CLD)
- YCbCr space
See also:
- A Buturovic (2005), MPEG 7 Color Structure Descriptor for visual information retrieval project VizIR
Dominant colors
Color quantization
Clustering
Merged histogram
Unsorted
Image entropy
Can mean
- the overall entropy of a whole block
- for each pixel, the entropy of the values around it
Roughly an estimate of local contrast / texture.
Image moments
Gist
Texture descriptors
Texture is harder to quantify than you may expect.
Some methods are much easier to apply to constrained single purposes, e.g. medical imaging, than it is for arbitrary images.
Some things work better in lab conditions (e.g. recognize known textures), some work well enough to e.g. recognize differences in areas in a picture, but to robustly (e.g. scales, rotation, and lighting-invariant) label textures is hard.
Because most images will contain a mix of textures,
you may want to use texture segmentation and texture description only within each,
or get really muddy descriptions.
MPEG-7 (texture stuff)
HTD (Homogeneous Texture Descriptor)
- idea: use fourier analysis to get basic frequency and direction information
- in frequency space (2D FFT amplitudes), divide into 5 octave-style rings, and 6 directions
- making for 30 bins
- plus two: overall mean and stdev
EHD (Edge Histogram Descriptor)
- idea: detect which direction detected edges go, make a histogram of that; useful for overall comparisons.
Roughly
- resample to comparable size
- divide image into 16 parts (4 by 4).
- Within each part...
- do a 5 specific edge detections (horizontal, vertical, both diagonals, and isotropic) with 2x2 pixel kernels
- at each pixel, count the strongest response of those (above a threshold)
- sum the counts of such strongest response per the 16 larger parts
- makes for 16*5 bins, quantized to 3 bits, so 240-bits
- extension: combine
- global
- per row (4 of them)
- per column (4 of them)
- for 2x2 (5 of them - the four non-overlapping corners, and the center)
- ...so (1+4+4+5)*5 = 70 bins more
TBD (Texture Browsing Descriptor)
- idea: indicators of perceptual directionality, regularity, and coarseness of a texture
-
- most dominant texture orientations (0 meaning none, 1..6 meaning 0..150 degrees in steps of 30)
- second most dominant texture orientations (0 meaning none, 1..6 meaning 0..150 degrees in steps of 30) (optional)
- regularity in first orientation
- regularity in second orientation (optional)
- coarseness
- Implementation based on bank of orientation- and scale-tuned Gabor filters.