✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
OCR as a task
Software
OCRopus
- document OCR (used in Google Books, Internet Archive)
- multifont, multilanguage
- https://en.wikipedia.org/wiki/OCRopus
Tesseract
- document OCR
- https://opensource.google.com/projects/tesseract
- https://en.wikipedia.org/wiki/Tesseract_(software)
CuneiForm
- https://en.wikipedia.org/wiki/CuneiForm_(software)
keras-ocr
EasyOCR
ABBYY (FineReader)
Google Docs OCR
Rossum
- paid, online-only?
Amazon Rekognition
- more for scene text?(verify)
- paid, online-only
Amazon Textract
- more for documents?(verify)
- paid, online-only
Transym
- more for documents?(verify)
- paid; online-only?(verify)
Apache Tika
- geared at content analysis and indexing (also metadata/document structure parser)
- uses tesseract for OCR
- https://tika.apache.org/
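As a rough illustration of automating one of the tools above: Tesseract (which Tika also relies on) can be driven from its command line. The sketch below only builds and runs a basic call of the form "tesseract IMAGE OUTPUTBASE -l LANG CONFIG". The helper names (tesseract_command, run_tesseract) and the file names are made up for illustration, and it assumes a tesseract binary is installed and on the PATH.

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base, lang="eng", output_format="hocr"):
    """Build the argument list for a basic Tesseract CLI call.

    output_format is passed as a Tesseract config name; "hocr", "txt"
    and "pdf" are common ones. Tesseract adds the file extension to
    output_base itself (e.g. out.hocr).
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang, output_format]


def run_tesseract(image_path, output_base, **kwargs):
    """Run Tesseract if it is installed; raise if the binary is missing."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(tesseract_command(image_path, output_base, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical file names, only printed here, not executed
    print(" ".join(tesseract_command("page.png", "out")))
```

Building the command separately from running it keeps the sketch usable (and checkable) on machines without Tesseract installed.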
Integrated features / online APIs (i.e. not easy to automate)
Acrobat
Google Keep
Google Drive ('open with' converts)
OneNote
IBM Datacap[1]
ABBYY
Convenience tools / wrappers
PowerToys Text Extractor
- works from a screen capture, so more of a convenience tool
- works quite well, and fairly quickly, on text rendered from regular fonts, even in a photographic context, but degrades quickly on more creative text
Lios
Output formats
hOCR
An HTML-based format that stores the positions of detected words and text fragments,
and optionally detected style, layout, and other information, as markup embedded in HTML or XHTML.
https://en.wikipedia.org/wiki/HOCR
https://pypi.org/project/hocr-spec/
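To make the hOCR description a little more concrete, here is a minimal sketch that pulls word text, bounding box, and confidence out of hOCR using only the Python standard library. It assumes words are marked with class ocrx_word and carry a title attribute with bbox and x_wconf properties, as Tesseract's hOCR output typically does. The sample markup and the names HocrWordParser, parse_title, and extract_words are made up for illustration, and real files may need more careful handling.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Parse an hOCR title attribute such as 'bbox 10 20 120 60; x_wconf 96'."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if not fields:
            continue
        key, values = fields[0], fields[1:]
        if key == "bbox" and len(values) == 4:
            props["bbox"] = tuple(int(v) for v in values)
        elif key == "x_wconf" and values:
            props["x_wconf"] = float(values[0])
    return props


class HocrWordParser(HTMLParser):
    """Collects (text, bbox, confidence) for each ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None when outside a word
        self._depth = 0       # nesting depth of tags inside the current word

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            self._depth += 1  # e.g. <strong> or <em> inside a word
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            props = parse_title(attrs.get("title") or "")
            self._current = {"text": "", "bbox": props.get("bbox"),
                             "conf": props.get("x_wconf")}
            self._depth = 0

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        word = self._current
        self.words.append((word["text"].strip(), word["bbox"], word["conf"]))
        self._current = None


def extract_words(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Made-up sample in the style of hOCR output
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 640 480'>
 <span class='ocr_line' title='bbox 10 20 300 60'>
  <span class='ocrx_word' title='bbox 10 20 120 60; x_wconf 96'>Hello</span>
  <span class='ocrx_word' title='bbox 140 20 300 60; x_wconf 91'><strong>world</strong></span>
 </span>
</div>
"""

if __name__ == "__main__":
    for text, bbox, conf in extract_words(SAMPLE):
        print(text, bbox, conf)
```

An HTML parser is used rather than an XML one because hOCR may be plain HTML rather than well-formed XHTML.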
ALTO
https://en.wikipedia.org/wiki/ALTO_(XML)
PAGE XML
https://en.wikipedia.org/wiki/PAGE_(XML)
abbyyXML
https://support.abbyy.com/hc/en-us/articles/360017336699-ABBYY-FineReader-Engine-XML-Export