OCR


✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

OCR as a task

Software


OCRopus

document OCR (used in Google Books, Internet Archive)

multifont, multilanguage

https://en.wikipedia.org/wiki/OCRopus

Tesseract

document OCR

https://opensource.google.com/projects/tesseract

https://en.wikipedia.org/wiki/Tesseract_(software)
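Tesseract is usually driven from its command line, in the form `tesseract imagename outputbase [options] [configfile...]`. Trailing config names such as `hocr`, `alto`, `tsv`, `txt` or `pdf` select the output format(s), and `-l` picks the language data. A minimal Python wrapper sketch follows. The function names are hypothetical, and it assumes the `tesseract` binary and the requested language data are installed (ALTO output needs a reasonably recent Tesseract).

```python
import shutil
import subprocess


def tesseract_cmd(image, outbase, lang="eng", psm=None, formats=("txt",)):
    """Build a tesseract command line (hypothetical helper, not part of Tesseract).

    Order follows the CLI usage: image, output base name, options, then config names.
    """
    cmd = ["tesseract", str(image), str(outbase), "-l", lang]
    if psm is not None:
        # page segmentation mode, e.g. 6 = assume a single uniform block of text
        cmd += ["--psm", str(psm)]
    # each config name adds an output renderer, e.g. "hocr" writes <outbase>.hocr
    cmd += list(formats)
    return cmd


def run_tesseract(image, outbase, **kwargs):
    """Run tesseract if it is on PATH; raises CalledProcessError on failure."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    cmd = tesseract_cmd(image, outbase, **kwargs)
    subprocess.run(cmd, check=True, capture_output=True)
    return cmd
```

For example, `run_tesseract("scan.png", "out", lang="eng+deu", formats=("hocr",))` would produce `out.hocr`. Multiple languages are joined with `+`.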

CuneiForm

https://en.wikipedia.org/wiki/CuneiForm_(software)

keras-ocr

https://keras-ocr.readthedocs.io/en/latest/

EasyOCR

https://github.com/JaidedAI/EasyOCR
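EasyOCR's `easyocr.Reader(['en']).readtext(path)` returns a list of `(bbox, text, confidence)` tuples, where `bbox` is four `[x, y]` corner points. It does not hand you ordered lines directly, so here is a rough post-processing sketch. The function name and both thresholds are hypothetical choices, and the line grouping is a crude heuristic that ignores columns and rotated text.

```python
def reading_order(results, min_conf=0.5, line_tol=10):
    """Turn EasyOCR-style (bbox, text, confidence) results into text lines.

    bbox is taken to be four [x, y] corner points. A box whose top edge lies
    within line_tol pixels of the first box on the current line joins that line.
    The thresholds are illustrative defaults, not EasyOCR settings.
    """
    kept = []
    for bbox, text, conf in results:
        if conf < min_conf:
            continue  # drop low-confidence detections
        top = min(y for _, y in bbox)
        left = min(x for x, _ in bbox)
        kept.append((top, left, text))
    kept.sort()  # roughly top-to-bottom

    lines = []  # each entry: (top of first box on the line, [(left, text), ...])
    for top, left, text in kept:
        if not lines or top - lines[-1][0] > line_tol:
            lines.append((top, []))
        lines[-1][1].append((left, text))

    # within each line, order words left-to-right
    return [" ".join(text for _, text in sorted(words)) for _, words in lines]
```

Usage would look like `reading_order(easyocr.Reader(['en']).readtext('page.png'))`. `readtext` also has a `paragraph` option that does its own merging, which may be preferable for plain documents.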

ABBYY (FineReader)

paid

https://pdf.abbyy.com/

Google Docs OCR

online-only

Rossum

paid, online-only?

https://rossum.ai/lp/ocr-software/

Amazon Rekognition

geared more toward scene text (text in photos and video frames) than toward documents

paid, online-only

Amazon Textract

Convenience tools / wrappers

PowerToys Text Extractor

OCRs a region you select on screen. More of a convenience tool.

For text rendered from fonts this can work quite well, and fairly quickly, even in a photographic context, though it degrades quickly on more stylized text.

Lios

Document managers

Apache Tika

geared toward content analysis and indexing (also parses metadata and document structure)

uses Tesseract for OCR

https://tika.apache.org/
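Tika is often run as a server (tika-server, default port 9998). In that setup you PUT a document to its `/tika` endpoint and get extracted text back, and images go through Tesseract when it is installed. The sketch below assumes that setup. It uses the `X-Tika-OCRLanguage` header as described in Tika's OCR notes, which is worth checking against your Tika version. The helper names are hypothetical.

```python
import urllib.request

# Assumption: a tika-server on its default port; adjust for your deployment.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_ocr_request(data: bytes, lang: str = "eng",
                           url: str = TIKA_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request asking Tika for plain text."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "Accept": "text/plain",      # plain text rather than Tika's XHTML output
            "X-Tika-OCRLanguage": lang,  # Tesseract language code(s), e.g. "eng+deu"
        },
    )


def tika_ocr(data: bytes, lang: str = "eng", url: str = TIKA_URL,
             timeout: float = 60.0) -> str:
    """Send the document to tika-server and return the extracted text."""
    req = build_tika_ocr_request(data, lang, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Note that `urllib.request.Request` stores header names capitalized (e.g. `X-tika-ocrlanguage`). This is harmless because HTTP header names are case-insensitive.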

Aleph

https://docs.aleph.occrp.org/


Output formats

hOCR

An HTML-based format that stores the positions of detected words and text fragments, and optionally their detected style, layout, and other information, as HTML or XHTML markup.

https://en.wikipedia.org/wiki/HOCR

https://pypi.org/project/hocr-spec/
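In hOCR the OCR structure is expressed through class names such as `ocr_page`, `ocr_line` and `ocrx_word`. Per-element properties go in the `title` attribute as semicolon-separated entries like `bbox x0 y0 x1 y1`, and Tesseract also writes a word confidence as `x_wconf`. Below is a stdlib-only sketch that pulls out words with their boxes. It assumes word-level output like Tesseract's. The class and function names are hypothetical, and void elements such as a bare `<br>` inside a word would confuse its simple depth counting.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Split an hOCR title such as 'bbox 5 5 40 20; x_wconf 91' into a dict of lists."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


class HocrWordParser(HTMLParser):
    """Collect (text, bbox, confidence) for every element with class ocrx_word."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so entities arrive decoded
        self.words = []     # (text, (x0, y0, x1, y1) or None, confidence or None)
        self._props = None  # title properties of the word we are currently inside
        self._depth = 0     # element nesting depth within that word
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._props is not None:
            self._depth += 1  # e.g. <strong> or <em> inside a word
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            self._props = parse_title(attrs.get("title") or "")
            self._depth = 1
            self._text = []

    def handle_endtag(self, tag):
        if self._props is None:
            return
        self._depth -= 1
        if self._depth == 0:
            bbox = self._props.get("bbox")
            conf = self._props.get("x_wconf")
            self.words.append((
                "".join(self._text).strip(),
                tuple(int(v) for v in bbox) if bbox else None,
                float(conf[0]) if conf else None,
            ))
            self._props = None

    def handle_data(self, data):
        if self._props is not None:
            self._text.append(data)
```

To use it, call `feed()` with the contents of an `.hocr` file and then read `parser.words`.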
