OCR

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

OCR as a task

Software


OCRopus

document OCR (used in Google Books, Internet Archive)
multifont, multilanguage
https://en.wikipedia.org/wiki/OCRopus

Tesseract

document OCR
https://opensource.google.com/projects/tesseract
https://en.wikipedia.org/wiki/Tesseract_(software)
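
Tesseract is usually scripted through its command line program. A minimal sketch of that, assuming the `tesseract` binary is installed and on the PATH; the helper names (`tesseract_command`, `run_tesseract`) and the file names are made up for illustration, and the output file naming is what recent versions appear to do rather than a guarantee.

```python
import shutil
import subprocess


def tesseract_command(image, outbase, lang="eng", config="hocr"):
    """Build the argument list for: tesseract IMAGE OUTBASE -l LANG [CONFIG].

    Assumption: with the 'hocr' config, recent Tesseract versions write
    OUTBASE.hocr; with no config they write plain text to OUTBASE.txt.
    """
    cmd = ["tesseract", str(image), str(outbase), "-l", lang]
    if config:
        cmd.append(config)
    return cmd


def run_tesseract(image, outbase, **kwargs):
    """Run Tesseract; raises if the binary is missing or exits non-zero."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    subprocess.run(tesseract_command(image, outbase, **kwargs), check=True)
```

Something like `run_tesseract("page.png", "page")` would then be expected to leave a `page.hocr` file in the working directory.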

CuneiForm

https://en.wikipedia.org/wiki/CuneiForm_(software)

keras-ocr

https://keras-ocr.readthedocs.io/en/latest/

EasyOCR

https://github.com/JaidedAI/EasyOCR



ABBYY (FineReader)

paid
https://pdf.abbyy.com/

Google Docs OCR

online-only

Rossum

paid, online-only?
https://rossum.ai/lp/ocr-software/

Amazon Rekognition

more for scene text? (verify)
paid, online-only

Amazon Textract

more for documents? (verify)
paid, online-only

Transym

more for documents? (verify)
paid, online-only
https://transym.com/



Integrated features / online APIs (i.e. not easy to automate)

Acrobat
Google Keep
Google Drive ('open with' converts)
OneNote
IBM Datacap[1]


Convenience tools / wrappers

PowerToys' Text Extractor

works on a screen capture; more of a convenience tool
for text rendered from regular fonts this can work quite well, and fairly quickly, even in a photographic context, though it degrades quickly on more creative text

Lios


Document managers with OCR

Output formats

hOCR

An HTML-based format (plain HTML or XHTML) that stores the position of each detected word or text fragment, and optionally the detected style, layout, and other information.


https://en.wikipedia.org/wiki/HOCR

https://pypi.org/project/hocr-spec/
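
Because hOCR is just HTML with conventions, it can be read with a generic HTML parser. A minimal sketch using only the Python standard library: it collects the text and bounding box of every element with class `ocrx_word`, reading the box from the `bbox x0 y0 x1 y1` property in the `title` attribute. The sample markup and the names `parse_title`, `HocrWords` and `hocr_words` are made up for illustration, and the sketch assumes well-nested markup with integer coordinates.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Split an hOCR title such as 'bbox 10 20 90 45; x_wconf 93'
    into a dict mapping property name to its list of string values."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


class HocrWords(HTMLParser):
    """Collect (text, bbox) for every element with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0     # > 0 while inside a word element
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._depth:     # nested tag inside a word, e.g. <strong>
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            bbox = parse_title(attrs.get("title") or "").get("bbox")
            self._bbox = tuple(int(v) for v in bbox) if bbox else None
            self._text = []
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:    # the word element itself just closed
                self.words.append(("".join(self._text).strip(), self._bbox))

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


def hocr_words(markup):
    parser = HocrWords()
    parser.feed(markup)
    parser.close()
    return parser.words


# Made-up fragment in the style of typical hOCR output.
sample = """
<div class='ocr_page' title='bbox 0 0 640 480'>
 <span class='ocr_line' title='bbox 10 20 200 45'>
  <span class='ocrx_word' title='bbox 10 20 90 45; x_wconf 93'>Hello</span>
  <span class='ocrx_word' title='bbox 100 20 200 45; x_wconf 90'><strong>world</strong></span>
 </span>
</div>
"""

for text, bbox in hocr_words(sample):
    print(text, bbox)
```

The same `parse_title` helper also exposes other properties stored in the title, such as the `x_wconf` word confidence, for tools that write them.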

ALTO

https://en.wikipedia.org/wiki/ALTO_(XML)

PAGE XML

https://en.wikipedia.org/wiki/PAGE_(XML)

abbyyXML

https://support.abbyy.com/hc/en-us/articles/360017336699-ABBYY-FineReader-Engine-XML-Export