OCR: Difference between revisions

Revision as of 19:21, 15 July 2023

The physical and human aspects dealing with audio, video, and images

Vision and color perception: objectively describing color · the eyes and the brain · physics, numbers, and (non)linearity · color spaces · references, links, and unsorted stuff

Image: file formats · noise reduction · halftoning, dithering · illuminant correction · Image descriptors · Reverse image search · image feature and contour detection · OCR · Image - unsorted

Video: format notes · encoding notes · On display speed · Screen tearing and vsync

Audio physics and physiology: Sound physics and some human psychoacoustics · Descriptions used for sound and music

Noise stuff: Stray signals and noise · sound-related noise names · electronic non-coupled noise names · electronic coupled noise · ground loop · strategies to avoid coupled noise · Sampling, reproduction, and transmission distortions · (tape) noise reduction

Digital sound and processing: capture, storage, reproduction · on APIs (and latency) · programming and codecs · some glossary · Audio and signal processing - unsorted stuff

Music electronics: device voltage and impedance, audio and otherwise · amps and speakers · basic audio hacks · Simple ADCs and DACs · digital audio · multichannel and surround
On the stage side: microphones · studio and stage notes · Effects · sync

Electronic music:

Electronic music - musical terms

MIDI · Some history, ways of making noises · Gaming synth · microcontroller synth

Modular synth (eurorack, mostly):

sync · power supply · formats (physical, interconnects)

DAW: Ableton notes · MuLab notes · Mainstage notes

Unsorted: Visuals DIY · Signal analysis, modeling, processing (some audio, some more generic) · Music fingerprinting and identification

For more, see Category:Audio, video, images

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

OCR as a task

Software


OCRopus

document OCR (used in Google Books, Internet Archive)

multifont, multilanguage

https://en.wikipedia.org/wiki/OCRopus

Tesseract

document OCR

https://opensource.google.com/projects/tesseract

https://en.wikipedia.org/wiki/Tesseract_(software)

CuneiForm

https://en.wikipedia.org/wiki/CuneiForm_(software)

keras-ocr

https://keras-ocr.readthedocs.io/en/latest/

EasyOCR

https://github.com/JaidedAI/EasyOCR

ABBYY (FineReader)

Google Docs OCR

Rossum

paid, online-only?

Amazon Rekognition

more for scene text? (verify)

paid, online-only

Amazon Textract
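As a rough sketch of how one of the engines listed above might be scripted: this assumes the Tesseract command-line tool is installed and on PATH, and uses its usual `tesseract IMAGE OUTPUTBASE -l LANG hocr` invocation to ask for hOCR output. The helper names (tesseract_command, run_tesseract) and the file names are made up for illustration, and the ".hocr" output suffix is what recent Tesseract versions write, so treat it as an assumption for older installs.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image, output_base, lang="eng", config="hocr"):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG CONFIG."""
    return ["tesseract", str(image), str(output_base), "-l", lang, config]


def run_tesseract(image, output_base, lang="eng"):
    """Run Tesseract and return the path of the hOCR file it is expected to write."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error
    subprocess.run(tesseract_command(image, output_base, lang), check=True)
    # Assumption: recent Tesseract versions name the result OUTPUTBASE.hocr
    return Path(str(output_base) + ".hocr")
```

Something like run_tesseract("page.png", "page") would then be expected to leave a page.hocr next to the script. Keeping the command construction separate from the call makes it easy to inspect or log what would be run without needing Tesseract installed.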

Convenience tools / wrappers

PowerToys Text Extractor

OCR from a screen capture. More of a convenience tool.

For text rendered from regular fonts this can work quite well, and fairly quickly, even in photographic contexts, though it degrades quickly on more creative text.

Lios


Output formats

hOCR

An HTML-based format that stores the position of detected words and text fragments, and optionally detected style, layout, and other information. The data is embedded in ordinary HTML or XHTML markup, so an hOCR file is both a valid web page and a machine-readable OCR result.

https://en.wikipedia.org/wiki/HOCR

https://pypi.org/project/hocr-spec/
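To make the description above a little more concrete, here is a minimal sketch of reading word positions back out of hOCR using only the Python standard library. It assumes the common layout in which each word is a span with class ocrx_word whose title attribute carries properties such as "bbox x0 y0 x1 y1" and "x_wconf N". The sample markup, the class name HocrWordParser and the function parse_hocr_words are illustrative, not part of any library, and nested spans inside a word are not handled.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect text, bounding box and confidence for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # the word span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or "").split():
            self._current = {"text": "", "bbox": None, "conf": None}
            # title holds ';'-separated properties, e.g. 'bbox 36 92 96 116'
            for prop in (attrs.get("title") or "").split(";"):
                parts = prop.split()
                if not parts:
                    continue
                if parts[0] == "bbox" and len(parts) == 5:
                    self._current["bbox"] = tuple(int(p) for p in parts[1:])
                elif parts[0] == "x_wconf" and len(parts) == 2:
                    self._current["conf"] = float(parts[1])

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Simplification: the first closing span ends the current word
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def parse_hocr_words(hocr_text):
    """Return a list of dicts with 'text', 'bbox' and 'conf' keys."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hand-written sample in the assumed layout, not real OCR output
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 640 480'>
  <span class='ocr_line' title='bbox 36 92 300 116'>
    <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
    <span class='ocrx_word' title='bbox 110 92 180 116; x_wconf 91'>world</span>
  </span>
</div>
"""

for word in parse_hocr_words(SAMPLE):
    print(word["text"], word["bbox"], word["conf"])
```

Because hOCR is plain HTML, a general HTML parser is enough for simple extraction like this; for real documents a more forgiving parser and handling of lines, paragraphs and pages would probably be wanted.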



Contents

OCR as a task

Software

Convenience tools / wrappers

Output formats

hOCR

ALTO

PAGE XML

abbyyXML
