OCR: Difference between revisions

Revision as of 19:20, 15 July 2023


✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

OCR as a task

Software

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

OCRopus

document OCR (used in Google Books, Internet Archive)

multifont, multilanguage

https://en.wikipedia.org/wiki/OCRopus

Tesseract

document OCR

https://opensource.google.com/projects/tesseract

https://en.wikipedia.org/wiki/Tesseract_(software)

CuneiForm

https://en.wikipedia.org/wiki/CuneiForm_(software)

keras-ocr

EasyOCR

ABBYY (FineReader)

Google Docs OCR

Rossum

paid, online-only?

Amazon Rekognition

more for scene text?(verify)

paid, online-only

Amazon Textract
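As a rough illustration of how one of the engines listed above can be driven from a script, the sketch below builds and runs a Tesseract command line. It assumes the `tesseract` executable is installed and on the PATH; the helper names `build_tesseract_command` and `run_tesseract` are made up for this example and are not part of any library.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng", hocr=False):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG [hocr]."""
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    if hocr:
        # 'hocr' is a Tesseract config name that switches output to hOCR
        cmd.append("hocr")
    return cmd


def run_tesseract(image_path, output_base, lang="eng", hocr=False):
    """Run Tesseract and return the path of the file it is expected to write."""
    # Assumption: the tesseract binary is installed separately
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, lang, hocr)
    subprocess.run(cmd, check=True)
    return output_base + (".hocr" if hocr else ".txt")
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact invocation without having Tesseract installed.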

Convenience tools / wrappers

PowerToys Text Extractor

Works from a screen capture. More of a convenience tool.

For text rendered from fonts this can work quite well and fairly quickly, even in a photographic context, though accuracy degrades quickly on more creative text.

Lios


Output formats

hOCR

An HTML-based format that stores the positions of detected words or text fragments and, optionally, detected style, layout, and other information. The data is expressed as markup embedded in standard HTML or XHTML.

https://en.wikipedia.org/wiki/HOCR

https://pypi.org/project/hocr-spec/
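To make the hOCR description more concrete, here is a minimal sketch that pulls words, bounding boxes, and confidences out of an hOCR fragment using only the Python standard library. The sample markup is hand-written for illustration (not real OCR output), and `parse_title`, `HocrWordParser`, and `extract_words` are names invented for this example. It relies on the common hOCR convention of `ocrx_word` spans whose `title` attribute holds `bbox x0 y0 x1 y1` and optionally `x_wconf`.

```python
from html.parser import HTMLParser

# Hand-written sample, not output from a real OCR run
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 640 480'>
 <span class='ocr_line' title='bbox 10 20 300 60'>
  <span class='ocrx_word' title='bbox 10 20 120 60; x_wconf 96'>Hello</span>
  <span class='ocrx_word' title='bbox 140 20 300 60; x_wconf 91'>world</span>
 </span>
</div>
"""


def parse_title(title):
    """Split an hOCR title like 'bbox 1 2 3 4; x_wconf 90' into {name: [values]}."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


class HocrWordParser(HTMLParser):
    """Collects text, bbox, and confidence for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None when outside a word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            props = parse_title(attrs.get("title") or "")
            bbox = props.get("bbox")
            conf = props.get("x_wconf")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in bbox) if bbox else None,
                "conf": float(conf[0]) if conf else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def extract_words(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    return parser.words


for word in extract_words(SAMPLE_HOCR):
    print(word["text"], word["bbox"], word["conf"])
```

Using a tolerant HTML parser rather than a strict XML parser is a deliberate choice here, since hOCR files may be plain HTML rather than well-formed XHTML.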

@@ Line 86: / Line 86: @@
 -->
 ===Software===
-<!--
+{{stub}}
@@ Line 99: / Line 98: @@
 : https://opensource.google.com/projects/tesseract
 : https://en.wikipedia.org/wiki/Tesseract_(software)
+CuneiForm
+: https://en.wikipedia.org/wiki/CuneiForm_(software)
 keras-ocr
@@ Line 107: / Line 109: @@
-[https://learn.microsoft.com/en-us/windows/powertoys/text-extractor Powertoys's Text Extractor]
-: from screen capture.  More of a convenience tool
-: for text that comes from fonts this can work quite well, and fairly quickly, even in photographic context, though degrades quickly on more creative text
-CuneiForm
-: https://en.wikipedia.org/wiki/CuneiForm_(software)
@@ Line 155: / Line 147: @@
 IBM datacap[https://www.ibm.com/products/data-capture-and-imaging],
   Abbyy,
+====Convenience tools / wrappers====
+[https://learn.microsoft.com/en-us/windows/powertoys/text-extractor Powertoys's Text Extractor]
+: from screen capture.  More of a convenience tool
+: for text that comes from fonts this can work quite well, and fairly quickly, even in photographic context, though degrades quickly on more creative text
+[https://github.com/zendalona/lios Lios]
 -->
 ===Output formats===

Contents

OCR as a task

Software

Convenience tools / wrappers

Output formats

hOCR

ALTO

PAGE XML

abbyyXML
