OCR: Difference between revisions

Latest revision as of 19:25, 15 July 2023

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

OCR as a task

Software

OCRopus

document OCR (used in Google Books, Internet Archive)

multifont, multilanguage

https://en.wikipedia.org/wiki/OCRopus

Tesseract

document OCR

https://opensource.google.com/projects/tesseract

https://en.wikipedia.org/wiki/Tesseract_(software)

CuneiForm

https://en.wikipedia.org/wiki/CuneiForm_(software)

keras-ocr

https://keras-ocr.readthedocs.io/en/latest/

EasyOCR

https://github.com/JaidedAI/EasyOCR

ABBYY (FineReader)

paid

https://pdf.abbyy.com/

Google Docs OCR

online-only

Rossum

paid, online-only?

https://rossum.ai/lp/ocr-software/

Amazon Rekognition

more for scene text?(verify)

paid, online-only

Amazon Textract

Convenience tools / wrappers

Powertoys's Text Extractor

from screen capture. More of a convenience tool

works quite well, and fairly quickly, on text rendered from regular fonts, even in photographic contexts, though it degrades quickly on more stylized or decorative text

Lios

Document managers with OCR

Output formats

hOCR

An HTML-based format that stores the positions of detected words/fragments of text, and optionally detected style, layout, and other information, as markup embedded in ordinary HTML or XHTML.

https://en.wikipedia.org/wiki/HOCR

https://pypi.org/project/hocr-spec/
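
As a rough illustration of what reading hOCR looks like: a minimal sketch, assuming the common convention that each recognized word is an element with class ocrx_word whose title attribute carries a "bbox x0 y0 x1 y1" property (plus optional others such as x_wconf). The SAMPLE markup, the HocrWordParser class, and the parse_bbox / extract_words helpers are made up for this example and do not come from any particular OCR tool. It uses only the Python standard library, and is not a complete hOCR reader.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    # Properties are separated by ';', each is a name followed by values.
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) for every element with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []    # list of (text, bbox) tuples
        self._depth = 0    # > 0 while inside a word element
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._depth > 0:
            # Nested markup inside a word, e.g. <strong> or <em>.
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            self._depth = 1
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._depth > 0:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # The word element itself just closed.
                self.words.append(("".join(self._text).strip(), self._bbox))


def extract_words(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hand-written sample, only for illustration.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 640 480'>
 <span class='ocr_line' title='bbox 10 20 300 50'>
  <span class='ocrx_word' title='bbox 10 20 90 50; x_wconf 96'>Hello</span>
  <span class='ocrx_word' title='bbox 100 20 200 50; x_wconf 91'><strong>world</strong></span>
 </span>
</div>
"""

if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

The depth counter assumes that any tags nested inside a word are properly closed. Real-world output may need a more forgiving parser, and the other levels (pages, lines, paragraphs) would be read from their own classes in much the same way.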

@@ Line 86: / Line 86: @@
 -->
 ===Software===
-<!--
+{{stub}}
@@ Line 99: / Line 98: @@
 : https://opensource.google.com/projects/tesseract
 : https://en.wikipedia.org/wiki/Tesseract_(software)
+CuneiForm
+: https://en.wikipedia.org/wiki/CuneiForm_(software)
 keras-ocr
-:
+: https://keras-ocr.readthedocs.io/en/latest/
 EasyOCR
-:
+: https://github.com/JaidedAI/EasyOCR
-[https://learn.microsoft.com/en-us/windows/powertoys/text-extractor Powertoys's Text Extractor]
-: from screen capture.  More of a convenience tool
-: for text that comes from fonts this can work quite well, and fairly quickly, even in photographic context, though degrades quickly on more creative text
-CuneiForm
-: https://en.wikipedia.org/wiki/CuneiForm_(software)
 ABBYY (FineReader)
+: paid
+: https://pdf.abbyy.com/
 Google Docs OCR
+: online-only
 Rossum
 : paid, online-only?
+: https://rossum.ai/lp/ocr-software/
 Amazon Rekognition
@@ Line 133: / Line 129: @@
 : more for documents?{{verify}}
 : paid, online-only
 Transym
 : more for documents?{{verify}}
 : paid, online-only
+: https://transym.com/
+Integrated features / online APIs (i.e. not easy to automate)
+: Acrobat,
+: Google Keep,
+: Google Drive ('open with' converts),
+: OneNote,
+: IBM datacap[https://www.ibm.com/products/data-capture-and-imaging],
+====Convenience tools / wrappers====
+[https://learn.microsoft.com/en-us/windows/powertoys/text-extractor Powertoys's Text Extractor]
+: from screen capture.  More of a convenience tool
+: for text that comes from fonts this can work quite well, and fairly quickly, even in photographic context, though degrades quickly on more creative text
+[https://github.com/zendalona/lios Lios]
+====Document managers with OCR====
+<!--
 Apache Tika
 : geared at content analysis and indexing  (also metadata/document structure parser)
@@ Line 145: / Line 166: @@
 : https://tika.apache.org/
+Aleph
+: https://docs.aleph.occrp.org/
-Integrated features / online APIs (i.e. not easy to automate)
+-->
-Acrobat,
-Google Keep,
-Google Drive ('open with' converts),
-OneNote,
-IBM datacap[https://www.ibm.com/products/data-capture-and-imaging],
- Abbyy,
--->
 ===Output formats===
 ====hOCR====
-<!--
 A (HTML-based) format to store detected words/fragments of text's position,
@@ Line 168: / Line 181: @@
 https://en.wikipedia.org/wiki/HOCR
--->
+https://pypi.org/project/hocr-spec/
 ====ALTO====

Contents

OCR as a task

Software

Convenience tools / wrappers

Document managers with OCR

Output formats

hOCR

ALTO

PAGE XML

abbyyXML
