OCR

From Helpful

Revision as of 19:21, 15 July 2023

The physical and human aspects dealing with audio, video, and images

Vision and color perception: objectively describing color · the eyes and the brain · physics, numbers, and (non)linearity · color spaces · references, links, and unsorted stuff

Image: file formats · noise reduction · halftoning, dithering · illuminant correction · Image descriptors · Reverse image search · image feature and contour detection · OCR · Image - unsorted

Video: format notes · encoding notes · On display speed · Screen tearing and vsync


Audio physics and physiology: Sound physics and some human psychoacoustics · Descriptions used for sound and music

Noise stuff: Stray signals and noise · sound-related noise names · electronic non-coupled noise names · electronic coupled noise · ground loop · strategies to avoid coupled noise · Sampling, reproduction, and transmission distortions · (tape) noise reduction


Digital sound and processing: capture, storage, reproduction · on APIs (and latency) · programming and codecs · some glossary · Audio and signal processing - unsorted stuff

Music electronics: device voltage and impedance, audio and otherwise · amps and speakers · basic audio hacks · Simple ADCs and DACs · digital audio · multichannel and surround
On the stage side: microphones · studio and stage notes · Effects · sync


Electronic music:

Electronic music - musical terms
MIDI · Some history, ways of making noises · Gaming synth · microcontroller synth
Modular synth (eurorack, mostly):
sync · power supply · formats (physical, interconnects)
DAW: Ableton notes · MuLab notes · Mainstage notes


Unsorted: Visuals DIY · Signal analysis, modeling, processing (some audio, some more generic) · Music fingerprinting and identification

For more, see Category:Audio, video, images

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

OCR as a task

Software



OCRopus

document OCR (used in Google Books, Internet Archive)
multifont, multilanguage
https://en.wikipedia.org/wiki/OCRopus

Tesseract

document OCR
https://opensource.google.com/projects/tesseract
https://en.wikipedia.org/wiki/Tesseract_(software)
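A minimal sketch of driving the Tesseract command line from Python, assuming a `tesseract` binary is on PATH. The CLI form is `tesseract imagename outputbase [options] [configfile]`, and the trailing config name (`hocr`, `alto`, `tsv`, `pdf`) selects one of the output formats listed under Output formats, with plain text as the default (`alto` needs Tesseract 4.1 or newer). The function names here are made up for illustration.

```python
import shutil
import subprocess

# Config names that select Tesseract's output format; "alto" needs Tesseract 4.1+.
OUTPUT_CONFIGS = {"txt", "hocr", "alto", "tsv", "pdf"}


def tesseract_cmd(image, outbase, fmt="txt", lang="eng"):
    """Build the argument list for `tesseract IMAGE OUTBASE -l LANG [CONFIG]` (hypothetical helper)."""
    if fmt not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported output format: {fmt!r}")
    cmd = ["tesseract", image, outbase, "-l", lang]
    if fmt != "txt":  # plain text is what you get without a config name
        cmd.append(fmt)
    return cmd


def run_tesseract(image, outbase, fmt="txt", lang="eng"):
    """Run Tesseract; it writes OUTBASE plus a format-specific extension (e.g. page.hocr)."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("no tesseract binary on PATH")
    subprocess.run(tesseract_cmd(image, outbase, fmt, lang), check=True)


if __name__ == "__main__":
    # Only builds and prints the command, so this runs without Tesseract installed.
    print(tesseract_cmd("page.png", "page", "hocr"))
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything touches the filesystem.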

CuneiForm

https://en.wikipedia.org/wiki/CuneiForm_(software)

keras-ocr

https://keras-ocr.readthedocs.io/en/latest/

EasyOCR

https://github.com/JaidedAI/EasyOCR
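EasyOCR's `Reader.readtext()` by default returns a list of `(box, text, confidence)` tuples, where `box` holds the four corner points of a detection, and each detection comes back as its own entry. A rough sketch of common post-processing: drop low-confidence hits and group the rest into lines in reading order. It assumes mostly horizontal text; `reading_order`, `min_conf`, and `line_tol` (in pixels) are hypothetical names and arbitrary thresholds for this sketch, not EasyOCR parameters.

```python
def reading_order(results, min_conf=0.5, line_tol=10):
    """Turn (box, text, confidence) detections into text lines, top-to-bottom and left-to-right.

    box is four [x, y] corner points, the shape EasyOCR's readtext() uses by default.
    min_conf and line_tol (pixels) are arbitrary thresholds for this sketch.
    """
    kept = []
    for box, text, conf in results:
        if conf < min_conf:
            continue  # drop low-confidence detections
        top = min(y for _, y in box)
        left = min(x for x, _ in box)
        kept.append((top, left, text))
    kept.sort()  # top-most first, then left-most

    lines = []  # each entry: (top of the line's first word, [(left, text), ...])
    for top, left, text in kept:
        if lines and top - lines[-1][0] <= line_tol:
            lines[-1][1].append((left, text))  # close enough vertically: same line
        else:
            lines.append((top, [(left, text)]))
    return [" ".join(t for _, t in sorted(words)) for _, words in lines]


# Hand-made detections in readtext()'s default shape (not real EasyOCR output).
SAMPLE_DETECTIONS = [
    ([[130, 22], [300, 22], [300, 50], [130, 50]], "world", 0.91),
    ([[10, 20], [120, 20], [120, 50], [10, 50]], "Hello", 0.97),
    ([[10, 80], [90, 80], [90, 110], [10, 110]], "Second", 0.88),
    ([[100, 85], [150, 85], [150, 110], [100, 110]], "~", 0.12),
]

if __name__ == "__main__":
    print(reading_order(SAMPLE_DETECTIONS))
```

With EasyOCR installed, `results` would come from something like `easyocr.Reader(['en']).readtext('photo.jpg')`. Comparing against each line's first word keeps the grouping simple, at the cost of misgrouping on skewed or curved text.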



ABBYY (FineReader)

Google Docs OCR

Rossum

paid, online-only?

Amazon Rekognition

more for scene text?(verify)
paid, online-only

Amazon Textract

more for documents?(verify)
paid, online-only


Transym

more for documents?(verify)
paid, online-only


Apache Tika

geared at content analysis and indexing (also metadata/document structure parser)
uses Tesseract for OCR
https://tika.apache.org/


Integrated features / online APIs (i.e. not easy to automate)

Acrobat
Google Keep
Google Drive ('Open with > Google Docs' converts images and PDFs to editable text)
OneNote
IBM Datacap[1]


Convenience tools / wrappers

PowerToys' Text Extractor

OCR on a screen capture. More of a convenience tool.
For text rendered from fonts this can work quite well, and fairly quickly, even in a photographic context, though it degrades quickly on more creative text.

Lios




Output formats

hOCR

An HTML-based format that stores the positions of detected words and text fragments, and optionally their style, layout, and other information, as markup within HTML or XHTML.


https://en.wikipedia.org/wiki/HOCR

https://pypi.org/project/hocr-spec/
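In hOCR, an element's `class` says what it is (`ocr_page`, `ocr_line`, `ocrx_word`, ...) and its geometry sits in the `title` attribute as `;`-separated properties, such as `bbox x0 y0 x1 y1` (top-left and bottom-right pixel corners) and, in Tesseract's output, `x_wconf` (word confidence, 0 to 100). A minimal stdlib-only sketch that pulls out the words; the class and function names are hypothetical, and the sample is hand-written and trimmed, not real Tesseract output.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Read the bbox and x_wconf properties from an hOCR title attribute."""
    bbox, conf = None, None
    for prop in title.split(";"):
        parts = prop.split()
        if not parts:
            continue
        if parts[0] == "bbox" and len(parts) == 5:
            bbox = tuple(int(v) for v in parts[1:])
        elif parts[0] == "x_wconf" and len(parts) == 2:
            conf = float(parts[1])
    return bbox, conf


class HocrWordParser(HTMLParser):
    """Collect (text, bbox, confidence) for every element with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0  # nesting depth inside the current word element (0 = not in a word)
        self._text = []
        self._bbox = self._conf = None

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # e.g. <strong>/<em> inside a word: only track nesting
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            self._depth = 1
            self._text = []
            self._bbox, self._conf = parse_title(attrs.get("title") or "")

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # the word element itself just closed
            self.words.append(("".join(self._text).strip(), self._bbox, self._conf))

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


SAMPLE_HOCR = """<div class='ocr_page' title='bbox 0 0 600 400'>
 <span class='ocr_line' title='bbox 10 20 300 50'>
  <span class='ocrx_word' title='bbox 10 20 120 50; x_wconf 96'>Hello</span>
  <span class='ocrx_word' title='bbox 130 20 300 50; x_wconf 88'><strong>world</strong></span>
 </span>
</div>"""

if __name__ == "__main__":
    parser = HocrWordParser()
    parser.feed(SAMPLE_HOCR)
    parser.close()
    print(parser.words)
```

This assumes well-formed markup; a void tag such as `<br>` inside a word element would throw off the depth counter.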

ALTO

https://en.wikipedia.org/wiki/ALTO_(XML)
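ALTO stores recognized words as `String` elements (the word in `CONTENT`, its box in `HPOS`/`VPOS`/`WIDTH`/`HEIGHT`) inside `TextLine`, `TextBlock`, and `Page`, with `SP` and `HYP` elements marking spaces and hyphens. The namespace URI differs between ALTO versions, so this sketch matches on local tag names only. The function name is hypothetical, the sample is trimmed and not schema-valid, and coordinates are in whatever `MeasurementUnit` the file declares (often pixels).

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix; ALTO v2/v3/v4 use different namespace URIs."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one list per TextLine of (word, (hpos, vpos, width, height)) tuples."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iter():
        if local_name(line.tag) != "TextLine":
            continue
        words = []
        for el in line:
            if local_name(el.tag) == "String":  # SP (space) and HYP (hyphen) are skipped
                box = tuple(float(el.get(k, "0")) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
                words.append((el.get("CONTENT", ""), box))
        lines.append(words)
    return lines


SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1" WIDTH="600" HEIGHT="400" PHYSICAL_IMG_NR="1">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1" HPOS="10" VPOS="20" WIDTH="290" HEIGHT="30">
            <String CONTENT="Hello" HPOS="10" VPOS="20" WIDTH="110" HEIGHT="30"/>
            <SP/>
            <String CONTENT="world" HPOS="130" VPOS="20" WIDTH="170" HEIGHT="30"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

if __name__ == "__main__":
    for words in alto_lines(SAMPLE_ALTO):
        print(" ".join(w for w, _ in words))
```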

PAGE XML

https://en.wikipedia.org/wiki/PAGE_(XML)
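PAGE XML nests `TextRegion` > `TextLine` > `Word` > `Glyph`. In the newer schema versions each of these has a `Coords` polygon (`points="x1,y1 x2,y2 ..."`) and optionally a `TextEquiv/Unicode` child with the recognized text. A sketch that reads line-level text and reduces each polygon to a bounding box. It only looks at each line's direct `TextEquiv`, so word-level text doesn't get mixed in. Function names are hypothetical; the sample is trimmed (one `Word` shown), and the namespace is matched loosely because it differs per schema version.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def first_child(el, name):
    """First *direct* child with this local name, or None."""
    for c in el:
        if local_name(c.tag) == name:
            return c
    return None


def page_lines(xml_text):
    """Return (line_id, text, bbox) per TextLine; bbox = (x0, y0, x1, y1) around its Coords polygon."""
    root = ET.fromstring(xml_text)
    result = []
    for el in root.iter():
        if local_name(el.tag) != "TextLine":
            continue
        coords = first_child(el, "Coords")
        points = []
        if coords is not None:
            points = [tuple(int(v) for v in p.split(",")) for p in coords.get("points", "").split()]
        bbox = None
        if points:
            xs = [x for x, _ in points]
            ys = [y for _, y in points]
            bbox = (min(xs), min(ys), max(xs), max(ys))
        text = ""
        equiv = first_child(el, "TextEquiv")  # the line's own text, not its Word children's
        if equiv is not None:
            uni = first_child(equiv, "Unicode")
            if uni is not None and uni.text:
                text = uni.text
        result.append((el.get("id"), text, bbox))
    return result


SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="scan.png" imageWidth="600" imageHeight="400">
    <TextRegion id="r1">
      <Coords points="5,15 305,15 305,55 5,55"/>
      <TextLine id="r1l1">
        <Coords points="10,20 300,20 300,50 10,50"/>
        <Word id="r1l1w1">
          <Coords points="10,20 120,20 120,50 10,50"/>
          <TextEquiv><Unicode>Hello</Unicode></TextEquiv>
        </Word>
        <TextEquiv><Unicode>Hello world</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(page_lines(SAMPLE_PAGE))
```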

abbyyXML

https://support.abbyy.com/hc/en-us/articles/360017336699-ABBYY-FineReader-Engine-XML-Export