Voice recognition

The earliest talking products, often toys, were essentially minimal record players: wind-up, non-electronic mechanisms that played fixed phrases from small plastic records.

Speech ICs - mostly early ones

Around the seventies, it became viable to create ICs that produce speech, though it was fairly basic.

Synthesizing arbitrary words is hard to do, because you then need

some way to map all words to phonemes

languages like English have a lot of irregular pronunciations, even in common words

and you still have to do something sensible with unseen words

phoneme blending

just playing phonemes back to back does not sound like connected speech

decent intonation

which is often contextual
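The word-to-phoneme step can be sketched as an exception lookup with a rule-based fallback. This is a minimal illustration, not how any particular chip did it: the phoneme symbols are ARPAbet-like, and the tables EXCEPTIONS, DIGRAPHS and LETTERS are tiny made-up examples.

```python
# Hypothetical exception dictionary: irregular words are looked up whole.
EXCEPTIONS = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
    "the": ["DH", "AH"],
}

# Naive letter-to-sound rules (illustrative only, far from complete).
DIGRAPHS = {"sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ee": ["IY"], "oo": ["UW"]}
LETTERS = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"],
    "f": ["F"], "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"],
    "k": ["K"], "l": ["L"], "m": ["M"], "n": ["N"], "o": ["AA"],
    "p": ["P"], "q": ["K"], "r": ["R"], "s": ["S"], "t": ["T"],
    "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"], "y": ["Y"],
    "z": ["Z"],
}


def to_phonemes(word):
    """Return a phoneme list: exceptions first, then rules for unseen words."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:          # prefer the longer match
            phonemes.extend(DIGRAPHS[pair])
            i += 2
        elif word[i] in LETTERS:
            phonemes.extend(LETTERS[word[i]])
            i += 1
        else:
            i += 1                    # skip characters we have no rule for
    return phonemes
```

The fallback rules get "sheep" right but would mangle "one" without the exception entry, which is the reason real systems need both. Blending and intonation are not addressed here at all.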

Earlier ICs had a fixed set of known words, each mapped to the phonemes and transitions to use.

Some variants only accepted phoneme input in the first place (usually allophone style).

There were also slightly more capable variants used in speech research.

TMC0280 / TMS5100 and related (1978)

used in the Speak & Spell, Apple II Echo 2, arcade games

these do linear predictive coding[2] style synthesis, so what you feed them is essentially vocal tract parameters, not phonemes

so these were typically driven from a fixed vocabulary of around 200 words, and letters

though apparently a few things made it do more arbitrary things?

Emulation: Yes, e.g. by MAME
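To make "vocal tract parameters, not phonemes" a little more concrete, here is a minimal sketch of LPC-style synthesis: an excitation signal (pulse train for voiced sounds, noise for unvoiced) is pushed through an all-pole filter whose coefficients describe the vocal tract for one frame. The function names and the direct-form filter are assumptions made for illustration; as far as I know the actual chips use a lattice filter driven by reflection coefficients, and their frame format is not modeled here.

```python
import random


def excitation(n_samples, pitch_period, rng=None):
    """Pulse train when voiced (pitch_period > 0), white noise otherwise."""
    rng = rng or random.Random(0)
    if pitch_period > 0:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def lpc_synthesize(coeffs, gain, source):
    """All-pole filter: y[n] = gain * x[n] - sum_k coeffs[k] * y[n - 1 - k]."""
    out = []
    for n, x in enumerate(source):
        y = gain * x
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                y -= a * out[n - 1 - k]
        out.append(y)
    return out


# One hypothetical voiced frame: pitch period in samples, gain, filter coefficients.
frame = {"pitch_period": 80, "gain": 0.3, "coeffs": [-1.2, 0.8]}
samples = lpc_synthesize(
    frame["coeffs"], frame["gain"], excitation(200, frame["pitch_period"])
)
```

A word in the fixed vocabulary is then just a stored sequence of such frames, which is why arbitrary text is out of reach without something upstream that generates the parameters.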

TMS5220 (1980), TMS5220C (1983), TSP50C50 (1985), TSP50C40 (1986)

improvements on the same idea, used in later products

Emulation: Talkie (Arduino) [3] seems to imitate the TMS5220 (verify)

Toshiba T6721A

Commodore Magic Voice

known vocabulary (also approx. 200 words), refuses anything else

Emulation: Yes, e.g. by YAPE(verify)

Votrax SC-01, SC-01A (1980?)

phoneme chips

often with something else looking up words or following syntax rules

http://www.redcedar.com/sc01.htm

Emulation: Yes, e.g. by MAME [4], and e.g. [5][6] put that MAME code on an STM32 to act as a replacement for a broken IC

SSI 263A (a.k.a. SC-02?) (1985?)

SP0256-AL2 [7] (1980s?)

follows some basic English phonetic rules

The -AL2 means there are English allophones in there - there are other variants

interfacing

8 bits of data, plus some latching

you would typically load an address that contains allophone data (the ROM actually controls pitch, amplitude, formants)

(could you send this manually? Or would you have to emulate an external ROM?)
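The "data plus some latching" interface can be sketched as a small handshake loop: wait until the chip will accept another address, put the address on the data lines, then pulse the strobe. This is a sketch under assumptions: the signal names ALD (address load strobe) and LRQ (load request), and their active-low polarity, are from my memory of the datasheet and should be verified, and FakeBus is a made-up stand-in for real GPIO access.

```python
import time


class FakeBus:
    """Hypothetical stand-in for GPIO; records what gets latched."""

    def __init__(self):
        self.data = 0
        self.latched = []

    def read_lrq(self):
        return 0  # pretend the chip is always ready (assumed: low = ready)

    def write_data(self, value):
        self.data = value

    def set_ald(self, level):
        # Assumed: the address is latched when the strobe goes low.
        if level == 0:
            self.latched.append(self.data)


def send_addresses(bus, addresses, poll_delay=0.0):
    """Send each 8-bit address once the chip signals it can take another."""
    for address in addresses:
        if not 0 <= address <= 0xFF:
            raise ValueError("address must fit in 8 bits: %r" % (address,))
        while bus.read_lrq() != 0:   # busy: input buffer still full
            time.sleep(poll_delay)
        bus.write_data(address)
        bus.set_ald(0)               # strobe low: latch the address
        bus.set_ald(1)               # release the strobe


bus = FakeBus()
send_addresses(bus, [1, 2, 3])
```

On real hardware the three FakeBus methods would be replaced by pin reads and writes, with strobe timing taken from the datasheet.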

Instruction set

examples

Currah Speech 64, a.k.a. Voice Messenger [8], Tandy Speech/Sound Cartridge (alongside an AY-3-8913, see PSGs) [9], Amstrad SSA-1

Emulation: Yes, MAME

CTS256A-AL2

MEA8000 [10] Compared to e.g. an SP0256, this has neither the microprocessor nor the ROM. It is intended to be controlled by a separate microprocessor (which has its own ROM), so you send it the saw/noise + formant model parameters yourself.

Emulator: Yes, in MAME [11]
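The saw/noise + formant model mentioned for the MEA8000 can be illustrated with a generic source-filter sketch: a sawtooth source is passed through a cascade of two-pole resonators, one per formant. The resonator formula is the standard digital resonator used in formant synthesizers; everything else (function names, sample rate, the formant values) is an assumption for illustration and says nothing about the chip's actual parameter format. The noise source for unvoiced sounds is left out.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unity gain at DC."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate):
    """Filter samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def sawtooth(n_samples, pitch_hz, sample_rate):
    """Sawtooth in [-1, 1) as a crude voiced source."""
    return [2.0 * ((n * pitch_hz / sample_rate) % 1.0) - 1.0 for n in range(n_samples)]


def synthesize(formants, pitch_hz, n_samples, sample_rate=8000):
    """Cascade one resonator per (frequency, bandwidth) pair over the source."""
    signal = sawtooth(n_samples, pitch_hz, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Made-up formant values, roughly vowel-like.
vowel = synthesize([(700, 130), (1200, 70)], pitch_hz=110, n_samples=400)
```

The controlling microprocessor's job is then to update pitch and formant values every few milliseconds, which is where the phoneme and transition knowledge has to live.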

PCF8200 - like the MEA8000 (seemingly based on it), but a little more capable.

DECtalk

A fairly large board (verify), though the later DECtalk Express made it more portable.

Emulation: Yes, e.g. DTC-01 (MAME), https://github.com/connornishijima/80speak

Stephen Hawking's voice is an oddball, because it is actually the voice of Dennis Klatt, the engineer who initially made his system. At the time, it was the best automated speech you could get.

Klatt's work went into other products, like the DECtalk.

While the early variant Hawking used sounded robotic, he refused upgrades, in part because he came to identify with the voice over time. He also seems to have appreciated the work of Klatt, who continued his work even as he lost his own voice.

Hardware-wise, the Speech Plus CallText 5010 (a model specific to Hawking) is basically a custom computer (quite old, based on an 80188), and the most interesting part of it is the DSP that translates formant descriptors to sound.

https://speechkit.io/blog/stephen-hawkings-voice/

unsorted

RoboVoice SP0-512

English text to speech, relatively basic

SP0-512-Datasheet.pdf

Franklin Language Master LM4000

DT1050 Digitalker

http://vtda.org/docs/components/NatSemi/Digitalker/IM-FL30M120_DT1050_Digitalker_Datasheet_Dec80.pdf

V30120

Emic 2

Fonix DECtalk ?

more recent

6188 (2003?), SYN6288 (2010?)

Chinese, other?

WTS701

XFS5152CE (much more recent?)

arbitrary text, Chinese and English

Software (also starting early)

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Voice recognition and text to speech
