Speech processing

From Helpful
Revision as of 13:07, 5 February 2024



Plots and visualisations

Oscillogram

Waveform view, i.e. amplitude plotted against time.


Spectrogram

Intonogram

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

The term intonograph sometimes seems to refer to a device built for speech analysis (a little more specific than, e.g., abusing a visicorder), and the plots it made are called intonograms.


...but most things called intonograms seem to be prints of computer analyses.

Most of them show an estimate of the fundamental frequency (F0) of the speech over time, i.e. its pitch contour.

Other things shown on the same plot tend to include the waveform, and may include intensity and e.g. time markers for manual annotation.


The term now seems to cover any sort of plot that combines such information, so e.g. Praat's Sound view (and perhaps its Manipulation view) would probably qualify.
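The pitch and intensity tracks that most intonograms plot can be computed along these lines. This is a minimal sketch in plain Python, assuming mono samples as floats in [-1, 1]. It uses simple time-domain autocorrelation for F0, which is only one of several pitch-estimation approaches and much cruder than dedicated pitch trackers such as Praat's. The function names, the 40 ms frames with a 10 ms hop, the 75 to 500 Hz search range and the 0.3 voicing threshold are all illustrative assumptions.

```python
import math


def frames(x, frame_len, hop):
    """Split samples into overlapping frames, dropping the incomplete tail."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def intensity_db(frame):
    """RMS level in dB relative to full scale (1.0); silence is floored at -240 dB."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-12))


def estimate_f0(frame, sr, fmin=75.0, fmax=500.0, voicing_threshold=0.3):
    """Crude autocorrelation pitch estimate in Hz, or None if the frame looks unvoiced."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]                # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                              # silent or constant frame
    lag_min = max(1, int(sr / fmax))             # shortest period we accept
    lag_max = min(int(sr / fmin), len(x) - 1)    # longest period we accept
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Normalised by total energy, so longer lags (fewer summed terms) are
        # slightly penalised, which discourages picking a multiple of the period.
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None                              # too little periodicity: call it unvoiced
    return sr / best_lag


def pitch_and_intensity(x, sr, frame_ms=40, hop_ms=10):
    """Return (time_s, f0_hz_or_None, intensity_db) per frame: the data an intonogram plots."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    track = []
    for n, frame in enumerate(frames(x, frame_len, hop)):
        centre = (n * hop + frame_len / 2) / sr  # frame centre, in seconds
        track.append((centre, estimate_f0(frame, sr), intensity_db(frame)))
    return track
```

A plotting library would then draw the F0 values (skipping the None frames) and the intensity against time, typically underneath the waveform.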


Simple modelling of speech

Source-filter model

The source-filter model is the idea that we can get a good approximation of speech with

  • a source: either a periodic, harmonic-rich signal at the fundamental pitch (for voiced sounds such as vowels) or noise (for unvoiced sounds such as many consonants)
  • a few filters that imitate the formants

https://en.wikipedia.org/wiki/Source%E2%80%93filter_model
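A minimal sketch of the source-filter idea in plain Python, assuming a 16 kHz sample rate: an impulse train (voiced source) or white noise (unvoiced source) is passed through a cascade of two-pole resonators, one per formant. The resonator is the classic all-pole formant filter in the style of Klatt-type formant synthesizers. The function names, the 100 Hz pitch, and the formant frequencies and bandwidths are hypothetical, rough values chosen for illustration, not measurements.

```python
import math
import random


def impulse_train(f0, sr, n):
    """Voiced source: a unit impulse at the start of every pitch period (harmonic-rich)."""
    out = [0.0] * n
    period = sr / f0
    k = 0
    while round(k * period) < n:
        out[round(k * period)] = 1.0
        k += 1
    return out


def noise_source(n, seed=0):
    """Unvoiced source: uniform white noise (seeded, so results are repeatable)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def resonator(x, freq, bandwidth, sr):
    """Two-pole (all-pole) filter with a resonance at `freq` Hz, about `bandwidth` Hz wide."""
    r = math.exp(-math.pi * bandwidth / sr)   # pole radius sets the bandwidth
    theta = 2 * math.pi * freq / sr           # pole angle sets the centre frequency
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + a1 * y1 + a2 * y2             # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize(source, formants, sr):
    """Filter the source through one resonator per (freq, bandwidth), then peak-normalise to 0.9."""
    y = source
    for freq, bw in formants:
        y = resonator(y, freq, bw, sr)
    peak = max(abs(s) for s in y) or 1.0
    return [0.9 * s / peak for s in y]


SR = 16000
# Hypothetical, rough formant values for an /a/-like vowel.
A_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]
vowel = synthesize(impulse_train(100, SR, SR // 2), A_FORMANTS, SR)
# Noise through one broad resonance: a very rough stand-in for an unvoiced consonant.
hiss = synthesize(noise_source(SR // 2), [(2500, 400)], SR)
```

The cascade puts energy near each formant regardless of the source, which is the point of the model: the same filters turn either the pulse train or the noise into something with that vowel's spectral envelope.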

LPC & PSOLA