This article/section is a stub: probably a pile of half-sorted notes, probably a first version, and not well-checked, so it may have incorrect bits. (Feel free to ignore, or tell me)
Formants are acoustic resonances that arise from the physical space around where a sound is produced. They sit at frequencies determined by the size of that space, and are typically fairly broad.
Acoustically, this is just resonance and the idea could be extended to instruments, and even to rooms (see e.g. room modes), but the term originated to describe our voices.
We have another word for resonance in part because it describes cases where the resonances are movable, in particular in vocal tracts, where they vary per person and are partially controllable.
- F0 is the frequency at which the vocal folds vibrate (so sometimes called the fundamental), which is more quantitative than our varied use of the word pitch could be
- F1 is the frequency of the first formant/resonance
- F2 is the second formant
- F3 is the third formant
The higher formants (F3, F4, and F5) become progressively less informative for most tasks, so most practical research cares about at most the first two or three, and almost no one tries to go past five.
When you reproduce speech using the "saw/noise + formant filters" model, you mostly control those filters' frequencies, but may also get control over the width of each such filter (though bandwidth starts mattering little by the fourth formant).
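A minimal sketch of that "saw + formant filters" idea, using only the standard library. Everything named here is an assumption made for illustration: the naive sawtooth source, the two-pole resonator used as the formant filter, the cascade (series) arrangement, and the (frequency, bandwidth) pairs in `A_LIKE`. Real synthesizers differ in many details.

```python
import math

def sawtooth(f0, duration, fs):
    """Naive (not band-limited) sawtooth in [-1, 1), standing in for the voiced source."""
    n = int(duration * fs)
    return [2.0 * ((i * f0 / fs) % 1.0) - 1.0 for i in range(n)]

def resonator(signal, freq, bandwidth, fs):
    """Two-pole resonator: one formant filter with a centre frequency and bandwidth in Hz."""
    r = math.exp(-math.pi * bandwidth / fs)   # pole radius sets the bandwidth
    theta = 2.0 * math.pi * freq / fs         # pole angle sets the centre frequency
    b1 = 2.0 * r * math.cos(theta)
    b2 = -r * r
    gain = 1.0 - b1 - b2                      # normalises the gain to 1 at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synth_vowel(f0, formants, duration=0.5, fs=16000):
    """Pass the source through each formant filter in series, then normalise the peak."""
    signal = sawtooth(f0, duration, fs)
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, fs)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]

# Assumed, illustrative (frequency, bandwidth) pairs in Hz for an /a/-like vowel.
A_LIKE = [(730, 80), (1090, 90), (2440, 120)]
samples = synth_vowel(110, A_LIKE)
```

Changing `f0` changes the perceived pitch while the vowel quality stays roughly the same; changing the formant frequencies does the opposite. Swapping the sawtooth for noise would give a whispered version of the same vowel.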
Formants help explain how vowels are such easily agreed on and understood things, even though there is so much variation between vocal tracts, e.g. in their length and overall pitch.
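To get a feel for how tract length affects formants, a common textbook approximation treats the vocal tract as a uniform tube closed at one end and open at the other, with resonances at F_n = (2n - 1) c / (4 L). The sketch below assumes that model, a rounded speed of sound of 350 m/s, and two illustrative tract lengths. It describes a neutral, schwa-like tract shape, not real vowels.

```python
SPEED_OF_SOUND = 350.0  # m/s, assumed round value for warm, humid air

def tube_formants(length_m, count=3, c=SPEED_OF_SOUND):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]

adult = tube_formants(0.175)  # about 17.5 cm, an often-quoted adult tract length
short = tube_formants(0.14)   # a shorter tract, assumed for illustration

print([round(f) for f in adult])  # [500, 1500, 2500]
print([round(f) for f in short])  # [625, 1875, 3125]
```

In this simplified model a shorter tract shifts every formant up by the same factor, so the ratios between formants stay the same.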
It turns out that vowels are not settled by pitch directly (we understand vowels at different pitches), but by the relation between the frequencies of these resonant points. When judging vowels, we mainly hear the relation between F1 and F2. For example, vowel charts are plots in terms of these two formants (F1 by F2, sometimes F1 by F2-F1), with no F0 in sight.
- formants are usually some distinguishable distance apart, though not always quite as much or as cleanly as theory might like.
- You can get quite far in identification with just the first two formants
- F1 is roughly related to how high the tongue is (verify)
- F1 is often somewhere around 500Hz (within 200..1000Hz)
- F2 is roughly related to how far back the tongue is (verify)
- F2 is often somewhere around 1500Hz (500..2500Hz)
- F3 is somewhere around 2500Hz.
- F3 relates to nasal sounds, but the nasal cavity is barely controllable and varies less between people, so F3 is less informative in general
- F3 still helps point out e.g. r-colored vowels like the American r: r-colored vowels have a lower F3 than regular vowels
- consonants that don't interrupt voicing (laterals, approximants) can set up some specific extra resonances too (verify).
- While the F0 of a voice varies (roughly 100Hz for men, 200Hz for women, and around 300Hz for children), the formants vary less.
- That said, our judgment of whether a voice is male, female, or child is informed by both F0 and the formants (verify)
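The point that the first two formants get you quite far in identification can be sketched as a nearest-neighbour lookup in the F1/F2 plane. The reference values below are approximate adult-male averages as commonly quoted from Peterson and Barney's 1952 measurements, keyed by their example words. They are an assumption brought in for illustration, as are the log-frequency distance and the function name `nearest_vowel`.

```python
import math

# Approximate (F1, F2) averages in Hz for adult male speakers, keyed by example word.
# Commonly quoted textbook figures; treat them as illustrative reference points.
REFERENCE_VOWELS = {
    "heed": (270, 2290),
    "head": (530, 1840),
    "had": (660, 1720),
    "hod": (730, 1090),
    "who'd": (300, 870),
}

def nearest_vowel(f1, f2, table=REFERENCE_VOWELS):
    """Return the reference vowel closest to (f1, f2), measured on a log-frequency scale."""
    def dist(ref):
        return math.hypot(math.log(f1 / ref[0]), math.log(f2 / ref[1]))
    return min(table, key=lambda word: dist(table[word]))

print(nearest_vowel(300, 2200))  # heed
print(nearest_vowel(700, 1150))  # hod
```

Note that F0 does not appear anywhere in the lookup. A real classifier would also need to normalise for speaker differences, since the same vowel has somewhat different formant values across men, women, and children.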