Praat notes



This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

How Praat thinks

The list

Loading files gives them an entry in the list.

You can save items from the list.

You can select one or more objects from the list, and then use one of the (applicable) buttons to do something useful with that object or combination of objects.


Aside from selecting by hand and pressing a button, you can also select objects programmatically, by name or by id (a serial number: the how-manieth object that was added (verify)).

This means you can create scripts to do things for you -- and use the interactive commands they create as building blocks for larger scripts.

...but more on scripting later.


You would be forgiven for thinking this is a project you can save.

It is not.

You can not save the list itself.

The list is a temporary scratch area - it is not remembered between runs of Praat.

Whether you're working interactively or writing scripts, you have to keep track of what you are doing right now.

So the list represents something like "the things I need for what I am currently working on".

Suggestion: Think like a database

Objects and object combinations

Many objects have a View & Edit button to, well, view and possibly edit them.


There are also combinations of objects that bring up new buttons (based on the combination of types).

There are also hints: buttons with question marks, like TextGrid's "Edit with Sound?", suggest that if you select both a Sound and a TextGrid, you will get a new button.

For example:

  • Sound and TextGrid together have a View & Edit that combines those two objects' basic views, meant for annotation


Not every combination that exists is hinted at, mind, e.g. Sound and Pitch.


Object types

There are quite a few object types, many of which you may never use.


Roughly in order of how quickly you will probably see or need it:


Some of the more commonly used object types

Sound[1]

waveform/PCM data
often mono, but can be multi-channel
View & Edit shows waveform and spectrogram
as the legend hints:
blue trace is pitch (note it's on a different Y axis from the shown spectrogram, because it'd otherwise be at the bottom)
red dots are formant places
green/yellow is intensity
pulses are shown in the waveform

LongSound[2]

for things that won't necessarily fit in memory (this is less of a concern these days)
more restricted than Sound (verify)


Spectrogram

spectrum over time (STFT), defaulting to a 5 ms (0.005 s) window length
if you only want to see this visual, you may not need to create a Spectrogram object: things that include sound data (e.g. Sound object, Manipulation object) tend to show a spectrogram in their View & Edit


Manipulation

LPC/PSOLA style speech analysis of Sound object
View & Edit shows:
pulses (if extracted becomes a PointProcess)
estimated pitch (extracted as PitchTier)
duration (extracted as DurationTier)
contains a little more
in particular the original Sound, e.g. for comparison's sake


Pitch

periodicity candidates over time
in equally sized/spaced frames (note that the evidence for these may come from different amounts of pulses, and PitchTiers show them that way)

PitchTier[3]

basically a set of (timestamp, pitch_in_hz)
probably extracted via a Manipulation object
Praat itself, if asked about pitch at a point, interpolates between these points (and extends outwards before the first and after the last value)
can also be altered (and drawn) to resynthesize LPC/PSOLA type things with different vocal pitch
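
The lookup behaviour described above (interpolate between points, hold the first and last values outside them) can be sketched in Python. This is only a sketch of the idea, not Praat's own code: the function name `pitch_at` and the example points are made up for illustration, and linear interpolation is assumed.

```python
from bisect import bisect_right


def pitch_at(points, t):
    """Look up pitch at time t in a list of (time_s, pitch_hz) points.

    points must be sorted by time. Between points the value is linearly
    interpolated; before the first and after the last point, the nearest
    point's value is held constant.
    """
    if not points:
        raise ValueError("need at least one point")
    times = [time for time, _ in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t)          # first point strictly after t
    (t0, f0), (t1, f1) = points[i - 1], points[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


# Hypothetical tier with two points: 100 Hz at 1 s, 200 Hz at 3 s.
example_points = [(1.0, 100.0), (3.0, 200.0)]
```

With these example points, asking for a time before 1 s gives the first value, asking after 3 s gives the last value, and anything in between lies on the straight line connecting them.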


Intensity[4]

intensity values at regular intervals

IntensityTier[5] - amplitude envelope

(timestamp, intensity)


PointProcess[6]

a sequence of points in time, e.g. marking vocal pulses
mostly related to LPC/PSOLA pitch work



Strings[7]

ordered list of strings

Table


Matrix


More specific and/or lesser-used object types

ExperimentMFC - Multiple Forced Choice[8] style listening experiment



The praat picture window

Mainly used for making plots from data, which can then be saved as raster (PNG) or vector (EPS, PDF) images.

You won't need this until you do, so you can close it. I've seen people make startup scripts specifically to close it.

Automation and scripting

As hinted at above, you can automate Praat.


At the end of the day, a script is a bunch of text lines naming actions, equivalent to doing those actions manually.


You can write it yourself, but almost everything you do in the Praat GUI is recorded into a history, and that history can be pasted into a script. This is often the easiest way to create a useful script -- or at least to do most of it, to then maybe edit a little.


The easiest way to at least reduce the amount of clicking when you want to do something to a large set of sound files is just to use that recording.

No coding required.


But coding makes things more powerful - it adds, among other things,

  • a way to ask for user input (primarily for the parameters you then hand into an existing action)
  • a basic scripting language that lets you do conditions and loops and other basics you will probably need to express your wishes.


💤 (This recordability of actions is also part of why Praat seems a little clunky at first. There is no fundamental difference between functionality already in Praat, and what you add later - it's all just buttons triggering specific actions. This action nature is also why some actions have old (and sometimes stupid) defaults: to not break what older scripts did when they relied on those old defaults)



https://www.fon.hum.uva.nl/praat/manual/History_mechanism.html New script, paste history


https://www.fon.hum.uva.nl/praat/manual/Scripting.html

Common tasks

Recording sound

New → Record mono Sound...


Can record one or more fragments, and Save each to List. To do just one, you may like Save to List & Close


🛈 Hint: Good control of recording levels makes for clean recordings

Try to avoid very quiet recordings

Praat will scale up whatever you recorded, to make it visible.

This is great in that you always see a waveform, and great when you meant to do exactly what you did - say, experiment with noise.

Yet when you did not intend to make a quiet recording, this creates and then hides an issue, namely that quiet recordings are almost always also quite noisy.
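
The "scale up to make it visible" behaviour can be imitated with a few lines of Python. This is a sketch of the idea only, not Praat's drawing code: the function name and the sample values are hypothetical, and the display is assumed to run from -1 to 1.

```python
def by_whole_display_gain(samples):
    """Gain that makes the loudest sample in the whole sound fill the display."""
    peak = max(abs(s) for s in samples)
    return 1.0 / peak if peak > 0 else 1.0


# Hypothetical sample values: low-level noise plus one loud tap.
with_tap = [0.001, -0.002, 0.8, 0.001]
# The same recording with the tap set to zero.
without_tap = [0.001, -0.002, 0.0, 0.001]

gain_with_tap = by_whole_display_gain(with_tap)        # noise stays tiny on screen
gain_without_tap = by_whole_display_gain(without_tap)  # noise now fills the screen
```

Without the tap, the gain is hundreds of times larger, so the same noise that was a flat line before now looks like a loud signal. Nothing about the recording changed, only how it is drawn.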


"Why does that make it noisy?"

Because there is always some noise in the recording, both from the room you are in and from the devices you use, and the quieter the sound you record, the closer to that noise it is.

Yes, you can amplify it later, but you will also amplify the noise.

The less you have to amplify, the more you avoid this.
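
A small numeric sketch of that reasoning, with made-up levels (the RMS values and the 20 dB gain are assumptions chosen for round numbers): amplifying afterwards raises signal and noise together, so the signal-to-noise ratio stays where it was at recording time.

```python
import math


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB, from two RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)


def amplify(level, gain_db):
    """Apply a gain in dB to a linear amplitude."""
    return level * 10.0 ** (gain_db / 20.0)


noise = 0.001          # hypothetical fixed noise floor of room plus device
quiet_speech = 0.01    # recorded too quietly
loud_speech = 0.1      # recorded at a healthy level

snr_quiet = snr_db(quiet_speech, noise)
# Amplifying the quiet recording by 20 dB amplifies its noise by 20 dB too.
snr_quiet_amplified = snr_db(amplify(quiet_speech, 20.0), amplify(noise, 20.0))
snr_loud = snr_db(loud_speech, noise)
```

Here the quiet recording sits 20 dB above the noise before and after amplification, while the properly recorded one sits 40 dB above it.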


To illustrate that effect to yourself:

  • record some gentle taps on the microphone but otherwise be quiet.
  • View & Edit the sound
you should see only the taps.
  • remove the taps (select them, then Sound → Set selection to zero)
the rest will suddenly look loud
  • undo that, then go to Sound → Sound scaling, and in particular compare 'by whole' (the default) and 'fixed range'
    • 'by whole' - looks for the loudest sample in the whole sound
    • 'by window' - looks for the loudest part within the currently zoomed area (if stereo: max of both channels)
    • 'by window and channel' - by maximum used range (if stereo: individually)
    • 'fixed height' - seems to be "amount around calculated average"
so if you say 2 you basically see the whole thing (but might hide a DC offset - not that you generally care about that)
    • 'fixed range' - seems to be "give min and max (average implied)"
so if you say -1 and 1, you see the whole thing


To set up reasonable recording levels

Tell your subject to talk at reasonably loud levels, and look at the green/yellow/red indicator

  • if it barely registers, or is not visible at all, increase the input/mic sensitivity
  • if it goes into the red, lower it
(if some of your hardware/software indicates in dB: 10 to 20 dB should be enough. If not, 'a noticeable amount' should do)
  • if it registers a moderate amount when you're entirely quiet, consider turning it down
...because that's just device noise amplified a lot (which suggests something amplifies so much that louder things will distort).


This isn't always perfect advice - hardware varies, and there may be other reasons for e.g. things to still be quiet, or for there to be distortion in other hardware, before it even got into the PC.

Sometimes recordings are quiet because the speaker keeps their distance from an insensitive microphone. Sometimes they distort because the speaker smushes their face into a sensitive microphone. Sometimes things are weird because someone else used the hardware and changed something. Etc.

So when doing serious recording, try to start with a sanity check: make a quick recording and listen to it (headphones are often better than speakers), and maybe check the waveform for flat, clipped tops.

💤 Why Mono?

Microphones are usually mono.

Also, it is good science to only vary the things you intend to study, and stereo effects are not easy to control in recording or playback, so mono makes your life simpler.

That said,

  • when you want to be really precise, consider that on some hardware, 'record mono' means 'mix stereo to mono', while on others, it means 'record the first channel', and you won't really know until you test
  • when recording only for transcription and not for experimental playback, know that stereo recordings can be more useful, because people are good at separating multiple voices in any sound recorded in a vaguely binaural way. You may like a portable recorder with two mics mounted on it.

Viewing your recording

Select the Sound, press the View & Edit button.

Other basic things you may want to do are zooming in and out, scrolling around, and cutting pieces off the edges - mainly see the Edit and Time menus (and maybe learn the keyboard shortcuts)



💤 The spectrogram is tuned for voice, and can be tweaked further.

Spectrogram → Spectrogram settings

View Range - Normally 0Hz to 5kHz (even though we recorded more) - there's almost nothing interesting above, and zooming down means we can see the pitch movement better. This is a good default, though you could lower this further to focus on vowel formant curves more.

Window length is about the (STFT) tradeoff between frequency resolution and time resolution. The default (0.005) is a good tradeoff for many tasks. Higher (0.015) may sometimes make e.g. separate formants more visible, yet makes them harder to place in time precisely. If over time you learn to recognize things visually, you may not want to touch this, just because it will look different.
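
To put rough numbers on that tradeoff: as far as I recall, the Praat manual gives the -3 dB bandwidth of its Gaussian analysis window as 2·sqrt(6·ln 2) / (π · window length), i.e. about 1.298 / window length. Treat that formula as an assumption to check against the manual; the function name is made up.

```python
import math


def gaussian_bandwidth_hz(window_length_s):
    """Approximate -3 dB bandwidth (Hz) of a Gaussian window of this length.

    Uses 2*sqrt(6*ln 2) / (pi * window length), the figure recalled from
    the Praat manual; verify there before relying on it.
    """
    return 2.0 * math.sqrt(6.0 * math.log(2.0)) / (math.pi * window_length_s)


bandwidth_default = gaussian_bandwidth_hz(0.005)  # the 0.005 s default
bandwidth_longer = gaussian_bandwidth_hz(0.015)   # the longer window mentioned above
```

Under that assumption the default works out to a bandwidth of roughly 260 Hz, and tripling the window length cuts the bandwidth to a third, at the cost of smearing events over three times as much time.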

Dynamic range relates the loudest to the softest levels. Here it controls the softest levels to still draw, relative to the maximum (see below). The default of 70 dB throws away little to nothing, so often shows the device's noise as well. Lowering this is sort of like lowering a volume knob - it lowers signal and noise equally, though the first 20 dB or so of lowering might throw away quiet noise so that things look a little cleaner.


Spectrogram → Spectrogram advanced settings

Maximum is the energy level (you can ignore the units) to treat as the loudest to show (black). By default this field is ignored, because autoscaling handles this (...within a zoom level, so scrolling will make it vary - if you want to inspect in detail, you might care to turn off autoscaling).

Pre-emphasis considers that the loudness of speech's components (vowels mostly) falls off by approximately 6 dB per octave, so it tries to make the overtones more visible, basically by amplifying higher frequencies. The default is +6 dB per octave. Higher values put more focus on higher sounds. (identity-gain point at 1000 Hz?)
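
What "dB per octave" means in numbers can be sketched as follows. This is an idealised straight-line slope, not Praat's actual filter, and the 1000 Hz reference is exactly the unverified guess from the paragraph above, so it is a parameter here rather than a fact.

```python
import math


def pre_emphasis_gain_db(frequency_hz, slope_db_per_octave=6.0, reference_hz=1000.0):
    """Idealised pre-emphasis gain in dB at a given frequency.

    Each doubling of frequency (one octave) adds slope_db_per_octave.
    reference_hz is the assumed frequency at which the gain is 0 dB.
    """
    return slope_db_per_octave * math.log2(frequency_hz / reference_hz)


gain_at_2k = pre_emphasis_gain_db(2000.0)   # one octave above the reference
gain_at_4k = pre_emphasis_gain_db(4000.0)   # two octaves above
gain_at_500 = pre_emphasis_gain_db(500.0)   # one octave below
```

So with the default slope, a component two octaves above the reference is drawn 12 dB stronger than it was recorded, which roughly cancels speech's natural roll-off over that range.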

Dynamic compression amounts to bringing up the volume at the times the signal is quieter.

This is useful if there is significant variation in how loud individual responses are,
...but if you have strong recordings, it can only bring up noise, so it has little value.
This is a fraction: how much to amplify quieter parts towards the level of everything else. You rarely want to make this higher than 0.5 or so (because that's often around 20dB).


https://www.fon.hum.uva.nl/praat/manual/Advanced_spectrogram_settings___.html

Annotating your recording

Create a TextGrid

To create an empty TextGrid of the same length as a given Sound:

  • Select Sound
  • Find the button: Annotate → To TextGrid...


The form dialog this gives you is:

All tier names:                  Mary John bell
Which of these are point tiers:  bell

The first time you see that, it introduces two or three new things, so this bears some comment.

💤 For context:

You can have multiple, independent 'tracks' of information, called tiers. This is useful e.g. when there are distinct things worth noting, e.g.

multiple speakers
a speaker and a bell to mark the start of experiment response
annotation at sentence, phrase, word, phoneme level
aligned translations
...and/or whatever else you can think of


Also, sometimes we want to segment things, and sometimes we just want to mark things in time.

So each tier can be either an

  • interval tier
consists of segments that always cover the whole recording
inserting something at a time will split the segment that is currently there into two
you can select the segments
you can optionally label each segment (you might e.g. start by marking the silences)
  • point tier (sometimes 'text tier')
inserting a point adds a specific point in time
you can optionally label each point
you can select the labels


So now we can grasp that

  • All tier names - space-separated list. Settles the number of tiers and their names at the same time
  • Which of these are point tiers? - repeat the names of tiers you want to be point tiers. Any not mentioned will become interval tiers

So:

All tier names:                  Mary John bell
Which of these are point tiers:  bell

means:

create Mary as an interval tier
create John as an interval tier
create bell as a point tier
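
The rule for reading those two form fields can be written down as a short Python sketch. The function name `tier_kinds` is made up; it only mimics the logic described above and does not call Praat.

```python
def tier_kinds(all_tier_names, point_tier_names):
    """Decide the kind of each tier from the two space-separated form fields.

    Every name in all_tier_names becomes a tier, in that order. Names that
    are repeated in point_tier_names become point tiers, the rest become
    interval tiers.
    """
    point_names = set(point_tier_names.split())
    return [
        (name, "point" if name in point_names else "interval")
        for name in all_tier_names.split()
    ]


example = tier_kinds("Mary John bell", "bell")
# [('Mary', 'interval'), ('John', 'interval'), ('bell', 'point')]
```

Leaving the second field empty therefore gives you only interval tiers.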

Actually using that TextGrid


If you did the above, you have a Sound and a TextGrid of the same time length.


Actual annotations

Editing

A mix of keyboard and mouse seems to be most convenient.


Remember zooming (Ctrl-I, Ctrl-O), scrolling (PgUp, PgDn)


Click in waveform or spectrogram: choose a point in time


Some of the most useful keyboard shortcuts are:

Enter - add segment/point in currently selected tier

Ctrl-1, Ctrl-2 - add point in specific tier

Mouse-drag a segment-point to move it

Alt-backspace - remove point / merge segment with previous

Alt-arrows - to move around the segments (useful when cleaning up)

Annotating

Other views/editors

Pitch editor

https://www.fon.hum.uva.nl/praat/manual/PitchEditor.html


Text files, short text files, binary files

Many data-style objects (including some for which you may never use this, like Sound objects) have a structured representation that can be saved as

  • a text file
which contains a little more than necessary but is human-readable.
  • a short text file, which basically omits the variable names,
but is stable enough that parsers should have no trouble (no idea if there were breaking changes over time)
  • a binary format, which is a little more compact.


Some things have further forms, e.g.

  • PitchTier and DurationTier have
    • PitchTier/DurationTier Spreadsheet file (not unlike their short text form)
    • headerless spreadsheet file (basically TSV)


While the text formats look like something you could parse yourself, try to avoid doing so when that is easy to avoid, because the format has changed over time. Praat will know how to handle that, but your own or others' libraries may break over time.

Praat setup

Praat preferences folder

Mostly contains

  • Preferences file[9], mostly contains a whole bunch of defaults
  • Buttons file[10], mostly contains registration/show/hide of interface elements (menu items and buttons)
  • possible plugins, in directories


Location:

Windows:   %USERPROFILE%\Praat
OSX:       ~/Library/Preferences/Praat Prefs/
Linux:     ~/.praat-dir/

https://www.fon.hum.uva.nl/praat/manual/preferences_folder.html
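
If a script of your own needs to find that folder, the table above translates into something like the following Python sketch. The function name is made up, the locations are just the three listed here (other Praat versions or setups may use different ones), and the system names are the strings that Python's `platform.system()` returns.

```python
from pathlib import PurePosixPath, PureWindowsPath


def praat_prefs_dir(system, home):
    """Return the Praat preferences folder for a platform and home directory.

    system: 'Windows', 'Darwin' (OSX) or 'Linux', as platform.system() reports.
    home:   the user's home directory (what %USERPROFILE% or ~ resolves to).
    """
    if system == "Windows":
        return PureWindowsPath(home) / "Praat"
    if system == "Darwin":
        return PurePosixPath(home) / "Library" / "Preferences" / "Praat Prefs"
    if system == "Linux":
        return PurePosixPath(home) / ".praat-dir"
    raise ValueError(f"no known Praat preferences location for {system!r}")
```

For the machine you are on, you could call it as `praat_prefs_dir(platform.system(), pathlib.Path.home())` and then check whether the result actually exists.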

Interacting with Praat

Calling the Praat executable should typically be done with --open, --run, or --send


Opening files with Praat can be done like

Praat.exe --open data\hello.wav data\hello.TextGrid
Praat.exe --open script.Praat


You can run a script like

Praat.exe --run testCommandLineCalls.praat "argument"

...which does so without a GUI; any Info-window output goes to the console that runs it.
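
Because the output goes to the console, --run is easy to drive from another program. A minimal Python sketch, assuming the executable can be found under the name given (both function names and the default `praat` are assumptions, and only the `--run` option shown above is used):

```python
import subprocess


def praat_run_command(script, *args, executable="praat"):
    """Build the argument list for running a Praat script without the GUI."""
    return [executable, "--run", script, *[str(a) for a in args]]


def run_praat_script(script, *args, executable="praat"):
    """Run the script and return what it wrote to the console (stdout).

    Raises subprocess.CalledProcessError if Praat exits with an error.
    """
    result = subprocess.run(
        praat_run_command(script, *args, executable=executable),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list rather than one long string avoids most quoting trouble with file names that contain spaces.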


To command a running Praat GUI (or a new one if one wasn't running), you want --send

Praat.exe --send "argument"

sendpraat is roughly the same, e.g.

sendpraat 1000 praat "Read from file... hello.wav" "Play reverse" "Remove"


https://www.fon.hum.uva.nl/praat/manual/Scripting_6_9__Calling_from_the_command_line.html


"The phonetic font is not available"

Praat wants a font that covers all IPA characters in Unicode


Praat comes from a time when you likely needed to install SIL Doulos and/or SIL Charis to guarantee that. These days there are other fonts that cover them, but Praat still plays it safe and asks for these two. If the characters display fine, you can ignore the warning.


See also: how to install fonts (in general)

Consider tweaking Windows

Showing file extensions

If you have made

Recording1.wav
Recording1.Manipulation
Recording1.PitchTier

then it is fairly clear what belongs together.

However, Windows typically hides extensions (that it knows about). This is nice for an uncluttered overview where you have a nice icon instead -- but less precise when using a computer as a tool. In particular, when Explorer shows you just

Recording1
Recording1
Recording1

that's less great.


If you want to see extensions:

Win7: Tools : Folder options : View tab : uncheck "Hide extensions for known file types"
Win10: Explorer window : View : check "File Name Extensions"
Win11: Explorer window : View : Show : File Name Extensions
OSX:
Linux / Gnome:

Where is my configuration

Windows:   %USERPROFILE%\Praat
OSX:       ~/Library/Preferences/Praat Prefs/
Linux:     ~/.praat-dir/

...each of which is a shorthand that resolves to a directory for the user currently logged in


Consider: Open with

Why?

Say you often find yourself opening one file at a time with Praat's Open → Read from file... to add a bunch of files.


There are faster options.

"Open with" is a general Windows feature that lets you tell it which extensions to associate with which program.


How?

  • Windows: Right-click a file (with an extension you want to associate with a program - you would have to do this once for each Praat file extension) and choose "Open With..."
this may require some extra clicking (varies with Windows version) like "More apps", "Look for more apps", scrolling, etc.
the first time you do this, you also need to browse to where precisely the application is located on disk. Once you've done that, it should be in the list of apps