Human hearing, psychoacoustics: Difference between revisions

Revision as of 15:03, 6 September 2023


Psycho-acoustics is a study of various sound response and interpretation effects that happen in the source-ear-brain-perception path, particularly the ear and brain.

There are various complex topics in (human) hearing. If you skip most of this section, the concepts you should probably still know about are the varying sensitivity to different frequencies and masking, and that practical psycho-acoustic models (used e.g. in sound compression) are mostly a fuzzy combination of various effects.

Results of said physiology, models

Frequency perception

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In music, tones are usually seen in a way that originates in part from scientific pitch notation, in which an octave is a doubling of frequency. This already illustrates the non-linear nature of human hearing, but is in itself not actually an accurate model of human pitch perception.

Accurate comparison of frequencies and frequency bands can be an involved subject.

Perceived equal frequency intervals are not easily captured even in a more complex function. If you've ever heard a frequency generator slowly and linearly increase the frequency, you'll know that it sounds like fast changes at the start, and that past ~6kHz it's all a slightly changing high beep.

In addition, or perhaps by extension, the accuracy with which we can judge two similar tones as not-the-same changes with frequency, and is also not trivial to model.

Frequency warping is often applied to attempt to linearize perceived pitch, something that can help various perceptual analyses and visualizations.

As frequency increases, the critical bandwidth increases and the pitch resolution we hear decreases.

The non-linear nature of frequency hearing, and the existence and approximate size of critical bandwidths, are useful information when we want to model our hearing.

They are also useful for things like lossy audio compression, since they tell us that spending equal coding space per perceived tone means spending it non-linearly with frequency.

For background: Mathematical frequency intervals

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In the mathematical scientific pitch notation, an octave refers to a doubling of the frequency, and cents are a (log-based) ratio unit defined so that there are 1200 of them in an octave, each a factor of 2^(1/1200).

Note that the human Just Noticeable Difference for tones is around 5 cents, although there are reasons that tuning should ideally be more accurate than that, particularly for instruments with overtones, where small mistuning can cause more audible dissonance (verify).
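The cents arithmetic can be sketched as below (Python; the function names are only illustrative). It also converts a ~5 cent difference around A4 into Hz, to give a sense of scale:

```python
import math

def cents(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents: 1200 per octave, i.e. 1200 * log2(f2/f1)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

def shift_by_cents(f_hz, c):
    """Frequency that lies c cents above f_hz (below it, for negative c)."""
    return f_hz * 2.0 ** (c / 1200.0)

print(cents(440.0, 880.0))                # 1200.0, one octave
print(shift_by_cents(440.0, 5) - 440.0)   # about 1.27 Hz for 5 cents around A4
```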

Notes can be referenced as a note-octave: a combination of the note name (letter, plus any sharp or flat) and the octave it is in.

This scale is typically anchored at A4, which is usually set to 440Hz (concert pitch), though there are variations.

The 88 keys on most pianos, good for a little over seven octaves, come from a "...that's enough" (88-key is A0 to C8; below A0 it's more of a rumble, above C8 it's shrill, though some pianos have slight extensions). On that keyboard, middle C is C4.
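A minimal sketch of turning a note-octave name into a frequency, assuming twelve-tone equal temperament anchored at A4 = 440Hz (the anchor is a parameter, since there are variations). The note_number/note_to_hz names and the accepted name syntax (letter, optional # or b, octave) are my own choices:

```python
import re

# Semitone offset of each note letter within an octave; scientific pitch
# notation starts each octave at C.
_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_number(name):
    """Semitones above C0 for a name like 'A4', 'C#5' or 'Bb3'."""
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if m is None:
        raise ValueError(f"not a note-octave name: {name!r}")
    letter, accidental, octave = m.groups()
    n = 12 * int(octave) + _OFFSETS[letter]
    if accidental == "#":
        n += 1
    elif accidental == "b":
        n -= 1
    return n

def note_to_hz(name, a4_hz=440.0):
    """Equal-tempered frequency, counted in semitones from the A4 anchor."""
    return a4_hz * 2.0 ** ((note_number(name) - note_number("A4")) / 12.0)

print(note_to_hz("A0"), note_to_hz("C4"), note_to_hz("C8"))
# roughly 27.5, 261.63 and 4186.01
print(note_number("C8") - note_number("A0") + 1)  # 88 keys
```

This also matches the piano numbers above: A0 to C8 inclusive is 88 keys, and middle C (C4) lands at roughly 261.6Hz.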

Bandwidths, frequency warping, and more

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Note that studies and applications in this area often address several things at once, usually one or more of:

estimating the width of the critical band at a given frequency
frequency warping (Hz to a perceptually linear scale). (Note that a function fitted to critical band data can often serve as a decent approximate warper)
working towards a filterbank design that is an accurate model in specific or various ways (there are a number of psychoacoustic effects that you could wish to model, or ignore to keep things simple)

Also useful to note:

since some terms refer to a general concept, a single term may refer to different formulae, and to different publications that pull in more or fewer related details than others.

There are a good number of convenient approximations around, which tend to muddle various summaries (...such as this one. I'll warn you against assuming this is all correct.)
while bands are sometimes reported as given widths around given centers, particularly for approximate filterbank designs, bands do not really have fixed positions. It is more accurate to consider a function that reports a width for arbitrary frequencies.

There is often some rule-of-thumb knowledge, plus various models and formulae that are more accurate than those rules of thumb, but many formulae are still noticeably specific and/or inaccurate, mostly for low and for high frequencies.

It's useful to clearly know the difference between critical band rate functions (mostly a frequency warper, a function from Hz to fairly synthetic units) and critical bandwidth functions (estimation of the bandwidth of hearing at a given frequency, a Hz→Hz function). They are easily confused because they have similar names and similarly shaped graphs, and because they approximate rather related concepts.

A quite understandable source of confusion is that in many mentions of critical band rate, it is noted that the interval between whole Bark units corresponds to critical bandwidths. This is approximately true, but approximately at best, and often a fairly rough attempt to simplify away the need for a separate critical bandwidth function. One of the largest differences between the critical-band and ERB families of models seems to be their treatment of frequencies under ~500Hz, along with details such as ERB lending itself more easily to filterbank design.

Another source of confusion is naming: earlier models often have a name involving 'critical band', later and different models often mention ERB (equivalent rectangular bandwidth), and references can quite fuzzily refer to either of those two approximate sets, to the whole, or sometimes to the wrong one, making these names rather finicky to use.

If you like to categorize things, you could say that you have CB rate functions (Bark units), ERB rate functions (ERB units), and bandwidth functions from both of those areas.
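To get a feel for how rough the 'one Bark is about one critical bandwidth' shortcut is, the sketch below measures the Hz width of a 1-Bark interval using the Traunmüller 1990 formula (listed under Various approximating functions, with an inverse derived algebraically from it) and compares that to the max(100, hz/5) rule of thumb. Which two formulas to compare is an arbitrary choice here; others will give somewhat different numbers:

```python
def hz_to_bark(hz):
    """Traunmüller 1990 critical band rate, Hz -> Bark (plain formula)."""
    return 26.81 * hz / (1960.0 + hz) - 0.53

def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark (valid for z < 26.28)."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def one_bark_width_hz(hz):
    """Width in Hz of a 1-Bark interval centred (on the Bark scale) at hz."""
    z = hz_to_bark(hz)
    return bark_to_hz(z + 0.5) - bark_to_hz(z - 0.5)

def rule_of_thumb_bandwidth(hz):
    """'100Hz below 500Hz, 20% of the center above that'."""
    return max(100.0, hz / 5.0)

for f in (200, 1000, 4000):
    print(f, round(one_bark_width_hz(f)), rule_of_thumb_bandwidth(f))
```

With these two formulas, the 1-Bark width comes out roughly 10 to 20% narrower than the rule of thumb at 200Hz, 1kHz and 4kHz, which is the kind of 'approximately at best' agreement meant above.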

Critical bands

Critical band rate:

Primarily a frequency warper
Units: input in Hz, output (usually) in Bark
often given the letter z
approximately linear in input frequency up to ~200Hz (~0 to 2 bark)
approximately linear in the log of input frequency for ~500Hz..10kHz (~5 to 22 bark)

Critical bandwidth:

a function to approximate the width of a critical band at a given frequency (Hz->Hz)
Units: input in Hz, output in Hz
useful for model design, usually along with a frequency warper
approximately constant up to ~500Hz (~100Hz per band)
approximately proportional to input frequency above ~1kHz (~20% of the center frequency)

ERB

Equivalent Rectangular Bandwidth (ERB) and ERB-rate formulae (both introduced some time after most of the critical-band and bandwidth scales mentioned here) approximate the relationship between frequency and the ear's corresponding critical bandwidth.

More specifically, they do so using a modeled rectangular passband filter, which has the same center frequency as the auditory filter it models and passes the same total power when both are fed white noise.

For low frequencies (below ~500Hz), ERB bandwidth functions estimate noticeably smaller bandwidths than critical bandwidth functions do.

There are a number of different investigations, measurements, and approximations to calculate an ERB. The formulae most commonly used seem to be from (Glasberg & Moore 1990)(verify).
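For reference, the Glasberg & Moore 1990 forms as they are commonly quoted are ERB = 24.7·(4.37·f/1000 + 1) Hz and ERB-rate = 21.4·log₁₀(4.37·f/1000 + 1). The sketch below uses those and compares the ERB against the simple critical bandwidth rule of thumb listed further down, to show the low-frequency difference mentioned above (treat the exact constants as something to verify against the paper):

```python
import math

def erb_hz(hz):
    """ERB in Hz, as commonly quoted from Glasberg & Moore 1990."""
    return 24.7 * (4.37 * hz / 1000.0 + 1.0)

def erb_rate(hz):
    """ERB-rate (number of ERBs below hz), as commonly quoted from Glasberg & Moore 1990."""
    return 21.4 * math.log10(4.37 * hz / 1000.0 + 1.0)

def critical_bandwidth_rule_of_thumb(hz):
    """100Hz below 500Hz, 20% of the center above that."""
    return max(100.0, hz / 5.0)

for f in (100, 250, 500, 1000, 4000):
    print(f, round(erb_hz(f), 1), critical_bandwidth_rule_of_thumb(f), round(erb_rate(f), 2))
```

At 100Hz this gives roughly 35Hz of ERB, against the rule of thumb's 100Hz of critical bandwidth.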

Bark scale

The term Bark scale (somewhat fuzzily) refers to most critical band rate functions.

The Bark unit was named in reference to Heinrich Barkhausen (who introduced the phon).

Using Bark as a unit was proposed in (Zwicker 1961), which was also perhaps the earliest summarizing publication on critical band rate (and notes that one Bark unit is approximately one critical bandwidth), and which reports a number of band centers and edges.

Bark is related to, but somewhat less popular than, the Mel scale.

The value of the Bark scale in tables often goes from 1 to 24, though in practice it can be a function meant to be usable from 0 to 25. (The 24th band covers a band up to 15.5kHz. The 25th is easily extrapolated, and useful to e.g. deal with 44.1kHz/48kHz recordings.)

Mel scale

There is also the Mel scale, which like Bark was based on empirical data, but has slightly different aims:

Frequency warping scale that aims for linear perceptual pitch units (verify)
Defined as f(hz) = 1127.01048 * log_e(1+hz/700) (the constant makes 1000Hz come out as 1000 mel)
Which is nearly the same as 2595 * log₁₀(1+hz/700)
and some other definitions (approximation or exactly?)
The inverse (from mel to Hz) is f(m) = 700 * (e^(m/1127.01048) - 1)
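The Mel functions as a Python sketch. It uses the 1127.01048 constant (which is 1000/ln(1+1000/700), making 1000Hz map to 1000 mel) and also includes the 2595·log₁₀ form for comparison; log1p/expm1 are just the numerically careful versions of ln(1+x) and e^x - 1:

```python
import math

# 1000 / ln(1 + 1000/700): chosen so that 1000Hz maps to (almost exactly) 1000 mel
MEL_LN_FACTOR = 1127.01048

def hz_to_mel(hz):
    """Mel from Hz, natural-log form: 1127.01048 * ln(1 + hz/700)."""
    return MEL_LN_FACTOR * math.log1p(hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel: 700 * (e^(mel/1127.01048) - 1)."""
    return 700.0 * math.expm1(mel / MEL_LN_FACTOR)

def hz_to_mel_log10(hz):
    """The 2595 * log10(1 + hz/700) form, for comparison."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

for f in (100.0, 1000.0, 4000.0, 16000.0):
    m = hz_to_mel(f)
    print(f, round(m, 2), round(hz_to_mel_log10(f), 2), round(mel_to_hz(m), 6))
```

Over the audible range the two forms differ by less than a tenth of a mel.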

Various approximating functions

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

(Figure: early version of a graph of various related functions)

CB rate

Bark according to Zwicker and Terhardt 1980:

13*arctan(hz*0.00076) + 3.5*arctan((hz/7500)²)

error varies, up to ~0.2 Bark

Bark according to Traunmüller 1990:

f(hz) = (26.81*hz)/(1960.0+hz) - 0.53

seems more accurate than the previous, particularly for the 200-500Hz range
seems to be the more usual function in use
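Both Bark functions side by side, as a sketch (the Traunmüller version here is the plain formula above, without any further corrections). Printing them over a few frequencies shows where they agree (around 1kHz they are within a few hundredths of a Bark) and where they drift apart:

```python
import math

def bark_zwicker_terhardt(hz):
    """Zwicker & Terhardt 1980: 13*arctan(0.00076*hz) + 3.5*arctan((hz/7500)^2)."""
    return 13.0 * math.atan(0.00076 * hz) + 3.5 * math.atan((hz / 7500.0) ** 2)

def bark_traunmuller(hz):
    """Traunmüller 1990: 26.81*hz / (1960 + hz) - 0.53 (no extra corrections)."""
    return 26.81 * hz / (1960.0 + hz) - 0.53

for f in (100, 200, 500, 1000, 4000, 10000):
    zt, tr = bark_zwicker_terhardt(f), bark_traunmuller(f)
    print(f"{f:>6} Hz   Z&T {zt:6.2f}   Traunmüller {tr:6.2f}   difference {tr - zt:+.2f}")
```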

ERB rate

11.17 * log_e( (hz+312)/(hz+14675) ) + 43.0

(mentioned at least in Moore and Glasberg 1983)
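The ERB-rate formula above as code, in the same style. The constants are such that 0Hz comes out at roughly 0 ERB, and the curve levels off toward 43 for high frequencies:

```python
import math

def erb_rate_1983(hz):
    """ERB-rate as listed: 11.17 * ln((hz + 312) / (hz + 14675)) + 43.0."""
    return 11.17 * math.log((hz + 312.0) / (hz + 14675.0)) + 43.0

for f in (0, 100, 500, 1000, 4000, 10000, 20000):
    print(f, round(erb_rate_1983(f), 2))
```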

Bandwidth

Very simple rule-of-thumb approximation ('100Hz below 500Hz, 20% of the center above that'):

max(100,hz/5)

(Zwicker and Terhardt 1980, or earlier?)

25 + 75*( 1 + 1.4*(hz/1000.)² )^0.69

...with bark (and not Hz) as input:

52548 / (z² - 52.56z + 690.39)

...with kHz (not Hz) as input:

6.23*khz² + 93.39*khz + 28.52
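The bandwidth approximations above, as Python functions so they can be compared at a few frequencies. The Bark-input formula needs a Hz→Bark warper first; Traunmüller 1990 is used for that here, which is an arbitrary pairing on my part, and the output of the kHz polynomial is assumed to be in Hz:

```python
def cb_rule_of_thumb(hz):
    """'100Hz below 500Hz, 20% of the center above that'."""
    return max(100.0, hz / 5.0)

def cb_zwicker_terhardt(hz):
    """25 + 75 * (1 + 1.4 * f_kHz^2)^0.69, input and output in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (hz / 1000.0) ** 2) ** 0.69

def cb_from_bark(z):
    """52548 / (z^2 - 52.56 z + 690.39): input in Bark, output in Hz."""
    return 52548.0 / (z * z - 52.56 * z + 690.39)

def bw_khz_polynomial(khz):
    """6.23 f^2 + 93.39 f + 28.52 with f in kHz; output assumed to be in Hz."""
    return 6.23 * khz ** 2 + 93.39 * khz + 28.52

def hz_to_bark(hz):
    """Traunmüller 1990 Hz -> Bark, only used here to feed cb_from_bark."""
    return 26.81 * hz / (1960.0 + hz) - 0.53

for f in (100, 500, 1000, 4000):
    print(f, cb_rule_of_thumb(f), round(cb_zwicker_terhardt(f)),
          round(cb_from_bark(hz_to_bark(f))), round(bw_khz_polynomial(f / 1000.0)))
```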

Filterbanks

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

The idea here is to get a basic imitation of the ear's response throughout the frequency scale.

While the following implementation is still a relatively basic model of the human cochlea, it reveals various human biases in hearing sound, and as such is quite convenient for a number of perceptual tasks.

The common implementation is a set of passband filters, often with overlapping response, to model the response to different frequencies. On the (~34mm-long) basilar membrane, a sound will tend to excite roughly 1.5mm of it at a time, which (34/1.5 ≈ 23) leads to the typical choice of using 20 to 24 of these filters.

That number of centers, and the corresponding bandwidths (which are not exactly based on the 1.5mm), vary with different models' assumptions, with the frequency linearization/warping used, and with the upper frequency limit you want the model to include. (For a sense of size: the more important bandwidths are usually on the order of 100 to 1000Hz wide)
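A rough sketch of such a filterbank, applied to the bins of a linear spectrum (e.g. an FFT magnitude or power spectrum). It assumes overlapping triangular filters whose edges are evenly spaced on the Traunmüller 1990 Bark scale; triangles are a common simple choice, not the only one, and bark_filterbank and its bin layout are hypothetical, not from any particular library:

```python
def hz_to_bark(hz):
    """Traunmüller 1990 critical band rate, Hz -> Bark (plain formula)."""
    return 26.81 * hz / (1960.0 + hz) - 0.53

def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark (valid for z < 26.28)."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def bark_filterbank(n_filters, sample_rate, n_bins):
    """Overlapping triangular filters with edges evenly spaced in Bark.

    Bin k of the linear spectrum is assumed to sit at k * nyquist / (n_bins - 1) Hz.
    Returns (centers_hz, weights), where weights[j][k] is filter j's gain at bin k.
    """
    nyquist = sample_rate / 2.0
    z_lo, z_hi = hz_to_bark(0.0), hz_to_bark(nyquist)
    # n_filters + 2 edge points: filter j rises from edge j to edge j+1,
    # then falls to edge j+2, so neighbouring filters overlap by half.
    edges = [bark_to_hz(z_lo + (z_hi - z_lo) * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bin_hz = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    weights = []
    for j in range(n_filters):
        left, center, right = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for f in bin_hz:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        weights.append(row)
    return edges[1:-1], weights

centers, fb = bark_filterbank(24, 44100, 513)
print([round(c) for c in centers])
```

The triangles get much wider toward high frequencies, and between the first and last centers each bin's weights over neighbouring filters sum to 1, so band energies can be had by weighting the spectrum with each row.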

That frequency resolution may seem low, but is quite decent considering the size of the cochlea. The Just Noticeable Differences seem to be sub-Hz at a few hundred Hz, up to ten Hz at a few kHz, up to perhaps ~30Hz at ~5-8kHz.

Reports vary considerably, probably because pure sines are much easier to judge than sounds with timbre, vibrato, etc., for which the JND can easily be double these values. (verify)

Regardless, note that this is much better than the ~1.5mm excitation suggests; there is obviously more at work than ~24 coefficients, and for us, much of this happens in post-processing in the brain.

Hearing damage

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Masking

Other psychoacoustic effects

Localization

Selective attention

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.



