Human hearing, psychoacoustics: Difference between revisions

Revision as of 14:43, 6 September 2023


Psycho-acoustics is a study of various sound response and interpretation effects that happen in the source-ear-brain-perception path, particularly the ear and brain.

There are various complex topics in (human) hearing. If you mostly skip this section, the concepts you should probably know about are the varying sensitivity to frequencies, masking and such, and the fact that practical psycho-acoustic models (used e.g. for things like sound compression) are mostly a fuzzy combination of various effects.

Some physiology

Inner and outer ear: Cochlea, basilar membrane, meatus, and more

Some summarizing notes

Results of said physiology, models

Loudness perception

Perceptual loudness of frequencies

Physical reality can be described in terms of pressure or intensity, but human perception of the same sounds cannot.

The most important detail is probably the fact that we hear the same physical amplitudes of different frequencies as different intensities. For an extreme example, a pure 40Hz tone needs to be about 45dB louder than a pure 4000Hz tone to be perceived as just about as loud (note that 45dB is roughly a multiple of 30000 of energy used).
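As a quick check of the "45dB is roughly a multiple of 30000" figure, here is a minimal sketch of the usual decibel conversions. The function names are made up for illustration. It assumes the standard definitions: 10*log10 for power (energy) ratios and 20*log10 for amplitude (pressure) ratios.

```python
import math

def db_to_power_ratio(db):
    """Level difference in dB -> ratio of power (energy per unit time)."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Level difference in dB -> ratio of amplitude (e.g. sound pressure)."""
    return 10 ** (db / 20)

def power_ratio_to_db(ratio):
    """Ratio of power -> level difference in dB."""
    return 10 * math.log10(ratio)

print(round(db_to_power_ratio(45)))      # about 31623, i.e. roughly 30000
print(round(db_to_amplitude_ratio(45)))  # about 178
```

So the 45dB gap is a factor of some thirty thousand in power, but "only" a factor of about 178 in pressure amplitude.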

This difference is caused by the way our ears work. This has been measured several times, notably by Fletcher and Munson (1933), a little more accurately by Robinson and Dadson (1956), and more accurately since then (see e.g. ISO 226:2003 for details).

These tests used near-perfect listening conditions, middle-aged people, and perhaps most importantly, pure tones. This last detail limits their value of direct application in the face of complex signals, noise, temporal psychoacoustics (pulses, listener fatigue), and such.

The test results are often viewed as a graph of equal loudness contours, which indicate the perceived difference when a (simple) sound at a particular dB(SPL) level changes frequency. A different way to see the contours is as the amount of amplitude change the ear applies to a tone at a given frequency and amplitude.

There are of course other effects on how you hear each frequency, some from the environment and physics (interference, absorption), some psychoacoustic (frequency masking, temporal masking, listener fatigue, reaction to pulses), some related to quality of your reproduction hardware (headphones often offer better detail than speakers, cheap sound cards may actually alias) and others.

There are other effects that you could call psychoacoustic. For example, our judgment of how loud a sound system is depends not only on how loud it is but also on how much it is distorting.

approximate equal loudness adjustments (graphed with logarithmic Hz scale)

If you digitize the equal loudness curves and flip them so that they describe how many dB (0dB or more) to subtract at each frequency, you'll get something like the graph on the right. If you have the data this is based on, it's fairly simple to adjust post-FFT data for loudness.

More adapted measures of loudness

There are a number of filters/measures, most of which try to take equal loudness curves into account, and/or are biased towards specific purposes.

Note that most are designed for simple signals, even if they are regularly used for complex ones (noise, music). The main exception in this list is ITU-R 468, which is more valid on complex signals than most others.

Perhaps best known is dBA and to a lesser degree dBC, commonly used to mechanically measure sound levels in roughly human terms. (notation varies - dBA is also seen as dB(A), dBa, and other mild writing variations. Note that dBa has a different meaning in some contexts)

Adjustments according to dB(A), dB(B), dB(C), fairly simple shapes

dB(A) seems intended to be a simple approximation of human frequency response at relatively quiet levels, and used e.g. in low-level noise pollution - things that are annoying, but below deafening.

That said, dBA weighs the first few dozen Hz down a lot (~30-70dB) - which is a good way to have a device spec sheet pretend 50 / 60Hz humming is barely there (and won't rattle anything else) just because our ears are worse at this.

Sound level meters on mixing panels and similar might be (roughly) A-weighted -- which is useful when focusing on vocals but quite a poor indication of bass. Assume level meters are not great at indicating bass (though the way they are wrong varies a bunch) until you know what they do.

dB(C) is meant to approximate the ear at fairly loud sound levels, and leaves in more low frequencies - but not enough to evaluate the effect of low bass. It seems to often be used in traffic loudness measurements.

dB(Z) is flat for most of the spectrum (Z refers to zero), but has more defined cutoff points than the 'flat' ratings left up to manufacturers (verify).

It is therefore not as useful for loudness perception unless it is well defined where those falloffs are, though apparently it should be a passband between 10Hz and 20kHz (verify).

It is arguably the most useful one for evaluating potential hearing damage.

Old and/or specific-purpose:

dB(B) lies somewhere between A and C, both in terms of frequency response and intended loudness levels. You could say it roughly models the ear at medium sound levels - which arguably makes it more useful than either A or C. Yet it seems to be rarely used, perhaps because there are better models that are not much more complex.

dB(D) was intended for loud aircraft noise. It has a peak around 6kHz that models how people sense random noise differently from tones, particularly around there (verify).

dB(G) focuses primarily around sub-bass (20Hz) and is useful when measuring large slow movements, like wind turbines.

ISO 226 has a more complex frequency response than dBA/B/C, with pre-2003 versions based on the Robinson-Dadson results, and the 2003 version revised using more recent equal loudness tests.

ITU-R 468 (note: ITU-R used to be CCIR), is a better approximation for noise and complex signals (dBA and its family were designed for pure tones), and also models our reduced sensitivity to short bursts and clicks to some degree. R468 has seen a lot of use in some specific fields.

Loudness meters vary in design. They may be unfiltered, use dBA, dBC, and sometimes other filters, they may be slow or fast in response to large level differences (e.g. a peak programme meter is slower, almost ignoring few-millisecond peaks, since human hearing is tolerant of short distortion), decay slower to give a multi-tasking broadcast operator more of a chance to get an idea of recent peaks, and have other variations.

Various weightings have relatively simple circuits, e.g. https://web.archive.org/web/20210507071115/https://sound-au.com/project17.htm
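To get a feel for how steeply A-weighting drops at low frequencies, here is a minimal sketch of the curve in software. It is not taken from the notes above: it assumes the commonly published analytic form of the A-weighting curve (the one associated with IEC 61672, with corner frequencies 20.6, 107.7, 737.9 and 12194 Hz and a +2.00dB normalization so that 1kHz sits at 0dB). The function name is made up.

```python
import math

def a_weighting_db(hz):
    """Approximate A-weighting gain in dB at frequency hz (hz > 0).

    Assumption: the widely published analytic curve, normalized to
    about 0 dB at 1 kHz by the +2.00 dB offset.
    """
    f2 = hz * hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20 * math.log10(num / den) + 2.00

# Mains hum versus the 1 kHz reference point.
for hz in (50, 60, 100, 1000, 4000):
    print(hz, round(a_weighting_db(hz), 1))
```

Running this shows 50Hz being weighted down by roughly 30dB, which is the effect the spec-sheet remark above is about.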

Phons: At 1kHz, 1 phon is defined as 1 dB SPL. For other frequencies it is adjusted following the equal loudness curves.

Phons and sones were proposed as perceptual loudness units / experiments, and neither is in particularly regular use other than being referenced by some definitions.

Sones: A phon-based exponential scale with base two. The definition says that at 1kHz, 1 sone is 40 phons (probably because that makes 1 sone a practical quiet sound rather than a theoretical hearing limit). The idea behind the base two is that a perceptual doubling of intensity (~10dB) means a doubling of the sones:

  sones     phons / dBSPL@1kHz
   0.5             30 (verify)
   1               40
   2               50
   4               60        
   8               70
  16               80
  32               90 
  64              100    
 128              110    
 256              120    
 512              130
1024              140

(For frequencies other than 1kHz this must be adjusted according to equal loudness curves).
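The table follows directly from the base-two definition, so it can be generated rather than looked up. A minimal sketch, with made-up function names, assuming the relation holds as stated (1 sone at 40 phons, doubling every 10 phons); below 40 phons the relation is usually considered less reliable, which matches the (verify) on the 0.5 row.

```python
import math

def phons_to_sones(phons):
    """Loudness level in phons -> loudness in sones (doubling per 10 phons)."""
    return 2 ** ((phons - 40) / 10)

def sones_to_phons(sones):
    """Loudness in sones -> loudness level in phons (sones > 0)."""
    return 40 + 10 * math.log2(sones)

# Regenerate the table rows from 40 to 140 phons.
for phons in range(40, 150, 10):
    print(f"{phons_to_sones(phons):6.0f}  {phons}")
```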

Frequency perception

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In music, tones are usually seen in a way that originates in part from scientific pitch notation, in which an octave is a doubling of frequency. This already illustrates the non-linear nature of human hearing, but is in itself not actually an accurate model of human pitch perception.

Accurate comparison of frequencies and frequency bands can be an involved subject.

Perceived equal frequency intervals/distances are not easily caught in a simple function. If you've ever heard a frequency generator slowly and linearly increase the frequency, you'll know that it sounds to us like fast changes at the start, and past ~6kHz it's all a slightly changing high beep.

In addition, or perhaps by extension, the accuracy with which we can judge similar tones as not-the-same changes with frequency, and is also not trivial to model.

Frequency warping is often applied to attempt to linearize perceived pitch, something that can help various perceptual analyses and visualizations.

As frequency goes up, the critical bandwidth increases and the pitch resolution we hear decreases.

The non-linear nature of frequency hearing, the existence and the approximate size of critical bandwidths is useful information when we want to model our hearing.

It is also useful information to things like lossy audio compression, since it tells us that spending equal space on perceived tones means we should expend coding space non-linearly with frequency.

For background: Mathematical frequency intervals

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In the mathematical scientific pitch notation, an octave refers to a doubling of the frequency. Cents are a (log-based) ratio defined so that there are 1200 steps in an octave, each a multiple of 2^(1/1200).

Note that the human Just Noticeable Difference between tones is around 5 cents, although there are reasons that tuning should ideally be more accurate than that, particularly for instruments with overtones, since those give more room for audible dissonance (verify).
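The cent definition is easy to turn into code. A minimal sketch with made-up function names, using nothing beyond the 1200-steps-per-octave definition:

```python
import math

def cents_between(f1, f2):
    """Interval from f1 up to f2 in cents (1200 cents per octave)."""
    return 1200 * math.log2(f2 / f1)

def shift_by_cents(hz, cents):
    """Frequency reached by moving `cents` away from hz."""
    return hz * 2 ** (cents / 1200)

print(cents_between(440, 880))            # one octave
print(round(shift_by_cents(440, 5), 2))   # 5 cents above 440 Hz
```

The second line shows that a ~5 cent difference around 440Hz is only a bit over 1Hz.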

Notes can be referenced in a note-octave, a combination of semitone letter and the octave it is in.

This scale is typically anchored at A4, typically settling that as 440Hz - concert pitch, but there are variations.
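As an illustration of the note-octave naming anchored at A4, here is a small sketch that turns a note name and octave into a frequency. It assumes twelve-tone equal temperament (each semitone a factor of 2^(1/12)) and sharps-only note names, neither of which is spelled out above. The names `NOTE_NAMES` and `note_frequency` are invented for the example.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_frequency(name, octave, a4_hz=440.0):
    """Frequency of a note such as ("C", 4), assuming equal temperament."""
    # Semitone distance from A4; octave numbers change at C, so A is index 9.
    semitones = (octave - 4) * 12 + (NOTE_NAMES.index(name) - 9)
    return a4_hz * 2 ** (semitones / 12)

for name, octave in (("A", 0), ("C", 4), ("A", 4), ("C", 8)):
    print(name, octave, round(note_frequency(name, octave), 2))
```

Changing `a4_hz` covers the tuning variations mentioned above.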

The limit of 88 keys on most pianos, good for a little over seven octaves, comes from a "...that's enough" (88-key is A0 to C8; below A0 it's more of a rumble, above C8 it's shrill, though there are some slight extensions). That makes its middle C be C4.

Bandwidths, frequency warping, and more

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Note that studies and applications in this area usually address multiple things at once, usually one or more of:

estimation of the width of the critical band of a band / at a frequency
frequency warping (Hz to a perceptually linear scale). (Note that a fit to critical band data can often be a decent approximate warper)
working towards a filterbank design that is an accurate model in specific or various ways (there are a number of psychoacoustic effects that you could wish to model, or ignore to keep things simple)

Also useful to note:

since some terms refer to a general concept, they may refer to different formulae, and to different publications that pull in more or fewer related details than others.

There are a good amount of convenient approximations around, which tend to confuse various summaries (...such as this one. I'll warn you against assuming this is all correct.)
while bands are sometimes reported as given widths around given centers, particularly for approximate filterbank designs, bands do not really have fixed positions. It is more accurate to consider a function that reports a width for arbitrary frequencies.

There is often some rule-of-thumb knowledge, and various models and formulae that are more accurate than either of those - but many formulae are still noticeably specific and/or inaccurate, often mostly for lowish and for high frequencies.

It's useful to clearly know the difference between critical band rate functions (mostly a frequency warper, a function from Hz to fairly synthetic units) and critical bandwidth functions (estimation of the bandwidth of hearing at given frequency, a Hz→Hz function). They are easily confused because they have similar names, similarly shaped graphs, and the fact that they approximate rather related concepts.

A quite understandable source of confusion is that in many mentions of critical band rate, it is noted that the interval between whole Bark units corresponds to critical bandwidths. This is approximately true, but approximately at best. It is often a fairly rough attempt to simplify away the need for a separate critical bandwidth function. One of the largest differences between these two groups seems to be the treatment under ~500Hz, and details like that ERB lends itself to filterbank design more easily.

Another source of confusion is naming: Earlier models often have a name involving 'critical band', later and different models often mention ERB (equivalent rectangular bandwidth), and it seems that references can quite fuzzily refer to those two approximate sets, the whole, or sometimes the wrong ones, making these names rather finicky to use.

If you like to categorize things, you could say that you have CB rate functions (bark units), ERB rate function (ERB units), and from both those areas there are bandwidth functions.

Critical bands

Critical band rate:

Primarily a frequency warper
Units: input in Hz, output (usually) in Bark
often given the letter z
approximately linear to input frequency up to ~200Hz (~0 to 2 bark)
approximately linear to log of input frequency for ~500Hz..10kHz (~5 to 22 bark)

Critical bandwidth:

a function to approximate the width of a critical band at a given frequency (Hz->Hz)
Units: input in Hz, output in Hz
useful for model design, usually along with a frequency warper
approximately constant up to ~500Hz (sized ~100Hz per band)
approximately proportional to input frequency over ~1kHz (sized ~20% of center frequency)

ERB

Equivalent Rectangular Bandwidth (ERB) and ERB-rate formulae (both introduced some time after most mentioned critical-band and bandwidth scales) approximate the relationship between a frequency and the ear's corresponding critical bandwidth.

More specifically, they do so using a modeled rectangular passband filter (with the same pass-band center as the auditory filter it models, and a similar response to white noise).

For low frequencies (below ~500Hz), ERB bandwidth functions will estimate bandwidth noticeably smaller than critical bandwidth does.

There are a number of different investigations, measurements, and approximations to calculate an ERB. The formulae most commonly used seem to be from (Glasberg & Moore 1990)(verify).
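For reference, here is a sketch of the ERB approximation usually attributed to Glasberg & Moore (1990). The formula is not quoted in these notes, so treat it as an assumption to check against the paper: ERB = 24.7 * (4.37 * f_kHz + 1), with the matching ERB-rate 21.4 * log10(4.37 * f_kHz + 1). Function names are made up.

```python
import math

def erb_hz(hz):
    """ERB in Hz at center frequency hz (assumed Glasberg & Moore 1990 form)."""
    return 24.7 * (4.37 * hz / 1000.0 + 1.0)

def erb_rate(hz):
    """ERB-rate (number of ERBs below hz), same assumed source."""
    return 21.4 * math.log10(4.37 * hz / 1000.0 + 1.0)

for hz in (100, 500, 1000, 4000):
    print(hz, round(erb_hz(hz), 1), round(erb_rate(hz), 2))
```

At 100Hz this gives an ERB of about 35Hz, clearly narrower than the ~100Hz that critical-band rules of thumb give at low frequencies.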

Bark scale

The term Bark scale (somewhat fuzzily) refers to most critical band rate functions.

The Bark unit was named in reference to Heinrich Barkhausen (who introduced the phon).

Using bark as a unit was proposed in (Zwicker 1961), which was also perhaps the earliest summarizing publication on critical band rate (and makes a bark-unit-is-approximately-a-bandwidth note), and which reports a number of centers, edges.

Bark is related to, but somewhat less popular than the Mel scale.

The value of the Bark scale in tables often goes from 1 to 24, though in practice it can be a function meant to be usable from 0 to 25. (The 24th band covers a band up to 15.5kHz. The 25th is easily extrapolated, and useful to e.g. deal with 44.1kHz/48kHz recordings.)

Mel scale

There is also the Mel scale, which like Bark was based on empirical data, but has slightly different aims

Frequency warping scale that aims for linear perceptual pitch units (verify)
Defined as f(hz) = 1127.01048 * log_e(1+hz/700)
Which is approximately the same as 2595 * log₁₀(1+hz/700)
(the two constants differ slightly, since 2595/log_e(10) ≈ 1126.99; some other definitions exist as well)
The inverse (from mel to Hz) is f(m) = 700 * (e^(m/1127.01048) - 1)
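A minimal sketch of the mel conversion and its inverse. Rather than hard-coding the scale constant, it derives it from the convention that 1000Hz maps to 1000 mel (which comes out at about 1127.01). That convention is an assumption of this sketch, and function names are made up.

```python
import math

# Scale constant chosen so that 1000 Hz maps to 1000 mel (about 1127.01).
MEL_SCALE = 1000.0 / math.log(1.0 + 1000.0 / 700.0)

def hz_to_mel(hz):
    """Frequency in Hz -> mel."""
    return MEL_SCALE * math.log(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Mel -> frequency in Hz (inverse of hz_to_mel)."""
    return 700.0 * (math.exp(mel / MEL_SCALE) - 1.0)

for hz in (100, 440, 1000, 8000):
    print(hz, round(hz_to_mel(hz), 1))
```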

Various approximating functions

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

(early version of a graph of various related functions)

CB rate

Bark according to Zwicker and Terhardt 1980:

13*arctan(hz*0.00076) + 3.5*arctan((hz/7500)²)

error varies, up to ~0.2 Bark

Bark according to Traunmüller 1990:

f(hz) = (26.81*hz)/(1960.0+hz) - 0.53

seems more accurate than the previous, particularly for the 200-500Hz range
seems to be the more usual function in use
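The two critical band rate formulas above, written out side by side so they can be compared. This is a direct transcription with made-up function names. The Traunmüller version here leaves out any corrections for the very low and very high end, so treat the extremes with suspicion.

```python
import math

def bark_zwicker_terhardt(hz):
    """Critical band rate in Bark, per the Zwicker & Terhardt 1980 formula above."""
    return 13.0 * math.atan(hz * 0.00076) + 3.5 * math.atan((hz / 7500.0) ** 2)

def bark_traunmuller(hz):
    """Critical band rate in Bark, per the Traunmüller 1990 formula above."""
    return (26.81 * hz) / (1960.0 + hz) - 0.53

for hz in (100, 300, 1000, 4000, 10000):
    z1 = bark_zwicker_terhardt(hz)
    z2 = bark_traunmuller(hz)
    print(hz, round(z1, 2), round(z2, 2), round(z2 - z1, 2))
```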

ERB rate

11.17 * log_e( (hz+312)/(hz+14675) ) + 43.0

(mentioned at least in Moore and Glasberg 1983)
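The same ERB-rate formula as a function, as a small sketch (the function name is made up). Input is in Hz, matching the constants as written above.

```python
import math

def erb_rate_1983(hz):
    """ERB-rate for a frequency in Hz, per the formula quoted above."""
    return 11.17 * math.log((hz + 312.0) / (hz + 14675.0)) + 43.0

for hz in (0, 100, 1000, 4000, 10000):
    print(hz, round(erb_rate_1983(hz), 2))
```

The +43.0 offset is what brings the value at 0Hz to approximately zero.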

Bandwidth

Very simple rule-o-thumb approximation ('100Hz below 500Hz, 20% of the center above that'):

max(100,hz/5)

(Zwicker Terhardt 1980, or earlier?)

25 + 75*( 1 + 1.4*(hz/1000.)² )^0.69

...with bark (and not Hz) as input:

52548 / (z² - 52.56z + 690.39)

ERB according to Moore and Glasberg 1983, with kHz (and not Hz) as input:

6.23*khz² + 93.39*khz + 28.52
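To compare the bandwidth estimates listed in this section, here they are as plain functions. This is a sketch with made-up names. Note the differing inputs (Hz, Bark, kHz), which are easy to mix up; the wrapper for the last one takes Hz and converts.

```python
def cb_rule_of_thumb(hz):
    """'100 Hz below 500 Hz, 20% of the center above that'."""
    return max(100.0, hz / 5.0)

def cb_zwicker_terhardt(hz):
    """Critical bandwidth in Hz, input in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (hz / 1000.0) ** 2) ** 0.69

def cb_from_bark(z):
    """Critical bandwidth in Hz, input in Bark."""
    return 52548.0 / (z * z - 52.56 * z + 690.39)

def erb_polynomial(hz):
    """Polynomial bandwidth estimate; the formula itself wants kHz."""
    khz = hz / 1000.0
    return 6.23 * khz ** 2 + 93.39 * khz + 28.52

for hz in (100, 500, 1000, 4000):
    print(hz, cb_rule_of_thumb(hz), round(cb_zwicker_terhardt(hz), 1),
          round(erb_polynomial(hz), 1))
```

At low frequencies the polynomial gives clearly smaller widths than the critical-band estimates, which is the CB versus ERB difference noted earlier.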

Filterbanks

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

The idea here is to get a basic imitation of the ear's response throughout the frequency scale

While the following implementation is still a relatively basic model of the human cochlea, it reveals various human biases in hearing sound, and as such is quite convenient for a number of perceptual tasks.

The common implementation is a set of passband filters, often with overlapping response, to model response to different frequencies. On the (~34mm-long) basilar membrane, a sound will tend to excite roughly 1.5mm of it at a time, which leads to the typical choice of using 20 to 24 of these filters.

That number of centers, and the according bandwidths (which are not exactly based on the 1.5mm) vary between different models' assumptions, and on the frequency linearization/warping used and the upper limit on frequency that you want the model to include. (For a sense of size: the more important bandwidths are usually on the order of 100 to 1000Hz wide)
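One simple way to pick band positions for such a filterbank is to space the edges evenly on a warped scale and map them back to Hz. The sketch below does that with the Traunmüller Bark formula and its algebraic inverse. It is only one possible choice, not a specific published filterbank: the function names and the choice of whole-Bark edges are assumptions, and the uncorrected formula drifts at the extremes, so the outermost edges should not be taken literally.

```python
def hz_to_bark(hz):
    """Traunmüller-style Hz -> Bark, without end corrections."""
    return 26.81 * hz / (1960.0 + hz) - 0.53

def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark (valid for z < 26.28)."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def band_edges(n_bands=24):
    """Band edges in Hz at whole-Bark steps 0..n_bands."""
    return [bark_to_hz(z) for z in range(n_bands + 1)]

edges = band_edges()
# Each band as (low edge, high edge, width), all in Hz.
bands = [(lo, hi, hi - lo) for lo, hi in zip(edges, edges[1:])]

for lo, hi, width in bands:
    print(f"{lo:8.0f} {hi:8.0f} {width:8.0f}")
```

The widths grow steadily with frequency, which is the point of warping before dividing into bands. Actual filter shapes and their overlap are a separate design decision.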

That frequency resolution may seem low, but is quite decent considering the size of the cochlea. The Just Noticeable Differences seem to be sub-Hz at a few hundred Hz, up to ten Hz at a few kHz, up to perhaps ~30Hz at ~5-8kHz.

Reports vary considerably, probably because pure sines are much easier than sounds with timbre, vibrato, etc. so it can easily double over these values. (verify)

Regardless, note that this is much better than the ~1.5mm excitation suggests; there is obviously more at work than ~24 coefficients, and for us, much of this happens in post-processing in the brain.

Hearing damage

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Masking

Other psychoacoustic effects

Localization

Selective attention

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
