Human hearing, psychoacoustics

From Helpful


==Results of said physiology, models==
===Frequency perception===
{{stub}}
In music, tones are usually described in a way that originates in part from scientific pitch notation, in which an octave is a doubling of frequency. This already illustrates the non-linear nature of human hearing, but is not in itself an accurate model of human pitch perception.
Accurate comparison of frequencies and frequency bands can be an involved subject.
Perceived equal frequency intervals/distances are not easily captured, even in a more complex function. If you've ever heard a frequency generator slowly and linearly increase its frequency, you'll know that it sounds like fast changes at the start, and that past ~6kHz it's all a slightly changing high beep.
In addition, or perhaps as an extension of this, the accuracy with which we can tell similar tones apart also changes with frequency, and is also not trivial to model.
Frequency warping is often applied to attempt to linearize perceived pitch, something that can help various perceptual analyses and visualizations.
As frequency increases, the critical bandwidth increases and the pitch resolution we hear decreases.
The non-linear nature of frequency hearing, and the existence and approximate size of critical bandwidths, are useful information when we want to model our hearing.
This is also useful for things like lossy audio compression: if we want to spend equal coding space on each perceived tone, we should spend it non-linearly with frequency.
====For background: Mathematical frequency intervals====
{{stub}}
In the mathematical '''scientific pitch notation''', an '''octave''' refers to a doubling of the frequency, and '''cents''' are a (log-based) ratio defined so that there are 1200 steps in an octave, each a factor of 2<sup>1/1200</sup>.
Note that the human Just Noticeable Difference (JND) for tones is around 5 cents,
although there are reasons that ''tuning'' should ideally be more accurate than that, particularly for instruments with overtones, where small mistunings can cause more audible dissonance {{verify}}.
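To make the cents definition concrete, here is a minimal Python sketch. The function names are made up for this example. It also shows that a fixed interval in cents becomes a growing interval in Hz: 5 cents around 440Hz is roughly 1.3Hz.

```python
import math

def cents(f1_hz: float, f2_hz: float) -> float:
    """Interval from f1_hz up to f2_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

def shift_by_cents(f_hz: float, c: float) -> float:
    """Frequency c cents above f_hz (negative c goes down)."""
    return f_hz * 2.0 ** (c / 1200.0)

if __name__ == "__main__":
    print(cents(440.0, 880.0))                    # one octave
    print(cents(440.0, 440.0 * 2.0 ** (1 / 12)))  # one equal-tempered semitone, ~100
    # The same ~5 cent step is a different number of Hz at different pitches
    for base in (110.0, 440.0, 1760.0):
        print(f"{base:7.1f} Hz: 5 cents = {shift_by_cents(base, 5.0) - base:.3f} Hz")
```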
<!--
In equal temperament tuning (the one commonly used in modern Western music), a '''semitone''' refers to exactly 100 cents (so a factor of 2<sup>(1/12)</sup>, approximately 1.059).
Note that other tunings may be more complex. They handle consonance and dissonance in different ways, for example making specific chords (or musical styles/systems) sound a little better at the cost of others. Such tunings were once all we had, and equal temperament was in part a way to get rid of their bad cases.
-->
Notes can be referenced in note-octave form: a combination of the note letter (with a sharp or flat where needed) and the octave it is in, e.g. A4 or C#5.
This scale is typically anchored at A4, usually set at 440Hz (concert pitch), though there are variations.
The 88 keys on most pianos, good for a little over seven octaves, come from a "...that's enough". The 88 keys run from A0 to C8. Below A0 it's more of a rumble, and above C8 it's shrill, though some pianos extend slightly beyond that. Middle C is then C4<!--(261.6Hz)-->.
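Here is a minimal sketch of going from note-octave names to frequencies. It assumes twelve-tone equal temperament (each semitone a factor of 2<sup>1/12</sup>) and A4 = 440Hz by default. The accepted note-name syntax is just a convention chosen for this example: an uppercase letter, an optional # or b, then the octave number.

```python
import re

# Semitones above C for each natural note within one octave
NATURAL_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}

def note_to_hz(note: str, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency of a note name such as 'A4', 'C#5' or 'Bb3'."""
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", note.strip())
    if m is None:
        raise ValueError(f"unrecognized note name: {note!r}")
    letter, accidental, octave = m.group(1), m.group(2), int(m.group(3))
    # Octave numbers change at C, so count semitones up from C0 ...
    semitones_from_c0 = 12 * octave + NATURAL_OFFSETS[letter] + ACCIDENTALS[accidental]
    # ... and express the note relative to the A4 anchor
    a4_from_c0 = 12 * 4 + NATURAL_OFFSETS["A"]
    return a4_hz * 2.0 ** ((semitones_from_c0 - a4_from_c0) / 12.0)

if __name__ == "__main__":
    for name in ("A0", "C4", "A4", "C8"):
        print(f"{name}: {note_to_hz(name):.2f} Hz")
```

With the default anchor, this gives 27.5Hz for A0, about 261.6Hz for middle C (C4), and about 4186Hz for C8. These are the ends and middle of the 88-key piano range.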
<!--
Simpler pianos have fewer keys, and keyboards are often smaller yet. Few pop songs use more than three or four octaves, and they can easily be transposed anyway.
-->
<!--
* A keyboard is often approximately centered around middle C and often has a range of around 7 octaves (as in the relatively common 88-key keyboards)
-->
See also:
* http://en.wikipedia.org/wiki/Octave
* http://en.wikipedia.org/wiki/Cent_(music)
* http://en.wikipedia.org/wiki/Semitone
* A lot of music theory
====Bandwidths, frequency warping, and more====
{{stub}}
Note that studies and applications in this area usually address one or more of the following, often several at once:
* estimation of the width of the critical band at a given frequency
* frequency warping (Hz to a perceptually linear scale). {{comment|(Note that a function fitted to critical band data can often be a decent approximate warper)}}
* working towards a filterbank design that is an accurate model in specific or various ways (there are a number of psychoacoustic effects that you could wish to model, or ignore to keep things simple)
Also useful to note:
* Since some terms refer to a general concept, a term may refer to different formulae, and to different publications, some of which pull in more related details than others.
* There are a good number of convenient approximations around, which tend to confuse various summaries (...such as this one. I'll warn you against assuming this is all correct.)
* While bands are sometimes reported as given widths around given centers, particularly for approximate filterbank designs, bands do not really have fixed positions. It is more accurate to consider a function that reports a width for an arbitrary frequency.
* There is often some rule-of-thumb knowledge, and there are various models and formulae that are more accurate than that. Many formulae are still noticeably specific and/or inaccurate, often mostly at low and at high frequencies.
<!--
* For relatively higher frequencies, a critical bandwidth is somewhere between a whole tone and a third of an octave wide.
-->
It's useful to clearly distinguish between '''critical band rate''' functions and '''critical bandwidth''' functions. Critical band rate functions are mostly frequency warpers, mapping Hz to fairly synthetic units. Critical bandwidth functions estimate the bandwidth of hearing at a given frequency, a Hz&rarr;Hz function. They are easily confused because they have similar names and similarly shaped graphs, and because they approximate closely related concepts.
A quite understandable source of confusion is that many mentions of critical band rate note that the interval between whole Bark units corresponds to a critical bandwidth. This is approximately true, but only approximately. It is often a fairly rough attempt to simplify away the need for a separate critical bandwidth function. One of the largest differences between the critical band and ERB families of functions (see below) seems to be their treatment of frequencies under ~500Hz. Another is that ERB lends itself more easily to filterbank design.
Another source of confusion is naming. Earlier models often have a name involving 'critical band', while later and different models often mention ERB (equivalent rectangular bandwidth). References can rather fuzzily refer to either of those two approximate sets, to the whole, or sometimes to the wrong one, which makes these names rather finicky to use.
If you like to categorize things,
you could say that you have CB rate functions (bark units),
ERB rate function (ERB units),
and from both those areas there are bandwidth functions.
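The note above says that whole Bark units roughly correspond to critical bandwidths. To get a feel for how rough that shortcut is, this sketch takes the Traunmüller (1990) critical band rate formula listed under the approximating functions. It estimates how many Hz one Bark spans around a given frequency (the inverse of the warper's slope), and prints that next to the '100Hz below 500Hz, 20% above that' rule of thumb. The function names are made up for this example.

```python
def bark_traunmuller(hz: float) -> float:
    """Critical band rate in Bark, Traunmuller (1990) formula."""
    return 26.81 * hz / (1960.0 + hz) - 0.53

def hz_per_bark(hz: float, delta: float = 1.0) -> float:
    """Approximate number of Hz spanned by one Bark around hz:
    the inverse of the warper's slope, via a central difference."""
    slope = (bark_traunmuller(hz + delta) - bark_traunmuller(hz - delta)) / (2.0 * delta)
    return 1.0 / slope

def cb_rule_of_thumb(hz: float) -> float:
    """'100Hz below 500Hz, 20% of the center above that'."""
    return max(100.0, hz / 5.0)

if __name__ == "__main__":
    for f in (100, 200, 500, 1000, 2000, 4000, 8000):
        print(f"{f:5d} Hz: one Bark spans {hz_per_bark(f):7.1f} Hz, "
              f"rule of thumb says {cb_rule_of_thumb(f):7.1f} Hz")
```

At the printed frequencies, the two differ by roughly 10 to 25%, which fits the 'approximately at best' remark.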
=====Critical bands=====
Critical band rate:
* Primarily a frequency warper
* Units: input in Hz, output (usually) in Bark
* often given the letter z
* approximately linear to input frequency up to ~200Hz (~0 to 2 bark)
* approximately linear to log of input frequency for ~500Hz..10kHz (~5 to 22 bark)
Critical bandwidth:
* a function to approximate the width of a critical band at a given frequency (Hz->Hz)
* Units: input in Hz, output in Hz
* useful for model design, usually along with a frequency warper
* approximately constant up to ~500Hz {{comment|(sized ~100Hz per band)}}
* approximately proportional to input frequency above ~1kHz {{comment|(sized ~20% of center frequency)}}
=====ERB=====
'''Equivalent Rectangular Bandwidth''' (ERB) and ERB-rate formulae (both introduced some time after most of the critical-band and bandwidth scales mentioned here)
approximate the relationship between frequency and the ear's corresponding critical bandwidth.
More specifically, they do so using a modeled rectangular passband filter {{comment|(with the same center and the same peak transmission as the auditory filter it models, and passing the same total power when given white noise)}}.
For low frequencies (below ~500Hz), ERB bandwidth functions will estimate bandwidth noticeably smaller than critical bandwidth does.
There are a number of different investigations, measurements, and approximations to calculate an ERB.
The formulae most commonly used seem to be from (Glasberg & Moore 1990){{verify}}.
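The 'equivalent rectangle' idea can be made concrete numerically. Take a filter with power response |H(f)|<sup>2</sup>. Its ERB is the width of a rectangle with the same peak transmission that passes the same total power from white noise: the integral of |H(f)|<sup>2</sup> divided by its value at the center. The sketch below does this. The Gaussian filter shape is a hypothetical stand-in, chosen because its ERB has a known closed form (σ·√(2π)). It is not a model of actual auditory filters.

```python
import math

def equivalent_rectangular_bandwidth(power_response, center_hz, lo_hz, hi_hz, steps=100_000):
    """Width (Hz) of a rectangle with the filter's gain at center_hz that passes the same
    total white-noise power: integral of |H(f)|^2 over [lo_hz, hi_hz] / |H(center_hz)|^2.
    power_response(f) must return |H(f)|^2; the integral uses the midpoint rule."""
    df = (hi_hz - lo_hz) / steps
    total_power = sum(power_response(lo_hz + (i + 0.5) * df) for i in range(steps)) * df
    return total_power / power_response(center_hz)

def gaussian_power_response(center_hz, sigma_hz):
    """Hypothetical example filter: Gaussian-shaped |H(f)|^2 peaking (at 1.0) at center_hz."""
    return lambda f: math.exp(-((f - center_hz) ** 2) / (2.0 * sigma_hz ** 2))

if __name__ == "__main__":
    fc, sigma = 1000.0, 50.0
    erb = equivalent_rectangular_bandwidth(gaussian_power_response(fc, sigma), fc, 0.0, 2000.0)
    print(f"numerical ERB {erb:.3f} Hz, closed form {sigma * math.sqrt(2.0 * math.pi):.3f} Hz")
```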
=====Bark scale=====
The term '''Bark scale''' (somewhat fuzzily) refers to most critical band rate functions.
The Bark unit was named in reference to Heinrich Barkhausen (who introduced the phon).
Using bark as a unit was proposed in (Zwicker 1961). This was also perhaps the earliest summarizing publication on critical band rate. It notes that one Bark unit is approximately one critical bandwidth, and reports a number of band centers and edges.
Bark is related to, but somewhat less popular than, the Mel scale.
The value of the Bark scale in tables often goes from 1 to 24, though in practice it can be a function meant to be usable from 0 to 25. {{comment|(the 24th band covers a band up to 15.5kHz {{verify}}. The 25th is easily extrapolated, and is useful to e.g. deal with 44.1kHz/48kHz recordings)}}.
=====Mel scale=====
There is also the '''Mel scale''', which like Bark was based on empirical data, but has slightly different aims
* Frequency warping scale that aims for linear perceptual pitch units {{verify}}
* Defined as <tt>f(hz) = 1127.0148 * log<sub>e</sub>(1+hz/700)</tt>
* which is the same, up to rounding of the constants, as <tt> 2595 * log<sub>10</sub>(1+hz/700)</tt>
* and some other definitions exist (approximations, or exactly equivalent?)
* The inverse (from mel to Hz) is <tt>f(m)  = 700 * (e<sup>m/1127.0148</sup> - 1)</tt>
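Here is a minimal sketch of the mel formulas above in Python, using the constants as given. <tt>math.log1p</tt> and <tt>math.expm1</tt> are just numerically careful versions of log(1+x) and e<sup>x</sup>-1. The comparison with the log10 form shows how small the rounding difference between the two definitions is.

```python
import math

MEL_LN_FACTOR = 1127.0148  # constant of the natural-log form given above

def hz_to_mel(hz: float) -> float:
    """Hz to mel, natural-log form: 1127.0148 * ln(1 + hz/700)."""
    return MEL_LN_FACTOR * math.log1p(hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: 700 * (e^(mel/1127.0148) - 1)."""
    return 700.0 * math.expm1(mel / MEL_LN_FACTOR)

def hz_to_mel_log10(hz: float) -> float:
    """The log10 form, 2595 * log10(1 + hz/700), for comparison."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0, 8000.0):
        m = hz_to_mel(f)
        print(f"{f:7.1f} Hz -> {m:8.2f} mel (log10 form {hz_to_mel_log10(f):8.2f}), "
              f"back to {mel_to_hz(m):.4f} Hz")
```

1000Hz comes out at very nearly 1000 mel, and the two forms differ by only about 0.002%.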
<!--
Note also that when a sound with two frequencies close enough to beat are perceived to give a somewhat unpleasant roughness to the sound, which is related to the (size of the) relevant critical bandwidth (quantification of this effect is slightly harder [http://www.music.sc.edu/fs/bain/atmi02/cb/index.html], also in part because of the related frequency masking).
-->
=====Various approximating functions=====
{{stub}}
[[Image:Plot.freq.png|right|330px|(early version of a graph of various related functions)]]
'''CB rate'''
Bark according to Zwicker and Terhardt 1980:
13*arctan(hz*0.00076) + 3.5*arctan((hz/7500)<sup>2</sup>)
* error varies, up to ~0.2 Bark
Bark according to Traunmuller 1990:
f(hz) = (26.81*hz)/(1960.0+hz) - 0.53
* seems more accurate than the previous, particularly for the 200-500Hz range
* seems to be the more usual function in use
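Here are the two critical band rate formulas above as a Python sketch, mostly to compare them side by side. The function names are made up for this example. Over the printed range, they disagree by up to a few tenths of a Bark.

```python
import math

def bark_zwicker_terhardt(hz: float) -> float:
    """Critical band rate in Bark, after Zwicker and Terhardt (1980)."""
    return 13.0 * math.atan(0.00076 * hz) + 3.5 * math.atan((hz / 7500.0) ** 2)

def bark_traunmuller(hz: float) -> float:
    """Critical band rate in Bark, after Traunmuller (1990)."""
    return 26.81 * hz / (1960.0 + hz) - 0.53

if __name__ == "__main__":
    print("    Hz   ZT1980    T1990    diff")
    for f in (100, 200, 500, 1000, 2000, 4000, 8000, 15500):
        zt, tr = bark_zwicker_terhardt(f), bark_traunmuller(f)
        print(f"{f:6d}  {zt:7.2f}  {tr:7.2f}  {tr - zt:+6.2f}")
```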
'''ERB rate'''
11.17 * log<sub>e</sub>( (hz+312)/(hz+14675) ) + 43.0
(mentioned at least in Moore and Glasberg 1983)
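Here is a minimal sketch of the ERB-rate formula, plus a check of how it relates to an ERB bandwidth function. If ERB-rate counts 'number of ERBs below f', its slope times the ERB at f should come out close to 1. That check assumes the quadratic under Bandwidth below (6.23·F<sup>2</sup> + 93.39·F + 28.52, F in kHz) is the matching bandwidth function. The ERB-rate formula works out to very nearly the integral of 1/ERB for that quadratic, but treat the pairing as an assumption of this sketch.

```python
import math

def erb_rate(hz: float) -> float:
    """ERB-rate (roughly: number of ERBs below hz), in the Moore and Glasberg 1983 form."""
    return 11.17 * math.log((hz + 312.0) / (hz + 14675.0)) + 43.0

def erb_bandwidth(hz: float) -> float:
    """Quadratic ERB formula from the Bandwidth list; it takes its input in kHz."""
    khz = hz / 1000.0
    return 6.23 * khz ** 2 + 93.39 * khz + 28.52

def erbs_per_erb(hz: float, delta: float = 1.0) -> float:
    """Slope of erb_rate (central difference, units per Hz) times the ERB in Hz.
    Close to 1 if the two formulas describe the same scale."""
    slope = (erb_rate(hz + delta) - erb_rate(hz - delta)) / (2.0 * delta)
    return slope * erb_bandwidth(hz)

if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000):
        print(f"{f:6d} Hz  ERB-rate {erb_rate(f):6.2f}  ERB {erb_bandwidth(f):7.1f} Hz  "
              f"slope*ERB {erbs_per_erb(f):.4f}")
```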
'''Bandwidth'''
Very simple rule-of-thumb approximation ('100Hz below 500Hz, 20% of the center above that'):
max(100,hz/5)
(Zwicker Terhardt 1980, or earlier?)
25 + 75*( 1 + 1.4*(hz/1000.)<sup>2</sup> )<sup>0.69</sup>
...with bark (and not Hz) as input:
52548 / (z<sup>2</sup> - 52.56z + 690.39)
ERB according to Moore and Glasberg 1983 (input in kHz, output in Hz):
6.23*khz<sup>2</sup> + 93.39*khz + 28.52
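Here are the bandwidth formulas above as a Python sketch for comparing them. This example makes two assumptions. First, the Bark-input formula is fed with Traunmüller's critical band rate. The formulas don't say which Bark function they pair with, though its numbers come out very close to the inverse slope of Traunmüller's formula. Second, the last formula takes its input in kHz, as its variable name suggests. The Bark-input formula's denominator reaches zero near 25.8 Bark, so it gets unreliable at the very top of the scale.

```python
def cb_rule_of_thumb(hz: float) -> float:
    """'100Hz below 500Hz, 20% of the center above that'."""
    return max(100.0, hz / 5.0)

def cb_zwicker_terhardt(hz: float) -> float:
    """Critical bandwidth in Hz, Zwicker and Terhardt (1980)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (hz / 1000.0) ** 2) ** 0.69

def cb_from_bark(z: float) -> float:
    """Critical bandwidth in Hz from a Bark value z.
    The denominator reaches zero near z = 25.8, so avoid the very top of the scale."""
    return 52548.0 / (z ** 2 - 52.56 * z + 690.39)

def erb_quadratic(hz: float) -> float:
    """ERB in Hz from the quadratic formula, which takes its input in kHz."""
    khz = hz / 1000.0
    return 6.23 * khz ** 2 + 93.39 * khz + 28.52

def bark_traunmuller(hz: float) -> float:
    """Assumed Bark warper to feed cb_from_bark (Traunmuller 1990)."""
    return 26.81 * hz / (1960.0 + hz) - 0.53

if __name__ == "__main__":
    print("    Hz   rule   ZT1980  fromBark     ERB")
    for f in (100, 250, 500, 1000, 2000, 4000, 8000):
        print(f"{f:6d} {cb_rule_of_thumb(f):6.0f} {cb_zwicker_terhardt(f):8.1f} "
              f"{cb_from_bark(bark_traunmuller(f)):9.1f} {erb_quadratic(f):7.1f}")
```

The columns show what the ERB section says: below ~500Hz, the ERB values come out well under the critical bandwidth estimates.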
=====See also (bandwidths)=====
* Traunm&#xFC;ller (1990) "{{search|Analytical expressions for the tonotopic sensory scale}}", J. Acoust. Soc. Am. 88: 97-100
* Glasberg, Moore (1990), "{{search|Derivation of auditory filter shapes from notched-noise data}}", Hear. Res. 47, 103-138
* Moore, Glasberg (1987) "{{search|Formulae describing frequency selectivity as a function of frequency and level, and their use in calculating excitation patterns}}"
* Moore, Glasberg (1983) "{{search|Suggested formulae for calculating auditory-filter bandwidths and excitation patterns}}", J. Acoust. Soc. Am. 74: 750-753
* Zwicker, Terhardt (1980), "{{search|Analytical expressions for critical-band rate and critical bandwidth as a function of frequency}}", J. Acoust. Soc. Am., Volume 68, Issue 5, pp.1523-1525
* Zwicker (1961) "{{search|Subdivision of the audible frequency range into critical bands (Frequenzgruppen)}}", J. Acoust. Soc. Am. 33: 248
* http://en.wikipedia.org/wiki/Bark_scale
* http://en.wikipedia.org/wiki/Mel_scale
* http://en.wikipedia.org/wiki/Equivalent_rectangular_bandwidth
Unsorted:
* http://www.ling.su.se/staff/hartmut/bark.htm
* http://www.sfu.ca/sonic-studio/handbook/Critical_Band.html
<!--
* L L Beranek (1949) ''Acoustic Measurements''
-->
====Filterbanks====
{{stub}}
The idea here is to get a basic imitation of the ear's response throughout the frequency scale.
While the following implementation is still a relatively basic model of the human cochlea,
it reveals various human biases in hearing sound, and as such is quite convenient for a number of perceptual tasks.
The common implementation is a set of passband filters, often with overlapping response, to model the response to different frequencies.
On the (~34mm-long) basilar membrane, a sound will tend to excite ''roughly'' 1.5mm of it at a time,
which leads to the typical choice of using 20 to 24 of these filters.
The number of centers and the corresponding bandwidths (which are not exactly based on the 1.5mm) vary with different models' assumptions,
with the frequency linearization/warping used, and with the upper frequency limit you want the model to include.
{{comment|(For a sense of size: the more important bandwidths are usually on the order of 100 to 1000Hz wide)}}
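Here is a minimal sketch of the band placement part of such a filterbank: split a Bark range into equally wide bands and map the edges and centers back to Hz. It assumes the Traunmüller (1990) formula as the warper (with its algebraic inverse) and 24 bands from 0 to 24 Bark. Both are choices for this example, and it only computes where the bands go, not the filter shapes or their overlap. With these settings, the edges run from roughly 40Hz to roughly 21kHz.

```python
def bark_traunmuller(hz: float) -> float:
    """Critical band rate in Bark, Traunmuller (1990)."""
    return 26.81 * hz / (1960.0 + hz) - 0.53

def hz_from_bark_traunmuller(z: float) -> float:
    """Inverse of bark_traunmuller: solve z = 26.81*f/(1960+f) - 0.53 for f.
    Only meaningful for z below 26.28."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def bark_band_layout(n_bands: int = 24, z_lo: float = 0.0, z_hi: float = 24.0):
    """Split [z_lo, z_hi] Bark into n_bands equally wide bands.
    Returns a list of (low_edge_hz, center_hz, high_edge_hz) tuples."""
    width = (z_hi - z_lo) / n_bands
    bands = []
    for i in range(n_bands):
        z0 = z_lo + i * width
        lo, center, hi = (hz_from_bark_traunmuller(z) for z in (z0, z0 + width / 2.0, z0 + width))
        bands.append((lo, center, hi))
    return bands

if __name__ == "__main__":
    for i, (lo, center, hi) in enumerate(bark_band_layout()):
        print(f"band {i:2d}: {lo:8.1f} - {hi:8.1f} Hz (center {center:8.1f}, width {hi - lo:7.1f})")
```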
That frequency resolution may seem low, but is quite decent considering the size of the cochlea.
The Just Noticeable Differences seem to be sub-Hz at a few hundred Hz, up to ten Hz at a few kHz, and perhaps up to ~30Hz at ~5-8kHz.
Reports vary considerably. This is probably because pure sines are much easier to judge than sounds with timbre, vibrato, etc., which can easily double these values. {{verify}}
Regardless, note that this is much better than the ~1.5mm excitation suggests. There is obviously more at work than ~24 coefficients, and for us, much of this happens in post-processing in the brain.
===Hearing damage===
{{stub}}
<!--
Varies with both frequency content and loudness. As to loudness: 110-130dB can be harmful after a few minutes (at most a few dozen minutes) per day, and 85-110dB can be harmful after hours per day.
This matters mostly in work environments, but also for maxed-out mp3 players and such, depending somewhat on your headphones.
Loud noise, ''any'' loud music, or any loud sound can make the ear protect itself. Hearing may return after a few minutes, hours, or even up to a day or two, depending on the intensity and duration of the exposure. Note that you then hear less not because of damage, but because your ear protected itself. Also note that this protective reaction is imperfect and not a long-term solution.
High frequencies carry most of the energy in complex signals and so are more si
-->
<!--
===Other ear properties===
OAE (Otoacoustic emissions)
* low-volume sounds produced by the cochlea
* some spontaneous, some evokable
* evoked OAE is useful to non-invasively test for hearing defects, also for subjects that cannot cooperate (babies, etc.)
* http://en.wikipedia.org/wiki/Otoacoustic_emission
* Types:
** SOAE:  Spontaneous OAE, generated without a triggering acoustic stimulus
** TOAE/TEOAE: Transient (Evoked) OAE, generated in response to a very short stimulus
** DPOAE: Distortion Product OAE: response to 2 simultaneous tones of close frequency
** SFOAE: Stimulus-Frequency OAE: response to a continuous tone
-->


==Masking==

Revision as of 15:04, 6 September 2023


Psycho-acoustics is a study of various sound response and interpretation effects that happen in the source-ear-brain-perception path, particularly the ear and brain.

There are various complex topics in (human) hearing. If you mostly skip this section, the concepts you should probably know about are the varying sensitivity to frequencies and masking and related effects. You should also know that practical psycho-acoustic models (used e.g. for sound compression) are mostly a fuzzy combination of various effects.


Results of said physiology, models

Masking

Other psychoacoustic effects

Localization

Selective attention



Auditory illusions

See also


  • Brian Moore, "Introduction to the psychology of hearing"
  • H. Fastl, E. Zwicker, "Psychoacoustics: Facts and Models" (relatively mathematical)

Unsorted: