Human hearing, psychoacoustics: Difference between revisions

{{avnotes}}
 
Psycho-acoustics is the study of various sound response and interpretation effects that happen in the source-ear-brain-perception path, particularly in the ear and brain.

There are various complex topics in (human) hearing. If you mostly skip this section, the concepts you should probably still know about are the varying sensitivity to frequencies and masking, and you should know that practical psycho-acoustic models (used e.g. for things like sound compression) are mostly a fuzzy combination of various effects.
 
 
==Results of said physiology, models==
 
 
===Loudness perception===
 
====Perceptual loudness of frequencies====
Physical reality can be described in terms of pressure or intensity, but human perception of the same sounds cannot.

The most important detail is probably that we hear the same physical amplitude at different frequencies as different loudness. For an extreme example, a pure 40Hz tone needs to be about 45dB louder than a pure 4000Hz tone to be perceived as about equally loud {{comment|(note that 45dB is a factor of roughly 30000 in power)}}.
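
As a quick check on that arithmetic, here is a minimal sketch of the usual decibel-to-ratio conversions. The function names are mine, chosen for illustration only.

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a power (energy) ratio."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude (pressure) ratio."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    # 45 dB difference, as in the 40Hz versus 4000Hz example
    print(round(db_to_power_ratio(45.0)))        # power ratio, a bit over 30000
    print(round(db_to_amplitude_ratio(45.0), 1)) # amplitude ratio
```

Note that the amplitude ratio is the square root of the power ratio, which is why the same 45dB is "only" a factor of a little under 180 in sound pressure.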
 
[[Image:EqLoudCon.png|thumb|270px|Equal loudness contours]]
 
This difference is caused by the way our ears work. This has been measured several times, notably by Fletcher and Munson (1933), a little more accurately by Robinson and Dadson (1956), and more accurately since then (see e.g. ISO 226:2003 for details).

These tests used near-perfect listening conditions, middle-aged people, and perhaps most importantly, pure tones. This last detail limits their value of direct application in the face of complex signals, noise, temporal psychoacoustics (pulses, listener fatigue), and such.
 
 
The test results are often viewed as a graph of equal loudness contours, which indicate the perceived difference when a (simple) sound at a particular dB(SPL) level changes frequency.
A different way to see the contours is the amount of amplitude change the ear applies for a tone at a frequency and amplitude.
 
 
There are of course other effects on how you hear each frequency, some from the environment and physics (interference, absorption), some psychoacoustic (frequency masking, temporal masking, listener fatigue, reaction to pulses), some related to quality of your reproduction hardware (headphones often offer better detail than speakers, cheap sound cards may actually alias) and others.
 
 
There are other effects that you could call psychoacoustic. For example, our judgment of how loud a sound system is depends not only on how loud it ''is'' but also on how much it is distorting.
 
 
[[Image:Inverse-logHz.gif|right|approximate equal loudness adjustments (graphed with logarithmic Hz scale)]] If you digitize the equal loudness curves and flip them, so that they describe an adjustment between 0dB and whatever negative amount of dB applies for each frequency, you'll get something like the graph on the right. If you have the data this is based on, it's fairly simple to adjust post-FFT data for loudness.
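
A minimal sketch of what such a post-FFT adjustment could look like. It assumes you already have the inverse curve as a list of (Hz, dB) points, which is not supplied here; the function names and the choice to interpolate linearly on a logarithmic frequency axis are my own assumptions, not something the data dictates.

```python
import bisect
import math


def interpolate_db(curve, hz):
    """Interpolate a dB adjustment at hz, linearly on a log-frequency axis.

    curve: list of (hz, db) pairs sorted by hz, all hz > 0, e.g. read from
    a digitized inverse equal-loudness curve (hypothetical input).
    Frequencies outside the curve get the nearest end value.
    """
    freqs = [f for f, _ in curve]
    if hz <= freqs[0]:
        return curve[0][1]
    if hz >= freqs[-1]:
        return curve[-1][1]
    i = bisect.bisect_right(freqs, hz)
    (f0, d0), (f1, d1) = curve[i - 1], curve[i]
    t = (math.log(hz) - math.log(f0)) / (math.log(f1) - math.log(f0))
    return d0 + t * (d1 - d0)


def adjust_magnitudes(bin_freqs, magnitudes, curve):
    """Scale linear FFT magnitudes by the curve's dB adjustment per bin."""
    return [m * 10.0 ** (interpolate_db(curve, f) / 20.0)
            for f, m in zip(bin_freqs, magnitudes)]
```

For power spectra (squared magnitudes) you would divide the dB value by 10 rather than 20. As noted above, this only approximates perception of pure tones, so treat the result as a rough weighting rather than a loudness measurement.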
 
 
See also:
* Fletcher, H., and Munson, W.A. ''Loudness, its Definition, Measurement, and Calculation''. Journal of the Acoustical Society of America, 5, (2), 82-108, 1933
 
* Robinson, D. W., and Dadson, R.S. ''A Redetermination of the Equal-loudness Relations for Pure Tones''. British Journal of the Applied Physics, 7, 166-181, 1956
 
 
 
<!--
The result is [http://crimson.scarfboy.com/eql/EqLoudCon.dig EqLoudCon.dig], which contains a few data curves. I'm not going for extreme accuracy, as that is a moot point anyhow as this works only for pure tones in an ideal environment - in reality, these curves are affected by sound setup, but also many more physical effects like masking.
 
Note that I manually extrapolated points at low and high values, making for some odd extremes. This is not meant as clean but as pragmatic. In practice, 1/f noise should be avoided in the lowest bands, and values up to 22khz is probably also a good idea. Above that is inaudible for almost everyone.
 
[http://crimson.scarfboy.com/eql/EqLoudCon.csv EqLoudCon.csv] is a comma separated value list, exported from the digitiser. The numbers per line are the x coordinate (in Hz) and the according values for the 80, 40, 10, 110, 90 and 60 dB equal loudness curves, on the dB (SPL) scale.
 
I set the option to interpolate more points into all the series. (More accurately described: All x coordinates from all curves are used in all curves, not just the one they come from; these values are interpolated)
 
Note that usual music listening levels are 80 to 90dB SPL.
 
[http://crimson.scarfboy.com/eql/Inverse.csv Inverse.csv] is meant to invert the above effect. That is to say, to turn objective data (for example, Fourier transformed data, previously digitised into a computer) into human-subjective power (this is eg. used by replaygain - in fact, I'm for a fair part reproducing what he did - and in a program like mp3gain ).
This was calculated via Excel because I was lazy. This particular inverse curve was calculated from a single curve, that of the 80dB and 90dB averaged. The values have been adjusted to range from negative something up to 0, so it can be used to adjust.
 
 
Ideal would be to approximate this data in a function, so that you can integrate this and quickly get the adjustment for any band of frequencies. But, lazy as I am, I'm just going to interpolate values at every whole Hz value from 1 to 24000Hz, and average it for the bands I'll need in whatever programs end up using this.
The result of this is [http://crimson.scarfboy.com/eql/perHz.csv.gz perHz.csv]. (gzipped for bandwidth reasons)
 
Note: If you're goint to be using this, you may want to take a look at the [http://replaygain.hydrogenaudio.org/rms_energy.html details (eg. for stereo RMS energy calculation) in David Robinson's proposed replay gain standard], or a good DSP book.
 
 
[http://sourceforge.net/projects/digitizer/ Engauge])
 
-->
 
<br clear="both"/>
 
 
 
 
====More adapted measures of loudness====
There are a number of filters/measures, most of which try to take equal loudness curves into account, and/or are biased towards specific purposes.
 
Note that most are designed for simple signals, even if they are regularly used for complex ones (noise, music).
The main exception in this list is ITU-R 468, which is more valid on complex signals than most others.
 
 
Perhaps best known is '''dBA''' and to a lesser degree '''dBC''', commonly used to mechanically measure sound levels in roughly human terms. {{comment|(notation varies - dBA is also seen as dB(A), dBa, and other mild writing variations. Note that dBa has a different meaning in some contexts)}} [[Image:dbabc.png|right|330px|Adjustments according to dB(A), dB(B), dB(C), fairly simple shapes]]
 
'''dB(A)''' seems intended to be a simple approximation of human frequency response at relatively quiet levels<!-- (apparently mostly accurate around 40 [[phons]])-->, and is used e.g. for low-level noise pollution - things that are annoying, but below deafening.

That said, dBA weights the first few dozen Hz down a lot (~30-70dB) - which is a good way to have a device spec sheet pretend 50 / 60Hz humming is barely there (and won't rattle anything else) just because our ears are worse at this.

Sound level meters on mixing panels and similar ''might'' be (roughly) A-weighted -- which is useful when focusing on vocals
but quite a poor indication of bass.
Assume level meters are not great at indicating bass (though the ''way'' they are wrong varies a bunch) until you know what they do.
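
To put numbers on how hard A-weighting pushes down low frequencies, here is a small sketch using the commonly published closed-form expression for the A curve (the one given in IEC 61672, with the usual rounded constants and normalized to roughly 0dB at 1kHz). The function name is mine.

```python
import math


def a_weighting_db(hz):
    """A-weighting gain in dB at frequency hz (hz must be > 0).

    Uses the commonly published analog A-curve expression with the
    rounded constants 20.6, 107.7, 737.9 and 12194 Hz.
    """
    f2 = hz * hz
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00


if __name__ == "__main__":
    for hz in (10, 20, 50, 100, 1000, 4000):
        print(hz, round(a_weighting_db(hz), 1))
```

Running this shows mains hum at 50Hz being weighted down by about 30dB and 10Hz by about 70dB, which is the range mentioned above.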
 
 
 
'''dB(C)''' is meant to approximate the ear at fairly loud sound levels, and leaves in more low frequencies - but not enough to evaluate the effect of low bass.
It seems to often be used in traffic loudness measurements.
 
 
 
'''dB(Z)''' is flat for most of the spectrum (z refers to zero), but has more defined cutoff points than the 'flat' ratings left up to manufacturers.{{verify}}
: so not as useful for loudness perception unless it is well defined where those falloffs are - but apparently it ''should'' be a passband between 10Hz and 20kHz{{verify}}
: yet arguably the most useful to evaluate potential hearing damage
 
 
 
Old and/or specific-purpose:
 
'''dB(B)''' lies somewhere between A and C, both in terms of frequency response and intended loudness levels. You could say it roughly models the ear at medium sound levels - which arguably makes it more useful than either A or C.
Yet it seems to be rarely used, perhaps because there are better models that are not much more complex.
 
'''dB(D)''' was intended for loud aircraft noise. It has a peak around 6kHz that models how people sense random noise differently from tones, particularly around there {{verify}}.
 
'''dB(G)''' focuses primarily around sub-bass (20Hz) and is useful when measuring large slow movements, like wind turbines.
 
 
 
 
'''ISO 226''' has a more complex frequency response than dBA/B/C, with pre-2003 versions based on the Robinson-Dadson results, and the 2003 version revised according to more recent equal loudness tests.
 
 
'''[http://en.wikipedia.org/wiki/ITU-R_468_noise_weighting ITU-R 468]''' {{comment|(note: [[ITU-R]] used to be CCIR)}}, is a better approximation for noise and complex signals {{comment|(dBA and its family were designed for pure tones)}}, and also models our reduced sensitivity to short bursts and clicks to some degree. R468 has seen a lot of use in some specific fields.
 
 
 
'''[[Loudness meters]]''' vary in design. They may be unfiltered, or use dBA, dBC, and sometimes other filters. They may respond slowly or quickly to large level differences (e.g. a peak programme meter is slower, almost ignoring few-millisecond peaks, since human hearing is tolerant of short distortion), may decay slowly to give a multi-tasking broadcast operator more of a chance to get an idea of recent peaks, and have other variations.
 
 
 
Various weightings have relatively simple circuits, e.g. https://web.archive.org/web/20210507071115/https://sound-au.com/project17.htm
 
 
 
 
 
'''Phons''': At 1kHz, 1 phon is defined as 1 dB SPL. For other frequencies it is adjusted following the equal loudness curves.
: Phons and sones were proposed perceptual loudness units / experiments, and neither is in particularly regular use other than being referenced by some definitions. <!-- (If you see either in advertizing, you may want to be careful - intentional or not, it makes comparisons harder.)-->
 
 
'''Sones''': A phon-based exponential scale with base two. The definition says that at 1kHz, 1 sone is 40 phons {{comment|(probably because that makes 1 sone a practical quiet sound rather than a theoretical hearing limit)}}. The idea behind the base two is that a perceptual doubling of loudness (~10dB) means a doubling of the sones:
  sones    phons / dBSPL@1kHz
    0.5      30 (verify)
    1        40
    2        50
    4        60
    8        70
   16        80
   32        90
   64       100
  128       110
  256       120
  512       130
 1024       140
(For frequencies other than 1kHz this must be adjusted according to equal loudness curves).
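
The table above follows directly from "1 sone at 40 phons, doubling every 10 phons". A minimal sketch of that relation and its inverse (function names are mine; the simple power-of-two rule is usually only trusted from about 40 phons upward, which matches the 'verify' on the 0.5 sone row):

```python
import math


def phons_to_sones(phons):
    """Sones for a loudness level in phons: 1 sone at 40 phons, x2 per 10 phons."""
    return 2.0 ** ((phons - 40.0) / 10.0)


def sones_to_phons(sones):
    """Inverse of phons_to_sones (sones must be > 0)."""
    return 40.0 + 10.0 * math.log2(sones)


if __name__ == "__main__":
    for phons in range(30, 150, 10):
        print(phons_to_sones(phons), phons)
```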
 
 
 
See also:
* http://en.wikipedia.org/wiki/A-weighting
* http://en.wikipedia.org/wiki/ITU-R_468_noise_weighting
* http://en.wikipedia.org/wiki/ITU-R_BS.468
* http://en.wikipedia.org/wiki/Phon
* http://en.wikipedia.org/wiki/Sone
 
And perhaps:
* https://midimagic.sgc-hosting.com/spldose.htm
* http://www.phys.unsw.edu.au/jw/dB.html
* http://www.cross-spectrum.com/audio/weighting.html
* http://www.kolumbus.fi/iain.churches/ThermionicThoughts/TubeAmpNoise.html
 
===Frequency perception===
{{stub}}
 
In music, tones are usually seen in a way that originates in part from scientific pitch notation, in which an octave is a doubling of frequency. This already illustrates the non-linear nature of human hearing, but is in itself not actually an accurate model of human pitch perception.
 
Accurate comparison of frequencies and frequency bands can be an involved subject.
 
 
Perceived equal frequency intervals/distances are not easily captured in a simple function. If you've ever heard a frequency generator slowly and linearly increase the frequency, you'll know that it sounds to us like fast changes at the start, and that past ~6kHz it's all a slightly changing high beep.

In addition, or perhaps in extension, the accuracy with which we can judge similar tones as not-the-same changes with frequency, and is also not trivial to model.
 
Frequency warping is often applied to attempt to linearize perceived pitch, something that can help various perceptual analyses and visualizations.
 
 
As frequency increases, the critical bandwidth increases and the pitch resolution we hear decreases.
 
The non-linear nature of frequency hearing, the existence and the approximate size of critical bandwidths is useful information when we want to model our hearing.
 
It is also useful information for things like lossy audio compression, since it tells us that spending equal space on perceived tones means we should expend coding space non-linearly with frequency.
 
 
====For background: Mathematical frequency intervals====
{{stub}}
 
In the mathematical '''scientific pitch notation''', an '''octave''' refers to a doubling of the frequency. '''Cents''' are a (log-based) ratio defined so that there are 1200 steps in an octave, each a factor of 2<sup>1/1200</sup>.

Note that the human Just Noticeable Difference for tones is around 5 cents,
although there are reasons that ''tuning'' should ideally be more accurate than that, particularly for instruments with overtones, where dissonance is more audible {{verify}}.
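
Since cents are just a logarithm of a frequency ratio, converting is a one-liner either way. A small sketch (function names are mine):

```python
import math


def cents_between(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)


def shift_by_cents(f_hz, cents):
    """Frequency that lies the given number of cents above f_hz."""
    return f_hz * 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    print(cents_between(440.0, 880.0))          # one octave
    print(round(shift_by_cents(440.0, 5.0), 2)) # 440Hz raised by ~5 cents
```

This also gives a feel for the size of that ~5 cent difference: around 440Hz it amounts to a bit over 1Hz.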
<!--
In equal temperament tuning (the one common and modern in the west), a '''semitone''' refers to exactly 100 cents (so factor 2<sup>(1/12)</sup>, approximately a factor 1.059).
 
Note that other tunings may be more complex. They focus on consonance and dissonance in different ways, for example making specific chords (or musical styles/system) sound a little better, at the cost of others.  This was once the only thing we knew, but we wanted to get rid of the bad cases.
-->
 
 
Notes can be referenced as note-octave, a combination of the semitone letter and the octave it is in.

This scale is typically anchored at A4, usually setting that to 440Hz (concert pitch), but there are variations.

The limit of 88 keys on most pianos, good for seven and a quarter octaves, comes from a "...that's enough" (88-key is A0 to C8 - below A0 it's more of a rumble, above C8 it's shrill, though there are some slight extensions). That makes its middle C be C4<!--(261.6Hz)-->.
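
A sketch of turning note-octave names into frequencies. Note the assumptions, which go beyond what is stated above: twelve-tone equal temperament (each semitone a factor of 2<sup>1/12</sup>), sharps-only note names, and A4 defaulting to 440Hz. The names in the code are mine.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_to_hz(name, octave, a4_hz=440.0):
    """Frequency of a note like ("C", 4), assuming equal temperament.

    Octave numbers change at C, so A4 is 9 semitones above C4.
    """
    semitones_from_a4 = NOTE_NAMES.index(name) - 9 + 12 * (octave - 4)
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


if __name__ == "__main__":
    print(round(note_to_hz("A", 0), 2))  # lowest key on an 88-key piano
    print(round(note_to_hz("C", 4), 2))  # middle C
    print(round(note_to_hz("C", 8), 2))  # highest key on an 88-key piano
```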
 
<!--
Simpler have fewer keys. Keyboards are often smaller yet. Few pop songs use more than three or four, and they can easily transposed anyway.
-->
 
 
<!--
* A keyboard is often approximately centered around middle C and often have a range of around 7 octaves (as in the relatively common 88 keys keyboards)
-->
 
See also:
* http://en.wikipedia.org/wiki/Octave
* http://en.wikipedia.org/wiki/Cent_(music)
* http://en.wikipedia.org/wiki/Semitone
* A lot of music theory
 
====Bandwidths, frequency warping, and more====
{{stub}}
 
Note that studies and applications in this area usually address multiple things at once, usually one or more of:
* estimation of the width of the critical band of a band / at a frequency
* frequency warping (Hz to a perceptually linear scale). {{comment|(Note that a fit to critical band data can often be a decent approximate warper)}}
* working towards a filterbank design that is an accurate model in specific or various ways (there are a number of psychoacoustic effects that you could wish to model, or ignore to keep things simple)
 
 
Also useful to note:
* Since some terms refer to a general concept, they may refer to different formulae, and to different publications that pull in more or fewer related details than others.

* There are a good number of convenient approximations around, which tend to confuse various summaries (...such as this one. I'll warn you against assuming this is all correct.)
* While bands are sometimes reported as given widths around given centers, particularly for approximate filterbank designs, bands do not really have fixed positions. It is more accurate to consider a function that reports a width for arbitrary frequencies.

* There is often some rule-of-thumb knowledge, and various models and formulae that are more accurate than that - but many formulae are still noticeably specific and/or inaccurate, often mostly for lowish and for high frequencies.
<!--
* For relatively higher frequencies, a critical bandwidth is somewhere between a whole tone and a third of an octave wide.
-->
 
It's useful to clearly know the difference between '''critical band rate''' functions (mostly a frequency warper, a function from Hz to fairly synthetic units) and '''critical bandwidth''' functions (estimation of the bandwidth of hearing at given frequency, a Hz&rarr;Hz function). They are easily confused because they have similar names and similarly shaped graphs, and because they approximate rather related concepts.
 
A quite understandable source of confusion is that in many mentions of critical band rate, it is noted that the interval between whole Bark units corresponds to critical bandwidths. This is approximately true, but approximately at best. It is often a fairly rough attempt to simplify away the need for a separate critical bandwidth function. One of the largest differences between these two groups seems to be the treatment under ~500Hz, and details like that ERB lends itself to filterbank design more easily.
 
Another source of confusion is naming: Earlier models often have a name involving 'critical band', later and different models often mention ERB (equivalent rectangular bandwidth), and it seems that references can quite fuzzily refer to those two approximate sets, the whole, or sometimes the wrong ones, making these names rather finicky to use.
 
 
If you like to categorize things,
you could say that you have CB rate functions (bark units),
ERB rate function (ERB units),
and from both those areas there are bandwidth functions.
 
 
 
=====Critical bands=====
 
Critical band rate:
* Primarily a frequency warper
* Units: input in Hz, output (usually) in Bark
* often given the letter z
* approximately linear to input frequency up to ~200Hz (~0 to 2 bark)
* approximately linear to log of input frequency for ~500Hz..10kHz (~5 to 22 bark)
 
Critical bandwidth:
* a function to approximate the width of a critical band at a given frequency (Hz->Hz)
* Units: input in Hz, output in Hz
* useful for model design, usually along with a frequency warper
* approximately constant up to ~500Hz {{comment|(sized ~100Hz per band)}}
* approximately proportional to input frequency over ~1kHz {{comment|(sized ~20% of center frequency)}}
 
 
=====ERB=====
 
'''Equivalent Rectangular Bandwidth''' (ERB) and ERB-rate formulae (both introduced some time after most mentioned critical-band and bandwidth scales)
approximate the relationship between the frequency and the ear's corresponding critical bandwidth.
 
More specifically, they do so using a modeled rectangular passband filter {{comment|(with the same pass-band center as the auditory filter it models, and has similar response to white noise)}}.
 
For low frequencies (below ~500Hz), ERB bandwidth functions will estimate bandwidth noticeably smaller than critical bandwidth does.
 
There are a number of different investigations, measurements, and approximations to calculate an ERB.
The formulae most commonly used seem to be from (Glasberg & Moore 1990){{verify}}.
 
 
 
 
=====Bark scale=====
 
The term '''Bark scale''' (somewhat fuzzily) refers to most critical band rate functions.
 
The Bark unit was named in reference to Heinrich Barkhausen (who introduced the phon).
 
Using bark as a unit was proposed in (Zwicker 1961), which was also perhaps the earliest summarizing publication on critical band rate (and makes a bark-unit-is-approximately-a-bandwidth note), and which reports a number of centers, edges.
 
 
Bark is related to, but somewhat less popular than the Mel scale.
 
The value of the Bark scale in tables often goes from 1 to 24, though in practice it can be a function meant to be usable from 0 to 25. {{comment|(the 24th band covers a band up to 15.5kHz. The 25th is easily extrapolated, and useful to e.g. deal with 44.1kHz/48kHz recordings)}}.
 
 
=====Mel scale=====
 
There is also the '''Mel scale''', which like Bark was based on empirical data, but has slightly different aims:
* Frequency warping scale that aims for linear perceptual pitch units {{verify}}
* Defined as <tt>f(hz) = 1127.01048 * log<sub>e</sub>(1+hz/700)</tt>
* Which is approximately the same as <tt>2595 * log<sub>10</sub>(1+hz/700)</tt>
* and some other definitions (approximation or exactly?)
* The inverse (from mel to Hz) is <tt>f(m) = 700 * (e<sup>m/1127.01048</sup> - 1)</tt>
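
A minimal sketch of the Hz to mel conversion and back, using the <tt>2595 * log<sub>10</sub></tt> form listed above (function names are mine). The constants are chosen so that 1000Hz lands at very nearly 1000 mel.

```python
import math


def hz_to_mel(hz):
    """Hz to mel, using the 2595 * log10(1 + hz/700) form."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Algebraic inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    for hz in (100, 500, 1000, 4000, 16000):
        print(hz, round(hz_to_mel(hz), 1))
```

A typical use is placing filterbank centers at equal mel steps and converting them back to Hz, which is roughly what mel-based features do.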
 
 
<!--
Note also that when a sound with two frequencies close enough to beat are perceived to give a somewhat unpleasant roughness to the sound, which is related to the (size of the) relevant critical bandwidth (quantification of this effect is slightly harder [http://www.music.sc.edu/fs/bain/atmi02/cb/index.html], also in part because of the related frequency masking).
-->
 
 
 
=====Various approximating functions=====
{{stub}}
[[Image:Plot.freq.png|right|330px|(early version of a graph of various related functions)]]
 
 
'''CB rate'''

Bark according to Zwicker and Terhardt 1980:
: <tt>f(hz) = 13*arctan(hz*0.00076) + 3.5*arctan((hz/7500)<sup>2</sup>)</tt>
* error varies, up to ~0.2 Bark


Bark according to Traunm&#xFC;ller 1990:
: <tt>f(hz) = (26.81*hz)/(1960.0+hz) - 0.53</tt>
* seems more accurate than the previous, particularly for the 200-500Hz range
* seems to be the more usual function in use
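
The two critical band rate formulas above, written out so they can be compared directly. This is a plain transcription with function names of my choosing.

```python
import math


def bark_zwicker_terhardt(hz):
    """Critical band rate in Bark, arctan form listed above."""
    return 13.0 * math.atan(0.00076 * hz) + 3.5 * math.atan((hz / 7500.0) ** 2)


def bark_traunmuller(hz):
    """Critical band rate in Bark, rational form listed above."""
    return 26.81 * hz / (1960.0 + hz) - 0.53


if __name__ == "__main__":
    for hz in (100, 200, 500, 1000, 2000, 5000, 10000):
        print(hz,
              round(bark_zwicker_terhardt(hz), 2),
              round(bark_traunmuller(hz), 2))
```

Both put 1kHz at roughly 8.5 Bark. The rational form goes slightly negative near 0Hz, so clamp it if your code cannot deal with that.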
 
 
 
'''ERB rate'''

: <tt>f(hz) = 11.17 * log<sub>e</sub>( (hz+312)/(hz+14675) ) + 43.0</tt>
(mentioned at least in Moore and Glasberg 1983)
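
The same ERB rate expression as a function (name is mine), mostly to show its range: it is close to 0 at 0Hz and grows roughly logarithmically after that.

```python
import math


def erb_rate(hz):
    """ERB rate (in ERB units) for a frequency in Hz, per the formula above."""
    return 11.17 * math.log((hz + 312.0) / (hz + 14675.0)) + 43.0


if __name__ == "__main__":
    for hz in (0, 100, 500, 1000, 4000, 16000):
        print(hz, round(erb_rate(hz), 2))
```

Note that ERB units are not Bark units: the two scales count different (and differently sized) bands, so the numbers are not interchangeable.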
 
 
'''Bandwidth'''

Very simple rule-o-thumb approximation ('100Hz below 500Hz, 20% of the center above that'):
: <tt>max(100, hz/5)</tt>

Critical bandwidth (Zwicker Terhardt 1980, or earlier?):
: <tt>25 + 75*( 1 + 1.4*(hz/1000)<sup>2</sup> )<sup>0.69</sup></tt>

...with bark (and not Hz) as input:
: <tt>52548 / (z<sup>2</sup> - 52.56*z + 690.39)</tt>

ERB (Moore and Glasberg 1983), with frequency in kHz:
: <tt>6.23*khz<sup>2</sup> + 93.39*khz + 28.52</tt>
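
For comparison, the bandwidth expressions above as functions. Function names are mine. I take the last, quadratic one (the one with frequency in kHz) to be the ERB estimate from Moore and Glasberg 1983; treat that attribution as my reading.

```python
def cb_rule_of_thumb(hz):
    """'100Hz below 500Hz, 20% of the center above that'."""
    return max(100.0, hz / 5.0)


def cb_zwicker_terhardt(hz):
    """Critical bandwidth in Hz from frequency in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (hz / 1000.0) ** 2) ** 0.69


def cb_from_bark(z):
    """Critical bandwidth in Hz from critical band rate z in Bark."""
    return 52548.0 / (z * z - 52.56 * z + 690.39)


def erb_quadratic(hz):
    """The quadratic listed last above; it takes kHz, so convert first."""
    khz = hz / 1000.0
    return 6.23 * khz ** 2 + 93.39 * khz + 28.52


if __name__ == "__main__":
    for hz in (100, 500, 1000, 4000):
        print(hz,
              cb_rule_of_thumb(hz),
              round(cb_zwicker_terhardt(hz), 1),
              round(erb_quadratic(hz), 1))
```

At 100Hz this gives a critical bandwidth of about 100Hz against an ERB of under 40Hz, which illustrates the earlier remark that ERB estimates are noticeably narrower at low frequencies.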
 
=====See also (bandwidths)=====
 
* Traunm&#xFC;ller (1990) "{{search|Analytical expressions for the tonotopic sensory scale}}", J. Acoust. Soc. Am. 88: 97-100
 
* Glasberg, Moore (1990), "{{search|Derivation of auditory filter shapes from notched-noise data}}", Hear. Res. 47, 103-138
 
* Moore, Glasberg (1987) "{{search|Formulae describing frequency selectivity as a function of frequency and level, and their use in calculating excitation patterns}}"
 
* Moore, Glasberg (1983) "{{search|Suggested formulae for calculating auditory-filter bandwidths and excitation patterns}}", J. Acoust. Soc. Am. 74: 750-753
 
* Zwicker, Terhardt (1980), "{{search|Analytical expressions for critical-band rate and critical bandwidth as a function of frequency}}", J. Acoust. Soc. Am., Volume 68, Issue 5, pp.1523-1525
 
* Zwicker (1961) "{{search|Subdivision of the audible frequency range into critical bands (Frequenzgruppen)}}", J. Acoust. Soc. Am. 33: 248
 
* http://en.wikipedia.org/wiki/Bark_scale
 
* http://en.wikipedia.org/wiki/Mel_scale
 
* http://en.wikipedia.org/wiki/Equivalent_rectangular_bandwidth
 
Unsorted:
* http://www.ling.su.se/staff/hartmut/bark.htm
* http://www.sfu.ca/sonic-studio/handbook/Critical_Band.html
 
 
<!--
* L L Beranek (1949) ''Acoustic Measurements''
-->
 
 
 
====Filterbanks====
{{stub}}
 
The idea here is to get a basic imitation of the ear's response throughout the frequency scale.

While the following implementation is still a relatively basic model of the human cochlea,
it reveals various human biases in hearing sound, and as such is quite convenient for a number of perceptual tasks.


The common implementation is a set of passband filters, often with overlapping response, to model response to different frequencies.
On the (~34mm-long) basilar membrane, a sound will tend to excite ''roughly'' 1.5mm of it at a time,
which leads to the typical choice of using 20 to 24 of these filters.

That number of centers and the corresponding bandwidths (which are not exactly based on the 1.5mm) vary with different models' assumptions,
with the frequency linearization/warping used, and with the upper limit on frequency that you want the model to include.
{{comment|(For a sense of size: the more important bandwidths are usually on the order of 100 to 1000Hz wide)}}
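
One simple way to lay out such a filterbank is to space band edges at whole Bark steps and convert them back to Hz. The sketch below does that with the Traunm&#xFC;ller rate formula from earlier and its algebraic inverse. The function names are mine, and it leaves out the low and high end corrections from that paper, so the lowest and highest bands are not to be trusted. It only gives edges and centers, not filter shapes.

```python
def hz_to_bark(hz):
    """Critical band rate in Bark (rational form, no end corrections)."""
    return 26.81 * hz / (1960.0 + hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark; only valid for z < 26.28."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_band_edges(n_bands=24):
    """Band edges in Hz at whole Bark values 0..n_bands."""
    return [bark_to_hz(float(z)) for z in range(n_bands + 1)]


def bark_band_centers(n_bands=24):
    """Band centers in Hz, halfway between edges on the Bark axis."""
    return [bark_to_hz(z + 0.5) for z in range(n_bands)]


if __name__ == "__main__":
    edges = bark_band_edges()
    centers = bark_band_centers()
    for lo, mid, hi in zip(edges, centers, edges[1:]):
        print(round(lo), round(mid), round(hi), "width", round(hi - lo))
```

The printed widths start near 100Hz and grow with frequency, in line with the sizes mentioned above. Keep in mind the earlier caveat that bands do not really have fixed positions; fixed edges are a convenience of this kind of design.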
 
 
That frequency resolution may seem low, but is quite decent considering the size of the cochlea.
The Just Noticeable Differences seem to be sub-Hz at a few hundred Hz, up to ten Hz at a few kHz, and up to perhaps ~30Hz at ~5-8kHz.

Reports vary considerably, probably because pure sines are much easier to tell apart than sounds with timbre, vibrato, etc., for which the JND can easily be double these values. {{verify}}

Regardless, note that this is much better than the ~1.5mm excitation suggests; there is obviously more at work than ~24 coefficients, and for us, much of this happens in post-processing in the brain.
 
===Hearing damage===
{{stub}}
<!--
 
Varies with both frequency content and loudness. As to loudness: 110-130dB can be harmful after a few few minutes (at most a few dozen minutes) per day, 85-110dB can be harmful after hours per day.
 
This matters mostly to work environments, but also to maxed out mp3 players and such, depending on your headphones somewhat.
 
Loud noise, ''any'' loud music, or any loud sound can make the ear protect itself, which may return after a few minutes, hours, or even up to a day or two, depending on the intensity and duration of the exposure. Note that you don't hear not because of damage, but because your ear protected itself - but also that this protective reaction in your ears is imperfect and not a long-term solution.
 
 
High frequencies carry most of the energy in complex signals and so are more si
 
-->
 
 
<!--
===Other ear properties===
OAE (Otoacoustic emissions)
* low-volume sounds produced by the cochlea
* some spontaneous, some evokable
* evoked OAE is useful to non-invasively test for hearing defects, also for subjects that cannot cooperate (babies, etc.)
* http://en.wikipedia.org/wiki/Otoacoustic_emission
* Types:
** SOAE:  Spontaneous OAE, generated without a triggering acoustic stimulus
** TOAE/TEOAE: Transient (Evoked) OAE, generated in response to a very short stimuli
** DPOAE: Distortion Product OAE: response to 2 simultaneous tones of close frequency
** SFOAE: Sustained-Frequency OAE: response to a continuous tone
-->
 
==Masking==
<!--
Masking refers to the effect where the presence of one sound influences the perception of another, but often '''simultaneous masking''', referring to simultaneous content masking other parts of it out, primarily by frequency masking, the effect where a frequencies masks out softer content near it. for example, a 1.0kHz tone will make a 1.1kHz tone that is 20dB softer hard to distinguish.
 
* [http://en.wikipedia.org/wiki/Critical_bands Wikipedia: Critical Bands]
* [http://en.wikipedia.org/wiki/Bark_scale Wikipedia: Bark scale]
* [http://en.wikipedia.org/wiki/Equivalent_rectangular_bandwidth Wikipedia: Equivalent rectangular bandwidth]
* http://ccrma.stanford.edu/~jos/sasp/Equivalent_Rectangular_Bandwidth.html
* http://home.tm.tue.nl/dhermes/lectures/SoundAndVision/SoundAndVision_notes2.html
 
 
 
'''Temporal Masking''' refers to a temporary reduced perception of a tone played immediately before/after another.
 
Forward masking refers to a loud sound triggering reduced sensitivity to a somewhat softer one right after it, for up to perhaps half a second. This is largely based in the ear's ability to protect itself in reaction to loud sounds in the short term (also long, but that is just general reduced hearing).
 
There is also backward masking, where a loud sound ''after'' a softer wound drowns out the perception of the earlier one. The effect works because some of the processing involved is not perceived until perhaps 100ms after reception (frequency and time effects are processed somewhat separately), which is the time window in which this effect works.
 
 
==Other time-related effects==
More than approximately a dozen short sounds per second are harder to distinguish, which applies to impulses (above that, we start to hear it as a blur, and then a low hum) as well as clear frequencies (very fast piano is hard to distinguish).
 
 
 
==Subjective quality evaluation==
In the design of lossy signal processing (e.g. compression), transmission (e.g. phone systems), or 
 
* http://en.wikipedia.org/wiki/Sound_Quality
* http://en.wikipedia.org/wiki/Audio_quality_measurement
 
===Algorithms===
 
* PSQM - Perceptual Speech Quality Measure -- '''replaced with:'''
* PESQ - Perceptual Evaluation of Speech Quality [http://en.wikipedia.org/wiki/PESQ]
* PEAQ (Perceptual Evaluation of Audio Quality) [http://en.wikipedia.org/wiki/PEAQ]
 
http://en.wikipedia.org/wiki/PSQM
 
 
===Listening tests===
Listening tests are (often double-blind) tests that compare quality by measuring human judgement.
* http://en.wikipedia.org/wiki/Codec_listening_test
* http://en.wikipedia.org/wiki/Mean_Opinion_Score
* http://en.wikipedia.org/wiki/MUSHRA {{comment|(ITU standard)}}
* http://en.wikipedia.org/wiki/ABX_test
 
 
-->
==Other psychoacoustic effects==
<!--
'''Listener fatigue''' refers to listeners getting used to and tuning out noise content, a partly quantifiable effect.
 
 
'''Frequency selectivity''' (frequency resolution) is the the effect in which we hear some things as separate sounds and others (such as chords) more as complex but single sounds, which relates to simultaneous masking, but also to harmonic content and other details.
 
 
'''Localization''' is based on a difference in reception time, which implies a difference in phase.
 
A mild difference in frequency content can be caused by reflective/absorbing nature of obstacles, which can be used for higher-level conclusions such as that the sound is probably coming through a wall, but also for localization as our head and body are also such obstacles.
 
 
'''Source separation''' refers to the ability to assign frequency content to different sources and selectively focus/ignore on the production of a single source, such as following one conversation among multiple.
 
-->
===Localization===
<!--
Based on:
* relative loudness
* timing information
* phase information {{verify}}
and also:
* reflections from the outer ear
-->
 
 
===Selective attention===
{{stub}}
 
<!--
See also:
* http://en.wikipedia.org/wiki/Selective_attention
* http://en.wikipedia.org/wiki/Dichotic_listening
-->
 
===Auditory illusions===
<!--
 
The '''Haas effect''' refers to the brain concluding that sounds that would normally be perceived as coming from different origins may be perceived as coming from a single origin, when they arrive within perhaps 40 milliseconds. This seems to be related to a sensory echo cancellation effect that assists localization.
 
Sound engineers may specifically design for this effect when serving large areas, such as for public address systems and concerts.
 
http://en.wikipedia.org/wiki/Haas_effect
 
 
'''Missing fundamentals''', also known as '''phantom fundamentals''', refer to the effect where overtones suggest a fundamental frequency that the sound actually lacks. Since the brain uses the presence of overtones to make conclusions about the tones it hears, it may fill in the perception of a lower tone that is not physically present.
 
http://en.wikipedia.org/wiki/Missing_fundamental
 
Similarly, '''combination tones''' (also '''sum tones''', '''difference tones''', sometimes '''Tartini tones''') refer to certain simultaneous tones being perceived as having an additional tone {{comment|(where that additional tone's frequencty is the sum, or the difference between the frequencies of the real tones)}}
 
http://en.wikipedia.org/wiki/Combination_tone
 
 
'''Illusory tone continuity''' refers to the illusion that a tone is continued within a short piece of (spectrum-wide) noise, when that interruption is shorter than about 50ms.
 
http://en.wikipedia.org/wiki/Illusory_continuity_of_tones
 
 
 
 
 
* http://en.wikipedia.org/wiki/Auditory_illusion
 
* http://en.wikipedia.org/wiki/Combination_tone
 
 
==Perceptual filtering==
 
MFC, MFCC
 
 
 
-->
 
==See also==
* [http://en.wikipedia.org/wiki/Equal-loudness_contour Wikipedia: Equal loudness contour]
* http://www.phys.unsw.edu.au/jw/dB.html (Phons, Sones, dBA, dBC)
* http://www2.sfu.ca/sonic-studio/handbook/Phon.html (Phon)
* [http://en.wikipedia.org/wiki/A-weighting Wikipedia: A-weighting]
* [http://en.wikipedia.org/wiki/ITU-R_468_noise_weighting Wikipedia: ITU-R 468 noise weighting]
 
 
* http://en.wikipedia.org/wiki/Psychoacoustics
 
* http://en.wikipedia.org/wiki/Music_psychology
 
* Brian Moore, "Introduction to the psychology of hearing"
 
* H. Fastl, E. Zwicker, "Psychoacoustics: Facts and Models" (relatively mathematical)
 
Unsorted:
* http://is.rice.edu/~welsh/elec431/psychoAcoustic.html
* http://psysound.wikidot.com/
* http://www.phys.unsw.edu.au/jw/hearing.html Frequency response self-test (beware of aliasing sound cards, though)
 
 
[[Category:Audio, video, images]]

Latest revision as of 15:11, 6 September 2023