# Basic sound physics

 This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

### Frequency, wavelength

Frequency is expressed in Hertz (Hz), a unit that itself means "(times 1) per second".

It is used, among other things...

• to indicate the number of cycles a periodic signal (often a sine wave or similar) repeats per second (e.g. '3000Hz tone')
• (in recording) how many samples are taken per second ('sampled at 44100Hz'), or
• the regularity of clock ticks (e.g. in chip synchronization, CPU speed)

A wavelength (sometimes 'period') is the length of a wave, often in time. For example, a 200Hz wave has a wavelength of 1/200 = 0.005 seconds.

When produced in the real world, wavelength also has a physical length, though this depends on the medium (and whether it's physical or EM).

For sound this is usually air, in which it travels at about 343m/s, so that 200Hz wave would be about 1.72 meters long (whereas in underwater acoustics, sound travels at around 1500m/s, so a 200Hz wave would be about 7.5 meters long). Physical wavelength matters when talking about resonance (e.g. which frequencies will resonate in a microphone, speaker design, room, mouth, etc.), and when engineering things for phase effects.
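The arithmetic above can be sketched in a few lines of Python. The function names and constants here are made up for illustration, and the propagation speeds are the rough figures quoted above rather than precise values.

```python
# Rough propagation speeds, as quoted in the text (assumed, not precise).
SPEED_OF_SOUND_AIR = 343.0     # m/s
SPEED_OF_SOUND_WATER = 1500.0  # m/s

def period_seconds(frequency_hz):
    """Duration of one cycle, in seconds."""
    return 1.0 / frequency_hz

def wavelength_meters(frequency_hz, speed_m_per_s=SPEED_OF_SOUND_AIR):
    """Physical length of one cycle in a medium with the given speed."""
    return speed_m_per_s / frequency_hz

print(period_seconds(200))                           # about 0.005 s
print(wavelength_meters(200))                        # about 1.7 m in air
print(wavelength_meters(200, SPEED_OF_SOUND_WATER))  # about 7.5 m in water
```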

For digitally sampled data such as PCM it can be interesting to realize that a 200Hz tone sampled at 44100Hz has about 220 samples to express one cycle of that wave, while, say, an 8000Hz wave is only about 5.5 samples long. This has mild effects on accuracy, but more importantly, puts some constraints on sampling. See concepts such as aliasing.
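As a minimal sketch of both points: the first function gives the samples-per-cycle figure, and the second shows where a pure tone would appear after sampling, assuming ideal sampling with no anti-aliasing filter in front of it. Both function names are hypothetical.

```python
def samples_per_cycle(frequency_hz, sample_rate_hz):
    """How many samples describe one cycle of a tone."""
    return sample_rate_hz / frequency_hz

def aliased_frequency(frequency_hz, sample_rate_hz):
    """Frequency a pure tone appears at after ideal sampling.

    Anything above half the sample rate folds back into the
    0 .. sample_rate/2 range (assumes no anti-aliasing filter).
    """
    folded = frequency_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(samples_per_cycle(200, 44100))    # 220.5
print(samples_per_cycle(8000, 44100))   # about 5.5
print(aliased_frequency(200, 44100))    # unchanged: below half the sample rate
print(aliased_frequency(30000, 44100))  # folds back to 14100
```

This is why real recording chains lowpass the signal before sampling: a 30kHz component would otherwise show up as a perfectly audible 14.1kHz tone.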

Frequency roll call

```
~0.1-10Hz:         Earthquakes (roughly primary and secondary waves respectively)

20Hz:              Roughly the lowest frequency you hear (rather than just feel, if strong enough)

<~25Hz:            Seismic noise

20-40Hz:           Lowest frequency produced by subwoofers lies in this range

20-50Hz:           Cat's purr (some quote 'up to 200Hz')  [1]

40Hz-100Hz:        The lowest frequency produced by two-or-three-inch speaker drivers (pretty much regardless of price)
                   (They may produce audible but relatively negligible amounts below that. Drivers that reach <40Hz are physically larger)

100Hz, 200Hz:      Usual lower limit of most male and female voices (respectively)

500Hz-1kHz:        The bulk of the volume in human speech

2kHz:              The highest frequency from instruments or very high-pitched vocal cords
                   (or rather base frequency, harmonicwise)

100Hz-5kHz:        Bulk of energy from speech

5kHz:              Roughly the point above which vocal cord harmonics have
                   fallen off so much that they matter little to intelligibility
                   Upper limit of (historically typical) AM radio transmission

12kHz-14kHz:       Roughly the highest frequency produced by regular few-inch speaker drivers
                   Most instrument harmonics have fallen off to little by this range

~16kHz:            People can tell whether there is a TV nearby by this (see note below)

16kHz-18kHz:       The limit of most recording equipment,
                   partly through the design of (common) microphones

15kHz-20kHz:       Threshold of frequencies various people can nearly/not hear

16kHz-22kHz:       Dog whistles (intentionally partly in human hearing range)
```

Related notes:

• Recording media usually try to store frequencies in the few kHz to perhaps 16kHz range (if they have or want to spend the bandwidth), as that range contains the overtones that make recorded signals sound crisp.
• Various music compression methods apply a lowpass filter which falls off somewhere around 15kHz-18kHz, since signals above that are generally inaudible.
• The squeak from (CRT) TVs that some people hear seems to come from the transformer driving the line refresh, which, since that rate is standard, is almost always at about 15.7kHz (verify) (525 lines × 30 frames = 15750Hz or 625 lines × 25 frames = 15625Hz), though it can work out as a wider band of noise around that (verify).
• Fluorescent lights may have the same problems, but here the frequency is not influenced by any standard, and may be much higher than 15kHz. Whether fluorescent lights are annoying varies by design.
• The sounds that people say only teenagers can hear are somewhere in the 16kHz-17.5kHz range, and may be audible or annoying depending on the frequency and the amplitude. 15kHz at respectable volume will be annoying to more people.

• Names for rough ranges of frequencies vary, fairly wildly.
  • Bass is lower than some number in the 250-400Hz range
    • some make the further distinction of sub-bass, often as below 50Hz
    • ...and may use 'upper bass' to refer to the rest of the bass range
  • Mid range starts at some number in the 250-400Hz range, and runs up to some number in the 2-6kHz range
    • some additionally split this into 'low mids' and 'high/upper mids'
  • High frequencies refer to everything above the chosen mid range up to the edge of hearing, so usually from something in the 5-6kHz range to something in the 16-20kHz range

### Phase

Phase is the timing relation between a wave and some reference point, often an arbitrary zero point, or relative to another wave's starting point.

There are different ways of expressing this. One is an angle, in degrees (0 to 360) or radians (0 to 2*Pi); another is a fraction of a cycle (0 to 1). For example, a wave 90 degrees out of phase with another is (Pi/2) radians out of phase, and starts a quarter (0.25) of the wave's length later.

In the context of tones of a specific frequency, you can also express this in time. For example, 90 degrees is a quarter of the wave's full cycle, so for a 1Hz wave this is 0.25 of a second. For a 2Hz wave 90 degrees is 0.125 of a second, etc.

Values outside the range representing a single wave are valid, but only sometimes directly useful. For example, 1244 degrees is three full waves and 164 degrees. For most purposes the 'three full waves' part is not relevant, and most programs will (and often can) only report 164. (In a few design cases, you may want to keep results in which phase isn't made to wrap like this.)
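These conversions are simple enough to sketch directly. The helper names below are hypothetical; `math.radians` is the standard-library degrees-to-radians conversion.

```python
import math

def phase_fraction(degrees):
    """Phase as a fraction of one full cycle (0 to 1 for a single wave)."""
    return degrees / 360.0

def phase_seconds(degrees, frequency_hz):
    """Phase expressed as a time offset, for a tone of the given frequency."""
    return phase_fraction(degrees) / frequency_hz

def wrap_degrees(degrees):
    """Discard whole cycles, keeping only the part within 0 to 360."""
    return degrees % 360.0

print(math.radians(90))        # Pi/2, about 1.5708
print(phase_fraction(90))      # 0.25
print(phase_seconds(90, 1.0))  # 0.25 s for a 1Hz wave
print(phase_seconds(90, 2.0))  # 0.125 s for a 2Hz wave
print(1244 // 360, wrap_degrees(1244))  # 3 full waves and 164.0 degrees
```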

Phase matters when you mix signals. Two sine waves of the same frequency and amplitude will add to somewhere between double the amplitude (same phase, purely constructive interference) and zero (half a wavelength out of phase, purely destructive interference), with something in between for other values of the phase. Phase information is important to concurrent signals, and particularly to digital processing that recreates waveforms from component waveforms, which includes most compressed audio formats and many digital filters. Without phase, the sound would be recognizable, but wouldn't sound very good; it would give unpredictably constructive and destructive interference and other strange effects.
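A small numerical sketch of that mixing behaviour, assuming two sines of equal frequency and equal amplitude. It samples one cycle of the sum and compares the measured peak against the closed-form value 2·A·|cos(phase/2)|, which follows from the sum-to-product identity for sines. The function names are made up for this example.

```python
import math

def summed_peak(phase_difference_degrees, amplitude=1.0, steps=3600):
    """Peak of A*sin(x) + A*sin(x + phase), found by sampling one cycle."""
    phase = math.radians(phase_difference_degrees)
    peak = 0.0
    for i in range(steps):
        x = 2 * math.pi * i / steps
        value = amplitude * math.sin(x) + amplitude * math.sin(x + phase)
        peak = max(peak, abs(value))
    return peak

def predicted_peak(phase_difference_degrees, amplitude=1.0):
    """Closed form for the same peak: 2*A*|cos(phase/2)|."""
    half_phase = math.radians(phase_difference_degrees) / 2
    return 2 * amplitude * abs(math.cos(half_phase))

for degrees in (0, 90, 120, 180):
    print(degrees, round(summed_peak(degrees), 3), round(predicted_peak(degrees), 3))
```

In phase the peak doubles, at 180 degrees it cancels to (numerically almost) zero, and at 120 degrees the sum happens to have the same peak as either input.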

### Pressure, intensity, volume, loudness

Physics talks about force, pressure (force per area), and such. Listeners usually care about intensity and loudness. Any of them can be meant when you say 'volume.'

Note that sound is linear in that pressures and velocities of sources can be combined under simple addition.

Intensity is energy delivered per area, usually in Watt per square meter, W/m² (sometimes in Watt per square centimeter, a unit ten thousand times larger), which is both a useful and common physical measure and one that is directly proportional to energy delivered to a listener's eardrum area.

Loudness is purely perceptual, and somewhat fuzzy. It has been observed that a factor two in intensity is heard as a barely perceptible step louder, and a factor ten in intensity as a solid step louder. Both suggest perceptual loudness is best modeled on a logarithmic scale, where equal perceived steps correspond to equal ratios of intensity.

#### The decibel

##### Math and the real world

(Deci)bels are inspired by the fact that many human senses (sound, light, touch, etc.) seem to be perceived in a way where ever-larger (exponential) steps in stimulus feel like steady (linear) steps up.

E.g. ten times more energy seems twice as loud, a hundred times more seems three times as loud. Approximately, anyway, and this can be overstated: see Wikipedia: Stevens' power law, and also its criticisms.

The short version is that

• Bels are base-10 logarithms, e.g.
factor 0.1 is -1 Bel
factor 1 is 0 Bel
factor 10 is 1 Bel
factor 100 is 2 Bel
• Decibels are tenths of a Bel, used because that puts most practical real-world quantities in the range of a few dozen decibels rather than a few Bels
which is partly just about easier-to-pronounce numbers, yes
even though that means an extra step when dealing with decibels rather than Bels
(but you get used to it, because it's more memory than on-the-fly calculation anyway)
• A decibel value by itself is purely relative: it only compares two levels this way, and has no direct real-world meaning
until you tie a reference to it. Which you often do.

If you deal with decibels a lot, you may care to note that adding decibels is equivalent to multiplying their factors. For example (using the root-power style of decibel described below, where 20dB is a factor 10),

• 16dB = 10dB+6dB ≈ 3*2 = factor 6
• 30dB = 20dB+10dB ≈ 10*3 = factor 30
• 50dB = 40dB+10dB ≈ 100*3 = factor 300, and so on.
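A quick sketch of that equivalence. It reads the examples above as amplitude (root-power) ratios, since that is the reading under which 10dB is roughly a factor 3 and 20dB a factor 10; the function name is hypothetical.

```python
def db_to_amplitude_factor(db):
    """Amplitude (root-power) ratio for a decibel value: 20dB is a factor 10."""
    return 10 ** (db / 20.0)

# Adding decibels corresponds to multiplying the factors they stand for.
for parts in ((10, 6), (20, 10), (40, 10)):
    total_db = sum(parts)
    product = 1.0
    for part in parts:
        product *= db_to_amplitude_factor(part)
    print(total_db, round(db_to_amplitude_factor(total_db), 1), round(product, 1))
```

The exact factors come out a little higher than the rounded mental-arithmetic ones (about 6.3, 31.6 and 316), because 10dB is closer to a factor 3.16 than 3.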

You can split most real-world decibels into one of two types:

Those where the quantity relates directly to power

e.g. energy density, acoustic intensity, luminous intensity
known as power quantity, sometimes energy quantity

Those where the square of the quantity relates to power

e.g. voltage, current, sound pressure, field strength, charge density, speed
known as root-power quantity (a term introduced in 2009 to be clearer than, and to replace...), field quantity

One of the initially most confusing things is that this seems to make for two types of decibels that seem to behave differently.

In particular, the "factor 10 is 1 bel" mentioned above is true for decibels of power quantities (e.g. Watts), where e.g.:

• factor 0.1 is -10dB
• factor 0.5 is approx -3dB
• factor 1 is 0dB
• factor 2 is approx 3dB
• factor 3 is approx 5dB
• factor 10 is 1 Bel, 10 dB
• factor 100 is 20 dB
• factor 1000 is 30 dB
• factor 1000000000000 is 120dB

Whereas for root-power quantities, (e.g. pressure, volts, current), e.g.

• factor 0.1 is -20dB
• factor 0.5 is approx -6dB
• factor 1 is 0dB
• factor 2 is approx 6dB
• factor 3 is approx 10dB
• factor 10 is 2 Bel, 20 dB
• factor 100 is 40 dB
• factor 1000 is 60 dB
• factor 1000000000000 is 240dB

These are not really different cases, it just comes from the fact that in both cases you're talking about the power behind them, but the latter does so via a unit that relates to power indirectly(verify).

Intuitively, consider that sound intensity is power per area (W/m²) and is proportional to the square of the sound pressure, much like electrical power is proportional to the square of the voltage. So when you start from a pressure (or voltage) ratio, you need to square it before you are comparing power again.

So:

```
amplification   =  10 * log10(Intensity_effective / Intensity_reference)
```

For pressure and other field quantities (e.g. things in Volts):

```
amplification   =  10 * log10(P_effective^2 / P_reference^2)
                =  20 * log10(P_effective / P_reference)
```

In definitions you'll only get the 20* line, but it's probably useful to mention/hint that the factor two comes from the way that a square carries out of a log.
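The two formulas can be checked against each other in a few lines. This is only a sketch with made-up function names: the point is that squaring a root-power ratio and using the 10* form gives the same result as using the 20* form directly.

```python
import math

def db_from_power_ratio(ratio):
    """Level difference for power quantities (e.g. intensity in W/m^2)."""
    return 10.0 * math.log10(ratio)

def db_from_root_power_ratio(ratio):
    """Level difference for root-power quantities (e.g. pressure, volts)."""
    return 20.0 * math.log10(ratio)

for ratio in (0.5, 2, 3, 10, 100):
    via_square = db_from_power_ratio(ratio ** 2)
    direct = db_from_root_power_ratio(ratio)
    print(ratio, round(db_from_power_ratio(ratio), 2), round(direct, 2), round(via_square, 2))
```

Each row shows the power-style value, the root-power value, and the root-power value reached by squaring first; the last two columns agree, which is the 'square carries out of the log' step.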

##### Decibels and distance

Sound sent down a pipe will carry a long way, because it's only diminished by losses (mostly energy lost to the pipe and self-interference).

In many cases, though, things are much less focused, and you get spherical radiation, which follows the inverse square law. Intuitively, the falloff comes from the same energy being spread over an ever-larger sphere (whose surface area grows with the square of the distance), which means the decibel level falls over distance in a very predictable way.

(Something very similar is true for many cases of light and radio waves and other EM, and more).
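As a rough sketch of how predictable that is, assuming an idealized point source radiating into free space (no reflections, no absorption in the air), which real rooms and speakers only approximate. The function name is hypothetical.

```python
import math

def level_change_db(distance_from, distance_to):
    """Change in level when moving between two distances from a point source.

    Intensity falls with the square of the distance, so the change is
    10*log10((d_from/d_to)^2), which is 20*log10(d_from/d_to).
    """
    return 20.0 * math.log10(distance_from / distance_to)

print(round(level_change_db(1, 2), 2))   # doubling the distance: about -6 dB
print(round(level_change_db(1, 10), 2))  # ten times the distance: -20 dB
print(round(level_change_db(4, 1), 2))   # moving four times closer: about +12 dB
```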

##### Sound, referenced by threshold of hearing

Decibels used on sound can refer to an attenuation (difference), or to sound levels, which implies absolute reference points so that the decibel value can indicate some amount of decibels more (or sometimes less).

For example, various amplifiers choose to not have "0 to 10" in handwavey units, and choose to e.g. place a 0dB reference at reasonably high power, and a range that marks down to perhaps -80dB just because that's about where you stop hearing things. But these are still not linked to the real world. (Also, these were historically approximate, referring to the implication of using a potentiometer (verify).)

For sound, we have SPL (sound pressure level) and SIL (sound intensity level). Because their general function is the same and it would be confusing to mix them, we almost always use dB SPL.

Note it still depends on medium.

• dB SPL in air: P_reference = 2 × 10⁻⁵ Pascal (RMS) = 20µPa (RMS)
• dB SPL in water: 1µPa (10⁻⁶ Pascal) seems common
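A minimal sketch of converting between RMS pressure and dB SPL with these reference values. Pressure is a root-power quantity, so the 20*log10 form applies; the names are made up for this example.

```python
import math

P_REF_AIR = 20e-6    # Pa RMS, the reference for dB SPL in air
P_REF_WATER = 1e-6   # Pa RMS, the reference that seems common in water

def pressure_to_db_spl(pressure_pa_rms, reference_pa=P_REF_AIR):
    """RMS pressure in Pascal to dB SPL, relative to the given reference."""
    return 20.0 * math.log10(pressure_pa_rms / reference_pa)

def db_spl_to_pressure(db_spl, reference_pa=P_REF_AIR):
    """dB SPL back to RMS pressure in Pascal."""
    return reference_pa * 10 ** (db_spl / 20.0)

print(round(pressure_to_db_spl(1.0), 1))               # 1 Pa in air: about 94 dB SPL
print(round(pressure_to_db_spl(1.0, P_REF_WATER), 1))  # same pressure, water reference: 120 dB
print(db_spl_to_pressure(120))                         # about 20 Pa
```

The second line is the reason the medium matters when comparing figures: the same pressure reads about 26dB higher against the 1µPa reference.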

In air:

• 0dB SPL is the threshold of hearing of a 1-2kHz tone
• 20dB SPL is often described as gently rustling leaves
• 30dB SPL is a calm room, or whispering, a room fan at a few meters
• 50-60dB SPL is a regular conversation or radio at a few meters
• 60-80dB SPL is singing, the sound inside a car, a car at 1-10m, a loud computer server
• 80-90dB SPL is busy traffic, a subway, a blender, or pretty loud music - and starts being a factor in long-term hearing loss
• 100dB SPL is very loud music, loud factory machinery and such
• 120dB SPL is a very loud concert (and 1 W/m²)
• 130dB SPL is approximately the pain threshold
• 100-140dB SPL is a jet plane at 100m-30m
• 140dB SPL can do immediate and permanent damage to human ears under many circumstances (130dB is already pushing your luck, as are lower figures if exposed to them a lot)

Note that the difference between 'barely perceptible' and 'moderately loud' is ~70dB.

dB(A), dB(B), dB(C), and dB(D) refer to a summed level after weighting different frequencies differently. A and C are the best known; B and D are old and mostly unused.

• dBA is probably the most common, and emphasizes the 3kHz..6kHz range, where the human ear is most sensitive.
meant primarily for relatively quiet sounds, as it is based on the 40-phon Fletcher-Munson curve.
• dBC is
• dBB is
• dBD is not used often, meant for very loud sounds

Some notes:

• Because people are lazy, they often omit SPL from dB SPL. You can usually tell from the context, and/or from the fact that the values are probably too large to be the amplification a system applies. (dB SPL values are negative only in some special cases: for example, NIST has an anechoic chamber rated at -9.4dBA. dBA is adjusted for human hearing; see elsewhere.)
• Little of all the sound energy around us is received by our ears: only something on the order of a tenth of a percent (-30dB from the power level the body receives), and sound isn't directed only at our body to start with.
• Because most speakers are only somewhat directional, they effectively create a sphere of sound (commonly a hemisphere, since speakers usually sit on the ground or stand near a wall). Each doubling of the distance halves the effective pressure, so the delivered power falls in an inverse-square relation with distance: a quarter of the power for each doubling of the distance, which is about 6dB.

##### Other uses
• Power, voltage:
• dBm (decibel milliwatt): 0dBm is referenced at 1 milliWatt (so e.g. -10dBm is 0.1 milliwatt, 30dBm is 1 Watt). Regularly seen measuring electromagnetic signal strength.
• dBV (decibel volt): 0dBV is referenced at 1V
• dBuV (decibel microvolt): 0dBuV is referenced at 1 µV (mostly a scale convenience: dBV and dBuV are exactly 120dB apart)
• dBu (decibel unloaded) is used in sound systems. 0 dBu is referenced at 0.775 V RMS, for historic reasons[2]
• (apparently dBv (lowercase v) is sometimes the same as dBu (an American convention?) but is too confusable, really)
• dBi is a comparison in a specific context: it compares the antenna signal strength at a point against that of a theoretical fully isotropic antenna (one that radiates equal energy in every direction, i.e. spherically), which by this definition has 0 dBi gain.
e.g. there are 2dBi, 3dBi, 5dBi WiFi antennas. These will be more directional: typically pole antennas with an increasingly squished apple shape.
(Note that an antenna with positive dBi can by definition never be isotropic: gain in one direction has to come at the cost of another, as otherwise more energy would come out than goes in)
More is fairly unusual; at some point specific directional antennas become more practical.
• dBFS is decibel Full Scale, i.e. 0dBFS is the maximum possible strength, so all other strengths are negative [3]
in digital audio, the convention is that 0dBFS is the peak value the hardware can handle

...and so on. See e.g. https://en.wikipedia.org/wiki/Decibel#Suffixes_and_reference_values
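A few of these referenced units can be sketched as plain conversions. The function names are hypothetical, the dBFS helper assumes signed 16-bit samples with a full scale of 32768 and treats sample values as amplitudes (hence 20*log10), and none of this reflects any particular library's API.

```python
import math

def dbm_to_milliwatts(dbm):
    """dBm is a power quantity referenced at 1 milliwatt."""
    return 10 ** (dbm / 10.0)

def volts_to_db(volts_rms, reference_volts):
    """Voltage is a root-power quantity, so this uses 20*log10."""
    return 20.0 * math.log10(volts_rms / reference_volts)

def volts_to_dbv(volts_rms):
    return volts_to_db(volts_rms, 1.0)      # reference 1 V

def volts_to_dbuv(volts_rms):
    return volts_to_db(volts_rms, 1e-6)     # reference 1 microvolt

def volts_to_dbu(volts_rms):
    return volts_to_db(volts_rms, 0.775)    # reference 0.775 V RMS

def sample_to_dbfs(sample_value, full_scale=32768):
    """Level of a nonzero sample relative to full scale (assumed 16-bit)."""
    return 20.0 * math.log10(abs(sample_value) / full_scale)

print(dbm_to_milliwatts(30), dbm_to_milliwatts(-10))      # about 1000 mW and 0.1 mW
print(round(volts_to_dbuv(0.5) - volts_to_dbv(0.5), 6))   # 120 dB apart
print(round(volts_to_dbu(1.0), 2))                        # 1 V is about +2.2 dBu
print(round(sample_to_dbfs(16384), 2))                    # half of full scale: about -6 dBFS
```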

#### RMS

Consider a sine/cosine wave. Its amplitude refers to its peaks, but that is not a direct measure of the energy that is used for that signal.

To get such an effective power or effective pressure figure for a signal, you usually calculate its root mean square (RMS). (For simple sine/cosine signals, the RMS figure lies at about 0.7 (sqrt(2)/2) of the peak of the waves.)
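As a quick numerical check of that roughly-0.7 figure, here is a small sketch that computes RMS over a list of samples. The `rms` helper is written out for illustration rather than taken from a library.

```python
import math

def rms(samples):
    """Root mean square: square each sample, average, take the square root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# One full cycle of a unit-amplitude sine, evenly sampled.
sine_cycle = [math.sin(2 * math.pi * i / 1000) for i in range(1000)]
# A unit-amplitude square wave, for comparison.
square_cycle = [1.0] * 500 + [-1.0] * 500

print(round(rms(sine_cycle), 4))    # close to sqrt(2)/2
print(round(rms(square_cycle), 4))  # a square wave's RMS equals its peak
```

The comparison shows why the 0.7 factor only holds for sines: two signals with the same peak can carry quite different power.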

Note that:

• dB SPL figures are generally understood to be RMS measurements. (verify)
• the figure will be less accurate for more complex signals
• if you are showing figures for human consumption, you may want RMS after filtering for hearing

The shorter the sample that is smoothed over, the less accurate the power figure becomes, though it becomes more responsive to sudden spikes. Ideally, you should use a window size of at least two wavelengths of the lowest frequency you wish to accurately measure, which generally means at least a few dozen milliseconds.
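A sketch of windowed RMS under that rule of thumb. The function names are hypothetical, the windows here are consecutive and non-overlapping (real meters often overlap or smooth them instead), and the test signal is an assumed 100Hz sine at an 8000Hz sample rate.

```python
import math

def min_window_samples(lowest_frequency_hz, sample_rate_hz, cycles=2):
    """Window length covering the given number of cycles of the lowest frequency."""
    return math.ceil(cycles * sample_rate_hz / lowest_frequency_hz)

def windowed_rms(samples, window_size):
    """RMS per consecutive non-overlapping window; a trailing partial window is dropped."""
    results = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        results.append(math.sqrt(sum(s * s for s in window) / window_size))
    return results

sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(sample_rate)]

window = min_window_samples(100, sample_rate)  # two cycles of 100Hz
levels = windowed_rms(tone, window)
print(window, len(levels), round(min(levels), 4), round(max(levels), 4))

# For comparison: two cycles of 50Hz at 44100Hz is 1764 samples, i.e. 40 ms.
print(min_window_samples(50, 44100))
```

With whole cycles in each window the per-window figures are steady; a window much shorter than one cycle would instead swing with the waveform itself.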

Having multiple channels (stereo or more) changes things, since phase-based destructive interference comes into play (sometimes intentional, sometimes not). In the case of music, signals tend to be roughly identical and constructive, but instead of mixing them, you may wish to calculate RMS per channel and combine the figures. It is always hard to estimate the delivered power, and particularly the perception.