Descriptions used for sound and music


Physical effects and/or fairly well studied

Attenuation

Attenuation, in the widest sense, refers to the loss of energy (i.e. amplitude reduction) a signal experiences in a medium - a reduction in the energy of physical waves, or of the signals representing them.

It can be intentional or not: both sound and a wifi signal are attenuated by a wall, while an attenuator as a device deliberately makes a signal less loud (usually a volume knob that is part of something else).


Attenuation is often measured in decibels.

In some contexts it is decibels per unit of length or similar, for example to specify the expected signal loss in electrical wiring, or in sound isolation.
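For a concrete sense of scale: decibels express a ratio on a log scale, using 10·log10 for power ratios and 20·log10 for amplitude ratios. A minimal sketch (plain Python; the function name is mine):

  import math

  def attenuation_db(amplitude_in, amplitude_out):
      # 20*log10 for amplitude ratios; it would be 10*log10 for power ratios
      return 20 * math.log10(amplitude_in / amplitude_out)

  # e.g. a wall that halves the amplitude attenuates by ~6 dB:
  print(attenuation_db(1.0, 0.5))   # ~6.02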


In electrical signal transmission, it can refer to problems with analog transmission over larger distances, and relates to the SNR you can expect (though there are more aspects to both signal and noise in transmission).


Physical attenuation often also varies with frequency, in which case you can make a graph, or give an average in the most relevant frequency region.

For example,

  • attenuation is the major reason we hear our own voice differently on recordings: we hear a good part of the lower frequencies through our body, while others only hear us through air (another reason is that some frequencies make it more directly to our ears)
  • microphones on stands made entirely of hard materials are likely to pick up the vibrations of the things they stand on, which anything or anyone not in direct contact won't hear
  • materials used for sound insulation can be seen as bandstop filters (often relatively narrowband)


See also:

Tone versus noise content

Reflection, absorption, echo, reverb

Sound hitting a hard surface will be reflected.

Larger rooms are likely to have mostly hard surfaces (and therefore noticeable reverb).


An echo is an easily identifiable and usually quite singular copy of a sound, arriving later because it was reflected.

The delay is a significant aspect. Near walls it is minimal, and you may easily receive more energy from reflections than from the source directly. (also note that localization is not affected very much)
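Since an echo is essentially a delayed, attenuated copy mixed back in, a minimal sketch (numpy; the delay and gain values are just examples):

  import numpy as np

  def add_echo(x, sample_rate, delay_s=0.3, gain=0.5):
      """Mix one delayed, quieter copy of x (a float array) back into x."""
      d = int(delay_s * sample_rate)   # delay in samples
      y = x.copy()
      y[d:] += gain * x[:-d]           # the reflected copy arrives d samples late
      return y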


When many echoes combine to be blurred and hard to identify, this is called reverb.


For some more technical notes, see Electronic_music_-_audio_effects#Delays_and_reverb

Sound field descriptions

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Note that:

  • These describe environments rather than sound qualities,
...yet often still relate to qualities - for example, many relate to reverb in some way.
  • 'Sound field' usually refers to a specific area (or rather, volume)
  • Some of these are more standardized terms (see e.g. ISO 12001) than others.
  • Some of them apply to any waves, so also very much apply to EM, e.g. near field and far field


A free field refers to environments where sound is free to propagate without obstruction. (In practice the most relevant obstructions are reflective surfaces (like walls), so 'free field' is often used to mean a lack of reverb - and of other implied effects such as room modes.)


Direct field describes environments where you get sound with little to no reflections.


Reverberant field describes environments with at least some reflections.


A diffuse field describes an environment with no preferred direction. A specific (and common enough) case of that is that there are so many reflections that it's more or less uniform. (can also be used to refer to light and other EM)


Most rooms are diffuse and reverberant rather than direct or free, though with a world of variation.

For example, empty rooms, cathedrals, and gyms and such have noticeably more reverb (larger ones with parallel walls get what is called flutter echo) than rooms filled with randomly shaped and/or soft objects to scatter or absorb sound.

Anechoic chambers are rooms that attempt to remove all echo and reverb, to simulate a space with only the source in question, and at the same time have the environment act as a free field. It is typical to see porous, wedge-shaped sound absorbers (in part because the alternative is to have a huge space - and still some absorption).


Near field is the area around an emitter close enough that the size of the emitter still affects how the sound is received (since all of it emits the sound), via interference and phase effects - and where, physically, sound pressure and particle velocity are not in phase.

This also tends to imply the volume-per-distance dropoff (usually 6dB per doubling of distance) goes a little funny close to an object
The size of the near field varies with frequency and with sound source size
...which is e.g. relevant for microphones specifically used for nearby voices
A 'near-field monitor' (which should arguably be called a direct field monitor, but studio engineers treat the two as the same thing) means placing speakers near you so that the majority of what you hear comes from the speakers rather than from room reverb - which is rather useful in mastering/mixing.


Far field is "far enough that the near field effect doesn't apply". Note that there will be a transition between the two, and where that is depends on frequency.

Resonance

Diffraction

Amplitude modulation (a.k.a. tremolo)

Frequency modulation (a.k.a. vibrato)

That is, very mild frequency modulation is vibrato.

Stronger or more complex FM produces more interesting sounds - see FM synthesis
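As a minimal sketch of the difference in scale (numpy; the rates and depths are just example values), vibrato is a carrier whose frequency wobbles slightly:

  import numpy as np

  sr = 44100
  t = np.arange(sr * 2) / sr    # two seconds

  carrier_hz = 440              # the note itself
  vibrato_hz = 6                # wobble rate
  depth_hz = 5                  # mild: just a few Hz of deviation

  # integrate the instantaneous frequency to get the phase
  inst_freq = carrier_hz + depth_hz * np.sin(2 * np.pi * vibrato_hz * t)
  phase = 2 * np.pi * np.cumsum(inst_freq) / sr
  vibrato_tone = np.sin(phase)

Pushing depth_hz to tens or hundreds of Hz, and vibrato_hz into the audible range, is where FM synthesis territory begins.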

Amplitude envelope (attack, decay, sustain, release)

(also in terms of attention)

http://en.wikipedia.org/wiki/ADSR_envelope
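A minimal sketch of a linear ADSR envelope (numpy; the segment durations and sustain level are example values), to make the four stages concrete:

  import numpy as np

  def adsr(sr, attack=0.02, decay=0.1, sustain=0.6, release=0.3, hold=0.5):
      # rise to full amplitude, fall to the sustain level, hold it, fall to zero
      return np.concatenate([
          np.linspace(0, 1, int(attack * sr)),           # attack
          np.linspace(1, sustain, int(decay * sr)),      # decay
          np.full(int(hold * sr), sustain),              # sustain (held)
          np.linspace(sustain, 0, int(release * sr)),    # release
      ])

  # applied by elementwise multiplication with a tone of the same length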


Harmonic content

Beat and tempo

The terminology around beat is often used a little fuzzily, and some of it matters mostly to performance or rhythmic feel. For more basic description, you care first about pulse: the regularity of the beats, regardless of precise rhythmic use.


For a lot of techno and other electronic music, that is just every beat. For some other music styles it is a somewhat more complex thing, with short-term and longer-term patterns - which sometimes get so involved that humans have trouble describing them, or even feeling them.


The tempo of most music lies within 40-200 beats per minute (BPM). The median varies with music style, but is often somewhere around 105 BPM.




Computing BPM
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


The simplest way to detect the tempo of music is to focus entirely on the punchy, bassy beat.


The simplest form of that may be to look for onsets after some heavy lowpassing/bandpassing (leaving mainly 50-100Hz) - so basically just looking for sudden increases in amplitude.

Onsets are a simple approach because they take away a lot of the complex frequency content, and also allow you to focus on slower features - after all, 60 BPM is one beat per second, and even 180 BPM means beats roughly 330 ms apart.


And it should work decently on techno and such, but is harder on more complex sound.
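A minimal sketch of that bandpass-then-onsets idea (numpy/scipy; the band, the threshold, and the median-of-intervals step are assumptions, not a tested recipe):

  import numpy as np
  from scipy.signal import butter, sosfilt

  def rough_bpm(x, sr):
      # keep mainly the bassy beat
      sos = butter(4, [50, 100], btype='bandpass', fs=sr, output='sos')
      bass = sosfilt(sos, x)

      # coarse amplitude envelope, ~10 ms frames
      frame = sr // 100
      env = np.abs(bass[:len(bass) // frame * frame]).reshape(-1, frame).max(axis=1)

      # onsets: frames where the envelope jumps unusually hard
      rise = np.diff(env)
      onsets = np.where(rise > 2 * rise.std())[0]

      # keep intervals in a plausible beat range (0.3 to 1.5 s), take the median
      iv = np.diff(onsets) * frame / sr
      iv = iv[(iv > 0.3) & (iv < 1.5)]
      return 60 / np.median(iv) if len(iv) else None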

Research into human judgment of onsets is complex and ongoing, and onsets don't always match the perception of tempo anyway - consider e.g. blues with guitars, where fast strumming gives clear and periodic onsets, but often a factor faster than the pacing of the measures or vocals, and than the tempo we perceive due to style.


Methods may implicitly assume a straight beat, so fall apart around blues shuffles, swing, use of triplets, stronger off-beats, syncopation, and basically any more interesting rhythm.

Some of that can be fixed by trying to detect the pulse, with some basic assumptions, which is closer to what you want but also fundamentally more involved.

And if you're going to try to detect measures/bars, then you probably want to consider downbeat detection, detecting which beat is first in each measure. And know this involves more and more music theory and assumptions, and will fail for some musical styles.


Approaches include

  • lowpass, onset detection, post-processing
Most onsets are easy to detect
...in beat-driven music. Other music does not have clear onsets
Not all tempo is defined by onsets
Changing tempo makes things harder
Live playing also makes things harder


Autocorrelation of energy envelopes

the overall energy envelope alone is poor information
for it to work on more than techno, you would probably want to do this on at least a few subbands
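A minimal sketch of the idea (numpy; the envelope resolution and the search range are assumptions):

  import numpy as np

  def tempo_from_envelope(env, frames_per_sec):
      # autocorrelate the (mean-removed) energy envelope
      env = env - env.mean()
      ac = np.correlate(env, env, mode='full')[len(env) - 1:]

      # pick the strongest lag within a plausible beat range
      lo = int(frames_per_sec * 60 / 200)   # 200 BPM: shortest lag
      hi = int(frames_per_sec * 60 / 40)    # 40 BPM: longest lag
      lag = lo + np.argmax(ac[lo:hi])
      return 60 * frames_per_sec / lag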


Resonators (of energy envelopes)

similar to autocorrelation, though can be more selective (verify)
can be made to deal with tempo changes
based on recent evidence, so the start of a song is always a poor guess due to lack of evidence (though there are ways around that, and in some applications it does not matter)
Related articles often cite Scheirer (1997), "Tempo and beat analysis of acoustic musical signals"
...notes that people typically still find the beat when you corrupt music to six subbands of noise that still have the amplitude of the musical original (but not when you reduce it to a single band, i.e. just the overall amplitude), suggesting you could typically work on this much-simplified signal.
roughly: six amplitude envelopes, differentiated (to see changes in amplitude), half-wave rectified (to see only increases), and comb filters used as tuned resonators, some of which will phase-lock, then somewhat informed peak-picking
...the tuned resonator idea inspired by Large & Kolen (1994), "Resonance and the perception of musical meter"
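A minimal sketch of one such comb-filter resonator run over an onset-strength envelope (numpy; the feedback gain and the brute-force tempo grid are my simplifications - Scheirer's system does this per subband with smarter peak-picking):

  import numpy as np

  def resonator_energy(onset_env, frames_per_sec, bpm):
      # feedback comb filter tuned to one tempo; its output energy
      # peaks when the input is periodic at (a multiple of) that tempo
      delay = int(round(frames_per_sec * 60 / bpm))
      alpha = 0.9    # feedback amount: how long the resonator 'rings'
      y = np.zeros(len(onset_env))
      for n in range(len(onset_env)):
          fb = y[n - delay] if n >= delay else 0.0
          y[n] = alpha * fb + (1 - alpha) * onset_env[n]
      return np.sum(y ** 2)

  # try a grid of tempi, keep the one that resonates most:
  # best = max(range(40, 201), key=lambda b: resonator_energy(env, fps, b))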


Chroma changes

to deal with beat-less music (verify)


Goto & Muraoka (1994), "A Beat Tracking System for Acoustic Signals of Music"

suggests a sort of multi-hypothesis system, looking at several possible interpretations in parallel



Tempogram:

local autocorrelation of the onset strength envelope.
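If librosa is available, its tempogram feature implements roughly this (the file name is a placeholder):

  import librosa

  y, sr = librosa.load("song.mp3")
  onset_env = librosa.onset.onset_strength(y=y, sr=sr)
  tg = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr)
  # tg: one autocorrelation-over-lag column per frame (lag x time)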


Cyclic tempogram

Grosche (2010), "Cyclic Tempogram - A Mid-Level Tempo Representation for Music Signals"


Beatgraph and "autodifference"

The beatgraph itself is more of a visualization,
but it is derived from analysis that tries to maximize the 'horizontality' of columns, where each column covers a time period that is ideally one bar/measure long - see e.g. [1]
it may converge on something horizontal but get the measure length wrong
a column is a single bar's worth of amplitude
Used e.g. in bpmdj
http://werner.yellowcouch.org/Papers/beatgraphs12/
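A rough sketch of the underlying idea as I read it (numpy; treating 'horizontality' as low variance at each within-bar position is my interpretation, not the paper's exact formulation):

  import numpy as np

  def best_bar_length(env, candidate_lengths):
      # for the right bar length, the same position in every bar looks
      # alike, so stacking bars gives the most 'horizontal' beatgraph
      def roughness(bar):
          bars = env[:len(env) // bar * bar].reshape(-1, bar)  # one bar per row
          return bars.std(axis=0).mean()  # variation across bars, per offset
      return min(candidate_lengths, key=roughness)

  # candidate_lengths: bar durations in envelope frames, each well below len(env)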



TODO: look at

Goto (2001), "An Audio-based Real-time Beat Tracking System for Music With or Without Drum-sounds"
Dixon (2001), "Automatic Extraction of Tempo and Beat from Expressive Performances"
Dixon (2006), "Onset Detection Revisited"
Alonso et al. (2004), "Tempo and Beat Estimation of Musical Signals"
Collins (2012), "A Comparison of Sound Onset Detection Algorithms with Emphasis on Psychoacoustically Motivated Detection Functions"


Secondary:

"Detecting Music in Ambient Audio by Long-Window Autocorrelation"

Musical key

Computing musical key

Less studied, less well defined, and/or more perceptual qualities

Humans are quick to recognize and follow various other properties - quicker than algorithmic approaches. These include:

(Timbre)

Timbre often appears in lists of sound qualities, but it is very subjective, and has been used as a catch-all term: it generally means something like "whatever qualities allow us to distinguish two sounds that are similar in pitch and amplitude".

A large factor in this is the harmonic/overtone structure, but a lot more gets shoved in.


tonal contours/tracks (ridges in the spectrogram)

(particularly when continuous and followable)


Spectral envelope; its changes

microintonation

Some different sounds / categories

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

There are various typologies of sounds, but many are very subjective in that they are not unambiguously resolvable to signal properties -- they are often somewhat forced.


Consider:

  • continuous harmonic sounds, such as sines and other simple predictable signals
  • continuous noise (unpredictable in the time domain)
  • impulses (short lived)

Pulses, noises, and tones could be seen as simple extremes in a continuum, where various in-betweens could be described, such as:

  • tonal pulses / wavelets
  • tonal/narrow-band noise
  • pulsed noise bursts
  • chirp
  • various real-world noises, such as
    • rustle noise [2]
    • babble noise

You can argue about the perceptual usefulness of these categories, as they do not distinguish sounds the same way we do.


Some useful-to-know music theory

Unsorted

Moodbar

Assigns a single color to each fragment of a piece of music, producing a color-over-time bar that gives an idea of the sort of sound.


Mostly a CLI tool that reads audio files (using gstreamer) and outputs a file that essentially contains a simplified spectrogram.


Apparently the .mood generator's implementation:

  • mainly just maps energy in low, medium, and high frequency bands to blue, green, and red values.
  • always outputs 1000 fragments, which means
...it is useful for telling apart parts of songs,
...visual detail can be misleading if songs differ significantly in length,
...and it is not that useful for rhythmic detail, for similar reasons.


Something else renders said .mood file into an image, e.g. Amarok, Clementine, Exaile, gjay (sometimes with some post-processing).

The file contains r,g,b uint8 for each of the (filesize/3) fragments.
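Given that layout, reading one back is simple (a sketch based on the description above; the file name is a placeholder):

  with open("song.mood", "rb") as f:
      data = f.read()

  # three uint8s (r, g, b) per fragment, fragments in time order
  fragments = [(data[i], data[i + 1], data[i + 2])
               for i in range(0, len(data), 3)]
  print(len(fragments), "fragments")   # typically 1000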


See also: