Visuals DIY

Analog video notes

VGA hacking

You can mess with the pixel voltages.

This is easier to do than you'd think, because VGA is not only analog, but also keeps sync and the color signals on separate pins (rather than mixing them into one signal, as in e.g. composite video).

The only things that are truly required are

  • Hsync, Vsync
TTL signals (0..5V)
  • red, green, and blue
0V for black, 0.7V for full intensity.

You can mess with the R,G,B at will. Feeding in audio is mostly just a diode away, though it'll mostly look like lots of horizontal lines, due to the way scanlines work and the fact that audio rate is much slower than pixel rate.

It tends to look better if the signal is related to the timings of the particular mode, or to some harmonic of them, so analog knob tweaking works too.

You can generate a basic digital VGA signal from a microcontroller, see e.g.

More interesting things:

ewa justka, e.g.

When modes are sort of weird things

Let's assume for a moment that we're a CRT, so we're bending a ray of lightin'-up-the-phosphors, line by line.

Hsync is the time the beam has to go to the next line, Vsync is the time in which the beam goes from bottom right to top left for the next frame.

The porches are breathing room around that, in part to make it easier to ensure the beam wasn't visible while moving.

How does a video card know what a monitor supports?

Old monitors would only support a smallish number of modes, sometimes just one. Before EDID, the video card (or its driver) would choose fairly standard modes and assume they worked, or ask you what mode you wanted, so it would be your fault if it didn't.

The DDC pins in VGA are meant to read out a small EEPROM in the monitor. Its contents (the EDID) hold basic information about timing, and that information implies a set of resolutions that are likely to work.

How does a monitor detect the mode of a signal?

With hsync and vsync being signals on separate pins, it is relatively simple to see what's happening on them, so monitors can guess which mode it corresponds to, as they obviously know the timing of what they support themselves. (Yes, matching the closest one too loosely sometimes makes for weird results)

(for those old enough to remember setting up and fiddling with the timings in X windows, this is why)

Newer monitors have to do more accurate detection, both because higher-speed modes go faster on hsync and vsync, and because they support more modes, some of which resemble each other timing-wise.

CRTs would not like being driven at unusual timings, whereas LCDs are more forgiving, also in detection.

On sync and pixel speed - and hobby electronics applications

640x480 at 60Hz is one of the slowest-clocked standard modes, namely 25.175MHz, and also a bit of an industry standard, so you can expect that even very old monitors will accept it. See timing details here

Additionally interesting is that 640x480@60's timings are closely related to NTSC's. This is useful for some electronics projects, in that with fairly little extra work you can output to a monitor, to e.g. composite, or to both.

Most halfway decent resolutions are clocked above 50MHz, and modern resolutions a few hundred MHz. (Yes, VGA can carry 1080p, over a short wire.)

For PICs and AVRs, even ~25MHz is too fast, so sending out all individual pixels can't be done directly. There's also no internal hardware we can fully delegate such output to, so you have to control GPIO directly. The overhead of bit-banging means you can't practically do more than about half your own clock rate, and because the rate during pixel output is fixed, you spend all CPU time getting that timing right.

There's one saving grace to that, in that hsync+vsync (and the porches around them) mean that ~30% of the time is spent not sending pixels, so you have some time to think about what the next frame will be.

The blanking region is spent with the RGB pins blanked. On CRT, not doing that would put stripes all over the screen. LCDs are more forgiving if you don't actually blank, but blanking helps auto adjust to, well, work. Similar for the color burst in PAL and NTSC - you probably want to do this.

So what do we do on <25MHz microcontrollers?

HSync and VSync happen in the kHz range, so getting the monitor to go to the right mode is well within your capabilities here (possibly using timed interrupts, for accuracy).

It's only the pixels that go faster. Because the signal is just analog voltages back to back, screens don't care (or know) if you hold the voltage that happens to represent a pixel for longer. It's only when you want pixels to be distinct and non-bleeding that exact timing matters.

So if you can accept wider pixels, just go a multiple slower. You can easily change the output at half or a quarter of the pixel rate, you just get wider pixels.

If you want to display a stored image, then the RAM you have limits the number of pixels you can store anyway (a full 640x480 screen would take ~300KB at one byte per pixel), so many people decide to also repeat lines, to get larger square pixels and use less memory.
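As a rough illustration of that memory trade-off, here is a minimal sketch that computes framebuffer size when each stored pixel is repeated horizontally and vertically. The function name `framebuffer_bytes` and the choice of bit depths are made up for this example, not taken from any particular project.

```python
def framebuffer_bytes(width, height, bits_per_pixel, repeat=1):
    """Bytes needed when each stored pixel is shown as a repeat x repeat block.

    width/height are the mode's visible size; repeat=2 means every stored
    pixel covers 2x2 screen pixels (hypothetical helper for illustration).
    """
    stored_w = width // repeat
    stored_h = height // repeat
    total_bits = stored_w * stored_h * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


for repeat in (1, 2, 4):
    for bpp in (8, 1):
        size = framebuffer_bytes(640, 480, bpp, repeat)
        print(f"{640 // repeat}x{480 // repeat} at {bpp} bpp: {size} bytes")
```

Comparing the printed sizes against the few KB of RAM on a small microcontroller shows why pixel repetition and low bit depths are so common.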

Showing text is easier, because it's effectively a lookup table of ~100 possible bitmaps.

For a lot of these tricks, look also to early TV-connected home computers - they did a lot of these things too, even if they usually separated video output from main work.

Another reason you can't get perfectly crisp images is that an AVR's 16MHz or 20MHz clock isn't a simple divisor of 25.175MHz, so the timing won't align so well, but call it retro and it's fine.

More on timing

If you take 640x480@60Hz and consider the blanking areas to be unused pixels, you can consider it 800x525@60Hz (which is where that 25.175MHz comes from).

From that view, horizontal lines are split like (in pixel-clock units):

  • 96 sync
  • 48 back porch
  • 640 visible
  • 16 front porch

Say you make an AVR go at 20MHz. That same line, in terms of wallclock time, sums to ~636 clocks at 20MHz, which you can e.g. divide like 76+36+512+12, which steals some time from the porches to make pixels regular.

Since an AVR can output a value every two clocks, this gives you an effective horizontal resolution of 256 pixels.
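The conversion from pixel-clock units to CPU clocks can be sketched as below. The segment counts and the 25.175MHz and 20MHz figures come from the text above; the function name `scale_segments` is just a label for this example.

```python
PIXEL_CLOCK_HZ = 25_175_000

# Horizontal line segments of 640x480@60, in pixel-clock units.
LINE_SEGMENTS = {"sync": 96, "back porch": 48, "visible": 640, "front porch": 16}


def scale_segments(segments, cpu_hz, pixel_clock_hz=PIXEL_CLOCK_HZ):
    """Express each segment's duration in CPU clocks instead of pixel clocks."""
    return {name: n * cpu_hz / pixel_clock_hz for name, n in segments.items()}


scaled = scale_segments(LINE_SEGMENTS, 20_000_000)
for name, clocks in scaled.items():
    print(f"{name:12s} {clocks:7.2f} CPU clocks")
print(f"{'total':12s} {sum(scaled.values()):7.2f} CPU clocks")

# The hand-picked split from the text, and the resolution at 2 clocks per pixel.
chosen = (76, 36, 512, 12)
print("chosen split sums to", sum(chosen), "-> visible pixels:", chosen[2] // 2)
```

The exact values are not whole numbers, which is why the split in the text rounds them and borrows a little from the porches.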

You could also get some color, because AVR GPIO lets you set an 8-pin port at once, so with some resistor-ladder work you could do e.g. 2-bit color for the RGB channels (or basically 8-bit color if you're clever).

For a sense of magnitude: a pixel takes ~40ns, a line takes ~32us, a frame takes ~17ms
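Those magnitudes follow directly from the pixel clock. A small sketch, assuming the commonly published total raster of 800x525 pixel clocks for this mode (the 525 is an assumption of this example, as is the helper name `vga_timing`):

```python
def vga_timing(pixel_clock_hz, h_total, v_total, h_visible, v_visible):
    """Derive per-pixel, per-line and per-frame times from raster totals."""
    pixel_s = 1.0 / pixel_clock_hz
    line_s = h_total * pixel_s
    frame_s = v_total * line_s
    return {
        "pixel_ns": pixel_s * 1e9,
        "line_us": line_s * 1e6,
        "frame_ms": frame_s * 1e3,
        "refresh_hz": 1.0 / frame_s,
        # Fraction of the time in which no visible pixel is being sent.
        "blanking_fraction": 1.0 - (h_visible * v_visible) / (h_total * v_total),
    }


# 640x480@60: 25.175MHz pixel clock; 800x525 total raster is assumed here.
timing = vga_timing(25_175_000, 800, 525, 640, 480)
for key, value in timing.items():
    print(f"{key}: {value:.3f}")
```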

Output to TV?

In digital (SD) terms, NTSC is 720x480@60i and PAL is 720x576@50i.

NTSC's timing is close to that of 640x480@60, which was intentional at the time.

They're different enough that there are no simple passive conversions between the two. Yet in theory, if you respect the constraints of both, you could make something that can output to either of the two, or if you're clever, both at the same time.

Sync signals are similar.

There are two differences: the image signal is luma and chroma (though you can get away with just luma), and on TV there's a color burst (verify)

See e.g.

On slightly faster uCs

On things like STM32s, even the simpler STM32F103, we have 72MHz of speed, and more RAM, so we can do 640x480 with more colors/detail, and we can also manage 800x600@56Hz (36MHz pixel clock, conveniently half of a 72MHz STM32's main clock)

Particularly for the latter you now want to learn about DMA, and you still don't have enough RAM for a full-resolution framebuffer, so these projects seem to do monochrome, or color at 400x300 or so.

See e.g.

VGA (and sometimes TV) hacking

Microcontrollers don't have much memory, or spare time, to generate fine graphics.

You can have external RAM, from a few hundred KB of SPI SRAM, to more from things like e.g. or the later stages of

But even then, putting something interesting in that buffer is challenging, because of the limited CPU in simple uCs and the fact that you're spending much of the time outputting pixels. This is one of the reasons that video cards quickly moved to using RAMDACs.

(Mind you, the Due can do better, see e.g.)

Just the hsync and vsync are fairly easy to do.

For the R,G,B most take one of three approaches:

  • use PWM outputs for R,G,B
e.g. vgax uses two for just two of the colors
  • use a digital port and resistor ladder for R,G,B
and a basic 2-bit resistor ladder, e.g. for 2-bit on each channel (sometimes cleverer)
e.g. [1]
e.g. [2]
  • Monochrome is somewhat easier

Microcontrollers could feed through some other data, though, or just be used to set the mode without the need for a computer.

VGA and TV:

Note that for composite PAL/NTSC TV output (only) you may like things like the AD724

Audio as image

The basic idea is to put two audio channels straight onto the color pins of VGA, while having something else (uC or computer) set the mode via the sync signals.

Why is it darkish

For starters, consumer audio is roughly -0.3V ... 0.3V, and VGA color voltages are 0 .. 0.7V. So you're probably clipping the negative half of your signal, and the other half is weak.

You'll have to bias the signal a few hundred millivolts upwards to see less black. An op amp per channel would help, and once you're doing that you may be able to mix something for the third color too.
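To get a feel for the numbers, here is a minimal sketch of the gain and bias that would map a roughly -0.3V ... 0.3V audio swing onto the 0 .. 0.7V color range mentioned earlier. It models an ideal amplifier only; the function names are made up for this example, and real op amp circuits, termination and coupling capacitors are ignored.

```python
def gain_and_offset(in_min, in_max, out_min, out_max):
    """Linear mapping v_out = gain * v_in + offset between two voltage ranges."""
    gain = (out_max - out_min) / (in_max - in_min)
    offset = out_min - gain * in_min
    return gain, offset


def to_vga(v_in, gain, offset, out_min=0.0, out_max=0.7):
    """Apply the mapping, then clip to what the color input can show."""
    return min(max(gain * v_in + offset, out_min), out_max)


gain, offset = gain_and_offset(-0.3, 0.3, 0.0, 0.7)
print(f"gain {gain:.3f}, offset {offset:.3f} V")

for v in (-0.3, 0.0, 0.3):
    unbiased = to_vga(v, 1.0, 0.0)      # audio wired in directly
    biased = to_vga(v, gain, offset)    # after ideal gain and bias
    print(f"in {v:+.1f} V -> direct {unbiased:.2f} V, biased {biased:.2f} V")
```

Wired in directly, everything at or below 0V shows as black, which is the darkness described above.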

Why is audio horizontal stripes?

tl;dr: you get no nice patterns out of the box, mainly because

  • anything not a multiple of 60Hz will walk around like crazy
also meaning that even if it is there, it will be much less visible
  • music and speech, which do most of their stuff in the first few kHz, will mostly be horizontal lines a few pixels thick
audio output is filtered to not contain anything faster than that

This one's nice to intuit, along with the math, and some physical experimentation.

So hook up a tone generator - a website will do for basic experiments. You can pretty much wire audio directly into one or two of the R,G,B lines, when you also have a 'duino or such generating just the sync pins.

Now, consider that in 640x480@60, the screen updates sixty times per second.

That means a 60Hz sine wave takes exactly one screen (1/60th of a second, ~17ms), and would look like a single overall vertical gradient.

That is in part because there's not enough difference between adjacent lines to be noticeable. At higher frequencies things start changing within fewer lines, though for two adjacent lines to differ noticeably, you need a signal faster than the audible range (since the line rate is ~31kHz).

At 60Hz it will also be entirely still. Have it slightly off, e.g. 59 and 61Hz, and that gradient will seem to move around. (fiddling with a tone generator that also does sweeps, perhaps binaural combinations, may be nice here).

Multiples of 60Hz will give multiple complete waves/gradients per frame, each of which looks thinner because, well, each full wave completes in the time of fewer lines.

Music is mostly in the few-kHz region, so this will be many multiples of complete waves per frame, and 99+% of them won't sync up with a frame. Or with a line, because the start of each line happens at 31kHz, above audible.
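The relation between tone frequency, stripe thickness, and drift can be sketched numerically. This assumes a 60Hz frame rate as in the text and a total of 525 lines per frame including blanking (the 525 is an assumption of this example); the function names are made up for illustration.

```python
FRAME_RATE_HZ = 60.0
LINES_PER_FRAME = 525                             # assumed total, incl. blanking
LINE_RATE_HZ = FRAME_RATE_HZ * LINES_PER_FRAME    # 31.5kHz


def lines_per_cycle(tone_hz):
    """How many scanlines one full wave of the tone spans."""
    return LINE_RATE_HZ / tone_hz


def cycles_per_frame(tone_hz):
    """How many full waves (gradients) fit in one frame."""
    return tone_hz / FRAME_RATE_HZ


def drift_cycles_per_second(tone_hz):
    """How fast the pattern walks: offset from the nearest multiple of 60Hz."""
    ratio = tone_hz / FRAME_RATE_HZ
    return (ratio - round(ratio)) * FRAME_RATE_HZ


for tone in (60, 61, 120, 440, 1000, 3000):
    print(f"{tone:5d} Hz: {cycles_per_frame(tone):6.2f} waves/frame, "
          f"{lines_per_cycle(tone):7.2f} lines/wave, "
          f"drift {drift_cycles_per_second(tone):+.2f} waves/s")
```

A 61Hz tone drifts by one full gradient per second, while a 1kHz tone gives stripes only about 31 lines tall.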

This means most audio content will be a few scanlines long.

This is also why audio has no apparent vertical content -- a vertical line would have to be regular on the scale of 31kHz, which audio-geared DACs intentionally filter out. Even if they didn't, 99+% of that content wouldn't line up with a line.

If you want thin vertical content, that also implies it needs to pulse, e.g. a single-pixel vertical line would be a ~40ns pulse (remember the pixel clock is ~25MHz) with ~32us (799px*~40ns) of nothing in between.

TV, Composite and CRT hacking

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Similar to VGA, but with a few differences.

Mixed output



Modular adjacent

Visual output of CV

If you go very modular, then you may want to see your CV.

This can be as simple as a LED, or as complex as a


Level meter

LM3914 - linear, e.g. for level indication

LM3915 - 3dB per step (log), e.g. for VU
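To illustrate the difference between a linear and a 3dB-per-step log scale, here is a small sketch that models both as a list of thresholds relative to full scale. It assumes ten steps and bar-style display, and it is a numerical model only, not a description of how to wire either chip; all function names are made up for this example.

```python
def linear_thresholds(steps=10):
    """Evenly spaced thresholds, as fractions of full scale."""
    return [(i + 1) / steps for i in range(steps)]


def log_thresholds(steps=10, db_per_step=3.0):
    """Thresholds spaced db_per_step apart, with the top step at full scale."""
    return [10 ** (-(steps - 1 - i) * db_per_step / 20) for i in range(steps)]


def leds_lit(level, thresholds):
    """Bar-style display: count the thresholds the level has reached."""
    return sum(1 for t in thresholds if level >= t)


for level in (0.05, 0.1, 0.25, 0.5, 1.0):
    print(f"level {level:4.2f}: linear {leds_lit(level, linear_thresholds()):2d} LEDs, "
          f"log {leds_lit(level, log_thresholds()):2d} LEDs")
```

The log scale spends most of its steps on quiet signals, which is closer to how loudness is perceived.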


Video modular


Atari Video Music (C240)

A few op amps, and a custom chip that does all the work.