Experiment building - non-software-specific notes, and timing

{{experiments}}

''Experiment builder'' is one possible term for 'software that lets you fairly easily create and run a well-controlled behavioural experiment, to research the validity of a theory or belief'.

This may sound no harder than making a nice powerpoint, but since various experiments care about reaction speed (e.g. as a measure of confusion), you care about precise stimulus timing and precise response timing.

The lower you want to push this, the more specialized both the software and the other practical details become.

===Thinking about counterbalancing===
===On online experiments===
Online experiments, as convenient as they are, mean there are many things you can no longer control for:
* display time,
* hardware response time,
* browser details,
* whether it is a computer or a phone (I have a years-old phone and I wouldn't trust its timing).
'''Browsers'''
Assume that browsers tend to merge movement into 60Hz intervals - or whatever monitor-limited speed they are drawing at - which negates the effect of a 1000Hz keyboard / mouse / button device.
Also, how well browsers multitask varies between browsers, and maybe there's a video stream in another tab making things... more varied.
Since that is not something you can control at all, it's not a good environment for precision timing.
Easy for questionnaire style stuff, though.
==Hardware and timing==
<!--
The more precise you want timing to be,
the more it comes down to little details.
If things are allowed to be 30ms off, you don't have to think so hard.
You want better than 10ms?
There's a lot of things to check, and some hardware to replace.
You want it down to 1ms?
You have boatloads of things to check.
And sometimes physics are in your way.
Sometimes ''standards'' are, and throwing money at the problem will solve neither of those.
'''Planning versus reaction'''
If what you are doing is last-minute reaction,
then for every action you put in, the PC has to scramble to react on screen or in audio, and it is going to be later than the input by ''easily'' more than 10ms, because of lots of little delays that all add up.
If you have a precise plan, though, much of that last-minute scrambling can be avoided by preparing things ahead of time.
'''Different clocks'''
Add to that the fact that there are things running at different rates, and the exact timing may differ.
A regular monitor refreshes its content every 16ms or so.
When what you want to present isn't on that same clock, your request of "please show this on screen" will only be honoured at the next refresh - so it can be most of a frame late.
Say, a normal USB keyboard is read out every 8ms or so.
Depending on when you pressed a key in that window between readouts, the delay from physical press to the PC acknowledging it varies by around 7ms.
Could you do better? Fairly easily, but you have to want it.
'''...so resolution isn't accuracy'''
Just because the unit that a timer works in is a millisecond, does not mean it will be consistent to that degree.
Any part of the underlying system could be working with an "I'll do it when I get to it" design.
Just because it is consistent does not mean it is accurate (it could be thinking in units of 10ms).
Just because one piece of hardware is accurate does not mean another is.
If you have 1ms accuracy measuring a robot pressing a button every 100ms, don't be surprised to see ''10ms'' variation if that button happens to be in a keyboard -- and there are much sneakier cases.
And just because it is consistent doesn't mean it's accurate -- there could be an extremely consistent 5ms delay somewhere in the system.
Electronics can be made consistent to within ''microseconds''. Mechanics and computers cannot - for some very different reasons.
Assume keyboards are 20 ms late.
Assume monitors can be 10 ms late.
Assume sound cards are 10 to 30ms late.
Some of those things you can improve.
: keyboard debounce can be set lower (at some risk of registering twice)
: sound card latency can be lower with some choices (at some risk of skipping)
Some you can fix just by measuring them.
: ''if'' they happen to be very consistent, you can subtract them from your result timing
Some things you can fix with planning
: e.g.
:: if you know when the monitor starts drawing, you can start your timer then,
:: if you know ahead of time what you will present, you can have it prepared and queued so that only the final trigger has to happen on time
Some things are just fundamentally there, though.
: Say, a button may travel for a millisecond or two before registering contact.
: a monitor redraw takes ''at least'' a few milliseconds to update the entire screen
But also, relative to what?
We can at least try to be consistent in the recording
'''Can we do better?'''
Absolutely!
But you need to know exactly what you want.
If we can plan ahead, a lot of things can be done more precisely, and more simply.
Various sellers promise 1ms latency,
''but'' they don't necessarily mention which parts that does ''not'' apply to.
If you can record responses down to ~1ms resolution,
you still need to verify that the accuracy is also within 1 to 2ms.
Even if you know the accuracy of the ''recording'' is that good,
you still need to think about all the other parts of the chain - operating system (~1ms),
interconnects (~1ms), travel in buttons (~2ms).
Also, humans. If your experiment design is about cognition,
and you are using a switch, you are implicitly saying differences in motor function are unimportant,
and the only way to do better is to use something like EEG.
'''There is also a very real question of "relative to what?"'''
Both in a "we need to synchronize these two things" sense,
but also in a comparative-research sense: if all responses are consistently 20ms late but precision is 1ms,
then we can still tell a difference between participants with exactly the same statistical significance as if that 20ms were 0.
'''So do we need separate hardware, or the same hardware?'''
That depends a little on the task. You can design specific hardware for any one task,
but the more complex it is, the more you look at dedicated hardware.
For example, consider the history of sound cards.
Can the same CPU that also has to do ''absolutely everything else'' produce sound? Yes.
Can it always guarantee the microsecond-scale regularity for high detail sound? No.
Dedicated sound hardware is a thing.
Yet this directly brings up an issue: two distinct pieces of hardware need to be on the same schedule.
{{zzz||Also, with sound you have to feed it new samples very regularly, every few dozen microseconds.
If a lazy sound-producing program is sometimes a few milliseconds late handing over the next bit of sound, things would stutter,
so sound has an absolutely intentional delay.
Let's pretend for a moment that this isn't a reason sound cards rarely go under 3ms latency...}}
Could we ask it "I want you to start playing at exactly this time, to the millisecond or better"?
To design this electronically would not be hard.
But almost no one ever cares to do that, so the more relaxed "ASAP with a little anti-stutter delay" is what PCs do.
In experiments, you ''may'' care to have such a device.
What you need for such a tight schedule includes:
- agreeing on the time with good precision (but as NTP teaches us, that's doable)
- starting production with negligible delay (not having other things that get in the way)
- production staying on that schedule once started
Say, if you tell your sound card "start playing this ASAP", this may come out 5ms later.
Say, if you have a device you know you share a clock with to within 1ms, and which you can tell "here is sound, I want you to start playing it exactly 10ms from now",
you may be able to make that device start playback within a millisecond or so of the intended time.
Experimental setups might care about this.
-->
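One way to put numbers on your own machine's timing - the resolution of the clock versus the jitter the OS scheduler adds on top - is to measure both directly. A minimal sketch in plain Python (standard library only; the numbers it prints are system-dependent):

<syntaxhighlight lang="python">
import time
import statistics

def smallest_timer_step(n=200000):
    """Estimate the smallest observable step of time.perf_counter()."""
    steps = []
    prev = time.perf_counter()
    for _ in range(n):
        now = time.perf_counter()
        if now != prev:
            steps.append(now - prev)
            prev = now
    return min(steps)

def sleep_jitter(requested=0.001, n=500):
    """Request a 1ms sleep repeatedly and record what we actually got."""
    actual = []
    for _ in range(n):
        t0 = time.perf_counter()
        time.sleep(requested)
        actual.append(time.perf_counter() - t0)
    return actual

print(f"smallest timer step seen: {smallest_timer_step() * 1e6:.2f} microseconds")
waits = sleep_jitter()
print(f"requested 1.00 ms sleeps: mean {statistics.mean(waits) * 1000:.2f} ms, "
      f"sd {statistics.pstdev(waits) * 1000:.2f} ms, worst {max(waits) * 1000:.2f} ms")
</syntaxhighlight>

The first number is resolution; the second set is closer to what your code's timing will actually look like once the OS scheduler gets involved, and that is usually the larger of the two effects.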
===Buttons and timing===
[[Image:Bouncy_switch.png|thumb|right|298px|Electric output from a switch that just closed and is still bouncing (most of the mess in 2ms)]]
Hardware buttons bounce. This is literal: metal contacts bounce physically, which is reflected in the electric signal.
{{comment|(and yes, there are e.g. optical and magnetic switches that do not bounce in this way, though they are not free of other issues)}}
You can assume this bouncing lasts at least a few milliseconds, up to a dozen or two.
[[Debouncing]] refers to some way of ensuring that this mass of flip-flopping is registered as a single press.
A lot of buttons react ''dozens'' of milliseconds later and you never noticed, so in general this is a non-issue,
but in reaction time experiments it is pretty central.
Consider that if your hardware assumes a switch never bounces longer than 10ms, then if it actually ''is'' still bouncing after that, you will register multiple presses.
Rather fewer than if you didn't debounce, but still multiple.
There is often a tradeoff where the more sure you want to be that it never triggers weirdly, the longer the delay must be.
In the "can we do better" camp, absolutely, but only if you are willing to build in some specific assumptions that better be true for your hardware setup.
If you know specific hardware (ideally down to capacitance of the wires that connect it and the size of the pullups/pulldowns), you can safely have a shorter debounce time.
But also, consider that in experiments, it is often enough to afterwards have a record accurate down to the millisecond, without needing anything to ''react'' within a millisecond.
: you can record both when activity started and whether it was later verified to be a real press.
:: As long as you timestamp it, this is much more accurate than "I am now reporting a debounced press that ''probably'' started idunno milliseconds ago"
:: you could do this in an external device, if you synchronize clocks to do that timestamping
:: if you want to avoid that complexity, you could choose to do this in post-processing (see the sketch below)
: This isn't perfect (consider e.g. some EMI noise right before a real press).
: ...but it is typically good enough for reporting the response with ~1ms accuracy ''later''.
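A minimal sketch of that post-processing idea. It assumes you already have a log of raw contact-edge timestamps in seconds (how you get that log depends on your recording hardware), and the 20ms settle time is an assumption you would tune to your switch:

<syntaxhighlight lang="python">
def collapse_bounces(edge_times, settle=0.020):
    """Collapse raw contact-edge timestamps (seconds) into presses.

    Edges that follow each other within `settle` seconds are treated as the
    bouncing of one press; the reported press time is the FIRST edge,
    i.e. the moment of initial contact, not the moment the line settled.
    """
    presses = []
    last_edge = None
    for t in sorted(edge_times):
        if last_edge is None or t - last_edge > settle:
            presses.append(t)           # first edge of a new press
        last_edge = t
    return presses

# one press whose contacts bounce for ~3ms, then a clean press 150ms later
raw = [0.1000, 0.1004, 0.1011, 0.1028, 0.2500, 0.2507]
print(collapse_bounces(raw))            # -> [0.1, 0.25]
</syntaxhighlight>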
===Keyboards and timing===
tl;dr:
* Assume a keyboard may have 8-25ms of latency, probably on the 20ms end.
:: It might well be lower, but you should never assume it.
* sometimes more importantly, there is ''variation'' in the response time, which will be reflected in a standard deviation in your data even if triggered by atomic-clock precision
* Avoid wireless. In particular, bluetooth can add a handful of milliseconds.
* So '''if we care about 1ms accuracy, do not use keyboards.'''
The OS or hardware does not care about ''precisely when'' the human acted. It's not that it couldn't, it's that it doesn't.
And when you step back, there is a very pragmatic reason for this imprecision: most keyboard users don't care.
You probably never noticed that your keystrokes arrived two dozen milliseconds later,
and no one presses the same key more than 50 times per second, or types at the 3000 characters per minute it would probably take for presses to start getting lost{{verify}}.
There are also low level reasons.
One is debounce, as just mentioned.
A relatively safe 10 to 20ms of debounce time is perfectly fine for all typing needs - same numbers as just mentioned.
Another is an OS / USB / driver level choice: with most USB keyboards,
the PC will only check whether the keyboard had anything to say at most 125 times per second - again, because that's faster than most people type.
'''Variation'''
When you ''do'' care about timing at the millisecond scale, you have two issues.
One, you know it arrives late. You do not know how much.
Two, you do not know this "fetch from USB" schedule.
While it's probably strictly regular, it's on its own timer, at a different rate from your screen updates.
So relative to screen updates (specifically the ''first'' in a stimulus, which determines timing start),
the keystroke arrives on an also-regular but unrelated schedule, which, to frame-based timing, looks variable.
Gamer keyboards may both have a faster readout (e.g. 1000Hz=1ms interval instead of 8ms),
and may choose a shorter debounce time.
Note that "1000Hz keyboard" means it reduces the average latency as well as reducing the variation,
but is unrelated to the delay from debounce -- ''which was larger to start with''.
Assume even fancy stuff never gets under a handful of milliseconds - and measure to actually know.
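To see why polling rate shows up as ''variation'' rather than just a constant offset, here is a small simulation (plain Python; it models only the poll interval, not debounce or anything else in the chain):

<syntaxhighlight lang="python">
import random
import statistics

def simulate_polling_latency(poll_hz, n=10000):
    """Simulate the extra latency contributed by USB polling alone.

    A (simulated) true press happens at a uniformly random moment; the host
    only sees it at the next poll, so the added delay is uniform in
    [0, 1/poll_hz).  Debounce, key travel, and OS scheduling come on top.
    """
    interval = 1.0 / poll_hz
    delays = [random.uniform(0, interval) for _ in range(n)]
    return statistics.mean(delays) * 1000, statistics.pstdev(delays) * 1000  # ms

for hz in (125, 1000):
    mean_ms, sd_ms = simulate_polling_latency(hz)
    print(f"{hz:>4} Hz polling: mean +{mean_ms:.2f} ms, sd {sd_ms:.2f} ms")
# typical output: 125 Hz -> ~+4.0 ms mean, ~2.3 ms sd; 1000 Hz -> ~+0.5 ms mean, ~0.3 ms sd
</syntaxhighlight>

That standard deviation is what ends up folded into your reaction-time data, which is why faster polling helps even when you can subtract a known constant offset.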
===Mice and timing===
<!--
Avoid wireless. In particular bluetooth adds a few milliseconds.
'''Mouse buttons''' are comparable to keyboard buttons in that they need to be debounced,
: While there ''may'' be reasons it's better, assume 10-20ms.
'''Mouse movement''' is rarely recorded in experiments, so it is not hugely relevant here.
If you do:
Optical/laser mice are tiny cameras[https://youtu.be/bci7Gi05BNc?t=240] that do calculation and filtering. In theory they may be reading out that camera thousands of times per second, but they might ''easily'' choose to average a lot of results, preferring lower movement jitter over lower latency.
(this is independent from polling rate, and only somewhat related to DPI)
: Movement ''can'' show up 10 milliseconds later{{verify}}.
: Probably usually much less, but you can't be sure until you test.
Gamer mice may poll at 1000Hz (1ms) intervals, regular ones at 125Hz
-->
===Displays and timing===
{{stub}}
<!--
'''Display delay'''
There are a number of steps between your intent to draw something, to a monitor emitting light as instructed.
# application instructs how to draw things
# GPU drawing it in the framebuffer
# framebuffer starting to be sent to the monitor
# monitor sending received pixels to the panel's pixels
# pixels changing state
Some of those will be 'as fast as possible' so knowing the approximate delay time is enough.
Other parts have their own plans, and reasons for the interplay to become nontrivial and somewhat irregular.
Let's start with monitors. 
Monitors start drawing a new frame on a regular schedule (often 60 frames per second, which is roughly 16 milliseconds per frame).
That rate may be ''set'' by the PC, but not directly controlled, so the exact timing isn't ''directly'' exposed to us either.
Now, the GPU at lower levels does know, which means that it ''can'' choose to synchronize to the monitor's schedule.
This is ''roughly'' what [[VSync]] is.
The drawing takes nontrivial time, so you may want to only send completed frames.
This is what [[double buffering]] ensures. ...usually in combination with Vsync.
* application instructs how to draw things
:: will take some time. The CPU does little else, so could potentially spend all its time telling the GPU what to draw
:: still, the CPU will aim to stay well under the monitor's refresh rate, because it would hold up the GPU's drawing of the next image
* GPU drawing it in the framebuffer
:: this too will take more than zero time.
:: Games may try to keep the GPU busy drawing more interesting things into the frame; we have a more minimal approach.
* framebuffer (in whatever state it is) being sent to the monitor
:: this is where vsync and double buffering ''optionally'' come in
* monitor sending it to its pixels
:: generally still passed through -- often on a per-line basis{{verify}} in a way that puts it ''well'' under a ms
:: this in itself should add negligible time
:: keep in mind that depending on the transfer speed (and whether the video cable setup is sequential in nature, and on how fast the panel can be addressed and ''is'' addressed), a frame may be sent throughout much of the refresh interval - even if lines update very fast, the entire screen might only be updated within 10ms or so
* pixels changing state
:: needs to happen at least as fast as the refresh rate, but how much faster varies
:: Gamers talking about BtB are talking about how fast you can go from fully black to fully bright; GtG (Gray to Gray) is a way for producers to give you a smaller number, based on "you know, in games it's usually one shade going to another, which happens faster than BtB (e.g. 2ms instead of 5ms)"; MPRT (moving picture response time) is the even more subjective "how long is it still visible". Note that BtB, GtG, and MPRT testing methods are ''not'' standardized, so they are not directly comparable.
What delays can you measure and/or remove?
* if you know what to draw ahead of time, you can prepare the framebuffer and remove the first two steps above
: this does not apply to games because they react to your input
* measure: #1, #5, others with more difficulty
:: by explicitly measuring when the screen changes with a light sensor
'''Synchronization?'''
:: note that this does ''not'' solve synchronization between time of change and other things
Monitors have two major aspects to their delay
* they start drawing a new frame on a regular schedule
: e.g. the fairly standard 60Hz means a new frame starts every (1000ms/60Hz=) ~16.6ms
* it takes time to draw all the screen
: you can assume that it's a good portion of that 16ms
And then there are things like [[double buffering]] and [[vsync]],
which even gamers still misunderstand.
They come from the fact that if the computer is drawing pixels into
a full screen's worth of pixels (framebuffer),
the transfer of that to the monitor may happen in the middle of drawing a frame, and you'll see something incomplete.
Double buffering means you have ''two'' framebuffers, drawing into one and showing the other.
Lack of vsync means flipping those two whenever (the flip might still happen mid-refresh; what is shown will be complete, possibly a bit disjointed, but that is rarely visible unless things are moving a lot).
Enabling vsync means delaying that flip until the start of the monitor's next start of frame.
For at-the-last-moment rendering (like games), double buffering adds milliseconds.
Vsync can matter even more, because consider that the worst case is delaying almost an entire 16ms frame.
"Flipping Enabled" (DirectX terminology for what is basically  based [[Vsync]].
: (DirectX also calls it 'page flipping' or 'back buffering')
https://support.pstnet.com/hc/en-us/articles/360021117733
It is discouraged to do screen mirroring, because it is not guaranteed what vsync means when you do that (probably just one of the two monitors, so the other may be (16.6/2=) ~8ms off on average).
-->
<!--
'''Can we do better?'''
While there are faster screens, they aren't ''that'' much faster.  That figure is 13.3ms for 75Hz, 11.1ms for 90Hz (monitors from the nineties could already do this). Even for modern pricy gamer monitors this is still 8.3ms (120Hz), 6.9ms (144Hz), 4.2ms (240Hz)
It makes a lot of sense to flip that around:
figure out when that image swap is exactly, and plan everything else around ''that''.
If you know exactly when a new frame starts to be drawn,
and which frame is the first one with a new visual stimulus,
we can make that the zero point for timing.
This isn't perfect for a few reasons
: because most monitors still take a while to redraw the entire screen - if the stimulus is in the middle, that part may be drawn 5ms after frame start, who knows?
: knowing the ''rate'' doesn't mean we know the ''start''.
: Waiting for vsync gets us pretty close to the time at which the GPU says a new frame starts,
:: but that isn't always when the monitor's new frame starts.
If we really need to know to the millisecond, we probably want to measure live when the new frame shows up - strap a sensor to the screen,
and use part of the screen as indication for it.
-->
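One common approach is to treat the moment of the buffer flip as the zero point for timing. In PsychoPy that looks roughly like the sketch below (not a complete experiment; it trusts the GPU's report of the flip, and the panel itself still takes time to change, so a photodiode check is the only real confirmation):

<syntaxhighlight lang="python">
from psychopy import visual, core

win = visual.Window(fullscr=True, units='pix', waitBlanking=True)
stim = visual.TextStim(win, text='X')
clock = core.Clock()

stim.draw()                 # render into the back buffer; nothing visible yet

# reset the clock at the moment of the buffer flip, so t=0 is (approximately)
# when the frame containing the stimulus starts being scanned out to the monitor
win.callOnFlip(clock.reset)
win.flip()                  # blocks until the vertical blank, then returns

core.wait(0.5)              # stand-in for waiting for a response
print('time since stimulus frame started:', clock.getTime())

win.close()
core.quit()
</syntaxhighlight>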
===Response boxes and timing===
<!--
SRBox (a serial device) order of 3ms
USB response boxes tend to do
* faster polling, and/or
* the following trick
In keyboards, debounce results are late by, say, 20ms, because it's only after observing for 20ms that we can be pretty sure it's a real press.
But what if we recorded how long ago that ''maybe-press'' started?
We can still only report it 20ms late, but we can pinpoint how long ago it actually started.
Keyboards have no reason to care, but we do and at electronic level it's not hard at all.
Black Box Toolkit seems to send a pulse electronically on ''any'' input activity{{verify}},
so that you can ''electronically'' record the start, and then ~20ms later may get a "yes, that actually was this button".
A few, like PST's Chronos box, instead run a clock synchronized to less than a millisecond{{verify}}, so that they can send timestamped events some time later.
This lets the box take however long it needs to do a proper debounce, and afterwards send a "yup, that was a real click at (specific time, some time ago)".
-->
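Host-side, reading a serial response box and timestamping bytes on arrival might look roughly like this sketch (using the third-party pyserial library; the port name and the one-byte-per-event protocol are assumptions, since every box defines its own):

<syntaxhighlight lang="python">
import time
import serial  # third-party: pip install pyserial

# port name and baud rate are placeholders - check your box's documentation
with serial.Serial('/dev/ttyUSB0', baudrate=115200, timeout=0.001) as port:
    events = []
    t_start = time.perf_counter()
    while time.perf_counter() - t_start < 10.0:      # collect for 10 seconds
        data = port.read(1)                          # returns b'' on timeout
        if data:
            # timestamp as soon as the byte arrives; any roughly constant
            # serial/USB delay can be measured separately and subtracted later
            events.append((time.perf_counter(), data[0]))

for t, code in events:
    print(f"{t - t_start:8.4f} s  event code {code}")
</syntaxhighlight>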
===Sound and timing===
{{stub}}
<!--
{{info|For context|
'''Sound takes some time to move through a digital system''',
so any playback (or indeed recording) that is unplanned will happen some milliseconds later.
How many? Depends. If you control the hardware directly it can be on the order of 5ms,
but in modern desktops, programs by ''default'' speak to the operating system's mixer
which ensures multiple programs can all output at the same time, and it adds 30ms
(to avoid choppiness if one program happens to be a bit late).
You can work around that extra buffer by asking the OS to give a sound card to you in exclusive mode -- not that it's always able to say yes.
All of this means that, for output latency:
: if you don't think about this, it's easily 40ms
: a program that ''does'' know better can, in many situations, push that down to maybe
:: 5 to 10ms on generic hardware
:: 2 or 3ms on hardware/drivers that thought about low latency
The above mostly describes the latency of ''output'', from expressed intent to things being audible.
Input has similar issues, though less so, because it seems to always be exclusive - it's not common
for many things to want to record at the same time.
So this may be ~10ms delay.
But you won't actually know what these values are for your system unless you measure - and that turns out to be nontrivial.
}}
Can you do better?
Yes, if
* you know fully ahead of time what the waveform should be,
:: That does not describe games, or anything else where the sound to be made is interactive, and probably mixed with other sounds.
:: That does not describe a lot of music production - while a drumkit may care to make sounds as soon as technically possible (delays more than 5ms or so can be a problem), anything interactive means you only know the sound to output as you make it. Also, ''mixing'' multiple sounds into a single output takes more than zero time.
:: It doesn't even describe your graphical interface going *ding*.
* you know exactly when it should start
:: few things care. The examples named above only know at the last moment, so their answer is 'as soon as possible', not 'this specific time'
* so that you can queue it up, in a place that is ''immediate'' to access.
Stimulus presentation is one of the only cases where you might care about the least amount of delay.
: In general, if the thing goes ding 10ms later, no one cares. In fact, if it goes ding 100ms later, few people would even notice.
It is in fact such a niche wish that, even though audio devices ''could'' specialize in this way, they have not bothered to.
Exceptions to this are often specific-purpose.
'''"Can you correct for sound latency?"'''
If we can measure precisely when a stimulus actually ''did'' start (to the millisecond),
we might care less about controlling exactly when it starts.
If you can measure the delay, and notice that it is essentially a constant, you can subtract that from your measurements.
And if it is not entirely constant, or your configuration during use isn't 100% the same as the configuration during tests,
that subtraction is not so valid and you won't even know.
If you can measure it live, then you can correct for it even if it ''isn't'' constant.
This will be part of experiment/setup design, and some steps in your processing.
'''"Can you lower output latency?"'''
Consider the problem that samples take time to move through.
If we know the samples we want to play well ahead of time,
and know ahead of time that it should be played,
we can start moving samples ahead of that time
(there are some other requirements, most quite manageable).
For example, PST's Chronos box claims it can start playing in under a millisecond. [https://support.pstnet.com/hc/en-us/articles/360008833253]
From description (&le;10ms buffer, ''must'' .Load before a .Play) this is still streamed over USB,
but presumably the first samples will essentially be in the device's own RAM,
so we can start the first samples almost immediately,
and then only need to service it with new samples every <10ms to keep that buffer full enough.
The only reason Windows can't do that is that the APIs aren't made for it - because almost nobody wants it.
Planning ahead like this is so different from the everyday 'I have a bunch of streams I want you to play concurrently'
use of a sound card that, while probably any sound hardware ''could'' do this, none of it is designed to try.
'ASAP', meaning 5ms to 30ms, is good enough, and you probably never noticed.
It also doesn't apply to live processing of sound, but that's not even a very common wish in the audio world
https://pstnet.com/wp-content/uploads/2017/09/Chronos-Operator-Manual.pdf
---
And yes, you need separate sound hardware.
Audio at 44100Hz has to put a new sample out every ~22 microseconds (''precisely'', or it'll sound bad),
and while computer processors ''could'' be made to pay unconditional attention that fast, they would be very inefficient at everything else if they did.
So they have dedicated sound hardware which has ''only'' that job of regularity,
and only the requirement that you fill its buffer ''never'' late (that would also sound bad).
...and the smaller that buffer, the more you ''still'' converge on "the CPU is barely doing anything else".
This is a tradeoff of efficiency and stability versus latency, and why few buffers are smaller than 3ms or so,
which is why the ''overall'' output latency (which involves more steps) is rarely under 5ms or so
even if it's a single program with exclusive access to the hardware (like DAWs like to be).
Talking of exclusive access helping, this is one reason that
* if your device speaks ASIO, use ASIO
: ASIO does not involve a mixer
: ASIO suggests the driver will have lowish latency
* CoreAudio a.k.a. WASAPI has exclusive mode more explicitly
: there was a way to try to get that before, but it was messier so less guaranteed
* Fall back to DirectSound
: this may well include that ~30ms mixer (e.g. [https://support.pstnet.com/hc/en-us/articles/229363067 E-Prime recommends against it], implicitly suggesting CoreAudio/ASIO)
Falling back to MME is an unknown.
It's not nearly as bad as it was when MME was the only choice (because these days the MME API calls actually go to CoreAudio{{verify}}), but there are no guarantees.
-->
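For a feel of what your own output chain claims, here is a sketch using the third-party sounddevice library: it generates the waveform fully ahead of time and asks for the 'low' latency setting, which is about as far as a generic desktop setup goes. The numbers it prints are the driver's claims, not a measurement - only a microphone or photodiode-style rig tells you the real delay:

<syntaxhighlight lang="python">
# pip install sounddevice numpy
import numpy as np
import sounddevice as sd

fs = 44100
# generate the stimulus fully ahead of time: a 100 ms, 1 kHz beep
t = np.arange(int(0.1 * fs)) / fs
beep = (0.5 * np.sin(2 * np.pi * 1000 * t)).astype(np.float32)

# what does the driver itself claim?
dev = sd.query_devices(kind='output')
print("default low / high output latency reported by the device:",
      dev['default_low_output_latency'], dev['default_high_output_latency'])

# ask for the low-latency setting and see what we actually get
with sd.OutputStream(samplerate=fs, channels=1, latency='low') as stream:
    print("stream reports output latency of", stream.latency, "s")
    stream.write(beep.reshape(-1, 1))   # blocking write of the prepared samples
</syntaxhighlight>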
===Wireless devices and timing===
<!--
'''Wireless'''
Digital wireless - bluetooth and similar - adds latency, and because you do not control congestion of these bands, you should assume
* it's on the order of 5-10ms,
* it's variable
In any given situation things may be better. Or not.
Measuring has limited effect, because the amount of use/congestion will vary.
More basic RF may actually be quite close to wired -- but on free-to-use bands you are also more likely to run into nearby devices you can't easily eliminate.
-->


==Semi-sorted==

D Bridges et al. (2020) "The timing mega-study: comparing a range of experiment generators, both lab-based and online"