Benchmarking, performance testing, load testing, stress testing, etc.
|This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)|
There are many names for tests, more than most of us can remember or define on the spot.
Some are fuzzily defined, or treated as interchangeable in some contexts.
- 1 Code testing
- 2 Also relevant
- 3 Testing larger chunks
- 4 Tests in short release cycles
- 5 Caring about users
- 6 Load, performance, stress testing
- 6.1 Longevity/endurance testing, soak testing
- 6.2 Stress testing, recovery testing
- 6.3 Common pitfalls in benchmarking
- 6.3.1 Measuring latency / client rate when you wanted to measure system rate
- 6.3.2 Measuring rate when you wanted to measure latency
- 6.3.3 Measuring the overhead
- 6.3.4 Measuring the startup, hardware tuning, etc.
- 6.3.5 Timing inaccuracies
- 6.3.6 Measuring your cache more than your code
- 6.3.7 Micro-benchmarking
- 7 See also
"is this piece of code behaving sanely on its own, according to the tests I've thought up?"
Typically for small pieces of code, often functions, behaviour within classes, etc.
- during development, it often makes you more precise about the behaviour of a piece of code, and makes you think about its edge cases
- particularly useful to hammer down the lower level utility functions
- the more dynamic the behaviour of the code, the more it forces you to think about edge cases
- the more dynamic the language (as in dynamic typing), the more it forces you to think about edge cases
- unit tests are often a form of self-documentation
- say, example cases are often unit tests as well
- and say, if I want to dig into details, then seeing the test cases gives me more information than a comment/docstring saying "percent-escapes for URI use" (which is imprecise in practice)
- can be particularly helpful in helper/library functions
- unit tests are part of regression testing
- basically, for 'this part is fragile, and we expect future tweaks may break this again'
- you probably won't check your error paths without tests
- it helps others see your approach to code quality
- in practice not so much via the presence of unit tests, but via what you're actually testing
- you will only write tests for the things you thought of (and probably coded correctly), not the things you forgot (and probably didn't code)
- this is a decent indication of completeness, but not a guarantee against bugs.
- the less dynamic the behaviour, the more that unit testing converges on testing if 1 is still equal to 1
- which wastes time
- which can give a false sense of security
- the more dynamic the behaviour (in the 'execution depends on the actual input' sense), the less a few tests actually prove correctness at all.
- In fact, tests rarely prove correctness to begin with, even in the most overzealous forms of TDD
- most of the time, they only prove you didn't make the most obvious mistakes that you thought of
- while on paper the idea is "try your hardest to think of everything that would break it", which is great
- ...if you look around, almost all unit tests are more "think of things you know probably works anyway"
- ...because a lot of people write unit tests only because someone told them sternly
- arguably most bugs sit in the interactions between different pieces of code, and unit tests do not test that
- The more OO-ey code is, the less unit tests do anything (TODO: elaborate)
- the more code involves locking, IPC, network communication, or concurrency, or interacts with other parts of the program that have state (think OO), or with other programs that have state, the less you really test - or can even say what you have tested or not.
- such things are hard to test even with much fancier techniques
- there is no good measure of completeness of your unit tests
- if you think code coverage is that thing, you are probably a manager, not a programmer.
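A minimal sketch of the kind of unit test meant above. The `percent_escape` helper is hypothetical (it just echoes the docstring example from earlier), and the tests pin down exactly the behaviour a vague docstring would leave open:

```python
import unittest
import urllib.parse


def percent_escape(s):
    # Hypothetical helper: percent-escapes a string for use
    # as a single URI path/query component.
    return urllib.parse.quote(s, safe="")


class TestPercentEscape(unittest.TestCase):
    def test_plain_ascii_unchanged(self):
        self.assertEqual(percent_escape("abc123"), "abc123")

    def test_space_and_slash_escaped(self):
        # This nails down behaviour that "percent-escapes for URI use"
        # leaves imprecise: is '/' escaped? is space '%20' or '+'?
        self.assertEqual(percent_escape("a b/c"), "a%20b%2Fc")

    def test_empty_string(self):
        self.assertEqual(percent_escape(""), "")
```

Run with something like `python -m unittest`. Note how even these three cases are mostly "things you know probably work anyway"; the interesting tests would be the edge cases you had to think hard about.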
"Is this still working as it always was / doing what it always did?"
Refers to any kind of test that should stay true over time.
Regression tests are often as simple as they need to be, and are frequently unit tests.
Any test that you keep around can act as a regression test, but we tend to mean "a test we wrote when we fixed a nasty bug, to ensure we won't regress to that bug later" - hence the name.
Reasons you may keep a test around include:
- a fixed bug was subtle and easy to miss, so more likely to be reintroduced. A guard against that is great
- this code is often touched and often altered
- this code is used by a lot of the codebase, so bugs (or breaking changes, or false assumptions) are far reaching
- such tests guard well against the same bug regressing
- they may also help avoid the emergence of similar bugs.
Arguables / downsides:
- very specific, may cover very little
- which can mean a false sense of security
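A sketch of what such a regression test looks like in practice. The function and the bug are hypothetical: say an off-by-one in a day-counting helper once silently dropped a day, and after fixing it we pin down the exact case that exposed it:

```python
from datetime import date


def days_between(a, b):
    # The fixed implementation. The (hypothetical) old buggy version
    # did (b - a).days - 1, which quietly lost a day.
    return (b - a).days


def test_regression_leap_day_counted():
    # The exact case that exposed the original bug: a span crossing
    # Feb 29. Subtle and easy to miss, so worth guarding forever.
    assert days_between(date(2000, 2, 28), date(2000, 3, 1)) == 2
```

Note how specific it is - it proves little beyond "that one bug stayed fixed", which is exactly the point, and also exactly the "may cover very little" downside mentioned above.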
"Does this code interact sanely with the other code / parts of the program?"
Where unit tests tend to test components in isolation (because in theory you can be more complete that way), integration tests take those components and test whether they interact properly.
They work on parts that are assumed to already have unit tests, so the integration test doesn't have to repeat that work.
It's not yet a test of your product as a whole; that tends to come later in the process.
Integration tests are often the medium-sized tests you can do during general development.
"Are we living up to the specific list of requirements in that document over there?"
Said document often says 'functional design' at the top.
It can involve any type of test, though in many cases it is a fairly minimal set covering overall function and basic user interaction, and it is largely unrelated to bug testing, security testing, and such.
These draw in some criticism, for various reasons.
A design document tends to have an overly narrow view of what really needs to be tested; you're not necessarily even testing whether the whole actually functions, or behaves as people expect.
And the more formally it's treated, the less room people feel there is for their own, often more useful, tests.
Mocking, monkey patching, fixtures
A little design for testability
Testing larger chunks
Fuzz testing, a.k.a. fuzzing, feeds in what is often largely random data, or random variation of existing data.
If software does anything other than complain about bad input, fuzzing may reveal border cases you weren't considering - e.g. exploitable buffer overflows, injection vectors, ways to DoS, bottlenecks, etc.
Perhaps used more in security reviews, but also in some tests for robustness.
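A toy sketch of the basic loop, against a hypothetical length-prefixed parser. The target and its format are made up; the point is the shape: feed random bytes, treat "complains about bad input" as fine, and flag anything else:

```python
import random


def parse_length_prefixed(data: bytes):
    # Hypothetical target: first byte is payload length, rest is payload.
    if not data:
        raise ValueError("empty input")
    n = data[0]
    payload = data[1:1 + n]
    if len(payload) != n:
        raise ValueError("truncated payload")
    return payload


def fuzz(iterations=10_000, seed=0):
    rng = random.Random(seed)
    findings = 0
    for _ in range(iterations):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        try:
            parse_length_prefixed(blob)
        except ValueError:
            pass  # complaining about bad input is the acceptable outcome
        except Exception:
            findings += 1  # crash/hang/other: a finding worth investigating
    return findings
```

Real fuzzers (AFL, libFuzzer, or property-based tools like Hypothesis) are much smarter about this - mutating known-valid inputs and using coverage feedback - but the acceptable-versus-finding distinction is the same.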
On code coverage
Tests in short release cycles
Sanity testing, Smoke testing
Caring about users
Load, performance, stress testing
Longevity/endurance testing, soak testing
Stress testing, recovery testing
Common pitfalls in benchmarking
Measuring latency / client rate when you wanted to measure system rate
Measuring rate when you wanted to measure latency
Measuring the overhead
Measuring the startup, hardware tuning, etc.
Measuring your cache more than your code