Programming in teams, working on larger systems, keeping code healthy
=Tests=
There are more possible names for tests than most of us can remember or define on the spot.
And some are fuzzily defined, or treated as interchangeable in some contexts.
Here is some context for some of them.
==Code testing (roughly smaller to larger)==
<!--
The way I was taught to do testing is "don't ask, do it because your grade will be lower otherwise".
Also, the systems we were testing were unrepresentative of real world software anyway,
and the tests we were writing didn't teach us anything.
I think that's the wrong approach.
I think the first question is always '''why are we doing tests?'''
There is, for example, a clear answer around regression tests:
"I have wasted X hours on it breaking in the same spot, and want to avoid doing that again"
Assertions are testing at a level of "this dynamic programming function may have edge cases I didn't think about,
and if that ever happens, let's not ignore that".
What we call testing is often on a larger scale - but also often much more disconnected.
In practice, you aim tests at where you think things will go wrong.
Code tests are more about "things will change, things will go wrong, you want to find out ''before'' production".
Exactly where to aim your testing is a balance.
If you write unit tests for absolutely everything, you will spend most of your time on tests (and many of them will feel like "is 1 still equal to 1").
If you write them only for things that have gone wrong before, you're probably not catching anything new.
-->
===Unit testing===
'''"is this piece of code behaving sanely on its own, according to the tests I've thought up?"'''
Typically for small pieces of code, often functions, behaviour within classes, etc.
Upsides:
* reveals your approach to code quality and knowledge thereof - probably not so much via the ''presence'' of unit tests, as much as ''what'' you're actually testing
* unit tests are part of [[regression testing]] - "this part is fragile, and we expect future tweaks may break this again"
* unit tests can be a form of self-documentation
: example cases are often unit tests as well
: and say, if I want to dig into details, then seeing {{inlinecode|<nowiki>assert uri_component('http://example.com:8080/foo#bar') == 'http%3A%2F%2Fexample.com%3A8080%2Ffoo%23bar'</nowiki>}} gives me more information than a comment/docstring saying "percent-escapes for URI use" (which in practice is pretty imprecise)
: can be particularly helpful in helper/library functions
* forces you to think about edge cases
:: ...around more [[dynamic programming]]
:: ...around more [[dynamic typing]]
:: doing this sooner rather than later avoids some predictable mistakes
* ''sometimes'' you discover edge cases you didn't think of, and didn't implement correctly, and/or didn't describe very precisely
: easily overstated, yet probably everyone has done this
<!--
* when you're also looking at code coverage, you are more likely to (eventually) write tests that check error paths
* it makes you be more precise about the behaviour of a piece of code
: particularly useful to hammer down lower level utility functions, for you and others
* tests may have the side effect of motivating programmers to move one-time helper function into libraries (sometimes actually easier for code coverage)
-->
Arguables:
* you will only write tests for the things you thought of, not the things you forgot (and probably didn't just write into the code)
: this is more an indication of the completeness of your thinking than of the correctness of the function
* the tests you write give you some security; the thoroughness you lack gives you a false sense of security
* the more OO code dabbles in abstractions, the more black-box it is, and the harder it is to say how much the tests even really cover
* a lot of real-world bugs sit in interactions of different code, and ''unit'' tests do not test that at all
: sure, that's not their function; the point is that 'write tests' in practice often leads to writing unit tests, rather than to finding bugs
* while on paper the "try your hardest to think of everything that would break it" idea is great
: ...if you look around, a buttload of unit tests are of the "think of things you know probably work anyway" sort
: ...because a lot of people write unit tests only because someone told them sternly (often by someone who barely understands when they are useful and when not)
Downsides:
* the more it involves locking, IPC, network communication, or concurrency, or interacts with other parts of the program that have state (think OO), or with other programs that have state, the less you really test - or can even say ''what'' you have and haven't tested.
: such things are hard to test ''even with much fancier techniques''
* there is no good measure of how thorough your unit tests are
: if you think code coverage is that thing, you are probably a manager, not a programmer.
* the less dynamic the behaviour, the more that unit testing converges on testing if 1 is still equal to 1
: which wastes time
: which can give a false sense of security
* the more dynamic the behaviour (in the 'execution depends on the actual input' sense), the less that adding a few tests actually prove correctness at all
: In fact, '''tests rarely prove correctness''' to begin with (because this is an ''extremely'' hard thing to do), even in the forms of [[TDD]] that TDDers would find overzealous
:: most of the time, they ''only'' prove you didn't make the more obvious mistakes ''that you thought of''
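As a minimal sketch of what this can look like in practice - pytest-style, and assuming a hypothetical {{inlinecode|uri_component()}} helper like the one mentioned above:

<syntaxhighlight lang="python">
# test_uri.py - a minimal unit test sketch (pytest picks up test_* functions).
# uri_component() is a hypothetical helper that percent-escapes for URI use.
from mylib.uri import uri_component


def test_escapes_reserved_characters():
    # doubles as documentation of the intended behaviour
    assert uri_component('http://example.com:8080/foo#bar') == \
        'http%3A%2F%2Fexample.com%3A8080%2Ffoo%23bar'


def test_leaves_unreserved_characters_alone():
    assert uri_component('abc-123_~.') == 'abc-123_~.'


def test_empty_string():
    # the sort of edge case you only handle by deciding what it should do
    assert uri_component('') == ''
</syntaxhighlight>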
===Regression testing===
'''"Is this still working as it always was / doing what it always did / not doing a bad thing we had previously patched up?"'''
Refers to any kind of test that should ''stay'' true over time.
: particularly when you expect that code to be often touched/altered,
: particularly when that code is used (implicitly) by a lot of the codebase, so bugs (or breaking changes, or false assumptions) are far reaching
Yes, any test that you do not throw away acts as a ''sort'' of regression test,
but when we call it this, it more specifically often means "a test we wrote when we fixed a nasty bug, to ensure we won't regress to that bug later" - hence the name.
Regression tests are often as simple as they need to be, and frequently a smallish set of unit tests is enough.
Upsides:
* should avoid such bug regression well
* may also help avoid emergence of ''similar'' bugs.
Arguables / downsides:
* the same specificity that avoids that regression means it's covering very little else
: ...even similar issues in the same code
* which can lead to a false sense of security
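A sketch of what such a test often looks like - small, specific, and tied to the bug it guards against (the {{inlinecode|parse_price()}} function and the bug itself are made up for illustration):

<syntaxhighlight lang="python">
# Regression test sketch: say we once shipped a bug where prices with a
# thousands separator were parsed as just their first digits ("1,299.00" -> 1).
# This test pins the fixed behaviour so a later refactor can't quietly bring it back.
from decimal import Decimal
from shop.parsing import parse_price   # hypothetical function under test


def test_parse_price_thousands_separator_regression():
    # kept deliberately specific; see the original bug report for context
    assert parse_price("1,299.00") == Decimal("1299.00")
</syntaxhighlight>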
===Integration testing===
'''"Does this code interact sanely with the other code / parts of the program?"'''
Integration tests take components and check whether they ''interact'' properly (rather than testing those components in isolation).
Integration tests are often the medium-sized tests you can do during general development.
...so not testing the product as a whole - that tends to come later in the process.
Or, in these continuous-delivery days, sometimes never, a.k.a. "user tests means deploying to users and seeing if they complain, right?"
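A small sketch of the idea - two real components exercised together against a real (if throwaway) database, rather than mocking either away; the {{inlinecode|UserRepository}} names are made up for illustration:

<syntaxhighlight lang="python">
# Integration test sketch: our repository code and a real SQLite database,
# working together. UserRepository and its methods are hypothetical names.
import sqlite3
from app.users import UserRepository


def test_saved_user_can_be_found_again(tmp_path):
    db = sqlite3.connect(str(tmp_path / "test.db"))   # real database, throwaway file
    repo = UserRepository(db)
    repo.create_schema()

    repo.add(name="Ada", email="ada@example.com")
    found = repo.find_by_email("ada@example.com")

    # the point is the interaction: our SQL, the driver, and our code together
    assert found.name == "Ada"
</syntaxhighlight>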
<!--
If you look for resources, you'll see some more classical material that e.g. specifically makes it a step after [[unit testing]], and before [[system testing]] and [[acceptance testing]].
And possibly even including interaction tests and usability tests, when that makes sense for the product and actually tests various layers anyway ('if I move things to the trash, does it show up there').
This broadness led to there being some more specific (though not necessarily covering) terms around integration testing.
Some of them are just reminders of what not to forget
:: is it client-server? Then spend tests on that.
:: is it distributed services? Then spend tests on that.
and others are general approaches, e.g.
* '''big-bang''' - combine everything
: More of a 'see if we can break it in general' test
: upsides: easy to do, requires no planning, and can reveal the presence of bugs
: downsides: not always easy to localize that bug. Impractical on very complex systems
* '''risky-hardest''' - focus on the hardest/central/fragile first
: so you'll fix this part first, and find any related design flaws earlier
: upsides: finds more important / further reaching bugs earlier (which can also save time when it implies some redesign)
: downsides: you are probably not as thorough as you think
* '''bottom-up''' - start combining lower-level modules
: upsides: compared to big-bang it's typically a lot clearer where the bug comes from
: downsides: more work, never going to be as complete as you think, and the most high-level things - which are most likely to have edge cases - are tested late
* '''top-down''' - tests the higher level things first
: upsides: higher-level bugs found earlier (which can reveal design flaws earlier)
: downside: little focus on lower level modules, tends to mean a lot of stubs and mocking until late in development
* '''sandwich testing''', a.k.a. '''hybrid''' - mix of top-down and bottom-up
: upsides: postpones neither the low level nor the high level until late; makes more sense in large projects with separate teams
: downsides: more work, more costly, generally not very well defined what you're proving, or trying to
Further terms include:
'''high-frequency integration'''
: often seen around CI/CD, often just to point out you test things as often as you change things
: though they tend to mean just the more mechanical unit and regression tests, because even they realize integration testing tends to be a "done some time later, question mark" category, and it's worth distinguishing that from testing right now what you ''can'' test right now.
-->
===Fuzz testing===
Fuzz testing, a.k.a. fuzzing, feeds in what is often largely random data, or random variation of existing data.
If the software does anything other than complain about bad input,
this may reveal border cases you're not considering,
and e.g. the presence of exploitable buffer overflows, injection vectors, ability to DoS, bottlenecks, etc.
Perhaps used more in security reviews, but also in some tests for robustness.
Can apply
: for relatively small bits of code, e.g. "add a random generator to unit tests and see if it breaks",
: up to "feed stuff into the GUI field and see if it breaks".
See also:
* https://en.wikipedia.org/wiki/Fuzzing
===Acceptance testing===
'''"Are we living up to the specific list of requirements in that document over there?"'''
Said document classically said 'functional design' at the top.
In agile, it's probably the collection of things labeled 'user stories'<!--
(though if they come exclusively from your manager, that's no different)-->.
Which can involve any type of test, though in many cases it is a fairly minimal set of tests of its overall function
and basic user interaction, and is largely unrelated to bug testing, security testing, or such.
These draw some criticism, for various reasons.
A design document tends to have an overly narrow view of what really needs to be tested.
You're not necessarily testing whether the whole actually... functions, or even acts as people expect.
The more formally it's treated, the less people's own useful tests tend to be valued.
<!--
Recent development methodologies may trust they are implicitly doing this via other tests.
-->
<!--
===System testing===
System testing, when done, is often a step after [[integration testing]],
but note that by most definitions, it doesn't test code as much as it is [[acceptance testing]], and sometimes extending into [[usability testing]].
-->
===End-to-End testing===
{{stub}}
Basically, testing whether the flow of an application works, with a ''simulated'' user.
The goal is still to test the application at a mostly functional level - whether information is passed between distinct components, whether database, network, hardware, and other dependencies act as expected.
End-to-end testing is often still quite mechanical, and you might spend time specifying a bunch of test cases expected to cover likely uses and likely bugs.
This is in some ways an extension of integration testing, at a whole-application and real-world interaction level, finding bugs that only show up at that level.
While you are creating a setup as real users might see it,
it's ''not'' e.g. letting users loose to see what they break.
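A sketch of the flavour - driving a deployed (staging) instance over its public HTTP interface the way a client would; the URL and endpoints are made up:

<syntaxhighlight lang="python">
# E2E sketch: exercise a whole deployed flow (web server, app, database)
# through the outside interface. URL and endpoints are made up for illustration.
import requests

BASE = "https://staging.example.com"

def test_signup_then_login_flow():
    creds = {"email": "e2e-test@example.com", "password": "hunter2!"}

    r = requests.post(f"{BASE}/api/signup", json=creds, timeout=10)
    assert r.status_code == 201

    # if this works, the whole chain behind it worked well enough to let us back in
    r = requests.post(f"{BASE}/api/login", json=creds, timeout=10)
    assert r.status_code == 200
    assert "token" in r.json()
</syntaxhighlight>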
<!--
You typically do this later, and after you've already got unit tests and integration tests covering your basics.
...both because it is generally useful to catch bugs with lower level tests (happens earlier, easier to locate, often easier to fix),
...and also because E2E tests tend to not be as thorough.
See also test pyramids for considerations
https://automationpanda.com/2018/08/01/the-testing-pyramid/
-->
==Tests in short release cycles==
===Sanity testing, Smoke testing===
<!--
Both of these can be as simple as "is it broken now" checks and as complex as "does this series of actions do a sensible thing"
'''Sanity testing''' is often meant for new functionality, checking broadly and shallowly whether it works in the whole.
It's not a very official type of test, and is sometimes just a manual, early part of "does it make sense to integrate this?"
[https://en.wikipedia.org/wiki/Sanity_check#Software_development]
'''Smoke testing''' is more about critical functionality
: ...and often comes in the form of build verification testing, confidence testing,
testing whether a particular build / staged instance seems stable.
Smoke testing can be considered close to
* integration testing - in that you are testing whether the system functions as a whole, when deployed
* regression testing - in that you are checking whether things didn't break
and arguably things like
* acceptance testing - in that you are checking whether it is functional according to written demands
Smoke testing is often more associated with preparing a particular release,
where developers are checking whether a version is good enough to be handed over for ''functional'' testing,
as this sometimes takes a few passes of polishing.
Both ''may'' be relatively manual.
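A smoke test can be as small as "does the deployed thing respond at all" - a sketch, with made-up URLs:

<syntaxhighlight lang="python">
# Smoke test sketch: a few shallow checks against a freshly deployed build,
# just to decide whether it's worth handing over for real testing.
# The URL and endpoints are made up for illustration.
import requests

BASE = "https://staging.example.com"

def test_service_answers_at_all():
    assert requests.get(f"{BASE}/healthz", timeout=5).status_code == 200

def test_homepage_renders_something():
    r = requests.get(BASE, timeout=5)
    assert r.status_code == 200
    assert len(r.text) > 0
</syntaxhighlight>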
-->
==Caring about users==
===Usability testing===
<!--
See [[Usability]]
-->
===Accessibility testing===
==Also relevant==
===Black box versus white-box===
<!--
'''Black-box''' testing refers to seeing if something gives the right output for given input ''without'' considering program structure
: which you can see both as
:: a reason you will not test as much of your code, because you're not targetedly doing so
:: a reason you will not get biased in what you test, because the testers don't actually know your system at all
'''White-box testing''' also considers the code structure.
: one addition is '''code coverage''', basically the question "how many of the code paths and/or lines of code do the tests actually touch"
These white/black terms can be applied at varied levels, from unit tests to integration tests to system tests.
For example, 'functional testing' and 'acceptance testing' are primarily black-box testing,
because you're just testing whether it doesn't seem broken and does what some specs said it should do.
...while unit tests and regression tests are frequently tests for the internals a coder just wrote,
seeing if it does what we think it does.
-->
===Self-testing code===
{{stub}}
Self-testing code is code that includes some checks inside its own code.
This often amounts to
: assert() statements within a function, e.g.
:: testing important [[invariants]]
:: doing your own regression checks
:: intentionally borking out earlier rather than later when a bug could have wide-reaching implications (e.g. around concurrency)
https://en.wikipedia.org/wiki/Self-testing_code
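For example - a minimal sketch, where the function and the invariant it checks are made up for illustration:

<syntaxhighlight lang="python">
# Self-testing code sketch: the function checks its own invariants as it runs,
# so a violated assumption fails loudly near its cause, rather than quietly
# corrupting state to be discovered much later.
def transfer(accounts, src, dst, amount):
    assert amount > 0, "transfers must move a positive amount"
    total_before = accounts[src] + accounts[dst]

    accounts[src] -= amount
    accounts[dst] += amount

    # invariant: money is moved, never created or destroyed
    assert accounts[src] + accounts[dst] == total_before
    assert accounts[src] >= 0, "overdraft should have been rejected before this point"
    return accounts
</syntaxhighlight>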
===Mocking, monkey patching, fixtures===
<!--
Around various tests (integration, unit), code under test may have dependencies or APIs that have to be ''present'' (but not necessarily do anything) for the code to work at all.  '''Mock/stub/fake objects''' help that work.
'''Mocking''' also refers to faking enough of the environment to make it possible to do the tests you need, sometimes in a wider sense (e.g. installing it into an isolated environment)
'''Monkey patching''' is mocking done at runtime
'''Fixtures''' are things that make tests (and mocking) easier (more below).
Mocking and fixtures are often made easier if you have [[inversion of control]] in your design
: ''roughly'' because all these dependencies become things you can hand in (rather than things that just have to magically be part of a shared environment)
: so that you ''don't'' have to resort to monkey patching
'''For context:'''
For tests on functions without [[side effects]], the only things that matter are the function parameters and return values.
A good bunch of basic unit testing can be just this, and a lot of tests can be very little code.
For a lot of more real-world tests, code other than yours exists, and state other than yours exists.
That is either
* '''...the point:''' In some cases, testing that you and/or that other part does its state management correctly is the thing under test (e.g. integration testing), so the test will ''want'' to set up all the different parts, in as realistic a setting as possible.
* '''...unavoidable:''' You want to test ''your'' piece of code in as reproducible a way as possible
: everything external is not important to the test, except that it needs to ''exist'' for the code to function.
: Those other parts doing little to nothing may actually let you narrow down the test to what you ''actually'' wanted to test.
In both cases,
* part of the point is that some tests only become possible when you insert such a substitute system
: lets you test more code, and more codepaths
* part of the point is that these tests are more meaningful when they are deterministic and reproducible
:: which tends to be easier if they are minimal
Additionally,
* forces you to think about [[side effects]] - like accidentally altering a production database, sending test mails to actual clients, etc.
'''Examples'''
* a mocked database
:: might just be an unimplemented class that accepts everything and returns nothing, or always returns the same data
:: might put up an in-memory sqlite file behind what would in production be a clustered RDBMS
::: that could test storage at all, but not production behaviour
:: might use a temporary table in a development database, or even the production database
::: to test issues under load in a realistic configuration
* software like mailhog
:: accepts mail sent via SMTP, doesn't actually send it
:: great to test whether your generated MIME seems valid (without spamming people)
:: doesn't actually test whether it would make it through the real world, though
* a mocked HTTP API
:: is great to test that we don't break aggregating the thing we want to send in
:: but not that the API would actually accept that data
Some of the above is a little overstated, because it's more of a sliding scale
* sometimes you want the other thing to be an entirely do-nothing thing
: It quacks like a duck, but a purr would also have been fine - the response is not the point
: e.g. if you want to test submission to an API, then consider that...
:: knowing that aggregation of the data doesn't fail, encoding that data doesn't fail
:: knowing that your client side code deals sensibly with timeouts, server overload, network errors,
: ...are all really useful, are best tested by running that client code -- and none of that cares about whether that API actually stored the data.
:: so maybe all you care about is that there is a URL you can post to that really just says 200 OK unconditionally.
* sometimes it can just give any not-invalid response - a passable simulation of what the real thing would say -
: It quacks like a duck and a quack was all you were looking for
* sometimes it should be ''just the parts'' of the real system you need for the test -
: this is one reason some people go hard on decoupling like [[interfaces]], [[dependency inversion]] and such - it tends to make such "insert mock system here" things easier
* sometimes it should really do the thing -- just not on the production system
: it quacks like a duck and also first waddled up to you in response to being offered bread
: e.g. test that a database serializes operations correctly -- but still working on a temporary instead of a real table
* Sometimes it's even ''about'' measuring side effects. Say, if you're testing your app installer, you may both want a clean VM image to install on, and one which had a previous installation.
: That's something you can only do with automation.
'''Mocking''' mostly refers to faking exactly enough of the environment to do useful tests in relative isolation.
And in tests of object oriented code, this often specifically means [https://en.wikipedia.org/wiki/Mock_object mock objects].
Exactly how much you mock, and how fake it is, and how involved it is, will vary depending on what tests you want to do.
When those external objects do ''nothing'', they are often called '''stubs'''. Or '''fakes'''.
But as wikipedia notes, the '''use of these terms is highly inconsistent'''.
And mocking itself was already introduced as anything between "do nothing" and "best imitation short of the actual production system".
...so tl;dr: just keep thinking about what is necessary for a good test.
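A minimal sketch of the "quacks like a duck and that's enough" end, using Python's unittest.mock - {{inlinecode|send_welcome()}} and its mailer interface are made up for illustration:

<syntaxhighlight lang="python">
# Mock sketch: the mailer only needs to *exist* and be observable;
# we are testing our own logic, not SMTP.
from unittest.mock import Mock

def send_welcome(user, mailer):
    # made-up function under test
    if not user.get("email"):
        return False
    mailer.send(to=user["email"], subject="Welcome!")
    return True

def test_send_welcome_uses_the_mailer():
    mailer = Mock()                 # accepts any call, records everything
    assert send_welcome({"email": "a@example.com"}, mailer) is True
    mailer.send.assert_called_once_with(to="a@example.com", subject="Welcome!")

def test_send_welcome_skips_users_without_email():
    mailer = Mock()
    assert send_welcome({}, mailer) is False
    mailer.send.assert_not_called()
</syntaxhighlight>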
'''Monkey patching''' refers to ''runtime replacement'' of an existing function/class.
It's named with the acknowledgement that it's hackier than [[dependency injection]].
Monkey patching can be entirely valid, and less work.
It's potentially messier and hackish, so comments will often point out when we're doing this, and how it's valid.
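A sketch of what that looks like with pytest's monkeypatch fixture, replacing a real lookup at runtime - the {{inlinecode|billing}} module and its functions are made-up names:

<syntaxhighlight lang="python">
# Monkey patching sketch: swap out the real rate lookup at runtime,
# because this test is about the calculation around it, not the network.
# billing, fetch_rate and convert are hypothetical names for illustration.
import billing

def test_convert_uses_current_rate(monkeypatch):
    # hackier than injecting the dependency, but sometimes much less work
    monkeypatch.setattr(billing, "fetch_rate", lambda currency: 1.25)
    assert billing.convert(amount=100, currency="EUR") == 125.0
</syntaxhighlight>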
'''On fixtures'''
Test fixtures[https://en.wikipedia.org/wiki/Test_fixture] are, in the original sense, often-mechanical devices that make it much easier to run tests automatically.
E.g. {{imagesearch|test fixture|in electronics}}: this includes things like pressing the device against spring-loaded contacts in a test position (for good contact), powering it, communicating with it, and checking that the response makes sense. This ''may'' be so automated that it can be part of production, with little or no human intervention.
In software testing, there is a similar spectrum,
from 'the setup/teardown code we reuse that makes it much easier to run tests'
to 'framework that is completely in control of picking up everything'.
Both can be about sharing boilerplate code - sometimes fixtures just mean "think about structuring and code duplication in your tests -- it's code too, you know".
The second is more structural, and requires an [[inversion of control]] setup, picking up tests, doing [[dependency injection]] etc.
For example,
* pytest
:: can pick up test functions based just on their name
:: e.g. the mere presence of an argument called [https://docs.pytest.org/en/6.2.x/reference.html#tmpdir tmpdir] means pytest does all the work of creating a unique directory for each run of that test, and cleans up afterwards - and you don't even have to know ''how'' it does that
*  [https://tox.wiki/en/latest/ tox] can set up a complex environment, but lets you focus on ''specifying'' that environment rather than implementing it for each test.
:: e.g. installing libraries in an isolated environment,
:: or copying files into a temporary directory and cleaning it up afterwards{{verify}}
Aside from meaning you don't have to have a "run this script to do tests" step, such a framework can also do a lot of (necessary) work for you.
From the varied things that people call fixtures, it seems that
* sometimes, fixtures just amount to "the framework we do tests in",
: in a "it just does some boilerplate" way
* ...or that, but with a library that makes it easier
: e.g. making a temporary directory, copying in files, and removing all that afterwards
: and possibly pointing out modular design of fixtures for reusability
* Sometimes fixtures means "The code that gives the same test data to a number of different test cases"
: ...which in that specific meaning makes fixtures useful only when reusing data exactly across many tests
: ...and worthless if you are testing variations
* sometimes 'fixture' just means "we're isolating from a real system"
: i.e. fixture as a near-synonym of mocking, in a "mocking is the verb you do, fixtures the noun you do it ''with''" way
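A sketch of the "shared setup/teardown code" sense, as a pytest fixture - the database helpers are made-up names:

<syntaxhighlight lang="python">
# Fixture sketch (pytest): shared setup/teardown that several tests reuse,
# so each test stays about the thing it actually tests.
# make_connection() and create_schema() are hypothetical helpers.
import pytest

@pytest.fixture
def temp_db(tmp_path):
    db = make_connection(tmp_path / "test.db")   # hypothetical helper
    create_schema(db)                            # hypothetical helper
    yield db                                     # the test itself runs here
    db.close()                                   # teardown, even if the test failed

def test_insert_and_count(temp_db):
    temp_db.execute("INSERT INTO items (name) VALUES ('a')")
    assert temp_db.execute("SELECT count(*) FROM items").fetchone()[0] == 1
</syntaxhighlight>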
'''Upsides'''
* Mocking lets you isolate ''your'' part of exchanges
* fixtures can keep your tests succinct
: and save you time while writing them
Downsides
* Mocking isolates to ''only'' test your part of exchanges
* fixtures/mocking may mean your unit tests really become more like integration tests - which you already had
* Fixtures can have their own learning curve
* fixtures/mocking can make testing unnecessarily ''slow'', particularly if each test is wrapped into its own run for maximum isolated reproducibility, and/or the framework punishes you for wanting it any other way
* mocking easily means you have put up ''your own'' limit of how much of a real system you are testing
: mocked tests sit in the space between unit tests and integration tests - but how much ''useful'' space there is there depends a lot on the project
* fixtures may become as complex as your code, be slow to maintain (whether manual or generated), and slow down your development - so compare with the benefits
https://michalzalecki.com/fixtures-the-way-to-manage-sample-and-test-data/
-->
===A little design for testability===
<!--
for example, say you want to test UIs.
A little more thoroughly than at 'user pokes button' level.
Now, UIs are their own state system.
That's an issue, because tests would have to keep in mind what state that system is in,
which seems like an interaction that can get a little too hopeful/magical/fragile.
That said, you can get surprisingly far instantiating parts of an UI,
and interacting mechanically.
''That'' said, you will eventually run into things that are hard to test -
threading, races, rendering issues, slowness. Automated interaction with a GUI/website is blind to these things where people wouldn't be.
-->
===On code coverage===
<!--
Unit and regression tests often think about some specific cases.
Code coverage points out that if stability and security are very important,
then you probably want to test all code paths.
Which feels thorough, but arguably is just the middle step.
Because in terms of strong guarantees, tests can only prove that it's not broken ''for the cases you've tried''.
Even if you prove code is predictable, it may still not be correct.
Case in point: Test Driven Development basically requires 100%, but is somehow not bug free.
And touching all code doesn't necessarily mean testing all code paths (though that difference ''can'' be small).
Which brings up the question of how high-quality are the tests, and how much time do you want to spend?
Code coverage is itself split into:
* function coverage - has each function been called?
: basic indication that said functions are not fundamentally broken
: {{comment|(while the things not called may be a signal that they are being left behind)}}
* statement coverage - has each statement in the code seen at least one use?
* branch coverage, decision coverage, condition coverage
** branch coverage: has each branch (each outcome of every decision point) been taken at least once?
** condition coverage: have all possible booleans (subexpressions) been evaluated to be both true and false? (focuses on combination of possible values and the logic that handles them)
** (condition and branch coverage are similar, but consider short-circuit evaluation)
:: roughtly "does this code work for more than just the most obvious inputs"
There are more details to this.
For example, safety-critical applications may have stricter requirements:
* MC/DC (Modified condition/decision coverage)
* Multiple condition coverage
: all possible combinations of condition outcomes (even ones that evaluate the decision the same way)


-->



=Keeping codebases healthy=

==Refactoring==

==Technical debt==

Technical debt refers to the cost you take on when you decide to apply a simplified, quick-fix, incomplete solution now, particularly when you know you will probably have to redo it properly later.

Note that this lies on a scale between
: a quick fix now, polished up later
: some double work between now and then
: work later that will probably be a complete restructuring

Whether this is a good or bad idea depends on context, because often enough the ability to get other people started is worth some extra hours spent overall.

Yet when postponing means a complete redesign, there is a very good argument for more effort up front.
: Particularly when the thing we are talking about is something other parts will build on.
: Particularly when postponing it will lead to entangled things you will also need to change, increasing that future work on top of the moderate amount spent on a quick fix now.

=Everything is experimental=
