Benchmarking, performance testing, load testing, stress testing, etc.
From Helpful
| This article/section is a stub — probably a pile of half-sorted notes and assertions some of which may well be wrong, and not verified as a whole. Feel free to add or refine. |
Web server related tools
Realize that when you test throughput on hello-world apps, you are measuring little more than basic request parsing latency. If that takes 1ms on one server and 100us on another, the second looks ten times as fast. But once either is placed in front of dynamic page generation that takes, say, 50ms, you are not even going to notice the difference. This is particularly true when the handler is IO-bound, as many dynamic apps are.
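The arithmetic behind this, using the figures from the paragraph above (1ms vs 100us of per-request overhead, 50ms of page generation). This is a minimal sketch assuming the handler time is identical on both servers and simply adds to the overhead; the function name apparent_speedup is made up for illustration.

```python
def apparent_speedup(overhead_slow_s, overhead_fast_s, handler_s=0.0):
    """Ratio of total per-request time for the slower vs. the faster server.

    handler_s is the time spent in the application's own page generation,
    assumed to be the same for both servers in this simple model.
    """
    return (overhead_slow_s + handler_s) / (overhead_fast_s + handler_s)


# Hello-world benchmark: the handler does (almost) nothing.
hello = apparent_speedup(1e-3, 100e-6)
# The same two servers in front of a page that takes 50 ms to generate.
real = apparent_speedup(1e-3, 100e-6, handler_s=50e-3)
print(f"hello-world: {hello:.1f}x faster, real page: {real:.3f}x faster")
```

The tenfold difference on hello-world shrinks to under 2% once the 50ms of page generation is included.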
ab, ab2
ApacheBench, which comes with Apache (apache or apache2). With lightweight settings it is also useful for checking how a server handles concurrency.
Most interesting options:
- -n requests: the number of requests to perform; use enough to keep the server busy for a while, OR:
- -t seconds: keep it busy for a given number of seconds (this implies a cap of 50000 requests)
- -c concurrency: the number of parallel fetchers. Use this to simulate many clients more realistically, and to see whether they are handled in parallel.
- -k: enable HTTP keep-alive, so that each simulated client sends several requests over the same connection. Arguably not realistic for many real-world tests, but useful for seeing the maximum request rate.
Example:
ab -t 10 -c 20 -k http://example.com/
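The final slash is important: ab rejects a URL without a path, and it does not follow redirects, so benchmarking a URL that redirects only measures the redirect response.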
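To make the -n, -c and -k options concrete, here is a rough Python sketch of what such a load generator does. It is not how ab is implemented, it only handles plain http:// URLs, and Python threads are far too slow to replace ab for real measurements. The function run_load and its parameters are hypothetical names chosen for illustration.

```python
import http.client
import threading
import time
from urllib.parse import urlsplit


def run_load(url, requests, concurrency, keepalive=False):
    """Fetch `url` `requests` times using `concurrency` worker threads.

    Loosely mirrors ab's -n (requests), -c (concurrency) and -k (keepalive).
    Returns (elapsed_seconds, list_of_response_body_lengths).
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    lock = threading.Lock()
    remaining = requests
    lengths = []

    def claim():
        # Hand out request "tickets" so the total across all workers is exactly `requests`.
        nonlocal remaining
        with lock:
            if remaining <= 0:
                return False
            remaining -= 1
            return True

    def worker():
        conn = None
        headers = {} if keepalive else {"Connection": "close"}
        while claim():
            if conn is None:
                conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=10)
            conn.request("GET", path, headers=headers)
            body = conn.getresponse().read()
            with lock:
                lengths.append(len(body))
            if not keepalive:
                # Without keepalive, every request gets a fresh TCP connection.
                conn.close()
                conn = None
        if conn is not None:
            conn.close()

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, lengths
```

For example, run_load("http://localhost:8080/", 1000, 20, keepalive=True) is loosely comparable to ab -n 1000 -c 20 -k http://localhost:8080/ (localhost:8080 is just a placeholder).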
Notes on reading results
Ignore 'Failed requests' on dynamic pages
Since ab was written with static pages in mind, it treats responses whose size differs as probable errors.
As a result, the 'Failed requests' figure is overly cautious when you are testing a dynamic server. For example, a -n 20 test might report:
Complete requests: 20
Failed requests: 19
   (Connect: 0, Length: 19, Exceptions: 0)
This only means that the length of 19 responses differed from that of the first one, which on dynamic pages may make sense or be so by design. Recent versions of ab accept -l to skip this length check.
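A sketch of that length check, assuming (from a reading of ab's source) that the first successful response sets the reference length and every later response of a different size counts as a 'Length' failure. The name count_length_failures and the sample lengths are hypothetical.

```python
def count_length_failures(body_lengths):
    """Count responses whose size differs from the first response's size.

    Assumed to mimic the 'Length' count in ab's 'Failed requests' line:
    the first response sets the reference, later mismatches are counted.
    """
    if not body_lengths:
        return 0
    reference = body_lengths[0]
    return sum(1 for n in body_lengths[1:] if n != reference)


# A dynamic page whose size changes on every request (e.g. it embeds a counter or timestamp).
dynamic = [1000 + i for i in range(20)]
# A static page: every response is identical.
static = [1000] * 20
print(count_length_failures(dynamic), count_length_failures(static))
```

With 20 differently-sized responses this gives 19 failures, matching the example above, even though nothing actually went wrong.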
Reading results when concurrency > 1
Consider that, given a fixed amount of CPU time, you can expect each of n parallel requests to take about n times as long as it would in isolation. (On multi-core systems the CPU time actually available is less predictable, since it depends on whether and how the server uses multiple processes or threads, and possibly on its configuration.)
For example, one request using 100% of a CPU for 0.1 sec takes the same processing time as four requests each using 25% of the CPU for 0.4 sec: that is 0.1 CPU-seconds for each request, totalling 10 requests per second either way. The wall-clock time of each request is longer only because a CPU has a fixed speed and you are dividing it among more jobs.
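The same arithmetic as an idealized Python model, assuming the CPU is shared perfectly evenly and that nothing else (IO, locking, scheduling overhead) matters. The function shared_cpu and its cores parameter are hypothetical.

```python
def shared_cpu(cpu_seconds_per_request, parallel_requests, cores=1):
    """Idealized model: parallel requests share `cores` CPUs evenly.

    Returns (wall_clock_seconds_per_request, requests_per_second).
    """
    # Each request gets cores/parallel_requests of a CPU, but never more than one full core.
    share = min(1.0, cores / parallel_requests)
    wall_clock = cpu_seconds_per_request / share
    throughput = parallel_requests / wall_clock
    return wall_clock, throughput


print(shared_cpu(0.1, 1))           # one request at a time on one core
print(shared_cpu(0.1, 4))           # four requests sharing one core
print(shared_cpu(0.1, 4, cores=4))  # four requests, one core each
```

The first two calls both give 10 requests per second, with 0.1 s and 0.4 s of wall-clock time per request respectively. With four cores, four parallel requests each finish in 0.1 s and throughput rises to 40 per second, which is the multi-core case.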
Under heavier load from many clients, you can expect the time taken by each request to rise like this. Multiple cores may divide it by 2 or 4 or so, making a concurrency of 2 or 4 almost as fast per request as sequential requests, and load balancing across multiple servers can stretch this further. (You won't see this reflected perfectly in millisecond-scale latency figures, though; larger-scale tests are better at showing such things.)
You can see this in ab at higher concurrencies. For example, testing a hello-world app on the same URL with a concurrency of 1 and of 4 could yield results like:
-c 1:
   Time per request: 2.657 [ms] (mean)
   Time per request: 2.657 [ms] (mean, across all concurrent requests)
-c 4:
   Time per request: 10.950 [ms] (mean)
   Time per request: 2.738 [ms] (mean, across all concurrent requests)
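In the second case, the first figure is almost exactly four times the second. That ratio follows from how ab computes the two lines (the first is the second multiplied by -c), so the informative comparison is between the runs: going from -c 1 to -c 4 left the across-all figure nearly unchanged (2.657 vs 2.738 ms), so throughput did not improve while each client's wait roughly quadrupled.
Effectively, the first figure estimates the wall-clock latency each concurrent client sees. The second is total test time divided by the number of requests, i.e. the inverse of throughput, which approximates the per-request CPU time when the server is CPU-bound.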
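ab derives both 'Time per request' lines, and 'Requests per second', from the total test duration. The formulas below are as we understand them from ab's source; ab_summary and its inputs (a hypothetical 1000 requests taking 2.738 s in total at -c 4) are illustrative only.

```python
def ab_summary(total_seconds, completed, concurrency):
    """Recompute ab-style summary figures from the total test duration.

    Assumed formulas:
      requests/sec          = completed / total_seconds
      time/request (mean)   = concurrency * total_seconds * 1000 / completed
      time/request (across) = total_seconds * 1000 / completed
    """
    rps = completed / total_seconds
    per_request_ms = concurrency * total_seconds * 1000 / completed
    across_ms = total_seconds * 1000 / completed
    return rps, per_request_ms, across_ms


rps, per_request_ms, across_ms = ab_summary(2.738, 1000, 4)
print(f"Requests per second: {rps:.2f}")
print(f"Time per request: {per_request_ms:.3f} [ms] (mean)")
print(f"Time per request: {across_ms:.3f} [ms] (mean, across all concurrent requests)")
```

This gives 10.952 ms and 2.738 ms, close to the -c 4 run above. Because the first figure is always the second times -c, only comparisons across runs at different concurrencies say anything about how the server scales.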
This example was run against a single-threaded process, and most real web servers will do better. Note that they do only a little better per client: the effective concurrency granted to a single client IP (such as your ab process) is likely to be limited by server configuration, and for real browsers also by the browser's own connection limits (both to avoid DoS-like usage), and may be something like 2 or 4 (verify).
This means that a single-client test like this shows how well a single client is served, and may not show how independent connections from multiple clients can make more efficient overall use of server resources and/or achieve a higher request rate.
However, almost any single server can only do a few jobs concurrently before it has to divide its resources and do each proportionally slower, so the difference is usually not very large. A single client may therefore hit something close to the sweet spot of the concurrency a server can usefully provide, which makes it a decent benchmark after all.
High concurrency from a single client may mean that the server queues further incoming connections while it finishes earlier ones, so the server may not actually be loaded as heavily as you think, even though you are trying to pound it with requests.
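An idealized closed-loop model of these effects: the load generator keeps a fixed number of requests in flight (as ab -c does), while the server can truly work on only a few at once and queues the rest. It assumes a constant service time and no dropped connections; closed_loop, its parameters and the figures used are hypothetical. Latency follows from Little's law (requests in flight = throughput * latency).

```python
def closed_loop(clients, server_slots, service_s):
    """Idealized closed-loop benchmark model.

    clients      -- requests the load generator keeps in flight (like ab's -c)
    server_slots -- requests the server can really work on at once (cores, workers)
    service_s    -- time one request takes when it has a slot to itself

    Returns (requests_per_second, mean_latency_seconds).
    """
    busy = min(clients, server_slots)
    throughput = busy / service_s
    # Little's law: requests in the system = throughput * latency.
    latency = clients / throughput
    return throughput, latency


for c in (1, 2, 4, 8, 16):
    rps, lat = closed_loop(c, server_slots=4, service_s=0.010)
    print(f"-c {c:2d}: {rps:6.0f} req/s, {lat * 1000:5.1f} ms per request")
```

With 4 server slots and 10 ms per request, throughput stops growing at -c 4 (400 requests/s), and beyond that each extra client only adds queueing time: 20 ms per request at -c 8, 40 ms at -c 16.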
Because of the details mentioned above, and some others, good benchmarks and stress tests are run on real-world applications, and from several hosts.
Tests that are not are usually still a good indication of something, but working out what that is may take some clever puzzling.
Others
- TheGrinder (free)
- OpenSTA (free)
- WebLOAD (free)
- LoadRunner (commercial, originally from Mercury Interactive)

