Webpage performance notes

Revision as of 11:03, 22 June 2021



This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Client side

reflow and redraw

For reference:

  • browser reflow, a.k.a. relayout
means something changed sizes and/or positions, which, due to the hierarchical relations, tends to mean much or all of the page needs to be moved/resized (and then redrawn) as well
considered bad to do unnecessarily, particularly a number of times on initial page load, because as far as humans care, things are done when they stop jerking around.
can be expensive on pages with complex structure (where it might easily take over ~100ms)
also usability-wise - consider a page that spends ten seconds inserting ads into the text you're trying to read, then maybe doing some modal popovers
  • browser redraw refers to having to draw parts of a web page again
more often a small part, e.g. due to changing the styling on just that part, or animating something to draw attention
this is often a much lighter operation than reflow, in that the browser can (aim to) avoid recalculating all the other things

The render tree depends on DOM as well as the CSSOM (which are basically the parsed in-memory versions of the HTML and the CSS).

It gets updated as things load, and can both cause reflow and redraw.

Changes to the CSSOM

may only cause redraw
can cause reflow
particularly things that cascade down to everything
restyle may refer to CSSOM changes that trigger redraw but not reflow

Some basic suggestions for faster times to a complete render tree, and/or fewer reflows, include:

  • to link to CSS before your JS in your HTML
(the render tree will be complete earlier, just because the fetches started earlier)
  • to combine multiple CSS files into one
(the render tree will be complete earlier due to not incurring request overhead more than once)
  • to uglify CSS (and/or transfer it compressed).
usually has little effect, unless the CSS is large or the network slow, because this only alters transfer time, not latency, and rarely does much to CSS parse time.

  • minimize image loads
combine all interface images into sprites
large images are often the main thing you're presenting, and are sensibly part of the "thing you're loading"
some secondary images can get late-loaded, e.g. via scripting

Note that there are further causes for reflows:

  • img tags without a pre-set width and height will cause reflow, because they change size when loaded
so ideally you know and have set the image size in the document
  • scripting that alters the DOM and/or CSSOM may cause reflow
and consider that most scripting tends to only start once the DOM is loaded.
(or, if started earlier, necessarily finishes no earlier than that - if it's doing anything to the DOM)

Note that when a lot of things happen in a very short time, browsers tend to be clever enough to not reflow for all of them individually.

The above is an above-the-fold view. Non-relayouting (re)paints below the fold are freeish from the perspective of initial load (verify)
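Putting a few of the suggestions above together, a minimal sketch (file names are placeholders): CSS linked before a deferred script, and image dimensions pre-set so the arriving image data doesn't trigger a reflow.

```html
<head>
  <!-- CSS first: the CSSOM fetch starts as early as possible -->
  <link rel="stylesheet" href="combined.css">
  <!-- defer: fetch now, execute only after the HTML is parsed -->
  <script src="site.js" defer></script>
</head>
<body>
  <!-- pre-set size: no reflow when the image data arrives -->
  <img src="logo.png" width="200" height="50" alt="logo">
</body>
```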


load ordering

loaded/ready state and events



  • you mainly get a
    • 'HTML parsed' event (readyState==interactive // DOMContentLoaded )
    • 'everything else loaded too' event (window's load event)
  • undeferred JS delays both
  • undeferred images delays the latter
  • the difference between the last two matters mostly to scripting, really

delayed loading

resource loading hints


<link rel="dns-prefetch" href="//shop.example.com">
do DNS resolution ahead of time for a host people will probably go to next

<link rel="preconnect" href="https://cdn.example.com">
Get the DNS, TCP connect, and (if applicable) TLS over with, but don't request
(various browsers guess at this already)
there are CORS details

<link rel="preload" href="/eg/other.css" as="style">
says "load as soon as you can and with these priorities", meant for this navigation
these requests do not affect the onload event
...so these are not for your page-central script and style tags (verify)
only really useful when you know things to load before the browser does
true for some things hidden in CSS (think fonts, background images) and some JS
there are CORS details
as= includes style, image, font, document (relates to CSP)
Note there is also a HTTP-header equivalent of this

<link rel="prefetch" href="http://example.com/thing.ext"> 
says "after you're done, maybe load these things because they're likely needed soon"
more useful for the next navigations than for this one
there are CORS details

<link rel="prerender" href="https://example.com/page2.html">
analogous to opening a page in a hidden tab (parses everything, starts scripting)
Which makes it a good thing if you are pretty certain this is the next navigation
and otherwise just a drain on all resources
not widely supported


Both client and server

Some related underlying mechanisms

Persistent connections, a.k.a. keepalive

For context, HTTP 1.0 originally did just one fetch per connection - a separate connection for each script, image, and whatnot, after which the server always closed the connection. The extra time adds up, more so for HTTPS.

Persistent connections, a.k.a. keep-alive connections, are a HTTP1.1 feature that lets you keep a connection open and do requests and responses back-to-back.

So still sequential only, but that saves you the overhead of connection setup for each request. This makes sense when a client will be doing a bunch in quick succession - such as the initial load of most web pages.

Actually, keepalive was also tacked onto many an HTTP 1.0 server, but an HTTP 1.0 client has to ask for it using the Connection: Keep-Alive request header. If the server sends back the same header in the response, it is signaling that it supports keepalive, agrees to use it, and will leave the connection open. If not, then not.

In HTTP 1.1, all connections are keepalive by default.
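A minimal sketch of keepalive reuse: two HTTP/1.1 requests over one TCP connection. Python's stdlib http.server stands in for a real server; the port, paths, and body are arbitrary choices for the demo.

```python
import socket
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"      # keepalive by default
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed to keep the connection open
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):      # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = socket.create_connection(server.server_address)
responses = 0
for path in ("/a", "/b"):              # back-to-back on the same socket
    conn.sendall(("GET %s HTTP/1.1\r\nHost: x\r\n\r\n" % path).encode())
    buf = b""
    while not buf.endswith(b"hello"):  # read until this response's body is in
        buf += conn.recv(4096)
    responses += 1
conn.close()
server.shutdown()
```

Note the Content-Length header on the server side: without some way for the client to know where a response ends, the connection cannot stay open (see the section on bodies and chunking below).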

In terms of specs, it closes only when(verify):

  • the client specifically asks for the connection to be closed
which a client does using the Connection: close request header
(and should only do so when it won't need another request for a while)
  • the server sends a Connection: close response header.
It may do so
when it cannot know content length ahead of sending the response
if it chooses not to support keepalive (but still be otherwise HTTP 1.1 compliant, which is probably rare)

At the same time, clients should always be prepared to open a new connection for further requests.

This is because there are a handful of reasons that at no point is there a guarantee that a connection will actually stay open, nor for how long it will stay connected while idle. (...though in practice the initial load of a webpage will often be fast enough that it typically will be reusing a few connections)

Servers typically close keepalive connections when they've idled for a short while - I've found figures like 5 or 15 seconds for Apache and nginx, 2 minutes for IIS (verify)
possibly longer when listening to Keep-Alive headers (verify)


HTTP1.1 Pipelining

tl;dr: Nice idea, but not useful this time 'round.

Pipelining is an option (on top of a persistent connection) where further requests can be sent without having to wait for the previous response.

This means that instead of keepalive, which is

request, wait, readresponse,
request, wait, readresponse,
request, wait, readresponse,

you can do e.g.

request request request,
readresponse readresponse readresponse

I.e. request all images for a page at once. In ideal cases, this means that instead of getting the latency for every request, you get it roughly once, because you don't have to wait for one response before you can do the next request, and the responses are likely to arrive fairly back-to-back.

Best case: the client spends less walltime waiting, and doesn't need multiple connections for multiple documents.

Bad case: one thing holds up everything, because things must be returned in request order. In which case it would be no better than just persistent connections.

Average case varies, though is often at least a little better.
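The request-request-request / read-read-read pattern can be sketched against a local stand-in server (Python stdlib; paths are arbitrary). This works here because http.server happens to read requests one at a time off the connection; as noted below, real-world support is exactly the hard part.

```python
import socket
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"
    def do_GET(self):
        body = self.path.encode()      # echo the path, so responses can be told apart
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = socket.create_connection(server.server_address)
for path in ("/one", "/two", "/three"):          # request request request
    conn.sendall(("GET %s HTTP/1.1\r\nHost: x\r\n\r\n" % path).encode())
buf = b""
while buf.count(b"HTTP/1.1 200") < 3:            # readresponse x3
    buf += conn.recv(4096)
conn.close()
server.shutdown()

# responses come back in request order, as the spec requires
in_order = buf.index(b"/one") < buf.index(b"/two") < buf.index(b"/three")
```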

However, there are too many HTTP proxies and HTTP servers that have pipelining-related bugs, so most browsers have disabled it for HTTP1.1, in hope of better luck with whatever replaces it.

Technically HTTP1.1 requires pipelining support of servers(verify), but clients should not assume it is there.

Pipelining is not negotiated. Clients can try to pipeline their requests on a persistent connection. If they do, they must be able to detect failure, and be prepared to silently fall back to a new connection and not pipeline on that. (There are some further limitations; see e.g. RFC 2616.)

...which means that trying to pipeline on a server that doesn't support it is initially slightly slower than not trying, which is one reason various browsers won't try(verify). (Others are that there have been pipeline issues in various proxies, and a few servers. As such, various browsers support it but have it disabled by default)

Notes for webdevs:

  • will only work on persistent connections, which means making those work first
  • The responses must still arrive in the same order as the requests, so the front of that queue can block the rest. Sometimes it pays to try to control the order in which clients make those requests

On persistent connection - ends of bodies, Content-Length, and chunking


With one-request-per-connection connections, clients can take end-of-stream as meaning end-of-body.

With persistent connections and pipelining (HTTP 1.1), that approach won't work - the client must in some way be able to tell responses apart.

As a result, an HTTP 1.1 server will close a connection if it knows the client cannot tell where the response ends - so a persistent connection is only possible when the client can know when a response is over.

Persistent connections are possible for, mainly:

  • responses defined to never have bodies, e.g.(verify) 1xx, 204, 304
  • bodies with Content-Length header
means the receiving end can read the next that-many bytes and know that was exactly the response
On static content this is fairly trivial (and servers may do this for you), on dynamic pages you need to pay more attention.
  • bodies using Transfer-encoding: chunked (see RFC 2616, section 3.6.1)
means that the response body will be sent as a sequence of:
length as a hex string (and optionally a semicolon and some parameters, but none are standardized, so you probably won't see that in practice (verify))
that amount of bytes
a finished response is marked by a zero-length chunk - which means you don't need the content-length to know where the content ends.

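A sketch of the wire format just described (the chunk contents are arbitrary): hex length, CRLF, that many bytes, CRLF, and a zero-length chunk to end the body. An encoder and a matching decoder:

```python
def encode_chunked(chunks):
    out = b""
    for chunk in chunks:
        # hex length line, then the chunk's bytes, each CRLF-terminated
        out += format(len(chunk), "x").encode("ascii") + b"\r\n" + chunk + b"\r\n"
    return out + b"0\r\n\r\n"      # zero-length chunk marks the end

def decode_chunked(data: bytes) -> bytes:
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # ignore any chunk extensions after a semicolon
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2     # skip the chunk's trailing CRLF
```

For example, `encode_chunked([b"Wiki", b"pedia"])` produces `b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"`.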
  • assume that very small chunks are inefficient, because servers may send them in individual calls, rather than combine/buffer them
  • There are no hard constraints on the chunk sizes(verify)
  • When using chunked transfers, a server can tell the client that it is delaying certain headers until the trailer after the chunked body - which is useful to delay things like Content-Length and Content-MD5 (meaning a server can send huge amounts of data and calculate both of those while sending).
There are a number of restrictions on delayed headers, see e.g. RFC 2616, section 14.40.
  • All HTTP 1.1 implementations (servers and clients) are required to understand chunked transfer-coding. Servers almost always do, but not all clients do (particularly those that are mostly HTTP1.0 with a few HTTP 1.1 features).
Servers can decide by themselves whether to send a message chunked or not.
  • Clients can make chunked requests, though this is rarely very useful (verify)


HTTP connection limit


HTTP clients are expected to limit the number of connections they make to any one server.

This was originally mostly to avoid congestion on the client side: dial-up could only carry a few full-sized IP packets per second anyway, meaning clients would see no improvement from more connections, and soon actual degradation.

Many servers cannot impose such a limit per client, and effectively rely on RFC-observing clients being nice, or rely on QoS-style things at a lower level (verify). Some can, but may not do so by default (verify).

Servers will often limit the number of requests they handle at once (the number of workers), mainly to limit resource use (memory, CPU), but this is effectively an overall measure. It works out as moderately per-client for clients that play nice, though.

Types and amounts

RFC2616 (The HTTP 1.1 standard) mentions:

  • clients should not use more than two (persistent) connections to a server
  • HTTP 1.1 servers should not try to solve temporary overloads by closing connections. This can cause problems in itself, and relying on TCP flow control often works better.
  • a proxy should use up to (2*users) connections to a server or other proxy

In practice, these numbers have been increased somewhat in the twenty years since (more resources per page, more async requests), but not by much (values seem to be 4 or 6), in part because persistent connections help anyway. It's an ongoing discussion, but there is some consensus that between lots-of-connections and the alternatives, most alternatives are better.

When browsers follow the above hints, the congestion is often the client waiting on itself (for more connections to open up).

Note that a server responding more slowly effectively slows the client request rate a little, which is arguably a feature.

Browsers may also have a global limit(verify)

Further notes

Note that the network stack also plays a part in this. If incoming TCP connections pile in faster than they can be handled, they are placed in a queue, and when that queue is full, further connections are rejected at TCP level.

It seems the browser limit is typically per hostname, not per IP address(verify), which means a server side can use some basic DNS and/or vhost tricks to get a browser to make a few more connections to what is effectively the same server/proxy/load balancer (but beware of making caching much less efficient in the process - you often want to distribute rather than duplicate)

Note that if you use a (non-transparent) HTTP proxy, the proxy effectively is the server - which makes the per-server limit the overall limit. (verify)

http://www.openajax.org/runtime/wiki/The_Two_HTTP_Connection_Limit_Issue#Detailed_write-up http://www.ajaxperformance.com/2006/12/18/circumventing-browser-connection-limits-for-fun-and-profit/


HTTP/2

HTTP/2 smooths a few things which in HTTP1.x were workarounds, and tends to lower latency in the process. (it seems that SPDY was the experimental version, and everything useful ended up in HTTP/2(verify))

HTTP/2 does not change HTTP1's semantics, but is a completely different transport at byte level. (You can see it as an API where the byte level is now handled for you. In HTTP/1.0 you could still do it all yourself because it was minimal, while proper 1.1 compliance was already quite hard.)

Because of the same semantics, dropping it in shouldn't break anything, but it can still be a bunch of work.

Interesting things it adds include:

  • Request/response multiplexing
basically a better version of pipelining, in that it does not have head-of-line blocking issues
...except that under packet loss it still does, because of how TCP recovers in-order
  • server push
Basically means the server can pre-emptively send responses, to prime a browser's cache before it knows it needs parts
(fallback for non-supporting browsers is that it would just do the request)
the server itself has to know precisely what to push -- this is actually more complex than it sounds
  • Request/response priorities
e.g. send css first, js second, images last

And details like:

  • compresses HTTP headers
helps (only) when they're not trivial
and primarily applies to request headers, very little to response headers
(arguably mostly useful for some CDNs)

Some notes:

  • Browsers seem to have chosen to only support the TLS variant(verify)
  • single connection, so can be more sensitive to packet loss (which is essentially head-of-line at TCP level)


HTTP/2 is now widely supported by browsers and servers.


QUIC

QUIC is an always-encrypted transport, acting mostly like TCP+TLS+HTTP/2, but implemented on UDP.


  • faster initial connection setup (in part because encryption was designed into the protocol, not wrapped around it)
  • better, HTTP/2 style connection multiplexing
  • always encrypted and authenticated


  • because it's sort of TCP over UDP, firewalling is harder
  • more complex to set up

There are two of 'em now, google QUIC and IETF QUIC, which have diverged enough to be considered separate beasts, though hopefully the two will converge again.

The choice of UDP seems to relate to

  • adoption - many middleboxes would drop an unknown new protocol
  • dealing with packet loss differently (verify)
  • the inability to encrypt TCP itself (verify)



HTTP/3

Basically HTTP over QUIC; it addresses the issue of head-of-line blocking under packet loss (well, improves the recovery).


AOTW not supported by anything yet, except experimentally

Details and arguments about page loading

Hosting Elsewhere, static/dynamic split

When part of your content is static (images, but also CSS and scripting), there is some value in having it fetched from a server other than the one already busy enough with dynamic pages.

Options include using

  • a server for just static content (There are also some tricks you can use to lessen disk IO and lower response latency in such a static server)
  • using a CDN (same thing, functionally, but management is different)
  • using something like nginx (to handle just these requests) in front

If the browser now connects to more hosts than before, it may use more connections (its own per-host connection limit, now over more hosts) and pages can load a little faster. It can pay to serve page layout (HTML, CSS, JS) separately from media (verify).

This does not necessarily make that much latency difference if the browser was already pipelining, but it can't hurt.

In theory, some cookies can also be avoided - if a few KB of cookies apply to each request to a dynamic host, that adds up.

Splitting static and dynamic content may also make your management of Vary and other cache directives simpler (since static content is non-Vary, Cache-Control public).


Caching


  • Saves bandwidth from the origin server
  • May lead to slightly-to-noticeably faster page loads, depending on how you do it, and on how many external resources need to be loaded.
In many cases, most time until the page is considered loaded is spent on external resources (css, scripts, and images).
For some things you can avoid the request entirely
For larger things, a request to check may still be necessary, yet you can avoid some actual transfer
  • Having proxies (transparent or not) closer to the browsers can help,
provided the origin server lets them cache content.
It makes most sense for static stuff.


  • For very small files, most time is spent in the network, not in actual data transfer.
For example, ten 1K files versus ten 304 responses is not a very noticeable difference in speed. You would not see much improvement until you do unconditional caching, but that may not be practical.
...which is an argument for things like sprites (images) and combining compilers (JS and CSS), because those reduce the number of requests
  • dynamic stuff is often not cacheable

Some basic things relevant to understanding caching:

  • Private cache mostly means the browser cache. Public cache means caching somewhere between browser and origin server (proxy caches, transparent caches)
  • headers control how private and/or public caches ought to act: whether they may cache it, how long, what to base conditional requests on (if applicable)
allowing proxies to cache data from the end server means the data can come from a server closer than the origin server - which reduces unnecessary transfer on the 'net. (latency may be better, e.g. if it means an ocean more or less)
allowing browsers to cache data from the end server means less transfer from the server. Avoiding requests at all also tends to be noticeably faster (particularly Expires is a powerful tool)

  • when / how requests are done:
    • requested each time (no caching). Makes sense on some dynamic pages that change a lot and/or have to be as recent as possible.
    • conditional requests - ('if changed from what I have'). If it has changed, it gets the new data in response. If it has not changed, the server sends a tiny response that lets the browser know it can use the copy it has in cache.
      • Largely about saving bandwidth, so is nice primarily for large-ish resources.
      • Has less effect on latency / loading speed, so not as useful when loading many small resources.
      • Web servers usually serve static files in ways to allow conditional caching
      • By date: "Send me content only if it has changed since (specific date), otherwise give me a body-less 304 Not Modified response"
        • Content is served with a Last-Modified header mentioning a date. The browser will remember that date and ask the server If-Modified-Since with that date.
        • With static file sharing, it's easy enough to use the file's modification time - the server will need no data store, only to stat() the file. (In modern dynamic content sites, you may want to use the Etag system instead.)
    • unconditional caching - no request as long as cache entry is present and not stale
      • mostly the use of Expires: (see the section below)
      • Tells the browser that until some later time the content will not change at all, so the browser need not check cached content (once the date passes, the content is considered stale, and other things determine whether it then does a conditional or unconditional request)
      • useful to avoid 'no change' responses when you have many small resources (such as scripts and images)
      • This information cannot be revoked or changed, since the browser won't contact the server until the content has expired (or the cache is cleared), so this is problematic for things that might sometimes change - particularly when they are related to each other, such as scripts and images from the same page style: some people might see a half-changed, possibly even broken interface. Note it's easy enough to work around: refer to the new content by different URLs.
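The by-date conditional logic above can be sketched server-side, using the file's mtime as mentioned (function names here are made up for illustration):

```python
import os
from email.utils import formatdate, parsedate_to_datetime

def last_modified_header(path: str) -> str:
    # HTTP-date for the file's modification time
    return formatdate(os.path.getmtime(path), usegmt=True)

def not_modified(path: str, if_modified_since) -> bool:
    # True -> answer with a body-less "304 Not Modified";
    # False -> send the full response
    if not if_modified_since:
        return False
    try:
        client_has = parsedate_to_datetime(if_modified_since).timestamp()
    except (TypeError, ValueError):
        return False       # unparseable header: just send the file
    return int(os.path.getmtime(path)) <= client_has
```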

Further notes:

  • When testing cache behaviour, don't use reload
In many browsers, hitting reload means 'use cache but consider it stale'. If you use Expires, you'll see 304s where you expect no request at all.
If you want to simulate 'user comes back later', hit enter in the address bar (or open the URL in a new window or tab). (Note: Ctrl-L goes to the address bar in many browsers)
  • Proxy caches are fairly unlikely to cache responses for POST requests. You may want to consider this in your site design.
  • Things still present in the browser's recent history may be served from its local cache or memory, even if the cache logic would suggest it check with the server. Can apply to use of the back button, and to history.
  • Keep in mind there is some variation in UA behaviour. And there were some bugs (e.g. older IE had some known problems).

  • Developer tools may not show things as you expect. Spend a little time to learn to read them.

HTTP 1.0 and HTTP 1.1

Different headers apply to HTTP 1.0 and HTTP 1.1. While web browsers are mostly compliant with 1.1, other UAs may be compliant only with 1.0.

The HTTP 1.0 specs had a few cases with no defined behaviour, which has led to some creative (and hard to predict) behaviour in clients and proxies. In part, HTTP1.1 is simply better defined (e.g. 13.2, Expiration Mechanisms), and it also has some more powerful features.

It seems that developers aren't always clear on what mechanisms should be used, so it helps to read the specs and various summaries out there.

To summarize the mechanisms, HTTP 1.0 has:

  • Last-Modified, and If-Modified-Since: - conditional requests based on time (usually file modification time)
  • Expires: - unconditional caching
  • Pragma: no-cache, forcing the request to go to the origin server rather than be answered from a cache (such as Squid)

HTTP 1.1 has:

  • Last-Modified: and If-Modified-Since: (like HTTP 1.0)
  • Expires: (like 1.0)
  • Cache-Control:, which allows origin server and browser (and caches) to specify rules that apply to public and private caches, including:
    • max-age, in seconds in the future from now. Useful as a relative time measure instead of the absolute-date Expires header - but is considered more of a hint(verify) (unlike Expires)
    • expiration changes
    • how important stale-based re-checks are to serving cached content
    • a public/private(/no-cache) distinction. Public means it may be cached by transparent caches, private means only the user's browser may cache it, and no-cache (as before) that it must not be cached at all.
  • ETag ('entity tag') system, which lets you do conditional caching based on content identifiers rather than based on time
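A sketch of the ETag mechanism: the spec treats the tag as an opaque identifier, so hashing the body (done here) is one common choice, not something the spec mandates.

```python
import hashlib

def make_etag(body: bytes) -> str:
    # quoted string, per the header's syntax
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]

def etag_matches(etag: str, if_none_match) -> bool:
    # The client echoes tags back in If-None-Match; a match means
    # the server can answer with a body-less 304 Not Modified.
    if not if_none_match:
        return False
    candidates = [t.strip() for t in if_none_match.split(",")]
    return "*" in candidates or etag in candidates
```

Because the tag is derived from content rather than time, this sidesteps clock-accuracy issues and works for dynamically generated responses where a modification time is meaningless.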


Caching and HTTPS


  • caching proxies don't cache HTTPS (unless you MITM it)
  • endpoints can cache just as much
  • make sure your caching headers are correct

Intuitively, you might think that things that are secure should not be cached.

Yes and no.

There used to be a good argument that the content you want to use HTTPS on is going to be personal and therefore dynamic content, and you will be setting headers so that this is only endpoint-cacheable and not proxy-cacheable.

This is less true now that Chrome has pushed everyone toward HTTPS, because that basically means caching proxies don't really work anymore -- proxying and secure transport are fundamentally at odds.

Unless you specifically MITM them. Which makes sense in business/university where your admins have control over workstation certificates and their proxy settings. And possibly within some other well-controlled structures, e.g. within CDNs.

Note that to browsers, little has changed. Endpoint-cacheable content is just as cacheable.

(HTTPS is a secure transport. The browser delivers the decrypted content to itself, and the transport has no direct bearing on caching logic.)

Disabling cache

Sometimes you want to make sure that generated content always comes from the server.

To make sure you are disabling the cache in the face of HTTP 1.1, HTTP 1.0, and older HTTP 1.0-only proxies, you'll probably want the server to use the following headers:

  • Cache-control: no-cache - a HTTP 1.1 header, which works on current browsers and decent proxy software
  • Pragma: no-cache for HTTP 1.0 browser and HTTP 1.0 proxies - but various things do not honour Pragma, so you usually want:
  • Expires: (date in the past) Giving an Expires value of 0 or an invalid date should be interpreted as immediate expiration. It's slightly safer to just pick a valid date squarely in the past, though.
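The combination above, as one dict a handler might set (the Expires value is just an arbitrary valid date squarely in the past):

```python
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache",                   # HTTP 1.1
    "Pragma": "no-cache",                          # HTTP 1.0 (not always honoured)
    "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",    # belt and suspenders
}
```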

TODO: check

  • Apparently, Pragma: no-cache is invalid/deprecated in responses, but not yet in requests?

Expires (unconditional caching)

Expires means the server tells the browser that "until [date], you should not check cache-freshness with me at all, not even with a conditional request"

Expires is most useful for anything that will never change and be requested somewhat regularly by the same UA or proxy.

For public-cache content, this also means it can be cached closer to home, so it also relieves the origin server from doing quite as much work.

Not very useful for content that is expected to change every now and then (basically at all), because of the granularity.

(If you want to change the theme images & CSS for a page, you can always have your HTML refer to new names for everything -- if not, you'll have the problem of frequent visitors seeing the old theme for a while (or even mixed or broken content).)

As RFC2616 notes

  • "All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception"
and you can take that to mean UTC
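For generating such dates (e.g. 'access time plus 3 minutes'), Python's email.utils.formatdate with usegmt=True emits the required GMT HTTP-date:

```python
import time
from email.utils import formatdate

def expires_in(seconds: float) -> str:
    # e.g. "Tue, 22 Jun 2021 11:06:03 GMT"
    return formatdate(time.time() + seconds, usegmt=True)
```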

It can make sense to mix Expires with other mechanisms. For example, for people that are actively clicking around on your website, even an 'access time plus 3 minutes' Expires will make repeat loads of some resources low-latency (lower than a lot of If-Modified-Since/304, since that's a network interaction for each item). (...though both sides' computer clocks must be set accurately for this to work on a few-minute scale (and not be timezone-ignorant))

However, as https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Expires notes, the presence of Cache-Control may make UAs ignore Expires


Vary

Vary tells proxy caches which parts of a request to consider when determining whether two requests are identical - which it does to decide when a request can be served from a cache and when it must come from the origin server.

Reasons to do so include

  • With dynamic sites, it happens more often that the same URL shows different things for different people - often because of personalized content (/home, /friends)
could e.g. vary on cookies - different cookies imply different users
  • for translated versions of the same content at the same URL
(meaning you're using Accept-Languages -- which a lot of sites don't)
  • sometimes technical reasons (such as page compression)

Caches may choose to cache each version individually, or not cache them at all, so to get the most out of (proxy) caching you generally want to vary on as little as possible, so that the cache applies as often as it usefully can.

The value of this field is a comma-separated list, usually of request-header field names (though the RFC notes they are not limited to the standard request-header fields). Field names are case-insensitive.

Some of the more commonly used things (some usually in combinations):

  • User-Agent
    • has been used to sort out agents that support compression from those that don't
    • now particularly for mobile content
    • note that there are *many* specific user agent strings, and this will store one for each. It won't necessarily save much bandwidth to have dozens of variations of each page
  • Accept-Encoding
    • e.g. used by mod_deflate, to help ensure that deflated content won't be sent to agents that don't understand it
  • Cookie
    • on personal pages presented under the same URL as others, like /home
    • perhaps for pages that use cookies to significantly change their presentation
  • Accept-Language
    • use if you actually serve different pages for different languages
  • Accept-Charset
    • regularly combined with Accept-Language


  • It seems that modern browsers will generally not cache anything that has Vary set to anything other than User-Agent: (verify)
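As an illustration of why compression implies varying: a sketch (in Python; names are mine) of a handler that compresses only for clients that accept gzip, and therefore must declare Vary: Accept-Encoding so shared caches keep the two variants apart:

```python
import gzip

def respond(body, request_headers):
    """Sketch: compress when the client accepts gzip, and add
    Vary: Accept-Encoding so a shared cache stores both variants
    separately rather than serving gzipped bytes to everyone."""
    headers = {'Content-Type': 'text/html', 'Vary': 'Accept-Encoding'}
    if 'gzip' in request_headers.get('Accept-Encoding', ''):
        body = gzip.compress(body)
        headers['Content-Encoding'] = 'gzip'
    headers['Content-Length'] = str(len(body))
    return headers, body
```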



Cache-Control: max-age=3600, must-revalidate

An HTTP1.1 header. Requests and responses can contain this header, with different meanings (and with different parts).

Note that 'private cache' generally means browser's cache, and 'shared cache' usually means 'proxy cache', and that there is sometimes a difference between a browser's disk cache and memory cache.

Responses (which are the things intercepted by caches, so which contains hits from the origin server) can contain:

  • private - may be stored in a private/browser cache but not in a shared/proxy cache
  • public - may be cached in a shared and in a private cache
  • no-cache - content may be stored, but must be checked for freshness before local content is served (that is, caches must never return stale cached requests even if they are configured to do so). Useful for public content that requires authentication.
  • no-store - may not be stored in even the browser's cache
  • no-transform - tell proxies not to change the data (such as recompressing images for space)
  • must-revalidate - Force strict obeying of your values (without this, HTTP allows agents to take liberties with values such as max-age and Expires when evaluating freshness)
  • proxy-revalidate - Make proxies strict this way, but allow browsers to take more liberties
  • max-age - the maximum time something should be kept in a cache (in seconds)
  • s-maxage - like max-age, but applies only to caches (like an override)
  • ...and possible extensions

Requests can use

  • no-cache
  • no-store
  • max-age
  • max-stale
  • min-fresh
  • no-transform
  • only-if-cached - Apparently used among sibling proxies, to synchronize content without causing origin requests(verify)
  • ...and possible extensions (all values not mentioned here are ignored if not understood - allowing values belonging to specific extensions to be ignorable)
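The value has the same comma-separated, case-insensitive shape as Vary; a small parsing sketch (function name is mine):

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into a dict.
    Directive names are case-insensitive; bare tokens map to None."""
    out = {}
    for part in value.split(','):
        part = part.strip()
        if not part:
            continue
        name, sep, val = part.partition('=')
        out[name.strip().lower()] = val.strip() if sep else None
    return out

parse_cache_control('max-age=3600, must-revalidate')
# {'max-age': '3600', 'must-revalidate': None}
```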


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Etag ('entity tag') allows a modification check system based not on date/time but on some chosen identifier, one that is meant to be representative of whether the file has changed -- possibly a hash of the contents, but there are often resource-cheaper alternatives.

An Etag-aware client can choose to remember received Etags, and next time ask the server "I have the version you tacked this particular identifier on. Is that the version you would serve to me now, or do you want to give me another one?"

There are two conditional forms - one for requesting only when the Etag does not match, and one for only when it does:

  • If-None-Match: value - often used as 'give me the content only if it has changed since you handed me this identifier'
  • If-Match: value

Many web servers now automatically create ETags as part of their static file serving, based on something easy to reproduce; apache2 uses "inode-size-mtime" (see also FileETag), IIS bases it on mtime and an internal config change counter. This makes Etags unique per host at best, which means that if you have more than one server sharing the load of serving the same content, you need to tweak the Etag system.

When doing dynamic content generation, it's fairly easy to write your own Etag system, and frameworks may do most of the work for you. Exactly how you generate the identifier is up to you. Sometimes a content hash makes sense - but if that means IO and CPU on each access, it can make sense to check against a database, perhaps making filenames the hash, possibly memcache it, or some other trick that makes it a simple and fast read-out (preferably without IO in the case of 'no change').
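A sketch of the server side of that exchange (in Python; a content hash is used here for brevity, though as noted a cheaper stored identifier is usually preferable):

```python
import hashlib

def handle_get(body, request_headers):
    """Sketch of server-side ETag handling: derive an identifier,
    compare it against the client's If-None-Match, and answer
    304 Not Modified with no body when it matches."""
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    if request_headers.get('If-None-Match') == etag:
        return 304, {'ETag': etag}, b''
    return 200, {'ETag': etag}, body
```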

You can combine Etag with byte-range operations. That is, instead of Range, you can use If-Range, which allows uses like "send me parts that I am missing, but if things have changed, send me the whole new version" in a single request.

You could even use Etag for conditional execution, particularly to have rules about things that have side effects (PUT, GET with database access, etc.). Some HTTP-based protocols use it this way(verify).

See also:

browser bugs

See Browser eccentricities#Caching bugs


In apache, you can use mod_expires, which allows you to set a time in cache relative to access time (or to the last file change).

You can have settings at server (not advised!), vhost (if you know what you're doing), and directory/htaccess level, and can set it per MIME type - and practically also per extension.

Besides the inherent overkill behaviour of the Expires header, there seem to be a few gotchas:

  • It seems to apply to all content regardless of whether the source was dynamic or not, which is bad on dynamic sites.
  • It does not interact with other cache headers, which is regularly also not what you want.
  • Server-level ExpiresByType overrides more specific (e.g. directory-level, FilesMatch) ExpiresDefault. This is one reason you shouldn't set things at server level, even when you're not using vhosts.


ExpiresActive On
# That's the shorthand form for 'access plus 0 seconds'
ExpiresDefault A0
# I prefer the longer form used below, as it is more readable.
<Directory /var/www/foo/>
  ExpiresByType text/css     "modification plus 5 minutes"
  ExpiresByType image/png    "access plus 1 day"
  ExpiresByType image/jpeg   "access plus 1 day"
  ExpiresByType image/gif    "access plus 1 day"
  ExpiresByType image/x-icon "access plus 1 month"
</Directory>

<Directory /var/www/foo/static>
  ExpiresByType image/png    "access plus 1 day"
  ExpiresByType image/jpeg   "access plus 1 day"
  ExpiresByType image/gif    "access plus 1 day"
  <FilesMatch "\.(xm|jp2|mp3)$">
    ExpiresDefault "access plus 3 months"
    # or larger. Browser caches will likely have forgotten it anyway,
    # and chances are so will public caches.
  </FilesMatch>
</Directory>

<Directory /var/www/foo/weeklycolumn>
  ExpiresDefault "modification plus 6 days"
  # This is a *file* based timeout, independent of when it was accessed.
  # Beyond that specific future time the agent will *always* check,
  # so this is most useful for data that actually changes regularly.
  # If this were 'access', clients might not check until,
  # in the worst case, six days after you changed the page.
</Directory>


  • To be compatible with servers that don't have the module, always wrap in a module test,
    <IfModule mod_expires.c>
  • know the difference between 'access' and 'modification' - it's not a subtle one.
  • Be conservative and don't use Expires as your only caching mechanism. Clients will fall back to If-Modified-Since anyway (and if they don't, that is the mechanism you should be focusing on) so you're basically setting the interval of real checks.
  • Things like styles and scripts should not have long expire times - old styles will apply for previous visitors for a while after you change them completely (unless of course you use new filenames for each).

Manual apache statements

mod_expires is so basic that it can only set Expires, no other cache control headers.

In some cases, you may want to abuse mod_headers, for example:

<FilesMatch "\.(html|htm|php)$">
  Header set Cache-Control "max-age=60, private, proxy-revalidate"
</FilesMatch>
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
  Header set Cache-Control "max-age=604800, public"
</FilesMatch>

Note that Cache-Control is an HTTP1.1 header


HTTP compression will often easily reduce HTML, CSS, and javascript to 20-40% of its original size, depending on the method of compression (gzip and deflate/zlib) and the content.

Browser rendering speed gains are negligible unless the data is relatively large or the client is on a low-bandwidth connection, but the reduced bandwidth use is useful, even when only in terms of server bandwidth bills.
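You can get a feel for the numbers with the standard library; for repetitive markup the ratio easily beats the 20-40% quoted above:

```python
import gzip
import zlib

# Repetitive markup, as typical HTML tends to be
html = b'<div class="row"><p>Some repetitive markup</p></div>\n' * 200

for name, packed in (('gzip', gzip.compress(html)),
                     ('deflate', zlib.compress(html))):
    print('%s: %d -> %d bytes (%.0f%%)'
          % (name, len(html), len(packed), 100 * len(packed) / len(html)))
```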


  • IE6 and previous never cache compressed pages (yes, this is a stupid bug). Whenever there is repeat downloading of fairly small files, caching is more important than compressing (to both sides). This basically means that you never want to send compressed content to IE, so if you want to use compression you may want some browser-specific behaviour. Ugh.
  • IE (versions?(verify)) may decide that compressed error pages are too small to be real(verify), and decide to show its own. You may want to avoid compressing these.


  • In some implementations gzipping implies that the document can only be delivered as a whole (and not shown incrementally in the browser as it is downloaded). In other implementations, gzipped delivery can happen in chunks.
  • If you code compression yourself, you should check the Accept-Encoding: header for which compression format, if any, the browser will understand in a response. (HTTP1.1 clients technically must support it, but simpler ones may not. In HTTP1.0 it was optional)
  • Compressing small files is often not useful at all; trying to compress 500 or so bytes of output is rarely really worth the CPU time spent on it.


In apache, mod_deflate is implemented as a transparent output filter and likely to be installed but not enabled.

Check that there is a line like the following in your apache config:

LoadModule deflate_module /usr/lib/apache2/modules/mod_deflate.so

Perhaps the simplest way to use it is to apply it to a few specific MIME types (whitelist-style), such as:

AddOutputFilterByType DEFLATE text/plain text/css text/javascript 
AddOutputFilterByType DEFLATE text/html application/xml application/xhtml+xml

You could set these globally if you wish.

The module listens to environment options like no-gzip and dont-vary. This allows 'enable globally, disable for specific things' (blacklist-style) logic:

SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI \.(?:png|jp2|jpe?g|jpeg?|gif)$  no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:t?gz|bz2|zip|rar|7z|sit)$  no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$                          no-gzip dont-vary

Since apache can set environment based on various tests, you can also use this behaviour to disable compression for IE (which you usually want), and probably want to do in global apache config. It seems everyone copy-pastes from the apache documentation:

BrowserMatch ^Mozilla/4         gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSI[E]           !no-gzip !gzip-only-text/html
# The bracketed E there is a fix to a past apache parse bug.
# Tells proxies to cache separately for each browser
Header append Vary User-Agent   env=!dont-vary
# This varies everything for user-agent by default unless dont-vary is set,
# which you can set on content you know it won't matter, for example
# when you won't compress it.


  • can be set in server, vhost, directory, and .htaccess
  • You can also tweak the compression ratio versus resources tradeoff -
    DeflateCompressionLevel value
  • It seems some browsers have problems with compressed external javascript specifically when it is included from the body section of a document, not the head. Something to keep in mind (and (verify) and detail here).
  • You can get apache to log the compression rates, to see how much it's helping. See [1] or [2] for details


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(section very unfinished)

mod_gzip works in a similar way to mod_deflate

<IfModule mod_gzip.c>
   mod_gzip_on  Yes
   mod_gzip_dechunk yes
   # What to use it on: (example)
   mod_gzip_item_exclude file "\.css$"
   mod_gzip_item_exclude file "\.js$"
   mod_gzip_item_include file \.htm$
   mod_gzip_item_include file \.html$
   mod_gzip_item_include mime ^text/.*
   mod_gzip_item_exclude file "\.wml$"
</IfModule>
It has some extra features, such as checking for an already-compressed version (.gz on disk) when doing static file serving, and being more configurable.


PHP filter

The sections above apply to specific types of static files - well, depending on how they are configured. They can be used to handle PHP's output as well, but you may want to do it in PHP (support for compression was added around 4.0.4). Doing it in PHP can be a little more work, but it can be smarter about output chunking, and you can do it selectively, in the way you control.

If zlib is not compiled in, PHP will ignore you silently.

In practice you probably don't want to set it globally, but do it selectively via apache config or .htaccess, often per directory (or even for specific scripts, using Files or FilesMatch). When PHP is compiled in, apache has the directives php_value and php_flag which let you control this:

php_flag zlib.output_compression On
# When you give a size (Note: using php_value, not php_flag),
# you enable it and also set the output buffer size (default is 4KB):
php_value zlib.output_compression 2048
# Optional:
php_value zlib.output_compression_level 3
#Default seems to be 6, which is relatively heavy on CPU. 3 is lighter and decent. 
# Even 1 will be noticable improvement on most text.


  • The documentation says you can use ini_set() to enable "zlib.output_compression", but this seems to apply to few PHP versions(verify). It is non-ideal in other ways: you can't seem to ini_set() the compression_level(verify).

Also, if a higher-level setting caused a script to compress, you can disable compression with ini_set(), but it will still use output buffering - even when you set explicit flushing.

Writing gzip from your own code

Check whether you can:

Supporting browsers will send a header like:

Accept-Encoding: gzip
Accept-Encoding: gzip, deflate

Some old browsers, like Netscape 4, have bugs and effectively lie about what they support - you'll want to test for them and not send them compressed content.

Signal that you are:

When you decide to use one of the advertised methods of compression, tell the browser about it, probably using:

Content-Encoding: gzip

There is also a Transfer-Encoding. The difference is largely semantic; the idea seems to be that Content-Encoding signals the data is meant to be a .gz file, while Transfer-Encoding states it's just about transfer - such as compressing (static or dynamic) HTML to save bandwidth. ((verify) both are well supported)

In practice, Content-Encoding serves both purposes; there is little difference other than choices the browser may make based on this -- but things such as 'whether to save or display' are usually controlled by headers like the Content-Type response header.

Do it:

This is mostly just doing what you are saying.

Note that the Content-Length header should report the size of the compressed data.

Pseudocode (for gzip only):

if request.headers['Accept-Encoding'].contains('gzip'):
    gzip_data = gzip.compress(output_data)
    response.headers["Content-Encoding"] = 'gzip'
    response.headers["Content-Length"]   = length(gzip_data)
    #serve headers and data as usual

Chunked output involves sending Transfer-Encoding: chunked (something all HTTP1.1 agents must support), then writing fairly self-contained chunks (but I'm not sure about the details, either without or with compression)
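For the chunk framing itself (per HTTP1.1, and independent of compression), a minimal sketch: each chunk is its hex-encoded length, CRLF, the data, CRLF, and a zero-length chunk ends the body. When combining with Content-Encoding, compress first, then chunk the compressed bytes.

```python
def chunk(data):
    """One chunk: hex length, CRLF, the bytes, CRLF."""
    return ('%X\r\n' % len(data)).encode('ascii') + data + b'\r\n'

def chunked_body(pieces):
    """A complete chunked message body: each piece as its own chunk,
    closed with the zero-length terminating chunk."""
    return b''.join(chunk(p) for p in pieces) + b'0\r\n\r\n'
```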

Server side

Request, size and amount

Server side

Client side


Divide and conquer: offloading caching, spreading, balancing, etc.

Nginx notes

See also:

Web server related tools

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Notes on benchmarking:

If you are testing a hello-world app, requests-per-second rates are almost meaningless, because you are only measuring connection setup time. This is unavoidable latency you always get; yes, you want it to be low, but it usually is.

10000 req/s may look ten times as fast as 1000 req/s, but that difference is also just 0.9 ms per request.

If most of your requests are served on that scale, then it's important. But if you are doing anything interesting, you're probably easily using a few milliseconds per request, and frequently a few dozen, e.g. talking to a database or disk.

That ten times as fast has now become 10% of your overall problem. More than nothing, but probably less than any other part.

Even when you're making something for throughput, you'll find that most things are IO bound (and, when loaded, cause most things on the server to be), so it is usually much more interesting to relieve IO somehow (e.g. memcache anything you can, add cache headers to have some of the IO be conditional), than it is to look at that last unavoidable less-than-a-millisecond.

And importantly, it is the time your requests spend that is the main thing that caps the request rate you can handle. When you hit that rate, you will get slowness.

ab, ab2

Apache Benchmark, comes with apache.

Useful to check how your concurrency settings are working.

Keep in mind that the precise request rate is probably less meaningful, because you're probably testing a do-little page, and even if you aren't you're probably testing a best-case way (because asking for exactly the same thing over and over relies on caches more than everyday use does).

The most interesting options:

  • -n amount-of-requests: keep it busy for a while, OR:
  • -t seconds: ...to keep it busy for a given amount of seconds
  • -c concurrency uses a number of parallel fetchers. Use this to realistically simulate many clients, and see whether they are handled in parallel. (Note that most web servers will limit the number of concurrent connections from a single IP / client)
  • -k: use keepalive feature, to simulate clients doing various requests on the same connection. Arguably not very realistic for various real-world tests (but can be useful to see the maximum operation rate).


ab -t 10 -c 20 -k http://example.com/

Note that the final slash is significant, as without that you're effectively asking for the redirect page (or, sometimes, a 404). Redirects will not be followed, only retrieved.

Notes on reading results

reading results when concurrency>1
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Note that request time is not a direct indication of rate.

For example, ten sequential requests taking 0.1 sec at 100% CPU is the same CPU time as 10 concurrent processes taking 1.0 sec at 10% CPU each.

Either way, that's 0.1 CPU-seconds for each request, and 10 requests-per-second, but with different amounts of latency, throughput, and scheduling.
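The relationship is essentially Little's law: steady-state request rate equals concurrency divided by latency. The two scenarios above work out to the same throughput:

```python
def rate(concurrency, latency_s):
    """Steady-state requests per second (Little's law: L = lambda * W,
    so lambda = L / W, with L the in-flight requests and W the latency)."""
    return concurrency / latency_s

print(rate(1, 0.1))   # sequential: 0.1 s latency
print(rate(10, 1.0))  # concurrent: 1.0 s latency, same throughput
```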

For a more concrete example, tests on a simple hello world app (a single single-thread process) with concurrency 1:

Time per request:       2.657 [ms] (mean)
Time per request:       2.657 [ms] (mean, across all concurrent requests)

...and concurrency 4:

Time per request:       10.950 [ms] (mean)
Time per request:       2.738 [ms] (mean, across all concurrent requests)

Effectively, the first shows an estimation of the wallclock time requests take, the second of CPU time under the assumption that the concurrency is the only factor.

This is often not true for dynamic requests. A lot of real-world requests are more IO-bound than CPU-bound, so you'll probably never see linear curves when databases, networking, caches, and such are involved.

In this case it's an exact multiple of four, which indicates that each handler took four times as long when four were running concurrently - meaning the concurrency works (if they were handled sequentially it would look more like the first case), and suggesting the handler is likely entirely CPU-bound.

Keep in mind that:

  • multiple-core processing means that you may see very good scaling up to concurrency of 2 or 4 -- assuming the work will be spread among cores
  • ...beyond that you're just dividing CPU time among more concurrent processes, and it does nothing for the average rate of requests
  • A test run from a single client / source IP is almost never a good load test
  • a single IP/client may use and/or be granted only a few connections (web browsers as well as web servers often use 2 or 4), so a single-client test only tests how well a single client is served; it won't stress-test, and won't necessarily be a good indication of the request rate to expect. (Still, many servers have to divide resources above concurrency 2 or 4 anyway, so the difference is not necessarily large)

Ignore 'Failed requests' for dynamic pages

Since ab was written for static pages, it will assume different-sized responses are errors.

For example, a -n 20 might get you:

Complete requests:      20
Failed requests:        19
   (Connect: 0, Length: 19, Exceptions: 0)

This only means that the reported length was different in each response, which on dynamic pages may be entirely expected.


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)



ab/ab2 [3]






See also