Webpage performance notes


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Logical parts (browser stuff, mostly)

reflow and redraw

For reference:

  • reflow, a.k.a. relayout
means something changed sizes and/or positions, so that part or all of the page has to be laid out again (moving and resizing other things as a consequence)
considered bad to do unnecessarily, particularly on initial page load, because as far as humans are concerned, things are done when they stop jerking around.
...and easily expensive on more complex pages (where a single reflow can take ~100ms or more)


  • redraw (a.k.a. repaint) refers to having to draw parts of a web page again
may be a small part, e.g. when changing/animating the color of a box to draw attention
usually considered less of a problem, in part because it is cheaper than reflow, and in part because webdevs often do it on purpose.


The render tree depends on the DOM as well as the CSSOM.

It gets updated as things load, and will cause reflow and redraw.

Changes to the CSSOM

may only cause redraw
can cause reflow
particularly things that cascade down to everything
restyle may refer to CSSOM changes that trigger redraw but not reflow


Some basic suggestions for faster times to a complete render tree, and/or fewer reflows, include:

  • to link to CSS before your JS in your HTML
(the render tree will be complete earlier due to fetches starting earlier)
  • to combine multiple CSS files into one
(the render tree will be complete earlier due to not incurring fetch latency more than once)
  • to minify ("uglify") CSS (and/or transfer it compressed).
The effect of this is often smaller than the above, unless the CSS is large or the network slow, because this only alters transfer time, not latency, and rarely does much to CSS parse time.


  • minimize image loads
combine all interface images into sprites
large images are often the main thing you're presenting, and are sensibly part of the "thing you're loading"
some secondary images can get late-loaded



Note that there are further causes for reflows:

  • img tags without a preset width and height will cause reflow, because they change size when the image loads
so ideally you know and have set the image size
  • scripting that alters the DOM and/or CSSOM may cause reflow
and consider that such scripting tends to only start once the DOM is loaded.
(or, if it starts earlier and does anything to the DOM, it necessarily finishes no earlier than that)


Note that when a lot of things happen in a very short time, browsers tend to be clever enough to not reflow for all of them individually.

The above is above-the-fold logic. (Re)paints below the fold that don't trigger relayout are more or less free, at least from the perspective of initial load (verify)


http://www.phpied.com/rendering-repaint-reflowrelayout-restyle/



script load ordering

loaded/ready state and events


tl;dr:

  • you mainly get a
    • 'HTML parsed' event (readyState==interactive // DOMContentLoaded )
    • 'everything else loaded too' event (window's load event)
  • undeferred JS delays both
  • images (unless lazy-loaded) delay only the latter
  • the difference between these two events matters mostly to scripting, really


delayed loading

resource loading hints



<link rel="dns-prefetch" href="//shop.example.com">
do DNS resolution ahead of time for a host the browser probably doesn't know about yet (e.g. one people may go to next)
https://www.w3.org/TR/resource-hints/#dns-prefetch
https://caniuse.com/#feat=link-rel-dns-prefetch


<link rel="preconnect" href="https://cdn.example.com">
Get the DNS lookup, TCP connect, and (if applicable) TLS handshake over with, but don't send a request yet
(various browsers guess at this already)
there are CORS details
https://www.w3.org/TR/resource-hints/#preconnect
https://caniuse.com/#feat=link-rel-preconnect


<link rel="preload" href="/eg/other.css" as="style">
says "load as soon as you can and with these priorities", meant for this navigation
these requests do not affect the onload event
...so these are not for your page-central script and style tags (verify)
only really useful when you know things to load before the browser does
true for some things hidden in CSS (think fonts, background images) and some JS
there are CORS details
as= values include script, style, image, font, fetch, document (this relates to CSP, and to request priority)
https://www.w3.org/TR/preload/
https://w3c.github.io/preload/
https://caniuse.com/#feat=link-rel-preload
Note there is also an HTTP header equivalent of this (the Link header)
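As a sketch of that header form, assuming a Python server-side handler that builds its own response headers (the function name and its input format are made up for illustration), a Link header value announcing one or more preloads can be assembled like this. Fonts get the crossorigin attribute because fonts are always fetched in CORS mode, and a preload made without it would not match the later real request.

```python
def preload_link_header(resources):
    """Build a Link header value announcing preloads.

    resources: list of (url, as_value) pairs, e.g. ("/eg/other.css", "style").
    Hypothetical helper, for illustration only.
    """
    parts = []
    for url, as_value in resources:
        part = f"<{url}>; rel=preload; as={as_value}"
        if as_value == "font":
            # fonts are fetched in CORS mode, so the preload must be too
            part += "; crossorigin"
        parts.append(part)
    return ", ".join(parts)
```

For example, preload_link_header([("/eg/other.css", "style")]) gives `</eg/other.css>; rel=preload; as=style`, to be sent as the value of a Link response header.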


<link rel="prefetch" href="http://example.com/thing.ext"> 
says "after you're done, maybe load these things because they're likely needed soon"
more useful for the next navigations than for this one
there are CORS details
https://caniuse.com/#search=prefetch


<link rel="prerender" href="https://example.com/page2.html">
analogous to opening a page in a hidden tab (parses everything, starts scripting)
Which makes it a good thing if you are pretty certain this is the next navigation
and otherwise just a drain on all resources
https://www.w3.org/TR/resource-hints/#prerender
https://caniuse.com/#search=prerender
not widely supported




Mechanical parts

Some related underlying mechanisms

Persistent connections

Persistent connections, a.k.a. keep-alive connections, are an HTTP 1.1 feature that lets a client keep a connection open and send further requests over it, back-to-back, instead of opening a new connection for each.


In contrast, HTTP 1.0 originally allowed one fetch per connection, meaning a separate connection (with its own TCP handshake, and for HTTPS also a TLS handshake) for each script, image, and whatnot. That latency adds up. That said, keepalive was also tacked on to many an HTTP 1.0 server (in part because it's much easier to write a fairly minimal 1.0 server with this than a compliant 1.1 server).

This implies that an HTTP 1.0 client has to ask for it, using the request header
Connection: Keep-Alive
If the server sends back the same in the response, the server is signaling that it both supports and agrees to use keepalive, and will leave the connection open. If not, then not.


In HTTP 1.1, all connections are keepalive by default.

Timeouts aside (see the notes below), it closes only when:

  • the client specifically asks for the connection to be closed, using the request header
Connection: close
(which it should only do when it won't need another request in a while)
  • the server sends the response header
Connection: close
It may do so
when it cannot know the content length ahead of sending the response (and is not using chunked transfer encoding)
if it chooses not to support keepalive (while still being otherwise HTTP 1.1 compliant, which is probably rare)
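The rules above can be condensed into a small decision function. This is a sketch: the function name is made up, header dicts are assumed to have lower-cased names, and it ignores the other way a connection ends, namely a response whose body length can't be determined (which forces a close regardless of these headers).

```python
def connection_persists(http_version, request_headers, response_headers):
    """Rough sketch: will this connection stay open after the response?

    http_version: "HTTP/1.0" or "HTTP/1.1" (as spoken on this exchange).
    Header dicts are assumed to use lower-cased names.
    """
    def tokens(headers):
        # Connection is a comma-separated list of case-insensitive tokens
        value = headers.get("connection", "")
        return {t.strip().lower() for t in value.split(",") if t.strip()}

    req, resp = tokens(request_headers), tokens(response_headers)
    if "close" in req or "close" in resp:
        return False
    if http_version == "HTTP/1.1":
        return True  # persistent by default
    # HTTP 1.0: only if the client asked for it and the server agreed
    return "keep-alive" in req and "keep-alive" in resp
```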


Notes:

  • At no point is there a guarantee that a connection actually will stay open, or for how long it will stay connected while idle.
Clients should always be prepared to open a new connection for any further requests they need.
...though the initial load of a page can typically be expected to reuse the same connection(s)
  • both client and server can have their own idea about how soon to close a connection
due to an idle timeout, or because some number of requests has already been served on it
  • Given a browser's per-server connection limit, it makes sense that not all connections be kept open indefinitely (that would make it likelier that requests occasionally block while waiting on slow or stalled responses)



HTTP1.1 Pipelining

tl;dr: Nice idea, but not useful this time 'round.


Pipelining is an option (on top of a persistent connection) where further requests can be sent without having to wait for the previous response.

This means that instead of

request, wait, readresponse,
request, wait, readresponse,
request, wait, readresponse,

you do e.g.

request request request,
wait,
readresponse readresponse readresponse


I.e. request all images for a page at once. In ideal cases, this means that instead of getting the latency for every request, you only get it once, because the responses are likely to arrive fairly back-to-back.
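A toy latency model makes the arithmetic concrete. It is deliberately simplistic (an assumption-laden sketch, not a measurement): a fixed round-trip time per request/response exchange, a fixed transfer time per response, and no server think time, TCP slow start, or head-of-line effects.

```python
def fetch_time(n_requests, rtt, transfer, pipelined):
    """Toy model: wall time to fetch n resources over one persistent connection.

    rtt: round-trip latency per request/response exchange (seconds)
    transfer: time to receive one response once it starts arriving (seconds)
    """
    if pipelined:
        # send all requests, pay the round trip once, responses arrive back-to-back
        return rtt + n_requests * transfer
    # request, wait, read response; repeat for each resource
    return n_requests * (rtt + transfer)
```

With 10 resources, a 100 ms round trip and 10 ms of transfer each, this gives about 1.1 s without pipelining versus about 0.2 s with it. The bad case described below is what the model leaves out: one slow response delays everything queued behind it.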

Best case: the client spends less walltime waiting, and doesn't need multiple connections for multiple documents.

Bad case: because responses must be returned in request order, one slow response holds up everything behind it, making it no better than plain persistent connections.

Average case varies, though is often a little better.

However, there are too many buggy HTTP proxies and buggy HTTP servers, so most browsers have disabled it for HTTP1.1, in hope of better luck with whatever replaces it.


Technically HTTP1.1 requires pipelining support of servers(verify), but clients should not assume it is there.

Pipelining is not negotiated. Clients simply try to pipeline their requests on a persistent connection. If they do, they must be able to detect failure (non-persistent connection, or pipelining not supported), and be prepared to silently fall back to a new connection and not pipeline on that. (There are some further limitations. See e.g. RFC 2616, section 8.1.2.2).

...which means that trying to pipeline on a server that doesn't support it is initially slightly slower than not trying, which is why many browsers won't try(verify).


Notes for webdevs:

  • will only work on persistent connections, which means making those work first
  • The responses must still arrive in the same order as the requests, so the front of that queue could block the rest. Sometimes it pays to try to control that order

On persistent connection - ends of bodies, Content-Length, and chunking


With one-request-per-connection connections, clients can take end-of-stream as meaning end-of-body. With persistent connections and pipelining (HTTP 1.1), that approach won't work - the client must in some way be able to tell responses apart.


As a result, an HTTP 1.1 server is expected to close the connection if it knows the client cannot otherwise tell where the response ends. So a persistent connection is only possible when the client can know when a response is over.


Mostly it's possible for:

  • responses that never have bodies: 1xx, 204, 304, and any response to a HEAD request


  • bodies with Content-Length header
means the receiving end can read the next that-many bytes and know that was exactly the response body
On static content this is fairly trivial (and servers easily do this for you), on dynamic pages you need to pay attention.


  • bodies using Transfer-encoding: chunked (see RFC 2616, section 3.6.1)
means that the response body will be sent as a sequence of chunks, each consisting of:
the chunk length as a hex string (optionally followed by a semicolon and chunk extensions, but none are standardized, so practice probably doesn't see them(verify))
CRLF
that amount of bytes
CRLF
a finished response is marked by a zero-length chunk (optionally followed by trailer headers, then a final CRLF) - which means you don't need the Content-Length to know where the content ends.

An example, borrowed from Wikipedia:

4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0\r\n
\r\n
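To make the framing concrete, a minimal Python sketch of encoding and decoding a chunked body. The function names are hypothetical, the decoder expects the complete body in memory (a real client parses incrementally from the socket), and chunk extensions after a semicolon are simply ignored.

```python
def encode_chunked(pieces, trailers=None):
    """Frame an iterable of byte strings as a chunked HTTP/1.1 body."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # an empty chunk would signal the end early
        out += b"%x\r\n" % len(piece)  # chunk size in hex
        out += piece + b"\r\n"
    out += b"0\r\n"  # last-chunk
    for name, value in (trailers or {}).items():
        out += f"{name}: {value}\r\n".encode("latin-1")
    out += b"\r\n"  # end of the (possibly empty) trailer section
    return bytes(out)


def decode_chunked(data):
    """Parse a complete chunked body; return (body, trailers).

    Raises ValueError on malformed input.
    """
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol].split(b";", 1)[0].strip(), 16)  # drop extensions
        pos = eol + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size
        if data[pos:pos + 2] != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
        pos += 2
    trailers = {}
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        pos = eol + 2
        if not line:
            break  # blank line ends the message
        name, _, value = line.decode("latin-1").partition(":")
        trailers[name.strip()] = value.strip()
    return bytes(body), trailers
```

Feeding the example above to decode_chunked gives b"Wikipedia in\r\n\r\nchunks.", and encoding those same three pieces reproduces the example byte for byte.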


Notes:

  • assume that very small chunks are inefficient, because servers may send them in individual calls, rather than combine/buffer them
  • There are no hard constraints on the chunk sizes(verify)
  • When using chunked transfers, a server can tell the client (using the Trailer header) that it will send certain headers in a trailer after the chunked body. This is useful for things like Content-MD5, meaning you could send huge amounts of data and calculate its checksum while sending.
There are a number of restrictions on such delayed headers, see e.g. RFC 2616, section 14.40. Notably, Content-Length, Transfer-Encoding, and Trailer itself may not be sent in a trailer.
  • All HTTP 1.1 implementations (servers and clients) are required to understand chunked transfer-coding. Servers almost always do, but not all clients do (particularly those that are mostly HTTP1.0 with a few HTTP 1.1 features).
Servers can decide by themselves whether to send a message chunked or not.
  • Clients can make chunked requests, though this is rarely very useful (verify)
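Putting the cases listed above together, a client's decision about where a response body ends could look like the following sketch. The function is hypothetical and simplified: it assumes lower-cased header names and a single well-formed Content-Length, and ignores transfer codings other than chunked. The last case is exactly the one where the connection cannot stay persistent.

```python
def body_framing(request_method, status, headers):
    """How does the client find the end of this HTTP/1.1 response body?

    headers: dict with lower-cased header names.
    Returns (kind, length), kind being "none", "chunked", "length", or "until-close".
    """
    # No body at all, whatever the headers say
    if request_method.upper() == "HEAD" or 100 <= status < 200 or status in (204, 304):
        return ("none", 0)
    # Chunked takes precedence over Content-Length if both are present
    codings = [c.strip().lower() for c in headers.get("transfer-encoding", "").split(",")]
    if codings[-1] == "chunked":
        return ("chunked", None)
    if "content-length" in headers:
        return ("length", int(headers["content-length"]))
    # Only the server closing the connection marks the end,
    # so this connection cannot be reused for another request.
    return ("until-close", None)
```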



HTTP connection limit


HTTP clients are expected to limit the number of connections they make to a server.

Web servers may also impose such a limit per client, though this is somewhat less common.

This was originally to avoid congestion - particularly on the client side and dial-up, which could only carry a few full-sized IP packets per second anyway, meaning they would see no improvement from more connections, and soon actually degradation.


Which side?

RFC2616 (the HTTP 1.1 standard at the time; its successor, RFC 7230, dropped the specific number) mentions:

  • clients should not use more than two (persistent) connections to a server
  • HTTP 1.1 servers should not try to solve temporary overloads by closing connections. This can cause problems in itself, and relying on TCP flow control often works better.
  • a proxy should use up to (2*users) connections to a server or other proxy


Most servers do not (by default) have hard limits on connections per client. They rely on RFC-observing clients being nice.

Servers can limit the number of requests they handle at once, mainly as a rough measure to avoid swamping memory or IO. (This may also mean relying on the network stack below the serving process: if connections pile in faster than they can be handled, they are placed in a queue, and when that queue is full, further connections are rejected.)


...but since most browsers followed the above hint, you more often had the opposite problem: two connections is rather limiting. With both persistent connections and pipelining, two might be plenty, but you can't count on pipelining, and without it a page's various initial resources, as well as async requests, easily tie up the available connections, leading to high latency for other such requests and rather slow page loads. (This is where the convention of putting such resources on CDNs, or at least on separate host(name)s, came from.)

Browsers' real-world limits have since risen (commonly to around six connections per hostname), but there is still a limit. It's an ongoing discussion, but there is some consensus that alternatives to lots of connections (such as HTTP/2 multiplexing) are mostly better.


It seems the browser limit is typically per hostname, not per IP address(verify), which means a server side can use some basic DNS / vhost tricks to get a browser to make a few more connections to what is effectively the same server/proxy/load balancer (but beware of making caching much less efficient in the process - you often want to distribute rather than duplicate)
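As an illustration of a limit keyed on hostname rather than IP address, a client-side sketch using one bounded semaphore per hostname. The class name is made up, and the default of 6 is an assumption mirroring what many current browsers use for HTTP/1.1; this only models the counting, not real connections.

```python
import threading
from collections import defaultdict
from urllib.parse import urlsplit


class PerHostLimiter:
    """Cap on concurrent connections, counted per hostname (not per IP)."""

    def __init__(self, per_host=6):
        self._lock = threading.Lock()
        self._sems = defaultdict(lambda: threading.BoundedSemaphore(per_host))

    def _sem(self, url):
        host = urlsplit(url).hostname
        # guard creation so two threads can't end up with two semaphores per host
        with self._lock:
            return self._sems[host]

    def acquire(self, url, timeout=None):
        """Take a connection slot; returns False if none freed up within timeout."""
        return self._sem(url).acquire(timeout=timeout)

    def release(self, url):
        self._sem(url).release()
```

Because the key is the hostname, a URL on static.example.com gets its own allowance even if it resolves to the same server as www.example.com, which is the DNS/vhost trick described above.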


Note that if you use a (non-transparent) HTTP proxy, the proxy is the server as far as the browser is concerned, which effectively makes the per-server limit the overall limit. (verify)



http://www.openajax.org/runtime/wiki/The_Two_HTTP_Connection_Limit_Issue#Detailed_write-up
http://www.ajaxperformance.com/2006/12/18/circumventing-browser-connection-limits-for-fun-and-profit/




HTTP/2


HTTP/2 smooths a few things which in HTTP1.x were workarounds, and tends to lower latency in the process.


While HTTP/2 is a different transport at byte level, it does not alter HTTP1's semantics, meaning that when both sides support it, it's a drop-in.

You can see it as an API where the byte level is now handled for you. (With HTTP/1.0 you could still do it all yourself, and with 1.1 sometimes, but proper compliance is hard.)


Mostly:

  • Request/response multiplexing
(in HTTP1 an earlier request could hold up later ones (called head-of-line blocking), because responses had to come in the same order as requests)
  • server push
Basically means the server can pre-emptively send responses, to prime a browser's cache before it knows it needs parts
(fallback for non-supporting browsers is that it would just do the request)
footnotes
the server itself has to know what to push -- this is actually more complex than you think
Chrome has since removed support for server push, so in practice it sees little use


  • Request/response priorities
e.g. send css first, js second, images last
  • compresses HTTP headers
helps when they're large and repetitive, which in practice mostly means request headers (e.g. cookies and user-agent strings sent with every request)
(arguably mostly useful for some CDNs)



Some notes:

  • Browsers only support HTTP/2 over TLS (h2), not the cleartext variant (h2c)
  • it uses a single connection, so it can be more sensitive to packet loss (a lost packet stalls all streams, which is essentially head-of-line blocking at the TCP level)


https://www.smashingmagazine.com/2017/04/guide-http2-server-push/


HTTP/2 is now supported


QUIC

Details and arguments about page loading

Hosting Elsewhere, static/dynamic split

When part of your content is static (images, and also css, and scripting) there is some value in it being fetched from a server other than the one already busy enough with dynamic pages.

Options include using

  • a server for just static content (There are also some tricks you can use to lessen disk IO and lower response latency in such a static server)
  • using a CDN (same thing, functionally, but management is different)
  • using something like nginx (to handle just these requests) in front


If the browser now connects to more hosts than before, it may now be using more connections (its own per-host connection limit, now with more hosts) and pages can load a little faster. It can pay to serve page layout (HTML, CSS, JS) separately from media(verify).

This does not necessarily make much latency difference if the browser was already pipelining, and note that each extra hostname costs a DNS lookup and connection setup of its own.


Some cookies can also be avoided: if a few KB of cookies are sent along with each request to the dynamic host, that adds up, while a separate hostname for static content doesn't carry them.


Splitting static and dynamic content may also make your management of Vary and other cache directives simpler (since static content typically needs no Vary beyond perhaps Accept-Encoding, and can be Cache-Control: public).


Caching


Upsides:

  • Saves bandwidth from the origin server
  • May lead to slightly-to-noticeably faster page loads, depending on how you do it, and on how many external resources need to be loaded.
In many cases, most time until the page is considered loaded is spent on external resources (css, scripts, and images).
For some things you can avoid the request entirely
For larger things a request to check may still be necessary, yet you can avoid the actual transfer
  • Having proxies (transparent or not) closer to the browsers can help,
provided the origin server lets them cache content.
It makes most sense for static stuff.


Limitations:

  • For very small files, most time is spent in the network, not in actual data transfer.
For example, ten 1K files versus ten 304 responses is not a very noticeable difference in speed. You would not see much improvement until you do unconditional caching, but that may not be practical.
...which is an argument for things like sprites (images) and combining/bundling (JS and CSS), because that reduces the number of requests
  • dynamic content is often not (usefully) cacheable


Some basic things relevant to understanding caching:

  • Private cache mostly means the browser cache. Public cache means caching somewhere between browser and origin server (proxy caches, transparent caches)
  • headers control how private and/or public caches ought to act: whether they may cache it, how long, what to base conditional requests on (if applicable)
allowing proxies to cache data from the end server means the data can come from a server closer than the origin server - which reduces unnecessary transfer on the 'net. (latency may be better, e.g. if it means an ocean more or less)
allowing browsers to cache data from the end server means less transfer from the server. Avoiding requests altogether also tends to be noticeably faster (Expires is a particularly powerful tool here)


  • when / how requests are done:
    • requested each time (no caching). Makes sense on some dynamic pages that change a lot and/or have to be as recent as possible.
    • conditional requests - ('if changed from what I have'). If it has changed, it gets the new data in response. If it has not changed, the server sends a tiny response that lets the browser know it can use the copy it has in cache.
      • Largely about saving bandwidth, so is nice primarily for large-ish resources.
      • Has less effect on latency / loading speed, so not as useful when loading many small resources.
      • Web servers usually serve static files in ways to allow conditional caching
      • By date: "Send me content only if it has changed since (specific date), otherwise give me a body-less 304 Not Modified response"
        • Content is served with a Last-Modified header mentioning a date. The browser will remember that date and ask the server If-Modified-Since with that date.
        • With static file serving, it's easy enough to use the file's modification time - the server needs no data store, only to stat() the file. (In modern dynamic content sites, you may want to use the ETag system instead.)
    • unconditional caching - no request as long as cache entry is present and not stale
      • mostly the use of Expires: (see the section below)
      • Tells the browser that until some later time, the content will not change at all and the browser need not check its cached copy (once the date passes, the content is considered stale, and other things determine whether it then does a conditional or unconditional request)
      • useful to avoid 'no change' responses when you have many small resources (such as scripts and images)
      • This information cannot be revoked or changed, since the browser won't contact the server until the entry expires (or the cache is cleared). This is problematic for things that might sometimes change, particularly when they are related to each other, such as scripts and images from the same page style: some people might see a half-changed and possibly even broken interface. It's easy enough to work around, though: refer to new content by different URLs.
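For the by-date conditional request described above, a minimal server-side sketch. It assumes Python's standard email.utils for HTTP date formatting and parsing; the function name is hypothetical, and a real server would also deal with details such as If-Modified-Since dates in the future.

```python
from email.utils import formatdate, parsedate_to_datetime


def conditional_get(mtime, if_modified_since=None):
    """Decide between 200 and 304 for a static file, by date.

    mtime: modification time as a Unix timestamp (e.g. os.stat(path).st_mtime).
    Returns (status, headers).
    """
    # HTTP dates have one-second resolution and are always GMT
    headers = {"Last-Modified": formatdate(int(mtime), usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the condition
        if since is not None and int(mtime) <= since:
            return 304, headers  # body-less Not Modified
    return 200, headers
```

A file last modified at Unix time 784111777 is served with Last-Modified: Sun, 06 Nov 1994 08:49:37 GMT, and a request carrying that same date in If-Modified-Since gets a 304.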



Further notes:

  • When testing cache behaviour, don't use reload
In many browsers, hitting reload means 'use cache but consider it stale'. If you use Expires, you'll see 304s where you expect no request at all.
If you want to simulate 'user comes back later', hit enter in the address bar (or open the URL in a new window or tab). (Note: Ctrl-L goes to the address bar in many browsers)
  • Proxy caches are fairly unlikely to cache responses for POST requests. You may want to consider this in your site design.
  • Things still present in the browser's recent history may be served from its local cache or memory, even if the cache logic would suggest it check with the server. This can apply to use of the back button, and to history.
  • Keep in mind there is some variation in UA behaviour. And there were some bugs (e.g. older IE had some known problems).


  • Developer tools may not show things as you expect. Spend a little time learning to read them.


HTTP 1.0 and HTTP 1.1

Different headers apply to HTTP 1.0 and HTTP 1.1. While web browsers are mostly compliant with 1.1, other UAs may be compliant only with 1.0.

The HTTP 1.0 specs had a few cases with no defined behaviour, which has led to some creative (and hard to predict) behaviour in clients and proxies. In part, HTTP 1.1 is simply better defined (e.g. RFC 2616 section 13.2, Expiration Mechanisms), and it also has some more powerful features.

It seems that developers aren't always clear on what mechanisms should be used, so it helps to read the specs and various summaries out there.


To summarize the mechanisms, HTTP 1.0 has:

  • Last-Modified, and If-Modified-Since: - conditional requests based on time (usually file modification time)
  • Expires: - unconditional caching
  • Pragma: no-cache, forcing the request to go to the origin server rather than being answered from a cache (such as Squid)


HTTP 1.1 has:

  • Last-Modified: and If-Modified-Since: (like HTTP 1.0)
  • Expires: (like 1.0)
  • Cache-Control:, which allows origin server and browser (and caches) to specify rules that apply to public and private caches, including:
    • max-age, in seconds from now. A relative alternative to the absolute-date Expires header (and so less sensitive to clock differences); when both are present, max-age takes precedence
    • expiration changes
    • how important stale-based re-checks are to serving cached content
    • a public/private/no-cache distinction. Public means it may be cached on shared (proxy, transparent) caches, private means only the user's browser may cache it, and no-cache means a cached copy must be revalidated with the origin server before each use (it is no-store that says not to store it at all).
  • ETag ('entity tag') system, which lets you do conditional caching based on content identifiers rather than based on time


HTTPS


tl;dr:

  • caching proxies don't cache HTTPS (unless you MITM it)
  • endpoints can cache just as much
  • make sure your caching headers are correct


Intuitively, you might think that things that are secure should not be cached.


Yes and no.

There used to be a good argument that the content you want to use HTTPS on is going to be personal and therefore dynamic content, and you will be setting headers so that this is only endpoint-cacheable and not proxy-cacheable.


This is less true now that Chrome (and browsers in general) have pushed everyone towards HTTPS, because that basically means caching proxies don't really work anymore, since a caching proxy and end-to-end secure transport are fundamentally at odds.

Unless you specifically MITM the connections, which makes sense in businesses and universities where the admins control workstation certificates and proxy settings, and possibly within some other well-controlled structures, e.g. within CDNs.


Note that to browsers, little has changed. Endpoint-cacheable content is just as cacheable.

(HTTPS is a secure transport. The browser delivers the decrypted content to itself, and the transport has no direct bearing on caching logic)

Disabling cache

Sometimes you want to make sure that generated content always comes from the server.

To make sure you are disabling the cache in the face of HTTP 1.1, HTTP 1.0, and older HTTP 1.0-only proxies, you'll probably want the server to use the following headers:

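  • Cache-Control: no-cache - an HTTP 1.1 header, which works on current browsers and decent proxy software. Strictly, no-cache means 'revalidate before each use'; add no-store if the content should not be stored at all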
  • Pragma: no-cache for HTTP 1.0 browser and HTTP 1.0 proxies - but various things do not honour Pragma, so you usually want:
  • Expires: (date in the past) Giving an Expires value of 0 or an invalid date should be interpreted as immediate expiration. It's slightly safer to just pick a valid date squarely in the past, though.
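As a concrete set of such headers, a sketch assuming a Python handler that keeps response headers in a dict (the constant and function names are made up). It combines no-store with no-cache and must-revalidate for HTTP 1.1, Pragma for HTTP 1.0 caches, and a valid Expires date squarely in the past for anything that ignores Pragma.

```python
NO_CACHE_HEADERS = {
    # HTTP 1.1: don't store it; if something stored it anyway, revalidate
    "Cache-Control": "no-store, no-cache, must-revalidate",
    # HTTP 1.0 caches that predate Cache-Control
    "Pragma": "no-cache",
    # a valid date squarely in the past, for things that ignore Pragma
    "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",
}


def add_no_cache_headers(headers):
    """Return a copy of a response header dict with cache-busting headers set."""
    merged = dict(headers)
    merged.update(NO_CACHE_HEADERS)
    return merged
```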


TODO: check

  • Apparently, Pragma: no-cache is invalid/deprecated in responses, but not yet in requests?


Expires (unconditional caching)

Expires means the server tells the browser that "until [date], you should not check cache-freshness with me at all, not even with a conditional request"

Expires is most useful for anything that will never change and be requested somewhat regularly by the same UA or proxy.

For public-cache content, this also means it can be cached closer to home, so it also relieves the origin server from doing quite as much work.


Not very useful for content that is expected to change every now and then (basically at all), because you can't take it back: clients won't ask again until the date passes.

(If you want to change the theme images & CSS for a page, you can always have your HTML refer to new names for everything. If not, frequent visitors will see the old theme for a while, or even mixed or broken content.)


As RFC2616 notes

  • "All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception"
and you can take that to mean UTC



It can make sense to mix Expires with other mechanisms. For example, for people who are actively clicking around on your website, even an 'access time plus 3 minutes' Expires will make repeat loads of some resources low-latency (lower than a lot of If-Modified-Since/304, since that's a network interaction for each item). (...though both sides' computer clocks must be set accurately for this to work on a few-minute scale, and not be timezone-ignorant.)
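A sketch of such a short-lived 'don't ask me again for a while' response, assuming Python's email.utils for the GMT date (the function name is made up). It sends both the absolute Expires, for HTTP 1.0 caches, and the relative Cache-Control: max-age, which HTTP 1.1 caches prefer and which depends much less on the two clocks agreeing.

```python
import time
from email.utils import formatdate


def short_expiry_headers(seconds, now=None):
    """Headers saying the response stays fresh for `seconds` seconds."""
    now = time.time() if now is None else now
    return {
        # absolute date; the RFC requires GMT
        "Expires": formatdate(now + seconds, usegmt=True),
        # relative; takes precedence over Expires in HTTP 1.1 caches
        "Cache-Control": f"max-age={int(seconds)}",
    }
```

For the 'access time plus 3 minutes' case, short_expiry_headers(180) does it; with now=784111777 it yields Expires: Sun, 06 Nov 1994 08:52:37 GMT and Cache-Control: max-age=180.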

Vary

Vary tells caches which parts of a request (which request headers, beyond the URL) to consider when determining whether two requests are equivalent. That is how a cache decides whether it can answer a request from cache or must pass it on to the origin server.

With dynamic sites, it happens more often that the same URL shows different things for different people - often because of personalized content (/home, /friends), for translated versions of the same content, but also for technical reasons (such as page compression)


Examples:

  • Pages that differ depending on users that are logged in should Vary on cookies as different cookies imply different pages and no one should ever get another user's page.
  • If you serve different languages (using Accept-Language), you want to vary on that, or caches might give everyone whatever language is currently in the cache (that is, whichever language the user who happened to fetch it first preferred).

Caches may choose to cache the various versions, or not cache them at all, so to get the most out of caching (...proxies), you generally want to vary on as little as possible so that the cache applies exactly as often as it usefully can.


The value of this field is either * or a comma-separated list of field names.

These are usually names of request header fields, and are not limited to the standard request headers defined by the HTTP specification. Field names are case-insensitive.
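A sketch of how Vary feeds into a cache's notion of 'the same request', in Python. The function name is made up, request header names are assumed lower-cased, and folding everything into a single key is a simplification: real caches usually look up by URL first and then compare the headers named in each stored response's Vary, since they only learn Vary once they've seen a response.

```python
def cache_key(method, url, request_headers, vary):
    """Key under which a cache could store/look up a response.

    request_headers: dict with lower-cased names.
    vary: the response's Vary header value ("" if absent).
    Returns None when Vary is "*", meaning: don't answer this from cache.
    """
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    if "*" in names:
        return None
    # only the headers named in Vary take part in the comparison
    selected = tuple((n, request_headers.get(n, "")) for n in names)
    return (method.upper(), url, selected)
```

With Vary: Accept-Language, a Dutch and an English visitor get different keys for the same URL; without Vary they share one, which is exactly the 'everyone gets whatever language is in the cache' problem.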


Cache-Control

Example:

Cache-Control: max-age=3600, must-revalidate


An HTTP 1.1 header. Requests and responses can both contain this header, with different meanings (and with different directives).

Note that 'private cache' generally means browser's cache, and 'shared cache' usually means 'proxy cache', and that there is sometimes a difference between a browser's disk cache and memory cache.


Responses (which are what caches store, so these are the origin server's instructions to caches) can contain:

  • private - may be stored in a private/browser cache but not in a shared/proxy cache
  • public - may be cached in a shared and in a private cache
  • no-cache - content may be stored, but must be revalidated with the origin server before a cached copy is served (that is, caches must never return stale cached responses, even if they are configured to do so). Useful for public content that requires authentication.
  • no-store - may not be stored in even the browser's cache
  • no-transform - tell proxies not to change the data (such as recompressing images for space)
  • must-revalidate - Force strict obeying of your values (without this, HTTP allows agents to take liberties with values such as max-age and Expires when evaluating freshness)
  • proxy-revalidate - Make proxies strict this way, but allow browsers to take more liberties
  • max-age - the maximum time something should be kept in a cache (in seconds)
  • s-maxage - like max-age, but applies only to shared (proxy) caches, where it overrides max-age
  • ...and possible extensions


Requests can use

  • no-cache
  • no-store
  • max-age
  • max-stale
  • min-fresh
  • no-transform
  • only-if-cached - Apparently used among sibling proxies, to synchronize content without causing origin requests(verify)
  • ...and possible extensions (all values not mentioned here are ignored if not understood - allowing values belonging to specific extensions to be ignorable)
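A minimal Python sketch of parsing a Cache-Control value into its directives, and of picking the freshness lifetime a cache would use, which shows s-maxage overriding max-age for shared caches only. Both function names are made up, and the parser does not handle commas inside quoted arguments (e.g. no-cache="Set-Cookie, Set-Cookie2").

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument-or-True}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if sep:
            arg = arg.strip().strip('"')
            directives[name] = int(arg) if arg.isdigit() else arg
        else:
            directives[name] = True
    return directives


def freshness_lifetime(directives, shared_cache):
    """Seconds a response stays fresh per Cache-Control (None if unspecified)."""
    names = ("s-maxage", "max-age") if shared_cache else ("max-age",)
    for name in names:
        value = directives.get(name)
        if type(value) is int:  # excludes bare directives parsed as True
            return value
    return None
```

For the example above, parse_cache_control("max-age=3600, must-revalidate") gives {"max-age": 3600, "must-revalidate": True}.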


Pragma

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

ETag

Etag ('entity tag') allows a modification check system based not on date/time but on some chosen identifier, one that is meant to be representative of whether the file has changed -- possibly a hash of the contents, but there are often resource-cheaper alternatives.


An Etag-aware client can choose to remember received Etags, and next time ask the server "I have the version you tacked this particular identifier on. Is that the version you would serve to me now, or do you want to give me another one?"

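More precisely, a client can make its request conditional on the ETag either not matching or matching:

  • If-None-Match: "etag-value" - typically 'give me the content only if it has changed since you handed me this identifier'; if it hasn't, the server answers 304 Not Modified without a body
  • If-Match: "etag-value" - 'only do this if the resource still has this identifier'; otherwise the server answers 412 Precondition Failed. Mostly useful for requests with side effects.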
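On the server side, the If-None-Match case boils down to a comparison. A minimal Python sketch, assuming you already know the current ETag and body of the resource (the function names and the (status, headers, body) return shape are made up for illustration; splitting on commas is a simplification):

```python
def parse_etag_list(header_value):
    """Split an If-None-Match / If-Match value into a list of entity tags (or ['*'])."""
    return [tag.strip() for tag in header_value.split(',') if tag.strip()]


def _opaque(tag):
    """Drop a W/ (weak) prefix; If-None-Match uses this weak comparison."""
    return tag[2:] if tag.startswith('W/') else tag


def conditional_get(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    inm = request_headers.get('If-None-Match')
    if inm is not None:
        tags = parse_etag_list(inm)
        if '*' in tags or any(_opaque(t) == _opaque(current_etag) for t in tags):
            # The client's copy is still current: no body, just the validator again
            return 304, {'ETag': current_etag}, b''
    return 200, {'ETag': current_etag, 'Content-Length': str(len(body))}, body
```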


Many web servers now automatically create ETags as part of their static file serving, based on something easy to reproduce; apache2 uses "inode-size-mtime" (see also FileETag), IIS bases it on mtime and an internal config change counter. This makes Etags meaningful only per host: servers sharing the load of serving the same content will generally produce different Etags for the same file, so in that setup you need to tweak the Etag system.

When doing dynamic content generation, it's fairly easy to write your own Etag system, and frameworks may do most of the work for you. Exactly how you generate the identifier is up to you. Sometimes a content hash makes sense - but if that means IO and CPU on each access, it can make sense to check against a database, perhaps making filenames the hash, possibly memcache it, or some other trick that makes it a simple and fast read-out (preferably without IO in the case of 'no change').


You can combine Etag with byte-range operations. That is, along with Range you can send If-Range, which allows uses like "send me the parts that I am missing, but if things have changed, send me the whole new version" in a single request.
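The server-side decision for If-Range can be sketched in Python as below. The function name and argument shape are hypothetical, the date case is simplified to an exact string match against Last-Modified, and a real server would also have to parse and validate the Range value itself.

```python
def if_range_allows_partial(request_headers, current_etag, current_last_modified):
    """True if a Range request may be answered with 206 Partial Content.

    Without If-Range, a Range can be honoured. With If-Range, the range is only
    sent if the validator still matches; otherwise the server ignores Range and
    sends the whole new version with 200.
    """
    if 'Range' not in request_headers:
        return False
    validator = request_headers.get('If-Range')
    if validator is None:
        return True
    if validator.startswith(('"', 'W/')):
        # Entity tag: If-Range uses strong comparison, so weak tags never match
        strong = not validator.startswith('W/') and not current_etag.startswith('W/')
        return strong and validator == current_etag
    # Otherwise it is an HTTP date, compared against Last-Modified
    return validator == current_last_modified
```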

You could even use Etag for conditional execution, particularly to have rules about things that have side effects (PUT, GET with database access, etc.). Some HTTP-based protocols use it this way(verify).
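As an illustration of conditional execution, a minimal Python sketch of the common 'lost update' protection with If-Match on PUT: the write only goes through if the client's Etag still matches; otherwise the server would answer 412 Precondition Failed. The in-memory store, the version-counter Etags and the Conflict exception are all made up for the example.

```python
class Conflict(Exception):
    """The client's copy is out of date (would be HTTP 412 Precondition Failed)."""


store = {}  # hypothetical in-memory resource store: path -> (etag, body)


def put(path, body, if_match=None):
    """Replace a resource only if the client saw the current version."""
    current = store.get(path)
    if if_match is not None:
        if if_match.strip() == '*':
            if current is None:
                raise Conflict('resource does not exist')
        elif current is None or if_match.strip() != current[0]:
            raise Conflict('resource changed since you fetched it')
    # Etags here are just a version counter, e.g. "1", "2", ...
    new_etag = '"%d"' % (int(current[0].strip('"')) + 1 if current else 1)
    store[path] = (new_etag, body)
    return new_etag
```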


See also:

browser bugs

See Browser eccentricities#Caching bugs


mod_expires

In apache, you can use mod_expires, which sets the Expires header (and the matching Cache-Control max-age) to a time relative to either the access time or the file's last modification.

You can have settings at server (not advised!), vhost (if you know what you're doing), and directory/htaccess level, and can set it per MIME type - and practically also per extension.


Besides the inherent overkill behaviour of the Expires header, there seem to be a few gotchas:

  • It seems to apply to all content regardless of whether the source was dynamic or not, which is bad on dynamic sites.
  • It does not interact with other cache headers, which is often also not what you want.
  • Server-level ExpiresByType overrides more specific (e.g. directory-level, FilesMatch) ExpiresDefault. This is one reason you shouldn't set things at server level even when you're not using vhosts.


Example:

ExpiresActive On
 
#That's the shorthand form for 'access plus 0 seconds' 
ExpiresDefault A0
#I prefer the longer form used below, as it is more readable. 
 
<Directory /var/www/foo/>
  ExpiresByType text/css     "modification plus 5 minutes" 
  ExpiresByType image/png    "access plus 1 day"
  ExpiresByType image/jpeg   "access plus 1 day"
  ExpiresByType image/gif    "access plus 1 day"
  ExpiresByType image/x-icon "access plus 1 month"
</Directory>
 
<Directory /var/www/foo/static>
  ExpiresByType image/png    "access plus 1 day"
  ExpiresByType image/jpeg   "access plus 1 day"
  ExpiresByType image/gif    "access plus 1 day"
  <FilesMatch "\.(xm|jp2|mp3)$">
    ExpiresDefault "access plus 3 months"   
    # or larger. Browser caches will have likely forgotten it anyway, and chances are so will public caches.
  </FilesMatch>
</Directory> 
 
<Directory /var/www/foo/weeklycolumn>
  ExpiresDefault "modification plus 6 days"
  # This is a *file* based timeout, independent of when it was accessed. 
  # Beyond that specific future time the agent will *always* check, 
  # so this is most useful for data that actually changes regularly
  #  If this were 'access', clients might not check until, 
  # in the worst case, six days after you changed the page.
</Directory>
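To see why 'access' versus 'modification' matters, a small Python sketch that computes the Expires time both ways for a rule like the weeklycolumn one ('plus 6 days'). It only mimics the arithmetic, not how mod_expires is implemented, and the function names are invented.

```python
from email.utils import formatdate

DAY = 24 * 3600


def expires_at(base, seconds, now, mtime):
    """Timestamp an 'access plus N' or 'modification plus N' rule would produce."""
    start = now if base == 'access' else mtime
    return start + seconds


def expires_header(base, seconds, now, mtime):
    """Same, formatted as an HTTP date for the Expires header."""
    return formatdate(expires_at(base, seconds, now, mtime), usegmt=True)


# A page last changed at t=0, fetched on day 5 and on day 7.
# Prints the days of freshness each rule gives from that moment:
# 'access' always gives 6; 'modification' gives 1, then -1 (already stale, so always checked).
mtime = 0
for day in (5, 7):
    now = day * DAY
    acc = expires_at('access', 6 * DAY, now, mtime)
    mod = expires_at('modification', 6 * DAY, now, mtime)
    print(day, (acc - now) / DAY, (mod - now) / DAY)
```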

Notes:

  • To be compatible with servers that don't have the module, always wrap the directives in a module test: <IfModule mod_expires.c> ... </IfModule>.
  • Know the difference between 'access' and 'modification' - it's not a subtle one.
  • Be conservative and don't use Expires as your only caching mechanism. Clients will fall back to If-Modified-Since anyway (and if they don't, that is the mechanism you should be focusing on), so you're basically setting the interval of real checks.
  • Things like styles and scripts should not have long expire times - old styles will apply to previous visitors for a while after you may have changed them completely (unless, of course, you use a new filename for each version).
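The 'new filename for each version' trick is usually automated by a build step. A minimal Python sketch, assuming a hypothetical build script that copies each stylesheet or script to a name containing a hash of its contents (the naming scheme and function name are illustrative). Because the name changes whenever the content does, such files can be given very long expire times.

```python
import hashlib
import os
import shutil


def versioned_copy(path, out_dir):
    """Copy e.g. style.css to style.<hash>.css, named after a hash of its contents."""
    with open(path, 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:8]
    stem, ext = os.path.splitext(os.path.basename(path))
    new_name = '%s.%s%s' % (stem, digest, ext)
    shutil.copyfile(path, os.path.join(out_dir, new_name))
    return new_name  # reference this name from your HTML instead of the original
```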

Manual apache statements

mod_expires only sets Expires and the corresponding Cache-Control max-age; it cannot set other cache control directives (such as public, private, or no-cache).

In some cases, you may want to abuse mod_headers, for example:

<FilesMatch "\.(html|htm|php)$">
  Header set Cache-Control "max-age=60, private, proxy-revalidate"
</FilesMatch>
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
  Header set Cache-Control "max-age=604800, public"
</FilesMatch>

Note that Cache-Control is an HTTP/1.1 header; HTTP/1.0-only caches ignore it and go by Expires.


Request, size and amount

Compression

HTTP compression will often easily reduce HTML, CSS, and javascript to 20-40% of their original size, depending on the method of compression (gzip and deflate/zlib) and the content.


Browser rendering speed gains are negligible unless the data is relatively large or the client is on a low-bandwidth connection, but the reduced bandwidth use is useful, even when only in terms of server bandwidth bills.
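A quick way to see what compression does for your own content is to compress a sample yourself. A Python sketch using the standard gzip and zlib modules; the generated markup is artificial and far more repetitive than a typical page, so expect it to compress well below the 20-40% range mentioned above.

```python
import gzip
import zlib

# Artificial, very repetitive markup, roughly like a long list page
html = ''.join('<li class="item"><a href="/item/%d">Item %d</a></li>\n' % (i, i)
               for i in range(500)).encode('utf-8')

gz = gzip.compress(html)   # gzip container around deflate data
zl = zlib.compress(html)   # zlib container, which is what HTTP calls "deflate"
print(len(html), len(gz), len(zl))
print('gzip: %.0f%% of original' % (100 * len(gz) / len(html)))
```

Running the same lines on a saved copy of one of your real pages gives a more honest number.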


Gotchas:

  • IE6 and earlier never cache compressed pages (yes, this is a stupid bug). Whenever fairly small files are downloaded repeatedly, caching matters more than compression, for both client and server. This basically means that you never want to send compressed content to IE, so if you want to use compression you may want some browser-specific behaviour. Ugh.
  • IE (versions?(verify)) may decide that compressed error pages are too small to be real(verify), and decide to show its own. You may want to avoid compressing these.


Notes:

  • In some implementations gzipping implies that the document can only be delivered as a whole (and not shown incrementally in the browser as it is downloaded). In other implementations, gzipped delivery can happen in chunks.
  • If you code compression yourself, check the Accept-Encoding: request header for which compression formats, if any, the browser will understand in a response. Clients are not required to support any particular compression (simpler ones may support none), so only compress when the header asks for it.
  • Compressing small files is often not useful at all; trying to compress 500 or so bytes of output is rarely really worth the CPU time spent on it.
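Checking Accept-Encoding properly means honouring q-values: 'gzip;q=0' explicitly refuses gzip, so a plain substring test gets it wrong. A minimal Python sketch (the function names are made up, and it ignores rarer details such as the x-gzip alias):

```python
def accepted_encodings(header):
    """Parse Accept-Encoding into {coding: qvalue}, e.g. 'gzip, deflate;q=0.5'."""
    result = {}
    for item in header.split(','):
        item = item.strip()
        if not item:
            continue
        coding, *params = [p.strip() for p in item.split(';')]
        q = 1.0
        for p in params:
            name, _, value = p.partition('=')
            if name.strip().lower() == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # unparseable q-value: treat as not acceptable
        result[coding.lower()] = q
    return result


def client_accepts(header, coding):
    """True if the coding is listed with q>0, or covered by '*' with q>0.
    A missing header is treated as 'don't compress', the safe choice."""
    codings = accepted_encodings(header or '')
    if coding in codings:
        return codings[coding] > 0
    return codings.get('*', 0) > 0
```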


mod_deflate

In apache, mod_deflate is implemented as a transparent output filter and likely to be installed but not enabled.

Check that there is a line like the following in your apache config:

LoadModule deflate_module /usr/lib/apache2/modules/mod_deflate.so


Perhaps the simplest way to use it is to apply it to a few specific MIME types (whitelist-style), such as:

AddOutputFilterByType DEFLATE text/plain text/css text/javascript 
AddOutputFilterByType DEFLATE text/html application/xml application/xhtml+xml

You could set these globally if you wish.


The module listens to environment options like no-gzip and dont-vary. This allows 'enable globally, disable for specific things' (blacklist-style) logic:

SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI \.(?:png|jp2|jpe?g|jpeg?|gif)$  no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:t?gz|bz2|zip|rar|7z|sit)$  no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$                          no-gzip dont-vary


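Since apache can set environment variables based on various tests, you can also use this behaviour for browser-specific rules, which you probably want to do in the global apache config. It seems everyone copy-pastes the following from the apache documentation. Note that it works around Netscape 4.x bugs, and that its third line re-enables compression for IE (which also identifies itself as Mozilla/4), so if you want to disable compression for IE you have to change that line: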

BrowserMatch ^Mozilla/4         gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSI[E]           !no-gzip !gzip-only-text/html
# The bracketed E there is a fix to a past apache parse bug.
 
# Tells proxies to cache separately for each browser
Header append Vary User-Agent   env=!dont-vary
# This varies everything for user-agent by default unless dont-vary is set,
# which you can set on content you know it won't matter, for example
# when you won't compress it.

Notes:

  • Can be set in server, vhost, directory, and .htaccess context
  • You can also tweak the compression-ratio-versus-CPU tradeoff with the DeflateCompressionLevel directive (1 to 9; server and vhost level only).
  • It seems some browsers have problems with compressed external javascript specifically when it is included from the body section of a document, not the head. Something to keep in mind (and (verify) and detail here).
  • You can get apache to log the compression rates, to see how much it's helping. See [1] or [2] for details

mod_gzip

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(section very unfinished)

mod_gzip works in a similar way to mod_deflate:

<IfModule mod_gzip.c> 
   mod_gzip_on  Yes      
   # Let mod_gzip de-chunk chunked responses (e.g. dynamic output) so it can compress them:
   mod_gzip_dechunk yes  
 
   #What to use it on: (example)
   mod_gzip_item_exclude file "\.css$"  
   mod_gzip_item_exclude file "\.js$"
   mod_gzip_item_include file \.htm$
   mod_gzip_item_include file \.html$
   mod_gzip_item_include mime ^text/.*
   mod_gzip_item_exclude file "\.wml$"
 </IfModule>

It has some extra features, such as checking for an already-compressed version (.gz on disk) when doing static file serving, and being more configurable.


(Semi-)Manually

PHP filter

The sections above apply to specific types of static files - well, depending on how they are configured. They can be used to handle PHP's output as well, but you may want to do it in PHP (Support for compression was added around 4.0.4). Doing it in PHP can be a little more work, but it can be smarter about output chunking, and you control exactly when it is applied.


If zlib is not compiled in, PHP will ignore you silently.


In practice you probably don't want to set it globally, but do it selectively via apache config or .htaccess, often per directory (or even for specific scripts, using Files or FilesMatch). When PHP runs as an apache module (mod_php), apache has the directives php_value and php_flag, which let you control this:

php_flag zlib.output_compression On
# When you give a size (Note: using php_value, not php_flag),
# you enable it and also set the output buffer size (default is 4KB):
php_value zlib.output_compression 2048
 
# Optional:
php_value zlib.output_compression_level 3
#Default seems to be 6, which is relatively heavy on CPU. 3 is lighter and decent. 
# Even 1 will be a noticeable improvement on most text.

Notes:

  • The documentation says you can use ini_set() to enable "zlib.output_compression", but this seems to apply to few PHP versions(verify). It is non-ideal in other ways: you can't seem to ini_set the compression_level(verify).

Also, if a higher-level setting caused a script to compress, you can disable compression with ini_set, but it will still use output buffering - even when you flush explicitly.

Writing gzip from your own code

Check whether you can:

Supporting browsers will send a header like:

Accept-Encoding: gzip
Accept-Encoding: gzip, deflate

Some old browsers, like Netscape 4, have bugs and effectively lie about what they support - you'll want to test for them and not send them compressed content.


Signal that you are:

When you decide to use one of the advertised methods of compression, tell the browser about it, probably using:

Content-Encoding: gzip

There is also a Transfer-Encoding header. The distinction is mostly semantic: Content-Encoding says the representation itself is compressed (the server is sending a gzipped version of the resource, and things like its ETag refer to that version), while Transfer-Encoding says the compression is only about this transfer, such as compressing (static or dynamic) HTML to save bandwidth. In practice browsers support Transfer-Encoding: chunked but generally not compression via Transfer-Encoding.

So Content-Encoding ends up serving both purposes. It may influence some browser choices, but things such as 'whether to save or display' are usually controlled by other headers, such as the Content-Type (and Content-Disposition) response headers.


Do it:

This is mostly just doing what you are saying.

Note that the Content-Length header should report the size of the compressed data.

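Pseudocode (for gzip only; request and response stand in for whatever your framework provides):

import gzip

# The response differs depending on Accept-Encoding, so tell caches (on both variants)
response.headers['Vary'] = 'Accept-Encoding'
if 'gzip' in request.headers.get('Accept-Encoding', ''):   # naive: ignores q-values such as gzip;q=0
    gzip_data = gzip.compress(output_data)
    response.headers['Content-Encoding'] = 'gzip'
    response.headers['Content-Length']   = str(len(gzip_data))
    response.write(gzip_data)
else:
    # serve headers and data as usual
    response.headers['Content-Length']   = str(len(output_data))
    response.write(output_data)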


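Chunked output means sending Transfer-Encoding: chunked (which all HTTP/1.1 agents must support) and then writing the body as a series of chunks, each prefixed with its size, ending with a zero-size chunk; no Content-Length is sent. This combines with compression: the gzip stream (Content-Encoding) is produced incrementally and cut into chunks (Transfer-Encoding).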
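For reference, the chunked framing is simple: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk ends the body. A Python sketch of that framing around an incrementally produced gzip stream. The function names are made up; trailers and chunk extensions are left out; it assumes the response headers (Content-Encoding: gzip, Transfer-Encoding: chunked) are sent separately.

```python
import zlib


def chunk(data):
    """Frame one piece of body data for Transfer-Encoding: chunked."""
    return b'%x\r\n%s\r\n' % (len(data), data)


def gzip_chunked(pieces):
    """Yield a gzip stream, compressed incrementally and framed as HTTP chunks."""
    comp = zlib.compressobj(6, zlib.DEFLATED, 31)  # wbits=31 selects the gzip container
    for piece in pieces:
        # Z_SYNC_FLUSH pushes out everything compressed so far, so the browser
        # can start decompressing (and rendering) before the whole page is done
        out = comp.compress(piece) + comp.flush(zlib.Z_SYNC_FLUSH)
        if out:
            yield chunk(out)
    yield chunk(comp.flush())  # remaining deflate data plus the gzip trailer
    yield b'0\r\n\r\n'         # zero-size chunk ends the body
```

Each sync flush costs a little compression ratio, so flushing after every tiny piece of output is wasteful; flushing at natural points (e.g. after the page head) is the usual compromise.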


Divide and conquer: offloading caching, spreading, balancing, etc.

Nginx notes

See also:

Web server related tools

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Notes on benchmarking:

Requests-per-second rates are deceptive.

If you test hello world apps, the numbers are almost meaningless, because you only measure the speed of connection setup and basic HTTP request parsing, which is unavoidable latency you always get. Yes, you want this to be low, but it usually is. 10000 req/s may look ten times as fast as 1000 req/s, but the difference is 0.9ms per request (0.1ms versus 1ms), which is only significant if you spend near-zero time on actual work, and becomes relatively negligible when your dynamic app does something complex enough to be interesting. For example, simple web apps easily spend dozens of milliseconds just talking to a database.
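The same arithmetic as a small Python sketch, assuming (purely for illustration) a single worker and an app that spends 30 ms of its own time per request: the tenfold hello-world difference shrinks to about 3%.

```python
def per_worker_rate(overhead_s, app_time_s):
    """Requests per second one worker can do when each request costs
    fixed server overhead plus the app's own work."""
    return 1.0 / (overhead_s + app_time_s)


for app_ms in (0, 30):
    fast = per_worker_rate(0.0001, app_ms / 1000)  # server doing 10000 req/s on hello world
    slow = per_worker_rate(0.001, app_ms / 1000)   # server doing 1000 req/s on hello world
    print('app work %2d ms: %8.1f vs %8.1f req/s (%.0f%% faster)'
          % (app_ms, fast, slow, 100 * (fast / slow - 1)))
```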

Even when you're making something for throughput, you'll find that most things are IO bound (and, when loaded, cause most things on the server to be), so it is usually much more interesting to relieve IO somehow (e.g. memcache anything you can, add cache headers to have some of the IO be conditional), than it is to look at that last unavoidable millisecond-or-so.


ab, ab2

ApacheBench, which comes with apache.

Useful to check how concurrency is working, and to get an estimate of the request rate. In most setups it will test the best-cache rate (because asking for exactly the same thing over and over relies on caches more than everyday use does).

The most interesting options:

  • -n amount-of-requests: keep it busy for a while, OR:
  • -t seconds: ...to keep it busy for a given amount of seconds
  • -c concurrency: uses a number of parallel fetchers. Use this to realistically simulate many clients, and see whether they are handled in parallel. (Note that most web servers will limit the amount of concurrent connections to a single IP / client)
  • -k: use keepalive feature, to simulate clients doing various requests on the same connection. Arguably not very realistic for various real-world tests (but can be useful to see the maximum operation rate).


Example:

ab -t 10 -c 20 -k http://example.com/

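Note that a trailing slash can be significant: for a directory such as http://example.com/foo, most servers answer with a redirect to http://example.com/foo/ (or, sometimes, a 404), and ab does not follow redirects, only retrieves them, so you would just be benchmarking that redirect. (ab also rejects a URL with no path at all, such as http://example.com, as invalid.)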


Notes on reading results

reading results when concurrency>1
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Consider that given constant CPU power, you can expect n parallel requests to be served in n times the time it would take in isolation (dividing so that each runs at 1/n times the full speed). With many-client load you can expect the time taken in each request to rise like this.

For example, a request taking 0.1sec at 100% CPU is the same processing time as four concurrent processes taking 0.4sec at 25% CPU each -- that's 0.1 CPU-seconds for each request. Note that both cases mean a rate of 10 requests per second.


For a more concrete example, tests on a simple hello world app (a single single-thread process) with concurrency 1:

Time per request:       2.657 [ms] (mean)
Time per request:       2.657 [ms] (mean, across all concurrent requests)

...and concurrency 4:

Time per request:       10.950 [ms] (mean)
Time per request:       2.738 [ms] (mean, across all concurrent requests)

Effectively, the first shows an estimation of the wallclock time requests take, the second of CPU time under the assumption that the concurrency is the only factor (often not true for dynamic requests).

As 10.950ms is roughly four times 2.657ms, this means each of the four handlers took about four times as long when four were running concurrently - meaning the concurrency works, and the handler is likely entirely CPU-bound. Expect different results when you use any database stuff, networking, caches, or such.
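Both 'Time per request' lines are derived from the total test time. As far as I know ab computes them as in this Python sketch (the function name is made up); this also explains why the two lines are identical at concurrency 1.

```python
def ab_time_per_request(total_seconds, completed, concurrency):
    """The two 'Time per request' lines ab prints, in milliseconds.

    (mean):                   concurrency * total time / completed requests,
                              roughly the wallclock time one request took
    (mean, across all ...):   total time / completed requests,
                              the inverse of the request rate
    """
    per_request = concurrency * total_seconds * 1000 / completed
    across_all = total_seconds * 1000 / completed
    return per_request, across_all
```

For example, 1000 requests at concurrency 4 finishing in 2.7375 seconds give about 10.95 ms and 2.74 ms, which matches the shape of the concurrency-4 output above.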


Keep in mind that:

  • multiple-core processing means that you may see very good scaling up to a concurrency of 2 or 4, assuming the work is spread among cores
  • ...and beyond that you're just dividing CPU time among more concurrent processes
  • A test run from a single client / source IP is almost never a good load test
  • a single IP/client may use and/or be granted only a few connections (browsers have traditionally used 2 to 6 per host, and servers may limit per-client connections similarly), so a single-client test will only test how well a single client is served, won't stress-test, and won't necessarily be a good indication of the request handling rate to expect. (Still, many servers have to divide resources above concurrency 2 or 4 anyway, so the difference is not necessarily large)


Ignore 'Failed requests' for dynamic pages

Since ab was written for static pages, it will assume different-sized responses are errors.

For example, a -n 20 test might get you:

Complete requests:      20
Failed requests:        19
   (Connect: 0, Length: 19, Exceptions: 0)

This only means that the reported length was different in each response, which on dynamic pages may be entirely expected.

Others

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


http://www.pushtotest.com/

Unsorted

ab/ab2 [3]

httperf[4]

siege[5]

hammerhead[6]

http_load[7]

web_bench[8]

See also