Webpage performance notes


This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


Lower level underlying mechanisms (client and server)

Protocol level

Persistent connections, a.k.a. keepalive

For context, HTTP 1.0 originally did just one fetch per connection, meaning a separate connection for each script, image, and whatnot, after which the server always closed the connection. The extra time adds up, more so for HTTPS.

Persistent connections, a.k.a. keep-alive connections, are an HTTP 1.1 feature that lets you keep a connection open and do requests and responses back-to-back.

So still sequential only, but that saves you the overhead of connection setup for each request. This makes sense when a client will be doing a bunch in quick succession - such as the initial load of most web pages.


Actually, keepalive was also tacked onto many an HTTP 1.0 server, but an HTTP 1.0 request/client has to ask for it using Connection: Keep-Alive. If the server sends back the same header in the response, it is signaling that it supports keepalive, agrees to use it, and will leave the connection open. If not, then not.



In HTTP 1.1, all connections are keepalive by default.


In terms of specs, it closes only when(verify):

  • the client specifically asks for the connection to be closed
which a client does using the Connection: close request header
(and should only do when it won't need another request for a while)
  • the server sends a Connection: close response header.
It may do so
when it cannot know content length ahead of sending the response
if it chooses not to support keepalive (while still being otherwise HTTP 1.1 compliant, which is probably rare)


At the same time, clients should always be prepared to open a new connection for the further requests they need.

There are a handful of reasons that at no point is there a guarantee a connection will actually stay open, or for how long it will stay open while idle (...though in practice the initial load of a webpage will often be so fast that it typically ends up reusing a few connections).

Servers typically close keepalive connections when they've idled for a short while. I've found figures for apache and nginx like 5 or 15 seconds, and IIS 2 minutes(verify), possibly longer when they listen to Keep-Alive headers(verify).
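
As a minimal sketch of what connection reuse looks like from the client side (Python's standard-library http.client here; the host and paths are placeholders), note that the connection is set up once and then used for several back-to-back requests:

# Sketch: several sequential requests over one persistent (keep-alive) connection.
# Assumes the server honours keep-alive; a real client must also be prepared for
# the server closing the connection at any point (as noted above).
import http.client

conn = http.client.HTTPConnection("example.com")    # placeholder host
for path in ("/", "/style.css", "/logo.png"):        # placeholder paths
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read()     # read the full body before issuing the next request
    print(path, resp.status, len(body))
conn.close()

The point is only that the TCP (and possibly TLS) setup happens once; the requests themselves are still strictly sequential on that one connection.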


See also:

HTTP1.1 Pipelining

tl;dr: Nice idea, but not useful this time 'round.


Pipelining is an option (on top of a persistent connection) where further requests can be sent without having to wait for the previous response.

This means that instead of keepalive, which is

request, wait, readresponse,
request, wait, readresponse,
request, wait, readresponse,

you can do e.g.

request request request,
wait,
readresponse readresponse readresponse


For example, you could request all images for a page at once. In ideal cases, this means that instead of getting the latency for every request, you get it roughly once, because you don't have to wait for one response before you can do the next request, and the responses are likely to arrive relatively back-to-back.

Best case: the client spends less walltime waiting, and doesn't need multiple connections for multiple documents.

Bad case: because things must be returned in request order, one resource can hold up all others, in which case it would be no better (and potentially worse) than just persistent connections.


Average case varies, though is often at least a little better.

However, too many HTTP proxies and HTTP servers had (or have) pipelining-related bugs, so most browsers have disabled it for HTTP 1.1, in hope of better luck with whatever replaces it.


Technically HTTP1.1 requires pipelining to be implemented in servers(verify), but clients should not assume it is there.

Pipelining is not negotiated. Clients can try to pipeline their requests on a persistent connection. If they do, they must be able to detect failure, and be prepared to silently fall back to a new connection and not pipeline on that. (There are some further limitations. See e.g. RFC 2616, section 8.1.2.2).

...which means that trying to pipeline on a server that doesn't support it is initially slightly slower than not trying, which may be a reason various browsers won't try(verify). (Others are that there have been pipelining issues in various proxies and a few servers. As such, various browsers support it but have it disabled by default.)


Notes for webdevs:

  • will only work on persistent connections, which means making those work first
  • The responses must still arrive in the same order as the requests, so the front of that queue could block the rest. Sometimes it pays to try to control the order clients do those requests in.
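
Purely to illustrate the mechanism (not something to imitate, given the compatibility notes above), a raw-socket sketch in Python: both requests are written before any response is read, and the second one asks for Connection: close so that reading until EOF is a valid way to collect both responses. The host and paths are placeholders.

# Sketch only: two pipelined requests on one connection.
# Real clients must be able to detect servers/proxies that mishandle this.
import socket

host = "example.com"   # placeholder

req1 = f"GET / HTTP/1.1\r\nHost: {host}\r\n\r\n"
req2 = f"GET /favicon.ico HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"

with socket.create_connection((host, 80)) as sock:
    sock.sendall(req1.encode() + req2.encode())   # both requests sent before reading anything
    data = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:                             # server closes after the second response
            break
        data += chunk

# Both responses are now in data, back-to-back, in request order.
print(data.split(b"\r\n", 1)[0])                  # status line of the first response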

On persistent connections - ends of bodies, Content-Length, and chunking

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

With one-request-per-connection connections, clients can take end-of-stream as meaning end-of-body.

With persistent connections and pipelining (HTTP 1.1), that approach won't work - the client must in some way be able to tell responses apart.


As a result, an HTTP 1.1 server will close a connection if it knows the client cannot tell where the response ends, so a persistent connection is only possible when the client can know when a response is over.


Persistent connections are possible for, mainly:

  • responses defined to never have bodies, e.g.(verify) 1xx, 204, 304
  • bodies with Content-Length header
means the receiving end can read the next that-many bytes and know that was exactly the response
On static content this is fairly trivial (and servers may do this for you), on dynamic pages you need to pay more attention.
  • bodies using Transfer-encoding: chunked (see RFC 2616, section 3.6.1)
means that the response body will be sent as a sequence of:
length as a hex string (optionally followed by a semicolon and some parameters, but none are standardized yet, so in practice you probably won't see that(verify))
CRLF
that amount of bytes
CRLF
a finished response is marked by a zero-length chunk - which means you don't need the content-length to know where the content ends.

An example stolen from here

4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0\r\n
\r\n
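
To make the framing concrete, a small Python sketch (chunk sizes here are arbitrary) that chunk-encodes a byte string the same way:

# Sketch: chunk-encode a body the way Transfer-Encoding: chunked describes.
def chunked(body: bytes, chunk_size: int = 5) -> bytes:
    out = b""
    for i in range(0, len(body), chunk_size):
        piece = body[i:i + chunk_size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"   # hex length, CRLF, data, CRLF
    out += b"0\r\n\r\n"                                   # zero-length chunk marks the end
    return out

print(chunked(b"Wikipedia in\r\n\r\nchunks."))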


Notes:

  • assume that very small chunks are inefficient, because servers may send them in individual calls, rather than combine/buffer them
  • There are no hard constraints on the chunk sizes(verify)
  • When using chunked transfers, a server can tell the client that it is delaying certain headers until the trailer after the chunked body - which is useful to delay things like Content-Length and Content-MD5 (meaning a server can send huge amounts of data and calculate both of those while sending).
There are a number of restrictions on delayed headers, see e.g. RFC 2616, section 14.40.
  • All HTTP 1.1 implementations (servers and clients) are required to understand chunked transfer-coding. Servers almost always do, but not all clients do (particularly those that are mostly HTTP1.0 with a few HTTP 1.1 features).
Servers can decide by themselves whether to send a message chunked or not.
  • Clients can make chunked requests, though this is rarely very useful (verify)


See also:


HTTP connection limit

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

HTTP clients are expected to limit the amount of connections they make to any one server.

This was originally mostly to avoid congestion on the client side, because dial-up could only carry a few full-sized IP packets per second anyway, meaning they would see no improvement from more connections, and easily see degradation.


Many servers cannot impose such a limit per client, and effectively rely on RFC-observing clients being nice, or rely on QoS style things at lower level(verify). Some can but may not do so by default(verify).

Servers will often limit the amount of requests they handle at once overall (amount of workers), mainly to limit resource use (memory, CPU), but this is effectively an overall measure.


Types and amounts

RFC2616 (The HTTP 1.1 standard) mentions:

  • clients should not use more than two (persistent) connections to a server
  • HTTP 1.1 servers should not try to solve temporary overloads by closing connections. This can cause problems in itself, and relying on TCP flow control often works better.
  • a proxy should use up to (2*users) connections to a server or other proxy


In practice, these numbers were increased somewhat in the twenty years since (more resources per page, more async requests), but not by much (values seem to be 4 or 6), in part because persistent connections have more effect than this.

It's an ongoing discussion, but there is general consensus that smarter alternatives beat simply opening lots of connections.


When browsers follow the above hints, the congestion is often the client waiting on itself (for more connections to open up).

Note that a server responding more slowly (due to being loaded) effectively slows the client request rate a little, a sort of backpressure effect - a side effect which arguably works out as a feature.


Browsers may also have a global connection limit(verify)


Further notes

Note that other parts of the network stack also play into this. If incoming TCP connections pile in faster than they can be accepted by a process, they are placed in an OS queue, and when that queue is full, further connections are rejected at TCP level.


It seems the browser limit is typically per hostname, not per IP address(verify), which means a server side can use some basic DNS and/or vhost tricks to get a browser to make a few more connections to what is effectively the same server/proxy/load balancer (but beware of making caching much less efficient in the process - you often want to distribute rather than duplicate)


Note that if you use a (non-transparent) HTTP proxy, the proxy effectively is the server as far as the client is concerned - which makes the per-server limit the overall limit. (verify)


http://www.openajax.org/runtime/wiki/The_Two_HTTP_Connection_Limit_Issue#Detailed_write-up
http://www.ajaxperformance.com/2006/12/18/circumventing-browser-connection-limits-for-fun-and-profit/

HTTP/2, QUIC, and HTTP/3

the tl;dr is that

  • HTTP/2 and HTTP/3 aim to have exactly the same semantics as HTTP/1.x, so they can be drop-in replacements, while being a different, better transport at byte level
  • HTTP/2
adopts some extensions and workarounds typically seen in HTTP/1.1, and makes them standard
adds request/response multiplexing - basically a better variant of pipelining (e.g. no head-of-line blocking - except under packet loss)
adds server push
For more details, see HTTP notes#HTTP/2
  • QUIC is an always-encrypted transport that is general-purpose (though HTTP/2-like)
implemented over UDP, though acting more like TCP
for more details, see HTTP notes#QUIC
  • HTTP/3
roughly "HTTP/2 over QUIC"
seems to improve some edge cases over HTTP/2 (like that in HTTP/2, head-of-line blocking could still happen under packet loss)
for more details, see HTTP notes#HTTP/3


Caching

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Upsides:

  • Saves bandwidth from the origin server
  • May lead to slightly-to-noticeably faster page loads, depending on how you do it, and on how many external resources need to be loaded.
In many cases, most time until the page is considered loaded is spent on external resources (css, scripts, and images).
For some things you can avoid the request entirely
For larger things a request to check may still be necessary, yet you can avoid some of the actual transfer
  • Having proxies (transparent or not) closer to the browsers can help
provided the origin server lets them cache content.
It makes most sense for static stuff.


Arguables:

  • It can make sense to split things into specifically cacheable units
but if they're very small, most time is spent in the network, not in actual data transfer
consider e.g. ten 1KB icons, versus ten 304 responses meaning you can use the cache - it won't be a noticeable difference in speed (the sprite trick would do more)
you would see more improvement if you do unconditional caching, but that may not be practical.
bundling can work for or against you
if it's all static, and most content will be touched frequently enough, then sprites for (interface) images, combining compilers for js and css make sense
yet if content is more dynamic, bundling means the whole bundle gets invalidated much more quickly
  • dynamic stuff is often barely cacheable


Some basic things relevant to understanding caching:

  • Private cache mostly means the browser cache. Public cache means caching somewhere between browser and origin server (proxy caches, transparent caches)
  • headers control how private and/or public caches ought to act: whether they may cache it, how long, what to base conditional requests on (if applicable)
allowing proxies to cache data from the origin server means the data can come from a server closer than the origin server - which reduces unnecessary transfer on the 'net (latency may be better too, e.g. if it means an ocean more or less)
allowing browsers to cache data from the origin server means less transfer from the server. Avoiding requests at all also tends to be noticeably faster (particularly Expires is a powerful tool)


  • when / how requests are done:
    • requested each time (no caching). Makes sense on some dynamic pages that change a lot and/or have to be as recent as possible.
    • conditional requests - ('if changed from what I have'). If it has changed, it gets the new data in response. If it has not changed, the server sends a tiny response that lets the browser know it can use the copy it has in cache.
      • Largely about saving bandwidth, so is nice primarily for large-ish resources.
      • Has less effect on latency / loading speed, so not as useful when loading many small resources.
      • Web servers usually serve static files in ways to allow conditional caching
      • By date: "Send me content only if it has changed since (specific date), otherwise give me a body-less 304 Not Modified response" (a small sketch of this exchange follows after this list)
        • Content is served with a Last-Modified header mentioning a date. The browser will remember that date and ask the server If-Modified-Since with that date.
        • With static file sharing, it's easy enough to use the file's modification time - the server will need no data store, only to stat() the file. (In modern dynamic content sites, you may want to use the Etag system instead.)
    • unconditional caching - no request as long as cache entry is present and not stale
      • mostly the use of Expires: (see the section below)
      • Tells the browser that until some later time, the content will not change at all and the browser need not check cached content (once the date passes the content is considered stale, and other things determine whether it then does a conditional or unconditional request)
      • useful to avoid 'no change' responses when you have many small resources (such as scripts and images)
      • This information cannot be revoked or changed, since the browser won't contact the server until expired (or the cache is cleared), so this is problematic for things that might sometimes change, particularly when they are related to each other, such as scripts and images from the same page style - some people might see a half-changed and possibly even a broken interface. (Note it's easy enough to work around: refer to the new content by different URLs.)
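
A small sketch of the date-based conditional request described above (Python standard library only; the URL is a placeholder): the first fetch remembers Last-Modified, the second sends it back as If-Modified-Since and may get a body-less 304.

# Sketch: date-based revalidation (Last-Modified / If-Modified-Since / 304).
import urllib.request
import urllib.error

url = "http://example.com/logo.png"   # placeholder

first = urllib.request.urlopen(url)
cached_body = first.read()
last_modified = first.headers.get("Last-Modified")

if last_modified:
    req = urllib.request.Request(url, headers={"If-Modified-Since": last_modified})
    try:
        second = urllib.request.urlopen(req)
        cached_body = second.read()      # it changed: take the new body
    except urllib.error.HTTPError as e:
        if e.code == 304:
            pass                         # unchanged: keep using cached_body
        else:
            raise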



Further notes:

  • When testing cache behaviour, don't use reload
In many browsers, hitting reload means 'use cache but consider it stale'. If you use Expires, you'll see 304s where you expect no request at all.
If you want to simulate 'user comes back later', hit enter in the address bar (or open the URL in a new window or tab). (Note: Ctrl-L goes to the address bar in many browsers)
  • Proxy caches are fairly unlikely to cache responses for POST requests. You may want to consider this in your site design.
  • Things still present in the browser's recent history may be served from its local cache or memory, even if the cache logic would suggest it check with the server. Can apply to use of the back button, and to history.
  • Keep in mind there is some variation in UA behaviour. And there were some bugs (e.g. older IE had some known problems).


  • Developer tools may not show things as you expect. Spend a little time learning to read them.


HTTP 1.0 and HTTP 1.1

Different headers apply to HTTP 1.0 and HTTP 1.1. While web browsers are mostly compliant to 1.1, other UAs may be compliant only to 1.0.

The HTTP 1.0 specs had a few cases with no defined behaviour, which has led to some creative (and hard to predict) behaviour in clients and proxies. In part, HTTP1.1 is simply better defined (e.g. 13.2, Expiration Mechanisms), and it also has some more powerful features.

It seems that developers aren't always clear on what mechanisms should be used, so it helps to read the specs and various summaries out there.


To summarize the mechanisms, HTTP 1.0 has:

  • Last-Modified, and If-Modified-Since: - conditional requests based on time (usually file modification time)
  • Expires: - unconditional caching
  • Pragma: no-cache, forcing the request to go to the origin server, and not from a cache (such as Squid)


HTTP 1.1 has:

  • Last-Modified: and If-Modified-Since: (like HTTP 1.0)
  • Expires: (like 1.0)
  • Cache-Control:, which allows origin server and browser (and caches) to specify rules that apply to public and private caches, including:
    • max-age, in seconds in the future from now. Useful as a relative time measure instead of the absolute-date Expires header - but is considered more of a hint(verify) (unlike Expires)
    • expiration changes
    • how important stale-based re-checks are to serving cached content
    • have a public/private(/no-cache) distinction. Public means it may be cached on transparent caches, private means only the user's browser may cache it, and no-cache (as before) that it must not be cached at all.
  • ETag ('entity tag') system, which lets you do conditional caching based on content identifiers rather than based on time


HTTPS

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

tl;dr:

  • cacheing proxies don't cache HTTPS (unless you MITM it)
  • endpoints can cache just as much
  • make sure your cacheing headers are correct


Intuitively, you might think that things that are secure should not be cached.


Yes and no.

There used to be a good argument that the content you want to use HTTPS on is going to be personal and therefore dynamic content, and you will be setting headers so that this is only endpoint-cacheable and not proxy-cacheable.


This is less true now that Chrome has pushed everyone to consider HTTPS, because that basically means cacheing proxies don't really work anymore -- because proxying and secure transport are fundamentally at odds.

Unless you specifically MITM them. Which makes sense in business/university where your admins have control over workstation certificates and their proxy settings. And possibly within some other well-controlled structures, e.g. within CDNs.


Note that to browsers, little has changed. Endpoint-cacheable content is just as cacheable.

(HTTPS is a secure transport. The browser delivers the decrypted content to itself, and the transport has no direct bearing on cacheing logic)


Disabling cache

Sometimes you want to make sure that generated content always comes from the server.

To make sure you are disabling the cache in the face of HTTP 1.1, HTTP 1.0, and older HTTP 1.0-only proxies, you'll probably want the server to use the following headers:

  • Cache-control: no-cache - a HTTP 1.1 header, which works on current browsers and decent proxy software
  • Pragma: no-cache for HTTP 1.0 browser and HTTP 1.0 proxies - but various things do not honour Pragma, so you usually want:
  • Expires: (date in the past) Giving an Expires value of 0 or an invalid date should be interpreted as immediate expiration. It's slightly safer to just pick a valid date squarely in the past, though.
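
For illustration, a tiny handler that sends those three headers (Python's http.server used purely as an example; any framework has its own way to set response headers):

# Sketch: a response that should not be cached by HTTP 1.0 or HTTP 1.1 caches.
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoCacheHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"generated at request time\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Cache-Control", "no-cache")                 # HTTP 1.1
        self.send_header("Pragma", "no-cache")                        # HTTP 1.0
        self.send_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")  # squarely in the past
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), NoCacheHandler).serve_forever()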


TODO: check

  • Apparently, Pragma: no-cache is invalid/deprecated in responses, but not yet in requests?


Expires (unconditional caching)

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Expires means the server tells the browser that "until [date], you should not check cache-freshness with me at all, not even with a conditional request"

Expires is most useful for anything that will never change and be requested somewhat regularly by the same UA or proxy.

For public-cache content, this also means it can be cached closer to home, so it also relieves the origin server from doing quite as much work.


Not very useful for content that is expected to change every now and then (basically at all), because of the granularity.

(If you want to change the theme images & CSS for a page, you can always have your HTML refer to new names for everything -- if not, you'll have the problem of frequent visitors seeing the old theme for a while (or even mixed or broken content).)


As RFC2616 notes

  • "All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception"
and you can take that to mean UTC
  • HTTP has historically allowed RFC 822 / 1123, RFC 850 / 1036, and ANSI C asctime (see also Computer_dates_and_times#RFC_822.2F1233)
    • the first is preferable
      • in strftime terms: "%a, %d %b %Y %H:%M:%S GMT"
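
For reference, producing such a date from Python: email.utils.formatdate(usegmt=True) emits the preferred RFC 1123 form.

# Produce an Expires value one day in the future, in the preferred RFC 1123 form.
import time
from email.utils import formatdate

expires = formatdate(time.time() + 86400, usegmt=True)
print("Expires:", expires)    # e.g. Expires: Sun, 06 Nov 1994 08:49:37 GMT (same shape, not this date)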



It can make sense to mix Expires with other mechanisms. For example, for people that are actively clicking around on your website, even an 'access time plus 3 minutes' Expires will make repeat loads of some resources low-latency (lower than a lot of If-Modified-Since/304, since that's a network interaction for each item). (...though both sides' clocks must be set accurately for this to work on a few-minute scale, and not be timezone-ignorant)


However, as https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Expires notes, the presence of Cache-Control may make UAs ignore Expires

Vary

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Vary tells proxy caches which parts of a request they should consider when determining whether two requests are identical - which they do to decide when a request cannot be served from cache and must come from the origin server.


Reasons to do so include

  • With dynamic sites, it happens more often that the same URL shows different things for different people - often because of personalized content (/home, /friends)
could e.g. vary on cookies - different cookies imply different users
  • for translated versions of the same content at the same URL
(meaning you're using Accept-Language -- which a lot of sites don't)
  • sometimes technical reasons (such as page compression)


Caches may choose to cache each version individually, or not cache them at all, so to get the most out of caching (...proxies) you generally want to vary on as little as possible so that the cache applies exactly as often as it usefully can.


The value of this field is a case-insensitive, comma-separated list.

The entries are usually names of request-header fields, though the RFC notes they are not limited to the standard request-header fields it defines.


Some of the more commonly used things (some usually in combinations):

  • User-Agent
    • has been used to sort out agents that support compression from those that don't
    • now particularly for mobile content
    • note that there are *many* specific user agent strings, and this will store one for each. It won't necessarily save much bandwidth to have dozens of variations of each page
  • Accept-Encoding - e.g. used by mod_deflate, to help ensure that deflated content won't be sent to agents that don't understand it
  • Cookie
    • on personal pages presented under the same URL as others (like /home)
    • perhaps for pages that use cookies to significantly change its presentation
  • accept-language - use if you actually serve different pages for different languages
  • accept-charset - regularly combined with accept-language


Notes:

  • It seems that modern browsers will generally not cache anything that has Vary set to anything other than User-Agent: (verify)
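
One way to picture what Vary means to a shared cache (a rough sketch, not how any particular cache is implemented): the varied request headers become part of the cache key, so requests that differ in any of them get separate entries.

# Sketch: building a cache lookup key for a response that carried a Vary header.
def cache_key(url, request_headers, vary_value):
    names = [n.strip().lower() for n in vary_value.split(",") if n.strip()]
    headers = {k.lower(): v for k, v in request_headers.items()}   # header names are case-insensitive
    return (url,) + tuple((n, headers.get(n, "")) for n in sorted(names))

# Requests differing only in Accept-Encoding end up under different keys:
print(cache_key("http://example.com/", {"Accept-Encoding": "gzip"}, "Accept-Encoding"))
print(cache_key("http://example.com/", {}, "Accept-Encoding"))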

Cache-Control

Example:

Cache-Control: max-age=3600, must-revalidate


An HTTP 1.1 header.

Requests and responses can contain this header, with different meanings (and with different parts).

Note that

'private cache' generally means browser's cache, and
'shared cache' usually means 'proxy cache',

and that there is sometimes a behavioural difference between a browser's disk cache and memory cache.


Responses (which are the things caches intercept and store, i.e. what came from the origin server) can contain:

  • private - may be stored in a private/browser cache but not in a shared/proxy cache
  • public - may be cached in a shared and in a private cache
  • no-cache - content may be stored, but must be checked for freshness before local content is served (that is, caches must never return stale cached requests even if they are configured to do so). Useful for public content that requires authentication.
  • no-store - may not be stored in even the browser's cache
  • no-transform - tell proxies not to change the data (such as recompressing images for space)
  • must-revalidate - Force strict obeying of your values (without this, HTTP allows agents to take liberties with values such as max-age and Expires when evaluating freshness)
  • proxy-revalidate - Make proxies strict this way, but allow browsers to take more liberties
  • max-age - the maximum time something should be kept in a cache (in seconds)
  • s-maxage - like max-age, but applies only to shared caches (like an override for them)
  • ...and possible extensions


Requests can use

  • no-cache
  • no-store
  • max-age
  • max-stale
  • min-fresh
  • no-transform
  • only-if-cached - Apparently used among sibling proxies, to synchronize content without causing origin requests(verify)
  • ...and possible extensions (all values not mentioned here are ignored if not understood - allowing values belonging to specific extensions to be ignorable)


Pragma

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


ETag

Etag ('entity tag') allows a modification check system based not on date/time but on some chosen identifier, one that is meant to be representative of whether the file has changed -- possibly a hash of the contents, but there are often resource-cheaper alternatives.


An Etag-aware client can choose to remember received Etags, and next time ask the server "I have the version you tacked this particular identifier on. Is that the version you would serve to me now, or do you want to give me another one?"

Or rather, you can make a request conditional on the ETag not matching, or on it matching:

  • If-None-Match: something - often used for 'give me content only if it has changed since you handed me this identifier'
  • If-Match: something


Many web servers now automatically create ETags as part of their static file serving, based on something easy to reproduce; apache2 uses "inode-size-mtime" (see also FileETag), IIS bases it on mtime and an internal config change counter. This makes Etags unique per host at best, which means that if you have more than one server sharing the load of serving the same content, you need to tweak the Etag system.

When doing dynamic content generation, it's fairly easy to write your own Etag system, and frameworks may do most of the work for you. Exactly how you generate the identifier is up to you. Sometimes a content hash makes sense - but if that means IO and CPU on each access, it can make sense to check against a database, perhaps making filenames the hash, possibly memcache it, or some other trick that makes it a simple and fast read-out (preferably without IO in the case of 'no change').
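
A sketch of what that can look like when you do it yourself (Python; the response object here is a made-up placeholder, and real If-None-Match handling - lists of tags, weak validators - is more involved):

# Sketch: ETag from size+mtime (much like apache's default), plus the If-None-Match check.
import os

def make_etag(path):
    st = os.stat(path)
    return '"%x-%x"' % (st.st_size, int(st.st_mtime))

def respond(path, request_headers, response):      # 'response' is a hypothetical object
    etag = make_etag(path)
    response.headers["ETag"] = etag
    if request_headers.get("If-None-Match") == etag:
        response.status = 304                       # no body; the client keeps its copy
    else:
        response.status = 200
        with open(path, "rb") as f:
            response.write(f.read())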


You can combine Etag with byte-range operations. That is, instead of Range, you can use If-Range, which allows uses like "send me parts that I am missing, but if things have changed, send me the whole new version" in a single request.

You could even use Etag for conditional execution, particularly to have rules about things that have side effects (PUT, GET with database access, etc.). Some HTTP-based protocols use it this way(verify).


See also:


browser bugs

See Browser eccentricities#Caching bugs


Cacheing in apache

mod_expires

In apache, you can use mod_expires, which allows you to set a minimum time in cache (or some time since last file change).

You can have settings at server (not advised!), vhost (if you know what you're doing), and directory/htaccess level, and can set it per MIME type - and practically also per extension.


Besides the inherent overkill behaviour of the Expires header, there seem to be a few gotchas:

  • It seems to apply to all content regardless of whether the source was dynamic or not, which is bad on dynamic sites.
  • It does not interact with other cache headers, which is regularly also not what you want.
  • Server-level ExpiresByType overrides more specific (e.g. directory-level, FilesMatch) ExpiresDefault. This is one reason you shouldn't set things at server level even when you're not using vhosts.


Example:

ExpiresActive On

#That's the shorthand form for 'access plus 0 seconds' 
ExpiresDefault A0
#I prefer the longer form used below, as it is more readable. 

<Directory /var/www/foo/>
  ExpiresByType text/css     "modification plus 5 minutes" 
  ExpiresByType image/png    "access plus 1 day"
  ExpiresByType image/jpeg   "access plus 1 day"
  ExpiresByType image/gif    "access plus 1 day"
  ExpiresByType image/x-icon "access plus 1 month"
</Directory>

<Directory /var/www/foo/static>
  ExpiresByType image/png    "access plus 1 day"
  ExpiresByType image/jpeg   "access plus 1 day"
  ExpiresByType image/gif    "access plus 1 day"
  <FilesMatch "\.(xm|jp2|mp3)$">
    ExpiresDefault "access plus 3 months"   
    # or larger. Browser caches will have likely forgotten it anyway, and chances are so will public caches.
  </FilesMatch>
</Directory> 

<Directory /var/www/foo/weeklycolumn>
  ExpiresDefault "modification plus 6 days"
  # This is a *file* based timeout, independent of when it was accessed. 
  # Beyond that specific future time the agent will *always* check, 
  # so this is most useful for data that actually changes regularly
  #  If this were 'access', clients might not check until, 
  # in the worst case, six days after you changed the page.
</Directory>

Notes:

  • To be compatible with servers that don't have the module, always wrap in a module test, <IfModule mod_expires.c> and </IfModule>.
  • know the difference between 'access' and 'modification' - it's not a subtle one.
  • Be conservative and don't use Expires as your only caching mechanism. Clients will fall back to If-Modified-Since anyway (and if they don't, that is the mechanism you should be focusing on) so you're basically setting the interval of real checks.
  • Things like styles and scripts should not have long expire times - old styles will apply to previous visitors for a while after you may have changed them completely (unless of course you use new filenames for each).


Manual apache statements

mod_expires is so basic that it can only set Expires, no other cache control headers.

In some cases, you may want to abuse mod_headers, for example:

<FilesMatch "\.(html|htm|php)$">
  Header set Cache-Control "max-age=60, private, proxy-revalidate"
</FilesMatch>
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
  Header set Cache-Control "max-age=604800, public"
</FilesMatch>

Note that Cache-Control is a HTTP1.1 header


Compression

HTTP compression will often easily reduce HTML, CSS, and javascript to 20-40% of its original size, depending on the method of compression (gzip and deflate/zlib) and the content.


Browser rendering speed gains are negligible unless the data is relatively large or the client is on a low-bandwidth connection, but the reduced bandwidth use is useful, even when only in terms of server bandwidth bills.


Gotchas:

  • IE6 and previous never cache compressed pages (yes, this is a stupid bug). Whenever there is repeat downloading of fairly small files, caching is more important than compressing (to both sides). This basically means that you never want to send compressed content to IE, so if you want to use compression you may want some browser-specific behaviour. Ugh.
  • IE (versions?(verify)) may decide that compressed error pages are too small to be real(verify), and decide to show its own. You may want to avoid compressing these.


Notes:

  • In some implementations gzipping implies that the document can only be delivered as a whole (and not shown incrementally in the browser as it is downloaded). In other implementations, gzipped delivery can happen in chunks.
  • If you code compression yourself, you should check the Accept-Encoding: header for which compression format, if any, the browser will understand in a response. (HTTP1.1 clients technically must support it, but simpler ones may not. In HTTP1.0 it was optional)
  • Compressing small files is often not useful at all; trying to compress 500 or so bytes of output is rarely really worth the CPU time spent on it.


Compression in apache

mod_deflate

In apache, mod_deflate is implemented as a transparent output filter and likely to be installed but not enabled.

Check that there is a line like the following in your apache config:

 LoadModule deflate_module /usr/lib/apache2/modules/mod_deflate.so


Perhaps the simplest way to use it is to apply it to a few specific MIME types (whitelist-style), such as:

AddOutputFilterByType DEFLATE text/plain text/css text/javascript 
AddOutputFilterByType DEFLATE text/html application/xml application/xhtml+xml

You could set these globally if you wish.


The module listens to environment options like no-gzip and dont-vary. This allows 'enable globally, disable for specific things' (blacklist-style) logic:

SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI \.(?:png|jp2|jpe?g|jpeg?|gif)$  no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:t?gz|bz2|zip|rar|7z|sit)$  no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$                          no-gzip dont-vary


Since apache can set environment based on various tests, you can also use this behaviour to disable compression for IE (which you usually want), and probably want to do in global apache config. It seems everyone copy-pastes from the apache documentation:

BrowserMatch ^Mozilla/4         gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSI[E]           !no-gzip !gzip-only-text/html
# The bracketed E there is a fix to a past apache parse bug.
 
# Tells proxies to cache separately for each browser
Header append Vary User-Agent   env=!dont-vary
# This varies everything for user-agent by default unless dont-vary is set,
# which you can set on content you know it won't matter, for example
# when you won't compress it.

Notes:

  • can be set in server, vhost, directory, and .htaccess
  • You can also tweak the compression-ratio-versus-resources tradeoff via the DeflateCompressionLevel directive.
  • It seems some browsers have problems with compressed external javascript specifically when it is included from the body section of a document, not the head. Something to keep in mind (and (verify) and detail here).
  • You can get apache to log the compression rates, to see how much it's helping. See [1] or [2] for details
mod_gzip
This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

(section very unfinished)

mod_gzip works in a similar way to mod_deflate

 <IfModule mod_gzip.c> 
   mod_gzip_on  Yes      
   #Why?:
   mod_gzip_dechunk yes  

   #What to use it on: (example)
   mod_gzip_item_exclude file "\.css$"  
   mod_gzip_item_exclude file "\.js$"
   mod_gzip_item_include file \.htm$
   mod_gzip_item_include file \.html$
   mod_gzip_item_include mime ^text/.*
   mod_gzip_item_exclude file "\.wml$"
 </IfModule>

It has some extra features, such as checking for an already-compressed version (.gz on disk) when doing static file serving, and being more configurable.


(Semi-)Manually

PHP filter

The sections above apply to specific types of static files - well, depending on how they are configured. They can be used to handle PHP's output as well, but you may want to do it in PHP (Support for compression was added around 4.0.4). Doing it in PHP can be a little more work, but it can be smarter about output chunking, and you can do it selectively the way you control.


If zlib is not compiled in, PHP will ignore you silently.


In practice you probably don't want to set it globally, but do it selectively via apache config or .htaccess, often per directory (or even for specific scripts, using Files or FilesMatch). When PHP is compiled in, apache has the directives php_value and php_flag which let you control this:

php_flag zlib.output_compression On
# When you give a size (Note: using php_value, not php_flag),
# you enable it and also set the output buffer size (default is 4KB):
php_value zlib.output_compression 2048

# Optional:
php_value zlib.output_compression_level 3
#Default seems to be 6, which is relatively heavy on CPU. 3 is lighter and decent. 
# Even 1 will be a noticeable improvement on most text.

Notes:

  • The documentation says you can use ini_set() to enable "zlib.output_compression", but this seems to apply to few PHP versions(verify). It is non-ideal in other ways: you can't seem to ini_set() the compression_level(verify).

Also, if a higher level setting caused a script to compress, you can disable compression with ini_set(), but it will still use output buffering - even when you set explicit flushing.


Writing gzip from your own code

Check whether you can:

Supporting browsers will send a header like:

Accept-Encoding: gzip
Accept-Encoding: gzip, deflate

Some old browsers, like Netscape 4, have bugs and effectively lie about what they support - you'll want to test for them and not send them compressed content.


Signal that you are:

When you decide to use one of the advertized methods of compression, tell the browser about it, probably using:

Content-Encoding: gzip

There is also a Transfer-Encoding. The difference is largely semantic; the idea seems to be that Content-Encoding signals the data is meant to be a .gz file, while Transfer-Encoding states it's just about transfer - such as compressing (static or dynamic) HTML to save bandwidth. ((verify) whether both are well supported)

In practice, Content-Encoding serves both purposes; there is little difference other than choices the browser may make based on this -- but things such as 'whether to save or display' are usually controlled by headers like the Content-Type response header.


Do it:

This is mostly just doing what you are saying. Note that the Content-Length header should report the size of the compressed data.

Pseudocode (for gzip only):

import gzip

if 'gzip' in request.headers.get('Accept-Encoding', ''):   # request/response are framework placeholders
    gzip_data = gzip.compress(output_data)
    response.headers["Content-Encoding"] = 'gzip'
    response.headers["Content-Length"]   = str(len(gzip_data))
    response.write(gzip_data)
else:
    # serve headers and (uncompressed) data as usual
    response.headers["Content-Length"] = str(len(output_data))
    response.write(output_data)


Chunked output involves telling Transfer-Encoding: chunked (something all HTTP1.1 agents must support), then writing fairly self-contained chunks (but I'm not sure about the details, either without or with compression)



Client bandwidth

Server side

Divide and conquer: offloading cacheing, spreading, balancing, etc.

Host connection limits

Hosting elsewhere (CDNs mainly), static/dynamic split

When part of your content is static (images, and also css, and scripting) there is some value in it being fetched from a server other than the one already busy enough with dynamic pages.

Options include using

  • a server for just static content (There are also some tricks you can use to lessen disk IO and lower response latency in such a static server)
  • using a CDN (same thing, functionally, but management is different)
  • using something like nginx (to handle just these requests) in front


If the browser now connects to more hosts than before, it may now be using more connections (its own per-host connection limit, now with more hosts) and pages can load a little faster. It can pay to split page layout (html,script,js) separately from media(verify).

This does not necessarily make that much latency difference if the browser was already pipelining, but it can't hurt.


In theory, some cookies can also be avoided - if a few KB of cookies apply to each request to a dynamic host, that adds up.


Splitting static and dynamic content may also make your management of Vary and other cache directives simpler (since static content is non-Vary, Cache-Control public).




Nginx notes

See also:

Higher level considerations

Speed, perceived and real

reflow and redraw

The render tree basically refers to the combination of the DOM and CSSOM - which are basically the parsed in-memory form of HTML and CSS.


But more practically, we primarily care about:

  • browser reflow/relayout
means something changed size and/or position, which due to the hierarchical relations tends to mean many/most/all of the page needs to be moved/resized (and then redrawn) as well
considered bad to do unnecessarily, particularly if it's a number of times on initial page load, because as far as humans care things are done when they stop jerking around.
can be expensive on pages with complex structure (where it might easily take over ~100ms)
also usability-wise - consider a page spends ten seconds inserting ads in the text you're trying to read, then maybe doing some modal popovers
  • browser redraw/repaint refers to having to draw parts of a web page again
more often a small part, due to changing the styling on just that part, or animating something to draw attention
this is often a much lighter operation than reflow (and its implied larger redraw)
  • though used less, restyle may refer to CSSOM changes that trigger redraw but not relayout



You can aim to minimize relayouts in a few ways, starting from the observation that, in general,

  • changes to the DOM are likely to cause relayout and redraw
  • Changes to the CSSOM can be designed to only cause redraw
but can cause relayout, particularly when they touch layout properties, more so when things cascade down



During page loads, there is limited point in starting to draw things until the render tree is loaded, in that both added DOM content and changes due to CSS loading are likely to cause not only redraw but also relayout.

As such,

  • combining multiple CSS files into one gets the CSSOM done earlier
  • starting CSS loads earlier (e.g. before JS loads) may get the CSSOM done earlier



These details are actually more relevant these days than in the early web, because we do a lot more stuff client-side, do late loading. We have created an extra phase of loading, really.

While some of that may get something on screen faster -- which is easily perceived as loading faster -- it's an odd tradeoff in that focusing too hard on first contentful paint will more easily lead to setups with more relayouting.

And you can destroy a slight perception of faster loading if that page then spends the next five seconds moving all the shit you're trying to read or click around a couple of times.


So a more perception-based thing is when things stop changing.

  • combine all interface images into a sprite
so that that's one image load, not many, and doesn't block other resource loads
  • delay large image loads
some secondary images can get late-loaded, e.g. via scripting
though large images are often the main thing you're presenting, and are sensibly part of the "thing you're loading"
  • have pre-set widths and heights on all images, so that they won't cause a relayout once they load
ideally on the tag


  • scripting that alters the DOM and/or CSSOM may cause reflow
and consider that most of that tends to only start once the DOM is loaded.
(or, if started earlier, necessarily finishes no earlier than that - if it's doing anything to the DOM)


Other notes:

  • when a lot of things happen in a very short time, browsers are clever enough to not reflow for all of them individually.
  • The above is an above-the-fold view. Non-relayouting (re)paints below the fold are freeish from the perspective of initial load (verify)


JS load ordering

See Javascript_notes_-_syntax_and_behaviour#load_ordering

resource loading hints

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


<link rel="dns-prefetch" href="//img.example.com">
do DNS resolving for a probably-unknown host people may go to next
https://www.w3.org/TR/resource-hints/#dns-prefetch
https://caniuse.com/#feat=link-rel-dns-prefetch


<link rel="preconnect" href="https://cdn.example.com">
Get the DNS, TCP connect, and (if applicable) TLS over with, but don't request
(various browsers guess at this already)
Note there is also a HTTP-header equivalent of this(verify)
there are CORS details
https://www.w3.org/TR/resource-hints/#preconnect
https://caniuse.com/#feat=link-rel-preconnect


<link rel="preload" href="/eg/other.css" as="style">
says "load as soon as you can and with these priorities"
these requests do not affect the onload event, so these are not for your page-central script and style tags (verify)
meant for this navigation
only really useful when you know things to load before the browser does
true for some things hidden in CSS (think fonts, background images) and some JS
there are CORS details
as= includes style, image, font, document (relates to CSP)
https://www.w3.org/TR/preload/
https://w3c.github.io/preload/
https://caniuse.com/#feat=link-rel-preload
Note there is also a HTTP-header equivalent of this


<link rel="modulepreload" href="http://example.com/thing.js"> 
https://caniuse.com/?search=modulepreload
https://html.spec.whatwg.org/multipage/links.html#link-type-modulepreload


<link rel="prefetch" href="http://example.com/thing.ext"> 
says "after you're done, maybe load these things because they're likely needed soon"
more useful for the next navigations than for this one
there are CORS details
https://caniuse.com/#search=prefetch


<link rel="prerender" href="https://example.com/page2.html">
analogous to opening a page in a hidden tab (parses everything, starts scripting)
Which makes it a good thing if you are pretty certain this is the next navigation
and otherwise just a drain on all resources
https://www.w3.org/TR/resource-hints/#prerender
https://caniuse.com/#search=prerender
not widely supported, and chrome seems to actually treat it as a prefetch



See also:

Glossary, further tricks

Requests

Inline content or external requests

Request, size and amount

bundling

Client side

Tools

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Load testing

Notes on benchmarking:

If you are testing a hello-world app, request-per-second rates are almost meaningless.

Because you are only measuring connection setup time. This is unavoidable latency you always get. Yes, you want this to be low, but it usually is.

10000 req/s may look ten times as fast as 1000req/s, but that difference is also just 0.9ms.

If most of your requests are served on that scale, then it's important. But I can guarantee that if you are doing anything interesting, you're probably easily using a few milliseconds, and frequently a few dozen, e.g. talking to a database or disk.

That ten times as fast has now become 10% of your overall problem. More than nothing, but probably less than any other part.

Even when you're making something for throughput, you'll find that most things are IO bound (and, when loaded, cause most things on the server to be), so it is usually much more interesting to relieve IO somehow (e.g. memcache anything you can, add cache headers to have some of the IO be conditional), than it is to look at that last unavoidable less-than-a-millisecond.


And importantly, it is the time your requests spend that is the main thing that caps the request rate you can handle. When you hit that rate, you will get slowness.



ab, ab2

Apache Benchmark, comes with apache.


Useful to check how your concurrency settings are working.

Keep in mind that the precise request rate is probably less meaningful, because you're probably testing a do-little page, and even if you aren't you're probably testing a best-case way (because asking for exactly the same thing over and over relies on caches more than everyday use does).


The most interesting options:

  • -n amount-of-requests: keep it busy for a while, OR:
  • -t seconds: ...to keep it busy for a given amount of seconds, however many requests that is.
  • -c concurrency uses a number of parallel fetchers. Use this to realistically simulate many clients, and see whether they are handled in parallel. (Note that most web servers will limit the amount of concurrent connections to a single IP / client)


Example:

ab -t 10 -c 20 -k http://example.com/

Note that the final slash is significant, as without that you're effectively asking for the redirect page (or, sometimes, a 404). Redirects will not be followed, only retrieved.


Notes on reading results

reading results when concurrency>1
This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Note that when concurrency is involved, request time is not a direct indication of rate.

Consider that ten sequential requests taking 0.1sec at 100% CPU is the same CPU time as 10 concurrent processes taking 1.0sec at 10% CPU each.

Either way, that's 0.1 CPU-seconds for each request, and 10 requests-per-second, but with different amounts of latency, throughput, and scheduling.


For a more concrete example, tests on a simple hello world app (a single, single-threaded process)

...with concurrency 1:

Time per request:       2.657 [ms] (mean)
Time per request:       2.657 [ms] (mean, across all concurrent requests)

...and concurrency 4:

Time per request:       10.950 [ms] (mean)
Time per request:       2.738 [ms] (mean, across all concurrent requests)

Effectively, the first shows an estimation of the wallclock time requests take, the second of CPU time under the assumption that the concurrency is the only factor.

...which is rarely true for dynamic requests. A lot of real-world requests are at least as IO-bound as they are CPU-bound, so you'll probably see the effect of databases, networking, caches, and such - and not really be able to separate their effect.

In this case it's an exact multiple of four, which indicates the four handlers each took four times as long when four were running concurrently - meaning the concurrency works (if they were handled sequentially it would look more like the first case), and it suggests the handler is likely entirely CPU-bound.
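
As a quick check on those figures: the implied request rate is the concurrency divided by the mean time per request, which is the same number as the reciprocal of the 'across all concurrent requests' figure.

# Quick check on the concurrency-4 figures above.
concurrency = 4
mean_time_per_request = 0.010950   # seconds ("mean")
mean_across_all       = 0.002738   # seconds ("mean, across all concurrent requests")

print(concurrency / mean_time_per_request)   # ~365 requests/second
print(1 / mean_across_all)                   # ~365 requests/second - the same thing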


Keep in mind that:

  • multiple-core processing means that you may see very good scaling up to concurrency of 2 or 4 -- assuming the work will be spread among cores
  • ...beyond that you're just dividing CPU time among more concurrent processes, and it does nothing for the average rate of requests
  • A test run from a single client / source IP is almost never a good load test
  • a single IP/client may use and/or be granted only a few connections (web browsers as well as web servers often see 2 or 4), so a single-client test will only test how well a single client is served, and won't stress-test, and won't necessarily be a good indication of expectable request handling rate. (still, many servers have to divide resources above concurrency 2 or 4 anyway, so the difference is not necessarily large)


Ignore 'Failed requests' for dynamic pages

Since ab was written for static pages, it will assume different-sized responses are errors.

For example, a -n 20 might get you:

Complete requests:      20
Failed requests:        19
   (Connect: 0, Length: 19, Exceptions: 0)

This only means that the reported length was different in each response, which on dynamic pages may be entirely expected.

Others

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


http://www.pushtotest.com/

Unsorted

ab/ab2 [3]

httperf[4]

siege[5]

hammerhead[6]

http_load[7]

web_bench[8]

See also