HTTP notes


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Minimal requests

In some cases, such as debugging via netcat, or perhaps doing requests from microcontrollers or other embedded devices, it's useful to do a quick and dirty HTTP request with a minimal implementation on your side (for HTTP 1.0 and a very simple response body, you can get away with just a few strtoks and such).

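For HTTP 1.0, the most basic request is just:

GET / HTTP/1.0

...followed by an empty line, which ends the request headers.

You regularly want to support servers that do named virtual hosts, in which case you also want a Host header:

GET / HTTP/1.0
Host: www.example.com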

This will also work with most HTTP 1.1 servers.

Note that when talking to HTTP 1.1 servers you generally don't want to mention HTTP/1.1 in the request, because you're saying you support various things you probably don't (e.g. chunked transfer encoding and persistent connections). You may be able to get away with it, but you can't count on it.
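As a rough sketch of the minimal request described above, using only Python's standard socket module. The function names (build_request, parse_status_line, fetch) are made up for this example, and the parsing handles only the status line, so treat it as a debugging aid rather than a real client.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request, with a Host header for named virtual hosts."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",  # the empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Split the first line of a raw response into (version, status code, reason)."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, code, *reason = status_line.split(" ", 2)
    return version, int(code), (reason[0] if reason else "")


def fetch(host, path="/", port=80, timeout=10.0):
    """Send the request and read until the server closes the connection (HTTP/1.0 style)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Something like parse_status_line(fetch("www.example.com")) would then give you the status of the site root. Reading until the connection closes only works because the request says HTTP/1.0 and does not ask for keep-alive.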

Some result status notes

The major groups are roughly:

  • 1xx Informational
  • 2xx Success
  • 3xx Redirection
  • 4xx Client Error (bad request, unknown url, bad auth, etc.)
  • 5xx Server Error (scripting error, gateway timeout, server overloaded, etc.)

A few trigger automatic client handling. A decent number only have well defined use within specific protocols (e.g. WebDAV).

Some of the better known codes:

  • 200 OK
    - the most generic indication that a request and response went well
  • 404 Not Found
    - no resource under that URL
  • 500 Internal Server Error
    - the most generic error you get from dynamic page generation
  • 401 Unauthorized
    - You may be able to access this if you authenticate. Browsers typically present a login dialog and automatically repeat the request with credentials.
  • 403 Forbidden
    - valid request, but server won't serve this resource to you (regardless of authentication)
  • 301 Moved Permanently
    - permanent redirect (see below)
  • 302 Found
    - temporary redirect (see below)
  • 400 Bad Request
    - Request that doesn't make sense (often bad syntax or malformed).
    - Also sometimes used as an application error code - e.g. REST-like things complaining about nonsensical requests to the service being provided

Some others I've seen repeatedly:

  • 408 Request Timeout
    - "The client did not produce a request within the time that the server was prepared to wait".
    - May be caused by a thrashing client, a DoS attack, or network load.
    - If it happens rarely, it may be hard to find out why. If it happens regularly, your chances are slightly better.
  • 410 Gone
    - "yes, something used to be hosted here. It isn't now."
Sometimes means a temporary configuration problem. Probably more often means the web server knows something is permanently gone and informs the client about it.(verify)
  • 502 Bad Gateway
    - usually means invalid response from a backend server
  • 504 Gateway Timeout
    - server doing proxying to backend servers didn't get a (timely) response from a backend server
    - if the backend process is doing lots of work, consider offloading not-immediately-necessary work to a background process
    - if the backend process is waiting on something else (like a database), figure out why
  • 503 Service Unavailable
    - "something temporarily wrong, trying again later should probably work."
what triggers this varies. It may come from some process-pool manager that is at its configured limit. In some cases backend timeouts will do this (instead of a 504).

HTTP redirect


The most commonly used redirect statuses are 301 and 302, largely because they were the only redirect statuses in HTTP/1.0(verify).

  • 301 Moved Permanently (MOVED_PERMANENTLY)
    • meaning "we have moved this, go there (and it'd be useful if you always did so in the future)"
    • cacheable by default (so not good for temporary workarounds -- re-visiting clients may not notice a change for a while)
    • used for moving domains, for redirecting between domains and sites, for site reorganisation (using this as URL aliasing so that you don't break all old URLs - though many people are too lazy for this), for moving the pagerank from an old to a new location (not sure this is as true as some people make it sound, but Google itself suggests using 301 over 302), and others
  • 302 Found (FOUND)
    • meaning "fetch this content over there for now, but in the future come back to the URL you just used."
    • cacheable only with explicit instructions (verify)
    • used for redirect services such as tinyURL (301 is also justifiable?), for temporarily redirecting to backup content while restoring main content (though in practice backups are often out of date enough to make a "We're working on restoring the site" notice more practical), and others

And since HTTP/1.1(verify):

  • 307 Temporary Redirect (TEMPORARY_REDIRECT)
    • useful for pages that use POST, since it instructs the browser to POST the same to the new URL (while 301 and 302 seem to imply GET or be undefined)(verify)
    • cacheable only with explicit instructions (verify)
  • 303 See Other
    • like 307, but forces a GET instead. Primarily useful for scripts that may/always take POST requests to redirect to a basic URL / GET. (while 302/301 are relatively method-agnostic)(verify)
    • not cacheable

In all cases you also need to add a Location response header, the value of which is the new URL (RFC 2616 says it should be an absolute URL, although browsers generally work with relative ones as well).


  • older and simpler user agents may understand only 301 and 302, not 303 and/or 307.
    - Current browsers can be assumed to understand them. Bots not so much.
  • It is reasonable for a 301 to be cached (by browsers, ISPs, and other caches) - after all, it is intended to be permanent. A 302 usually won't be (unless you instruct it(verify)). This is one real reason not to use 301 for something intended to be temporary - the originating server can't change this until the cache expires.
  • Whether 302s are cached seems to be mostly controlled by caching headers. (HTTP 1.1 mentions "This response is only cacheable if indicated by a Cache-Control or Expires header field.")
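To make the status-plus-Location mechanics concrete, here is a minimal sketch of a redirecting WSGI application. The MOVED table, its paths and the example.com URLs are hypothetical, and a real site would more likely leave this to the web server's own rewrite rules.

```python
# Hypothetical table: old path -> (status line, new absolute URL)
MOVED = {
    "/old-page": ("301 Moved Permanently", "http://www.example.com/new-page"),
    "/maintenance": ("302 Found", "http://www.example.com/status"),
}


def redirect_app(environ, start_response):
    """WSGI app that answers known old paths with a redirect, everything else with 404."""
    path = environ.get("PATH_INFO", "/")
    entry = MOVED.get(path)
    if entry is None:
        body = b"Not found\n"
        start_response("404 Not Found", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    status, location = entry
    # The Location header carries the new URL; the body can stay empty.
    start_response(status, [
        ("Location", location),
        ("Content-Length", "0"),
    ])
    return [b""]
```

For a quick local try, the standard library's wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever() should be enough to serve it.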


For site redirects, moved sites: The simple example is to redirect the site root (/) to a new URL, but in practice this can be one location of many you want to redirect, in which case you probably want to offload these individual redirects to the web server, using something like mod_rewrite. See also pages like [1].

Mass URL aliasing depends a little - sometimes it's easier to do with dynamic scripting, which fetches the new URL from a database.

See also rel="canonical"


Temporarily Unavailable (503)

Useful to signal 'come back later', mostly for spiders so that they are less likely to decide you've dropped off the internet.

Some header notes


Location

Quoth RFC 2616:

The Location response-header field is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource.

Mostly useful for (external/HTTP) redirects - see the HTTP redirect section above. Also used in a few other places, such as 201 Created to refer to the URL that was created.

Value should be an absolute URL - though a number of browsers will deal with relative URLs, not all will.

Content-Location

Note: In most cases you are looking for Location instead of this.

Says RFC 2616:

The Content-Location entity-header field MAY be used to supply the resource location for the entity enclosed in the message when that entity is accessible from a location separate from the requested resource's URI.

For multiple-entity resources, such as the same page in other languages, this header lets you signal those as alternative URIs.

Done mostly in response to an applicable Accept header in the request, and is more or less a means of content negotiation.

May be absolute or relative URI. Undefined for POST and PUT, so you should probably stick to GET.

Has implications on caching, which can e.g. use this association to flush all variants of stale content.

Rarely used in web browsing(verify); perhaps most applicable to (MIME) multipart content[2] (and some things that can use that, like SOAP), or perhaps for HTTP-based protocols where you can apply somewhat stronger meaning, which seems to describe how it is used in Atom (see RFC 5023).


Content-Disposition

A response header, not part of HTTP/1.1 itself (it is specified separately, in RFC 6266), but fairly widely supported anyway.

Value of the header is a disposition-type plus optional parameters. Parameters start with a semicolon for separation.

There are two disposition-types:

  • attachment
    - useful to force a download for a MIME type the UA could otherwise handle itself (side note: your other main option for that is to set the MIME type to something the browser will always download, typically application/octet-stream)
  • inline
    - process as you normally would. This is the default, so specifying it is only useful when you also specify parameters.

Parameters include:

  • filename
    - specifies the filename the UA ought to save as.
    - filename characters must be in ISO-8859-1 (Latin1)
    - should be quoted if it contains spaces
  • filename*
    - filename characters can use RFC 5987 coding (see below)
    - not supported by all UAs, so if you use this, you should also add filename

Example from RFC 6266:

Content-Disposition: attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates


RFC 5987 text coding

...mainly for strings in HTTP headers.

The minimum required set of encodings is ISO-8859-1 and UTF-8 (and producers must use one of these). In practice it seems the main use is the ability to use UTF-8 strings.

The value is basically the concatenation of:

  • charset (UTF-8 or ISO-8859-1, with possible future additions)
  • '
  • optional language tag
  • '
  • percent-escaped bytestring (where the bytestring is coded according to the charset)
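The structure above can be sketched with the standard library's urllib.parse.quote. The helper names ext_value and content_disposition are made up for this example, and the plain filename fallback is passed in by the caller rather than derived automatically.

```python
from urllib.parse import quote

# Non-alphanumeric characters that RFC 5987 allows unescaped (its attr-char set).
ATTR_CHAR_SAFE = "!#$&+-.^_`|~"


def ext_value(text, charset="UTF-8", language=""):
    """Encode text as charset'language'percent-escaped-bytes."""
    escaped = quote(text, safe=ATTR_CHAR_SAFE, encoding=charset)
    return f"{charset}'{language}'{escaped}"


def content_disposition(fallback, filename, disposition="attachment"):
    """Build a header value with both filename (Latin1-safe fallback) and filename*.

    Assumes the fallback contains no double quotes or backslashes.
    """
    return f'{disposition}; filename="{fallback}"; filename*={ext_value(filename)}'


print(content_disposition("EURO rates", "€ rates"))
```

This prints the same header value as the RFC example, apart from the case of the charset name and hex digits, neither of which is significant.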



WWW-Authenticate and Authorization

Basic Auth notes


Basic auth is Base64-ing of username:password.

A server will tell you this auth is needed by responding to a regular request with a 401 response ('Authorization Required') that includes a header like:

WWW-Authenticate: Basic realm="Secure Area"

...and for which the body is usually a very minimal HTML document telling you that you are unauthorized -- but most browsers will, instead of showing it, pop up a dialog for login (mentioning the realm name), and do the request again.

(you will only see that HTML page if you refuse or fail to authenticate)

The second (and subsequent) requests will include an Authorization header, like:

Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

...which is the Base64 encoding of plain text, here of Aladdin:open sesame, a colon-separated username and password.
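A small sketch of both directions, using Python's base64 module. The function names are made up for this example, and UTF-8 is an assumption about how the credentials are encoded (older clients may use Latin1).

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an Authorization header for Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth(value):
    """Recover (username, password) from an Authorization header value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates; a password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password


print(basic_auth_header("Aladdin", "open sesame"))
print(parse_basic_auth("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))
```

That the second function is this short is the whole point of the security warning below.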

There is not really a means of logout, no HTTP signaling for it. Browsers usually remember the login until they are closed, or the history is cleared.

Big security warning

Since Base64 is a reversible text coding, this is basically just plaintext, and so entirely sniffable (assuming you have a point of interception) unless transported over SSL or TLS (i.e. HTTPS).

So its only real value over plain HTTP is identifying users, not security. Generally, don't do this unless you have a good reason.


Digest Auth notes


Originally described by RFC 2069, replaced by RFC 2617 which also describes a somewhat stronger variant (that can be backwards compatible with 2069).

Once you understand Basic auth, you can consider this a variant that sends a hash instead. It's not ideal, but a lot better than Basic auth.

Uses the same header names as Basic, but the values start with Digest and contain more values/options.

The exchange is also similar, though exactly how the exchange should work depends on the 'qop' (quality of protection, which can be missing, 'auth', or 'auth-int'), which is part of why implementation of both sides is more involved than Basic -- and you'll probably want a library, or web server, handling this for you.

On qop variants

qop not specified ('RFC 2069 mode'):

HA1 = MD5( username + ':' + realm + ':' + password )
HA2 = MD5( method   + ':' + digestURI )
response = MD5( HA1 + ':' + nonce + ':' + HA2 )

qop=auth:

HA1 = MD5( username + ':' + realm + ':' + password )
HA2 = MD5( method   + ':' + digestURI )
response = MD5( HA1 + ':' + nonce + ':' + nonce_count + ':' + client_nonce + ':' + qop + ':' + HA2 )

qop=auth-int:

HA1 = MD5( username + ':' + realm + ':' + password )
HA2 = MD5( method   + ':' + digestURI + ':' + MD5(entity_body) )
response = MD5( HA1 + ':' + nonce + ':' + nonce_count + ':' + client_nonce + ':' + qop + ':' + HA2 )

Note that:

  • the first isn't as strong
  • most servers and clients support RFC 2069 style and qop-auth, but fewer support qop=auth-int
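The three response computations can be sketched with hashlib as below. This follows RFC 2617, so HA1 includes the realm and every field is colon-separated. The function and parameter names are made up for this example, and it ignores details such as the MD5-sess algorithm variant.

```python
import hashlib


def md5_hex(text):
    """MD5 of a string, as lowercase hex (the form Digest auth uses throughout)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nonce_count="", client_nonce="", entity_body=b""):
    """Compute the 'response' value for qop missing, 'auth', or 'auth-int'."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    if qop == "auth-int":
        # auth-int additionally covers a hash of the request body
        body_hash = hashlib.md5(entity_body).hexdigest()
        ha2 = md5_hex(f"{method}:{uri}:{body_hash}")
    else:
        ha2 = md5_hex(f"{method}:{uri}")
    if qop is None:
        # RFC 2069 mode
        return md5_hex(f"{ha1}:{nonce}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{nonce_count}:{client_nonce}:{qop}:{ha2}")


# Values from the worked example in RFC 2617
print(digest_response(
    "Mufasa", "testrealm@host.com", "Circle Of Life",
    "GET", "/dir/index.html", "dcd98b7102dd2f0e8b11d0f600bfb0c093",
    qop="auth", nonce_count="00000001", client_nonce="0a4f113b",
))
```

In a real exchange the nonce comes from the server's WWW-Authenticate header, and the client picks the client nonce and increments the nonce count per request.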


Basically, it's an MD5 hash. Or rather two, which is an okay idea, and it's a solid few steps better than Basic auth.

Also, it allows a client nonce, which should help against chosen plaintext attacks and things like rainbow table optimization, and against some replay attacks (in that a server can do some checks).

Still vulnerable against man-in-the-middle.




OPTIONS

The idea behind OPTIONS is to check which HTTP methods are allowed for a specific URL (or the server in general, via OPTIONS *), communicated via headers (mostly Allow(verify)).

It may have a body, but no use of it is described in RFC 2616. This seems meant for protocols on top of HTTP to do something potentially useful.

It seems OPTIONS is mostly seen in the context of things like WebDAV (part of the protocol) and CORS (for its preflight requests), so you rarely really have to respond to these yourself.

And since OPTIONS doesn't allow caching, not seeing it generally used may be a good thing.


connection types

  • AJAX-style requests (XMLHttpRequest, fetch)
requests from a webpage to its originating server, e.g. to support single-page apps
  • WebSockets
a bidirectional connection initiated via HTTP
chunks of content are sent as separate messages. Beyond that, you decide your protocol.
useful for live content, push messages, etc.
also a bit of a firewall cheat (since it uses port 80, which is typically allowed)
  • Comet[3] refers to some implementation that effectively allows server push
    • long-polling - basically means the client does a request, and the server only responds when it has something to say, which may be much later.
    • streaming - uses a persistent connection. Browser implementation variation means it is hard to make this robustly portable(verify).
  • HTTPS is HTTP over TLS or SSL
  • Server-sent events[4]
  • HTTP CONNECT tunnel
Basically a TCP connection initiated via a HTTP proxy
  • BOSH, Bidirectional-streams Over Synchronous HTTP
means using two HTTP connections to simulate one bidirectional one (verify)
Only really used by XMPP?

Range requests


Mainly to support resuming downloads, and otherwise to conserve bandwidth.

Server may indicate support via:

Accept-Ranges: bytes

It may also choose to indicate lack of support via

Accept-Ranges: none

A server may also choose to ignore a Range request, so the above header is only advice. A client may always ask for a range, and should figure out lack of support from the response (basically whether it's a 206 with a Content-Range response header).

A client can do a request with a header like:

Range: bytes=0-499

The server (unless it needs to answer with 416 Range Not Satisfiable) would respond with a 206 Partial Content response, with a response header like:

Content-Range: bytes 0-499/10000

...which describes the returned byte range and the complete length, as byterange/completelength, where:


  • only applies to GETs; Range must be ignored on everything else
  • byte ranges are zero-based offsets, and inclusive
    - so 0-499/500 is the entire file; watch for off-by-one bugs
  • completelength may be * for unknown
  • the last position can be omitted, which implies 'until the end'
    - e.g. bytes=0- means everything
  • the first position can be omitted, which asks for a suffix
    - e.g. bytes=-500 means the last 500 bytes (but you still have to know the size)
  • You can ask for multiple sub-ranges in one request via comma separation
    - e.g. bytes=0-0,-1 means the first and last bytes
    - this is more complex on both the client (bookkeeping) and server (multipart dealie, and it may coalesce adjacent ranges)
    - few clients will ever do this
    - clients that do not understand multipart responses shouldn't ask for them :)
    - not all servers implement this (technically a violation of the RFC)
    - the response is a little more detailed (a multipart/byteranges body)
  • If-Range is the combination of this with ETag
    - basically an "entire file if different, subrange if still the same" deal(verify)
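Since the inclusive offsets are an easy place for off-by-one bugs, here is a sketch of how a server might resolve a single-range header against a known size. The function names are made up for this example, and multi-range requests are deliberately not handled.

```python
def parse_byte_range(header, size):
    """Resolve a header value like 'bytes=0-499' against a resource of `size` bytes.

    Returns an inclusive (first, last) pair, or None when the range is
    unsatisfiable (which a server would answer with 416).
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled here")
    first_text, sep, last_text = spec.strip().partition("-")
    if not sep:
        raise ValueError("malformed range")
    if first_text == "":
        # Suffix form, e.g. bytes=-500: the last N bytes
        length = int(last_text)
        if length == 0 or size == 0:
            return None
        return max(size - length, 0), size - 1
    first = int(first_text)
    if first >= size:
        return None
    # Open-ended form, e.g. bytes=500-: until the end
    last = int(last_text) if last_text else size - 1
    if last < first:
        raise ValueError("last position before first position")
    # A last position past the end is clamped to the final byte
    return first, min(last, size - 1)


def content_range(first, last, size):
    """Format the Content-Range value for a 206 response."""
    return f"bytes {first}-{last}/{size}"


print(content_range(*parse_byte_range("bytes=-500", 10000), 10000))
```

Note that the number of bytes to send is last - first + 1, not last - first.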

HTTP/2

HTTP/2 smooths a few things which in HTTP1.x were workarounds, and tends to lower latency in the process. (it seems that SPDY was the experimental version, and everything useful ended up in HTTP/2(verify))

HTTP/2 does not change HTTP1's semantics, but is a completely different transport at byte level. (You can see it as an API where the byte level is now handled for you. In HTTP/1.0 you could still do it all yourself because it was minimal, while proper 1.1 compliance was already quite hard.)

Because of the same semantics, dropping it in shouldn't break anything, but it can still be a bunch of work.

Interesting things it adds include:

  • Request/response multiplexing
basically a better version of pipelining, in that it does not have head-of-line blocking issues
...except that under packet loss it still does, because of how TCP recovers in-order
  • server push
Basically means the server can pre-emptively send responses, to prime a browser's cache before it knows it needs parts
(fallback for non-supporting browsers is that it would just do the request)
the server itself has to know precisely what to push -- this is actually more complex than you think
  • Request/response priorities
e.g. send css first, js second, images last

And details like:

  • compresses HTTP headers
helps (only) when they're not trivial
and primarily applies to request headers, very little to response headers
(arguably mostly useful for some CDNs)

Some notes:

  • Browsers seem to have chosen to only support the TLS variant(verify)
  • single connection, so can be more sensitive to packet loss (which is essentially head-of-line at TCP level)

HTTP/2 is now supported


QUIC

QUIC is an always-encrypted transfer, acting mostly like TCP+TLS, but implemented on UDP.

It's a general-purpose transport, though has similar design to HTTP/2(verify)


Upsides:

  • faster initial connection setup (in part because encryption was designed into the protocol, not wrapped around it as with TLS)
  • connection multiplexing
  • always encrypted and authenticated


Downsides:

  • because it's sort of imitating TCP over UDP, firewalling is harder
  • more complex to set up

There are two of 'em now, Google QUIC and IETF QUIC, which have diverged enough to be considered separate beasts, though hopefully the two will converge again.


HTTP/3

Basically HTTP over QUIC, and solves the issue of head of line blocking under packet loss (well, improves the recovery).

As of this writing, it has only partial support.