✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

HTTP 1.x

Minimal requests

In some cases, such as debugging via netcat, or perhaps doing requests from microcontrollers or other embedded devices, it's useful to do a quick and dirty HTTP request with a minimal implementation on your side (for HTTP 1.0 and a very simple response body, you can get away with just a few strtoks and such).

For HTTP 1.0, the most basic is just GET / HTTP/1.0. You regularly want to support servers that do named virtual hosts, in which case you want:

GET / HTTP/1.0
Host: www.example.com

As in, those are all the bytes on a bare TCP connection required to get a response. To wit, a command line example:

printf 'GET /index.html HTTP/1.0\r\nHost: www.example.com\r\n\r\n' | netcat www.example.com 80

(printf interprets the \r\n escapes, which makes it possible to write this on a single line. HTTP line endings are CRLF; many servers also accept bare newlines, but you can't count on that.)

HTTP 1.1 requires that Host header.

And, technically, a bunch more. When doing minimal requests to HTTP 1.1 servers you probably still want to say you're 1.0, because saying you're 1.1 suggests you support various things you probably do not (e.g. chunked transfer encoding and persistent connections). You can often get away with it, but you can't count on it.
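A minimal sketch of the same kind of request in Python, using only the standard library. The function names (build_request, parse_status_line, fetch) are made up for illustration, and there is no handling of redirects, chunked bodies, or errors.

```python
import socket

def build_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.0 GET request with a Host header."""
    # Two trailing empty strings give the CRLF after Host plus the blank line.
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")

def parse_status_line(line):
    """Split e.g. b'HTTP/1.1 200 OK' into (version, status code, reason)."""
    parts = line.decode("iso-8859-1").rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason

def fetch(host, path="/", port=80, timeout=10):
    """Send the request, then read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    response = b"".join(chunks)
    head, _, body = response.partition(b"\r\n\r\n")
    status = parse_status_line(head.split(b"\r\n", 1)[0])
    return status, head, body
```

Something like fetch("www.example.com", "/index.html") would then give you the parsed status line, the raw header block, and the body. This leans on the server closing the connection when it is done, which is the HTTP 1.0 default.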

Some result status notes

The major groups are roughly:

1xx Informational
2xx Success
3xx Redirection
4xx Client Error (bad request, unknown url, bad auth, etc.)
5xx Server Error (scripting error, gateway timeout, server overloaded, etc.)
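
The group is just the first digit, which is also how a client is expected to treat a code it does not recognize (as the generic x00 of that group). A small sketch, with a made-up helper name:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code):
    """Return the name of the group a status code falls in, or None if outside 1xx-5xx."""
    return STATUS_CLASSES.get(code // 100)
```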

A few trigger automatic client handling. A decent number only have well defined use within specific protocols (e.g. WebDAV).

Some of the better known codes:

200 OK - the most generic indication that a request and response went well
404 Not Found - no resource under that URL
500 Internal Server Error - the most generic error you get from dynamic page generation, roughly meaning "...something broke"
401 Unauthorized - you may be able to get this if you authenticate
403 Forbidden - valid request, but server won't serve this resource to you (e.g. valid credentials but no authorization -- but there may be entirely different reasons)
301 Moved Permanently - permanent redirect (see below)
302 Found - temporary redirect (see below)
400 Bad Request - Request that doesn't make sense (often bad syntax or malformed).

Also sometimes used as an application error code - e.g. REST-like things complaining about non-sensible requests to the service being provided

Some others I've seen repeatedly:

503 Service Unavailable - roughly means "something temporarily wrong, trying again later should probably work."

what triggers this varies. Possibly something doing smart allocation, sometimes internal timeouts report as this (instead of a 504).

504 Gateway timeout - server doing proxying to backend servers didn't get a (timely) response from a backend server

if that process is doing lots of work (consider offloading not-immediately-necessary work to a background process)

if that process is waiting on something else (like database), figure out why

502 Bad Gateway - usually means invalid response from a backend server

that is, most setups have workers, and something in front delegating to them. This is the thing in front complaining about the workers

408 Request Timeout

"The client did not produce a request within the time that the server was prepared to wait".

May be a thrashing client, a DoS attack, network load.

If it happens rarely, it may be hard to find out why. If it happens regularly, your chances may be slightly better.

410 Gone - "yes, something used to be hosted here. It isn't now."

Sometimes means a temporary configuration problem. Probably more often means the web server knows something is permanently gone and informs the client about it.(verify)

204 No Content - "I return no body, and that's intentional"

e.g. one way of detecting captive portals, in part because this is one specific case that is rarely returned by regular websites or by captive portals.

HTTP redirect

What

The most commonly used redirect statuses are 301 and 302, probably largely because they were the only ones in HTTP/1.0(verify).

301 Redirect (MOVED_PERMANENTLY)
- meaning "we have moved this, go there (and it'd be useful if you always did so in the future)"
- cacheable by default (so not good for temporary workarounds -- re-visiting clients may not notice in a while)
- used for moving domains, for redirecting between domains and sites (e.g. from example.com to www.example.com), for site reorganisation (...and using this as URL aliasing so that you don't break all old URLs - though many people are too lazy for this), for moving the pagerank from an old to a new location (not sure this is as true as some people make it sound, but google itself suggests using 301 over 302), and others

302 Found (MOVED_TEMPORARILY)
- meaning "fetch this content over there for now, but in the future come back to the URL you just used."
- cacheable only with explicit instructions (verify)
- used for redirect services such as tinyURL (302 is also justifiable?), for temporarily redirecting to backup content while restoring main content (though in practice backups are often out of date enough to make a "We're working on restoring the site" notice more practical), and others

And since HTTP/1.1(verify):

307 Temporary Redirect (TEMPORARY_REDIRECT)
- useful for pages that use POST, since it instructs the browser to POST the same thing to the new URL (while 301 and 302 seem to imply GET or be undefined)(verify)
- cacheable only with explicit instructions (verify)

303 See Other
- like 307, but instructs to do a GET instead. Primarily useful for scripts that may/always take POST requests to redirect to a basic URL / GET. (while 302/301 are relatively method-agnostic)(verify)
- not cacheable

In all cases you also need to add a Location response header, the value of which is the new URL (should be an absolute URL, although browsers may choose to work with relative ones as well).

Notes:

older and simpler user agents may understand only 301 and 302, not 303 and/or 307.

Current browsers can be assumed to understand. Crawlers vary a bunch more.

It is reasonable for a 301 to be cached (by browsers, ISPs, and other caches) - after all, they are intended to be permanent. A 302 usually won't be (unless you instruct it(verify)). This is one real reason not to use 301 for something intended to be temporary - the originating server can't change this until the cache expires.

Whether 302s are cached seems to be mostly controlled by caching headers. (HTTP 1.1 mentions "This response is only cacheable if indicated by a Cache-Control or Expires header field.")
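
A sketch of what happens to the request method when a client follows these redirects. This reflects common browser behaviour (POST turning into GET on 301 and 302) rather than a strict reading of the spec, so treat it as an assumption; the function name is made up.

```python
def method_after_redirect(status, method):
    """Return the method a typical client uses for the follow-up request."""
    method = method.upper()
    if status == 303:
        # See Other: fetch the target with GET (HEAD stays HEAD)
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302) and method == "POST":
        # historical behaviour: most clients resend as GET, dropping the body
        return "GET"
    # 307 keeps the method (and body) as it was
    return method
```

So a form POST answered with a 303 (or, in practice, a 302) ends up as a GET to the new URL, while a 307 makes the client POST the same thing again.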

Practice

Can I have relative redirects?

Originally not, RFC 2616 requires an absolute URI in Location.

But that was replaced by RFC 7231, which says "When it has the form of a relative reference [...], the final value is computed by resolving it against the effective request URI".

Some UAs were more permissive, but you couldn't count on it. You still can't count on all UAs being up to date with RFC 7231, but for browsers you basically can.
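
Resolving a relative Location against the request URL is ordinary relative-reference resolution, which in Python is what urllib.parse.urljoin does. A small sketch (resolve_location is a made-up wrapper name):

```python
from urllib.parse import urljoin

def resolve_location(request_url, location):
    """Compute the final redirect target from the request URL and the Location value."""
    # urljoin leaves absolute URLs alone and resolves relative ones against the base
    return urljoin(request_url, location)
```

For example, resolve_location("http://www.example.com/a/b.html", "c.html") gives http://www.example.com/a/c.html, and an absolute Location comes back unchanged.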

Mass redirects

When moving sites, the lazy fix is to redirect all paths under the old site to the new site's root, but you may prefer to have redirects for every page to its actual new location.

Whether that's based on a rule or a big list, you probably want to offload these individual redirects to the web server, using something like mod_rewrite in Apache, or similar in nginx.

On rel="canonical"

rel=canonical is page content that tells crawlers what the preferred URL of the web page is, within the same domain.

This is only used by crawlers, and does not act as a redirect for UAs, so it's partly an SEO thing, and partly helps make for slightly cleaner search results.

This can be useful if there are multiple correct URLs, like a blog having a shortened URL and a more expressive URL for posts, or e-commerce systems having multiple ways to view the same product.

<link rel="canonical" href="https://example.org/" />

(See also link rel)
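
From the crawler's side, picking this up means parsing the page and looking for that link element. A rough sketch using the standard library's HTML parser; the class and function names are made up, and it only looks at the first matching element.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attrs = dict(attrs)
        # rel is a space-separated token list; attribute values may be None
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.canonical = attrs["href"]

def find_canonical(html_text):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonical
```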

Temporarily Unavailable (503)

Useful to signal 'come back later', mostly for spiders so that they are less likely to decide you've dropped off the internet.

Some header notes

User-Agent

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent

https://en.wikipedia.org/wiki/User_agent#Format_for_human-operated_web_browsers

https://webaim.org/blog/user-agent-string-history/

Location

Quoth RFC 2616:

The Location response-header field is used to redirect the recipient
to a location other than the Request-URI
for completion of the request or identification of a new resource.

Mostly useful for (external/HTTP) redirects - see #HTTP_redirect. Also used in a few other places, such as 201 Created to refer to the URL that was created.

Value classically should be an absolute URL, though most browsers these days will also deal with relative URLs.

See also:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30

Content-Location

Note: In most cases you are looking for Location instead of this

Says RFC 2616:

The Content-Location entity-header field MAY be used to supply the resource location
for the entity enclosed in the message when that entity is accessible
from a location separate from the requested resource’s URI.

For multiple-entity resources, such as the same page in other languages, this header lets you signal those as alternative URIs.

Done mostly in response to an applicable Accept header in the request, and is more or less a means of content negotiation.

May be absolute or relative URI. Undefined for POST and PUT, so you should probably stick to GET.

Has implications on caching, which can e.g. use this association to flush all variants of stale content.

Rarely used in web browsing(verify); perhaps most applicable to (MIME) multipart content[1] (and some things that can use that, like SOAP), or perhaps for HTTP-based protocols where you can apply somewhat stronger meaning, which seems to describe how it is used in Atom (see RFC 5023).

Content-Disposition

A response header, not part of HTTP/1.1, but fairly widely supported anyway.

Value of the header is a disposition-type plus optional parameters. Parameters start with a semicolon for separation.

There are two disposition-types:

attachment

useful to force download for a MIME type the UA can handle otherwise (side note: your other main option for that is to set the mime type to something the browser will always download, typically octet-stream)

inline

process as you normally would, which is the default, so specifying this is only useful when you specify parameters.

Parameters:

filename - specify the filename the UA ought to save as.

filename characters must be in ISO-8859-1 (Latin1)

Should be quoted if it contains spaces

filename*

filename characters can use RFC 5987 (see also below)

Not supported by all UAs, so if you use this, you should also add filename

Example from the RFC:

Content-Disposition: attachment; filename="EURO rates"; filename*=UTF-8''%e2%82%ac%20rates
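
A sketch of producing such a header value in Python, assuming you supply the plain fallback name yourself. The function name and its parameters are made up for illustration; the percent-escaping comes from urllib.parse.quote, which encodes as UTF-8 by default.

```python
from urllib.parse import quote

def content_disposition(filename, ascii_fallback, disposition="attachment"):
    """Build a Content-Disposition value carrying both filename and filename*."""
    # quoted-string form: escape backslashes and double quotes
    quoted = ascii_fallback.replace("\\", "\\\\").replace('"', '\\"')
    # RFC 5987 form: charset, empty language tag, percent-escaped UTF-8 bytes
    extended = "UTF-8''" + quote(filename, safe="")
    return f'{disposition}; filename="{quoted}"; filename*={extended}'
```

UAs that understand filename* should prefer it, and the others fall back to the plain filename.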

See also:

RFC 6266 (earlier mentions in RFC 2616, RFC 2183, RFC 1806)

RFC 5987 text coding

...mainly for strings in HTTP headers.

The minimum required set of encodings specified this way is ISO-8859-1 and UTF-8 (and producers must use one of these).

As you often use this to break out of being restricted to ISO-8859-1, this seems usually used to get UTF-8.

The value is basically

charset (ISO-8859-1 or UTF-8, possible future additions)
'
optional language tag
'
percent-escaped bytestring (where the bytestring is coded according to the charset)

Examples:

iso-8859-1'en'%A3%20rates
UTF-8''%e2%82%ac%20rates
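
Decoding such a value is a matter of splitting on the first two apostrophes and unescaping the rest according to the charset. A minimal sketch (decode_ext_value is a made-up name, and it only accepts the two required charsets):

```python
from urllib.parse import unquote_to_bytes

def decode_ext_value(value):
    """Split charset'language'escaped-bytes and return (charset, language, text)."""
    charset, language, escaped = value.split("'", 2)
    if charset.lower() not in ("utf-8", "iso-8859-1"):
        raise ValueError(f"unsupported charset: {charset!r}")
    # undo the percent-escaping first, then interpret the bytes in that charset
    text = unquote_to_bytes(escaped).decode(charset)
    return charset, language or None, text
```

The language tag is optional, so an empty one is returned as None here.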

`WWW-Authenticate` and `Authorization`

HTTP Basic Auth notes

HTTP Basic auth is Base64-ing of username:password.

When you do a regular request, a server can tell you that this auth is needed by responding with a 401 ('Authorization Required') that includes a header like:

WWW-Authenticate: Basic realm="Secure Area"

...and for which the body is usually a very minimal HTML document telling you that you are unauthorized. Most browsers will only show that if you fail or refuse the following: they pop up a login dialog (mentioning the realm name), and automatically do the same request again, now with an authentication attempt in it.

That second request (and successive ones) will include an Authorization header, like:

Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

...which is the Base64 encoding of plain text, here of Aladdin:open sesame, a colon-separated username and password.
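
To show how little there is to it, a sketch of building and reading that header value in Python. The function names are made up, and UTF-8 is assumed for the text encoding.

```python
import base64

def basic_auth_header(username, password):
    """Return the value for an Authorization header using Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def parse_basic_auth(header_value):
    """Recover (username, password) from an Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # only the first colon separates; the password itself may contain colons
    username, _, password = decoded.partition(":")
    return username, password
```

Note that parse_basic_auth needs no secret at all, which is the point of the downside mentioned below.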

There is not really a means of logout, no HTTP signaling for it. Browsers usually remember the login until they are closed, or the history is cleared.

Upside:

means people have to enter something valid to enter

Downside:

basically just sends passwords in plain text (as base64 is reversible by design)

This is not secure

When this is sent over anything unencrypted, anyone with the ability to sniff traffic can read it out.

Over plain HTTP, this was a bad thing to do. Now that HTTPS is (mostly) the default this is much less of an issue, but you must never accidentally disable HTTPS, and it's a separately configured thing.

So on its own, the only value is identifying users, not security. Don't use basic auth at all unless you have a very good reason to believe it's fine.

See also:

http://en.wikipedia.org/wiki/Basic_access_authentication

HTTP Digest Auth notes

Originally described by RFC 2069, replaced by RFC 2617 which also describes a somewhat stronger variant (that can be backwards compatible with 2069).

Once you understand Basic auth, you can consider this a variant that

sends a hash instead
uses a server-generated nonce, and optionally a client-generated one (lessens replay attacks, some pre-image attacks)

It's not ideal, but it's better than Basic auth.

Uses the same header names as Basic, but the values start with Digest and contain more values/options.

Exactly how the exchange should go seems to depend on two variables of the exchange, but it'll be some variation on the theme of:

HA1 = MD5( username + ':' + realm + ':' + password )
HA2 = MD5( method   + ':' + digestURI )
response = MD5( HA1 + ':' + nonce + ':' + HA2 )

Note that:

the first isn't as strong
most servers and clients support RFC 2069 style and qop=auth, but fewer support qop=auth-int
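
A sketch of the response calculation in Python, covering the RFC 2069 style and qop=auth. Per RFC 2617, HA1 also includes the realm, and with qop=auth the nonce count (nc) and client nonce (cnonce) go into the final hash. The function and parameter names are made up, and the MD5-sess and auth-int variants are left out.

```python
import hashlib

def md5_hex(text):
    """Hex MD5 of a string, as used throughout Digest auth."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the 'response' value a client puts in its Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    if qop is None:
        # RFC 2069 style: no client nonce, no nonce count
        return md5_hex(f"{ha1}:{nonce}:{ha2}")
    if qop != "auth":
        raise ValueError("only qop=auth is sketched here")
    # RFC 2617 with qop=auth
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

The server does the same calculation with the password (or a stored HA1) and compares, so the password itself never goes over the wire.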

Security

Basically, it's an MD5 hash. Or rather two, which is an okay idea, and it's a solid few steps better than Basic auth.

Also, it allows client nonce, which should help against chosen plaintext attacks and things like rainbow table optimization, and some replay attacks (...in that a server can do some checks)

Still vulnerable against man-in-the-middle.

Semi-sorted

Expect

OPTIONS

The idea behind OPTIONS is to check which HTTP methods are allowed for a specific URL (or the server in general, via *), communicated via headers (mostly Allow(verify)).

It may have a body, but no use of it is described in RFC 2616. This seems meant for protocols on top of HTTP to do something potentially useful.

It seems OPTIONS is mostly seen in the context of things like WebDAV (part of the protocol) and CORS (for its preflight requests), so you rarely really have to respond to these yourself.

And since OPTIONS doesn't allow caching, not seeing it generally used may be a good thing.

CONNECT

https://en.wikipedia.org/wiki/HTTP_tunnel

connection types

XMLHttpRequest, a.k.a. XHR, AJAX

requests from a webpage to its originating server, e.g. to support single-page apps

WebSockets

a bidirectional connection initiated via HTTP

content is sent as separate messages. Beyond that, you decide your protocol.

useful for live content, push messages, etc.

also a bit of a firewall cheat (since it uses port 80, which is typically allowed)

Comet[2] refers to some implementation that effectively allows server push
- long-polling - basically means the client does a request, and the server only responds when it has something to say, which may be much later.
- streaming - uses a persistent connection. Browser implementation variation means it is hard to make this robustly portable(verify).

HTTPS is HTTP over TLS or SSL

Server-sent events[3]

HTTP CONNECT tunnel

Basically a TCP connection initiated via an HTTP proxy

BOSH, Bidirectional-streams Over Synchronous HTTP

means using two HTTP connections to simulate one bidirectional one (verify)

Only really used by XMPP?

Range requests

Mainly to support download continuing, and otherwise conserve bandwidth.

Server may indicate support via:

Accept-Ranges: bytes

It may also choose to indicate lack of support via

Accept-Ranges: none

A server may also choose to ignore a Range request.

The above header is advice.

A client may always ask for a range, and should figure out lack of support from the response (basically whether it's a 206 with a Content-Range response header).

A client can do a request like

Range: bytes=0-499

The server (unless it needs to answer with 416 Range Not Satisfiable) would respond with a 206 Partial Content response with a response header like:

Content-Range: bytes 0-499/10000

Which describes the returned range in the form byterange/completelength.

Notes:

only applies to GETs; range must be ignored on everything else

byte ranges are zero-based offset, and inclusive

so 0-499/500 is the entire file; watch for off-by-one bugs

completelength may be * for unknown

the last position can be omitted, implies 'until the end'

e.g. 0- means everything

the first position can be omitted

e.g. -500 means last 500 bytes (but you still have to know the size)

You can ask for multiple sub-ranges in one request via comma separation

e.g. 0-0,-1 means the first and last bytes

this is more complex on both the client (bookkeeping) and server (multipart dealie, and it may coalesce adjacent ranges)

few clients will ever do this

clients that do not understand multipart responses shouldn't ask for them :)

not all servers implement this (technically a violation of the RFC)

the response is a little more detailed (a multipart/byteranges body, with a Content-Range for each part)

If-Range is the combination of this with ETag

basically a "entire file if different, subrange if still the same" deal(verify)
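
Since the offsets are inclusive and easy to get off by one, a sketch of the arithmetic in Python: one helper that reads a Content-Range value, and one that turns a single range spec into actual offsets for a resource of known size. Both names are made up, and multiple comma-separated ranges are not handled.

```python
import re

def parse_content_range(value):
    """Parse 'bytes 0-499/10000' into (first, last, total); total is None for '*'."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    total = None if m.group(3) == "*" else int(m.group(3))
    return int(m.group(1)), int(m.group(2)), total

def resolve_range(spec, size):
    """Turn '0-499', '500-' or '-500' into inclusive (first, last) offsets.

    Returns None when the range cannot be satisfied (the 416 case).
    """
    first_s, sep, last_s = spec.strip().partition("-")
    if not sep:
        raise ValueError(f"not a byte range: {spec!r}")
    if first_s == "":
        # suffix form: the last N bytes
        n = int(last_s)
        if n == 0 or size == 0:
            return None
        return max(size - n, 0), size - 1
    first = int(first_s)
    if first >= size:
        return None
    last = int(last_s) if last_s else size - 1
    if last < first:
        raise ValueError(f"last before first: {spec!r}")
    # a last position past the end is clamped to the final byte
    return first, min(last, size - 1)
```

The number of bytes in a resolved range is last - first + 1, so (0, 499) is 500 bytes.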

See also:

RFC 7233

HTTP/2

HTTP/2 smooths a few things which in HTTP1.x were workarounds, and tends to somewhat lower latency in the process.

(it seems that SPDY was the experimental version, and everything useful ended up in HTTP/2(verify))

HTTP/2 does not change HTTP1's semantics, but is a completely different transport at byte level.

You can see it as an API where the byte level is now handled for you. (Which is just as well. In HTTP/1.0 you could still do it all yourself because it was minimal, while proper 1.1 compliance was already quite hard)

Because of the same semantics, dropping it in shouldn't break anything at application level, though the change can still be a bunch of work.

Interesting things it adds include:

Request/response multiplexing

basically a better version of pipelining, in that it does not have head-of-line blocking issues

...except that under packet loss it still does have head-of-line blocking, because of how TCP recovers in-order (see also QUIC, which avoids this by being UDP based)

server push

Basically means the server can pre-emptively send responses

...though only to prime a browser's cache

...and note that the server has to know precisely what to push -- doing that efficiently can actually be much more complex than you probably think

Request/response priorities

e.g. could send css first, js second, images last

And details like:

compresses HTTP headers

...though this helps (only) when they're not trivial

...and primarily helps request headers, much less so response headers

arguably mostly useful for some CDNs, but not much else

Some notes:

Browsers seem to have chosen to only support the TLS variant(verify)

single connection, so can be more sensitive to packet loss (which is essentially head-of-line at TCP level)

https://www.smashingmagazine.com/2017/04/guide-http2-server-push/

HTTP/2 is now supported

QUIC

QUIC is an always-encrypted transport, acting mostly like TCP+TLS, yet implemented on UDP.

It's a general-purpose transport, and has a design similar to HTTP/2(verify)

Upsides:

always encrypted and authenticated

faster initial encryption setup than with TLS (in part because encryption was designed into the protocol, not wrapped around it)

does connection multiplexing

Downsides:

because it's sort of imitating TCP over UDP, firewalling is harder

more complex to set up

There are two of 'em now, google QUIC and IETF QUIC, which have diverged enough to be considered separate beasts, though hopefully the two will converge again.

https://en.wikipedia.org/wiki/QUIC

HTTP/3

Basically HTTP over QUIC, and solves the issue of head of line blocking under packet loss (well, improves the recovery).

https://en.wikipedia.org/wiki/HTTP/3

As of this writing, it has only partial support.

HTTP notes
