HTTP notes


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Minimal requests

In some cases, such as debugging via netcat, or perhaps doing requests from microcontrollers or other embedded devices, it's useful to do a quick and dirty HTTP request with a minimal implementation on your side (for HTTP 1.0 and a very simple response body, you can get away with just a few strtoks and such).

For HTTP 1.0, the most basic request is just

GET / HTTP/1.0

followed by an empty line to end the headers.

You will regularly want to support servers that do name-based virtual hosts, in which case you want:

GET / HTTP/1.0
Host: www.example.com

This will also work with most HTTP 1.1 servers.

Note that when talking to HTTP 1.1 servers you generally don't want to mention HTTP/1.1 in the request, because that claims support for various things you probably haven't implemented (e.g. chunked transfer encoding and persistent connections). You may be able to get away with it, but you can't count on it.
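As a rough sketch of such a minimal client, the following builds the HTTP/1.0 request shown above and picks apart the status line of a response. The function names (build_request, parse_status_line, fetch) are made up for this example, and fetch assumes the server closes the connection when the response is done, which is the usual HTTP/1.0 behaviour.

```python
import socket


def build_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.0 GET request with a Host header."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",  # the empty line that ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Pull (version, status code, reason) out of the first line of a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, code, *reason = status_line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def fetch(host, path="/", port=80, timeout=10):
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # connection closed: HTTP/1.0 end of response
                break
            chunks.append(data)
    return b"".join(chunks)
```

Something like parse_status_line(fetch("www.example.com")) would then give the status of the site root; on a microcontroller the same steps would be done with whatever socket and string functions are available there.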

Some result status notes

The major groups are roughly:

  • 1xx Informational
  • 2xx Success
  • 3xx Redirection
  • 4xx Client Error (bad request, unknown url, bad auth, etc.)
  • 5xx Server Error (scripting error, gateway timeout, server overloaded, etc.)

A few trigger automatic client handling. A decent number only have well defined use within specific protocols (e.g. WebDAV).
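To make the grouping concrete, here is a small sketch that maps a status code to its group by its first digit, and looks up the standard reason phrase via Python's http.HTTPStatus. The describe function and the CLASSES table are names invented for this example.

```python
from http import HTTPStatus

# The major groups, keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return a short 'code phrase [group]' description of a status code."""
    group = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # HTTPStatus only knows registered codes
        phrase = "(unregistered code)"
    return f"{code} {phrase} [{group}]"
```

A client that does not recognise a specific code can still fall back on the group, which is why looking at the first digit alone is often enough for basic handling.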


Some of the better known codes:

  • 200 OK
    - the most generic indication that a request and response went well
  • 404 Not Found
    - no resource under that URL
  • 500 Internal Server Error
    - the most generic error you get from dynamic page generation
  • 401 Unauthorized
    - You may be able to go here if you authenticate. Browsers typically present a login dialog and automatically do another request.
  • 403 Forbidden
    - valid request, but server won't serve this resource to you (regardless of authentication)
  • 301 Moved Permanently
    - permanent redirect (see below)
  • 302 Found
    - temporary redirect (see below)
  • 400 Bad Request
    - Request that doesn't make sense (often bad syntax or malformed). Also sometimes used as an application error code - e.g. REST-like things complaining about non-sensible requests to the service being provided

Some others I've seen repeatedly:

  • 410 Gone
    - "yes, something used to be hosted here. It isn't now." Sometimes means a temporary configuration problem. Probably more often means the web server knows something is gone and informs the client about it.(verify)


HTTP redirect

What

The most commonly used redirect statuses are 301 and 302, largely because they were the only ones in HTTP/1.0(verify).

  • 301 Moved Permanently (MOVED_PERMANENTLY)
    • meaning "we have moved this, go there (and it'd be useful if you always did so in the future)"
    • cacheable by default (so if removed, re-visiting clients may not notice for a while) (verify)
    • used for moving domains, for redirecting between domains and sites (e.g. from example.com to www.example.com), for site reorganisation (...and using this as URL aliasing so that you don't break all old URLs, though many people are too lazy for this), for moving the pagerank from an old to a new location (not sure this is as true as some people make it sound, but Google itself suggests using 301 over 302), and others
  • 302 Found (MOVED_TEMPORARILY)
    • meaning "fetch this content over there for now, but in the future come back to the URL you just used."
    • cacheable only with explicit instructions (verify)
    • used for redirect services such as tinyURL (302 is also justifiable?), for temporarily redirecting to backup content while restoring main content (though in practice backups are often out of date enough to make a "We're working on restoring the site" notice more practical), and others


And since HTTP/1.1(verify):

  • 307 Temporary Redirect (TEMPORARY_REDIRECT)
    • useful for pages that use POST, since it instructs the browser to re-POST the POSTed data to the new URL (while 301 and 302 seem to imply GET or be undefined)(verify)
    • cacheable only with explicit instructions (verify)
  • 303 See Other
    • like 307, but forces a GET instead. Primarily useful for scripts that may/always take POST requests to redirect to a basic URL / GET. (while 302/301 are relatively method-agnostic)(verify)
    • not cacheable


In all cases you also need to add a Location response header, the value of which is the new URL (RFC 2616 says this should be an absolute URL; the later RFC 7231 also allows relative ones, which browsers may choose to work with anyway).


Notes:

  • older and simpler user agents may understand only 301 and 302, not 303 and/or 307.
    - Current browsers can be assumed to understand. Bots not so much.
  • It is reasonable for a 301 to be cached (by browsers, ISPs, and other caches) - after all, they are intended to be permanent. A 302 usually won't be (unless you instruct it(verify)). This is one real reason not to use 301 for something intended to be temporary - the originating server can't change this until the cache expires.
  • Whether 302s are cached seems to be mostly controlled by caching headers. (HTTP 1.1 mentions "This response is only cacheable if indicated by a Cache-Control or Expires header field.")
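The method handling described above can be sketched from the client's side as follows. This is an illustration rather than a statement of what any particular client does: next_request_method is a made-up name, and turning a POST into a GET on 301/302 is an assumption based on what browsers have commonly done, since the notes above leave that case marked (verify).

```python
def next_request_method(status, method):
    """Return the method to use for the redirected request, or None if
    the status is not one of the redirects covered here."""
    if status == 303:
        # See Other: always fetch the new URL with GET (HEAD stays HEAD)
        return "HEAD" if method == "HEAD" else "GET"
    if status == 307:
        # Temporary Redirect: repeat the same method (and body)
        return method
    if status in (301, 302):
        # Assumption: mimic common browser behaviour, POST becomes GET
        return "GET" if method == "POST" else method
    return None
```

A script that accepts a POST and wants the browser to land on a plain page would therefore answer with 303, while one that wants the POST repeated at the new URL would answer with 307.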


Practice

For site redirects, moved sites: The simple example is to redirect the site root (/) to a new URL, but in practice this can be one location of many you want to redirect, in which case you probably want to offload these individual redirects to the web server, using something like mod_rewrite. See also pages like [1].

Mass URL aliasing depends a little - sometimes it's easier to do with dynamic scripting, which fetches the new URL from a database.
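A minimal sketch of the dynamic-scripting variant, using Python's standard http.server. The REDIRECTS table stands in for the database lookup, and all paths and target URLs in it are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical alias table; in practice this might be a database query.
REDIRECTS = {
    "/": (301, "http://www.example.com/"),
    "/old-page": (301, "http://www.example.com/new-page"),
    "/status": (302, "http://www.example.com/maintenance"),
}


def lookup_redirect(path):
    """Return (status, new URL) for a known old path, else None."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        hit = lookup_redirect(self.path)
        if hit is None:
            self.send_error(404)
            return
        status, location = hit
        self.send_response(status)
        self.send_header("Location", location)  # absolute URL
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port=8000):
    """Run the redirecting server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling serve() and requesting /old-page should then produce a 301 with the new location. For a handful of fixed redirects, web server configuration is probably still the simpler choice.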


See also rel="canonical"



Temporarily Unavailable (503)

Useful to signal 'come back later', mostly for spiders so that they are less likely to decide you've dropped off the internet.


Some header notes

Location


Quoth RFC 2616:

The Location response-header field is used to redirect the recipient
to a Location other than the Request-URI
for completion of the request or identification of a new resource.

Mostly useful for (external/HTTP) redirects - see #HTTP_redirect. Also used in a few other places, such as 201 Created to refer to the URL that was created.

Value should be an absolute URL - though a number of browsers will deal with relative URLs, not all will.


Content-Location


Note: In most cases you are looking for Location instead of this


Says RFC 2616:

The Content-Location entity-header field MAY be used to supply the resource Location
for the entity enclosed in the message when that entity is accessible
from a Location separate from the requested resource’s URI.

For multiple-entity resources, such as the same page in other languages, this header lets you signal those as alternative URIs.

Done mostly in response to an applicable Accept header in the request, and is more or less a means of content negotiation.

May be absolute or relative URI. Undefined for POST and PUT, so you should probably stick to GET.

Has implications on caching, which can e.g. use this association to flush all variants of stale content.

Rarely used in web browsing(verify); perhaps most applicable to (MIME) multipart content[2] (and some things that can use that, like SOAP), or perhaps for HTTP-based protocols where you can apply somewhat stronger meaning, which seems to describe how it is used in Atom (see RFC 5023).



Content-Disposition

A response header, not part of HTTP/1.1, but fairly widely supported anyway.


Value of the header is a disposition-type plus optional parameters. Parameters start with a semicolon for separation.

There are two disposition-types:

  • attachment
    - useful to force download for a MIME type the UA can handle otherwise (side note: your other main option for that is to set the MIME type to something the browser will always download, typically octet-stream)
  • inline
    - process as you normally would, which is the default, so specifying this is only useful when you specify parameters.


Parameters:

  • filename
    - specify the filename the UA ought to save as.
    - filename characters must be in ISO-8859-1 (Latin1)
    - Should be quoted if it contains spaces
  • filename*
    - filename characters can use RFC 5987 (see also below)
    - Not supported by all UAs, so if you use this, you should also add filename

Example from RFC 6266:

Content-Disposition: attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates


RFC 5987 text coding

...mainly for strings in HTTP headers.


The minimum required set of encodings is ISO-8859-1 and UTF-8 (and producers must use one of these). In practice it seems the only real use is the ability to use UTF-8 strings.

The value is basically

  • charset (ISO-8859-1 or UTF-8, with possible future additions)
  • '
  • optional language tag
  • '
  • percent-escaped bytestring (where the bytestring is coded according to the charset)

Examples:

iso-8859-1'en'%A3%20rates
UTF-8''%e2%82%ac%20rates
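As a sketch of producing such values, the following encodes a string in the RFC 5987 form and uses it to build a Content-Disposition value with both filename and filename*. The function names are invented for this example, and the plain fallback name is left to the caller rather than derived automatically.

```python
from urllib.parse import quote


def encode_ext_value(text, language=""):
    """Encode text as charset'language'percent-escaped-bytes, using UTF-8."""
    # safe="" escapes everything except letters, digits and "_.-~"
    return f"UTF-8'{language}'{quote(text, safe='')}"


def content_disposition(filename, ascii_fallback, disposition="attachment"):
    """Build a Content-Disposition value carrying filename and filename*."""
    # Backslash-escape characters that would break the quoted string
    escaped = ascii_fallback.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'{disposition}; filename="{escaped}"; '
        f"filename*={encode_ext_value(filename)}"
    )
```

content_disposition("€ rates", "EURO rates") gives the same header value as the RFC example, apart from the case of the charset name and the hex digits, neither of which is significant.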

WWW-Authenticate and Authorization

Basic Auth notes

Digest Auth notes

Semi-sorted

Expect

OPTIONS

The idea behind OPTIONS is to check which HTTP methods are allowed for a specific URL (or the server in general, via *), communicated via headers (mostly Allow(verify)).


It may have a body, but no use of it is described in RFC 2616. This seems meant for protocols on top of HTTP to do something potentially useful.


It seems OPTIONS is mostly seen in the context of things like WebDAV (part of the protocol) and CORS (for its preflight requests), so you rarely really have to respond to these yourself.

And since OPTIONS doesn't allow caching, not seeing it generally used may be a good thing.
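For poking at a server by hand, a sketch using Python's http.client could look like the following. The function names are made up for this example, and it assumes the server answers OPTIONS with an Allow header, which not all servers do.

```python
import http.client


def parse_allow(value):
    """Split an Allow header value into a set of method names."""
    if not value:
        return set()
    return {name.strip() for name in value.split(",") if name.strip()}


def allowed_methods(host, path="/", port=80, timeout=10):
    """Send OPTIONS for a path and return the methods listed in Allow."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("OPTIONS", path)
        response = conn.getresponse()
        response.read()  # drain any body so the connection closes cleanly
        return parse_allow(response.getheader("Allow"))
    finally:
        conn.close()
```

An empty set here means only that no Allow header came back, not that nothing is allowed.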

connection types

requests from a webpage to its originating server, e.g. to support single-page apps
  • WebSockets
    - a bidirectional connection initiated via HTTP
    - chunks of content are sent as separate messages. Beyond that, you decide your protocol.
    - useful for live content, push messages, etc.
    - also a bit of a firewall cheat (since it uses port 80, which is typically allowed)
  • Comet[3] refers to some implementation that effectively allows server push
    • long-polling - basically means the client does a request, and the server only responds when it has something to say, which may be much later.
    • streaming - uses a persistent connection. Browser implementation variation means it is hard to make this robustly portable(verify).
  • HTTPS is HTTP over TLS or SSL
  • Server-sent events[4]
  • HTTP CONNECT tunnel
    - Basically a TCP connection initiated via an HTTP proxy
  • BOSH, Bidirectional-streams Over Synchronous HTTP
    - means using two HTTP connections to simulate one bidirectional one (verify)
    - Only really used by XMPP?


Range requests


Range requests let a client ask for only part of a resource. This is mainly to support resuming downloads, and otherwise to conserve bandwidth.


Server may indicate support via:

Accept-Ranges: bytes

It may also choose to indicate lack of support via

Accept-Ranges: none

A server may also choose to ignore a Range request.


The above header is advisory. A client may always ask for a range, and should figure out lack of support from the response (basically whether it's a 206 with a Content-Range response header, rather than a 200 with the whole body).


A client can do a request like

Range: bytes=0-499

The server would respond with a 206 Partial Content response (or 416 Range Not Satisfiable where applicable), with a header like

Content-Range: bytes 0-499/10000

which describes the returned bytes as byterange/completelength.
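A client has to read that header back to know what it actually received. A minimal sketch of that (parse_content_range is a made-up name, and it only handles the form sent with a 206, not the "bytes */length" form that accompanies a 416):

```python
import re

# bytes first-last/total, where total may be "*" when unknown
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Return (first, last, total) from a Content-Range value.

    first and last are inclusive byte offsets; total is None when the
    server sent "*" for an unknown complete length.
    """
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    if last < first or (total is not None and last >= total):
        raise ValueError(f"inconsistent Content-Range: {value!r}")
    return first, last, total
```

Since both offsets are inclusive, the number of bytes in the body should be last - first + 1, which is a cheap sanity check before appending to a partial download.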

Notes:

  • only applies to GETs; range must be ignored on everything else
  • byte ranges are zero-based offsets, and inclusive
    - so 0-499/500 is the entire file; watch for off-by-one bugs
  • completelength may be * for unknown
  • the last position can be omitted, which implies 'until the end'
    - e.g. 0- means everything
  • the first position can be omitted
    - e.g. -500 means last 500 bytes
  • You can ask for multiple sub-ranges in one request via comma separation
    - e.g. 0-0,-1 means the first and last bytes
    - this is more complex on both the client (bookkeeping) and server (multipart dealie, and it may coalesce adjacent ranges)
    - few clients will ever do this
    - clients that do not understand multipart responses shouldn't ask for them :)
    - not all servers implement this (technically a violation of the RFC)(verify)
    - response is a little more detailed
  • If-Range is the combination of this with ETag
    - basically an "entire file if different, subrange if still the same" deal(verify)
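The offset rules in these notes are easy to get wrong by one, so here is a sketch of how a server might turn the range forms above into concrete inclusive offsets for a resource of known length. The function names are invented for this example, and the choice to return None for a range that cannot be satisfied (which a server would answer with 416) is just this sketch's convention.

```python
def resolve_range(spec, length):
    """Turn one spec such as "0-499", "500-" or "-500" into inclusive
    (first, last) offsets, or None if it cannot be satisfied."""
    first_text, sep, last_text = spec.strip().partition("-")
    if not sep:
        raise ValueError(f"not a byte range: {spec!r}")
    if first_text == "":
        # Suffix form: the last N bytes
        if last_text == "":
            raise ValueError(f"not a byte range: {spec!r}")
        suffix = int(last_text)
        if suffix == 0 or length == 0:
            return None
        return max(length - suffix, 0), length - 1
    first = int(first_text)
    if first >= length:
        return None
    # An omitted or too-large last position means "until the end"
    last = length - 1 if last_text == "" else min(int(last_text), length - 1)
    if last < first:
        raise ValueError(f"last before first: {spec!r}")
    return first, last


def parse_range_header(value, length):
    """Resolve every comma-separated spec in a "bytes=..." Range value."""
    unit, _, specs = value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    return [resolve_range(spec, length) for spec in specs.split(",")]
```

For a 10000-byte resource, parse_range_header("bytes=0-0,-1", 10000) yields the first and last byte as two separate ranges, which is the case where the response has to become multipart.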

