📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.

(Note these are sometimes misnamed Universal resource somethings.)

Basic concepts

URL

Uniform Resource Locators (URLs) specify where to locate something. This often implies or at least suggests its availability.

URN

Uniform Resource Names (URNs) specify what something is: usually identifiers (or names) of content or concepts, typically with namespaces to signal what sort of thing you are identifying. For example, you might find someone using isbn:0898156122 to identify a book.

Formally, URNs start with urn: (as in urn:isbn:0898156122; the URN syntax is urn:, then a namespace identifier, then a namespace-specific string), though in informal use that prefix is often left off.

There are a few registered namespaces, see [1]. Having an official definition is useful for resolvers to do standardized things. There's nothing stopping you from inventing your own for a specific use.

URI

Uniform Resource Identifiers (URIs) may be either a URN, a URL, or in some cases both (such as 'look up this identifier at a specific service').

On the internet, almost all URIs are specifically URLs.

There are a few specific real-world contexts where both URLs and URNs are used.
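As a rough illustration of the URL/URN distinction, here is a minimal sketch that tells the two apart by scheme using Python's standard urllib.parse. The describe function is a made-up helper for this note, it assumes URNs are written in the urn:-prefixed form, and it treats everything else as a URL, which is a simplification.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Hypothetical helper: label a URI as URN or URL by its scheme.

    Assumes URNs carry the urn: prefix; anything else is treated as a URL.
    """
    parts = urlsplit(uri)
    if parts.scheme == "urn":
        # For urn:<namespace>:<specific>, everything after "urn:" lands in .path
        namespace, _, specific = parts.path.partition(":")
        return ("URN", namespace, specific)
    return ("URL", parts.scheme, parts.netloc)

print(describe("urn:isbn:0898156122"))
print(describe("https://example.com/books/1"))
```

The first call reports the isbn namespace and the identifier; the second reports the scheme and host, which is the "where to find it" part a URN lacks.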

Also

IRI

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Internationalized Resource Identifier (RFC 3987, 2005) extends URIs with Unicode.

Mapping an IRI to a URI amounts to percent-encoding the UTF-8 bytes of the non-ASCII characters. We had been doing that for a while anyway, but it's nice to have a standard and some restrictions.
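A minimal sketch of that percent-encoding step, using Python's standard urllib.parse. The path is an invented example, and this only covers the path part; host names are a separate matter.

```python
from urllib.parse import quote, unquote

iri_path = "/wiki/café"          # example path with one non-ASCII character

# quote() encodes as UTF-8 by default and leaves "/" alone
uri_path = quote(iri_path)
print(uri_path)                   # é is the two UTF-8 bytes C3 A9

# unquote() reverses it, decoding the bytes as UTF-8 again
round_trip = unquote(uri_path)
print(round_trip)
```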

https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier

XRI

IDN

In practice

Maximum URL length

URL length is limited in practice in HTTP requests, partially to avoid DoS attacks that tie up resources in URL parsing (though not really in receiving that data, and note that the DoS argument also applies to large requests in other forms, e.g. large POSTed attachments)

...so servers may deem a request silly and send a "413 Request Entity Too Large" or "414 Request-URI Too Long" instead of handling that request.

I've seen quoted figures like 4kB, 8kB (8190 for Apache, see LimitRequestLine), 16kB (documented for IIS), and 32kB. These have probably changed (verify).

You can generally assume you can get away with ~2000 characters, which is high enough for most useful things.

If you're sending serious data in the URL, you should be able to explain why you are not POSTing the data instead.

Note that most uses for very long URLs amount to putting data in the request line, which can go into a POST body instead. It often should anyway, as POST has better semantics when used to update server state, e.g. browsers will not repeat the request without explicit interaction, and most spiders won't send it at all.
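A minimal sketch of the "fall back to POST when the URL gets long" idea, using only Python's standard library. The build_request helper and the 2000-character threshold are assumptions taken from the rule of thumb above, not from any standard, and it presumes the server accepts the same form fields via either method. Nothing is sent; it only constructs the request object.

```python
from urllib.parse import urlencode
from urllib.request import Request

MAX_URL_LENGTH = 2000  # rule-of-thumb figure from these notes, not a standard

def build_request(base_url, params):
    """Hypothetical helper: use GET if the URL stays short, else POST the data."""
    query = urlencode(params)
    url = base_url + "?" + query
    if len(url) <= MAX_URL_LENGTH:
        return Request(url, method="GET")
    # Too long for comfort: move the same form data into the request body
    return Request(
        base_url,
        data=query.encode("ascii"),  # urlencode output is plain ASCII
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

short = build_request("http://example.com/search", {"q": "x"})
long_ = build_request("http://example.com/search", {"q": "x" * 5000})
print(short.get_method(), long_.get_method())
```

For requests that change server state, choosing POST up front regardless of length is the cleaner design; the length check is only a safety net for read-style queries.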

Notes on standards and real-world implementations:

RFC 2616 (HTTP 1.1) does not specify an upper limit

It does mention that some very old (proxy) software might not support lengths above 255, though this is an estimate and a very cautious one at that.

In earlier Internet Explorer versions (which?(verify)), URIs can be no longer than 2083 characters, of which at most 2048 can be the path
Browsers tend to have no apparent limits, or more to the point, limits higher than most server limits[2]

Servers and frameworks may impose their own limits, which are sometimes configurable, and also regularly not.

On a related note, POST body length limits are not mentioned in standards, though in practice they are often limited by server configuration or implementation. For example, things that load the POST body into memory for speed may limit it to avoid self-DoSing.

Defaults may be on the order of a handful of MBs (see e.g. PHP and nginx defaults),

maxima may be on the order of GBs (assume 2GB server-side, as in sint32max, though it can be more),

in theory you could use figures up to host RAM, or beyond if you make it stream to storage.

URI parsing, escaping, valid characters

See Escaping and delimiting notes, section "URI parsing, escaping, and some related concepts".

On PATH_INFO

With or without the CGI-style split/movement between SCRIPT_NAME and PATH_INFO, there are a few cases for PATH_INFO:

empty string
- if at root, or if it maps to a directory or virtual directory, servers will usually send an external redirect to the same URL with an added slash
- browsers may already add it, figuring they'll save you a few hundred milliseconds.

not an empty string, in which case it must start with a slash to be valid. It can be just a slash, or a longer path.

It depends on server configuration whether it then...

maps to a directory - when it is a slash or ends with a slash, and the server config maps it to a filesystem/virtual directory. Often means an index serve.
leads to a redirect to add a final slash so that the above case applies more cleanly (e.g. Apache does this, based on its configuration)
hooks into a module / dynamic app, in which case control is handed over and further treatment is completely up to it. Usually happens after PATH_INFO and SCRIPT_NAME are altered to reflect app path mounting.
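The cases above can be sketched in a few lines of Python. Both functions are hypothetical helpers written for this note, not part of any server or framework: classify_path_info names the PATH_INFO cases, and split_mount shows the CGI-style movement of a mount prefix from PATH_INFO to SCRIPT_NAME that typically happens before an app takes over.

```python
from typing import Tuple

def classify_path_info(path_info: str) -> str:
    """Hypothetical helper: name the PATH_INFO case described above."""
    if path_info == "":
        return "empty"            # server will usually redirect to add a slash
    if not path_info.startswith("/"):
        return "invalid"          # a non-empty PATH_INFO must start with a slash
    if path_info.endswith("/"):
        return "directory-like"   # just "/" or ends in "/": candidate for an index
    return "path"                 # longer path; handling depends on configuration

def split_mount(script_name: str, path_info: str, mount: str) -> Tuple[str, str]:
    """Hypothetical helper: move the mount prefix from PATH_INFO to SCRIPT_NAME."""
    if path_info == mount or path_info.startswith(mount + "/"):
        return script_name + mount, path_info[len(mount):]
    raise ValueError("path is not under the mount point")

print(classify_path_info(""), classify_path_info("/"), classify_path_info("/a/b"))
print(split_mount("", "/app/users/42", "/app"))
print(split_mount("", "/app", "/app"))
```

Note that a request for exactly the mount point leaves an empty PATH_INFO, which is the first case again, so the app (or the server in front of it) has to decide whether to redirect to the slash-terminated form.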

Uniform Resource Somethings
