Cookie notes


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Introduction

Cookies give a website some memory on your computer.

A server can ask a visiting browser to remember a piece of text, and on a later visit repeat it back - verbatim, only.


Originally, the only mechanism was that the HTTP response itself asks the browser to store some information. Assuming the browser agrees to do so, it will then send that exact information back whenever it returns to that same domain/server/site/application.

Later, scripting added some extra abilities, in particular that the page that was served can do the same from browser-side scripting. It can do so only on behalf of the server that served the page, so this does not change who can get at that information. (It did allow some creative new uses, though.)


One of the easiest things a cookie lets a server do is recognize that a specific browser has returned to visit. It's like giving someone a specific, unique word to repeat back later, to let you know it's them.

Many uses of cookies are some variant of this "yup, this is still me" use, but there are others. It depends mostly on what is stored.

Purposes include:

  • keeping users logged in
typically a token representing "we have recently checked this user's authentication" (and we check that token against server-side state).
not any actual user/session state; that can be kept on the server side and just referred to by this token.
  • identifying returning users regardless of login
...which can be used e.g. to remember shopping basket state even when you are not logged in yet
or remember other state within a session, for a person who hasn't logged into an account, or for a site without accounts
Can be done anonymously, e.g. by storing a number that is randomly generated and in itself meaningless, but remembered for a while
  • extension of the previous point: tracking user visits, usage, or the order in which pages are accessed within a site, for example
doing so requires identifying returning users, and cookies happen to be the most convenient way of doing so.
anonymously recording the set of pages that a specific browser visits on a site (or within the domain in which the cookie is set)
otherwise maintaining specific information about users (often relatively anonymously unless you choose to identify yourself to a site/domain that does this, or happen to be identifiable another way)


Mechanism and syntax

Cookies are just text.

Their values and related metadata are sent in HTTP headers.

A Set-Cookie header consists of

  • a set of semicolon-separated predefined parts
  • most of which are optional (will default to certain behaviour when omitted)
  • most of which take the form of attr=value, a few of which take no value

...and actual values to store in the cookie, in the form name=value, which is actually the only required part.


For example, a server may have a response containing:

Set-Cookie: foo=bar; Expires=Mon, 09-Dec-2002 13:46:00 GMT; Secure

A UA may then later send a request containing:

Cookie: foo=bar

If multiple cookies apply, they are merged into one Cookie header, separated by semicolons (RFC6265 forbids multiple Cookie header fields[1])
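
As a rough illustration of the exchange above, here is a minimal sketch using Python's standard http.cookies module. The cookie names (foo, theme) and the variable names are made up for the example, and the browser side is only simulated by parsing a header string.

```python
from http.cookies import SimpleCookie

# Server side: build the value that would follow "Set-Cookie: ".
response_cookies = SimpleCookie()
response_cookies["foo"] = "bar"
response_cookies["foo"]["secure"] = True  # flag attribute, takes no value
set_cookie_value = response_cookies["foo"].OutputString()

# Later request (simulated): all applicable cookies arrive merged
# into a single Cookie header, separated by semicolons.
request_cookies = SimpleCookie()
request_cookies.load("foo=bar; theme=dark")
received = {name: morsel.value for name, morsel in request_cookies.items()}
```

Note that the request only carries name=value pairs; attributes such as Secure or Expires are never sent back.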


Standards and the real world
  • The first standard was Set-Cookie:, the basics of which were standardized in RFC 2109 (from 1997)
  • Set-Cookie2 was written to extend that, standardized in RFC 2965 (from 2000), with the idea that it would replace Set-Cookie
but it never really caught on, and was deprecated in 2011 by RFC 6265.
Modern browsers do not support Set-Cookie2.
  • RFC 6265 is basically an update to Set-Cookie:
  • Various things have been used in the real world since the RFC 2109 spec and became widely supported.
Some of them made it into RFC 6265, some of them are just common enough that you'd want to support them.


Name and value

Required.

Basically just name=value

Notes:

  • attribute names are case-insensitive
  • the value should be escaped so that it will not clash with parsing -- see RFC 2616's definition of quoted-string
  • if a particular name appears multiple times in a set-cookie, only the first should be used
  • Names starting with $ are reserved and not to be used by applications

Sending a set-cookie for an existing name with a new value makes the browser overwrite the stored value.

Note that set-cookies with new values only overwrite values for a name when the old and new Domain and Path values are also equal.

Note that an Expires in the past and/or a Max-Age of 0 will cause a cookie to be discarded regardless of value (a common way to delete a cookie).


Expires

Optional, but common because a lot of uses want persistent cookies, not session cookies.


Cookies without an Expires=

will be removed by the UA when the UA closes.
Sometimes called session cookies - in the per-run-of-the-browser sense, not to be confused with cookies that support login sessions.


Cookies with an Expires=

should persist between different runs of the browser
until the given expiration
..or until the UA removes the cookie for other reasons
These are sometimes called persistent cookies.
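
The difference between the two can be sketched as follows, assuming Python's standard library. The function name persistent_cookie and its parameters are hypothetical; the date is formatted in the HTTP-date style with email.utils.format_datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie

def persistent_cookie(name, value, lifetime=None, now=None):
    """Build a Set-Cookie value; without a lifetime it is a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    if lifetime is not None:
        now = now or datetime.now(timezone.utc)
        # usegmt=True requires a UTC-aware datetime and ends the string in "GMT"
        cookie[name]["expires"] = format_datetime(now + lifetime, usegmt=True)
    return cookie[name].OutputString()

# Session cookie: no Expires, so the UA drops it when it closes.
session_value = persistent_cookie("foo", "bar")

# Persistent cookie: kept for a week from the given (fixed, example) moment.
week_value = persistent_cookie(
    "foo", "bar",
    lifetime=timedelta(days=7),
    now=datetime(2002, 12, 2, 13, 46, tzinfo=timezone.utc),
)
```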


Domain

Optional.


If omitted, the UA will decide to send the cookie for the hostname that set it, excluding subdomains.

If set, the UA will decide to send the cookie for the requested domain (unless refused for some reason), including subdomains.


For example,

if domain was omitted, and set-cookie was sent from app1.example.org, cookies will be sent back on visits to app1.example.org, and not example.org
if domain was omitted, and set-cookie was sent from example.org, cookies will be sent back on visits to example.org and not app1.example.org
if domain=example.org, cookies will be sent back on visits to example.org, app1.example.org, and any other subdomain of example.org


If you meant it as a site-wide login, you might want to send Domain=example.org so that it will be sent to example.org and anything under it, e.g. app2.example.org
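
The send/don't-send decision described above can be sketched like this. It is a simplification: the function name should_send is hypothetical, and real browsers add further checks (IP addresses, public suffixes) that are left out here.

```python
from typing import Optional

def should_send(request_host: str, set_by_host: str, domain_attr: Optional[str]) -> bool:
    """Decide whether a cookie applies to request_host, by host/domain only."""
    request_host = request_host.lower()
    if domain_attr is None:
        # No Domain attribute: only the exact host that set it, no subdomains.
        return request_host == set_by_host.lower()
    # A leading dot in the Domain attribute is ignored.
    domain = domain_attr.lower().lstrip(".")
    # The domain itself, or anything under it (match on a label boundary).
    return request_host == domain or request_host.endswith("." + domain)
```

For example, should_send("app2.example.org", "app1.example.org", "example.org") is true, while the same call with the domain omitted (None) is false.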


Notes

  • can be a feature, e.g. for site-wide logins
  • can be a security risk. Consider e.g. the case where app2.example.org is hosted by someone else.
  • You can only send one value
so you can't craft a list of specific allowed hosts
  • that value will only be accepted if the host that requests it is part of that domain
  • browsers may have further restrictions, e.g. most will refuse 'Domain=org' - see supercookies[2]

Path

Optional.


You can ask that a cookie be sent for all requests that are, directory-logic-wise, under a specific path.


If omitted, "the user agent will use the 'directory' of the request-uri's path component as the default value." (basically the request path up to the rightmost /)

Setting this is often done to widen that.

For example, app1.example.org/log/me/in may want to set Path=/ when it sets a login cookie
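
A minimal sketch of how that default could be computed, following the "up to the rightmost /" description above. The function name default_path is made up for the example.

```python
def default_path(request_path: str) -> str:
    """Return the cookie path a UA would assume when Path is omitted."""
    if not request_path.startswith("/"):
        return "/"
    cut = request_path.rfind("/")
    # Only the leading slash present: the default is the root.
    if cut == 0:
        return "/"
    # Everything up to, but not including, the rightmost slash.
    return request_path[:cut]
```

So a login cookie set from /log/me/in without a Path would by default only apply under /log/me, which is why such a handler would typically send Path=/ explicitly.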


Matching will be

  • from the start
e.g. Path=/docs will not match /my/docs
  • full directory name (up to the next slash or end of the string), not substring
e.g. Path=/docs will match /docs and /docs/
e.g. Path=/docs will not match /docsets
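
These matching rules can be written out as a short function. This is a sketch in the spirit of RFC 6265's path-match description; the name path_matches is an assumption for the example.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """Return True if a cookie with cookie_path applies to request_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Cookie path already ends at a directory boundary.
        if cookie_path.endswith("/"):
            return True
        # Otherwise the next character of the request path must be a slash,
        # so that /docs does not match /docsets.
        if request_path[len(cookie_path)] == "/":
            return True
    return False
```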


Note that when you have a reverse proxy in front of your app, the path (also host, and potentially domain) for the application may not be the one the browser sees, which can lead the browser to reject or just not send a cookie.

Having such a proxy rewrite cookies is possible, but not always easy to do well.

Secure

Optional.

Is a flag, takes no value.


This is the server requesting that the UA only send the cookie when doing HTTPS requests to the originating server, and not in HTTP requests.

This should make it more resistant to snooping and man-in-the-middle attacks.

HttpOnly

Optional.

Is a flag, takes no value.

Supported since 2010 or so. [3][4]


Http-only cookies cannot be accessed in browser-side scripting on pages served by the domain.

...yet will be sent back by the browser as usual, including in XHR/Fetch requests (verify)


This lets you make a hard split between

cookies that don't need to be read out by scripting (like login tokens),
cookies that you specifically want to use from scripting (e.g. remembering parts of UI state).


The idea being even if your page has XSS (Cross-Site Scripting) issues, inserted scripting cannot read out or alter that cookie.

However, HttpOnly was only ever meant as a useful mitigation, never as a secure solution.

While XSS cannot read/steal the cookie, there are still certain flaws.

  • XSS may in certain cases still effectively replace/overwrite the cookie's value (but not read it)(verify). Consider attacks that create many new cookies when a browser has a per-domain limit: this can flush out the oldest cookie, after which a new cookie can be set under that name.
  • CSRF
  • XST

SameSite

Rejection of invalid cookies

Incompleteness and invalidity will lead browsers to reject cookies.


According to RFC 2965, section 3.3.2, invalid cookie-store requests are those that satisfy one of the following:

  • the path in the cookie is not a prefix of the URI path of the page that set-cookie was requested from (you can't set other paths' cookies)
  • the UA request's effective host does not match the cookie's domain (this can be a potential bother when using reverse proxies, named virtual hosts and such. You often want to use request information for the cookie)


And, in practice, a number of domain/host requirements

  • often the minimum-total-dot requirement:
    • Two for .com, .net, .org, .edu, .gov, .mil, and .int (verify) (For example, there are two in .example.com, so it passes this test)
    • one for .local (verify)
    • Three otherwise (verify)
  • when the domain in the cookie implies that the host part has a dot
    • e.g. a set-cookie from www1.webservers.example.com for domain .example.com implies that the host is www1.webservers, which contains a dot so is invalid. For this example, you would want to specify the domain .webservers.example.com
  • the domain does not follow general DNS rules (made of letters, digits, and hyphens [a-z0-9.-]). Note that intranet naming does not always necessarily keep to DNS rules, e.g. by containing underscores. See also Microsoft KB 909264, RFC 952, RFC 1123.


In browsers, incompleteness or invalidity of a cookie may mean the cookie will not be set at all, or will be used but not persist. This may differ between browsers and between specific types of invalidity.


Note that in reverse proxying, the path and host for the application may not be the one the browser sees, which can (rightly) lead it to reject the cookie (or just not send it as expected). Having a proxy rewrite cookies is possible, but not always easy to do well.

Limitations

Size

Storing data directly in a cookie (rather than a token to refer to data elsewhere) is possible, but you cannot count on cookies storing more than approximately 4KB.

Browsers have rules about per-site size as well as total cookie size, in part because large cookies make every request to the host/path/application (or even the whole domain, if you set the domain and path broadly) larger and so a little slower, particularly if there are many large cookies.
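
If you do store data directly, a cheap guard before setting the cookie might look like the sketch below. The 4096-byte budget and the function name fits_in_cookie are assumptions for the example; actual limits differ between browsers.

```python
MAX_COOKIE_BYTES = 4096  # assumed conservative per-cookie budget

def fits_in_cookie(name: str, value: str, attributes: str = "") -> bool:
    """Check that name, value and attributes together stay within the budget."""
    size = sum(len(part.encode("utf-8")) for part in (name, value, attributes))
    return size <= MAX_COOKIE_BYTES
```

When this returns False, storing the data server-side and putting only a token in the cookie is the usual alternative.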

Amount

The spec sets no hard limits. The real world may give you maybe 50-300 cookies if you're lucky, but you probably shouldn't count on getting more than 20.


Keep in mind that browsers are free to e.g. delete the least-used cookies once you reach such a per-domain limit, or some 'total' limit.

Further notes

A server can effectively remove a cookie in several ways:

  • you can mention the name with an empty value, or
  • you can mention the name and a new expiration date - one in the past.
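
A sketch of building such a removal header with Python's http.cookies follows. The function name deletion_header and the placeholder value "deleted" are made up for the example. Because a set-cookie only overwrites an existing cookie when Domain and Path are equal, the same Path (and Domain, if one was used) has to be repeated.

```python
from http.cookies import SimpleCookie

def deletion_header(name, path="/", domain=None):
    """Build a Set-Cookie value that asks the UA to discard the named cookie."""
    cookie = SimpleCookie()
    cookie[name] = "deleted"  # the value no longer matters
    cookie[name]["path"] = path  # must equal the path the cookie was set with
    if domain:
        cookie[name]["domain"] = domain  # likewise for the domain
    cookie[name]["max-age"] = 0
    cookie[name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"  # a date in the past
    return cookie[name].OutputString()

header = deletion_header("foo", path="/", domain="example.org")
```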

Javascript


Flaws relating to security

Privacy

Login

When you log into a website, what it typically does is

check your credentials once, then store a cookie meaning "I recently checked your credentials"
on all later requests (until the cookie, i.e. the login, expires), actively verify that cookie


That cookie usually stores a large randomly generated number, that is completely meaningless in itself, except that both sides remember it for a while, and on the server side it is associated with the user you logged in as.


There are a handful of basic security details, such as

make that number large, so that guessing it by trying random numbers would take thousands of years, i.e. is infeasible
not accidentally giving someone runtime control of your webpage scripting, as they could just read it off
it's a good idea to use the HttpOnly flag, which tells the browser "this cookie is just for you to send back, not for scripting to have access to at all" (by default it's both)


...but given such care, this is a pretty good system.
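
The login flow above can be sketched roughly as follows. This is an illustration under assumptions, not a complete login system: the in-memory SESSIONS dict, the cookie name "session" and both function names are hypothetical, and credential checking and expiry are left out.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # token -> username; stands in for real server-side state

def log_in(username):
    """Call after credentials were checked; returns a Set-Cookie value."""
    token = secrets.token_urlsafe(32)  # large, random, meaningless in itself
    SESSIONS[token] = username
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["path"] = "/"
    cookie["session"]["httponly"] = True  # not readable from page scripting
    cookie["session"]["secure"] = True    # only sent over HTTPS
    return cookie["session"].OutputString()

def user_for_request(cookie_header):
    """Look up the token from a Cookie header; None if unknown or missing."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return SESSIONS.get(morsel.value) if morsel else None
```

The cookie carries only the token; who it belongs to lives entirely on the server side.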


It's also the basis of most login systems, largely because without any way to persist that "I've checked you", you would have to send your credentials on every new connection (which for a browser is at least once, and typically for every refresh)

Minor evil? Just a tool?

See also

  • RFC 2109, 'HTTP State Management Mechanism' ((technically) obsoleted by...)
  • RFC 2965, 'HTTP State Management Mechanism'
  • perhaps read RFC 2964, Use of HTTP State Management