Cookies

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Introduction

Cookies give a website some memory. A server can ask a visiting browser to remember a piece of text, and on a later visit repeat it back verbatim.


Originally that was the only mechanism: A server/site/application asks a browser to store some information. If the browser agrees to do so, it will then send that exact information whenever it returns to that same domain/server/site/application.

Later, scripting added some extra abilities, in particular that browser-side scripts can ask the browser to change cookie contents on behalf of the server that served the page. (This is mostly a programmer convenience for more interactive pages.)


One of the easiest things a cookie lets a server do is recognize that a specific browser has returned. It's like giving someone a specific word to repeat back later - you'll know it's them. Many uses of cookies are some variant of this "yup, this is still me" use, but there are others - it depends on what is stored, and how it's used.

Purposes include:

  • keeping users logged in
    • i.e. a token representing "we have recently checked this user's authentication" (the server checks that token against its own state).
  • identifying returning users (regardless of login)
    • Can be done anonymously, e.g. storing a number in the cookie (a number that is randomly generated and not meaningful, except in its repeated use in this session)
    • ...which can be used e.g. remember shopping basket state even when you are not logged in yet
    • or remember other state within a session, for a person who hasn't logged into an account, or for a site without accounts
  • extension of the previous point: tracking user visits, usage, or the order in which pages are accessed within a site, for example
    • doing so requires identifying returning users, cookies happen to be the most convenient way of doing so.
    • anonymously recording the set of pages that a specific browser visits on a site (or within the domain in which the cookie is set)
    • otherwise maintaining specific information about users (often relatively anonymously unless you choose to identify yourself to a site/domain that does this, or happen to be identifiable another way)

Javascript


Mechanism and syntax

Cookies are, from most points of view, just text. Their values and related metadata are sent in an HTTP header.

A Set-Cookie header consists of

  • a set of semicolon-separated predefined parts
  • most of which are optional (will default to certain behaviour when omitted)
  • most of which take the form of attr=value, a few of which take no value

...and actual values to store in the cookie, in the form name=value, which is actually the only required part.


Besides the name=value pair, it is fairly common to also set an expiry time. Setting a path and domain is also not unusual.

Set-Cookie: foo=newvalue; expires=date; path=relative/path; domain=.example.org
Set-Cookie: foo=bar; path=/abs/path; expires=Mon, 09-Dec-2002 13:46:00 GMT; secure
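As a rough sketch of how a server-side program might produce such a header, the following uses Python's standard http.cookies module. The helper name build_set_cookie and its parameters are made up for this illustration, and a web framework would normally do this for you.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, path="/", domain=None, max_age=None, secure=False):
    """Return the value part of a Set-Cookie header (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie[name] = value          # the only required part: name=value
    morsel = cookie[name]
    morsel["path"] = path         # optional attributes follow, separated by semicolons
    if domain is not None:
        morsel["domain"] = domain
    if max_age is not None:
        morsel["max-age"] = max_age
    if secure:
        morsel["secure"] = True   # a flag attribute, it takes no value
    return morsel.OutputString()


header_value = build_set_cookie("foo", "bar", path="/abs/path", secure=True)
print("Set-Cookie: " + header_value)
```

The library decides the order and capitalisation of the attributes. Since attribute names are case-insensitive, that should not matter to the browser.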


Set-Cookie: (RFC2109)

Name and value

Required.

Basically just name=value

Notes:

  • attribute names are case-insensitive
  • the value should be escaped so that it will not clash with parsing -- see RFC 2616's definition of quoted-string
  • if a particular name appears multiple times in a set-cookie, only the first should be used
  • Names starting with $ are reserved and not to be used by applications

Setting an existing name to a new value makes the browser overwrite the old value.

Note that set-cookies with new values only overwrite values for a name when the old and new Domain and Path values are also equal.

Note that an Expires in the past and/or a Max-Age of 0 will cause a cookie to be discarded regardless of value (a common way to delete a cookie).
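The overwrite and delete rules above can be pictured with a toy model of the browser's cookie store. This is only a sketch: TinyCookieJar is a made-up class, it ignores expiry dates, and real browsers keep much more state per cookie.

```python
class TinyCookieJar:
    """Toy browser-side store, keyed on (name, domain, path)."""

    def __init__(self):
        self._store = {}

    def set(self, name, value, domain, path, max_age=None):
        key = (name, domain.lower(), path)
        if max_age is not None and max_age <= 0:
            # Max-Age of 0: discard the cookie, whatever value was sent
            self._store.pop(key, None)
        else:
            # Same name, domain and path: the old value is overwritten
            self._store[key] = value

    def get(self, name, domain, path):
        return self._store.get((name, domain.lower(), path))


jar = TinyCookieJar()
jar.set("foo", "bar", "example.com", "/")
jar.set("foo", "other", "example.com", "/app")         # different path: a separate cookie
jar.set("foo", "newvalue", "example.com", "/")         # same triple: overwrites "bar"
jar.set("foo", "", "example.com", "/app", max_age=0)   # deletes the /app cookie only
```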


Expiration

Optional.

If omitted, the browser (the user agent, UA) will remove the cookie when it closes. This is called a session cookie (in the per-session sense).

Cookies with an expiration date will persist between different runs of the browser, until the given expiration. These are sometimes called persistent cookies.


domain

Optional.

If omitted, the browser defaults it to the host that set it. If you wanted it domain-wide, that's not ideal. In reverse-proxied setups, the browser doing this may be preferable (unless you want a specific wider domain to be set).

The host that sets the cookie must be, or be a member of, the domain it sets the cookie for.


This can be handy when you, for example, don't want visits to www.example.com to be treated as different from visits to example.com, particularly for logins.
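A minimal sketch of the "is, or is a member of" test, assuming plain host names. The function name domain_matches is made up here, and real browsers apply further checks (IP addresses, lists of public suffixes such as .com) that this ignores.

```python
def domain_matches(request_host, cookie_domain):
    """True if request_host equals cookie_domain or is a host within it."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")   # ".example.com" -> "example.com"
    # The "." in the suffix test keeps badexample.com from matching example.com
    return host == domain or host.endswith("." + domain)


same_site = domain_matches("www.example.com", ".example.com")
```

With a cookie domain of .example.com, both www.example.com and example.com pass this test, which is what makes logins carry over between the two.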


Path

Optional.

If omitted, the browser defaults it to the path that generated the cookie(verify): the request's path up to the rightmost /.
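The default described above can be sketched as follows. The function name default_cookie_path is made up for this example, and note that the cookie specifications differ on whether the rightmost / itself is kept. This sketch keeps it.

```python
def default_cookie_path(request_path):
    """Request path up to and including the rightmost '/' (assumed variant)."""
    if not request_path.startswith("/"):
        return "/"                      # empty or odd paths fall back to the root
    return request_path[: request_path.rfind("/") + 1]


cart_default = default_cookie_path("/shop/cart/view")   # "/shop/cart/"
root_default = default_cookie_path("/index.html")       # "/"
```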

Useful when an application wishes to isolate certain cookie data to just itself, via the pathname that the application is hosted under.


If the host represents an application, it may be simpler for your code to force the path to be '/' so that it applies application-wide and you don't have to worry about where you set it from.

Secure

You can optionally require that a cookie is only sent when the connection is secure (HTTPS). This is primarily extra protection against the cookie being read by someone eavesdropping on an unencrypted connection.

Set-Cookie2: (RFC2965)


Set-Cookie2 is an extended form that defines additional attributes, and additional behaviour requirements (see e.g. RFC 2965, section 3.3.1).


RFC 2965 requires the name=value pair to come before the attributes, and mentions that this means a cookie name cannot collide with an attribute name. It is probably a good idea to never use a name that matches a pre-defined cookie attribute, though, as not all cookie implementations may be smart about this. This is probably a good idea for Set-Cookie: as well.


Note on support

The idea behind RFC 2965 (from 2000) was that Set-Cookie2 would replace Set-Cookie, but it never caught on widely. RFC 6265 (2011) obsoleted RFC 2965 and deprecated Set-Cookie2, and current browsers ignore the header, so Set-Cookie: is the one to use.

Version

Required in the Set-Cookie2 header.

Version=1 refers to implementation according to RFC 2965. There is currently no other version(verify).

Discard

Optional.

Tells the browser to discard the specified cookie unconditionally when the browser exits, whatever the expiration/max-age says (can be handier than doing so via expiration/max-age).

Max-Age

Seconds to keep the cookie.

An easier (and timezone-independent) way to handle expiry.

Zero means that the browser should delete the cookie.
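For clients that only understand an expiry date, a server can derive one from the same number of seconds. A small sketch using the standard library follows. The helper name expires_from_max_age is made up, and the date comes out in the RFC 1123 style (spaces rather than hyphens in the date).

```python
import time
from email.utils import formatdate


def expires_from_max_age(max_age_seconds, now=None):
    """Return an HTTP date (always GMT) that lies max_age_seconds after 'now'."""
    now = time.time() if now is None else now
    # usegmt=True gives a "... GMT" suffix, so no local timezone is involved
    return formatdate(now + max_age_seconds, usegmt=True)


one_hour_after_epoch = expires_from_max_age(3600, now=0)
```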


CommentURL

Port

Non-standard extensions

HttpOnly


From Microsoft.

The idea is that for cookies with this property present, the browser denies scripting access to that cookie, so that it cannot be altered or read/stolen via XSS.

This makes sense because, in practice, script-interactive use of cookies (such as remembering part of page UI state) can be separated from cookies that hold more serious things actually worth stealing (such as login tokens).
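A sketch of that separation, using Python's standard http.cookies module. The cookie names session and ui_theme and their values are made up for the example.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()

# Login token: worth stealing, so keep it away from page scripts
cookies["session"] = "opaque-token-value"
cookies["session"]["httponly"] = True
cookies["session"]["path"] = "/"

# UI state: scripts are meant to read and change this one, so no HttpOnly
cookies["ui_theme"] = "dark"
cookies["ui_theme"]["path"] = "/"

session_line = cookies["session"].OutputString()
theme_line = cookies["ui_theme"].OutputString()
print("Set-Cookie: " + session_line)
print("Set-Cookie: " + theme_line)
```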


HttpOnly was only ever meant as a useful mitigation, never as a solid security solution.

While XSS cannot read/steal the cookie, there are still certain flaws. XSS may in certain cases still effectively replace/overwrite the cookie's value (but not read it)(verify). Consider, for example, an attack that creates many new cookies when a browser has a per-domain limit: this can flush out the oldest cookie, after which a new cookie with the same name can be set.


Most browsers have supported it for a while. For details, see e.g. http://www.greebo.net/2008/03/25/httponly-update/


Rejection of invalid cookies


Incompleteness and invalidity will lead browsers to reject cookies.


According to RFC 2965, section 3.3.2, invalid cookie-store requests are those that satisfy one of the following:

  • the path in the cookie is not a prefix of the URI path of the page that set-cookie was requested from (you can't set other paths' cookies)
  • the UA request's effective host does not match the cookie's domain (this can be a potential bother when using reverse proxies, named virtual hosts and such. You often want to use request information for the cookie)

There are also a number of logical domain/host requirements

  • often the minimum-total-dot requirement:
    • Two for .com, .net, .org, .edu, .gov, .mil, and .int (verify) (For example, there are two in .example.com, so it passes this test)
    • one for .local (verify)
    • Three otherwise (verify)
  • when the domain in the cookie implies that the host part has a dot
    • e.g. a set-cookie from www1.webservers.example.com for domain .example.com implies that the host is www1.webservers, which contains a dot so is invalid. For this example, you would want to specify the domain .webservers.example.com
  • the domain does not follow general DNS rules (made of letters, digits, and hyphens [a-z0-9.-]). Note that intranet naming does not always necessarily keep to DNS rules, e.g. by containing underscores. See also Microsoft KB 909264, RFC 952, RFC 1123.
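Three of the checks above (path prefix, host/domain match, and the dot in the host part) can be sketched like this. The function name rejection_reason is made up, the dot-counting and DNS-syntax rules are left out, and actual browsers differ in the details.

```python
def rejection_reason(request_host, request_path, cookie_domain, cookie_path):
    """Return why a cookie would be rejected, or None if these checks pass."""
    host = request_host.lower()
    bare = cookie_domain.lower().lstrip(".")

    # You can't set other paths' cookies
    if not request_path.startswith(cookie_path):
        return "path is not a prefix of the request path"

    # The requesting host has to be, or be inside, the cookie's domain
    if host != bare and not host.endswith("." + bare):
        return "request host does not match the cookie domain"

    # What remains in front of the domain is the host part; it may not contain a dot
    host_part = "" if host == bare else host[: -len(bare)].rstrip(".")
    if "." in host_part:
        return "host part contains a dot"
    return None


too_wide = rejection_reason("www1.webservers.example.com", "/app/page", ".example.com", "/app")
accepted = rejection_reason("www1.webservers.example.com", "/app/page", ".webservers.example.com", "/app")
```

Here too_wide holds the "host part contains a dot" reason and accepted is None, matching the www1.webservers example in the list.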


In browsers, incompleteness or invalidity of a cookie may mean the cookie is not set at all, or is used but does not persist. This may differ between browsers and between specific types of invalidity.


Note that in (non-transparent) reverse proxying, the path and host for the application may not be the one the browser sees, which can (rightly) lead it to reject the cookie. Having a proxy rewrite cookies is possible, but not always easy to do well.

Limitations

Size

Storing data directly in a cookie (rather than a token to refer to data elsewhere) is possible, but you cannot count on cookies storing more than approximately 4KB.
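A quick guard against oversized cookies might look like the sketch below. The 4096-byte limit is this example's reading of "approximately 4KB", not a figure every browser agrees on, and the function name fits_in_cookie is made up. Browsers may also count the attributes towards the limit.

```python
MAX_COOKIE_BYTES = 4096   # assumption: "approximately 4KB"


def fits_in_cookie(name, value, limit=MAX_COOKIE_BYTES):
    """Check the encoded size of name=value against a conservative limit."""
    return len(f"{name}={value}".encode("utf-8")) <= limit


small_ok = fits_in_cookie("id", "x" * 100)
large_ok = fits_in_cookie("blob", "x" * 5000)
```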

Browsers have rules about per-site / total cookie size, in part because having large cookies makes all requests to the host/path, application (or even domain, if you set the domain and path broadly) a little slower, particularly if there are many large cookies.


All this isn't really that large a problem, as it is often just as convenient to store only an identifier in the cookie, which refers to server-stored state of any complexity/size you choose.

Amount

You shouldn't count on getting more than 20 cookies for any domain, and note that the browser might delete the least-used cookies once it holds more than some total number of cookies (e.g. the 300 mentioned a few places, 50 in IE7[1]).



Further notes

Deleting a cookie can be done in several ways. You can mention the name with an empty value, and you can mention the name with a new expiration date - one in the past.


Cookie: request headers may merge name-value pairs from different origins into one. Cookie libraries will handle such details for you.
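As an example of letting a library do that work, Python's standard http.cookies module can split a Cookie: header into its pairs. The wrapper name parse_cookie_header and the header contents are made up for the example.

```python
from http.cookies import SimpleCookie


def parse_cookie_header(header_value):
    """Turn the value of a Cookie: request header into a plain dict."""
    jar = SimpleCookie()
    jar.load(header_value)   # splits on ';' and handles quoting
    return {name: morsel.value for name, morsel in jar.items()}


parsed = parse_cookie_header("foo=bar; theme=dark")
```

Note that the browser only sends back names and values here. The path, domain and expiry that came with the original Set-Cookie are not repeated.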


When using reverse proxies to apps, note that domain and path should be as the browser sees them, as it makes the decisions about storing and sending cookies based on those.


You can read out cookies for the current page using scripting, and you can set a cookie for the page's originating site. Note that this is done purely at the browser side, and can mean several changes before the browser makes its next HTTP request with a Cookie: header.

This can be useful for things that only ever change interactively, such as interface selection, and is used in Google Analytics.


Cookies and security

Login

When you log into a website, it typically checks your credentials once, then stores a cookie with the meaning "I recently checked you" (a randomly generated number that refers to an entry in the site's database, so you can't just fake such a cookie).
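A minimal sketch of that scheme, with an in-memory dict standing in for the site's database. All names here (start_session, user_for_token, the one-hour lifetime) are assumptions made for the example.

```python
import secrets
import time

SESSION_LIFETIME_SECONDS = 3600   # assumption: "recently" means within the hour
_sessions = {}                    # token -> (username, expiry time)


def start_session(username, now=None):
    """Call after the password check succeeded; returns the cookie value."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)   # random and meaningless outside this table
    _sessions[token] = (username, now + SESSION_LIFETIME_SECONDS)
    return token


def user_for_token(token, now=None):
    """Look up the cookie value; unknown or expired tokens give None."""
    now = time.time() if now is None else now
    entry = _sessions.get(token)
    if entry is None:
        return None                     # a made-up token is simply not in the table
    username, expires_at = entry
    if now >= expires_at:
        del _sessions[token]
        return None
    return username


token = start_session("alice", now=1000.0)
```

The token itself reveals nothing about the user or the password. It only has meaning while the server still holds the matching entry.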

While there are more secure ways, this is a sensible thing to do.

Minor evil?

There are sites that consider cookies a Bad Thing.


This often consists mostly of alarmism drowning out the one or two decent points.

Plus the good points aren't really about cookies, in that it's mainly about what is stored and why, not how it is stored.


The short story is that cookies can only store data verbatim:

  • data that came from the server side OR
  • data that the web page's scripting already had

It cannot take information not already accessible to the browser. It can only store what it already has.


There are still less-than-wholesome uses of that.

Cookies easily identify your return to a site. One question is how that is used.

This can be split into

  • very specific returns - e.g.
    • login tokens, the things that keep you logged into a site (note that these are usually randomly generated numbers known to both sides temporarily, and signify nothing more than "this token means the person authenticated recently" - they do not reveal passwords)
    • anything that lets people steal/snoop this is a realistic problem...
    • ...because it lets someone effectively log in as you (not necessarily for very long, and the degree of nastiness depends on how much you can do - most sites don't let you change the password without asking for your old one)
    • this is done through scripts, sometimes by snooping at access points, or such.
  • otherwise anonymous returns
    • in particular ad networks - they do this on many sites, and try their best to match return visits in different places.
    • They often do not actually identify a person, just know that user 3853745923 visited both site X and site Y. Profiling types of users is already valuable for deciding which ad to serve. You can argue whether that's a violation of privacy or not.
    • ...and sometimes less wholesome things than this.
    • This was easier for them when third party cookies applied to the embedded ads themselves. Browsers are now stricter with those.



See also Security_notes#Cookie-like_things

See also

  • RFC 2109, 'HTTP State Management Mechanism' ((technically) obsoleted by...)
  • RFC 2965, 'HTTP State Management Mechanism'
  • perhaps read RFC 2964, Use of HTTP State Management