Cookie notes


This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Magic cookies


A magic cookie, often just cookie, is a value handed between two parties, to signify something.

The reason to use them is often that the basic level of interchange between the two sides has no long-lasting memory, and if you want a returning client to be recognized as such at all (beyond a direct back and forth), you need to add that somehow (more technically, to add state to a stateless connection).


The thing that gets signified in the process is

often identity,
sometimes authentication (overlaps with tokens),
sometimes other kinds of event, or context, or agreement.

Yes, that is vaguely wide, but it really can be anything that the issuing party has us remember, and that the other side only needs to re-reference.


You might be reminded of tokens.

That makes a lot of sense - a cookie is essentially one specific type of the wider category we today refer to as tokens.

When we call it a cookie, this often points to the more one-sided kind, which we might also call a client-carried memory token:

To the issuing party, that is a reference to something that same side remembers.
To the other party, it's just something they need to return verbatim - they don't need to know what it means at all, and might not be able to even if they wanted to.

More pedantically, cookie usually means an

issuer-defined,
holder-stored,
later-referenced

...piece of state, especially where the meaning exists entirely on the issuing side.


Opaque meaning

When we call a value opaque, we mean that the holder cannot guess at what this value means.

In fact it might just be a random number. If so, it has meaning only by merit of (and only as long as) the issuer keeps that number next to the state it references. (This is also how that cookie can be made entirely opaque/meaningless to the holder)
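As a rough sketch of that idea (not any particular system's implementation): the issuer hands out a random number and keeps the meaning in its own table. The Issuer class, its method names, and the coat_slot example are made up for illustration.

```python
import secrets


class Issuer:
    """Hands out opaque cookies and keeps their meaning on its own side."""

    def __init__(self):
        # cookie value -> whatever the issuer wants to remember about it
        self._state = {}

    def issue(self, remembered):
        cookie = secrets.token_hex(16)  # 128 random bits, meaningless by itself
        self._state[cookie] = remembered
        return cookie

    def lookup(self, cookie):
        # None if the issuer never issued this value, or has since forgotten it
        return self._state.get(cookie)

    def forget(self, cookie):
        self._state.pop(cookie, None)


issuer = Issuer()
cookie = issuer.issue({"coat_slot": 17})  # hypothetical state
print(issuer.lookup(cookie))  # the issuer can resolve it
issuer.forget(cookie)
print(issuer.lookup(cookie))  # None: the number alone carries no meaning
```

The holder only ever sees the hex string, so there is nothing for it to inspect or compute.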


Uses

If we stick to that narrower description, then there are a lot of cases that do this re-referencing thing, from RPC mechanisms, to certain kernel communication, to game servers remembering specific clients, to tickets you buy.


The thing is that in a lot of cases, this is a core implementation detail, a specific detail within a different, wider thing.

This is why only people diving into deeper details might call it a cookie.

So in a sense, the only things we frequently call cookies are mechanisms that made this concept into a separate tool usable more widely.


This is why the best known case is HTTP cookies (see below), but that is not their only use, and not the origin of the term.


There is a (flawed) real-world analogy in the ticket/token you get in a coat check: it will say something that has no value or use to you, but it does to whomever you hand it back to: it uniquely identifies which slot holds your coat.

...that is where the flaw comes in: due to the practicality of that coat room, that value will have a direct meaning, and one that you can probably guess at.


"Magic"?

'Magic' seems to refer to the sense of "it just works, you don't need to know why" - the fact that only the issuer needs to care.

And arguably that only it can care, i.e.

  • the fact that the holder cannot inspect it usefully - it is opaque, so you can't even guess what it means
  • the fact that the holder cannot compute these, useful in a security sense


https://en.wikipedia.org/wiki/Magic_cookie

HTTP cookies

HTTP cookies give a website the ability to remember something on your computer.


They build on the idea of magic cookies - issuer-defined, holder-stored. Mechanically:

A server can ask a visiting browser to remember a piece of text
Set-Cookie header
That same browser, visiting that specific server some time later, sends back that same text.
Cookie header


Note that the server can only hear back the text that it originally set.

Verbatim. Only. It cannot ever get back anything else.

It's like giving someone a secret word to repeat back to you later. The word itself can be entirely meaningless, a random one you thought up and wrote down. It is the fact they know this word that lets you know it's them.


The reason that that can be good, and the reason that that can be bad, are not even about the value stored in the cookie itself, but the fact that it lets a server notice a returning visitor.


Why remember things?





The No-cookie cookie

Perhaps a poignant example is that today's "I don't want cookies" preference is often stored in a cookie.

That's not so much ironic as it is necessary - the protocol the web speaks has zero memory, so you need to tell it what it needs to know. And as long as it doesn't store who you are, just what you want, that's perfectly fine.

If the only thing it stores is "cookies = please don't", it's also not unique, and not exploitable.



Mechanism and syntax

Cookies are just text.


Their values and related metadata are sent in an HTTP header.

A cookie set consists of

  • a set of semicolon-separated predefined parts
  • most of which are optional (will default to certain behaviour when omitted)
  • most of which take the form of attr=value, a few of which take no value

...and actual values to store in the cookie, in the form name=value, which is actually the only required part.


For example, a server may have a response containing:

Set-Cookie: foo=bar; Expires=Mon, 09-Dec-2002 13:46:00 GMT; Secure

A browser (or UA in another form) that sees that same server name again will probably send a request containing:

Cookie: foo=bar

If multiple cookies apply, they are merged into one Cookie header, separated by semicolons (RFC 6265 forbids multiple Cookie header fields[1]).
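To make that exchange concrete, here is a small sketch using Python's standard http.cookies module, seen from the server side. The second cookie (theme=dark) is a made-up addition to show a merged Cookie header.

```python
from http.cookies import SimpleCookie

# Response: build the Set-Cookie header
outgoing = SimpleCookie()
outgoing["foo"] = "bar"
outgoing["foo"]["secure"] = True  # a flag attribute, takes no value
set_cookie_line = outgoing.output()
print(set_cookie_line)

# Later request: parse the single, semicolon-separated Cookie header
incoming = SimpleCookie()
incoming.load("foo=bar; theme=dark")  # theme=dark is a hypothetical second cookie
received = {name: morsel.value for name, morsel in incoming.items()}
print(received)
```

Note that the attributes (Secure, Expires and such) only travel in Set-Cookie. The Cookie header the browser sends back carries just the name=value pairs.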


Scripting?

Originally, it was only the HTTP response that could ask for cookie sets.

Assuming the browser agrees to do so, it will then send that exact information whenever it returns to that same domain/server/site/application.

Later, scripting added some extra abilities, which meant that scripting on the same page that served the cookie could alter it.

This does not change who can interact with this information - unless people made the security mistake of allowing anyone to add code to their page.






Standards and the real world
  • The first standard was Set-Cookie:, the basics of which were standardized in RFC 2109 (from 1997)
  • Set-Cookie2 was written to extend that, standardized in RFC 2965 (from 2000), with the idea that it would replace Set-Cookie
but it never really caught on, and was deprecated in 2011 by RFC 6265.
Modern browsers do not support Set-Cookie2.
  • RFC 6265 is basically an update to Set-Cookie:
  • There have been various things used in the real world since the RFC 2109 spec that became widely supported.
Some of them made it into RFC 6265, some of them are just common enough that you'd want to support them.


Name and value

Required.

Basically just name=value

Notes:

  • attribute names are case-insensitive
  • the value should be escaped so that it will not clash with parsing -- see RFC 2616's definition of quoted-string
  • if a particular name appears multiple times in a set-cookie, only the first should be used
  • Names starting with $ are reserved and not to be used by applications

You can set a variable name to contain a new value to make the browser overwrite it.

Note that set-cookies with new values only overwrite values for a name when the old and new Domain and Path values are also equal.

Note that an Expires in the past and/or a Max-Age of 0 will cause a cookie to be discarded regardless of value (a common way to delete a cookie).


Expires

Optional, but common because a lot of uses want persistent cookies, not session cookies.


Cookies without an Expires=

will lead to the UA removing it when the UA closes.
Sometimes called session cookie - in the per-run-of-the-browser sense, not to be confused with cookies that support login sessions.


Cookies with an Expires=

should persist between different runs of the browser
until the given expiration
..or until the UA removes the cookie for other reasons
These are sometimes called persistent cookies.
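A sketch of the difference, assuming Python's standard library; make_cookie and its parameters are names invented here. It writes the date in the "Tue, 08 Jan 2030 12:00:00 GMT" style that HTTP dates generally use. Older examples, like the one earlier on this page, use dashes in the date.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie


def make_cookie(name, value, lifetime=None, now=None):
    """Return a Set-Cookie header line; lifetime=None gives a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    if lifetime is not None:
        now = now or datetime.now(timezone.utc)
        # usegmt=True renders the zone as "GMT" rather than "+0000"
        cookie[name]["expires"] = format_datetime(now + lifetime, usegmt=True)
    return cookie.output()


print(make_cookie("sid", "abc"))                              # session cookie
print(make_cookie("sid", "abc", lifetime=timedelta(days=7)))  # persistent cookie
```

The attribute name comes out lowercase (expires=), which is fine because attribute names are case-insensitive.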


Domain

Optional.


If omitted, the UA will decide to send the cookie for the hostname that set it, excluding subdomains.

If set, the UA will decide to send the cookie for the requested domain (unless refused for some reason), including subdomains.


For example,

if domain was omitted, and set-cookie was sent from app1.example.org, cookies will be sent back on visits to app1.example.org, and not example.org
if domain was omitted, and set-cookie was sent from example.org, cookies will be sent back on visits to example.org and not app1.example.org
if domain=example.org, cookies will be sent back on visits to example.org, app1.example.org, and any other subdomain of example.org
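Those three cases can be written down as a small matching function. This is only a sketch of the sending decision. It ignores whether the browser would have accepted the Domain value in the first place, and cookie_applies and its parameters are names made up here.

```python
def cookie_applies(request_host, set_by_host, domain_attr=None):
    """Would a browser send this cookie when requesting request_host?"""
    request_host = request_host.lower()
    if domain_attr is None:
        # No Domain: host-only cookie, exact match with the host that set it
        return request_host == set_by_host.lower()
    # With Domain: that domain itself and anything under it (a leading dot is ignored)
    domain = domain_attr.lower().lstrip(".")
    return request_host == domain or request_host.endswith("." + domain)


print(cookie_applies("example.org", "app1.example.org"))                      # False
print(cookie_applies("app1.example.org", "example.org"))                      # False
print(cookie_applies("app2.example.org", "app1.example.org", "example.org"))  # True
```

The "." in the endswith check matters: without it, a host like notexample.org would match example.org.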


If you meant it as a site-wide login, you might want to send Domain=example.org so that it will be sent to example.org and anything under it, e.g. app2.example.org


Notes

  • can be a feature, e.g. for site-wide logins
  • can be a security risk. Consider e.g. the case where app2.example.org is hosted by someone else.
  • You can only send one value
so you can't craft a list of specific allowed hosts
  • that value will only be accepted if the host that requests it is part of that domain
  • browsers may have further restrictions, e.g. most will refuse 'Domain=org' - see supercookies[2]

Path

Optional.


You can ask that a cookie be sent for all requests that are, directory-logic-wise, under a specific path.


If omitted, "the user agent will use the 'directory' of the request-uri's path component as the default value." (basically the request path up to the rightmost /)

Setting this is often done to widen that.

For example, app1.example.org/log/me/in may want to set Path=/ when it sets a login cookie


Matching will be

  • from the start
e.g. Path=/docs will not match /my/docs
  • full directory name (up to the next slash or end of the string), not substring
e.g. Path=/docs will match /docs and /docs/
e.g. Path=/docs will not match /docsets
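A sketch of both the default and the matching rule, written after my reading of RFC 6265 (which, unlike the older wording quoted above, leaves the trailing slash off the default). The function names are made up here, and real browsers may differ in details.

```python
def default_path(request_path):
    """Default cookie path: the request path up to (not including) the rightmost '/'."""
    if not request_path.startswith("/") or request_path.count("/") == 1:
        return "/"
    return request_path[: request_path.rfind("/")]


def path_matches(request_path, cookie_path):
    """Match from the start, on whole directory names rather than substrings."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The cookie path must end at a directory boundary in the request path
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


print(default_path("/log/me/in"))         # /log/me
print(path_matches("/docs/", "/docs"))    # True
print(path_matches("/docsets", "/docs"))  # False
```

This also shows why a login handler at /log/me/in usually sets Path=/ explicitly: its default of /log/me would not match the rest of the application.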


Note that when you have a reverse proxy in front of your app, the path (also host, and potentially domain) for the application may not be the one the browser sees, which can lead the browser to reject or just not send a cookie.

Having such a proxy rewrite cookies is possible, but not always easy to do well.

Secure

Optional.

Is a flag, takes no value.


This is the server requesting that the UA only send the cookie when doing HTTPS requests to the originating server, and not in HTTP requests.

This should make it more resistant to snooping and certain man-in-the-middle attacks.

HttpOnly


Optional.

Is a flag, takes no value.

Supported since 2010 or so. [3][4]

By default, a cookie will be sent back to the host next time, and be available to the scripting on that page.

This requests the browser only send it back, but not give it to scripting.


This lets you make a hard split between

cookies that don't need to be read out by scripting (like login tokens),
cookies that you specifically want to use from scripting (e.g. remembering parts of UI state).


The idea being even if your page has XSS (Cross-Site Scripting) issues, inserted scripting cannot read out or alter that cookie.

However, HttpOnly was only ever meant as a useful mitigation, never as a secure solution.

While XSS cannot read/steal the cookie, there are still certain flaws.

  • XSS may in certain cases still effectively replace/overwrite the cookie's value (but not read it)(verify) (consider attacks such as that creating many new cookies when a browser has a limit per domain - this can flush the oldest, and replace the value with a new cookie).
  • CSRF
  • XST

SameSite

Rejection of cookies

Rejection of invalid cookies


Incompleteness and invalidity will lead browsers to reject cookies.


According to RFC 2965, section 3.3.2, invalid cookie-store requests are those that satisfy one of the following:

  • the path in the cookie is not a prefix of the URI path of the page that set-cookie was requested from (you can't set other paths' cookies)
  • the UA request's effective host does not match the cookie's domain (this can be a potential bother when using reverse proxies, named virtual hosts and such. You often want to use request information for the cookie)


And, in practice, a number of domain/host requirements

  • often the minimum-total-dot requirement:
    • Two for .com, .net, .org, .edu, .gov, .mil, and .int (verify) (For example, there are two in .example.com, so it passes this test)
    • one for .local (verify)
    • Three otherwise (verify)
  • when the domain in the cookie implies that the host part has a dot
    • e.g. a set-cookie from www1.webservers.example.com for domain .example.com implies that the host is www1.webservers, which contains a dot so is invalid. For this example, you would want to specify the domain .webservers.example.com
  • the domain does not follow general DNS rules (made of letters, digits, and hyphens [a-z0-9.-]). Note that intranet naming does not always necessarily keep to DNS rules, e.g. by containing underscores. See also Microsoft KB 909264, RFC 952, RFC 1123.
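The last of these, the DNS naming rule, is easy to check yourself before setting a Domain. A minimal sketch, assuming the usual hostname rules from RFC 952 and RFC 1123 (labels of letters, digits and hyphens, not starting or ending with a hyphen, at most 63 characters); looks_like_dns_name is a name made up here.

```python
import re

# One label: starts and ends with a letter or digit, hyphens only in between
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def looks_like_dns_name(domain):
    """True if every dot-separated label follows the hostname rules."""
    labels = domain.lower().strip(".").split(".")
    return all(_LABEL.fullmatch(label) for label in labels)


print(looks_like_dns_name(".webservers.example.com"))  # True
print(looks_like_dns_name("my_host.intranet"))         # False, because of the underscore
```

This only checks the spelling of the name, not the dot-count or host-part rules listed above.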


In browsers, incompleteness or invalidity of a cookie may mean the cookie will not be set at all, or will be used but not persist. This may differ between browsers and specific types of invalidity.


Note that in reverse proxying, the path and host for the application may not be the one the browser sees, which can (rightly) lead it to reject the cookie (or just not send it as expected). Having a proxy rewrite cookies is possible, but not always easy to do well.

Limitations

Size

Storing data directly in a cookie (rather than a token to refer to data elsewhere) is possible, but you cannot count on cookies storing more than approximately 4KB.

Browsers have rules about per-site size as well as total cookie size, in part because large cookies make all requests to the host/path, application (or even domain, if you set the domain and path broadly) larger and so a little slower, particularly if there are many large cookies.

Amount

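The specs set no hard maximum, only minimums a UA should support (at least 20 per host in RFC 2109, at least 50 per domain in RFC 6265). The real world may give you maybe 50-300 cookies if you're lucky, but you probably shouldn't count on getting more than 20.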


Keep in mind that browsers are free to e.g. delete the least-used cookies once you reach such a per-domain limit, or some 'total' limit.

Further notes

A server can effectively remove a cookie in several ways.

  • you can mention the name with an empty value, or
  • you can mention the name and a new expiration date - one in the past.
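A sketch of a removal header that combines these with Max-Age=0 (mentioned earlier), using Python's http.cookies. deletion_header is a name made up here. Path and Domain have to equal the ones the cookie was set with, otherwise the browser treats this as a different cookie.

```python
from http.cookies import SimpleCookie


def deletion_header(name, path="/", domain=None):
    """Return a Set-Cookie line asking the browser to discard the named cookie."""
    cookie = SimpleCookie()
    cookie[name] = ""  # empty value
    cookie[name]["path"] = path
    if domain:
        cookie[name]["domain"] = domain
    cookie[name]["max-age"] = 0
    cookie[name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"  # a date in the past
    return cookie.output()


print(deletion_header("foo"))
```

Sending both Max-Age and Expires is belt and braces. Either one alone should be enough for current browsers.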

Javascript



Flaws relating to security

Privacy

Login

When you log into a website, what it typically does is

  • some mechanism checks the supplied credentials, once,
and in the same interaction, store some value that signifies "I recently checked your credentials" that can be actively verified (and reveals little else), here in a cookie
  • on all later requests (until the cookie and thereby the login expires), actively verify that cookie


That cookie usually stores a large randomly generated number, that is completely meaningless in itself, except that both sides remember it for a while, and on the server side it is associated with the user you logged in as.


There are a handful of basic security details, such as

make that number large, so that guessing it by trying random numbers would take thousands of years, so is infeasible
not accidentally giving someone runtime control of your webpage scripting, as they could just read it off
it's a good idea to use the HttpOnly flag, which tells the browser "this cookie is just for you to send back, not for scripting to have access to at all" (by default it's both)


...but given such care, this is a pretty good system.
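A minimal sketch of that flow, assuming an in-memory session table and that the credential check itself happens elsewhere. The cookie name "session", the one-hour lifetime, and the function names are all choices made up for this example, not how any particular framework does it.

```python
import secrets
import time
from http.cookies import SimpleCookie

SESSION_LIFETIME = 3600  # seconds; arbitrary choice for this sketch
sessions = {}            # session id -> (username, expiry timestamp)


def start_session(username, now=None):
    """Call once credentials have been checked; returns a Set-Cookie header line."""
    now = time.time() if now is None else now
    session_id = secrets.token_urlsafe(32)  # 256 random bits, infeasible to guess
    sessions[session_id] = (username, now + SESSION_LIFETIME)
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True  # send back only, not readable by scripting
    cookie["session"]["secure"] = True    # HTTPS only
    cookie["session"]["path"] = "/"
    return cookie.output()


def user_for_request(cookie_header, now=None):
    """Actively verify the Cookie header of a later request; username or None."""
    now = time.time() if now is None else now
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    morsel = jar.get("session")
    if morsel is None:
        return None
    entry = sessions.get(morsel.value)
    if entry is None or entry[1] < now:
        sessions.pop(morsel.value, None)  # unknown or expired: the login is gone
        return None
    return entry[0]
```

The cookie itself says nothing about the user. Dropping the entry from the table logs them out, whatever the browser still holds.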


It's also the basis of most login systems, largely because without any way to persist that "I've checked you", you would have to send your credentials on every new connection (which for a browser is at least one, and typically a few, for every refresh)

Just a tool? Minor evil? Better than the alternative?

On cookie laws

First-party versus third-party cookies



See also

  • RFC 2109, 'HTTP State Management Mechanism' ((technically) obsoleted by...)
  • RFC 2965, 'HTTP State Management Mechanism'
  • perhaps read RFC 2964, Use of HTTP State Management