Cookie notes

From Helpful
Revision as of 00:33, 21 April 2024 by Helpful (talk | contribs)


This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Magic cookies

HTTP cookies

Introduction

Cookies give a website the ability to remember something on your computer, so whatever it remembers is specific to that computer.


Mechanically:

A server can ask a visiting browser to remember a piece of text.
On later visits, the client will send back that same text.


Note that the server can only hear back what it originally set, verbatim.

It's like giving someone a secret word to repeat back to you later. The word itself can be entirely meaningless, but the fact they know it lets you know it's them, and that's the only thing it is good for.

This is what cookies end up being used for ninety percent of the time.


Why remember things?

The first uses were things like customizing sites.

They can help implement login systems.


Can it do anything particularly nefarious?

In itself, no, but you can imagine that this can be a useful tool in something less than wholesome.

They imply knowledge of repeated visits, which with certain... agreements means it's easy to assist targeted ads.


And there are worse things you can do.

Not easily, but it can't hurt to understand this better.


Login, a.k.a. "identifying returning users when they specifically want that".

The protocol that transmits webpages is HTTP. You can think of it as text back and forth, and this communication itself has no memory. In principle, each request starts from scratch.

How would you implement login over that?


Let's have an analogy: Say you want a secret society, anonymous members, but you do want to ensure only members can enter, and not based on a single secret shared by everyone (as that can leak out without you ever figuring out who or when).

...no, serious question - sit down, now, design a system.

Chances are you'll end up with a basic solution like giving every person a unique pass-phrase to remember.

The pass-phrase each person has is itself meaningless (and for practical reasons is probably a word-soup combination of random words, as there would be enough of them for everyone).


On a connection with no memory of itself, you need to send something of significance every time.

Now we get to some implementation issues - there are at least two potential problems here:

  • it's a little too easy to reveal this, just by passive snooping, because it will be in almost all communication
  • it's a little too easy to put this in the URL, which if shared or save-as'd will share your password


The first may be important, but is more of an implementation detail for the techs to solve.

For the interested, this is solved roughly by another layer of the same idea:
a little more technically, the token that gets sent needs to be verifiable somehow (short-term memory on the server of what it gave out), but its value can be meaningless. This is usually just a big random number.
More practically, this is a randomly generated number that is forgotten on a much shorter term, so it can signify "you were logged in" for a fixed period of time.


The second, well, wouldn't it be nice if we had a system where, on network connections (and in the headers/metadata rather than the content layer),

the server can say "here's a value for you to keep. When you return to me, just send it"
the client sends it back, exactly as it got it?

The server can never ask for anything else, so it only ever gets back the values that a server first gave to the client?

Soooo. That's what cookies are.


"So wait, are you arguing they're good actually, entirely harmless?"

Not quite.

It's not the actual value that is stored that is the issue. It's what the site may choose to associate with that (and also what it might be able to infer)

Again, the cookie value is itself meaningless, just as the pass-phrase you gave to a specific person is probably word soup.

This is why it's not about cookies as a mechanism.

We need a mechanism with memory, to have things like login, and any mechanism with memory is going to have issues like this.

From a "what it reveals" standpoint, it's arguably no different from what happens when you do not log out from a social site, and other sites interact with that social site for you, which, [1].


"Wait, what else might it be able to infer, then?"

Think about the secret society again.

You know that the same person returned to your door (that was the point, the requirement).

Even if you initially don't know who they are, if they return often enough, you could start to remember things about them.

Maybe you remember they're really into the cloak and dagger aspect, and sell them a nice cloak next time.

You don't need to know who they are, but you have just invented targeted ads.


Cookies to keep state without identity

Cookies to infer identity without your wish

One of the issues is that in terms of privacy, these two are opposites.

But in terms of implementation, they're very similar.


Other purposes include identifying returning users regardless of login

...which can be used e.g. remember shopping basket state from before you logged in
or remember other state within a session, for a person who hasn't logged into an account yet, which carries into your login if and when you do (anonymity until then might even be seen as a privacy win)
Can be done anonymously, e.g. storing a number that is randomly generated and itself meaningless, but remembered for a while


This also highlights one of the real issues:

It's not the actual value that is stored that is the issue. That pass-phrase you gave to one specific person is itself meaningless (and for practical reasons is probably a combination of random words).

It's what the site may choose to tie to that



The No-cookie cookie

Perhaps a poignant example is that today's "I don't want cookies" preference is often stored in a cookie.

That's not so much ironic as it is necessary - the protocol the web speaks has zero memory, so you need to tell it what it needs to know. And as long as it doesn't store who you are, just what you want, that's perfectly fine.

If the only thing it stores is "cookies = please don't", it's also not unique, and not exploitable.



Mechanism and syntax

Cookies are just text.


Their values and related metadata are sent in an HTTP header.

A cookie set consists of

  • a set of semicolon-separated predefined parts
  • most of which are optional (will default to certain behaviour when omitted)
  • most of which take the form of attr=value, a few of which take no value

...and actual values to store in the cookie, in the form name=value, which is actually the only required part.


For example, a server may have a response containing:

Set-Cookie: foo=bar; Expires=Mon, 09-Dec-2002 13:46:00 GMT; Secure

A UA may then later send a request containing:

Cookie: foo=bar

If multiple cookies apply, they are merged into one Cookie header, separated by semicolons (RFC6265 forbids multiple Cookie header fields[2])
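
As a rough illustration of the exchange above, the sketch below uses Python's standard http.cookies module to read the example Set-Cookie value, and then builds the single Cookie header a UA would send back. The helper name cookie_header and the second cookie (theme=dark) are made up for the example, and a real UA would also apply the domain, path and expiry rules described further down.

```python
from http.cookies import SimpleCookie

# UA side: parse what the server's Set-Cookie header carries.
received = SimpleCookie()
received.load("foo=bar; Expires=Mon, 09-Dec-2002 13:46:00 GMT; Secure")
morsel = received["foo"]   # one name=value pair plus its attributes


def cookie_header(pairs):
    """Merge all applicable name=value pairs into one Cookie header line.

    Hypothetical helper: attributes such as Expires or Secure are never
    sent back, only the names and values.
    """
    return "Cookie: " + "; ".join(f"{name}={value}" for name, value in pairs.items())


print(morsel.value, morsel["expires"], morsel["secure"])
print(cookie_header({"foo": morsel.value, "theme": "dark"}))
```

Note that only foo=bar travels back to the server; the Expires and Secure parts are instructions to the UA and stay there.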


Scripting?

Originally, it was only the HTTP response that could ask for cookie sets.

Assuming the browser agrees to do so, it will then send that exact information whenever it returns to that same domain/server/site/application.

Later, scripting added some extra abilities, which meant that scripting on the same page that served the cookie could alter it.

This does not change who can interact with this information - unless people made the security mistake of allowing anyone to add code to their page.






Standards and the real world
  • The first standard was Set-Cookie:, the basics of which were standardized in RFC 2109 (from 1997)
  • Set-Cookie2 was written to extend that, standardized in RFC 2965 (from 2000), with the idea that it would replace Set-Cookie
but it never really caught on, and was deprecated in 2011 by RFC 6265.
Modern browsers do not support Set-Cookie2.
  • RFC 6265 is basically an update to Set-Cookie:
  • Various things that became widely supported have been used in the real world since the RFC 2109 spec.
Some of them made it into RFC 6265, some of them are just common enough that you'd want to support them.


Name and value

Required.

Basically just name=value

Notes:

  • attribute names are case-insensitive
  • the value should be escaped so that it will not clash with parsing -- see RFC 2616's definition of quoted-string
  • if a particular name appears multiple times in a set-cookie, only the first should be used
  • Names starting with $ are reserved and not to be used by applications

You can set a variable name to contain a new value to make the browser overwrite it.

Note that set-cookies with new values only overwrite values for a name when the old and new Domain and Path values are also equal.

Note that an Expires in the past and/or a Max-Age of 0 will cause a cookie to be discarded regardless of value (a common way to delete a cookie).
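
The overwrite and delete rules above can be sketched with a toy cookie jar. This is only a model of the idea, not how any particular browser stores cookies: the jar is a plain dict, store_cookie is a made-up helper, and only Max-Age is handled (an Expires in the past would be treated the same way).

```python
def store_cookie(jar, name, value, domain, path, max_age=None):
    """Apply one set-cookie to a toy cookie jar (a plain dict).

    Cookies are keyed on (name, domain, path), so a new value only
    overwrites an old one when all three are equal.
    """
    key = (name, domain, path)
    if max_age is not None and max_age <= 0:
        jar.pop(key, None)      # Max-Age of 0: discard, regardless of value
    else:
        jar[key] = value
    return jar


jar = {}
store_cookie(jar, "foo", "bar", "example.org", "/")
store_cookie(jar, "foo", "baz", "example.org", "/")           # same key: overwrites
store_cookie(jar, "foo", "other", "example.org", "/docs")     # different path: separate cookie
store_cookie(jar, "foo", "", "example.org", "/", max_age=0)   # deletes only the Path=/ one
print(jar)
```

This is also why deleting a cookie fails when the deleting set-cookie uses a different Path or Domain than the one it was set with.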


Expires

Optional, but common because a lot of uses want persistent cookies, not session cookies.


Cookies without an Expires=

will lead to the UA removing it when the UA closes.
Sometimes called session cookie - in the per-run-of-the-browser sense, not to be confused with cookies that support login sessions.


Cookies with an Expires=

should persist between different runs of the browser
until the given expiration
..or until the UA removes the cookie for other reasons
These are sometimes called persistent cookies.
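
A minimal sketch of the difference, assuming you build the header text yourself: expires_value is a made-up helper that formats a date with Python's email.utils.formatdate. That produces the space-separated date form (the one RFC 6265 specifies) rather than the older hyphenated form used in the example further up.

```python
import time
from email.utils import formatdate


def expires_value(seconds_from_now, now=None):
    """Return an HTTP-date string for use after 'Expires='."""
    if now is None:
        now = time.time()
    return formatdate(now + seconds_from_now, usegmt=True)


# No Expires: a session cookie, gone when the UA closes.
session_cookie = "foo=bar"
# With Expires: a persistent cookie, here kept for about a week.
persistent_cookie = f"foo=bar; Expires={expires_value(7 * 24 * 3600)}"
print(persistent_cookie)
```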


Domain

Optional.


If omitted, the UA will decide to send the cookie for the hostname that set it, excluding subdomains.

If set, the UA will decide to send the cookie for the requested domain (unless refused for some reason), including subdomains.


For example,

if domain was omitted, and set-cookie was sent from app1.example.org, cookies will be sent back on visits to app1.example.org, and not example.org
if domain was omitted, and set-cookie was sent from example.org, cookies will be sent back on visits to example.org and not app1.example.org
if domain=example.org, cookies will be sent back on visits to example.org, app1.example.org, and any other subdomain of example.org


If you meant it as a site-wide login, you might want to send Domain=example.org so that it will be sent to example.org and anything under it, e.g. app2.example.org
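
The examples above boil down to a small decision, sketched here. The function name should_send and its parameters are made up for illustration, and real UAs apply further checks (such as the public-suffix restrictions mentioned in the notes below).

```python
def should_send(request_host, origin_host, domain_attr=None):
    """Decide whether a cookie applies to request_host.

    origin_host is the host that sent the set-cookie; domain_attr is the
    Domain value, or None when it was omitted.
    """
    request_host = request_host.lower()
    if domain_attr is None:
        # Host-only cookie: exact match, subdomains excluded.
        return request_host == origin_host.lower()
    domain = domain_attr.lower().lstrip(".")   # a leading dot is ignored
    return request_host == domain or request_host.endswith("." + domain)


print(should_send("app1.example.org", "app1.example.org"))           # Domain omitted
print(should_send("app2.example.org", "example.org", "example.org"))  # Domain=example.org
```

The "." in the endswith() test matters: without it, Domain=example.org would also match an unrelated host such as badexample.org.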


Notes

  • can be a feature, e.g. for site-wide logins
  • can be a security risk. Consider e.g. the case where app2.example.org is hosted by someone else.
  • You can only send one value
so you can't craft a list of specific allowed hosts
  • that value will only be accepted if the host that requests it is part of that domain
  • browsers may have further restrictions, e.g. most will refuse 'Domain=org' - see supercookies[3]

Path

Optional.


You can ask that a cookie be sent for all requests that are, directory-logic-wise, under a specific path.


If omitted, "the user agent will use the 'directory' of the request-uri's path component as the default value." (basically the request path up to the rightmost /)

Setting this is often done to widen that.

For example, app1.example.org/log/me/in may want to set Path=/ when it sets a login cookie


Matching will be

  • from the start
e.g. Path=/docs will not match /my/docs
  • full directory name (up to the next slash or end of the string), not substring
e.g. Path=/docs will match /docs and /docs/
e.g. Path=/docs will not match /docsets
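
The default value and the matching rules above can be sketched as two small functions. The names default_path and path_matches are made up here; the logic follows the description in this section and ignores details such as query strings.

```python
def default_path(request_path):
    """Default cookie path: the request path up to the rightmost '/'."""
    if not request_path.startswith("/") or request_path.count("/") == 1:
        return "/"
    return request_path[: request_path.rfind("/")]


def path_matches(cookie_path, request_path):
    """True when request_path is cookie_path or lies 'under' it."""
    if request_path == cookie_path:
        return True
    if not request_path.startswith(cookie_path):
        return False
    # A prefix match only counts on a directory boundary.
    return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"


print(default_path("/log/me/in"))
print(path_matches("/docs", "/docs/"), path_matches("/docs", "/docsets"))
```

This also shows why the login example wants Path=/ set explicitly: a cookie set from /log/me/in would otherwise default to /log/me and not be sent for the rest of the site.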


Note that when you have a reverse proxy in front of your app, the path (also host, and potentially domain) for the application may not be the one the browser sees, which can lead the browser to reject or just not send a cookie.

Having such a proxy rewrite cookies is possible, but not always easy to do well.

Secure

Optional.

Is a flag, takes no value.


This is the server requesting that the UA only send the cookie when doing HTTPS requests to the originating server, and not in HTTP requests.

This should make it more resistant to snooping and certain man-in-the-middle attacks.

HttpOnly

Optional.

Is a flag, takes no value.

Supported since 2010 or so. [4][5]


Asks the browser to send this cookie back in requests to the domain that served it (including XHR/Fetch requests(verify)), but not to expose its value to the page's browser-side scripting.


This lets you make a hard split between

cookies that don't need to be read out by scripting (like login tokens),
cookies that you specifically want to use from scripting (e.g. remembering parts of UI state).


The idea being even if your page has XSS (Cross-Site Scripting) issues, inserted scripting cannot read out or alter that cookie.

However, HttpOnly was only ever meant as a useful mitigation, never as a secure solution.

While XSS cannot read/steal the cookie, there are still certain flaws.

  • XSS may in certain cases still effectively replace/overwrite the cookie's value (but not read it)(verify) (consider attacks such as creating many new cookies when a browser has a limit per domain - this can flush out the oldest cookie, after which a new cookie with the same name can be set).
  • CSRF
  • XST

SameSite

Third-party cookies


Rejection of cookies

Rejection of invalid cookies

Incompleteness and invalidity will lead browsers to reject cookies.


According to RFC 2965, section 3.3.2, invalid cookie-store requests are those that satisfy one of the following:

  • the path in the cookie is not a prefix of the URI path of the page that set-cookie was requested from (you can't set other paths' cookies)
  • the UA request's effective host does not match the cookie's domain (this can be a potential bother when using reverse proxies, named virtual hosts and such. You often want to use request information for the cookie)


And, in practice, a number of domain/host requirements

  • often the minimum-total-dot requirement:
    • Two for .com, .net, .org, .edu, .gov, .mil, and .int (verify) (For example, there are two in .example.com, so it passes this test)
    • one for .local (verify)
    • Three otherwise (verify)
  • when the domain in the cookie implies that the host part has a dot
    • e.g. a set-cookie from www1.webservers.example.com for domain .example.com implies that the host is www1.webservers, which contains a dot so is invalid. For this example, you would want to specify the domain .webservers.example.com
  • the domain does not follow general DNS rules (made of letters, digits, and hyphens [a-z0-9.-]). Note that intranet naming does not always necessarily keep to DNS rules, e.g. by containing underscores. See also Microsoft KB 909264, RFC 952, RFC 1123.
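
The last point can be checked roughly before you ever send a cookie. The sketch below is an approximation of the usual hostname rules (letters, digits and hyphens per label), not a full implementation of the RFCs mentioned; looks_like_dns_name is a made-up helper name.

```python
import re

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def looks_like_dns_name(host):
    """Rough check that every dot-separated label follows the usual rules."""
    labels = host.lower().strip(".").split(".")
    return all(_LABEL.match(label) for label in labels)


print(looks_like_dns_name("app1.example.org"))
print(looks_like_dns_name("my_intranet.local"))   # underscore: likely to cause trouble
```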


In browsers, incompleteness or invalidity of a cookie may mean the cookie will not be set at all, or will be used but not persist. This may differ between browsers and specific types of invalidity.


Note that in reverse proxying, the path and host for the application may not be the one the browser sees, which can (rightly) lead it to reject the cookie (or just not send it as expected). Having a proxy rewrite cookies is possible, but not always easy to do well.

Limitations

Size

Storing data directly in a cookie (rather than a token to refer to data elsewhere) is possible, but you cannot count on cookies storing more than approximately 4KB.
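
If you do store data directly, a quick size check before setting the cookie is cheap. This is a sketch that takes the 4KB figure above as 4096 bytes and counts name, value and attributes together; both the constant and the helper name fits_in_cookie are assumptions for the example, and actual browser limits vary.

```python
MAX_COOKIE_BYTES = 4096   # the "approximately 4KB" above; an assumption, not a guarantee


def fits_in_cookie(name, value, attributes=""):
    """Rough check: name, value and attributes together stay within the budget."""
    size = sum(len(part.encode("utf-8")) for part in (name, value, attributes))
    return size <= MAX_COOKIE_BYTES


print(fits_in_cookie("token", "x" * 64, "Path=/; Secure"))
print(fits_in_cookie("blob", "x" * 5000))
```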

Browsers have rules about per-site size as well as total cookie size, in part because large cookies make all requests to the host/path, application (or even domain, if you set the domain and path broadly) larger and so a little slower, particularly if there are many large cookies.

Amount

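The specs set no upper limit, only minimums a UA should support (RFC 2109: at least 20 per host or domain and 300 in total; RFC 6265: at least 50 per domain and 3000 in total). The real world may give you maybe 50-300 cookies if you're lucky, but you probably shouldn't count on getting more than 20.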


Keep in mind that browsers are free to e.g. delete the least-used cookies once you reach such a per-domain limit, or some 'total' limit.

Further notes

A server can effectively remove a cookie in several ways:

  • you can mention the name with an empty value, or
  • you can mention the name and a new expiration date - one in the past.

Javascript


Flaws relating to security

Privacy

Login

When you log into a website, what it typically does is

  • use some mechanism to check the supplied credentials, once,
and in the same interaction, store some value that signifies "I recently checked your credentials" that can be actively verified (and reveals little else), here in a cookie
  • on all later requests (until the cookie and thereby the login expires), actively verify that cookie


That cookie usually stores a large randomly generated number, that is completely meaningless in itself, except that both sides remember it for a while, and on the server side it is associated with the user you logged in as.


There are a handful of basic security details, such as

make that number large, so that guessing it by trying random numbers would take thousands of years, so is infeasible
not accidentally giving someone runtime control of your webpage scripting, as they could just read it off
it's a good idea to use the HttpOnly flag, which tells the browser "this cookie is just for you to send back, not for scripting to have access to at all" (by default it's both)
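
Put together, the header for such a login cookie might be built like this with Python's standard http.cookies module. The cookie name "session" and the helper login_set_cookie are arbitrary choices for the example; adding Secure assumes the site is served over HTTPS.

```python
import secrets
from http.cookies import SimpleCookie


def login_set_cookie(token):
    """Build a Set-Cookie line for a login token with the flags discussed."""
    cookie = SimpleCookie()
    cookie["session"] = token               # "session" is an arbitrary name
    cookie["session"]["path"] = "/"         # send it for the whole site
    cookie["session"]["secure"] = True      # only send over HTTPS
    cookie["session"]["httponly"] = True    # keep it away from page scripting
    return cookie.output()                  # one "Set-Cookie: ..." line


token = secrets.token_urlsafe(32)   # 256 random bits: far too many to guess
print(login_set_cookie(token))
```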


...but given such care, this is a pretty good system.


It's also the basis of most login systems, largely because without having any way to persist that "I've checked you", you would have to send your credentials on every new connection (which for a browser is at least one, and typically a few, for every refresh)

Just a tool? Minor evil? Better than the alternative?

On cookie laws

See also

  • RFC 2109, 'HTTP State Management Mechanism' ((technically) obsoleted by...)
  • RFC 2965, 'HTTP State Management Mechanism'
  • perhaps read RFC 2964, Use of HTTP State Management