Cache and proxy notes

This article/section is a stub — probably a pile of half-sorted notes and assertions some of which may well be wrong, and not verified as a whole. Feel free to add or refine.


Proxy

In the dictionary definition and most technical contexts, a proxy is an entity that does something on your behalf and/or which you do something through.


Proxy server

A proxy server forwards requests (for a file, service, connection, web page, or such, depending on the exact type of proxy) to other servers and, when a response is expected (which is usually), also makes sure that the response ends up where it should.

Depending on context, the term 'proxy server' can refer to proxying of just HTTP (for basic web surfing), to handling non-HTTP services (additionally, or even exclusively), or to proxying arbitrary network connections.


Proxies are usually used for one or more of the following reasons:

  • caching: particularly in web content, parts may be cached at several points between end client and server, which improves reaction time and reduces network load
  • connection sharing: multiple computers can use the same internet connection (although there are alternatives at a lower networking level, such as NAT, that do not involve servicing specific requests and often take less administration bother)
  • network isolation: in such shared connections, the clients can be part of an isolated/private network, while only the proxied connections go outside
  • identification/anonymization: the end server will think the request came from the proxy.
    • This can be used to e.g. make sure only students use a university's licensed content (often a web-based proxy that you have to log in to)
    • if you set up a proxy for general use and don't log its use, the end server's logs can only know requests came from a proxy, and not the end user
  • filtering options: since you are routing all traffic via the proxy, you can block content, transform content, and eavesdrop for usage statistics and such.
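The caching and anonymization points above can be illustrated with a toy model. This is only a sketch with no real networking: the "origin server" is a plain function, and the names CachingProxy, demo_origin and the addresses are made up for the example.

```python
from typing import Callable, Dict, List, Tuple

# An "origin server" here is just a function: (client_address, url) -> body.
Origin = Callable[[str, str], str]


class CachingProxy:
    """Toy forward proxy: forwards requests and caches the responses."""

    def __init__(self, address: str, origin: Origin) -> None:
        self.address = address
        self.origin = origin
        self.cache: Dict[str, str] = {}

    def fetch(self, client_address: str, url: str) -> str:
        # client_address is deliberately not passed on: the origin only
        # ever sees the proxy's own address.
        if url not in self.cache:
            self.cache[url] = self.origin(self.address, url)
        return self.cache[url]


# Hypothetical origin that records who it believes is asking.
seen_by_origin: List[Tuple[str, str]] = []


def demo_origin(client_address: str, url: str) -> str:
    seen_by_origin.append((client_address, url))
    return "content of " + url


proxy = CachingProxy("10.0.0.1", demo_origin)
first = proxy.fetch("192.168.1.20", "http://example.com/a")
second = proxy.fetch("192.168.1.21", "http://example.com/a")
```

Two different clients asked for the same URL, but the origin was contacted once, and only ever saw the proxy's address. A real proxy would of course also have to decide what is safe to cache and for how long.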


Transparent proxy

A transparent proxy (a.k.a. intercepting proxy) is a proxy that acts as a network gateway, enabling it to automatically proxy certain connections (meaning the end server sees the proxy, not you, as a client), without the client knowing about it or needing any configuration.

ISPs often deploy transparent proxies to reduce bandwidth use and improve speed. Large organizations regularly do the same, whether they use a private IP network or a public one (as various universities do), as it:

  • makes it easy to add a caching web proxy
  • is easy on administration (upsides of a proxy without any necessary client configuration)
  • makes it easy to check user (employee) use/abuse, and block content if necessary

Reverse proxy

A reverse proxy is a variation, named for its use in the reverse direction of what is generally considered a proxy: it sits in front of servers rather than in front of clients.

For example, a connection to a web server might actually reach a front-end server that does nothing more than route that connection to an internal web server that does the work for that request. This can be done selectively, or for all incoming requests.


It could be compared to (and is in some cases even implemented using) DNAT, but is usually more specific as it may happen at a higher protocol level, and may be configured in more detail. For example, it may happen for specific parts (URL paths) of a site.
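Routing by URL path could look something like the following. This is a minimal sketch of only the decision step (which backend gets the request), assuming longest-prefix matching; the ROUTES table, its addresses and pick_backend are hypothetical, and real reverse proxies have their own configuration syntax for this.

```python
from typing import Dict, Optional

# Hypothetical routing table: URL path prefix -> internal backend address.
ROUTES: Dict[str, str] = {
    "/": "http://10.0.0.10:8080",
    "/static/": "http://10.0.0.11:8080",
    "/api/": "http://10.0.0.12:9000",
}


def pick_backend(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend whose prefix is the longest match for path."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None  # nothing configured for this path
    return routes[max(matches, key=len)]
```

Here pick_backend("/api/users") selects the 10.0.0.12 backend, while anything not under /static/ or /api/ falls through to the catch-all "/" entry.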


Reverse proxies can be used...

  • for security in that you can control which specific internal servers/services are exposed
  • for security in that the reverse proxy can deal with attacks so that the web servers on an internal network don't necessarily have to
  • to do certain work so that the backing web servers don't have to, e.g.
    • caching static content
    • doing encryption work (e.g. terminating TLS)
    • compressing compressible content
  • to scale and load balance applications: the reverse proxy can distribute requests over various backing web servers
  • to spoonfeed slow clients: the proxy quickly reads an internal web server's dynamic response, so that the server process can finish and go on to the next request while the proxy feeds the response to the slow client
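As an illustration of the load balancing point, the simplest distribution strategy is probably round robin: hand each new request to the next backend in a fixed rotation. A minimal sketch, in which RoundRobinBalancer and the backend names are made up for the example (real balancers typically also track backend health and load):

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        # Copy the list so later changes by the caller don't affect us.
        self._cycle: Iterator[str] = itertools.cycle(list(backends))

    def next_backend(self) -> str:
        return next(self._cycle)


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assignments = [balancer.next_backend() for _ in range(5)]
# assignments is ["web1", "web2", "web3", "web1", "web2"]
```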

See also


(Mostly abstract) cache types

These are primarily notes
This is probably not going to be complete in any real sense, and exists to contain bits of useful information.

Size limitations, logic related to items entering and leaving

FIFO cache

Least Recently Used (LRU)

Somewhat like a FIFO cache, in that it has a limited size and housekeeping is minimal. Instead of throwing away items created the longest ago (as in a basic FIFO cache), it throws away items that were last accessed the longest ago.

Used when you expect a distribution of a few often-accessed items, which will stay in the cache.

Basic implementations are fairly simple, typically a linked list or queue (often paired with a hash map for fast lookup).
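A minimal sketch of such an implementation, assuming Python's collections.OrderedDict as the combined queue and lookup table (the LRUCache class itself is written for this example; for memoizing function calls the standard library already has functools.lru_cache):

```python
from collections import OrderedDict
from typing import Any, Hashable, List


class LRUCache:
    """Fixed-size cache that evicts the least recently used item."""

    def __init__(self, max_items: int) -> None:
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self.max_items = max_items
        # Ordered from least recently used (front) to most recent (back).
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # an access counts as a use
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the least recently used

    def keys(self) -> List[Hashable]:
        return list(self._items)


cache = LRUCache(max_items=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now more recently used than "b"
cache.put("c", 3)   # cache is full, so "b" is evicted
```

After this, the cache holds "a" and "c". A FIFO cache would have evicted "a" instead, since it was created first; dropping the move_to_end call in get() would give exactly that FIFO behaviour.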


Because popular items could otherwise stay in the cache indefinitely, there is often also some timeout logic, so that even commonly used items eventually leave. In practice this usually amounts to a refresh, as they will likely be created and cached again immediately.
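The timeout logic can be sketched separately from the eviction logic. The following assumes that expiry is counted from creation time rather than from last access (otherwise popular items would still never leave); ExpiringCache, get_or_create and the fake clock are names made up for this example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ExpiringCache:
    """Cache whose entries are recreated once they are older than a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable, so tests need not sleep
        self._items: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_create(self, key: Hashable, create: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._items.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh
        value = create()  # missing or expired: (re)create and store
        self._items[key] = (now, value)
        return value


# Demonstration with a fake clock instead of real waiting.
fake_now = [0.0]
ttl_cache = ExpiringCache(ttl_seconds=60, clock=lambda: fake_now[0])
creations = []


def load() -> int:
    creations.append("created")
    return len(creations)


v1 = ttl_cache.get_or_create("k", load)  # created at t=0
fake_now[0] = 30.0
v2 = ttl_cache.get_or_create("k", load)  # still fresh, served from cache
fake_now[0] = 61.0
v3 = ttl_cache.get_or_create("k", load)  # expired, created again
```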



Real-world caches

CPU's memory cache

OS caches

These are a bunch of quick jots worth noting down but not complete in any way (and probably won't make it up to well-written text and possibly not even stub status).


An OS may choose to cache various content, often particularly that related to filesystem use (metadata and data).

For example, linux's cache use is mostly:

  • the page cache
  • the inode cache
  • the dentry (directory entry) cache

(When it comes to culling, the page cache is more easily cleared than the inode and dentry caches (verify))


You can flush these (since kernel 2.6.16), which can be useful for IO benchmarking and similar tests. According to the kernel source's Documentation/filesystems/proc.txt:

To free pagecache:

sync; echo 1 > /proc/sys/vm/drop_caches

To free dentries and inodes:

sync; echo 2 > /proc/sys/vm/drop_caches

To free pagecache, dentries and inodes:

sync; echo 3 > /proc/sys/vm/drop_caches

The sync isn't really necessary, but can be a little more thorough for tests involving writes. Without it, dirty objects stay in the cache, seeing as this is non-destructive cache clearing.
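To see what such a flush actually did, you can compare the cache-related figures in /proc/meminfo before and after. A rough sketch, assuming that Cached roughly corresponds to the page cache and that the dentry and inode caches are mostly counted under SReclaimable (reclaimable slab memory); the function names are made up for this example.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def cache_summary(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Return just the cache-related fields (Linux only)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    wanted = ("Buffers", "Cached", "SReclaimable")
    return {key: info[key] for key in wanted if key in info}
```

Calling cache_summary() once before and once after writing to drop_caches should show Cached and/or SReclaimable shrinking, depending on which value you wrote.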




You can also tweak the way the system uses the inode and dentry cache -- which is sometimes handy for large file servers, to avoid oom_kill related problems, and such.

You may want to google for setting names you get from

ls /proc/sys/vm

and

sysctl -a | grep '^vm\.'

memcache

On the web

HTTP caching logic

See Webpage_performance_notes#Saving_Bandwidth_by_Caching

Transparent caches

For example, Squid is often used as a transparent proxy that does nothing else but cache content that can be cached.

Useful for home LAN sharing, companies, ISPs, and such, to save bandwidth (and lower latency on some items) by placing cacheable content closer to its eventual consumers.
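What "can be cached" means is decided largely by the response's HTTP headers. As a deliberately conservative sketch of the idea, the following looks only at Cache-Control: a shared cache must not store responses marked no-store or private, and here we additionally require an explicit lifetime. The function names are made up for this example, and a real cache such as Squid considers much more (request method, status code, Expires, Vary, Authorization, heuristic freshness).

```python
from typing import Dict


def parse_cache_control(header: str) -> Dict[str, str]:
    """Split a Cache-Control header value into {directive: argument}."""
    directives: Dict[str, str] = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    return directives


def shared_cache_may_store(cache_control: str) -> bool:
    """Conservative check: explicit lifetime, and not private/no-store."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "private" in directives:
        return False
    # For shared caches, s-maxage takes precedence over max-age.
    lifetime = directives.get("s-maxage", directives.get("max-age", ""))
    return lifetime.isdigit() and int(lifetime) > 0
```

For example, "public, max-age=3600" would be stored, while "private, max-age=3600" would be left to the browser's own cache.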

Web server caches



On access times