Apache config and .htaccess - semi-sorted

From Helpful
Related to web development, lower level hosting, and such: (See also the webdev category)


These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.

Apache warnings

AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1

means you

  • only have ServerName in vhost containers
  • don't have a (public) IP that resolves to an FQDN

ServerName is used when the server needs to identify itself, and since all content is served from your vhosts (where you necessarily have a ServerName), it's perfectly fine to ignore this warning.

If you have an FQDN, you may as well set it globally (i.e. put a ServerName line outside of vhost containers).

If you don't, but like quieter logs, you can do the same with, probably, ServerName localhost.


Apache errors

apache server reached MaxClients setting, consider raising the MaxClients setting

You are likely to hit max clients with one or more of:

  • some handlers take a long time - and are used often enough to eventually hold up all slots
  • you have a bug where some connections don't get closed until some timeout
  • your site is very busy


If handlers never take much time, and you only get this because of a peak amount of requests, you probably do want to increase MaxClients (renamed MaxRequestWorkers in Apache 2.4, which still accepts the old name).

...but this is rarer than having some handlers stick around.

When your site isn't very busy at all, but still runs slow and/or becomes unreachable after a while, you want to look at the handlers.
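
For reference, this is a sketch of where the limit lives. It assumes Apache 2.4 with the prefork MPM. The numbers are illustrative, not recommendations: size them from how much memory each child actually uses.

```apache
<IfModule mpm_prefork_module>
    StartServers             5
    MinSpareServers          5
    MaxSpareServers         10
    # Prefork caps MaxRequestWorkers at ServerLimit (default 256), so raise both
    ServerLimit            400
    # Called MaxClients before 2.4
    MaxRequestWorkers      400
    # 0 = never recycle children; a finite value can contain slow memory leaks
    MaxConnectionsPerChild   0
</IfModule>
```

A changed ServerLimit is only picked up on a full stop and start. A graceful restart ignores it, while MaxRequestWorkers can change on restart.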


mod_status (/server-status) may be informative.


(URL and filesystem restrictions)

Apache won't allow anything outside of DocumentRoot, unless explicitly allowed.

Some installations are restrictive by default, meaning that each location needs everything you'll need explicitly allowed.

In particular things specified through Options may easily be disabled by default.


...particularly around virtual hosts, because each new DocumentRoot is a previously unknown directory, so will require some specific allows (usually in the form of a <Directory> section).
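
For a vhost whose DocumentRoot lives somewhere new, the explicit allow usually looks something like the following. This sketch assumes Apache 2.4 syntax (Require), and /srv/sites/example is a made-up path. On 2.2, the equivalent of the Require line was Order allow,deny plus Allow from all.

```apache
<VirtualHost *:80>
    ServerName   www.example.com
    DocumentRoot /srv/sites/example

    # Without this, a restrictive default such as a <Directory /> section
    # containing "Require all denied" makes every request here fail with
    # "client denied by server configuration"
    <Directory /srv/sites/example>
        Options FollowSymLinks
        AllowOverride None
        Require all granted
    </Directory>
</VirtualHost>
```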


http://apache.active-venture.com/sections.html


client denied by server configuration

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

The logged error mentions the client and the filesystem path that is being denied.


Reasons include:

  • Allow/Deny (and Order) / Require lines that deny explicitly
...not uncommonly one that is inherited from a restrictive default.
  • Use of files outside DocumentRoot, without a <Directory> to allow it
  • Alias without Directory or Location
  • proxying without <Location> or <Proxy>


Additionally

  • you may be overlooking the scoping of sections; a <Directory> in one virtualhost has no effect on other virtualhosts
  • it may not be clear why a directory is being accessed at all - or, in the case of dynamic content, even when it is.
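
Here are sketches of the Alias and proxy cases from the list above. They assume 2.4 syntax, and the paths and backend address are made up. The Alias target needs its own <Directory>, and proxied requests need access granted somewhere (here via <Proxy>). Otherwise an inherited deny applies.

```apache
# The Alias target is outside DocumentRoot, so it needs its own allow
Alias /static /srv/shared/static
<Directory /srv/shared/static>
    Require all granted
</Directory>

# Reverse proxy to a local backend (needs mod_proxy and mod_proxy_http)
ProxyPass        /app/ http://127.0.0.1:8080/
ProxyPassReverse /app/ http://127.0.0.1:8080/
<Proxy "http://127.0.0.1:8080/*">
    Require all granted
</Proxy>
```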


Request-URI Too Long

Assuming you want to increase the acceptable length of URLs, look at LimitRequestLine:

max bytes in the request line
default is 8190


(There are some distinct but similar limits that apply in other cases, e.g. LimitRequestFieldSize for individual request header lines, documented alongside it among the core directives.)
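
This is a sketch of raising both limits, with 16384 as an arbitrary example value. As far as I know, with name-based vhosts these values are taken from the default vhost for the address:port, because the request line is read before the Host: header is known. Server-wide context is therefore the safest place for them.

```apache
# Server-wide context, outside any <VirtualHost>
# Maximum bytes in the request line (method, URL, protocol); default 8190
LimitRequestLine      16384
# Maximum bytes in any single request header line; default 8190
LimitRequestFieldSize 16384
```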

/etc/apache config and .htaccess

When you host various sites, central administration is partially a boon.

...but continuously applying small per-site changes by user request becomes tedious. For this case, you can allow users to do most config in .htaccess files within their DocumentRoot tree.


When things don't work, usually the cause is one of:

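  • The directive makes no sense in this context
some belong in global config, some in virtual hosts, some per directory, and some cannot be used in .htaccess at all
the documentation lists the allowed Context for each directive [1]
Basically, the global config has restricted what kind of things you can do in .htaccess files.
  • .htaccess not being read at all (due to AllowOverride None)
Testable by mashing your keyboard and saving that as .htaccess: if the file is read, the garbage gets you a 500 Internal Server Error; if the page loads normally, it is being ignored.
(AllowOverride None is still a reasonable setting where you don't need .htaccess, since apache then doesn't look for these files in every directory, which avoids some IO)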
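
AllowOverride is set in the central config, inside a <Directory> section, not in .htaccess itself. The sketch below assumes a DocumentRoot of /var/www/www. Which override classes you grant is a policy choice.

```apache
<Directory /var/www/www>
    # Let .htaccess files here set document type, rewrite and redirect
    # directives (FileInfo), authentication (AuthConfig), and
    # directory listing settings (Indexes)
    AllowOverride FileInfo AuthConfig Indexes
    # Alternatives:
    #   AllowOverride All   - anything allowed in .htaccess context
    #   AllowOverride None  - .htaccess files are not even looked for
</Directory>
```

Since 2.3.9 the default is AllowOverride None. This is one reason .htaccess files are silently ignored on newer setups.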



Settings for...

Virtual hosts

Long ago, one hostname pointed to one IP address, and an IP address typically to one specific physical computer.

Which means one website per physical computer. Not very flexible.


Named virtual hosts means a webserver at one IP address can serve for many hostnames. (This relies on HTTP 1.1, which says you must specify the name of the host you're accessing in the request, via the Host: header. Minimal HTTP clients are usually HTTP 1.0 plus this hostname as the sole 1.1-ism.)


Short story:

  • You can Listen to one or more IP addresses, on one or more ports.
  • Individual VirtualHost containers are matched on IP address + port + ServerName (from the connection's IP address, its port, and the HTTP request's hostname)
Requests that do not match a VirtualHost container will end up at the first-defined vhost.
When serving from one IP, I like to mention the IP and port in both Listen and VirtualHost (just to make it clearer to myself that I get exactly the matches I expect).
When serving on multiple IPs, it really depends on why you are doing so. Usually if you know why you want that, the matching you want is also obvious enough to you.


To configure this type of virtual hosting

  • tell apache the IP (or even multiple IPs) that may receive different host requests:
# Listen on all interfaces:
Listen 80
# OR listen on specific interface
Listen 192.168.1.12:80


# NameVirtualHost is only needed up to 2.2. Since 2.3.11, an IP:port combination
#   used in multiple VirtualHost entries automatically gets name-based virtual
#   hosting, and 2.4 warns that NameVirtualHost has no effect, so drop this line there.
NameVirtualHost *:80

# define VirtualHosts at will, e.g.
<VirtualHost *:80>
  ServerName   www.example.com
  DocumentRoot /var/www/www
  #other config
</VirtualHost>

<VirtualHost *:80>
  ServerName   images.example.com
  DocumentRoot /var/www/images
  #other config
</VirtualHost>

Notes:

  • Mentioning the port in the VirtualHost is optional when it is 80
but can be clearer for everyone when you (may ever) use HTTPS
  • You can serve specific servernames on only some IPs, but * (meaning 'all IPs apache listens on') is usually easier
  • HTTP 1.0 requests without a hostname can (in general, not specific to apache) lead to an error, or to the server to decide to serve from a default virtual host.
  • (The other thing people can mean by 'virtual host' is IP-based virtual hosting: one computer has multiple IPs and serves one site per IP. In the olden days this was the first/only way to host multiple sites, but these days it is the impractical option. I will not detail it here.)
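
Unmatched requests go to the first-defined vhost. It can therefore be useful to make that first vhost a deliberate catch-all, rather than whatever happens to sort first. This is a sketch: default.invalid and /var/www/default are placeholder names.

```apache
# Defined before all other *:80 vhosts (e.g. in 000-default.conf), so it
# receives requests whose Host: matches no ServerName/ServerAlias, as well
# as HTTP/1.0 requests without a Host: header
<VirtualHost *:80>
    ServerName   default.invalid
    DocumentRoot /var/www/default
    <Directory /var/www/default>
        Require all denied
    </Directory>
</VirtualHost>

<VirtualHost *:80>
    ServerName   www.example.com
    # Extra names this vhost should also answer to
    ServerAlias  example.com
    DocumentRoot /var/www/www
</VirtualHost>
```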


See also: http://httpd.apache.org/docs/2.2/vhosts/details.html


Virtual hosts and SERVER_NAME

By default, apache takes the value of the SERVER_NAME and SERVER_PORT variables from your configuration.

When you have set up virtual hosts and want to rewrite based on hostnames, you probably want to tell apache to use the value in each request's Host: header by setting:

UseCanonicalName Off

See http://httpd.apache.org/docs/2.0/mod/core.html#usecanonicalname

There is more to this, particularly when you're doing reverse proxy type things.
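
Here is a sketch of a hostname-based rewrite that redirects the bare domain to www. It tests %{SERVER_NAME}, which is the variable UseCanonicalName affects. %{HTTP_HOST} would work regardless of that setting, but may include a :port suffix.

```apache
<VirtualHost *:80>
    ServerName   www.example.com
    ServerAlias  example.com
    DocumentRoot /var/www/www

    # SERVER_NAME now comes from the request's Host: header,
    # not always from the ServerName above
    UseCanonicalName Off

    RewriteEngine On
    RewriteCond %{SERVER_NAME} ^example\.com$ [NC]
    RewriteRule ^/?(.*)$ http://www.example.com/$1 [R=301,L]
</VirtualHost>
```

With UseCanonicalName On, SERVER_NAME would always be www.example.com here, and the condition would never match.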

Things go to the wrong vhost, it seems to be ignoring ServerName, what gives?

A common mistake seems to be to use <VirtualHost *:80> in some vhosts and <VirtualHost an_actual_ip:80> in others.


Basically, all vhosts are first sorted into sets by the address:port argument of their <VirtualHost>.

And each of these sets gets a default vhost (the first vhost defined in it(verify); when vhosts come from sites-enabled/*.conf, that order follows the filenames). This is probably why requests end up in (the default vhost of) another vhost set than you think.

apache2ctl -S will list the server names in each set, and the default in each set, so is rather useful to see if this is the issue.


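(I wasn't sure whether it picks the IP-based vhost set or the * one. If I read the 2.4 'details of the virtual host matching process' docs right, an exact IP:port match wins: if the connection's address matches any IP-specific vhost, only that set is considered, and the *:80 set is only used for addresses with no specific match. Either way, this is confusing behaviour you want to avoid.)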
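
Below is a sketch of the mistake and the fix, with 192.168.1.12 standing in for the server's own address. The two halves are alternatives, not one config. If I read the matching rules right, in the mixed version requests arriving on 192.168.1.12 only consider the www vhost. That means images.example.com gets served from www's configuration.

```apache
# Mixed (problematic): these end up in two different sets
<VirtualHost 192.168.1.12:80>
    ServerName   www.example.com
    DocumentRoot /var/www/www
</VirtualHost>
<VirtualHost *:80>
    ServerName   images.example.com
    DocumentRoot /var/www/images
</VirtualHost>

# Consistent: both are in the *:80 set, so ServerName decides
<VirtualHost *:80>
    ServerName   www.example.com
    DocumentRoot /var/www/www
</VirtualHost>
<VirtualHost *:80>
    ServerName   images.example.com
    DocumentRoot /var/www/images
</VirtualHost>
```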

vhost organization

Various unix/linux distributions have some vhost management, for example a list of files that contain one or more vhost definitions, that are imported via something along the lines of Include /etc/apache2/vhosts.d/*.conf.


Debian-style organisation

Debian (and ubuntu) have a central config that basically just includes:

  • /etc/apache2/conf-enabled/*.conf  (config files for general stuff like security, charset)
  • /etc/apache2/sites-enabled/*.conf (config files for vhosts)
  • /etc/apache2/mods-enabled/*.load and *.conf  (the LoadModule lines, and each module's own config, usually in IfModule sections)
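
For reference, the relevant lines in Debian's /etc/apache2/apache2.conf look roughly like the following. Paths are relative to ServerRoot (/etc/apache2), and exact contents vary by release.

```apache
# Module loading, then module config
IncludeOptional mods-enabled/*.load
IncludeOptional mods-enabled/*.conf
# Listen directives
Include ports.conf
# General config snippets
IncludeOptional conf-enabled/*.conf
# Virtual hosts; wildcard includes are read in alphabetical order
IncludeOptional sites-enabled/*.conf
```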


The idea is that these things are symlinks into the respective -available/ directories, which you can do via:

  • a2enconf / a2disconf
  • a2ensite / a2dissite
  • a2enmod / a2dismod


These mostly just create symlinks. For example, a2ensite 000-default basically just does

ln -s /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-enabled/000-default.conf 

...except with some extra checking, in the case of modules it can enable dependencies, and a few more such details.


Yes, if you don't care for the a2* tools, you could do much the same just by moving these files (or symlinks) in and out of the -enabled directories.


Semi-sorted

server-status

       <Location /server-status>
               SetHandler server-status
               # for munin and the likes
               Require local
               # I'm on the same net as my server, so I do something like:
               Require ip 192.168.1
       </Location>
       # Keep track of extended status information for each request
       # (a server-wide setting, so it goes outside the <Location>)
       ExtendedStatus On



Connections stuck in R

R means we are reading the request coming in from the client.


It's intended to be a very short phase. If it lasts long enough to notice, it's probably tying up a worker.


Causes include:

  • large(/throttled) file uploads(verify)
  • slow clients, but that's unlikely (unless it's...)
  • DoS in the form of clients opening connections but not sending a (full) request (like slowloris). They will time out, but enough of these will tie up all connections.
you can move this problem onto something like an nginx server in front. While this only moves the exact same problem onto another system, nginx is often configured to be more lightweight per connection. The above would then take system resources (TCP stack, some memory) but not tie up an apache worker (which is heavier, unless maybe nginx is your primary webserver).
and some other tactics, like a lower Timeout (see below)
  • A dynamic script expecting to read client state that is not being sent, that has no reason to stop waiting
there is a reasonable workaround (for some cases) in setting a Timeout, because in the read phase this amounts to 'how many seconds to wait on an empty TCP buffer before deciding a connection is probably dead'.
Its default is a minute, so this is a stopgap at best.
That said, how much do you lower this? A handful of seconds? Under serious load, the balance between 'many workers tied up' and 'rejecting things unnecessarily' becomes somewhat impossible.


  • hitting a limit like open file descriptors, or a small ephemeral port range.
If that's the only reason, increase those (but first check that the root cause isn't DoSing or wildly inefficient code).
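
Here is a sketch of the timeout-related settings, with illustrative values. Timeout is the core directive discussed above. RequestReadTimeout comes from mod_reqtimeout (shipped with 2.2.15 and later, and with 2.4), and it is a different tactic than the notes mention. It limits how long reading the request may take and requires a minimum data rate, which targets slowloris-style clients more specifically.

```apache
# Core: seconds to wait on I/O (default 60)
Timeout 20

<IfModule reqtimeout_module>
    # Headers: 20s, plus 1s per 500 bytes received, capped at 40s total.
    # Body: 20s, plus 1s per 500 bytes received.
    RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500
</IfModule>
```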

Stuck in W state

W is the part where apache has started sending data to the client.

Like most other states, if it gets stuck it'll tie up a worker, which eventually means no free workers.

Outputs are usually small enough to get sent quickly, so (like R) you'll typically just see a few W states due just to timing.


(Note that seeing a lot of W state is not the same as being stuck in W state. I had one test app where a browser requested hundreds of images on a page. It looked like a block of six W entries, but each was actually doing 15 req/s.)


Getting stuck there can mean

  • client that is slow to accept data
can happen on large downloads
something like nginx can offload this from the apache worker pool (and move it more towards a limit per client?), making buffering on the apache side more predictable
  • dynamic code that is handling our request, but is slow to produce all its data
paused or stopped sending for any reason (but didn't bork out), such as
maintenance scripts that are actually running from apache (e.g. nightly cleanup code for a webapp, run from crontab or such). This particular case is usually fine, unless they run so long that these pile up.
dynamic stuff contending for IO, database (or other things) - you'll often see this in top. You can often debug it with some logging (PID, timestamp, and the step within the whole request tends to reveal the slow step); sometimes profiling is nicer
dynamic stuff spending significant time on blocking calls / locks
harder to debug, depending a little on what it is (e.g. a db transaction will show up easily enough, a library mutex not so much). You may well want a profiler.
  • running so many children that the implied memory use makes the server thrash
  • you gave the client the wrong Content-Length while doing persistent connections (verify)


Some of those can be alleviated just by figuring out the complete response before handing it to apache (mostly: don't stream into chunked transfers when you don't need it)


Lots of K state

When browsers request keepalives, apache will keep them around for, and disconnect them after, KeepAliveTimeout, which defaults to 5 seconds.

So you would expect to see a bunch of K. This is useful for a lot of browsers, because it makes for faster turnaround (and less connection overhead) for further requests from the same webpage.


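Five seconds is a reasonable value. Higher would leave more slots idle. How much that costs varies by MPM: prefork ties up a whole process per kept-alive connection, while the event MPM hands idle connections to its listener thread. Lower than about 2 may mean browsers need to reconnect more often.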


If you are only serving pages that imply no further requests (rare, but might be true for APIs), but clients may still request keepalive, you could consider lowering the timeout or even disabling keepalive.
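
These are the directives involved, with their defaults spelled out. This is a sketch, not tuning advice. For the no-follow-up-requests case, turning keepalive off entirely is the simplest form (shown commented out). These directives can also go inside a VirtualHost.

```apache
KeepAlive On
# Seconds an idle kept-alive connection is held open (default 5)
KeepAliveTimeout 5
# Requests allowed per connection before it is closed (default 100; 0 = unlimited)
MaxKeepAliveRequests 100

# For e.g. an API vhost whose responses imply no further requests:
# KeepAlive Off
```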

Stuck in G state

Mapping URL to filesystem

Add MIME mappings

(Applies to files that apache itself serves)


You can add extension-to-mime mappings (note this is only useful when clients know what to do with the mime types). Examples:

AddType audio/mpeg        mp3
AddType application/x-ogg ogg
AddType image/jp2         jp2 j2k


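There is also a default MIME type. In some configurations this is text/plain, which means unknown binary files will be shown as garbage rather than downloaded. A saner default is to force unknowns to be downloaded. This applies to Apache 2.2 and earlier. In 2.4, DefaultType is disabled, and files with unknown extensions are sent without a Content-Type, leaving the browser to guess.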

DefaultType application/octet-stream

You may want to force this for certain existing filetypes, so that they will be offered for download and not be played by a plugin that recognizes the MIME type.

AddType application/octet-stream  avi mpg
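
AddType only changes the Content-Type, and what the browser does with application/octet-stream is still up to it. A more explicit alternative asks for a download via a Content-Disposition header. This is a swap-in rather than what the text above does, and it assumes mod_headers is loaded.

```apache
<FilesMatch "\.(avi|mpg)$">
    # Ask the browser to save the file instead of handing it to a plugin
    Header set Content-Disposition attachment
</FilesMatch>
```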


Note that detailed AddType-ing is fairly pointless. As a web host, you can't count on browsers plugins for correct website behaviour, and without mime types, it'll just download (via DefaultType), and everyone understands downloading before opening anyway.

In a few cases, it may be interesting and/or annoying. For example, quicktime installs a firefox plugin that handles a bunch of media. That it plays mp3 in the browser may be nice to some, annoying to others. That it views JPEG2000 files is interesting, since photo enthusiasts may convince each other to install it, particularly since few image viewers/editors support it yet.

Change basic directory Options

...like enabling directory indexes when they are globally disabled, or setting what files should be treated as indices:

Options        +Indexes
DirectoryIndex index.fancy index.php index.html index.htm default.htm

See: Options, DirectoryIndex

Cache headers

Most coders do not explicitly set cache-related headers. Sometimes this is fine, as a framework handles it, and apache's handling of static files is moderately smart as well. You may still want to override its behaviour, particularly to set the max-age for static filetypes to something quite large.

Note that the Header directive requires mod_headers

See also Webpage_performance_notes#Saving_Bandwidth_by_Caching and Webpage_performance_notes#mod_expires.
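
Here is a minimal sketch of setting a long max-age on static file types with mod_headers. The extension list and the 30 days (2592000 seconds) are arbitrary example choices.

```apache
<IfModule headers_module>
    <FilesMatch "\.(png|jpe?g|gif|css|js)$">
        # 30 days * 86400 s = 2592000 s
        Header set Cache-Control "max-age=2592000, public"
    </FilesMatch>
</IfModule>
```

Clients may keep a stale copy for up to that long after you change a file, so this pairs well with versioned filenames.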


Conditions and environments

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

IfDefine

<IfDefine something>

or

<IfDefine !something>

...tests for things that were passed in at startup time via -Dsomething. On 2.2 it seems you cannot define these from within the config itself; 2.4 added the Define directive for that.
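
Here is a sketch of toggling a section from within the config on Apache 2.4, where the one-argument form of Define acts like passing -D at startup. StatusPage is a made-up name.

```apache
# Acts like starting httpd with -DStatusPage (Apache 2.4+)
Define StatusPage

<IfDefine StatusPage>
    <Location /server-status>
        SetHandler server-status
        Require local
    </Location>
</IfDefine>
```

Commenting out the Define line (or using UnDefine) drops the whole section without touching the rest.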

Environment

Refers to a shell-type environment that is passed into CGI and such; see also 'Environment Variables in Apache'.


mod_setenvif:

  • SetEnvIf, SetEnvIfNoCase: Set environment based on (regex) condition that tests one of:
    • Headers in the request
    • other request metadata (remote address, local address, protocol, request method, etc.)
    • Environment already present (note this allows more complex logic if you test against variables set by earlier SetEnvIf lines; SetEnv runs too late in request processing for SetEnvIf to see its variables)
  • BrowserMatch, BrowserMatchNoCase: Do something only when talking to clients that tell us they are a specific User-Agent


mod_env:

  • PassEnv: whitelist variables to accept from the shell environment that started apache.
  • SetEnv: Add your own variables
  • UnsetEnv: remove variables


Apache itself mostly allows you to set environment variables, and doesn't allow their use as conditions. However:

  • There are a number of special-purpose environment settings
    • ...mostly to activate logic that works around agent bugs, so they're usually seen on BrowserMatch directives
    • ...but also to control mod_negotiation and mod_proxy behaviour somewhat
  • Headers can be sent on environment conditions
  • mod_rewrite (specifically RewriteCond) can test for and therefore react to them.
  • various other modules can use them, such as mod_ext_filter
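
Here is a sketch of setting variables with mod_setenvif and then using them as conditions elsewhere. static_file, is_wget and the /private/ path are made-up names. The MSIE line is the kind of special-purpose, bug-workaround use mentioned above, taken from the old example in the Apache docs. CustomLog assumes a LogFormat nicknamed "combined", which default configs define.

```apache
# Set variables (mod_setenvif); with no value given they are set to "1"
SetEnvIf Request_URI "\.(png|jpe?g|gif|css|js)$" static_file
BrowserMatch "^Wget" is_wget
# Special-purpose variables that change Apache's own behaviour
BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0

# mod_log_config: don't log static files
CustomLog logs/access.log combined env=!static_file
# mod_headers: add a header only for tagged requests
Header set X-Client-Note "wget" env=is_wget
# mod_rewrite: test the variable directly, here refusing with 403
RewriteEngine On
RewriteCond %{ENV:is_wget} =1
RewriteRule ^/private/ - [F]
```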



Custom error pages

...for example an amusing 404, or one that mails or logs specific things, or dynamically generates a 'perhaps you meant...' page.

For simpler needs, ErrorDocument also accepts:

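# Can be a simple string, for example
ErrorDocument 403 "Everyone is denied on fish day!"

# An internal redirect (a local URL path), for example
ErrorDocument 404 /Nono.html

# An external redirect. Note the client then gets a redirect status
# instead of the 404, which matters to crawlers and scripts:
ErrorDocument 404 http://www.plinko.net/404/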

You can also do an internal or external redirect to a script, for example PHP. Internal tends to be more useful, as you retain some information about the original request (and the user won't see the URL change).
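
An internal redirect to a script is written like the local-path form. The sketch below uses /errors/notfound.php, a hypothetical script. When it runs, Apache passes details of the original request in REDIRECT_-prefixed variables, such as REDIRECT_URL and REDIRECT_STATUS.

```apache
# Internal redirect: the browser keeps its URL and normally still gets the 404 status
ErrorDocument 404 /errors/notfound.php
```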


See also ErrorDocument in the docs.


Rate/traffic limiting

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)
  • a reverse proxy, if you were using one anyway, may be able to do this
  • mod_cband
    • can limit data rate, data amount, requests per second, and simultaneous connections
    • for virtualhosts, users, and/or destinations
    • (so sort of a flexible combination of mod_bandwidth and mod_limitipconn)
  • mod_evasive - reduces the effect of DoS requests
    • triggered by high-rate repeated destinations, high rates on one apache child
    • temporarily blacklists when triggered
    • Works well against DoS, decently against DDoS, and should not affect any real users.
  • mod_bw
    • maximum per vhost, shared between all requests to it (verify)
  • mod_limitipconn - most useful to limit concurrent downloads at all (probably only useful for something only serving large files, not anything that's supposed to allow quick loading of client pages)
  • mod_bwshare
    • maximum rate per client
  • mod_ratelimit
    • per connection
    • per destination
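
mod_ratelimit is the one in this list that ships with Apache itself (2.4 and later). The sketch below follows its documented usage; /downloads and 400 KiB/s are example values.

```apache
<Location "/downloads">
    SetOutputFilter RATE_LIMIT
    # Bandwidth limit per connection, in KiB/s
    SetEnv rate-limit 400
</Location>
```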


Note that reverse proxies (think pound, nginx, and such) can often also rate-limit.

See also

General: