Apache config and .htaccess - semi-sorted

From Helpful

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.

Apache warnings

AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1

means you

  • only have ServerName in vhost containers
  • don't have a (public) IP that resolves to an FQDN

ServerName is used when the server needs to identify itself, and since all content is served from your vhosts (where you necessarily have a ServerName), it's perfectly fine to ignore this warning.

If you have an FQDN, you may as well set it globally (i.e. put a ServerName line outside of vhost containers).

If you don't, but like quieter logs, you can do the same with, probably, ServerName localhost.
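
As a minimal sketch of that last option (where exactly it goes depends on your distribution's config layout; the only requirement is that it sits outside any VirtualHost container):

```apache
# Global scope, i.e. not inside a <VirtualHost> container.
# Use your real FQDN here if you have one; localhost merely quiets the warning.
ServerName localhost
```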


Apache errors

apache server reached MaxClients setting, consider raising the MaxClients setting

If your site is a busy one, upping that limit is what you want. (In apache 2.4 the directive is called MaxRequestWorkers; MaxClients is still accepted as the old name.)

When it's not the busy sort (and it runs slow or is unreachable), you probably have some connections that are staying open for a very long time.

mod_status (/server-status) may be informative.
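
For the busy-site case, a rough sketch of raising the limit, assuming apache 2.4 with the prefork MPM (where the directive is named MaxRequestWorkers). The number is made up; what is sensible depends mostly on how much memory each child uses.

```apache
<IfModule mpm_prefork_module>
    # With prefork, going above 256 also requires raising ServerLimit
    ServerLimit          300
    # Called MaxClients before 2.4
    MaxRequestWorkers    300
</IfModule>
```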


(URL and filesystem restrictions)

Apache won't allow anything outside of DocumentRoot, unless explicitly allowed.

Some installations are restrictive by default, meaning that each location needs to be explicitly allowed everything it will need.

In particular, things specified through Options may well be disabled by default.


...particularly around virtual hosts, because each new DocumentRoot is a previously unknown directory, so will require some specific allows (usually in the form of a <Directory> section).
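
Such an allow could look something like the following. This is a sketch in 2.4 syntax; the path and the chosen Options are only examples.

```apache
<Directory /var/www/www>
    # Only enable the Options you actually need
    Options -Indexes +FollowSymLinks
    AllowOverride None
    # 2.4 syntax. On 2.2 this would be:  Order allow,deny  /  Allow from all
    Require all granted
</Directory>
```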


http://apache.active-venture.com/sections.html


client denied by server configuration

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

...mentioning the client, and the filesystem directory that is being denied.


Reasons include:

  • Allow/Deny (and Order) / Require lines that deny explicitly
...not uncommonly one that is inherited from a restrictive default.
  • Use of files outside DocumentRoot, without a <Directory> to allow it
  • Alias without Directory or Location
  • proxying without <Location> or <Proxy>


Additionally

  • you may be overlooking the scoping of sections; a <Directory> in one virtualhost has no effect on other virtualhosts
  • it may not be clear why a directory is being accessed at all - or, in the case of dynamic content, even when it is.
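
For the 'files outside DocumentRoot' and 'Alias without Directory' cases, the fix usually amounts to pairing the Alias with a Directory section. A sketch, with hypothetical paths, in 2.4 syntax:

```apache
# Map a URL path onto a directory outside DocumentRoot...
Alias /static /srv/static

# ...and explicitly allow access to that filesystem directory.
# Put this in the same vhost as the Alias (or in global config), since
# a <Directory> in another vhost has no effect here.
<Directory /srv/static>
    Require all granted
</Directory>
```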

/etc/apache config and .htaccess

When you host various sites, central administration is partially a boon.

...but continuously applying small per-site changes by user request becomes tedious. For this case, you can allow users to do most config in .htaccess files within the document tree.


When things don't work, usually the cause is one of:

  • The directive makes no sense in this context
    • some belong in global config, some in virtual hosts, some per directory, and some cannot be used in .htaccess
    • the documentation mentions this for each directive (its 'Context' line) [1]
    • in addition, the global config may have restricted what kind of things you can do in .htaccess files (via AllowOverride)
  • .htaccess not being read at all (due to AllowOverride None, which admins sometimes set because it avoids some IO)
    • Testable by mashing your keyboard and saving that as .htaccess: if the file is read at all, requests for that directory will give a 500 Internal Server Error
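
What .htaccess may do is decided per directory in the main config. A sketch, where the path and the selection of override classes are just examples:

```apache
<Directory /var/www/www>
    # None     - .htaccess files are not read at all
    # All      - any directive that is valid in .htaccess context is allowed
    # or name specific classes of directives, as here:
    AllowOverride FileInfo Indexes AuthConfig
</Directory>
```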


Settings for...

Virtual hosts

Historically, one hostname pointed to one IP address, and an IP address typically to one specific physical computer. Which means one website per physical computer. Not very flexible.

Name-based virtual hosts mean a webserver at one IP address can serve many hostnames. (This relies on HTTP 1.1, which says you must specify the name of the host you're accessing in the request, in the Host: header. Minimal HTTP clients are usually HTTP 1.0 plus this hostname as the sole 1.1-ism.)


Short story:

  • You can Listen to one or more IP addresses, on one or more ports.
  • Individual VirtualHost containers are matched on IP address + port + ServerName (from the connection's IP address, its port, and the HTTP request's hostname)
    • Requests that do not match a VirtualHost container will end up at the first-defined vhost.
    • When serving from one IP, I like to mention the IP and port in both Listen and VirtualHost (just to make it clearer to myself that I get exactly the matches I expect).
    • When serving on multiple IPs, it really depends on why you are doing so. Usually if you know why you want that, the matching you want is also obvious enough to you.


To configure this type of virtual hosting

  • tell apache the IP (or even multiple IPs) that may receive different host requests:
# Listen on all interfaces:
Listen 80
# OR listen on specific interface
Listen 192.168.1.12:80
 
 
# NameVirtualHost became optional in 2.3.11:
#   when an IP:port combination is used in multiple VirtualHost entries,
#   name-based virtual hosting is automatically enabled for that address.
#   (So on 2.4 the following line can be left out.)
NameVirtualHost *:80
 
# define VirtualHosts at will, e.g.
<VirtualHost *:80>
  ServerName   www.example.com
  DocumentRoot /var/www/www
  #other config
</VirtualHost>
 
<VirtualHost *:80>
  ServerName   images.example.com
  DocumentRoot /var/www/images
  #other config
</VirtualHost>

Notes:

  • Mentioning the port in the VirtualHost is optional when it is 80
    • but it can be clearer for everyone when you use (or may ever use) HTTPS
  • You can serve specific servernames on only some IPs, but * (for 'all apache-bound IPs') is usually easier
  • HTTP 1.0 requests without a hostname can (in general, not specific to apache) lead to an error, or to the server deciding to serve from a default virtual host.
  • (The other thing people can mean with 'virtual host' is IP-based virtual hosts: one computer has one site per IP, but multiple IPs, so multiple sites. In the olden days this was the first/only way to host multiple sites, but these days it is the impractical option. I will not detail it here.)
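
Since unmatched requests end up at the first-defined vhost, one option is to make that first vhost an explicit catch-all, so it is obvious what unknown hostnames get. A sketch; the names and paths are hypothetical:

```apache
# Defined before all other *:80 vhosts, so it receives requests
# whose Host: header matches none of the others.
<VirtualHost *:80>
    ServerName   default.invalid
    DocumentRoot /var/www/default
</VirtualHost>

<VirtualHost *:80>
    ServerName   www.example.com
    DocumentRoot /var/www/www
</VirtualHost>
```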


See also: http://httpd.apache.org/docs/2.2/vhosts/details.html


Virtual hosts and SERVER_NAME

By default, apache takes the value of the SERVER_NAME and SERVER_PORT variables from your configuration.

When you have set up virtual hosts and want to rewrite based on hostnames, you probably want to tell apache to use the value in each request's Host: header by setting:

UseCanonicalName Off

See http://httpd.apache.org/docs/2.0/mod/core.html#usecanonicalname

There is more to this, particularly when you're doing reverse proxy type things.
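
As a sketch of the rewriting case, assuming mod_rewrite is loaded and using hypothetical hostnames: with UseCanonicalName Off, SERVER_NAME follows the client's Host: header, so a vhost that answers to several names can tell them apart.

```apache
<VirtualHost *:80>
    ServerName   www.example.com
    ServerAlias  example.com
    UseCanonicalName Off

    RewriteEngine On
    # SERVER_NAME now reflects what the client asked for,
    # which may be the alias rather than the ServerName above
    RewriteCond %{SERVER_NAME} ^example\.com$ [NC]
    RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
</VirtualHost>
```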


vhost organization

Various unix/linux distributions have some vhost management, for example a list of files that contain one or more vhost definitions, that are imported via something along the lines of Include /etc/apache2/vhosts.d/*.conf.


Debian-style organisation

Debian (and Ubuntu) have a central config that basically just includes:

  • /etc/apache2/conf-enabled/*.conf  (config files for general stuff like security, charset)
  • /etc/apache2/sites-enabled/*.conf (config files for vhost stuff)
  • /etc/apache2/mods-enabled/*.load and *.conf  (LoadModule lines, and per-module config such as IfModule stuff)
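
In the central config (/etc/apache2/apache2.conf) this comes down to lines along these lines. This is a sketch of what recent Debian packaging does from memory, so check your own file for the exact form:

```apache
# Paths are relative to ServerRoot, which is /etc/apache2 on Debian
IncludeOptional mods-enabled/*.load
IncludeOptional mods-enabled/*.conf
IncludeOptional conf-enabled/*.conf
IncludeOptional sites-enabled/*.conf
```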


The idea is that these things are symlinks into the respective -available/ directories, which you can manage via:

  • a2enconf / a2disconf
  • a2ensite / a2dissite
  • a2enmod / a2dismod


This mostly just creates symlinks. For example, when you enable a site with
a2ensite 000-default
it basically just does
ln -s /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-enabled/000-default.conf

...except with some extra checking, in the case of modules it can enable dependencies, and a few more such details.


Yes, if you don't care for the a2* tools you could do much the same just by moving these files (or symlinks to them) in and out of the -enabled directories.


Semi-sorted

server-status

       <Location /server-status>
               SetHandler server-status
               # for munin and the likes
               # (comments need their own line; apache has no trailing comments)
               Require local
               # I'm on the same net as my server, so I do something like:
               Require ip 192.168.1.0/24
       </Location>
       # Keep track of extended status information for each request
       ExtendedStatus On



Connections stuck in W

The cause is typically that a response started but has not finished, e.g.

  • the client is reading responses slowly, or you're moving large things
    • something like nginx in front buffers the response locally, which is more predictable: the slow transfer then occupies the proxy (mostly per client) rather than a slot in the apache worker pool
  • dynamic stuff is contending for CPU, IO, database
    • you can often see this in top; debug with some logging (PID, time, and step within the whole tends to reveal the slow step)
  • dynamic stuff scales poorly - you've started writing but are
  • dynamic stuff is stuck on a lock or so
    • harder to debug, depending a little on what it is (e.g. a db transaction will show up easily enough, a library mutex not so much)
  • you gave the client the wrong Content-Length while doing persistent connections



The result is often

  • that it occupies enough slots to run out, so apache stops responding
  • that a browser hits its 'amount of connections to a server' limit, and stops requesting
    • it will refuse to request anything else until these time out
    • can be triggered by loading 100 images from the same host


Options:

  • to make buffering more predictable, use something like nginx
  • if you want more connections from each client, consider e.g. making multiple named vhosts
  • if you want more scaling, consider something CDN-like


Connections stuck in R

Can be:

  • hitting a limit like open file descriptors, or a small ephemeral port range.
    • If that's the only reason, increase those (check that the root cause isn't DoSing or wildly inefficient code, though)
  • DoS, with clients opening connections but not sending a request. They will time out, but enough of these will tie up all connections.
    • lessened by putting something like nginx in front. It's a lot more lightweight per connection, meaning the above would take system resources (TCP stack, some memory) but not load apache
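
How quickly such idle connections time out is configurable. The notes above don't mention it, but a sketch of one option is apache's own mod_reqtimeout (bundled since 2.2.15), assuming it is loaded. The values here are the ones I believe are the 2.4 defaults, so treat them as a starting point:

```apache
<IfModule reqtimeout_module>
    # Allow 20 seconds for the request headers to arrive, extended up to 40
    # as long as the client sends at least 500 bytes/s. Similarly for the body.
    RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500
</IfModule>
```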


Mapping URL to filesystem

Add MIME mappings

(Applies to files that apache itself serves)


You can add extension-to-mime mappings, though this may only be useful when clients know what to do with the mime types. Examples:

AddType audio/mpeg        mp3
AddType application/x-ogg ogg
AddType image/jp2         jp2 j2k


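There is also a default mime type. In some configurations this is text/plain, which means unknown binary files will be shown as garbage, rather than downloaded. A saner default is to force unknowns to be downloaded (note that apache 2.4 no longer honours DefaultType, other than accepting DefaultType none for compatibility):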

DefaultType application/octet-stream

You may want to force this for certain existing filetypes, so that they will be offered for download and not be played by a plugin that recognizes the MIME type.

AddType application/octet-stream  avi mpg


Note that detailed AddType-ing is fairly pointless. As a web host, you can't count on browser plugins for correct website behaviour, and without a mime type, it'll just download (via DefaultType), and everyone understands downloading before opening anyway.

In a few cases, it may be interesting and/or annoying. For example, quicktime installs a firefox plugin that handles a bunch of media. That it plays mp3 in the browser may be nice to some, annoying to others. That it views JPEG2000 files is interesting, since photo enthusiasts may convince each other to install it, particularly since few image viewers/editors support it yet.

Change basic directory Options

...like enabling directory indexes when they are globally disabled, or setting what files should be treated as indices:

Options        +Indexes
DirectoryIndex index.fancy index.php index.html default.htm

See: Options, DirectoryIndex

Cache headers

Most coders do not explicitly set cache-related headers. Sometimes this is fine, as a framework handles it, and apache's file handling is moderately smart as well, but you may want to override its behaviour, particularly by setting the max-age for static filetypes to something quite large.

Note that the Header directive requires mod_headers
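
A sketch of such an override, with a made-up extension list and a max-age of 30 days. Adjust both to how often your static files really change:

```apache
<IfModule mod_headers.c>
    # Long-lived caching for static filetypes only
    <FilesMatch "\.(png|jpe?g|gif|ico|css|js)$">
        # 2592000 seconds = 30 days
        Header set Cache-Control "max-age=2592000, public"
    </FilesMatch>
</IfModule>
```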

See also Webpage_performance_notes#Saving_Bandwidth_by_Caching and Webpage_performance_notes#mod_expires.


Conditions and environments

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

IfDefine

<IfDefine something>

or

<IfDefine !something>
...tests for things that were passed in at startup time via -Dsomething

In older versions it seems you cannot define these on your own; apache 2.4 added a Define directive that can be used in the config.
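
A sketch of typical use, where DEVEL is a made-up name that you would pass at startup (e.g. httpd -DDEVEL, or however your distribution passes arguments to apache):

```apache
# Applies only when started with -DDEVEL
<IfDefine DEVEL>
    LogLevel debug
</IfDefine>

# Applies only when started without it
<IfDefine !DEVEL>
    LogLevel warn
</IfDefine>
```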

Environment

Refer to a shell-type environment that is passed into CGI and such; see also 'Environment Variables in Apache'.


mod_setenvif:

  • SetEnvIf, SetEnvIfNoCase: Set environment based on (regex) condition that tests one of:
    • Headers in the request
    • other request metadata (remote address, local address, protocol, request method, etc.)
    • Environment already present (note this allows more complex logic, as you can test against variables set by earlier SetEnvIf directives)
  • BrowserMatch, BrowserMatchNoCase: Set environment only when talking to clients that tell us they are a specific User-Agent


mod_env:

  • PassEnv: whitelist variables to accept from the shell environment that started apache.
  • SetEnv: Add your own variables
  • UnsetEnv: remove variables


Apache itself mostly allows you to set environment variables, and mostly doesn't allow their use as conditions. However:

  • There are a number of special-purpose environment settings
    • ...mostly to activate logic that works around agent bugs, so is usually seen on BrowserMatch directives
    • ...but also to control mod_negotiation and mod_proxy behaviour somewhat
  • Headers can be sent on environment conditions
  • mod_rewrite (specifically RewriteCond) can test for and therefore react to them.
  • various other modules can use them, such as mod_ext_filter
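
A small sketch combining a few of these, assuming mod_setenvif and mod_headers are loaded. The variable name is_image is made up; nokeepalive is one of apache's special-purpose variables.

```apache
# Set a variable when the requested URL looks like an image
SetEnvIf Request_URI "\.(gif|png|jpe?g)$" is_image

# Send a header only when that variable is set
Header set Cache-Control "max-age=86400" env=is_image

# Special-purpose variable: disable keepalive for a client with known bugs
BrowserMatch "Mozilla/2" nokeepalive
```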



Custom error pages

...for example an amusing 404, or one that mails or logs specific things, or dynamically generates a 'perhaps you meant...' page.

When that doesn't apply:

# Can be a simple string, for example
ErrorDocument 403 "Everyone is denied on fish day!"
 
# An internal redirect, for example 
ErrorDocument 404 /Nono.html
 
# An external redirect:
ErrorDocument 404 http://www.plinko.net/404/

You can also do an internal or external redirect to a script, for example php. Internal tends to be more useful, as you retain some information about the original request (and the user won't see the URL change).
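
A sketch of the internal variant, with a hypothetical script path. On an internal redirect apache passes details of the original request to the script in REDIRECT_-prefixed environment variables such as REDIRECT_URL and REDIRECT_STATUS.

```apache
# Local URL path, so this is an internal redirect:
# the client still sees the URL it originally asked for
ErrorDocument 404 /errors/notfound.php
```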


See also ErrorDocument in the docs.


Traffic limiting

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
  • a reverse proxy, if you were using one anyway, may be able to do this
  • mod_cband
    • can limit data rate, data amount, requests per second, and simultaneous connections
    • for virtualhosts, users, and/or destinations
    • (so sort of a flexible combination of mod_bandwidth and mod_limitipconn)
  • mod_evasive - reduces the effect of DoS requests
    • triggered by high-rate repeated destinations, high rates on one apache child
    • temporarily blacklists when triggered
    • Works well against DoS, decently against DDoS, and should not affect any real users.
  • mod_bw
    • maximum per vhost, shared between all requests to it (verify)
  • mod_limitipconn - most useful to limit concurrent downloads at all (probably only useful for something only serving large files, not anything that's supposed to allow quick loading of client pages)
  • mod_bwshare
    • maximum rate per client
  • mod_ratelimit
    • per connection
    • per destination


Note that reverse proxies (think pound, nginx, and such) can often also rate-limit.
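
As one concrete case, a sketch using mod_ratelimit (bundled with apache 2.4), assuming the module is loaded. The location and the rate are made up:

```apache
<Location "/downloads">
    SetOutputFilter RATE_LIMIT
    # in KiB/s, applied per connection
    SetEnv rate-limit 400
</Location>
```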

See also

General: