Mod wsgi notes


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Application URL mounting - WSGIScriptAlias and WSGIScriptAliasMatch



WSGIScriptAlias statically maps an absolute vhost path to a WSGI script, for example:

WSGIScriptAlias /myapp /local/path/wsgiscripts/myapp.wsgi

Notes:

  • The .wsgi file need not be within the DocumentRoot tree
    • ...putting it (and your supporting modules) elsewhere means you'll never serve them accidentally
    • Importing your further modules will probably require adding the path they are in to sys.path
    • ...and since apache's access control is probably restrictive, you'll need to tell apache it may access the directory you do put those in. (verify)
  • You may want to avoid a final slash. If you use /myapp/, a request for /myapp won't be a prefix match
    • so yes, this detail only applies to the single case of the implied index page
  • You can add more than one of these mappings. They are matched in order, so when mappings overlap, list them in the order you want them tried. So e.g. put a root/fallback handler last.
  • If you mount to the root (/), then all requests go via WSGI, including things like favicon.ico and robots.txt. If you want apache to do the static file handling for specific files or directories, it may make sense to Alias them (verify)
  • WSGIScriptAlias works like a combination of Alias plus Options ExecCGI plus SetHandler wsgi-script, and you can configure things that way (in some cases that way may be preferable -- including cases where you would want to use AddHandler instead of SetHandler, to mix handlers within a directory) (verify)
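
The ordering and final-slash notes above can be illustrated with a small Python model. This is only a sketch of the idea, not Apache's actual matching code; the fallback.wsgi entry and the resolve() helper are made up for the illustration.

```python
# Simplified model of in-order prefix mounting; NOT Apache's actual algorithm.
MOUNTS = [
    ("/myapp", "/local/path/wsgiscripts/myapp.wsgi"),
    ("/", "/local/path/wsgiscripts/fallback.wsgi"),  # hypothetical fallback, listed last
]


def resolve(url_path, mounts=MOUNTS):
    """Return (script, script_name, path_info) for the first matching mount, else None."""
    for prefix, script in mounts:
        bare = prefix.rstrip("/")
        if prefix.endswith("/") and prefix != "/":
            # A mount written with a final slash only matches paths that include that slash.
            matched = url_path.startswith(prefix)
        else:
            # Match the prefix itself or anything below it, on a path-segment boundary.
            matched = url_path == bare or url_path.startswith(bare + "/")
        if matched:
            return script, bare, url_path[len(bare):]
    return None
```

In this model resolve("/myapp") matches the first mount, while a mount written as /myapp/ would not match it. Swapping the two entries would send everything to the fallback, which is why the root handler goes last.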





WSGIScriptAliasMatch

WSGIScriptAliasMatch is a regex-matching variation of WSGIScriptAlias.

It can be useful for controlled mapping to script filenames (though see also the mention of mapping WSGIScriptAlias to a directory, above).


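For example, if you want /wsgi-scripts/image/something to go to image.wsgi, /wsgi-scripts/uni/ to go to uni.wsgi, and so on, without doing that all manually, you might use something like:

WSGIScriptAliasMatch ^/wsgi-scripts/([^/]+) /web/wsgi-scripts/$1.wsgi

A request for /wsgi-scripts/image/show would then go through image.wsgi. (To mount such scripts directly under the root, so that /image/show goes to image.wsgi, the pattern would be ^/([^/]+) instead.)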
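
To check what a pattern like this captures before putting it in the apache config, you can try it in Python's re module. A minimal sketch, assuming the pattern and target path from the directive above; script_for() is a made-up helper, and Apache's regex flavour is not identical to Python's, though simple patterns like this one behave the same.

```python
import re

# Same pattern and target as the directive above.
# Apache writes backreferences as $1, Python as \1.
PATTERN = re.compile(r"^/wsgi-scripts/([^/]+)")
TARGET = r"/web/wsgi-scripts/\1.wsgi"


def script_for(url_path):
    """Return the script file the pattern would select, or None if it does not match."""
    match = PATTERN.match(url_path)
    if match is None:
        return None
    return match.expand(TARGET)
```

Here script_for("/wsgi-scripts/image/show") gives /web/wsgi-scripts/image.wsgi, and a path without the /wsgi-scripts/ prefix gives None.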



.wsgi files

That is, the files pointed to by WSGIScriptAlias and friends.

Putting the .wsgi extension on it (anything not .py) is purely a convention so that you won't think of it as a regular python module, because there are differences.


A file with python code, conventionally containing a callable named application.

May contain anything you choose, really

  • one or two dozen lines that do things like
    • altering sys.path or other environment details to make things work
    • basic configuration for the main dispatched app
    • dispatching
  • hand over control to a framework (such as django).
  • a whole application
    • ...this often isn't very portable or convenient. You can't easily import this as a module.
    • ...and mod_wsgi won't consider these as part of #Code_reloading
    • Most of the limitations of these files seem to be by design - they were intended to be entry points, not workhorses.
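
As a sketch of the "one or two dozen lines" variant: a .wsgi file that adjusts sys.path and defines the application callable. The path and the response text are placeholders. In a real setup the callable would more likely be imported from your own package or handed over by a framework.

```python
import sys

# Hypothetical location of the site's own modules; adjust to your layout.
SITE_PATH = "/path/to/mysite"
if SITE_PATH not in sys.path:
    sys.path.append(SITE_PATH)


def application(environ, start_response):
    """Minimal WSGI callable; mod_wsgi looks for this name by default."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```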

Paths

Your DocumentRoot is not on your python path, so attempts to import things that are there will likely fail. The same goes for things alongside the .wsgi file (which itself can, and arguably should, be elsewhere).

If you need modules that are specific to the virtualhost's code, you'll need their directory on sys.path.

You can do this in code, probably in your .wsgi file, with something like:

import sys

path = '/path/to/mysite'
if path not in sys.path:
    sys.path.append(path)

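You may like to know that (since mod_wsgi 2.0) WSGIPythonPath will act like site.addsitedir()[1], which means that .pth files in those directories, and so eggs on the path, are also handled. (WSGIPythonHome is a different thing: it tells mod_wsgi which python installation or virtualenv to use.)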

This also makes it useful to point to a virtualenv's site-packages.
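
If you would rather do the same from the .wsgi file than from the apache config, the standard library's site.addsitedir() can be called directly. A minimal sketch; add_site_dir() is a made-up wrapper, and the directory you pass (for example a virtualenv's site-packages) is up to you.

```python
import os
import site
import sys


def add_site_dir(path):
    """Add path to sys.path and process any .pth files in it; returns True if it was added."""
    path = os.path.abspath(path)
    if path in sys.path:
        return False
    # Unlike a plain sys.path.append(), this also reads .pth files in the directory.
    site.addsitedir(path)
    return path in sys.path
```

Note that this appends to the end of sys.path, so modules of the same name earlier on the path still win.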

Process/daemon control

Embedded versus daemon mode



Embedded mode (the default) means the application is hosted in (an interpreter in) the Apache worker process.

  • slightly faster than daemon mode, probably mostly because it avoids the overhead of communication to the daemon. (Startup overhead when a new child starts is unavoidable)
  • for best performance under load you may need (apache-wide) MPM tweaking.
  • Code changes are not picked up automatically until an apache restart, so not so handy while debugging
  • The only mode available on Windows
  • configuration perhaps a little harder to get right (or at least, easier to do something strange)
  • details vary on the MPM actually used


Daemon mode means mod_wsgi starts up and manages separate processes

  • creates a set of daemon processes (the amount is configurable) which requests are delegated to
  • apache acts like a proxy to the actual process executing WSGI (but apache itself still serves static files where WSGI does not apply)
  • auto-reloads when the modification date of the WSGI script file changes (stops and starts the mentioned daemon processes) (verify)
  • tuning can be done for these processes, and separately per daemon process(...group) (which can be a lot handier than doing it apache-wide via MPM settings)
  • may scale better under load (partly because it can be tweaked better, partly because it avoids the memory overhead per apache child)
  • More suitable for shared hosting in that it's easier to separate applications (particularly if they need to be run as different users)

(For more comparison, see http://code.google.com/p/modwsgi/wiki/PerformanceEstimates)

Notes common to daemon mode and embedded mode



Daemon mode notes


You define the use of individual daemon process groups using WSGIDaemonProcess, and give each a name.

You can define one within an apache virtualhost or outside (globally). There is little difference, except how you can assign them: In a particular virtualhost you can tell mod_wsgi to use a specific named daemon process, be it one defined in the same virtualhost, or one defined globally, but not one defined in another vhost.


All WSGI apps with the same process group will execute within the context of the same group of daemon processes.

This mainly lets you re-use daemon processes - but note that subinterpreter isolation still applies, unless you override that too.


A daemon group consists of one or more multi-threaded processes (see also [2])

  • the threads are created by the APR, but (once they are created/used) they act exactly like usual python threads, and you can do all the locking, threading.local() stuff you are used to (there were some past bugs related to this, but as far as I know they're gone now)


For example, to have a vhost-specific daemon (two processes, each limited to 15 threads), you could add something like:

# Create a daemon group with name "example.com"
WSGIDaemonProcess example.com processes=2 threads=15

# send apps within this vhost to process group called "example.com"
WSGIProcessGroup example.com


Notes:

  • Not using WSGIProcessGroup means you're actually running in embedded mode (so you'll need both directives, though of course WSGIDaemonProcess can be global)
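
To check which mode a request actually ended up in, you can mount a small diagnostic app. This is a sketch that relies on the mod_wsgi.process_group and mod_wsgi.application_group keys that mod_wsgi puts in the WSGI environ (the process group is an empty string in embedded mode); the output format is just for illustration.

```python
def application(environ, start_response):
    """Report whether mod_wsgi handled this request in embedded or daemon mode."""
    # mod_wsgi sets this key; it is an empty string in embedded mode.
    group = environ.get("mod_wsgi.process_group", "")
    if group:
        mode = "daemon mode, process group %r" % group
    else:
        mode = "embedded mode"
    lines = [
        mode,
        "application group: %r" % environ.get("mod_wsgi.application_group", ""),
        "wsgi.multiprocess: %r" % environ.get("wsgi.multiprocess"),
        "wsgi.multithread: %r" % environ.get("wsgi.multithread"),
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```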



WSGIDaemonProcess

  • The name you give a daemon process group is a fixed name (as above). When selecting groups with WSGIApplicationGroup (and, for the last two, WSGIProcessGroup) you can also use certain variables, including: (verify)
    •  %{SERVER} - for distinct groups for different server hostname:port values
    •  %{RESOURCE} - like %{SERVER}, but adds the SCRIPT_NAME
    •  %{GLOBAL} - for WSGIApplicationGroup: run in the main interpreter (and not in a subinterpreter. Sometimes necessary, for python extensions that get confused by subinterpreters). For WSGIProcessGroup it means embedded mode.
    •  %{ENV:variable} - the value of the request's environment variable.
  • further options allow you to control:
    • the amount of daemon processes (defaults to 1) (note: setting this value will set wsgi.multiprocess to True, even when you set it to 1)
    • the amount of handler threads in each daemon process (defaults to 15)
    • the amount of requests to handle before restarting a daemon (default is never)
    • who to run as (e.g. user=me group=users), useful for cases where apache runs as root (which is a bad idea, though)
    • various debug options
    • some path options
    • ...more

Embedded mode notes


Embedded mode is the default, and it will be used unless you explicitly use WSGIProcessGroup.

WSGIApplicationGroup still applies.


Administrative restrictions

As the webserver admin, you may be interested in:

  • WSGIRestrictEmbedded
    • Basically means "Don't allow embedded mode; give an error unless daemon mode is set up."
  • WSGIRestrictProcess
    • Restrict which daemon process names you can use
    • Can for example be useful for sysadmins that want to allow selection via .htaccess (possibly via %{ENV})


On streaming


http://modwsgi.readthedocs.io/en/develop/user-guides/debugging-techniques.html

MPM tweaking

In daemon mode

Apache is just proxying to a fixed set of WSGI daemon processes, so the choice of MPM barely matters. (you can leave it up to other conditions, e.g. thread-unsafe PHP extensions)

Daemon mode's memory use tends to be a lot more predictable, since the apache children (typically configured to scale to load) do not run the code at all.

At the apache side, the only thing left is that its amount of handlers should probably be slightly higher than the amount of handlers in the wsgi daemon processes.


In embedded mode

Each apache child has a python interpreter. It needs time to start up.

The default apache config has a lowish minimum of standby handlers, which means it will only create more handlers when it's under load, which means you'll get the dynamic app's startup overhead at the worst time.

Setting a larger number of standby handlers will keep more initialized interpreters around all the time.

...at the cost of memory, yes, but memory that your config implied eventually using anyway. Tweaking the amount of extra processes is something you have to do yourself. If you want more predictability, consider daemon mode.

How much this early initialization matters depends on how significant your app's startup time is (the standby minimum is low by default because for apache's static file handling the startup overhead is very small).



Lazy loading, preloading


Code reloading


...which you presumably need for development, or on a server that isn't only yours.


You can use daemon mode, in which it's on by default. The default method is probably the behaviour you want, even if it's not the fastest possible method. (you might find daemon mode preferable for production anyway - for different reasons)



The two relevant settings:

  • WSGIScriptReloading is on by default. You can disable it for production environments, but know that if you ever do need to reload code, the only way to do so is then restarting apache.
  • WSGIReloadMechanism controls how the reloading is actually done. In mod_wsgi 2.x there are two options (as far as I know the directive was removed again in 3.0, where daemon mode always uses the Process behaviour (verify)):
    • Process
      • default in daemon mode, cannot be used in embedded mode
      • stops and restarts the process, creates a new one in its place (starting the code from scratch)
      • (...at the time of the first request after the change)
    • Module
      • default in embedded mode (also usable in daemon mode, but generally there's no reason to)
      • reloads the module that config points to (often a .wsgi file)
      • behaves differently from a basic reload()


In python in general there is no truly best way of reloading code in-place (largely because of dependent modules).

While there are hackish ways that may work well enough for your specific project, the simplest method is to start from scratch - start a new process (or interpreter, but that is made more complex by subinterpreters).

As such, the 'Process' method works best, but it will only work in daemon mode (because in embedded mode it is apache that owns and controls the processes, and in some MPMs it may never be killed at all).


'Module' will only reload the .wsgi file - which is not treated as a regular module-in-a-file, and reloading is not entirely equivalent to a reload(), more like executing it again.

The .wsgi file is usually a proxy script with configuration, meaning this reload will often reload the configuration, not so much the code (and even then the exact implications depend a little on how you use the settings there and whether/how you instantiate your app). If you use frameworks (like django and such), then both the app and most of its settings are elsewhere, and the Module method won't really do much at all.

In embedded mode you'll only get the 'Module' method, so there the surest way to really reload everything is to restart apache.

If you place all your code in this .wsgi file, then it will get reloaded. But by basic python behaviour, nothing you import from this file will be. Sometimes that's fine, sometimes that's not enough, particularly for larger projects where a lot of code is organized into modules.
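
In daemon mode with script reloading left on, changing the .wsgi file's modification time is (as far as I know) what triggers the restart, so "touching" the file after a deploy is usually enough. A minimal sketch of doing that from Python; touch_wsgi_script() is a made-up helper and the path you pass is your own .wsgi file.

```python
import os


def touch_wsgi_script(path):
    """Update the script's modification time so daemon mode restarts on the next request."""
    os.utime(path, None)  # None means "set access and modification time to now"
```

This is the same thing the touch command does. It helps in daemon mode, where the whole process is restarted; in embedded mode it only re-executes the .wsgi file itself, as described above.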




Semi-sorted

PyEval_AcquireThread: non-NULL old thread state

You probably have both mod_wsgi and mod_python loaded, each with its own statically linked copy of the python library (libpython), which can confuse the runtime linker and leave python in a confused state.

The workaround is to compile mod_wsgi (or either? both?) with the library dynamically linked.

The fix is to stop using mod_python.



Premature end of script headers



You can probably cause this explicitly if you really wanted to, but usually it's a bug.


If it happens only rarely and you use maximum-requests, it may well be related to daemon mode process restarts. When a handler takes longer than apache wants to wait (5 sec? unconfigurable?) it will kill it inelegantly, and cause this error.

(Long-running requests/responses can cause this. If you think this might be because of uploads (which can always be from slow clients), consider using nginx for (particularly request) buffering.)


If it happens consistently (...for a code path...) and the browser sees no response (and will eventually decide to time out), this often means the daemon process (for the request) crashed.

Other apps running in the same process group may then not respond either. You may want to set processes=1 for more consistent failing behaviour (verify)

You may see a "segmentation fault" (or something similarly serious) in the logs, or nothing (in some cases, LogLevel info in the apache config may yield more helpful information).

Possible causes (many unverified)

  • it has been associated with warnings, though I don't yet understand how
    • fixable by ignoring the relevant (or all) warnings?(verify)
    • ...or asking for warnings to be thrown as errors?(verify)
  • C extensions that weren't written to behave well in subinterpreters
  • mixing python versions (sometimes implied by system python and a different one pulled in by mod_wsgi)
  • incompatible shared library versions (between mod_wsgi, apache, php, others?)
    • e.g. PHP using the non-reentrant version of the mysql lib, and python using the re-entrant one
  • mod_python conflict


You may get half a workaround in switching to embedded mode(verify)


Fatal Python error: Couldn't create autoTLSkey mapping


TLS in this context means Thread Local Storage (not Transport Layer Security).


Apparently, when you do a process fork (via subprocess or some other way) from a threaded program (possibly particularly for threads not created by python itself?), there is some post-fork cleanup in python itself that seems to say this.

It seems to be somewhat specific to python 2.7, and seems fixed since 2.7.3.


While it is a python error, it is only brought out by a few things, mod_wsgi being one of them (may be related to subinterpreters?(verify) if so, then using WSGIApplicationGroup %{GLOBAL} may help(verify)).



I haven't really cared to figure this out yet. Do tell me if you do.



Unsorted

Notes:

  • Remember that the code will run as whatever user the server (e.g. apache) runs as.
    • consider e.g. filesystem access
  • Given the WSGI ability for multithreading, applications must be re-entrant (concurrently executed), so state should be kept within that application, and global/external data should be intended to be shared, and probably locked to avoid clobbering, racing and such.
  • Given apache's varying process models (MPMs), data that should be synced between all applications on a host is ideally handed around externally (database, memcache, maybe shared memory), or you'll probably be counting too much on behaviour of a specific MPM and possibly incompatible with others.
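
The point about shared versus per-thread state can be sketched with the standard threading module. The names here are made up for the illustration; the pattern is simply "lock what is shared, use threading.local() for what is not".

```python
import threading

_hits = 0                        # shared by every request thread in this process
_hits_lock = threading.Lock()    # guards _hits
_per_thread = threading.local()  # each thread sees its own attributes


def count_hit():
    """Increment the process-wide counter safely and return the new value."""
    global _hits
    with _hits_lock:
        _hits += 1
        return _hits


def thread_hits():
    """Count hits handled by the calling thread only; needs no lock."""
    _per_thread.n = getattr(_per_thread, "n", 0) + 1
    return _per_thread.n
```

Note that both counters live in a single process (and a single interpreter within it). With more than one daemon process, or under a multi-process MPM, a total across all of them has to be kept externally, as the last point above says.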


Memory leaks

I had one set of processes go berserk occasionally.

Hints:

  • use separate daemon processes where possible, and use the display-name= option to identify them in top.
    • (e.g. my largest one is a large but perfectly valid reverse index)
    • I noticed it was my "the rest, the little stuff" that was the issue; munin showed it was doing the constant-growth-eventually-getting-oomkilled thing.
  • See if it's a module that isn't thread-safe, or doesn't understand threading or mod_wsgi's use of subinterpreters (both of which are good for efficient and isolated handlers)
    • Mostly some modules weren't written with these in mind. Sometimes it leads to nonsense behaviour (shared state and all), sometimes to racing that brings out real issues.
    • to see if this is the problem, see if running the code only in the main interpreter (and not in a subinterpreter) fixes it:
      WSGIApplicationGroup %{GLOBAL}
    • in my case it did, which tells me my options are roughly
      • figure out which module/code is causing this and isolate it (e.g. do the global-interpreter for only it)
      • leave it like this -- less efficient request handling, but not bad either, and for my miscellaneous stuff it's fine.
  • nothing wrong with good ol' logging
    • if you have half an idea what code it is, basic 'at this time I started X', 'at this time it finished and took this long' type logging
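
That kind of start/finish logging can be added without touching the app itself by wrapping the WSGI callable. A minimal sketch; the logger name and the timed() wrapper are made up, and where the messages end up depends on your logging setup.

```python
import logging
import time

log = logging.getLogger("wsgi.timing")  # hypothetical logger name


def timed(app):
    """Wrap a WSGI app so each request logs when it started and how long it took."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        started = time.time()
        log.info("started %s", path)
        try:
            return app(environ, start_response)
        finally:
            # Measures the time until the app returns its iterable,
            # not the time spent streaming the response body.
            log.info("finished %s in %.3f s", path, time.time() - started)
    return wrapper
```

In the .wsgi file you would then write something like application = timed(application). A request that logs "started" but never "finished" is a good hint at which code path hangs or crashes.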


