General WSGI notes


Intro

WSGI (Web Server Gateway Interface) is a callback-based API defined by PEP 333 and later revised by PEP 3333.

It is an interface standardization that allows freer combination of apps, servers, and middleware. This is its primary grace: it made Python on the web pluggable, and for apps it does this pretty well.

Due to a somewhat unclear/underspecified spec, servers and middleware are harder to write correctly, though. Fortunately you don't have to touch that side unless you really want to.

For tutorials, see e.g.

For a reference implementation of WSGI, see [1].


Introduction by example

See also: #Code_snippets_for_a_quick_start for hosting it in a server.


Example apps

Very basic apps look something like:

def application(environ, start_response):
    start_response('200 OK', [('Content-type','text/plain')])
    return ['Hello world!\n']

or

# Shows your environ within WSGI
def application(environ, start_response):
    start_response('200 OK', [ ('Content-type', 'text/plain') ])
    for k in sorted(environ):
        yield '%-30s:  %r\n'%(k,environ[k])

or

# Some people prefer to wrap apps in a class. 
#   This is equivalent (WSGI just wants a callable)
#   and useful when you want to instantiate with some state (I never have, but hey)
class Enver(object):
    def __call__(self, environ, start_response):
        start_response('200 OK', [ ('Content-type', 'text/plain') ])
        for k in sorted(environ):
            yield '%-30s:  %r\n'%(k,environ[k])
 
application = Enver()

If you want to quickly serve these as a test, see #Code_snippets_for_a_quick_start

Helpers, higher levels, frameworks and such

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

There are various other styles you can write in, which can be anything between 'a few helpers' and 'a whole framework'. There are also some python frameworks whose setup let them wrap-and-return things into WSGI apps to serve them that way.

These formalisms/frameworks/helpers include:

  • falcon
  • WerkZeug
  • bobo
  • WebOb (uses a Request/Response object style that you may be used to, and adds other parsing) [2] [3], part of paste
  • Routes, a python version of Ruby on Rails's routing
  • restish (REST-style URL dispatcher, uses WebOb)
  • CherryPy (in that its application objects can also be used as WSGI application objects)
  • repoze (Zope stuff) [4] [5]
  • Grok

...and various others, see e.g. http://wsgi.org/wsgi/Frameworks


Note that all convenience has inherent overhead, which can matter particularly when compared to minimalistic bare-WSGI setups - request rates can be pretty impressive (for a dynamic language) when you do just barely what is necessary. (The above list runs very roughly from less to more overhead/convenience.)

Chances are that in larger projects you would end up writing your own approximation of functionality that already exists in one of these, so you might as well choose one for projects of any real complexity.

You might also wish to write some critical site portions in bare WSGI code to get maximum per-node requests/sec throughput, write most of the HTML-generating parts using easier, higher-level frameworks, and count on (implicit and explicit) caches for certain speed aspects.

This is usually not very hard since no matter what the framework, the thing you hand to the WSGI hoster is a WSGI compliant application object.

Various things can decide to act as WSGI apps. For example, Pylons is modeled around WSGI and can be used that way. CherryPy often runs its own threadpooled HTTP server, but its application objects can also act as WSGI apps. Various other frameworks support WSGI in similar ways.
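To make that concrete: from the host's point of view, a bare function and a framework-produced app object are interchangeable, since both are just callables taking (environ, start_response). A minimal sketch (py3-style bytes bodies; the class here merely stands in for a framework's app object):

```python
from wsgiref.simple_server import make_server

def bare_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'bare\n']

class FrameworkishApp:
    """Stands in for an app object produced by some framework."""
    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'frameworkish\n']

# the host doesn't care which one it gets:
#   make_server('', 8282, bare_app).serve_forever()
#   make_server('', 8282, FrameworkishApp()).serve_forever()
```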


More technical

Responsibilities

The above examples are callables that conform to WSGI's basic structure:

  • environ is a dict (containing string keys and mostly string values, but often also a few objects from the WSGI host)


  • It is an application's responsibility to:
    • build the headers
    • call start_response() (with the headers and status)
    • return data in an iterable of zero or more strings
      • a list or tuple is often sensible
      • a generator (using yield) is sometimes nicer, though this brings in some extra footnotes
(there are a few subtle differences in the way the data leaves your application, which can have some effect on the way it is served, but you don't need to care about that unless you want to)


  • start_response is a callable that takes three positional arguments:
    • a string containing the HTTP Status-code and Reason-phrase [6], for example '200 OK'
    • headers, in a list of (name,value) tuples
    • exc_info (optional, defaults to None), used to pass exception information around. When there is an exception it should be an (exceptiontype, exceptionvalue, tracebackobject) tuple, which is what sys.exc_info() returns while handling an exception.
    • returns a write function -- but you should probably generally ignore that


  • You leave the lower-level serving to the server. It may do some of its own thing.


Because this is primarily a description of behaviour, there is some flexibility while still conforming.

That includes the use of middleware. For example:

# Wrap paste's error handling around this app, so that exceptions show
# in the browser, instead of a generic '500 internal server error'
from paste.exceptions.errormiddleware import ErrorMiddleware
application = ErrorMiddleware(application, debug=True)


No form, cookie or even URL/path parsing is provided by the server interface.

You may wish to use a third-party library (e.g. paste, or things like WebOb). The standard library has equivalents, but they tend to be a little more tedious.

For example, the following code is already relatively fleshed out, using paste:

import paste.request
 
def application(environ, start_response):
    output = []
    response_headers = []
    status = '200 OK' # as a default, since you probably return it most of the time
 
    # A little parsing convenience:
    path      = environ.get('PATH_INFO','')
    reqvars   = paste.request.parse_formvars(environ, include_get_vars=True) #Note: MultiDict
    cookies   = paste.request.get_cookie_dict(environ) # ...for example
 
    if path.startswith('/hello'): #Gotta have one of these (apparently)
        response_headers.append( ('Content-type', 'text/plain') )
        output.append('Hello world!')
 
    elif path.startswith('/env'): # To give you an idea what's in the environment dict
        response_headers.append( ('Content-type', 'text/plain') )
        for k in sorted(environ):
            output.append('%-30s:  %r\n'%(k,environ[k]))
 
    # You could add elifs for your real handlers here, or of course do your 
    # URL dispatching differently, which on a larger scale you likely would.
 
    else: # Can be handy for debug, such as:
        status='404 Not Found'
        response_headers.append( ('Content-type', 'text/plain') )
        output.append('\nNo handler for this URL. Request details:\n\n')
        output.append(' Path: %r\n'%path)
        output.append(' Request/form variables: %r\n'%reqvars)
        output.append(' Cookies: %r\n'%cookies)
 
    #We can add this one:
    response_headers.append( ('Content-Length', str(sum(len(e) for e in output))) )
 
    start_response(status, response_headers)
    return output


Dispatching, mounting, and path parsing
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

A server may wish to host more than one app.

That means you'll want one root app to dispatch to others.


If your framework lets you mount apps on paths, most of what it's doing under the covers is altering the PATH_INFO and SCRIPT_NAME variables, in this case in the environ dict:

  • SCRIPT_NAME: The part of the URL's path part that was used to dispatch to the current application object -- basically the location this particular instance was placed at. (Can be empty, when the app is mounted at the root path)
  • PATH_INFO: The request path without SCRIPT_NAME -- i.e. the virtual path within the application


Notes:

  • Not all leaf apps need to care about these values, but a good amount do, so dispatchers (be it your framework or your own code) must do this correctly, or will have tomatoes thrown at them later.
  • An app itself usually doesn't have to care about SCRIPT_NAME, in that many need only take information from PATH_INFO
    • ...except when reconstructing absolute paths or full URLs to themselves - but you may want to use existing tools to do that for you, to catch a bunch of special cases (e.g. special headers in reverse-proxy cases) that apply in the real world
  • On escaping of those two: (unsure so far)
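A minimal dispatcher doing that SCRIPT_NAME/PATH_INFO bookkeeping might look like the following. This is a sketch, not production code - real dispatchers also deal with escaping and various edge cases - and the mount table shown is hypothetical:

```python
def make_dispatcher(mounts, default_app):
    """mounts: dict mapping a path prefix (e.g. '/blog') to a WSGI app."""
    def dispatcher(environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in mounts.items():
            if path == prefix or path.startswith(prefix + '/'):
                # move the matched prefix from PATH_INFO onto SCRIPT_NAME
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        return default_app(environ, start_response)
    return dispatcher
```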


Pluggability, applications, and middleware

There are two sides to the WSGI API: the side the app sees, and the side the server sees.

Because of the well-defined, no-frills nature of both sides, things can easily be WSGI applications, be gateways to WSGI applications, or both.


When something is both, it sits between written applications and actual WSGI hosts, and we usually call it middleware.

Middleware can be useful for things like logging errors or mailing them to you, transparently supporting sessions (e.g. [7]), simple load balancing, selective content gzipping, authentication, debugging support, profiling support, giving "please wait" feedback when an application is being slow [8], and various other potentially useful things.

Writing middleware
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Write middleware only when you can explain clearly why middleware is the best form.

If it interacts with your actual app, it's not really middleware, but rather a semi-entangled pseudo-library, and is hazardous to portability. Put it in a real library.


Simple middleware is fairly easy; robust and compliant middleware is a little harder. When writing something serious, it is suggested you work wsgiref.validate (on both ends) into your process.

Remember that a WSGI application is mostly a callable. Middleware both takes such a callable (to call later, acting as a server towards it) and is a callable itself (so it can act as an application).


Since you often want to hand in some state (e.g. configuration), chances are that middleware is written in the class style.

The bare bones would be:

class AddsNothingMiddleware(object):
    def __init__(self, app):
        self.app = app 
 
    def __call__(self, environ, start_response):
        return self.app(environ, start_response)
 
#given some application (or other middleware class instantiation) hello:
wrapped_app = AddsNothingMiddleware(hello)

This adds absolutely nothing. It passes through environ and the start_response callable. This is about the minimum it must do to conform as both WSGI host and WSGI app.


Say you want to do something, like set a header. This implies changing what start_response does, and the specs imply that you have to wrap that function (otherwise the responsibilities of the response become very fuzzy indeed). For example:

class StupidCookieMiddleware(object):
    def __init__(self, app):
        self.app = app 
 
    def __call__(self, environ, start_response):
        def my_start_response(status, headers, exc_info=None):
            headers.append(('Set-Cookie', "name=value"))
            return start_response(status, headers, exc_info)
 
        return self.app(environ, my_start_response)


Much middleware will do one or more of:

  • take more arguments on __init__ than the basic application, storing them on self
  • wrap the call to the application, for example to return an error page when it raises an exception
  • wrap start_response so that it can intercept and change the headers (instead of letting the call pass through to, eventually, the WSGI server)
  • check out exc_info to do error handling (you can capture, handle, log, and/or re-raise the exception if you wish)
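The "wrap the call to the application" item can be sketched like this. It turns exceptions into a plain 500 page; note that exceptions raised while the returned iterable is being consumed (e.g. inside a generator body) are not caught here:

```python
import sys
import traceback

class CatchErrors(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            # pass exc_info along, as the spec asks when replacing
            # a (possibly already started) response
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain')],
                           sys.exc_info())
            return [traceback.format_exc().encode('utf8')]
```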


Doing all at once takes a little knowledge and practice to do right.


See also:
  • http://pylonsbook.com/en/1.0/the-web-server-gateway-interface-wsgi.html
  • http://pythonweb.org/projects/webmodules/doc/0.5.3/html_multipage/lib/example-command-program-flow.html

Notes:

  • The environment is a handy place to put data in order to move it around(verify). Please use keys with names that are likely to be unique, though.

Notes on...

Input headers

Input headers are copied to the environ, and normalized in the process (capitalised, underscored, more?(verify)). For example, if you add "My-Header: 1", you'll get an entry like 'HTTP_MY_HEADER': '1'

It seems it's impossible to get at the underlying raw data (from strict WSGI; some servers may allow you to cheat(verify)).
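The transformation follows the CGI convention: uppercase, dashes become underscores, and an HTTP_ prefix (with Content-Type and Content-Length as the classic exceptions, which appear without the prefix). Roughly:

```python
def header_to_environ_key(name):
    # e.g. 'My-Header' -> 'HTTP_MY_HEADER'
    key = name.upper().replace('-', '_')
    # Content-Type / Content-Length traditionally appear without the prefix
    if key in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
        return key
    return 'HTTP_' + key
```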

environ
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Contents:

  • CGI variables (think SCRIPT_NAME, PATH_INFO, QUERY_STRING, REQUEST_METHOD, CONTENT_TYPE, CONTENT_LENGTH, SERVER_NAME, SERVER_PORT, SERVER_PROTOCOL)
  • HTTP_* (header values, mostly as in the HTTP request. Names capitalized), e.g. HTTP_HOST,
  • SSL variables [9] (see the way apache uses these), for example HTTPS, SSL_PROTOCOL
  • wsgi.something: (see the PEP for more details)
    • wsgi.version: e.g. (1,0), referring to WSGI 1.0
    • wsgi.url_scheme: e.g. 'http' or 'https'
    • wsgi.input: input stream
    • wsgi.errors: error output stream (often ends up in a log, sometimes sys.stderr)
    • wsgi.multithread
    • wsgi.multiprocess
    • wsgi.run_once
  • non-pre-defined variables, including those set by:
    • server
      • for example paste.parsed_formvars, paste.throw_errors, paste.httpserver.thread_pool
    • libraries
    • you


Note that your own variables should be named so that they are unlikely to clash, and lowercased(verify). Note that it is often a bad idea to rely on placing things in environ, unless you know why people advise against it and can explain why that doesn't apply.


Most of these details were copied from the PEP: http://www.python.org/dev/peps/pep-0333/#environ-variables

The return iterable, Content-Length, streaming (and write())
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


You return an iterable, usually either

  • a list, of zero or more bytestrings
  • a generator, by using yield


There are implications from

  • the WSGI spec (e.g. a server may not delay one block while waiting for the next),
  • HTTP,
  • Content-Length

...that may let the server do

  • persistent connections
  • chunked responses

...sort of. That is, without a Content-Length the server must either close the connection after the response or (for HTTP/1.1 requests) fall back to chunked encoding; see below.


The short summary of take-home suggestions:

  • Try to make chunks larger when easy/possible
    • regardless of whether you use yield or list-of-strings, and whether you are aiming at an HTTP-chunked response
    • the reason is that the server has to handle each chunk separately, and
      • writing very small things to a socket is inefficient
      • any per-chunk overhead adds up to unnecessary inefficiency
      • the server cannot decide to merge things (that would take control away from you), so you have to do it yourself
    • when you don't care about incremental loading, then return [''.join(parts)] at the end of your handlers makes a lot of sense
    • you can write a bit of middleware-style code that aggregates output into few-kilobyte-at-least chunks
  • Calculating Content-Length allows the underlying connections to be persistent, which is good for speed
    • not everything will do it, but it's a very good habit
    • when you are using the return style (and don't start your response before you know what it is), then something like response_headers.append( ('Content-Length', str(sum(len(e) for e in output))) ) goes a long way
    • note that a length-one response (such as that produced by the join mentioned above) allows backing servers to add Content-Length when it is not present. Not all will, but it's valid for them to do so. (verify)
  • Avoid write() unless you know more than you wanted to know about WSGI. It's hard to use correctly, and (even) when you know what you're doing it adds little over the yield style.
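The aggregating idea mentioned above might be sketched as follows. Note that buffering in middleware technically bends the spec's "don't delay blocks" rule, as discussed below, so treat this as a deliberate trade-off rather than a drop-in:

```python
class ChunkAggregator(object):
    """Buffers the wrapped app's output and re-yields it in
    chunks of at least min_size bytes."""
    def __init__(self, app, min_size=4096):
        self.app = app
        self.min_size = min_size

    def __call__(self, environ, start_response):
        buf = []
        buffered = 0
        for piece in self.app(environ, start_response):
            buf.append(piece)
            buffered += len(piece)
            if buffered >= self.min_size:
                yield b''.join(buf)
                buf, buffered = [], 0
        if buf:
            # flush whatever remains at the end
            yield b''.join(buf)
```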



More details

Specs say that the server/gateway/middleware must not delay a block (even in the case of a list of strings): it must send one string to whatever underlies it (eventually the client connection) before requesting the next.

This makes it easier to avoid complex concurrency mechanisms (and related problems) in servers, but also has some implications for your coding.


For example, if you want output buffering, you must do it in your app. (Doing it in middleware is technically a violation of the spec - one you shouldn't commit by design, though one you could possibly justify in some specific situations.)


HTTP's rules around the (absence of the) Content-Length header have implications for WSGI server behaviour.

  • It can send out the response without that header, and then (have to) close the connection immediately afterwards (non-persistent connection; slower)
  • It could use a chunked response (if the request was HTTP/1.1), in which case the connection can be reused
    • this does rely on compliance with quite a bit of RFC 2616
  • If the response is a length-one list/tuple, it can itself add a Content-Length header (since it can do so without violating the one-chunk-at-a-time rule), and so the connection can be reused


If you use a generator-style iterable, the server may stream it, and will try to do so if implemented -- though details may vary(verify).

This can be handy for serving large amounts of data, or for incremental parsing and generation, so you don't have to have everything ready up front.

Again, without a Content-Length header it may have to close the connection(verify).



On use of write()

write() is a function returned by start_response.

Some people associate the use of write() with "low resource because I'm bypassing stuff", but this is not necessarily true.

Its use should be functionally equivalent to the generator case (although it is more producer-based than consumer-based).

If the server implements chunked coding (and technically it has to if it says it's HTTP 1.1), it may back write() calls with a chunked response.


The iterable setup is more flexible - you can tune it for best throughput, and also get a streamed response with it.

On the other hand, write() complicates things slightly:

  • in an application that uses write(), you should return an empty list/tuple
  • the write() implementation must guarantee that the data was sent or buffered for transmission before it returns, so it may block
  • write() is considered a hack, and its use is discouraged
  • in the case of middleware, write() adds some extra rules about how data passed via write() and data yielded via the wrapped app's iterable may be combined and forwarded (see the PEP for the details)


It seems it is possible to use it correctly - just more bother than it's ever worth.

Unicode

WSGI is made mostly for (encoded) data transfer and makes no hard assumptions about content (just as HTTP doesn't).

As such, all encoding/decoding must be done by the application. Strings you hand into WSGI functions or return as data must not contain unicode characters.

The strings you pass should contain only byte values (0x00-0xff). Where there is a str/unicode distinction (e.g. CPython before 3.0) you should use str (which is a bytestring), while in Jython, IronPython, Py3k, etc., where str is unicode-based, you can use that type as long as it contains only U+0000-U+00FF.
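In other words: encode at the edge, and declare what you did in the Content-Type. A py3-style sketch:

```python
def application(environ, start_response):
    # encode explicitly; the body handed to WSGI must be bytes
    body = u'H\u00e9llo w\u00f6rld\n'.encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```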

Cleanup
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

If you go the list-returning way, try/finally is a simple and handy enough way to get cleanup code to always run - though indefinitely blocking calls can still stall the handler.

Since py2.5, you can use yield in try-finally [10].
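For example (the BytesIO here stands in for a real file or database handle):

```python
import io

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    stream = io.BytesIO(b'line1\nline2\n')  # stands in for a real resource
    try:
        for line in stream:
            yield line
    finally:
        # runs on normal exhaustion AND when the server close()s
        # the generator early (e.g. on a dropped connection)
        stream.close()
```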

Servers can choose to kill stalling threads.


In many cases, you can also use the interpreter's atexit.


See e.g.:

Early client connection closes; blocking resources
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

The effect that the TCP connection closing (e.g. the user pressing escape, a broken connection, etc.) has on the execution of your handler depends on a few things, including how you return your data, the server, and whether you (can) check.


The best defined case is probably the one in which your handler does all of its work in one go and returns a list (of one or more strings). Unless you access wsgi.input or such yourself, this is independent of the connection, and you can assume all the work (including any cleanup you need to do) happens before the result is handed to the server, which may then decide to throw it away. That makes it the safest option in terms of dangling file handles, connections, and such.



You generally want timeouts on all the IO work you do in a handler. If you don't, you risk the handler hanging endlessly, independently of the server's connection to the client. In the general case, you can't count on the WSGI host killing off IO, particularly since even insertion of a Python exception won't do anything if the blocking is in C code.


A word of warning on calling apps from apps

It may be tempting to write an app that relays to others using something like:

if path_info.startswith('/other_app'):
    import other_app
    output = other_app.application(environ,start_response)
# ...and others

Don't do that.

Yes, it can work, and sometimes you can justify a quick hack. But don't do it without realizing how you are blurring the responsibilities of the response.

  • other_app.application calls start_response (it has to, to be a valid app itself), so you cannot call this in the wrapping code (in this specific code path - you may have to in others).
  • You give up all control of the status and headers. (Also, possible use of the write() function can become even less clear).
  • The above example does not alter SCRIPT_NAME or PATH_INFO like you would expect, so applications that rely on those being set may not work properly (...when not mounted directly under the root path).


If you insist on doing this, you'll probably want to at least get back control of start_response, and probably alter SCRIPT_NAME and PATH_INFO so that the application works properly under URL mounting.

For start_response, you would wrap in a way similar to what middleware does (you are selectively being middleware), something like:

if path_info.startswith('/other_app'):
    import other_app
 
    def my_start_response(status, headers, exc_info=None):
        # captures the call's values
        headers.append(('Set-Cookie', "name=value")) # and you can change them if you want
        # then emits it as our own:
        return start_response(status, headers, exc_info)
 
    output = other_app.application(environ, my_start_response)

Hosting WSGI

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

A WSGI app's longevity (and of course the request processing overhead) depends on the type of hosting and the amount of wrapping that implies.


You probably want to host WSGI apps in moderately long-lived processes (such as apache children; see mod_wsgi), or in processes that won't be killed at all, or only after serving a whole load of requests (then often a separate daemon/server that you HTTP-proxy to), since this lowers latency for all but the first request.


You could even use WSGI in a CGI-like way if you really wanted to, but you'd only do that if it were a lot simpler than one of the faster methods.

On speed: there are a number of comparisons, though no good or complete ones that I've seen so far. That said, at least mod_wsgi, Spawning (*nix-only), and CherryPyWSGIServer seem like fast enough options. paste's server can be handy for development, but is a bit slower (though still better than e.g. wsgiref's simple_server).


WSGI connectors/hosts/clients include (I've not used many of these yet):

  • twisted web (twistd) [14]
  • paste.httpserver [15] (fairly simple threadpooling server)
  • mod_wsgi [16], an apache2 module (embeds interpreters in apache children, or proxies to an (apache-spawned) set of daemon processes). See also mod_wsgi notes
  • green unicorn[17], prefork-style (ported from unicorn for ruby)
  • wsgiref.simple_server [18] - useful as a simple test server (hosts one app, no URL resolution)
  • waitress[19] (pure-python)
  • cherrypy.wsgiserver (foregoes most of CherryPy, mostly uses its networking. See also [20])
  • Spawning [21] (threadpooling, more)
  • isapi-wsgi [24], an IIS plugin (there are some others)
  • some other major and minor web servers.
  • ...and more


Wrappers/gateways:

  • wsgiref.handlers.CGIHandler: wraps WSGI into CGI (using sys.stdin, sys.stdout, sys.stderr and os.environ)
  • paste.cgiapp (takes a CGI app and wraps it into a WSGI interface)
  • Flup [25] servers/gateways:
    • flup.server.ajp (Host WSGI apps in an AJP interface(verify))
    • flup.server.fcgi (Host WSGI via FastCGI; persistent apps)
    • flup.server.scgi (Host WSGI via SCGI; persistent apps)
    • flup.server.cgi (this will obviously be slow)
  • ajp-wsgi[26] (low level is C, with an embedded interpreter to run the WSGI. Faster than flup's ajp)
  • ...and more


See also:

Unsorted:


Code snippets for a quick start

There are many ways of hosting code. Some few-liners, to help you choose one and get started:

These are just meant as copy-paste stuff to get above examples running.


WSGI reference implementation server: (common, but simple)

import wsgiref.simple_server
wsgiref.simple_server.make_server('',8282,application).serve_forever()


Paste:

import paste.httpserver
paste.httpserver.serve(application, host='0.0.0.0', port=8282)


Werkzeug / Flask (note: can also be done from code):

FLASK_DEBUG=1 FLASK_APP=containsmyapp.py python -m flask run --host 0.0.0.0 --port=8282

(FLASK_DEBUG is quite nice for debug feedback, but it's not how you'd serve in production: it doesn't like concurrency, it doesn't like clients closing while generating data, and it has the CLOSE_WAIT problem. Some of those go away when running it in typical mode, but you may also want to find another server, e.g. tornado.)


Tornado[27][28] (a bit more capable, may have to install):

import tornado.wsgi
import tornado.httpserver
import tornado.ioloop 
 
server = tornado.httpserver.HTTPServer( tornado.wsgi.WSGIContainer( application ) )
server.listen(8282)
tornado.ioloop.IOLoop.instance().start()
Note: it logs via logging, so you can get more feedback with e.g. logging.getLogger('tornado.access').setLevel(logging.INFO). See also [29]


Cherrypy's WSGI server:

from cherrypy import wsgiserver
server = wsgiserver.CherryPyWSGIServer(('0.0.0.0', 8282), application, server_name='localhost')
try: # if you don't have this try-catch, Ctrl-C doesn't always stop the server
    server.start()
except KeyboardInterrupt:
    server.stop()


Spawning works from the shell:

# spawn pythonfilename.appobjectname

If you want to do this without the shell, look at spawning_controller.py - running spawn calls that module's main(), which mostly just calls run_controller() after working out the options. Google around for details. One example might be:

from spawning import spawning_controller
args = { 'args': ['modulename.application'], 'host': '', 'port': 8282, }
spawning_controller.run_controller('spawning.wsgi_factory.config_factory', args, None)

Note that args specifies the application in the same way the command line does, and that most settings are left at their defaults.


Tornado notes
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Tornado is centered around a single event loop [30].


On concurrency

To keep things fast, your code must always be able to return control to the ioloop quickly, whether that means coroutines, separate calculations, etc.

Quoth [31], in general, you should think about IO strategies for tornado apps in this order:

  • Use an async library if available (e.g. AsyncHTTPClient instead of requests).
  • Make it so fast that synchronously blocking the IOLoop is not noticeable
    • e.g. fine to implement a memcache, good enough for a well-indexed database
  • Do hard calculation in a ThreadPoolExecutor.
    • Remember that worker threads cannot access the IOLoop (even indirectly), so you must return to the main thread before writing any responses.
  • If there is work that can be done separately and cached, do that.
    • Things that do not need a response can be handled by a background script
    • e.g. "send mail" can easily be "store mail in database, count on a separate mailer process to get to it"
  • Accept that occasional slow code in the IOLoop slows everything down.
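The ThreadPoolExecutor point can be sketched with plain asyncio (tornado's IOLoop wraps asyncio in modern versions; blocking_work here is a hypothetical stand-in for your calculation):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_work(n):
    # stands in for a CPU-heavy or otherwise blocking call
    return sum(i * i for i in range(n))

async def handler():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # the event loop stays responsive while the worker thread runs;
        # after the await we're back on the loop's thread, so it's safe
        # to write the response here
        return await loop.run_in_executor(pool, blocking_work, 1000)

# result = asyncio.run(handler())
```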


Note that you can get a forking server, which gets you one independent ioloop per process (helps when clients use multiple connections, and helps saturate your cores). Basically, you call bind() and start() instead of listen(), after constructing the HTTPServer and before starting the ioloop.

A single-process server will often be started like:

server.listen(8888)

A forked one like:

server.bind(8888)
server.start(0)  # 0 for 'detect amount of CPUs'

See also http://www.tornadoweb.org/en/stable/httpserver.html


On logging

https://www.tornadoweb.org/en/stable/log.html

util/library/middleware notes

Flask notes
Some routing examples
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Static files

# from dirs
@app.route('/css/<path:path>')
def send_css(path):
    return flask.send_from_directory('css', path, mimetype='text/css')
 
@app.route('/css/graphics/<path:path>')
def send_image(path):
    return flask.send_from_directory('image', path, mimetype='image/png')
 
 
# more hardcoded URL->files
@app.route('/favicon.ico')
def fav1():
    return flask.send_from_directory('ico', 'favicon-96x96.png', mimetype='image/png')
 
@app.route('/favicon-96x96.png')
def fav2():
    return flask.send_from_directory('ico', 'favicon-96x96.png', mimetype='image/png')


Dynamic handlers, where URL-path parts can be arguments if you wish:

@app.route('/shiftpng/<fid>')
def shiftpng(fid):
    fid = int(fid)
    return "Shift: %d"%fid
 
# the above passes it in as a string, though you can restrict types and get conversion:
@app.route('/shiftpng/<int:fid>')
def shiftpng(fid):
    return "Shift: %d"%fid
 
 
# another choice would be to use query arguments instead of a required path-based argument
# (allows them to be optional, and parsing to be more flexible)
@app.route('/shiftpng')
def shiftpng():
    fid = int( flask.request.args.get('fid', '-1') )
    return "Shift: %d"%fid
The converter types are string (anything sans slash, also the default), path (accepts slashes), int, float, uuid, and any (matches one of a given set of values).


Catch-all: (TODO: explain which applies)

@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def catch_all(path):
    return 'You want path: %s' % path
Request context

So why does:

from flask import request
 
@app.route('/hello')
def hello():
    name = request.args.get('name', 'world')   # default to 'world' when no ?name= was given
    return 'Hello %s'%name

...work? request looks like a module-level global, and nothing explicitly fills it.

Short answer: request is a context-local proxy; Flask ensures that, inside a handler, it always refers to the request currently being handled (and that this is thread-safe).
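For example, each request sees its own values through the context-local request proxy (a minimal sketch; the /hello route is made up):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/hello')
def hello():
    # 'request' looks like a global, but refers to the current request only
    return 'Hello %s' % request.args.get('name', 'world')

client = app.test_client()
print(client.get('/hello').data)           # b'Hello world'
print(client.get('/hello?name=Ada').data)  # b'Hello Ada'
```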


Useful attributes of the Request object include request.method, request.args, and request.cookies.


Responses
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
Flask ends up using a Response object one way or the other (created by you, or via the return shorthands, render_template, make_response, etc.) (verify)


The most controlled (but longer) response is creating a Response object yourself, because it lets you at all the details, for example:

resp = Response(''.join(retbytes), mimetype='text/plain')
resp.status_code = 200
resp.headers['Expires'] = 'blah' # TODO: actual value
return resp
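A runnable variant of the above (a sketch; the route name, body, and header value are made up):

```python
from flask import Flask, Response

app = Flask(__name__)

@app.route('/plain')
def plain():
    resp = Response('hello\n', mimetype='text/plain')
    resp.status_code = 200
    resp.headers['Cache-Control'] = 'no-cache'  # set any header you like
    return resp

client = app.test_client()
resp = client.get('/plain')
print(resp.data)                      # b'hello\n'
print(resp.headers['Cache-Control'])  # no-cache
```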


returning something else is a one-liner shorthand. These variants include:

  • response
  • (response, status)
  • (response, status, headers)
  • (response, headers)

Where:

  • anything not mentioned gets a default (e.g. status 200); some defaults can be configured
  • status is an integer
  • response can be:
    • byte data
    • a list of byte data
    • a Response object (see e.g. the example above); sometimes it makes sense to subclass it
    • a generator, to stream content from a function. Tends to look something like:
@app.route('/csv')
def generate_large_csv():
    def generate():
        for row in iter_all_db_rows():
            yield ','.join(row) + '\n'
    return Response(generate(), mimetype='text/csv')
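The tuple shorthands can be checked quickly with the test client (a sketch; route, body, and header names are made up):

```python
from flask import Flask

app = Flask(__name__)

@app.route('/short')
def short():
    # (response, status, headers) shorthand; Flask builds the Response object
    return 'made it', 201, {'X-Example': 'yes'}

client = app.test_client()
resp = client.get('/short')
print(resp.status_code)           # 201
print(resp.headers['X-Example'])  # yes
```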



If you write generator-style and get

UnboundLocalError: local variable referenced before assignment

Then you've just run into Python's stupid scoping rules. What's happening is a combination of two things:

  • at some point in the inner function you (re)assign to it. Without that assignment the name would come from the outer scope; with it, the name becomes local to the inner function.
  • in that inner function you read it before that assignment, which makes it a plain case of use-before-assignment.

In Python 3, nonlocal is the direct way to say "I want the enclosing scope's variable". Your options here are roughly:

  • if the use is actually local, and you don't mean to write to the parent scope's variable (which in Flask handlers is typically true for request params), assign it into a differently named variable
  • declare the name nonlocal in the inner function
  • if you have a class, use of self can make sense
  • use global, if that's not a horrible hack for your case
  • there are some other tricks for this issue - search around
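A minimal reproduction outside Flask, plus the nonlocal fix (names are made up):

```python
def broken():
    fmt = ','
    def gen():
        yield fmt.join(['a', 'b'])  # the read happens here...
        fmt = ';'                   # ...but this assignment makes fmt local to gen()
    return gen()

def fixed():
    fmt = ','
    def gen():
        nonlocal fmt                # explicitly refer to the enclosing scope's fmt
        yield fmt.join(['a', 'b'])
        fmt = ';'
    return gen()

try:
    next(broken())
except UnboundLocalError as e:
    print('broken() raised:', e)

print(next(fixed()))  # a,b
```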

See also:

internal requests / unit tests
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
When what you want isn't a redirect, but an actual request against your app (and there are enough reasons you can't just share the code behind the other handler, e.g. it's heavily entangled with the argument handling), you can get such a fetcher (from the underlying Werkzeug) via
app.test_client()

Meant for testing, it lives in the context of the given app, so lets you use that app's paths (verify)

See also:
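A short sketch of using it (the /echo route and form field are made up):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/echo', methods=['POST'])
def echo():
    return request.form.get('msg', '')

# test_client() returns a Werkzeug test client bound to this app
client = app.test_client()
resp = client.post('/echo', data={'msg': 'internal request'})
print(resp.data)  # b'internal request'
```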

paste.httpserver
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

paste.httpserver is relatively simple, based on BaseHTTPServer, but does allow handler threadpooling.

Simple usage example:

if __name__ == '__main__':
    from paste.httpserver import serve
    serve(application, host='0.0.0.0', port=8080)


serve() has a bunch of keyword arguments:

  • application
  • host: IP address to bind to (hand in an IP, or a name to look up). Defaults to 127.0.0.1, i.e. not publicly reachable.
  • port: port to bind to
  • ssl_pem: path to a PEM file. If '*', a development certificate will be generated for you. (verify)
  • ssl_context: by default based on ssl_pem
  • server_version, defaults to something like PasteWSGIServer/0.5
  • protocol_version (defaults to HTTP/1.0. There is decent but not complete support for HTTP/1.1)
  • start_loop: whether to call server.serve_forever(). You would set this to false if you want to call serve() to set up as part of more init code, but not start serving yet, or want to avoid blocking execution.
  • socket_timeout (defaults to None, i.e. no timeout, which can leave threads stuck on dead connections)
  • use_threadpool - if False, a new thread is created per request. If True, it keeps a number of threads (threadpool_workers) around to reuse, which removes thread-startup overhead from the request time.
  • threadpool_workers: see above. Default is 10
  • threadpool_options - a dict of further options; see [32]
  • request_queue_size=5 - maximum number of connections that listen() keeps in queue (instead of rejecting them)
Paste Deployment
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


deploy is an optional part of paste. It is useful to ease configuration/mounting/use of WSGI apps.


From code, it can return a WSGI application, created/composited based on a given configuration file.

It allows (URL) mounting of WSGI apps by configuration, and can load code from eggs (and ?).


Together with packaging as eggs, this lets you make life easier for sysadmins (since deployment then needs much less Python knowledge).
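A minimal sketch of what such a configuration can look like (the egg and app names are made up; the urlmap composite comes with Paste itself):

```ini
# config.ini -- mounts two WSGI apps by URL prefix
[composite:main]
use = egg:Paste#urlmap
/ = home
/api = api

[app:home]
use = egg:MyProject#home_app

[app:api]
use = egg:MyProject#api_app
```

From code, paste.deploy.loadapp('config:/path/to/config.ini') would then hand you the composed WSGI application.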

See also:

AuthKit notes
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Primarily provides user login logic. Has three main pieces:

  • Authentication middleware:
    • Intercept HTTP 401 (instead of entangling on a code level)
    • present one of several supported authentication methods
    • Sets REMOTE_USER on sign-in.
  • Permission system to assign/check specific permissions that users may or may not have
  • Authorization adaptors:
    • Handle the actual permission check
    • Throw a PermissionError on a problem, which is intercepted by the middleware.


See also:

See also