CherryPy

From Helpful
These are primarily notes.
It won't be complete in any sense.
It exists to contain fragments of useful information.

Some of this is specific to CherryPy ≥3, most of it to ≥2.1.


Intro

CherryPy is an application engine with a standalone, thread-pooled HTTP server. It can also run as a WSGI application under another server. See also the tutorial; it covers the basics decently.

It is not a full-stack framework in the Ruby on Rails sense; see things like TurboGears and Django for that. However, some of us like having more control than do-it-our-way-or-leave frameworks allow.


The CherryPy service is split into:

  • the server: the threaded HTTP-to-the-world server that you may well use
  • the application engine, which hosts the application(s) and does the actual work.

This split is why the server may tell you the engine hasn't started yet. The engine is also designed to run several unrelated apps at once. The server/engine and site/app(/request) splits come back in the configuration.


From the Server API documentation page:

                 Server
                   |
                   V
              +--------+      +- - - - - +                                +-----------+
        HTTP  |        |                 |                          new   |           |
Client -----> |        | ---> |            ---> Engine/app.request() ---> | Request() |
   ^          |  HTTP  |          WSGI   |                                |           |
   |          | Server | ---> |(optional)  -------------------- run() --> +-----+-----+
   |    HTTP  |        |                 |                                      |
   +--------- |        | <--- |            <----------------------------<-------+ 
              +--------+      + - - - - -+


WSGI

Each cherrypy.Application object (mostly an object to organize handlers) is also usable as a WSGI application. Because cherrypy.tree resolves URLs to mount points, it can also act as a dispatcher to multiple WSGI apps. You can also chain WSGI apps, and host external WSGI apps. See e.g. this on WSGI in CherryPy.
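To make the "dispatcher to multiple WSGI apps" idea concrete, here is a minimal stdlib-only sketch of path-prefix dispatching over plain WSGI callables. It is not CherryPy's implementation (in CherryPy 3 itself, cherrypy.tree.graft(wsgi_app, '/prefix') is the call for mounting a foreign WSGI app); the names path_dispatcher, hello_app, admin_app and call are hypothetical. It picks the longest matching prefix and moves it from PATH_INFO into SCRIPT_NAME, the usual WSGI convention for mount points.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    # A trivial WSGI application that echoes the path it was given
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello at ' + environ['PATH_INFO'].encode('ascii')]


def admin_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'admin at ' + environ['PATH_INFO'].encode('ascii')]


def path_dispatcher(mounts):
    """Return a WSGI app that routes to mounts[prefix] by longest matching prefix."""
    # Longest prefixes first, so '/admin' wins over '/'
    prefixes = sorted(mounts, key=len, reverse=True)

    def app(environ, start_response):
        path = environ.get('PATH_INFO', '') or '/'
        for prefix in prefixes:
            stripped = prefix.rstrip('/')
            if path == stripped or path.startswith(stripped + '/'):
                # Shift the matched prefix from PATH_INFO to SCRIPT_NAME
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + stripped
                environ['PATH_INFO'] = path[len(stripped):]
                return mounts[prefix](environ, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'not found']

    return app


def call(app, path):
    """Invoke a WSGI app for `path` and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ['PATH_INFO'] = path
    captured = {}

    def start_response(status, headers):
        captured['status'] = status

    body = b''.join(app(environ, start_response))
    return captured['status'], body
```

For example, path_dispatcher({'/': hello_app, '/admin': admin_app}) sends /admin/users to admin_app with PATH_INFO '/users', and everything else (including /administrator) to hello_app.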

URL mapping, handlers

Applications and Mounting

One CherryPy server can handle multiple applications. Each is represented by a class, and a basic one is little more than a collection of handlers. A very simple application:

import cherrypy

class HelloWorld(object):

    @cherrypy.expose  # Exposing a method makes it (usable as) a handler.
    def index(self):
        ''' A single handler '''
        return "Hi"


An application instance can be mounted at a unique path in the mount tree. When resolving a request's URL to an application, this acts as its base path.

When you have applications HelloWorld and HelloRegression, you could mount them like this:

cherrypy.tree.mount(HelloWorld(), '/')
cherrypy.tree.mount(HelloRegression(), '/tests')

An application has direct handlers, but can also have object children which act as subdirectories, so you can build a tree of handlers in code. For example:

helloworld = HelloWorld()
helloworld.tests = HelloRegression()
cherrypy.tree.mount(helloworld, '/')

...has pretty much the same effect as the above.


When you're trying to get a quick test running, you probably want to start with something like:

cherrypy.engine.start(blocking=False) # Start the engine in the background
cherrypy.server.quickstart()          # Start the default HTTP server
# you could do other app initialization while those threads start up
cherrypy.engine.block()

For production, you want to read up on the server/engine details.


Exposing functions as handlers

Handlers should be callables. They can be exposed in two styles: with a decorator, or by setting an exposed attribute. The effect is the same.

import cherrypy

class Test(object):
    @cherrypy.expose          # decorator style
    def foo(self):
        return "Foo"

    def bar(self):            # attribute style
        return "Bar"
    bar.exposed = True
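Both styles boil down to the handler carrying a true exposed attribute, which is what the default dispatcher checks for. The sketch below uses a stand-in decorator called expose (hypothetical, written here so the example runs without CherryPy installed) and a hypothetical is_handler check to show that the two styles are equivalent, and that an unmarked method is not reachable as a URL.

```python
def expose(func):
    """Stand-in for cherrypy.expose: mark a callable as a URL handler."""
    func.exposed = True
    return func


class Test(object):
    @expose
    def foo(self):
        return "Foo"

    def bar(self):
        return "Bar"
    bar.exposed = True

    def helper(self):
        # Not exposed: callable from your own code, but not as a URL
        return "internal"


def is_handler(obj, name):
    """Roughly the question a dispatcher asks: is this attribute an exposed callable?"""
    candidate = getattr(obj, name, None)
    return callable(candidate) and getattr(candidate, 'exposed', False)
```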

The default dispatcher (URL-handler resolution)

The basic URL-to-handler resolution can be summarized by translating each slash in the URL as a member lookup, as far as you have registered them. For example, /admin/user?login=foo would cause CherryPy to look for cherrypy.root.admin.user.

At each level, it checks whether there is applicable configuration (hooks that prevent further lookup, such as a path being the root of a staticdir, sensibly stop this URL resolution). If it ends up at a callable that is exposed, it applies the applicable configuration and calls it.

If not all of the URL was consumed, for example because there are no deeper members in the tree, or because a path segment cannot be a Python identifier (e.g. it starts with a digit), this is a 'partial match': the unconsumed parts of the path are passed to the handler as positional arguments.

Query-string parameters are passed in as keyword arguments.


For example, with the following mounted at /:

@cherrypy.expose
def users(self, *args, **kwargs):
    return "%r, %r" % (args, kwargs)

  • /users reports
    (), {}
  • /users/foo reports
    ('foo',), {}
  • /users/foo/bar?kitten=blue&button reports
    ('foo', 'bar'), {'button': '', 'kitten': 'blue'}
  • /users/file.txt?foo reports
    ('file.txt',), {'foo': ''}
  • /users/foo/index reports
    ('foo', 'index'), {}
    (without an index() handler, "index" is just another string)
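The lookup rules above can be sketched as a toy resolver. This is a simplified model, not CherryPy's dispatcher: it only walks attributes, honours the exposed flag, passes leftover path segments as positional arguments and the query string as keyword arguments, and ignores index(), default(), configuration and tools. The names resolve, call and Root are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


class Root(object):
    def users(self, *args, **kwargs):
        return "%r, %r" % (args, kwargs)
    users.exposed = True

    def blog(self, year, *args):
        return "%s, %r" % (year, args)
    blog.exposed = True


def resolve(root, url):
    """Map a URL to (handler, positional args, keyword args)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split('/') if s]
    node, handler, consumed = root, None, 0
    for i, segment in enumerate(segments):
        node = getattr(node, segment, None)
        if node is None:
            break  # nothing deeper in the tree: partial match
        if callable(node) and getattr(node, 'exposed', False):
            handler, consumed = node, i + 1  # deepest exposed callable so far
    if handler is None:
        raise LookupError('404: no exposed handler for %s' % parts.path)
    args = tuple(segments[consumed:])  # unconsumed segments become *args
    kwargs = dict(parse_qsl(parts.query, keep_blank_values=True))
    return handler, args, kwargs


def call(root, url):
    handler, args, kwargs = resolve(root, url)
    return handler(*args, **kwargs)
```

On Python 3.7+ the keyword dict keeps query-string order, so call(Root(), '/users/foo/bar?kitten=blue&button') gives {'kitten': 'blue', 'button': ''} rather than the sorted order listed above; the contents are the same.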


Sometimes this mapping is almost directly what you want. If the root object mounted at / has:

@cherrypy.expose
def blog(self, year, *args):
    return "%s, %r" % (year, args)

then accessing /blog/2007/11/02 will report
2007, ('11', '02')



Further notes:

  • Handlers named index get special treatment. They match both a literal 'index' path segment and a path that ends at the object the index() method is defined on.
    • If the URL ends with a slash, index() is called. If the slash is missing, CherryPy first sends an external (verify) redirect that adds it.
    • index terminates matching: it does not do partial matches (worth noting, since this can block nonexistent paths even if nothing is mounted there).
  • Tools and the hooks they add may stop URL processing at some depth, e.g. if the path so far points at the root of a tools.staticdir.
  • Putting handlers in a class is not necessary, just useful for organization to place a complete application (or subapplications) into a class. It also allows you to mount multiple instances of the class.

Other dispatchers

  • cherrypy.dispatch.XMLRPCDispatcher
  • cherrypy.dispatch.VirtualHost

Coding handlers

cherrypy.response contains the response data, headers, cookies and status.

Request

Headers

See cherrypy.request.headers. It acts as a dict.

Forms

Keyword arguments of exposed handlers receive data from form fields of the same name, regardless of whether it was supplied via GET (query string) or POST (request body). Supply default values for them, or you risk an error when a field is not submitted.

You can avoid that problem, as well as the problem of fields whose presence and number vary (think checkboxes: an unticked one is not sent at all, and several sharing a name arrive as a list), by accepting **kwargs instead.
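With **kwargs, a field can be missing, a single string, or a list of strings when several inputs share a name. A small helper like the hypothetical get_list below (written here for illustration, not part of CherryPy) normalizes that before use; subscribe is a hypothetical handler body.

```python
def get_list(kwargs, name):
    """Return the values submitted for `name` as a list of strings."""
    value = kwargs.get(name)
    if value is None:
        return []            # field not submitted (e.g. no box ticked)
    if isinstance(value, list):
        return value         # several inputs shared this name
    return [value]           # a single value


def subscribe(**kwargs):
    # A default for a possibly-missing text field, a list for checkboxes
    email = kwargs.get('email', '')
    topics = get_list(kwargs, 'topic')
    return email, topics
```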

Files

Files, in 2.1 and higher, are like form values, except that instead of str objects they are objects from which you can stream.

The default behaviour seems to be cgi.FieldStorage's make_file, meaning the data is spooled to a temporary file on disk that is removed automatically afterwards. While this is usually what you want, you can override it.

If you take elements from kwargs and want some sanity checks, it would be roughly:

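import cherrypy

@cherrypy.expose
def handlerwithuploads(self, *args, **kwargs):
    size = 0  # we're only going to calculate its length
    f = kwargs.get('file')
    # A plain form value arrives as a str; an upload has .filename and .file
    if f is not None and not isinstance(f, str) and getattr(f, 'filename', None):
        while True:
            data = f.file.read(8192)
            if not data:
                break
            size += len(data)  # do something with the data
    return "received %d bytes" % size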

If the browser submits the field without a file, it seems only name (the field name) is filled in (verify). The check above therefore relies on filename being set.

The object follows the cgi.FieldStorage details:

  • You probably want the .file attribute; it's a file-like object, so you can read from it.
  • You can get the entire file from .value, but that's a bad idea if the file can be large, as all of it has to be read into memory.
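The chunked reading in the handler above generalizes to a helper that copies an upload's .file somewhere without holding it all in memory. save_upload and its chunk_size parameter are hypothetical; io.BytesIO objects stand in for the upload's .file and for the destination file.

```python
import io


def save_upload(src, dst, chunk_size=8192):
    """Copy file-like src to dst in chunks; return the number of bytes copied."""
    total = 0
    while True:
        data = src.read(chunk_size)
        if not data:
            break
        dst.write(data)
        total += len(data)
    return total


# Stand-ins for f.file and an output file opened with open(path, 'wb')
upload = io.BytesIO(b'x' * 20000)
target = io.BytesIO()
copied = save_upload(upload, target)
```

In a handler this would be something like `with open(path, 'wb') as out: size = save_upload(f.file, out)`, where path is whatever destination you choose. The standard library's shutil.copyfileobj does the same copy if you don't need the byte count.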

Response

Non-content responses

Redirection, errors and other status / non-content responses are done by raising various exceptions:

  • cherrypy.HTTPError(404) (A 404 can also be done using cherrypy.NotFound())
  • cherrypy.HTTPRedirect('can/be/relative') sends a redirect response to the client
  • cherrypy.InternalRedirect(path) (verify) processes the request as if that other path had been requested

For manual/other statuses, set e.g.

cherrypy.response.status = 304

See also [1].

Headers

Use cherrypy.response.headers. For example,

cherrypy.response.headers['Content-Type'] = 'text/xml; charset=utf-8'
cherrypy.response.headers['Content-Disposition'] = 'attachment; filename="downloaded-article.xml"'

Content

The usual example returns the entire page of data.


You could also choose to yield data, which technically means your handler returns a generator. CherryPy then finishes up the headers and writes the yielded data as it gets it. This is a way of streaming long outputs, but note that because the headers have already been sent once data starts flowing, there's really no error handling in the code after your first yield; see [2].

Apparently you also need to set response.stream = True; otherwise CherryPy collects the whole body before sending it.

See also [3]
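A sketch of a streaming handler, assuming CherryPy 3's per-handler _cp_config dict and its response.stream setting; the Report class and big_report method are hypothetical. Setting exposed and _cp_config as plain function attributes means this module doesn't need CherryPy to import, so the generator can be exercised directly.

```python
class Report(object):
    def big_report(self, rows=3):
        """Yield a long plain-text report one line at a time."""
        # By the time the first chunk goes out, status and headers are already
        # sent, so an exception in this loop can no longer become a clean 500.
        for i in range(int(rows)):
            yield "row %d\n" % i
    big_report.exposed = True
    # Ask CherryPy to stream the generator instead of buffering the whole body
    big_report._cp_config = {'response.stream': True}
```

rows goes through int() because values from the query string arrive as strings.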

Cookies

See http://www.cherrypy.org/wiki/Cookies

This article/section is a stub — probably a pile of half-sorted notes and assertions, some of which may well be wrong. Feel free to ignore it, fix it, or bug me about it.

cherrypy.request.cookie contains the cookies sent by the browser. Read it e.g. like:

cookie = cherrypy.request.cookie
for name in cookie.keys():
    print("cookie %s is set to %s" % (name, cookie[name].value))

To add/change/expire cookies, you set them in cherrypy.response.cookie (copying values from incoming cookies where applicable)

Setting:

cookie = cherrypy.response.cookie
cookie[name] = 'value'
cookie[name]['path'] = '/'
cookie[name]['max-age'] = 3600
cookie[name]['version'] = 1

Expiring a cookie is probably best done by sending the same cookie (name) with an expiration time in the past(verify):

cookie[name] = 'some value' # then:
cookie[name]['expires'] = 0
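cherrypy.request.cookie and cherrypy.response.cookie are instances of the standard library's SimpleCookie (Cookie.SimpleCookie on Python 2, http.cookies.SimpleCookie on Python 3), so their behaviour can be tried without a server. The cookie names and values below are made up. One detail worth knowing: SimpleCookie treats an integer 'expires' as seconds from now, so 0 means "expires now"; setting 'max-age' to 0 is the more explicit way to ask the browser to drop a cookie.

```python
from http.cookies import SimpleCookie

# Reading: parse a Cookie request header into the same kind of object
incoming = SimpleCookie()
incoming.load('theme=dark; lang=en')
for name in incoming.keys():
    print("cookie %s is set to %s" % (name, incoming[name].value))

# Setting: each morsel becomes one Set-Cookie header
outgoing = SimpleCookie()
outgoing['theme'] = 'light'
outgoing['theme']['path'] = '/'
outgoing['theme']['max-age'] = 3600
outgoing['theme']['version'] = 1
set_header = outgoing['theme'].OutputString()

# Expiring: resend the same name (and path) with max-age 0
outgoing['lang'] = 'deleted'
outgoing['lang']['path'] = '/'
outgoing['lang']['max-age'] = 0
expire_header = outgoing['lang'].OutputString()
```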

Sessions

This article/section is a stub — probably a pile of half-sorted notes and assertions, some of which may well be wrong. Feel free to ignore it, fix it, or bug me about it.

(verify)

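tools.sessions lets you keep per-visitor data across requests, provided sessions are enabled in the configuration. The browser only holds a session ID in a cookie; the data itself lives in the configured storage (RAM by default, or e.g. files as below).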

Example:

tools.sessions.on = True
tools.sessions.storage_type = "file"
tools.sessions.storage_path = "/home/site/sessions"
tools.sessions.timeout = 60

(timeout is in minutes)

And in code:

# Getting:
cherrypy.session.get('fieldname')
# Setting:
cherrypy.session['fieldname'] = 'fieldvalue'


See also http://www.cherrypy.org/wiki/CherryPySessions

Hooks and Tools

By default CherryPy is quite lean. Hooks allow extra functionality, and tools logically package up hooks.

Tools can be set globally or per path in the configuration, so you can have a part of your site do XML-RPC, a part serve files, a part be compressed and another not, etc.
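Per-path settings inherit downward: a path gets the global settings plus those of every configured path section above it, with deeper sections winning. The function below is a hypothetical, simplified model of that merging (CherryPy's real lookup also involves application- and handler-level config), meant only to show the inheritance rule; the settings values are made up.

```python
def effective_config(settings, path):
    """Merge 'global' with every section whose path is a prefix of `path`."""
    merged = dict(settings.get('global', {}))
    segments = [s for s in path.split('/') if s]
    # Walk '/', '/a', '/a/b', ... so deeper sections override shallower ones
    prefixes = ['/'] + ['/' + '/'.join(segments[:i + 1]) for i in range(len(segments))]
    for prefix in prefixes:
        merged.update(settings.get(prefix, {}))
    return merged


settings = {
    'global': {'server.socket_port': 8080},
    '/': {'tools.gzip.on': True},
    '/static': {'tools.staticdir.on': True, 'tools.gzip.on': False},
}
```

Here /static/style.css ends up with staticdir on and gzip off, while /blog/2007 keeps gzip on.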

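Filters can be written to augment, or fully replace, other processing. They basically consist of functions called at specific points of a request. The names below are CherryPy 2 filter methods; CherryPy 3 hooks attach to similar hook points, with before_main renamed to before_handler: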

  • on_start_resource
  • before_request_body
  • before_main
  • before_finalize
  • on_end_resource
  • before_error_response
  • after_error_response


gzip compression

tools.gzip provides HTTP compression of the response.

tools.gzip.on = True

Response unicode encoding

tools.encode converts unicode strings in the response to an encoding (by default UTF-8 (verify)).

tools.encode.on = True

Request unicode decoding

tools.decode automatically handles unicode data on the request, converting the raw strings that are sent by the browser into native python unicode strings.

tools.decode.on = True

Static file serving

tools.staticfile and tools.staticdir serve, respectively, a single file or a whole directory from the filesystem:

[/style.css]
tools.staticfile.on = True
tools.staticfile.filename = "/home/site/style.css"
 
[/static]
tools.staticdir.root = '/home/site/'
tools.staticdir.on = True
tools.staticdir.dir = "static"

It seems paths must be absolute, either directly or relative to a staticdir.root that is also specified.


If you need to, you can give these an additional dict mapping file extensions to MIME types, something like:

tools.staticdir.content_types = {'rss':  'application/xml',
                                 'atom': 'application/atom+xml'}

Page cache

tools.caching allows you to cache results. By default the cache is keyed on the URL and entries are kept for a minute, so it doesn't really suit dynamic pages, though it can suit things like RSS feeds.
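Conceptually, a page cache like this keys stored output on the URL and throws it away after a fixed delay. The class below is a hypothetical, stdlib-only sketch of that idea, not tools.caching itself; the 60-second default just mirrors the "a minute" above, and the clock is injectable so expiry can be tested.

```python
import time


class URLCache(object):
    """Keep rendered pages per URL for `delay` seconds."""

    def __init__(self, delay=60, clock=time.monotonic):
        self.delay = delay
        self.clock = clock
        self._store = {}  # url -> (expiry_time, body)

    def get(self, url, render):
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]           # still fresh: serve the cached copy
        body = render()               # missing or stale: regenerate
        self._store[url] = (now + self.delay, body)
        return body
```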

Configuration

In both a configuration file and a config dict, you have global configuration options and per-path sections, with paths relative to the app's root. The following file and dict are equivalent:

[global]
server.socket_port = 8080
 
[/service/xmlrpc]
xmlrpc_filter.on = True

settings = {
    'global':            { 'server.socket_port':  8080 },
    '/service/xmlrpc':   { 'xmlrpc_filter.on':    True }
}
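CherryPy config files hold Python literals as values, which is why the file and dict forms above map onto each other. The sketch below is a simplified stand-in (load_config is a hypothetical name; CherryPy has its own config reader) that parses the file form with configparser and ast.literal_eval and yields the same dict.

```python
import ast
import configparser

CONF_TEXT = """
[global]
server.socket_port = 8080

[/service/xmlrpc]
xmlrpc_filter.on = True
"""


def load_config(text):
    """Parse an INI-style config whose values are Python literals."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(text)
    return {
        section: {key: ast.literal_eval(value) for key, value in parser.items(section)}
        for section in parser.sections()
    }
```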


You can update with a file and dict using one of:

cherrypy.config.update("my.conf")  # CherryPy 3; CherryPy 2 used update(file="my.conf")
# or
cherrypy.config.update(settings)

global

Logging

The relevant options, all in [global], are:

  • server.log_to_screen
  • server.log_file (startup, messages, visits)
  • server.log_access_file (visits in common log format)

When you assign a filename to log_access_file, that log goes there instead of to the screen. To explicitly log nothing to the screen, set log_to_screen to False.

Tracebacks

Tracebacks don't need to go to the browser. Showing them there is useful for debugging, but in production they are useless to visitors and sometimes too informative. I believe the setting is:

server.show_tracebacks = False

autoreloader

As you may notice, the engine conveniently restarts itself when you edit the source. You can disable that if you wish, and otherwise change it.

In production environments you probably want to disable the autoreloader since it'll just mean pointless filesystem checking.

autoreload.on = False

Helpful objects

Note that a thread handles one request at a time. Some of the objects below are replaced for each request.

  • cherrypy.request contains all information relevant to the current HTTP request, post-parsing, including cherrypy.request.headers, the request method, the request line, remote host information, and cookies
  • cherrypy.session is managed automatically by CherryPy and holds per-visitor session data, tied to the browser through a session-ID cookie ('tools.sessions.on' must be True in the config)
  • cherrypy.thread_data contains data for the current handler thread only (use it for thread-safe data storage)
  • cherrypy.tree keeps track of where things are mounted, and allows you to mount things
  • cherrypy.config allows you to see the configuration with get() or getAll(), and update it from a dict or a file.
  • cherrypy.engine is the application engine described above
  • cherrypy.server refers to the HTTP server. You can e.g. start it manually with .start()
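cherrypy.thread_data behaves like the standard library's threading.local: each worker thread sees only its own attributes. The sketch below shows that behaviour with threading.local directly; thread_data, worker and results are hypothetical names here.

```python
import threading

thread_data = threading.local()  # stand-in for cherrypy.thread_data
results = {}


def worker(name):
    # Each thread sets and reads its own attribute; no lock needed for thread_data
    thread_data.user = name
    # results is shared, but each thread writes a distinct key
    results[name] = thread_data.user


threads = [threading.Thread(target=worker, args=(n,)) for n in ('alice', 'bob')]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The main thread never set thread_data.user, so it has no such attribute even after both workers ran.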

Tricks

Running as a windows service

http://docs.cherrypy.org/cherrypy22-as-windows-service




Unsorted

Errors and warnings

The ability to pass multiple apps is deprecated and will be removed in 3.2. You should explicitly include a WSGIPathInfoDispatcher instead

You passed in what is assumed to be a list of (path_prefix, wsgi_app) tuples instead of a single app. If you want multiple apps, explicitly compose a dispatcher, for example cherrypy's:

dispatching_app = WSGIPathInfoDispatcher(those_tuples)