CGI, FastCGI, SCGI, WSGI, servlets and such

From Helpful

CGI

CGI (Common Gateway Interface) is a basic standardization of the input and output of a process that serves an HTTP request, mostly used by web servers to run external programs: CGI variables and request headers arrive in the environment, the request body (if any) on stdin, and the response headers and body go to stdout.
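
As a rough illustration of that contract, here is a minimal sketch of a CGI program in Python. The function name handle_request and the plain-text echo response are made up for this example; only the environment / stdin / stdout convention comes from CGI itself.

```python
import os
import sys


def handle_request(environ, stdin, stdout):
    """Serve one request the CGI way (hypothetical helper, for illustration)."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    # CONTENT_LENGTH may be missing or empty when there is no request body.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length > 0 else ""

    # Response headers first, then a blank line, then the response body.
    stdout.write("Content-Type: text/plain\r\n")
    stdout.write("\r\n")
    stdout.write("method=%s query=%s body=%s\n" % (method, query, body))


if __name__ == "__main__":
    # When run by a web server as a CGI program, these are the real channels.
    handle_request(os.environ, sys.stdin, sys.stdout)
```

Passing the environment and streams in as arguments is only a design choice to make the sketch easy to try outside a web server; a real script would typically just use os.environ, sys.stdin and sys.stdout directly.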

It is now somewhat dated, but still arguably useful for minimal implementations, such as embedded devices.


Things newer than CGI (see below) often load code into the web server itself, which offloads basic parsing, the sending work (particularly HTTP 1.1 features such as chunked transfers), and other responsibilities. I would guess that few CGI apps really conform to HTTP 1.1. Integrating with a web server makes that easier, and allows things such as a persistent interpreter and worker pools, which lower response latency and scale a little better than starting a new process for each request.


CGI variables, HTTP variables, and similar

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


The canonical set defined by CGI 1.1 (see RFC 3875) is:

  • SERVER_NAME
  • SERVER_PORT
  • REQUEST_METHOD (values such as "GET", "POST", "HEAD", etc. )
  • SCRIPT_NAME - The part of the URL path that was used to arrive at this particular script
  • PATH_INFO - the rest of the path in the URL (which can be empty)
  • PATH_TRANSLATED - resolves a URL path to a file within what the server serves
    • possibly not present (various things now consider it outdated)
    • possibly only present if there is an applicable virtual-to-physical mapping
    • possibly does not point to something that exists (e.g. it may be the path to a real dynamic script, with PATH_INFO just appended)
    • isn't always an absolute path
  • QUERY_STRING
  • CONTENT_LENGTH (of request body, if present)
  • CONTENT_TYPE (of request body, if present)
  • REMOTE_ADDR
  • REMOTE_HOST
  • GATEWAY_INTERFACE - version of CGI supported by the implementation, e.g. CGI/1.1
  • SERVER_PROTOCOL
  • AUTH_TYPE ("Basic" or "Digest", if used)
  • REMOTE_USER - authenticated username
  • SERVER_SOFTWARE
  • REMOTE_IDENT - Not used much. (see RFC 1413)

Many of these are used in dynamic generation, as a useful standardized way to communicate some central things. Details may deviate, though. For example, there are Apache-specific notes to SERVER_NAME.
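
To show how these variables fit together, here is a small sketch that rebuilds the requested URL from them. The function name request_url and the scheme parameter are assumptions for this example (the CGI 1.1 set above has no variable for the scheme), and as noted, servers may deviate in what they put in SERVER_NAME.

```python
from urllib.parse import quote


def request_url(environ, scheme="http"):
    """Rebuild the request URL from CGI variables (illustrative helper)."""
    url = scheme + "://" + environ["SERVER_NAME"]

    # Only mention the port when it is not the default for the scheme.
    default_port = "443" if scheme == "https" else "80"
    if environ["SERVER_PORT"] != default_port:
        url += ":" + environ["SERVER_PORT"]

    # SCRIPT_NAME is the part that led to the script, PATH_INFO is the rest.
    url += quote(environ.get("SCRIPT_NAME", ""))
    url += quote(environ.get("PATH_INFO", ""))

    # QUERY_STRING is passed along as the server received it.
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url
```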


You may also see:

  • extension variables specific to the CGI implementation, which should be prefixed with X_ (...but I haven't seen this much)
  • Protocol-specific (meta-)variables.
    • ...often because oldschool CGI could only get to these headers if the thing that served the CGI copied them into the executable's environment - which was fairly common. Examples:
      • HTTP_HOST, the client-supplied Host: value
      • HTTP_COOKIE
      • HTTP_REFERER
      • HTTP_USER_AGENT
      • Things like HTTP_ACCEPT, HTTP_ACCEPT_CHARSET, HTTP_ACCEPT_ENCODING, HTTP_ACCEPT_LANGUAGE
      • HTTP_CONNECTION
    • ...and more. The exact set could vary between servers, and configurations.
    • When SSL is enabled and used for the particular connection, you'll often see:
      • HTTPS (often with value 'on')
      • Quite a few starting with SSL_, depending a little on the context/implementation (see e.g. [1])


  • Some Apache additions, depending a little on what type of dynamic serving this is (oldschool CGI, embedded interpreters like PHP, perl, python, other modules).
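
The HTTP_ names in the list above come from a mechanical mapping described in RFC 3875: uppercase the header name, replace "-" with "_", and prefix HTTP_. A minimal sketch of that mapping follows; the function name is invented here, and real servers may additionally leave out some headers (for example ones carrying credentials).

```python
def header_to_cgi_name(header_name):
    """Map an HTTP request header name to its CGI variable name (sketch)."""
    name = header_name.upper().replace("-", "_")
    # These two already have their own variables in the canonical set,
    # so they do not get the HTTP_ prefix.
    if name in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return name
    return "HTTP_" + name
```

For example, header_to_cgi_name("User-Agent") gives HTTP_USER_AGENT.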


Notes:

  • There are some real-world addenda to HTTP_HOST and SERVER_NAME, particularly when using Apache and/or PHP.

FastCGI and SCGI

FastCGI most broadly refers to the concept of running a persistent process to handle many requests over its lifetime, avoiding the process startup overhead that basic CGI implies.

FastCGI and SCGI are protocols for communicating with such processes; SCGI (Simple CGI) is an alternative to FastCGI whose protocol is a little easier to implement.
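
To give an idea of how simple SCGI is on the wire, here is a sketch of encoding one request as I understand the SCGI specification: the CGI-style variables are sent as NUL-separated name/value pairs wrapped in a netstring (length, colon, data, comma), with CONTENT_LENGTH first and SCGI set to 1, followed by the raw body. The function name encode_scgi_request is made up for this example, and it does not cover sending the bytes or reading the response.

```python
def encode_scgi_request(environ, body=b""):
    """Encode CGI-style variables plus a body as one SCGI request (sketch)."""
    # CONTENT_LENGTH has to be the first header, and SCGI=1 has to be present.
    items = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    items += [(k, v) for k, v in environ.items()
              if k not in ("CONTENT_LENGTH", "SCGI")]

    # Each header is name, NUL, value, NUL.
    headers = b"".join(
        k.encode("latin-1") + b"\x00" + v.encode("latin-1") + b"\x00"
        for k, v in items
    )

    # Netstring framing around the headers, then the body as-is.
    return str(len(headers)).encode("ascii") + b":" + headers + b"," + body
```

A gateway would write these bytes to the socket of the persistent app process and then read the response from the same connection.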


Notes:

  • It is fairly easy to make many public-facing web servers act as gateways to your internal SCGI and FastCGI apps. For example, Apache has mod_scgi and mod_fastcgi.
  • Such a gateway can be a nice and flexible way
    • to apply per-app or per-process security policies - and separate them
    • to do SSL in one spot
    • potentially to load-balance


WSGI

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

WSGI (Web Server Gateway Interface) is a callback-based API for Python web apps, which eases application hosting and wrapping.
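
A minimal sketch of what "callback-based" means here: the server calls your application with the request environment (CGI-style variables plus some WSGI-specific keys) and a start_response callback, and the application returns an iterable of byte strings. The greeting text and the serve helper are invented for this example; wsgiref is the reference server in Python's standard library.

```python
def application(environ, start_response):
    """A tiny WSGI app: reports the path it was asked for."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # The callback receives the status line and the response headers.
    start_response("200 OK", headers)
    # The return value is an iterable of byte strings making up the body.
    return [body]


def serve(port=8000):
    """Hypothetical helper: host the app with the stdlib reference server."""
    from wsgiref.simple_server import make_server
    with make_server("127.0.0.1", port, application) as httpd:
        httpd.serve_forever()
```

Because the app is just a callable, wrapping it (middleware) amounts to writing another callable with the same signature that calls this one.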

Comparable to various APIs in other languages.

See Python notes - WSGI for more detail

AJP

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

AJP (Apache JServ Protocol) seems to be a simple and fast binary protocol, which makes it play in the same area as FastCGI, and has functionality similar to WSGI.

It is used in Tomcat, Jetty, and more. See also Java notes#Servlets_and_such

Some non-servlet and non-Java things speak the protocol too, for more cross-service pluggability. This also makes it useful in FastCGI sorts of ways.

Others/Unsorted

  • Apache API
  • ISAPI (Internet Server API), mostly library loading (so comparable to CGI without the process start overhead)
  • Java servlet API
  • NSAPI (Netscape Server API): No formal standard, not very common.
  • Oracle's WRB
  • SAPI (Spyglass Server API)