Difference between revisions of "CGI, FastCGI, SCGI, WSGI, servlets and such"

Latest revision as of 14:06, 10 May 2012

CGI

CGI (Common Gateway Interface) standardizes the input and output of a process that serves an HTTP request. It is mostly used by web servers to run external programs: CGI variables and request headers arrive in the environment, the request body (if any) on stdin, and the response headers and body go to stdout.

It is now somewhat dated, but still arguably useful for minimal implementations, such as embedded devices.
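
As a rough illustration of that input/output contract, here is a minimal sketch of a CGI program in Python. The function name handle and the plain-text response format are made up for this example; only the variable names come from CGI itself.

```python
import os
import sys


def handle(environ, stdin, stdout):
    """Serve one request the CGI way: environment + stdin in, stdout out."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    # CONTENT_LENGTH is only set when there is a request body.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length).decode("utf-8", "replace") if length else ""

    # Response headers first, then a blank line, then the body.
    stdout.write("Content-Type: text/plain\n")
    stdout.write("\n")
    stdout.write("method=%s\nquery=%s\nbody=%s\n" % (method, query, body))


if __name__ == "__main__":
    # The web server sets up the environment and the pipes before starting us.
    handle(os.environ, sys.stdin.buffer, sys.stdout)
```

Keeping the logic in a function that takes the environment and streams as arguments means it can be exercised without a web server, by passing a plain dict and in-memory streams.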

See also:

  • http://en.wikipedia.org/wiki/Common_Gateway_Interface
  • http://www.w3.org/CGI/


Things newer than CGI (see below) often have code loaded into a webserver, which offloads basic parsing, the work of sending the response (particularly HTTP 1.1 features such as chunked transfers) and other responsibilities to that server. I would guess that few CGI apps really conform to HTTP 1.1. Entangling with a web server makes conformance easier, and allows some things (e.g. a persistent interpreter, worker pools, and such) that lower response latency and scale a little better than starting a new process for each request.
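
To give an idea of the kind of sending work a server takes off your hands, this is a small sketch of HTTP 1.1 chunked transfer framing. The function name chunked is an invention for this example; the framing itself (hex length, CRLF, data, CRLF, ending with a zero-length chunk) is what HTTP 1.1 defines.

```python
def chunked(parts):
    """Yield the byte strings that make up a chunked HTTP/1.1 message body."""
    for part in parts:
        if part:  # a zero-length chunk would end the body early, so skip empties
            yield b"%x\r\n%s\r\n" % (len(part), part)
    yield b"0\r\n\r\n"  # last chunk, with no trailer headers
```

A CGI program that wanted to stream output to an HTTP 1.1 client by itself would have to do this (and more) on its own, which is part of why it is usually left to the server.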


CGI variables, HTTP variables, and similar

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


The canonical set of variables defined by CGI 1.1 (see RFC 3875) is:

  • SERVER_NAME
  • SERVER_PORT
  • REQUEST_METHOD (values such as "GET", "POST", "HEAD", etc.)
  • SCRIPT_NAME - the part of the URL path that was used to arrive at this particular script
  • PATH_INFO - the rest of the path in the URL (which can be empty)
  • PATH_TRANSLATED - resolves a URL path to a file within what the server serves
    • possibly not present (various things now consider it outdated)
    • possibly only present if there is an applicable virtual-to-physical mapping
    • possibly does not point to something that exists (e.g. it can be the path to a real dynamic script, with PATH_INFO just appended)
    • isn't always an absolute path
  • QUERY_STRING
  • CONTENT_LENGTH of request body, if present
  • CONTENT_TYPE (of request body, if present)
  • REMOTE_ADDR
  • REMOTE_HOST
  • GATEWAY_INTERFACE - version of CGI supported by implementation, e.g. CGI/1.1
  • SERVER_PROTOCOL
  • AUTH_TYPE ("Basic" or "Digest", if used)
  • REMOTE_USER - authenticated username
  • SERVER_SOFTWARE
  • REMOTE_IDENT - Not used much. (see RFC 1413)

Many of these are also used in dynamic generation in general, as a useful standardized way to communicate some central things. Details may deviate between servers, though. For example, there are Apache-specific notes to SERVER_NAME.
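
As an example of how these variables fit together, the sketch below rebuilds the requested URL from them. The function name request_url is made up, and the scheme check relies on the non-canonical HTTPS variable mentioned further down, so treat that part as an assumption about the server.

```python
from urllib.parse import quote


def request_url(environ):
    """Rebuild the request URL from CGI-style variables (a sketch)."""
    # HTTPS is not in the canonical set; many servers set it to 'on' for SSL.
    scheme = "https" if environ.get("HTTPS", "").lower() == "on" else "http"
    port = environ["SERVER_PORT"]
    default_port = "443" if scheme == "https" else "80"

    url = scheme + "://" + environ["SERVER_NAME"]
    if port != default_port:
        url += ":" + port
    # SCRIPT_NAME and PATH_INFO are already URL-decoded, so quote them again.
    url += quote(environ.get("SCRIPT_NAME", ""))
    url += quote(environ.get("PATH_INFO", ""))
    # QUERY_STRING is passed through still encoded.
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url
```

Note that this uses SERVER_NAME, which is not necessarily the hostname the client asked for.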


You may also see:

  • extension variables specific to the CGI implementation, which should be prefixed with X_ (...but I haven't seen this much)
  • Protocol-specific (meta-)variables.
    • ...often present because an old-school CGI program could only get to the request headers if the thing that served the CGI copied them into the executable's environment - which was fairly common. Examples:
      • HTTP_HOST, the client-supplied Host: value
      • HTTP_COOKIE
      • HTTP_REFERER
      • HTTP_USER_AGENT
      • Things like HTTP_ACCEPT, HTTP_ACCEPT_CHARSET, HTTP_ACCEPT_ENCODING, HTTP_ACCEPT_LANGUAGE
      • HTTP_CONNECTION
    • ...and more. The exact set could vary between servers, and configurations.
    • When SSL is enabled and used for the particular connection you'll often see:
      • HTTPS (often with value 'on')
      • Quite a few starting with SSL_, depending a little on the context/implementation (see e.g. http://httpd.apache.org/docs/2.0/mod/mod_ssl.html)


  • Some Apache additions, such as DOCUMENT_ROOT, SERVER_ADDR and SERVER_ADMIN, depending a little on what type of dynamic serving this is (old-school CGI, embedded interpreters like PHP, Perl, Python, other modules).
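
Since request headers end up in the environment under HTTP_ names, a program that wants the headers back has to reverse that mapping. A minimal sketch, with request_headers being a made-up helper name:

```python
def request_headers(environ):
    """Recover request headers from CGI-style variables (a lossy sketch)."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            name = key[5:]
        elif key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            # These two describe the request body and carry no HTTP_ prefix.
            name = key
        else:
            continue  # SERVER_NAME, REMOTE_ADDR, etc. are not headers
        # HTTP_USER_AGENT -> User-Agent
        headers[name.replace("_", "-").title()] = value
    return headers
```

The mapping is not perfectly reversible: the original capitalization is lost, and a header that contained an underscore cannot be told apart from one that contained a hyphen.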


Notes:

  • There are some real-world addenda to HTTP_HOST and SERVER_NAME, particularly when using Apache and/or PHP.

FastCGI and SCGI

FastCGI most broadly refers to the concept of running a persistent process to handle many requests over its lifetime, avoiding the process startup overhead that basic CGI implies.

FastCGI and SCGI are protocols for talking to such processes; SCGI (Simple CGI) is an alternative to FastCGI with a protocol that is a little easier to implement.
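
To show how simple SCGI is on the wire, here is a sketch of encoding one request the way a gateway would send it: the CGI-style variables as NUL-terminated names and values, wrapped in a netstring, followed by the raw body. The function name scgi_request is made up; the framing follows the SCGI specification as I understand it.

```python
def scgi_request(headers, body=b""):
    """Encode one SCGI request: a netstring of headers, then the raw body."""
    # CONTENT_LENGTH has to come first, and SCGI=1 has to be present.
    items = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    items += [(k, v) for k, v in headers.items()
              if k not in ("CONTENT_LENGTH", "SCGI")]
    # Each name and each value is terminated by a NUL byte.
    block = b"".join(
        k.encode("latin-1") + b"\0" + v.encode("latin-1") + b"\0"
        for k, v in items
    )
    # Netstring framing: "<length>:<data>,"
    return str(len(block)).encode("ascii") + b":" + block + b"," + body
```

The receiving app reads the length, reads that many bytes of headers, and then CONTENT_LENGTH bytes of body, which is most of the protocol.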


Notes:

  • Many public-facing web servers can fairly easily be made gateways to your internal SCGI and FastCGI apps. For example, Apache has mod_scgi and mod_fastcgi.
  • Such a gateway can be a nice and flexible way:
    • to apply per-app or per-process security policies - and separate them
    • to do SSL in one spot
    • potentially to load-balance


WSGI

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

WSGI (Web Server Gateway Interface) is a callback-based API for Python web apps, which eases application hosting and wrapping.

Comparable to various APIs in other languages.

See Python notes - WSGI for more detail.
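
To make "callback-based" concrete, this is about the smallest WSGI app there is. The name application is a common convention rather than a requirement, and the response text is just an example.

```python
def application(environ, start_response):
    """A WSGI app: the hosting server calls this once per request."""
    # environ is a dict holding CGI-style variables plus some wsgi.* entries.
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    # start_response is the server's callback for the status and headers.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The return value is an iterable of byte strings making up the body.
    return [body]

# To try it locally, the standard library can host it, for example:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, application).serve_forever()
```

Because the app is just a callable, wrapping it (middleware) amounts to writing another callable with the same signature that calls this one.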

AJP

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

AJP (Apache JServ Protocol) seems to be a simple and fast binary protocol, which puts it in the same area as FastCGI, and it has functionality similar to WSGI.

It is used in Tomcat, Jetty, and more. See also Java notes#Servlets_and_such

Some non-servlet and non-Java things speak the protocol too, for more cross-service pluggability. This also makes it useful in FastCGI sorts of ways.

Others/Unsorted

  • Apache API
  • ISAPI (Internet Server API), mostly library loading (so comparable to CGI without the process start overhead)
  • Java servlet API
  • NSAPI (Netscape Server API): No formal standard, not very common.
  • Oracle's WRB
  • SAPI (Spyglass Server API)