Communicated state and calls

This article/section is a stub — probably a pile of half-sorted notes and assertions some of which may well be wrong, and not verified as a whole. Feel free to add or refine.


Local(ish) and Network geared

Varying technologies, techniques, and implementation types include the following:

'IPC'

Inter-process communication is a header sometimes used for many of the local-ish methods, dealing with one or more of:

  • message passing
  • synchronization
  • shared memory
  • remote procedure calls


In the broad sense, this can describe things of varying type, complexity, speed, goals, guarantees, and cross-platformness, including:

  • File-related methods (e.g. locking)
  • Sockets (network, local)
  • Message queues
  • process signals
  • named pipes
  • anonymous pipes
  • semaphores
  • shared memory

...and more.


One of the easiest methods to get cross-platformness is probably the use of (network) sockets - mostly since Windows isn't POSIX compliant, and most other things are even less cross-platform.
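
As a minimal sketch of the socket approach, the following passes one message over a loopback TCP connection. The function names and the upper-casing "service" are made up for illustration; the point is only that the same code should run on Windows and unix-style systems alike.

```python
import socket
import threading

def serve_once(listener):
    # Accept a single connection, read one small message, reply in upper case.
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def roundtrip(message: bytes) -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)
    server.join()
    listener.close()
    return reply
```

Here the "other process" is just a thread to keep the sketch self-contained; a real peer would be a separate program connecting to the same address.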

Threads (same-process)

In multi-threaded applications, IPC-like methods are sometimes used for safe message passing and perhaps because of convenience (some can ease communication between multiple threads spread among multiple processes).
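
A minimal sketch of safe message passing between threads, assuming Python's standard queue module as the channel (the worker and its squaring task are hypothetical):

```python
import queue
import threading

def worker(inbox, outbox):
    # Take items until the sentinel None arrives; post results back.
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)

def square_all(numbers):
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for n in numbers:
        inbox.put(n)
    inbox.put(None)                      # sentinel: no more work
    t.join()
    return [outbox.get() for _ in numbers]
```

The threads never touch shared data directly; everything goes through the two queues, which do their own locking.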


Same-computer

Fast same-computer (same-kernel, really) process interaction:

  • POSIX named pipes
  • POSIX (anonymous) pipes
  • POSIX shared memory
  • POSIX semaphore
  • SysV IPC (queues, semaphores, shared memory)
  • unix sockets (non-routable network-style IO on unix-style OSes), not unlike...
  • windows' LPC
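
As a small illustration of the anonymous-pipe case, here is a sketch in which a parent process talks to a child through its stdin/stdout pipes. The child program is a made-up one-liner; only the standard subprocess module is assumed.

```python
import subprocess
import sys

# Hypothetical child program: reads stdin, writes the upper-cased text to stdout.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.read().upper())"

def ask_child(text: str) -> str:
    # subprocess sets up anonymous pipes for the child's stdin and stdout.
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout
```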

Interaction between applications, nodes in a cluster, etc.

Networkable application embedding like

  • DCOP (KDE 2, 3)
  • D-Bus (KDE 4, GNOME, others)


Relatively manual clustering/interoperation clustering support mechanisms, such as


Relatively general-purpose (networked)

...and often cross-language frameworks (that often create fairly specific-purpose protocols), such as in:

  • Apache's Thrift, geared to expose some local functionality via a daemon

Language-specific data transport frameworks

...such as


Fairly narrow-purpose protocols

...like

  • Flash's remoting
  • Various proprietary protocols (some of which are mentioned above)

RPC variations

A Remote Procedure Call is a function/procedure call that runs outside the calling process' local/usual context/namespace. It is often carried over a network socket (even when calling something locally), often targets another computer, and may cross the internet.

RPC may in some cases be nothing more than a medium that carries function calls to the place they should go, or a means of modularizing a system into elements that communicate using data rather than with linked code.


XML-RPC

XML-RPC is a Remote Procedure Call protocol that communicates via simple, standardized XML messages (typically over HTTP).

It is mainly a way to move function calls between computers. It adds typing, although only for simpler data like numbers and strings.


It is not the fastest or most expressive way to do remote calls, but is simpler than most others and can be very convenient, and not even that slow when implemented well. Usually, the focus is one of:

  • Run code elsewhere as transparently as possible
  • Provide a service over a well defined network interface
  • Provide a webservice

Arguably there's not much difference, but there can be in terms of what an implementation requires you to do: Does it make existing functions usable? All? Implicitly or explicitly? How much code do you need to actually hook in new functions? You'll run into this difference in example code and explanation when you look around for implementations.
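
To give an idea of the "hook in a function explicitly" style, here is a sketch using the XML-RPC modules in Python's standard library. The add function and the helper around it are made up for illustration; the server runs in a thread only so the example is self-contained.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

def call_add_remotely(a, b):
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")     # explicitly expose one function
    port = server.server_address[1]
    t = threading.Thread(target=server.serve_forever)
    t.start()
    try:
        proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d/" % port)
        return proxy.add(a, b)               # looks like a local call
    finally:
        server.shutdown()
        server.server_close()
        t.join()

# What actually travels over the wire is a small XML document:
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")
```

Printing request_xml shows the typed values (here two int elements) wrapped in a methodCall element.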


See also:

XML-RPC, transparency angle:

XML-RPC, webservice angle:


ONC-RPC

Open Network Computing RPC, the base of NFS

See also:

Others
  • DCE RPC
  • ISO RPC

Unsorted

  • MSRPC (a modified DCE/RPC) [1]
  • DDE
  • COM which seems an umbrella term for at least OLE, ActiveX, COM+, DCOM
  • .NET remoting [2]
  • WCF [3]

Web-geared

REST

A convention/formalism in which you clearly

  • define a resource (type)
  • have an address for a type of resource
  • a mapping of basic CRUD-style operations to manage it
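
The three points above can be sketched without any web framework. The following is a toy, in-memory dispatcher for a hypothetical "notes" resource; the class name, the /notes address and the status codes chosen are assumptions for illustration, not a prescribed design.

```python
class NoteStore:
    """Maps HTTP-style methods on /notes and /notes/<id> to CRUD operations."""

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def handle(self, method, path, body=None):
        parts = [p for p in path.split("/") if p]
        if parts[:1] != ["notes"] or len(parts) > 2:
            return 404, None
        if len(parts) == 1:                      # the collection
            if method == "GET":                  # read (list)
                return 200, dict(self.items)
            if method == "POST":                 # create
                note_id = str(self.next_id)
                self.next_id += 1
                self.items[note_id] = body
                return 201, {"id": note_id}
            return 405, None
        note_id = parts[1]                       # a single resource
        if note_id not in self.items:
            return 404, None
        if method == "GET":                      # read
            return 200, self.items[note_id]
        if method == "PUT":                      # update
            self.items[note_id] = body
            return 200, body
        if method == "DELETE":                   # delete
            del self.items[note_id]
            return 204, None
        return 405, None
```

For example, handle("POST", "/notes", {...}) creates a note and returns its identifier, after which GET, PUT and DELETE on /notes/<id> manage it.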


REST versus RPC in practice:

  • REST usually has a simpler external interface than RPC, partly because by convention you model it so that clients need minimal knowledge of the content type, and details of the underlying content/type are handled in the code behind it.
  • REST usually uses more identifiers (in non-trivial models), which can be human-opaque
  • As a result, RPC is usually handier for exposing a detailed (and previously modelled) programming interface, while REST is often handier for interconnecting services that can be modelled in a fairly simple way.
  • REST's focus on resources and their identifiers means that operations are, by convention and almost by implication, well separated, which is not true of all RPC interfaces.


At a technical level:

  • REST does everything via URL (HTTP) accesses, which may take more interactions than an equivalent RPC method
  • access between systems and languages is often a little faster to implement via REST than RPC -- URLs are well defined and fairly simple, and URLs are often at the core of web systems (and well supported in modern languages)


In REST over HTTP, a viewed HTML page is often a representation of the resource, not the resource itself.

SOAP

SOAP is, in the widest sense, a well-typed serialization format in XML, which can be a nice strict no-brainer option between systems that are not easily binarily compatible. (.NET uses it as one of its two object serialization formats.)


SOAP also describes a protocol for exchanging SOAP data, often to do remote procedure calls.

This use resembles XML-RPC in various ways (including that you can use it over HTTP to get through firewalls), and could be said to be an improvement over it in terms of typing. However, you can argue that pragmatism wasn't a priority when SOAP was defined: SOAP implementations may differ on what parts they implement/allow (meaning interoperation can be difficult), SOAPAction's format is a little underspecified, and its use forces you to hook into HTTP. Also, the number of XML namespaces is mildly ridiculous -- you won't write any SOAP content yourself, or be able to get away with a quick and dirty parser instead of a decent SOAP implementation. Its verbosity also makes it slower to parse than some other RPC methods (particularly binary RPC), so it is not as practical for latency-critical needs. (It doesn't help my opinion that various SOAP implementations are noticeably slower than they could be.)

You can argue that this makes SOAP a good serialization format but not a very good RPC medium.


WSDL

WSDL describes web services, often (but not necessarily) SOAP.

WSDL mostly just mentions:

  • the URL at which you can interact (often using SOAP),
  • The functions you can call there and the structure of the data you should send and will get (using XML-Schema)

WSDL allows you to bootstrap SOAP RPC based on a single URL, without writing code for the SOAP yourself. You can use a program that converts WSDL to code, or compiles it on the fly (each time, or just regularly).

This doesn't always work as well as the theory says, though, mostly because of the complex nature of SOAP.

Unsorted

If you want to avoid a SOAP stack (I did because the options open to me at the time were flawed), the main and often only difference with a normal POST request (with the SOAP XML in the body) is that you need to add a SOAPAction header.

The SOAPAction header is used to identify the specific operation being accessed. The exact format of the value is chosen by the service provider.
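
A sketch of that stack-less approach, assuming SOAP 1.1 over HTTP and Python's standard urllib. The URL, the action value and the body element in the usage note are hypothetical; a real service dictates all three.

```python
import urllib.request

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace

def build_soap_request(url, soap_action, body_xml):
    """Build (but do not send) a plain POST that carries a SOAP envelope."""
    envelope = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<soap:Envelope xmlns:soap="%s"><soap:Body>%s</soap:Body></soap:Envelope>'
        % (SOAP_ENV_NS, body_xml)
    )
    return urllib.request.Request(
        url,
        data=envelope.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": '"%s"' % soap_action,   # the one SOAP-specific header
        },
        method="POST",
    )
```

Something like build_soap_request("http://example.com/service", "urn:example#GetThing", "<GetThing xmlns='urn:example'/>") could then be passed to urllib.request.urlopen(); parsing the XML that comes back is still up to you.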


See also


XML-DA

Seems to be a simpler alternative to SOAP when all you want is some basic data access.

See also:

UDDI

UDDI (Universal Description, Discovery, and Integration) is a way to list exposed web services.

It is primarily useful as a local (publish-only) service registry, to allow clients to find services and servers: the idea is that it yields WSDL documents, via SOAP.

It was apparently intended to be a centralized, yellow pages type of registry for business-to-business exchange, but is not often used this way.(verify)


See also:

Data and serialization

Serialization formats:

  • Serialization formats made for interchange such as
  • Serialization formats made for storage (and/or readability) are usually more detailed than they are efficient, but can still be handy. Consider
    • YAML
    • JSON (see also the idea of JSON-RPC)
    • XML (doubtful in this list, as you might as well use the better defined and XML-based SOAP, unless you are communicating structure more easily expressed in nested nodes -- such as data already in that format).
  • Language-specific serialization implementations can also be convenient (e.g. in memcaches), but with obvious drawbacks when cooperating between multiple languages.


XML

While XML itself has upsides over self-cooked solutions (e.g. in that encoding and parsing is well-defined and done for you), its delimited nature and character restrictions mean it is not an easy or efficient way to transport binary data without imposing some extra non-XML encoding/decoding step. (CDATA almost gets there, but requires that the string "]]>" never appears in your data.)

One trick I've seen used more than once (e.g. filenames in a xml-based database, favicons in bookmark formats) is to encode data in Base64 or URL encoding. This is fairly easy to encode and decode, and transforms all data into ASCII, free of the delimiters XML uses and byte values XML specifically disallows. It's safe, but does waste space/bandwidth.
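
A minimal sketch of the Base64 variant, using Python's standard library. The blob element and its encoding attribute are invented for the example; real formats pick their own names.

```python
import base64
import xml.etree.ElementTree as ET

def wrap_binary(payload: bytes) -> str:
    # Base64 output is plain ASCII, so it is always safe as XML text.
    root = ET.Element("blob", encoding="base64")
    root.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="unicode")

def unwrap_binary(document: str) -> bytes:
    root = ET.fromstring(document)
    return base64.b64decode(root.text or "")
```

The cost is size: Base64 output is roughly a third larger than the raw bytes.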

Of course, storage of arbitrary binary data is often not strictly necessary, or a rare enough use that overhead is not a large problem.


Validity

Major syntax
Well formedness

Well-formedness largely excludes things that a parser could trip over.

Structure constraints:

  • well-balanced
  • properly nested
  • contains one or more elements
  • contains exactly one root element

Syntax constraints

  • does not use invalid characters (see section below)
  • & is not used in a meaning other than starting an entity or character reference (except in CDATA)
  • < is not used in a meaning other than starting a tag (except in CDATA)
  • entities
    • does not use undeclared named entities (such as HTML entities in non-XHTML XML)
  • attributes:
    • attribute names may not appear more than once per tag
    • attribute values do not contain <
  • comments:
    • <!-- ends with -->, and may not contain --, and does not end with --->
    • cannot be nested. The attempt leads to syntax errors.
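
In practice the quickest well-formedness check is to hand the text to a parser and see whether it complains. A sketch with Python's standard ElementTree (the helper name is made up):

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True
```

This rejects, among other things, unbalanced tags, a second root element, repeated attribute names, and undeclared named entities such as the HTML nbsp entity.
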
Validity

Well-formed documents are not necessarily valid.

Valid documents must:

  • be well-formed
  • contain a prologue
  • be valid to a DTD if they refer to one.


Characters

Valid characters are:

  • U+09
  • U+0A
  • U+0D
  • U+20 to U+D7FF
  • U+E000 to U+FFFD
  • U+10000 to U+10FFFF

Another way to describe that is "All unicode under the current cap, except...":

  • ASCII control codes other than tab, newline, and carriage return - so 0x00-0x08, 0x0B, 0x0C, and 0x0E-0x1F are not allowed
  • Surrogates (U+D800 to U+DFFF)
  • U+FFFE and U+FFFF (semi-special non-characters) (perhaps to avoid BOM confusion?)
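
The ranges above translate directly into a check. A sketch (the function names are made up), with a helper that simply drops anything not allowed:

```python
def is_xml_char(ch: str) -> bool:
    """True if the single character ch may appear in an XML 1.0 document."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def strip_invalid_xml_chars(text: str) -> str:
    # Dropping is the bluntest option; replacing or rejecting may suit better.
    return "".join(ch for ch in text if is_xml_char(ch))
```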


Note that this means binary data can't be stored directly. Percent-escaping and base64 are not unusual for this.

URIs

URIs have no special status, which means they are simply text and should be entity-escaped as such.

For example, & characters used in URLs should appear as &amp;.

(Note that w3 suggests ; as an alternative to & in URIs to avoid these escaping problems, but check that your server-side form parsing code knows about this before you actually use this)


As to special characters:

  • non-ASCII characters are encoded as UTF-8 first (to ensure that only byte-range values appear)
  • disallowed characters in the result are %-hex-escaped, which includes:
    • 0x00 to 0x1F, 0x20 (note this now includes newline, carriage return and tab)
    • 0x7F and non-ASCII (note this includes all bytes of multi-byte UTF-8 sequences)
    • <>"{}|`^[]\
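
Both escaping layers can be sketched with Python's standard library: percent-escaping for the URI itself, then entity-escaping because the URI is just text inside XML. The helper name and the href attribute are assumptions for illustration.

```python
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

def href_attribute(path: str, query: str) -> str:
    # Layer 1: non-ASCII becomes UTF-8 bytes, then each unsafe byte becomes %XX.
    url = quote(path, safe="/") + "?" + query
    # Layer 2: escape & (and < >) for XML and wrap the value in quotes.
    return "href=" + quoteattr(url)
```

For a path containing an accented e and a space, with query a=1&b=2, this gives a value in which the path reads caf%C3%A9%20menu and the & appears as its named entity.
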
Other details


XML parsers may choose to allow parsing of well-formed XML fragments. In the case of SAX and SAX-like parsers, it may not even be required to supply something with a single root element.

XML based formats

These are primarily notes
This is probably not going to be complete in any real sense, and exists to contain bits of useful information.

Web

  • XHTML 1.0 and 1.1 (basically the XML-strict serialization of HTML4)
    • XHTML 1.0 Transitional
    • XHTML 1.0 Strict
    • XHTML 1.0 Frameset
    • XHTML 1.1 (reformulation of 1.0 Strict)
  • Versions of RSS (see the RSS and Atom notes)
  • XBEL (bookmark format)


Historical / time data

  • xCal
  • SIMILE Timeline(/Timeplot) format (not very documented)
  • HEML


Dictionary data:


Geography and related

Documents/documentation

Graphs:

Unsorted:

XSL

XSL workarounds

See also Namespaces

Entities

When filtering, say, xhtml to xhtml, you run into the problem that XML itself (and therefore XSL by default) has only its five basic named entities defined. This does not include most of the HTML entities you may be used to.

Entity declarations can be added quickly in an inline DTD, or imported all at once

((FIXME: Don't know yet))

<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
 %HTMLlat1;

JSON

JSON (JavaScript Object Notation) is the idea of using javascript data structures (in text form) as a serialization / data exchange format. It writes data as Javascript-eval()uable strings, and can contain numbers, strings (without worrying about unicode characters), booleans, null, arrays, and associative arrays (a.k.a. hashes, dictionaries).

It is primarily used to move data from a server into a JavaScript application, regularly through AJAX.


Note that you can also use JSON from your server-side language to write data directly into a page's <script> tag, for example, from python you could do:

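import json
somedata = {'tags': ['foo', 'bar'], 'id': 3}
# json.dumps() serializes the dict to JSON text, which doubles as a JavaScript literal here
print('<script type="text/javascript">var data=%s;</script>' % json.dumps(somedata))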


Since JSON text is often simply fed to javascript's eval(), whatever arrives is run as code: a sender can include assignments, calls -- and exploits. Some people categorically dislike JSON because of the security risks originating in blindly trusting and eval()-ing anything you receive.

You can avoid that by using a non-code format to do the same, but those are often more work, particularly for more complex data types.
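
A middle road is to keep the JSON format but read it with a real parser instead of eval(). A sketch of the idea in Python (the helper name is made up): the standard json module only ever produces data, so text that smuggles in code is rejected rather than run.

```python
import json

def parse_untrusted(text: str):
    """Return (True, data) for valid JSON text, (False, None) for anything else."""
    try:
        return True, json.loads(text)
    except ValueError:          # json.JSONDecodeError is a subclass of ValueError
        return False, None
```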


JSONP

JSONP refers to letting the client prepend a bit of text to the JSON text that the server returns.

For example, say you have a web API that returns some blog post metadata:

([{url:'http://example.com/post1', id:'1', t:['tag','foo']},
  {url:'http://example.com/post2', id:'2', t:['bar','quu']}
])

A simple use would be to prepend nothing and use it in a straight eval:

blog.posts = eval( fetch("http://api.example.com") )
// so that you can do  blog.posts[0].url


With JSONP, you can do something like:

eval( fetch("http://api.example.com?jsonp=callbackfname") )

The server would respond with something like:

callbackfname([{url:'http://example.com/post1', id:'1', t:['tag','foo']},
  {url:'http://example.com/post2', id:'2', t:['bar','quu']}
])

...which calls a callback when evaluated (this is why the parentheses are around the object; they fall away in a basic eval).
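
On the server side, the wrapping itself is tiny. A sketch in Python, in which the function name is made up and the restriction of the callback to a plain identifier is an added precaution rather than part of any JSONP definition:

```python
import json
import re

# Only allow simple identifiers, so the client cannot inject arbitrary script.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*\Z")

def jsonp_response(data, callback=None):
    body = json.dumps(data)
    if callback is None:
        return body                          # plain JSON
    if not IDENTIFIER.match(callback):
        raise ValueError("callback must be a plain identifier")
    return "%s(%s)" % (callback, body)       # JSON wrapped in a call
```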



The page that I stole the example from refers to a specific callback by identifier, which is one way of placing most of the decision of what to do with the response in the exchange rather than in client logic - though it does entangle the server and client code a little more.


JSONP is regularly used to work around same-origin restrictions for AJAX -- if you dynamically add a <script> tag to load JS, that new script can come from anywhere. If that script is actually just data wrapped in a call that hooks into the rest of a page (JSONP style), that makes it more or less the same as loading the JSON and then doing a call from within your script.

Note that JSONP works around this by using remote code execution -- which is a larger potential security breach than just transferring data (JSON, XML, whatnot) is, so trust your sources or don't use it. This includes various APIs.

One alternative is to put a data proxier on your own domain.

"Invalid label" error

The usual cause is having the server side send JSON that is not technically correct.

eval() is peculiar when you do not add parentheses around an object literal: a leading { is parsed as the start of a block statement, so the first key: is read as a label. Add the parentheses for robustness, either at the server side, or in the javascript like:

var o = eval( '('+instring+')' );

(My own mistake was a different and server-side mistake, namely forgetting to send a python dict through JSON before sending it. Python's format looks very similar, but is often not valid JSON)
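
To illustrate that last mistake, here is what Python's own formatting of a dict looks like next to the JSON serialization of the same data (the record is made up):

```python
import json

record = {"title": None, "published": True, "tags": ("foo", "bar")}

as_python = str(record)        # Python literal syntax: single quotes, None, True, tuples
as_json = json.dumps(record)   # actual JSON: double quotes, null, true, arrays

def is_valid_json(text):
    try:
        json.loads(text)
    except ValueError:
        return False
    return True
```

Only as_json passes is_valid_json(); the str() form trips on the first single quote.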

Unsorted

Note that JSON is potentially less secure than any other exchange in that you must know that the source of the JSON is trusted. Even if JSON usually deals with data, it may also contain code, which will get executed and so can easily enable things like XSS exploits.

See also

YAML

YAML (YAML Ain't Markup Language) is a data serialization format intended to be readable and writable by both computers and humans, stored in plain text.

A lot of YAML values aren't delimited other than that they follow the basic YAML syntax. Only complex features and complex strings (binary data, unicode and such) will require more reading up.


YAML spends a lot of its language definition on pushing complexity into the parser and away from a person that wants to write YAML. Various things are possible in two styles (one more compact, the other a little more readable).

YAML is arguably a little handier for code-based manipulation than the likes of XML are. The conversion from YAML to data and back is often simple and obvious, which makes the result more predictable.

It does rely on indenting, which some people don't like (seemingly mostly those with editors that don't care so much about whitespace)


Scalars

Null:

~
null

Integers (dec, hex, oct):

1234
0x4D2 
02322

Floats:

1.2
0.
1e3
-3.1e+10 
2.7e-3
.inf
-.inf
.nan

Booleans:

true
false

Basic syntax

Items are often split by explicit syntax or unindents, e.g.:

foo: [1,2]
bar: [3]

...which is equivalent to the block style:

foo:
  - 1
  - 2
bar:
  - 3

...and an inline-style map may span lines:

quu: {
   r: 3.14,
   i: 0.
 }


Splitting documents(/records):

---

Usually sits on its own line, but doesn't have to. It is not unusually followed by a #comment on what the next item is.

If you want to emit many records and mark start and end, use --- and ...



Composite/structure types


Lists

- milk
- pumpkin pie
- eggs 
- juice

or inline style:

[milk, pumpkin pie, eggs, juice]


Maps:

name: John Smith
age: 33

or inline style:

{name: John Smith, age: 33}


Comments

# comment


Strings, data

Strings

most any text
"quoted if you like it"

Also:

"unicode \u2222 \U00002222"
"bytestrings \xc2"

Note that strings are taken to be unicode strings, and there is no formal way of distinguishing them from bytestrings.

(If you want to distinguish them and/or want the exact string type detail to be preserved through YAML, you may want to use a tag (perhaps !!binary for base64 data), or perhaps code some schema/field-specific assumptions)



Mixing structures

Generally works how you would expect it, given that YAML is indent-sensitive.

For example, lists nested under named items (a list of single-key maps, each holding a list):

- features:
  - feature 1
  - feature 2
- caveats:
  - caveat 1
  - caveat 2

...a list of hashes:

- {name: John Smith, age: 33}
- name: Mary Sue
  age: 27

...a hash containing lists:

men: [John Smith, Bill Jones]
women:
  - Mary Sue
  - Susan Williams

Tags

YAML has a duck-type sort of parser, which won't always do what you want.

You can force/cast specific handling using tags, e.g.

not_a_date: !!str 2009-12-12
flt: !!float 123

The tags that currently have meaning in YAML include:

!!null
!!int
!!float
!!bool
!!str
!!binary
!!timestamp
!!seq
!!map
!!omap
!!pairs
!!set
!!merge
!!value 
!!yaml

You can also include your own.


See also:


Advanced features

Relations (anchor/alias)

Further notes

Note that the inline style for the basic data structures (strings, numbers, lists, hashes) is often close to JSON syntax, and may occasionally be valid JSON. Conversely, JSON can be seen as a subset of YAML (apparently only strictly so since YAML 1.2, because of Unicode handling details).

Given some constraints, you can probably produce text that can be parsed by both YAML and JSON parsers.


Libraries

See e.g. http://en.wikipedia.org/wiki/YAML#Implementations

There tend to be multiple parsers/libraries for most languages. You may want to compare features and speed; for example, for scripting languages, C-based approaches like that of SYCK are often much faster.

Netstrings

Netstrings are a way of transmitting bytestrings by prepending their length, instead of delimiting them. This means they can be placed in general text files or other datastreams, and since the spec is simple to implement, it can be used by many different programs.
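
A netstring is the decimal length, a colon, the bytes themselves, and a trailing comma, so "hello" becomes 5:hello, on the wire. A minimal encoder/decoder sketch (function names are made up; the decoder assumes the whole netstring is already in memory):

```python
def encode_netstring(data: bytes) -> bytes:
    return str(len(data)).encode("ascii") + b":" + data + b","

def decode_netstring(stream: bytes):
    """Return (payload, rest) for the first netstring in stream."""
    length_text, sep, remainder = stream.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text)
    if len(remainder) < length + 1 or remainder[length:length + 1] != b",":
        raise ValueError("payload shorter than declared, or missing trailing comma")
    return remainder[:length], remainder[length + 1:]
```

Because the length is known up front, the payload may itself contain colons and commas without any escaping.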

See also:


Bencode

Bencode is similar to netstring, but can also code numbers, lists, and dictionaries. Used in the BitTorrent protocol.

Apparently not really formally defined(verify), but summarized easily and well enough.
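
The usual summary is: integers are written as i<number>e, bytestrings as <length>:<bytes>, lists as l...e, and dictionaries as d...e with keys in sorted order. An encoder sketch along those lines (encoding str values as UTF-8 is a convenience assumed here; bencode itself only knows bytestrings):

```python
def bencode(value) -> bytes:
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are bytestrings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))
```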

See also:


Others

See e.g.

Some implementation notes

Python SOAP libraries

pywebsvcs

pywebsvcs ('Web Services for Python') can be both the client-side and server-side of SOAP.

The most interesting part of the package is usually ZSI, but in more detail it consists of:

  • ZSI: Zolera SOAP Infrastructure
  • wstools (WSDL tools)
  • SOAPPy (previously separate, now somewhat redundant. Long-term plan to integrate it into ZSI not quite executed?)(verify)
    • Docstring (API)
    • readme
    • depends on PyXML (one among various XML parsers)
    • depends on fpconst (IEEE754 floating point number stuff. Consists of a single pure-python file that can be installed, or just copied in)
    • not developed anymore, and apparently never finished: it fails to parse Amazon's (admittedly unusual) WSDL. Nice enough to use when it does work, though.



In ZSI, clients can be created

  • using ServiceProxy(wsdlfile) (WSDL-based)
  • Binding(baseurl) (self-defined / uses only simple types)
  • using WSDL-to-python code generation (wsdl2py). Note this code may be (very) bulky, still depends on ZSI, and still seems experimental (fails on Amazon ECS 4)

Unsorted

Note that trying to interface with Amazon ECS from scratch is probably rather complex. I suggest using something like PyAWS (for ECS4. pyamazon, which it is based on, seems to be for the now defunct ECS3)

  • Soapy (not to be confused with SOAPpy)


Yet to read myself

See also