XML notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Intro

Some upsides and downsides

On control codes and arbitrary binary data in XML

See Escaping and delimiting notes#On_control_codes_and_arbitrary_binary_data_in_XML

namespaces in XML

In XML, each element can be in its own namespace. Heck, each distinct attribute can be.


Namespaces as a hack on XML


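A namespace declaration in a document is primarily an identifier. Any unique value will do; conventionally it is a URI, and often a URL (sometimes one pointing to the DTD or schema, though nothing requires it to resolve to anything).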

In most uses they also get an alias to be used as a shorthand within the same document.

Namespace declarations are valid (as in usable) on the element where they are declared and in the nodes under it.
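A minimal sketch of that scoping rule, using Python's standard xml.etree.ElementTree. The library choice, the urn:example:scope identifier and the element names are assumptions made up for illustration. A prefix used below the declaring element parses fine; the same prefix used on a sibling outside that subtree is rejected as unbound.

```python
import xml.etree.ElementTree as ET

# Prefix 'p' is declared on <a>, so it is usable on <a> and its descendants.
in_scope = '<root><a xmlns:p="urn:example:scope"><p:b/></a></root>'

# Here 'p' is also used on <p:c>, a sibling of <a>, where the declaration does not apply.
out_of_scope = '<root><a xmlns:p="urn:example:scope"><p:b/></a><p:c/></root>'

def parses(text):
    """Return True if the namespace-aware parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

in_scope_ok = parses(in_scope)
out_of_scope_ok = parses(out_of_scope)
print(in_scope_ok, out_of_scope_ok)
```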


Namespace aliases/prefixes are references to a namespace value declared elsewhere within that document.

They are primarily useful for keeping the serialized document less verbose than it would otherwise be.

Human-readable prefixes happen primarily in examples, (other) human-written documents, and XML generated as strings. This leads to some confusion around namespaces, primarily the idea that an alias name is itself significant.

When you match on an alias, you only match on the value it points to - that is, what you have declared it to point to.
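As a rough illustration (Python's xml.etree.ElementTree is my choice here, and the query prefix q and the http://example.org/other URI are made up), a query can use any prefix it likes as long as it is bound to the same namespace value. Reusing the document's prefix string with a different value matches nothing.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<root xmlns:example="http://example.com"><example:element/></root>'
)

# The prefix in the query ('q') is local to the query; only the URI has to agree.
same_uri = doc.find('q:element', {'q': 'http://example.com'})

# Same prefix string as in the document, but bound to another URI: no match.
same_prefix_other_uri = doc.find(
    'example:element', {'example': 'http://example.org/other'}
)

print(same_uri is not None, same_prefix_other_uri is None)
```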


Aliases are only part of the on-disk/in-communication representation of the document, not really of the in-memory document. As such, loading from disk and writing back to disk often means the namespaces survive as the unique identifiers that they are, but the prefix string used for them is lost.
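A small sketch of such a round trip, assuming Python's standard xml.etree.ElementTree. That library stores element names as the namespace URI plus local name and makes up a prefix when writing; other libraries may well keep the original prefix, so treat this as one possible behaviour rather than the rule.

```python
import xml.etree.ElementTree as ET

original = '<root xmlns:example="http://example.com"><example:element/></root>'
root = ET.fromstring(original)

# In memory the element name is "{namespace URI}local name"; no prefix is stored.
child_tag = root[0].tag

# Serializing has to invent a prefix, because the original one was not kept.
written = ET.tostring(root, encoding='unicode')

# Reading the rewritten text back gives the same in-memory name.
reparsed_tag = ET.fromstring(written)[0].tag

print(child_tag)
print(written)
```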

Prefixes can mislead newcomers to XML into thinking that an element is easily matched on by string-like things. The prefix is not a name to match on, and its being readable is a convenience only really relevant to human-written documents.


To a namespace-aware XML parser, there is absolutely no difference between

<root xmlns:example="http://example.com"><example:element/></root>

and

<root xmlns:fhwdgads="http://example.com"><fhwdgads:element/></root>

And parsing and serializing it might yield yet another equivalent version, like:

<root><ns0:element xmlns:ns0="http://example.com" /></root>

Note that aliases work out as anonymous identifiers.
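To check the equivalence claimed above, the three serializations can be parsed and compared by their expanded names. This is a sketch using Python's standard xml.etree.ElementTree; the helper name names is made up.

```python
import xml.etree.ElementTree as ET

variants = [
    '<root xmlns:example="http://example.com"><example:element/></root>',
    '<root xmlns:fhwdgads="http://example.com"><fhwdgads:element/></root>',
    '<root><ns0:element xmlns:ns0="http://example.com" /></root>',
]

def names(text):
    """List the expanded ({uri}local) names of all elements, in document order."""
    return [el.tag for el in ET.fromstring(text).iter()]

parsed = [names(v) for v in variants]
all_equal = all(p == parsed[0] for p in parsed)
print(parsed[0], all_equal)
```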


Notes:

  • If you consider prefixes part of the document model in any way, then the document model is not fully defined by the DTD (or such), only by the document itself.
  • ...and so the DTD/Schema has no say about aliases
  • In transforms, like XSLT, this amounts to matching on the namespace value/URI. You cannot match on the input's prefix. (If you want to use a readable prefix in the transform, you can declare one there yourself.)



In some cases the use of namespaces and of readable prefixes are both pretty elegant - for example in XSLT, separating xsl from content is clear for human and computer consumption. (note: partly because these are and stay human-written)


This becomes a problem once you communicate or store XML data: prefixes may be lost, or at best are a per-document choice. They are essentially anonymous (you could see them as an under-the-covers enumeration), in that their names mean nothing until you resolve them to the identifiers they point to, which is usually quite bothersome (see e.g. XSLT1).

This explanation may seem counterintuitive because you can *see* the prefixes in the XML (or rather, in its most common serialized form, text), and in the XML you write yourself you choose readable aliases at that. But system-(re)generated XML (probably most XML) only has to conserve the namespace assignments; the aliases are often not the ones you chose, which is entirely valid according to the XML specs.

Notes on (ab)use of namespaces

Problems in XSLT


You must, in your XSL, declare every namespace that you wish to match in the input XML. There is one semi-exception: a default namespace in the XML is declared without an alias (xmlns="bla") rather than with one (xmlns:dc="bla"). To match elements in it from XSL, you need to define an alias for the same (URL) identifier (e.g. xmlns:x="http://www.w3.org/1999/xhtml" if you happen to know the input is XHTML), after which you must use that alias on every tag you wish to match, which tends to be almost everything.

When that's a pain, blame the creators of the schema for unnecessary use of bothersome namespaces. ...or use XSLT2, which solves much of the alias bother (for example with its xpath-default-namespace attribute).
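The same default-namespace bother shows up outside XSLT. As an analogy only (this is Python's xml.etree.ElementTree and its limited XPath support, not an XSLT processor; the alias x and the sample page are made up), an unprefixed query step does not match elements that sit in a default namespace, and an alias bound to the same URI has to be repeated on every step.

```python
import xml.etree.ElementTree as ET

XHTML = 'http://www.w3.org/1999/xhtml'
page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p>hello</p></body></html>'
)

# An unprefixed name in the query means "no namespace", so this finds nothing.
unprefixed = page.find('body/p')

# Binding an alias of our own ('x') to the same URI works,
# but it has to be written on every step of the path.
prefixed = page.find('x:body/x:p', {'x': XHTML})

print(unprefixed, prefixed.text)
```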

See also