XML notes

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Intro

Some upsides and downsides

On control codes and arbitrary binary data in XML

See Escaping and delimiting notes#On_control_codes_and_arbitrary_binary_data_in_XML

On externally defined entities

XML allows the definition of entities in a DTD.

This implies that fully parsing a document may require fetching any DTDs it mentions.

Depending on the XML library you use (and possibly settings)

it may have the DTD in a catalogue (if it's a very common one, or you put it in there)
it may have to fetch it
it may fail

...so yes, there are XML documents that cannot be parsed offline.
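
As a rough illustration of the "it may fail" case, here is a minimal sketch using Python's standard xml.etree.ElementTree, which does not fetch external DTDs. The document, the doc.dtd name and the foo entity are made up for the example. The second function fills in the entity by hand, which is roughly what having the DTD in a catalogue buys you.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: &foo; is only defined in the (unavailable) doc.dtd.
DOC = '<!DOCTYPE doc SYSTEM "doc.dtd"><doc>&foo;</doc>'


def parse_offline(text):
    """Parse without fetching anything; return None if that fails."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # The entity is undefined because the DTD was never read.
        return None


def parse_with_known_entities(text, entities):
    """Parse with entity definitions supplied by hand, catalogue-style."""
    parser = ET.XMLParser()
    parser.entity.update(entities)  # entity name -> replacement text
    parser.feed(text)
    return parser.close()
```

Other libraries behave differently here (some fetch, some consult a catalogue), so treat this as one library's behaviour rather than a rule.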

namespaces in XML

In XML, each element can be in its own namespace.

Each distinct attribute can be in its own namespace too, but 99% of the time that's just masochism.

On uses

What are namespaces useful for?

A few things.

Overall document namespace - tack one onto the root element as a definition of the document type and version, so that something aware of specific namespaces (i.e. more than just a parser) can deal with different document types automatically

there are other ways to do this, and some can be more practical, but this is another perfectly valid way

Evolving complex document formats over time

In that it lets you segment off different concepts, different versions, etc.

Allows (mechanical) validation of such mixes

might make it easy for a parser to say "I know of the type but not the version, bug us to update it" instead of "error parsing document"

...but I'm not sure I have ever seen this done via namespaces, possibly because this is probably cleaner to do via some attributes.

Embedding fragments, or whole documents defined by another standard

Say, if you embed SVG in XHTML or indeed HTML, you should probably start it with <svg xmlns="http://www.w3.org/2000/svg">

upsides:

namespaces avoid ambiguity about which standard each node/attribute refers to,

namespaces avoid potential clashes if both standards use the same node/attribute name.

it's easy for a program to simply ignore anything it doesn't know.

arguables/downsides:

in many cases, a mix of standards either

is already completely standardized by explicit design (e.g. office documents, de facto standards used by any one program), so any embedding is essentially hardcoded in its specific parser, or

basically doesn't happen: arbitrarily dumping XML in other XML is rare, because it is practically unclear how to relate the part to the whole, even if you know how to parse it perfectly well.

The best uses I can think of are

having an XML container format that can contain others -- e.g. specific things like an XML-based search interface serving only XML metadata

having formats be future compatible, based on a "you can completely ignore records that you do not know the namespace of" rule
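
A minimal sketch of that "ignore what you don't know" rule, in Python's xml.etree.ElementTree. The container format and both namespace URIs are invented for the example; the only real mechanism used is that ElementTree exposes each element name as '{namespace-uri}localname'.

```python
import xml.etree.ElementTree as ET

# Hypothetical container format; both namespace URIs are made up.
KNOWN = "http://example.com/records/v1"
DOC = """
<records xmlns="http://example.com/records/v1"
         xmlns:future="http://example.com/records/v2">
  <record>old-style record</record>
  <future:record>something this reader predates</future:record>
  <record>another old-style record</record>
</records>
"""


def namespace_of(element):
    """Return the namespace URI from ElementTree's '{uri}local' tag form."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def known_children(root, known_namespaces):
    """Keep only the child elements whose namespace we understand."""
    return [child for child in root if namespace_of(child) in known_namespaces]


root = ET.fromstring(DOC)
texts = [child.text for child in known_children(root, {KNOWN})]
```

Here texts ends up holding only the two v1 records; the v2 record is skipped without any parse error.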

Namespaces as a hack on XML

On aliases

While you can stick the value/URI explicitly on every node that is to be in that namespace, this is unreadably verbose and very space-inefficient, so the typical way is to define an alias (in practice also known as a prefix) to be used as a shorthand.

Aliases are scoped, in that they are valid references only in the subtree under the node the alias is declared on.

...yet it is relatively typical to declare them all document-wide.

...and in fact typical to add the things you might use, even if you don't

Aliases are primarily useful for keeping human-written and serialized documents less verbose than they could be.

They are also arguably a broken abstraction that has led to a lot of confusion.

For starters, the namespace is that value/URI, not that alias.

In a single example this can look like mere semantics, but there are real-world reasons this is weird. This can be a little abstract to grasp intuitively, so a few angles:

An XML parser, or XSLT transform, is not required to tell you what that alias was in serialized form, and many won't.

Loading from disk and writing to disk means aliases stay unique, but the actual string used for the alias is not guaranteed to persist.

The alias does not exist in the represented data, even though it exists in the (typically[1] human-readable) file that represents that data.

When processing matches on a namespace, it only cares about the value/URI, and when saving the result of processing as XML, the original alias name is often not kept (many serializers just generate unique identifiers, usually by enumeration, as the ns0, ns1, ... convention shows).

Sure, examples often use human-sensible aliases.

And examples will put that alias consistently into the XML, related documents, e.g. XSLT that transforms it.

But it turns out that aliases being readable names is a convenience only relevant to humans hand-crafting documents.

To machine parsing, this has no meaning. To a parser there isn't any real difference between:

<root xmlns:example="http://example.com"><example:element/></root>

and

<root xmlns:fhwdgads="http://example.com"><fhwdgads:element/></root>

and:

<root><ns0:element xmlns:ns0="http://example.com" /></root>

The last came from a pass of parsing and writing it out again. It demonstrates that while it looks different (here the namespace definition moved down to the highest place in the tree that actually uses it), it is actually equivalent.
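
To make that concrete, a small sketch with Python's xml.etree.ElementTree, which keeps only the namespace value/URI in its data model. This is one library's behaviour; some others (e.g. DOM-style APIs) do hold on to the prefix they read.

```python
import xml.etree.ElementTree as ET

VARIANTS = [
    '<root xmlns:example="http://example.com"><example:element/></root>',
    '<root xmlns:fhwdgads="http://example.com"><fhwdgads:element/></root>',
    '<root><ns0:element xmlns:ns0="http://example.com" /></root>',
]

# ElementTree stores names as '{namespace-uri}localname'; the alias is gone.
tags = [ET.fromstring(text)[0].tag for text in VARIANTS]

# Writing the first variant back out: the alias is regenerated, not preserved.
round_tripped = ET.tostring(ET.fromstring(VARIANTS[0]))
```

All three entries in tags are the same string, and the re-serialized first variant uses a generated ns0 alias rather than "example".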

Notes:

If you do try to consider aliases part of the document model in any way, that implies the document model is not fully defined by the DTD (or such), only by the document itself.

...in practice, the DTD/Schema has no say about aliases

In transforms, like XSLT, this amounts to matching on the namespace value/URI.

aliases are there just for human readability

Namespaces and DTDs

tl;dr:

DTDs do not support XML namespaces at all

so if you want validation and namespaces, you need XML Schema

DTDs have no syntax to define a namespace declaration or alias.

You can put a prefix: on names -- but it won't be a prefix, or namespace, in the XML sense of being separate, or of representing a value/URI.

It essentially becomes part of the node/attribute name.

You can make a DTD with colons in its node names, but you cannot create one that is actually namespace-aware (e.g. how could you tell that that not-alias name maps to different things in different parts of a document?).

"Usually won't happen in practice" is not good enough for strict validation, which is why we use XML Schema instead (which works around this by itself being expressed in XML).

Notes on (ab)use of namespaces

Namespaces in XSLT

You must, in your XSL, declare all namespaces that you wish to match in the input XML, because you won't be able to match those otherly-namespaced things if you don't.

This means XSL must always be hand-crafted for each specific transform you want. Dealing with deviations in documents, or even just different variants and versions of what is conceptually the same namespace (which would be easy to express in code), may be awkward or even impossible to express in XSL.

It's also quite wordy, because every part must match the namespace. You can save some typing by using the default namespace, which is anonymous (letting you write xmlns="bla" instead of the aliased xmlns:dc="bla"), but you still need to get the identifier right.

To match that in XSL, you need to define an alias with the same (URL) identifier (e.g. xmlns:x="http://www.w3.org/1999/xhtml" if you happen to know the input is XHTML), after which you must use it on every tag you wish to be matched - which tends to be almost everything.

When that's a pain, blame the creators of the schema for unnecessary use of bothersome namespaces. ...or use XSLT 2.0, whose xpath-default-namespace attribute solves much of the alias bother.
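
Python's standard library has no XSLT processor, so the following is only an analogy: the same matching rule shows up in the path queries of xml.etree.ElementTree. It is a sketch, with the alias x chosen arbitrarily on the querying side and bound to the XHTML namespace URI.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>first</p><p>second</p></body></html>"
)
root = ET.fromstring(DOC)

# An unqualified name only matches elements in no namespace: nothing found.
unqualified = root.findall(".//p")

# Bind an alias of our own choosing ('x') to the same URI, and use it on
# every step of the path.
qualified = root.findall("./x:body/x:p", {"x": XHTML})
```

Note that the input document never mentions x; only the URI has to agree, and the alias has to be repeated on each path step.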

Related - XSLT, XPath, XQuery, XSL-FO, XSL, etc.

XPath - XML Path Language

a query language that selects nodes in a tree document, in a path-like way.

XPath notes

/bookstore/book/title
/bookstore/book[1]/title 
/bookstore/book[@price>35]/price 
book/*[position()=1]
//tagname[@attribute='value']
//body/main/main-text//paragraph
//a | //b

/ is the root
// is anywhere under
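
A few of these can be tried with Python's xml.etree.ElementTree, which implements only a subset of XPath: paths are written relative to an element, and comparisons like [@price>35] are not supported. The bookstore document below is made up to fit the paths above.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document matching the paths above.
DOC = """
<bookstore>
  <book price="30"><title>Cheap</title></book>
  <book price="40"><title>Pricey</title></book>
</bookstore>
"""
root = ET.fromstring(DOC)  # root is the <bookstore> element

# /bookstore/book/title, written relative to the root element
all_titles = [t.text for t in root.findall("./book/title")]

# /bookstore/book[1]/title (positions are 1-based)
first_title = root.find("./book[1]/title").text

# //book[@price='40'], an attribute equality test anywhere below root
pricey = root.findall(".//book[@price='40']")

# [@price>35] is not in ElementTree's subset, so compare in Python instead
over_35 = [b.find("title").text for b in root.iter("book")
           if float(b.get("price")) > 35]
```

A full XPath implementation would take the original expressions as written, including the numeric comparison.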
