Validation is not the last word in conformance

Validation is not the last word in conformance, as Ian Hickson excellently reminds us.

It all started when a coffee room discussion ended with a test that shows a tricky ambiguity in XML. Mika Raento's email has more:

> On Tue, Apr 27, 2004 at 10:05:20AM +0300, Mika Raento wrote:
> > and you can't disallow the a-b-a with an XML DTD.
>
> I didn't know about that XML limitation. That's quite a serious one.

:-). Well, you can have a look at the relevant writings but the basic
idea is:

SGML: complex, described in a 400-page ISO standard, but it does include
so-called exclusions and inclusions, so that you can say:
b: all inline
a: all inline excluding a

XML: simple, described in 40 pages. Much simpler content models.

The good thing: XML is a lot easier to write software for.
The bad thing: since XML is upwards compatible with SGML they couldn't
go to another kind of grammar. So to be simple they had to give up a few
things, although we could have had more flexibility without the
complexity. See Relax NG.
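
To make the limitation concrete, here is a minimal Python sketch. It is not a DTD validator: the `ALLOWED_CHILDREN` table and the `locally_valid` helper are hypothetical stand-ins for DTD content models, on the assumption that each content model describes an element's children without any knowledge of its ancestors. Under that assumption, an `a` nested inside a `b` inside another `a` passes the check:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for DTD content models: each element lists the
# children it may contain, with no knowledge of its ancestors.
ALLOWED_CHILDREN = {
    "p": {"a", "b"},
    "a": {"b"},        # "a" may not directly contain "a" ...
    "b": {"a", "b"},   # ... but "b" may, wherever "b" itself appears
}

def locally_valid(element):
    """Check every element against its own content model only."""
    allowed = ALLOWED_CHILDREN.get(element.tag, set())
    return all(
        child.tag in allowed and locally_valid(child)
        for child in element
    )

direct = ET.fromstring("<p><a><a>link</a></a></p>")
nested = ET.fromstring("<p><a><b><a>link</a></b></a></p>")

print(locally_valid(direct))  # False: a directly inside a is rejected
print(locally_valid(nested))  # True: a-b-a slips through
```

Ruling out the second case with this kind of per-element table would need a separate variant of `b` for use inside `a`, which is roughly the job SGML exclusions do for you.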

A good starting point is http://xml.coverpages.org/topics.html#grammar

If you are interested in these topics you are very welcome to
participate in the tree-grammar course Miro and I are holding next
autumn.

> Surely a security issue too? Someone can create a maliciously malformed
> XML document, and pass it through the doctype as valid. Or am I missing
> something?

Well, with almost any kind of grammar you have restrictions that you
cannot check on the grammar level. Think C: a syntactically correct
program might still not link. These have to be checked in some other
way - you just have to be aware of what these are. Validity just means
that you don't have to check the stuff that can be expressed in the
DTD.
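
As a rough illustration of checking "in some other way", the a-b-a case can be caught at the application level after validation. The sketch below uses Python's standard `xml.etree.ElementTree`. The helper name `has_nested` is hypothetical, and the sample document is assumed to be un-namespaced (real XHTML elements carry a namespace, so the tag names would need adjusting):

```python
import xml.etree.ElementTree as ET

def has_nested(element, tag, inside=False):
    """Return True if a `tag` element occurs anywhere below another `tag`."""
    if element.tag == tag:
        if inside:
            return True
        inside = True
    return any(has_nested(child, tag, inside) for child in element)

# An "a" inside a "b" inside an "a": the case a DTD cannot rule out.
doc = ET.fromstring("<p><a href='x'><b><a href='y'>link</a></b></a></p>")
print(has_nested(doc, "a"))  # True

# Sibling links are fine; only ancestry matters.
ok = ET.fromstring("<p><a href='x'>one</a> <a href='y'>two</a></p>")
print(has_nested(ok, "a"))  # False
```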

So my point was really: XHTML doesn't remove the possibility for
ambiguity in the document structure and doesn't remove the need to make
judgements on the application level. But I agree with you that there is
a big difference between the kinds of HTML docs normally found on the web
and the strictness of XHTML. What I don't know is how significant
this difference is.

It always amazes me how much XML can burn you…
