DTD prohibited in xml document exception

Question

First, some background.

What is a DTD?

The document you are trying to parse contains a document type declaration; if you look at the document, you will find near the beginning a sequence of characters beginning with <!DOCTYPE and ending with the corresponding >. Such a declaration allows an XML processor to validate the document against a set of declarations which specify a set of elements and attributes and constrain what values or contents they can have.

Since entities are also declared in DTDs, a DTD allows a processor to know how to expand references to entities. (The entity pubdate might be defined to contain the publication date of a document, like “15 December 2012”, and referred to several times in the document as &pubdate; — since the actual date is given only once, in the entity declaration, this usage makes it easier to keep the various references to publication date in the document consistent with each other.)

What does a DTD mean?

The document type declaration has a purely declarative meaning: a schema for this document type, in the syntax defined in the XML spec, can be found at such and such a location.

Some software written by people with a weak grasp of XML fundamentals suffers from an elementary confusion about the meaning of the declaration; it assumes that the meaning of the document type declaration is not declarative (a schema is over there) but imperative (please validate this document). The parser you are using appears to be such a parser; it assumes that by handing it an XML document that has a document type declaration, you have requested a certain kind of processing. Its authors might benefit from a remedial course on how to accept run-time parameters from the user. (You see how hard it is for some people to understand declarative semantics: even the creators of some XML parsers sometimes fail to understand them and slip into imperative thinking instead. Sigh.)

What are these ‘security reasons’ they are talking about?

Some security-minded people have decided that DTD processing (validation, or entity expansion without validation) constitutes a security risk. Using entity expansion, it’s easy to make a very small XML data stream which expands, when all entities are fully expanded, into a very large document. Search for information on what is called the “billion laughs attack” if you want to read more.

One obvious way to protect against the billion laughs attack is for those who invoke a parser on user-supplied or untrusted data to invoke the parser in an environment which limits the amount of memory or time the parsing process is allowed to consume. Such resource limits have been standard parts of operating systems since the mid-1960s. For reasons that remain obscure to me, however, some security-minded people believe that the correct answer is to run parsers on untrusted input without resource limits, in the apparent belief that this is safe as long as you make it impossible to validate the input against an agreed schema.

This is why your system is telling you that your data has a security issue.

To some people, the idea that DTDs are a security risk sounds more like paranoia than good sense, but I don’t believe they are correct. Remember (a) that a healthy paranoia is what security experts need in life, and (b) that anyone really interested in security would insist on the resource limits in any case — in the presence of resource limits on the parsing process, DTDs are harmless. The banning of DTDs is not paranoia but fetishism.

Now, with that background out of the way …

How do you fix the problem?

The best solution is to complain bitterly to your vendor that they have been suckered by an old wive’s tale about XML security, and tell them that if they care about security they should do a rational security analysis instead of prohibiting DTDs.

Meanwhile, as the message suggests, you can “set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method.” If the input is in fact untrusted, you might also look into ways of giving the process appropriate resource limits.

And as a fallback (I do not recommend this) you can comment out the document type declaration in your input.

What is a DTD?

What does a DTD mean?

What are these ‘security reasons’ they are talking about?

How do you fix the problem?

Leave a Comment Cancel reply