Rationale behind SimpleXMLElement’s handling of text values in addChild and addAttribute

Just to make sure we’re on the same page, you have three situations.

  1. The insertion of an ampersand into an attribute using addAttribute

  2. The insertion of an ampersand into an element using addChild

  3. The insertion of an ampersand into an element by property overloading

It’s the discrepancy between 2 and 3 that has you flummoxed. Why does addChild not automatically escape the ampersand, whereas adding a property to the object and setting its value does escape the ampersand automatically?

Based on my instincts, and buoyed by this bug, this was a deliberate design decision. The property overloading ($a->d = 'Five & Six';) is intended to be the “escape ampersands for me” way of doing things. The addChild method is meant to be the “add exactly what I tell you to add” method. So, whichever behavior you need, SimpleXML can accommodate you.

Let’s say you had a database of text where all the ampersands were already escaped. The auto-escaping wouldn’t work for you here, because it would escape them a second time. That’s where you’d use addChild. Or let’s say you needed to insert an entity in your document:

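$a = simplexml_load_string('<root></root>');
// Property overloading escapes the ampersand: <b> ends in &amp;nbsp;
$a->b = 'This is a non-breaking space &nbsp;';
// addChild keeps the entity reference as written: <c> ends in &nbsp;
$a->addChild('c','This is a non-breaking space &nbsp;');
print $a->asXML();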

That’s what the PHP Developer in that bug is advocating. The behavior of addChild is meant to provide “less simple, more robust” support for when you need to insert an ampersand into the document without it being escaped.
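The already-escaped database scenario can be sketched the same way. This is a minimal illustration, not anything from the bug report; the $stored value is a made-up stand-in for text that arrives with its ampersands already escaped.

```php
<?php
// Assumption: $stored stands in for text that was escaped before it was saved.
$stored = 'Tom &amp; Jerry';

$a = simplexml_load_string('<root></root>');

// Property overloading escapes again, so the ampersand is double escaped.
$a->b = $stored;

// addChild reads &amp; as an entity, so the text round-trips unchanged.
$a->addChild('c', $stored);

$xml = $a->asXML();
print $xml;
// <b> serializes as Tom &amp;amp; Jerry
// <c> serializes as Tom &amp; Jerry
```

The flip side is that handing addChild a bare, unescaped ampersand (say 'Five & Six') makes libxml complain about an unterminated entity reference, so the "add exactly what I tell you" path expects well-formed input.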

Of course, this does leave us with the first situation I mentioned, the addAttribute method. The addAttribute method does escape ampersands. So, we might now state the inconsistency as:

  1. The addAttribute method escapes ampersands
  2. The addChild method does not escape ampersands
  3. This behavior is somewhat inconsistent. It’s reasonable that a user would expect the methods on SimpleXML to escape things in a consistent way
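To see the inconsistency side by side, here is a small sketch that hands the same string to both methods. The element and attribute names are arbitrary picks for the example.

```php
<?php
$a = simplexml_load_string('<root></root>');

// The same value goes to both methods.
$value = 'Non-breaking &nbsp;';

// addChild keeps the entity reference in the element text.
$c = $a->addChild('c', $value);

// addAttribute treats the value as literal text and escapes the ampersand.
$c->addAttribute('title', $value);

$xml = $a->asXML();
print $xml;
// The element text keeps &nbsp; while the attribute value becomes &amp;nbsp;
```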

This then exposes the real problem with the SimpleXML API. The ideal situation here would be:

  1. Property Overloading on Element Objects escapes ampersands
  2. Property Overloading on Attribute Objects escapes ampersands
  3. The addChild method does not escape ampersands
  4. The addAttribute method does not escape ampersands

This is impossible though, because SimpleXML has no concept of an Attribute Object. The addAttribute method (or the equivalent array-style assignment, $a['name'] = 'value', which also escapes) appears to be the only way to add an attribute. Because of that, it seems SimpleXML is incapable of creating attributes with entities.

All of this reveals the paradox of SimpleXML. The idea behind this API was to provide a simple way of interacting with something that turns out to be complex.

The team could have added a SimpleXMLAttribute Object, but that’s an added layer of complexity. If you want a multiple object hierarchy, use DOMDocument.
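For comparison, here is roughly what the multiple object route looks like with DOMDocument, where you pick the node type yourself. Treat it as a sketch: the element and attribute names are invented for the example, and the attribute part assumes your PHP/libxml build accepts an entity reference as a child of an attribute node.

```php
<?php
$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('root'));

// A text node is literal text, so the ampersand is escaped on output.
$b = $root->appendChild($doc->createElement('b'));
$b->appendChild($doc->createTextNode('Five & Six'));

// An entity reference node is written out as the reference itself.
$c = $root->appendChild($doc->createElement('c'));
$c->appendChild($doc->createTextNode('A non-breaking space '));
$c->appendChild($doc->createEntityReference('nbsp'));

// Attributes are objects too, so they can carry an entity reference
// (assumption: the build allows entity references under attribute nodes).
$title = $doc->createAttribute('title');
$c->setAttributeNode($title);
$title->appendChild($doc->createEntityReference('nbsp'));

$xml = $doc->saveXML();
print $xml;
```

More typing, more objects, but each decision about escaping is explicit.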

The team could have added flags to the addAttribute and addChild methods, but flags make the API more complex.
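You can approximate the flag idea in userland. The helper below is hypothetical (add_child_with_flag is my name, not part of SimpleXML), and it assumes that pre-escaping with htmlspecialchars is an acceptable stand-in for what a built-in flag might have done.

```php
<?php
/**
 * Hypothetical helper, not part of SimpleXML: addChild with an escape flag.
 */
function add_child_with_flag(SimpleXMLElement $parent, $name, $value, $escape = true)
{
    if ($escape) {
        // Pre-escape &, < and > so addChild's entity handling yields literal text.
        $value = htmlspecialchars($value, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
    }
    return $parent->addChild($name, $value);
}

$a = simplexml_load_string('<root></root>');

// Default: behave like property overloading and escape for me.
add_child_with_flag($a, 'b', 'Five & Six');

// Flag off: behave like plain addChild and keep the entity reference.
add_child_with_flag($a, 'c', 'A non-breaking space &nbsp;', false);

$xml = $a->asXML();
print $xml;
```

Even this tiny wrapper shows the cost: every caller now has to know what the fourth argument means.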

The real lesson here? Maybe it’s that simple is hard, and simple on a deadline is even harder. I don’t know if this was the case or not, but with SimpleXML it seems like someone started with a simple idea (use property overloading to make the creation of XML documents easy), and then adjusted as the problems/feature requests came in.

Actually, I think the real lesson here is to just use JSON 😉
