PHP “pretty print” HTML (not Tidy)

You're right, formatOutput does not seem to produce any indentation for HTML output (others are confused by this too). XML output works, even when the markup was loaded as HTML.

<?php
function tidyHTML($buffer) {
    // load our document into a DOM object
    $dom = new DOMDocument();
    // we want nice output
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($buffer);
    $dom->formatOutput = true;
    return $dom->saveHTML();
}

// start output buffering, using our nice
// callback function to format the output.
ob_start("tidyHTML");
?>
<html>
<head>
<title>foo bar</title><meta name="bar" value="foo"><body><h1>bar foo</h1><p>It's like comparing apples to oranges.</p></body></html>
<?php
// this will be called implicitly, but we'll
// call it manually to illustrate the point.
ob_end_flush();
?>

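Result with saveHTML() (line breaks, but no indentation):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "">
<html>
<head>
<title>foo bar</title>
<meta name="bar" value="foo">
</head>
<body>
<h1>bar foo</h1>
<p>It's like comparing apples to oranges.</p>
</body>
</html>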

The same with saveXML() instead of saveHTML(), and the output is indented:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "">
<html>
  <head>
    <title>foo bar</title>
    <meta name="bar" value="foo"/>
  </head>
  <body>
    <h1>bar foo</h1>
    <p>It's like comparing apples to oranges.</p>
  </body>
</html>
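
For reference, the saveXML() variant could be sketched roughly like this. The function name tidyHTMLAsXML is made up for illustration, and collecting parser warnings with libxml_use_internal_errors() is an addition that is not part of the demo above. Keep in mind that the result is XML serialization (XML declaration, self-closing tags), which may not be what you want to send to a browser.

```php
<?php
// Hypothetical helper (the name is an assumption): same steps as tidyHTML(),
// but serialized with saveXML() instead of saveHTML().
function tidyHTMLAsXML(string $buffer): string
{
    $dom = new DOMDocument();
    // Must be set before the document is loaded.
    $dom->preserveWhiteSpace = false;

    // Assumption: collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($buffer);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $dom->formatOutput = true;
    $xml = $dom->saveXML();

    // saveXML() returns false on failure; fall back to an empty string.
    return $xml === false ? '' : $xml;
}

echo tidyHTMLAsXML(
    '<html><head><title>foo bar</title><meta name="bar" value="foo">'
    . '<body><h1>bar foo</h1></body></html>'
);
```

To use it with output buffering as in the demo, it should be enough to pass "tidyHTMLAsXML" to ob_start() instead of "tidyHTML".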

Maybe you forgot to set preserveWhiteSpace = false before calling loadHTML()?

Disclaimer: I stole most of the demo code from Tyson Clugg's PHP manual comments. Lazy me.

UPDATE: I now remember that some years ago I tried the same thing and ran into the same problem. I fixed it with a dirty workaround (it wasn't performance critical): I converted back and forth between SimpleXML and DOM until the problem vanished. I suppose the conversion got rid of those nodes (presumably the whitespace-only text nodes). Roughly: load with DOM, import with simplexml_import_dom, output the string, parse that string with DOM again, and then pretty-print it. As far as I remember this worked (but it was really slow).
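
A minimal sketch of that round trip, reconstructed from the description above and not from the original code (which I no longer have). The function name prettyPrintViaSimpleXML and the error handling are assumptions; the second parse uses loadXML() because the intermediate string is XML, not HTML.

```php
<?php
// Sketch of the DOM -> SimpleXML -> string -> DOM round trip.
// The function name is hypothetical; the error handling is an addition.
function prettyPrintViaSimpleXML(string $html): string
{
    // Assumption: collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // 1. Load the HTML with DOM.
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // 2. Import into SimpleXML and dump it back to a string.
    $sxe = simplexml_import_dom($dom);
    $xml = ($sxe instanceof SimpleXMLElement) ? $sxe->asXML() : false;

    // 3. Parse the string again, this time without whitespace-only nodes.
    $clean = new DOMDocument();
    $clean->preserveWhiteSpace = false;
    $ok = is_string($xml) && $clean->loadXML($xml);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        throw new RuntimeException('Round trip through SimpleXML failed.');
    }

    // 4. Print it pretty.
    $clean->formatOutput = true;
    $out = $clean->saveXML();

    return $out === false ? '' : $out;
}
```

Calling it would look like echo prettyPrintViaSimpleXML($html);. The document is parsed twice and serialized twice, which fits the "really slow" part.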

