
Revision: 62033
at January 29, 2013 09:47 by iroybot


Updated Code
# Reformat the document with tidy from the command line (or via exec() etc.), or use PHP's tidy functions instead
tidy -indent -utf8 -xml -wrap 1000 input.xml > output.xml

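<?php
/**
  * Replaces whitespace-padded CDATA content:
  * <element>
  *     <![CDATA[whatever content]]>
  * </element>
  *
  * With compact CDATA content (no stray whitespace in the element's text):
  * <element><![CDATA[whatever content]]></element>
  */
$in = file_get_contents("output.xml");
if ($in === false) {
    exit("Could not read output.xml\n");
}
// Only whitespace may sit between the opening tag and the CDATA section
// (the earlier [\s\t\w\n\r] class also deleted word characters, i.e. real text)
$out = preg_replace('~>\s+<!\[CDATA\[~', '><![CDATA[', $in);
// ...and between the end of the CDATA section and the next tag
$out = preg_replace('~]]>\s+<~', ']]><', $out);
file_put_contents("final.xml", $out);
?>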
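The same two substitutions can be wrapped in a function that works on a string, which makes them easy to reuse and test. This is a minimal sketch, not the original snippet: the name `fixCdataPadding` is hypothetical, and it assumes the padding between the tags and the CDATA section is whitespace only.

```php
<?php
/**
 * Hypothetical helper (name and signature are illustrative): strips
 * whitespace-only padding between tags and CDATA sections, e.g.
 *   "<element>\n    <![CDATA[x]]>\n</element>"
 * becomes
 *   "<element><![CDATA[x]]></element>"
 */
function fixCdataPadding(string $xml): string
{
    $result = preg_replace(
        [
            '~>\s+<!\[CDATA\[~', // whitespace after a tag, before <![CDATA[
            '~\]\]>\s+<~',       // whitespace after ]]>, before the next tag
        ],
        ['><![CDATA[', ']]><'],
        $xml
    );
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

// Usage: a fragment shaped like tidy's indented output.
$tidied = "<root>\n  <element>\n    <![CDATA[whatever content]]>\n  </element>\n</root>\n";
echo fixCdataPadding($tidied);
```

Only the padding around the CDATA section is removed; the indentation of `<element>` and `</root>` is left alone, and real text before a CDATA section (e.g. `<a>text <![CDATA[x]]></a>`) is not touched.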

Revision: 62032
at January 29, 2013 09:43 by iroybot


Initial Code
# Reformat the document with tidy from the command line (or via exec() etc.), or use PHP's tidy functions instead
tidy -indent -utf8 -xml -wrap 1000 input.xml > output.xml

<?php
$out = preg_replace('~>\s+<!\[CDATA\[~', '><![CDATA[', file_get_contents("output.xml"));
$out = preg_replace('~]]>\s+<~', ']]><', $out);
file_put_contents("final.xml", $out);
?>

Initial Description
For some reason, tidy inserts newlines before and after <![CDATA[ sections in XML files. Since I like the benefits of reformatted, readable XML, I run tidy first and then strip the whitespace before and after each CDATA block:
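Both layouts are well-formed XML, but they are not equivalent: a parser keeps the whitespace tidy adds as part of the element's text. A minimal sketch of that effect, assuming PHP's DOM extension is enabled; `elementText()` is a hypothetical helper, not part of the original snippet.

```php
<?php
// Hypothetical helper (assumes the DOM extension is available): parses an
// XML string and returns the text content of its root element, with CDATA
// sections and surrounding whitespace text nodes concatenated.
function elementText(string $xml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Could not parse XML');
    }
    return $doc->documentElement->textContent;
}

// The layout tidy produces versus the compact layout.
$padded  = "<element>\n    <![CDATA[whatever content]]>\n</element>";
$compact = "<element><![CDATA[whatever content]]></element>";

// json_encode makes the newlines and spaces visible in the output.
echo json_encode(elementText($padded)), "\n";
echo json_encode(elementText($compact)), "\n";
```

The padded version yields the CDATA content wrapped in a newline and indentation, while the compact version yields just the CDATA content.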

Initial Title
Fix CDATA blocks in XML files after reformatting with Tidy

Initial Tags
xml

Initial Language
PHP