Posted By

tillkruess on 11/06/09


Tagged

html xhtml whitespace clean indent indention


Clean / Indent XHTML Document


Published in: PHP


URL: http://pralinenschachtel.de/

Have you ever wanted to clean up and perfectly indent your XHTML document?

    function clean_xhtml($string, $keep_tags = null) {

        // Tags whose contents are restored untouched after cleaning.
        // Assumption: $keep_tags, if given, is an array of tag names such as
        // array('pre', 'code'); the original snippet left this case undefined.
        if (!$keep_tags) {
            $keep_tags = array('script', 'pre', 'textarea');
        }
        $patterns = array();
        foreach ((array) $keep_tags as $tag) {
            $tag = preg_quote($tag, '~');
            $patterns[] = '<'.$tag.'[^>]*>.*?<\/'.$tag.'>';
        }
        $keep_regexp = '~'.implode('|', $patterns).'~s';

        // replace \r\n with \n
        $string = preg_replace('~\r\n~m', "\n", $string);

        // replace \r with \n
        $string = preg_replace('~\r~m', "\n", $string);

        // remove whitespace from the beginning
        $string = preg_replace('~^\s+~s', '', $string);

        // remove whitespace from the end
        $string = preg_replace('~\s+$~s', '', $string);

        // store all tags which should remain the same
        preg_match_all($keep_regexp, $string, $original_tags);

        // remove whitespace from the beginning of each line
        $string = preg_replace('~^\s+~m', '', $string);

        // remove whitespace from the end of each line
        $string = preg_replace('~\s+$~m', '', $string);

        // remove empty lines
        $string = preg_replace('~\n\s*\n~ms', "\n", $string);

        // remove line breaks inside normal text
        $string = preg_replace('~([^>\s])(\s\s+|\n)([^<\s])~m', '$1 $3', $string);

        // correct indentation
        $indent = 0;
        $lines = explode("\n", $string);
        foreach ($lines as &$line) {
            // shift one level back if the line starts with a closing tag
            $correction = intval(substr($line, 0, 2) == '</');
            // max() guards against a negative repeat count on malformed markup
            $line = str_repeat("\t", max(0, $indent - $correction)).$line;
            $indent += substr_count($line, '<'); // indent every tag
            $indent -= substr_count($line, '<!'); // subtract doctype declaration
            $indent -= substr_count($line, '<?'); // subtract processing instructions
            $indent -= substr_count($line, '/>'); // subtract self-closing tags
            $indent -= substr_count($line, '</') * 2; // subtract closing tags
        }
        unset($line); // break the reference left behind by foreach
        $string = implode("\n", $lines);

        // fetch all tags which could have been changed
        preg_match_all($keep_regexp, $string, $current_tags);

        // restore all stored tags
        foreach ($current_tags[0] as $key => $match) {
            if (isset($original_tags[0][$key])) {
                $string = str_replace($match, $original_tags[0][$key], $string);
            }
        }

        return $string;

    }
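
The indentation pass in the function works purely by counting substrings on each line. As a rough illustration, the sketch below isolates that arithmetic in a hypothetical helper, `indent_delta()` (the name is an assumption and is not part of the original snippet), which returns the net change in nesting depth that one line contributes.

```php
<?php
// Hypothetical helper, not part of the original snippet: net change in
// nesting depth for a single line, using the same substr_count arithmetic.
function indent_delta($line) {
    $delta  = substr_count($line, '<');      // every "<" counts as one level
    $delta -= substr_count($line, '<!');     // doctype declarations and comments do not nest
    $delta -= substr_count($line, '<?');     // processing instructions do not nest
    $delta -= substr_count($line, '/>');     // self-closing tags do not nest
    $delta -= substr_count($line, '</') * 2; // a closing tag cancels its own "<" and one opening tag
    return $delta;
}

var_dump(indent_delta('<div>'));           // int(1)
var_dump(indent_delta('</div>'));          // int(-1)
var_dump(indent_delta('<p>text</p>'));     // int(0)
var_dump(indent_delta('<br />'));          // int(0)
var_dump(indent_delta('<!DOCTYPE html>')); // int(0)
```

Because this is plain substring counting rather than parsing, a "<" inside an attribute value or text would presumably throw the depth off; the approach seems best suited to reasonably well-formed XHTML.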


Comments

Posted By: tillkruess on November 6, 2009

Just try something like:

$contents = file_get_contents(dirname(__FILE__).'/index.html'); 
print clean_xhtml($contents);

Posted By: tillkruess on December 2, 2009

It looks like snipplr.com has some display issues with Windows line feeds in comments, or something similar: http://nopaste.info/1cb01b379f.html
