Posted By

localhorst on 02/27/08


Tagged

php tags remove clean word MS



Clean Word HTML using Regular Expressions


Published in: PHP


URL: http://tim.mackey.ie/CommentView,guid,2ece42de-a334-4fd0-8f94-53c6602d5718.aspx

The PHP code appears in the comments of the linked post.

/**
 * Removes all FONT, SPAN, DEL and INS tags, and all class, lang, style,
 * size and face attributes.
 * Designed to get rid of non-standard Microsoft Word HTML tags.
 *
 * Uses preg_replace() because the ereg_* functions were removed in PHP 7.
 */
function cleanHTML($html) {
    // start by completely removing all unwanted tags (case-insensitive)
    $html = preg_replace('#</?(font|span|del|ins)[^>]*>#i', '', $html);

    // then run another pass over the html (twice), removing unwanted attributes;
    // \4 keeps whatever follows the removed attribute (e.g. href) in the tag
    $attr = '#<([^>]*)(class|lang|style|size|face)=("[^"]*"|\'[^\']*\'|[^\s>]+)([^>]*)>#i';
    $html = preg_replace($attr, '<\1\4>', $html);
    $html = preg_replace($attr, '<\1\4>', $html);

    return $html;
}
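
Because the attribute pass runs only twice and each pass strips one attribute per tag, a tag carrying three or more of the listed attributes keeps some of them. The following is a minimal sketch, not part of the original snippet, that repeats the pass until nothing is left to replace. The function name stripWordAttributes is hypothetical, and the pattern is an assumed, slightly tightened variant: it requires whitespace before the attribute name and keeps the rest of the tag.

```php
<?php
// Hypothetical helper (assumption, not from the original snippet):
// repeat the attribute-removal pass until no replacement is made.
function stripWordAttributes(string $html): string
{
    // Group 1: everything before the attribute; group 4: everything after it.
    $pattern = '#<([^>]*)\s(class|lang|style|size|face)=("[^"]*"|\'[^\']*\'|[^\s>]+)([^>]*)>#i';

    do {
        // $count receives the number of replacements made in this pass.
        $html = preg_replace($pattern, '<$1$4>', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

$sample = '<p class="MsoNormal" style="margin:0" lang="EN-US">Hello</p>';
echo stripWordAttributes($sample), "\n"; // prints <p>Hello</p>
```

Like the original, this is still regex-based cleanup: a listed name followed by "=" inside another attribute's value can also match, so for untrusted or complex markup a real HTML parser is the safer choice.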


Comments

Posted By: Naveenkumar on November 25, 2009

Can you give me code for extracting the destination attribute from a tag? For example:

My Contributions. From this tag I need only http://www.dreamincode.net/?p=kudos&kudosmember=292011. Please give me the regular expression code in PHP.
