Posted By

nigelnquande on 12/15/14


Tagged

html text convert plain



HTML to Plain Text


Published in: PHP
 

This function takes HTML input and converts it to plain text. It still needs improvement: it should collapse multiple blank lines into a single blank line and convert an <a ...> link to its markup equivalent (and likewise for images). It should also be rewritten to use a DOM/XML parser.
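The blank-line cleanup mentioned above could look roughly like the sketch below. The helper name collapse_blank_lines is hypothetical and not part of the original snippet; it assumes the text has already been converted and only needs its line spacing tidied.

```php
<?php
// Hypothetical helper (not part of the original snippet): collapse runs of
// blank lines into a single blank line.
function collapse_blank_lines($text) {
    // Normalise Windows and old Mac line endings to "\n" first.
    $text = str_replace(array("\r\n", "\r"), "\n", $text);
    // A newline followed by two or more lines that hold only spaces or tabs
    // is reduced to exactly one blank line ("\n\n").
    return preg_replace("/\n(?:[ \t]*\n){2,}/", "\n\n", $text);
}

// Usage: three blank lines between "Title" and "Body" become one.
echo collapse_blank_lines("Title\n\n\n  \nBody\n\nEnd");
```

It could be called on the result of html_to_plain() just before returning, since the tag replacements tend to leave several newlines in a row.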

  function html_to_plain($html) {
      // Line breaks, paragraph tags and the closing title tag become newlines.
      $plain_message = str_replace(array('<br />', '<br>', '<p>', '</p>', '</title>'), "\n", $html);
      // Tables: a rule at the start and end, "| " before each row, a dashed line after each row.
      $plain_message = str_replace(array("<table>", "</tr></table>"), "\n============", $plain_message);
      $plain_message = str_replace("<tr>", "| ", $plain_message);
      $plain_message = str_replace("</tr>", "\n-------------", $plain_message);
      // Titles and top-level headings get a "# " prefix.
      $plain_message = str_replace(array("<title>", '<h1>'), "# ", $plain_message);
      // Header cells are wrapped in "__"; every cell ends with " | ".
      $plain_message = str_replace('<th>', "__", $plain_message);
      $plain_message = str_replace('</th>', "__ | ", $plain_message);
      $plain_message = str_replace('</td>', " | ", $plain_message);
      // Strong text is wrapped in "__", emphasised text in "_".
      $plain_message = str_replace(array('<strong>', '</strong>'), '__', $plain_message);
      $plain_message = str_replace(array('<em>', '</em>'), '_', $plain_message);
      // Crude link and image handling: only literal fragments are swapped,
      // so the output is not yet well-formed link or image markup.
      $plain_message = str_replace('<a', '[', $plain_message);
      $plain_message = str_replace('href="', '](', $plain_message);
      $plain_message = str_replace('<img src="', '[', $plain_message);
      $plain_message = str_replace('alt=', '', $plain_message);
      $plain_message = str_replace('/>', ']', $plain_message);
      // Remove whatever tags are left.
      $plain_message = strip_tags($plain_message);
      // Disabled in the original; presumably meant to replace non-breaking space entities.
      //$plain_message = str_replace('&nbsp;', ' ', $plain_message);
      // Collapse runs of two or more spaces into a single space.
      $plain_message = preg_replace('/ {2,}/', ' ', $plain_message);

      return $plain_message;
  }
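
As a rough sketch of the DOM-based rewrite suggested in the description, the helper below handles only links and images, using PHP's bundled DOM extension (DOMDocument). The function name links_and_images_to_text and the Markdown-style output ([text](url) and ![alt](src)) are assumptions for illustration, not part of the original snippet. It returns the text content only, so it is a starting point for a rewrite rather than a drop-in replacement for html_to_plain().

```php
<?php
// Hypothetical helper (not part of the original snippet): turn <a> and <img>
// elements into Markdown-style text with DOMDocument, then return plain text.
function links_and_images_to_text($html) {
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The meta tag tells the parser the input is UTF-8; the <div> keeps the
    // fragment inside a single known container.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Images first, so an image nested in a link ends up in the link text.
    // iterator_to_array() takes a snapshot, because the node list is live.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $text = '![' . $img->getAttribute('alt') . '](' . $img->getAttribute('src') . ')';
        $img->parentNode->replaceChild($doc->createTextNode($text), $img);
    }
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        $text = '[' . $a->textContent . '](' . $a->getAttribute('href') . ')';
        $a->parentNode->replaceChild($doc->createTextNode($text), $a);
    }

    // textContent drops all remaining tags and decodes entities.
    return $doc->getElementsByTagName('body')->item(0)->textContent;
}

// Usage
echo links_and_images_to_text(
    '<p>See <a href="http://example.com/">the site</a> and <img src="logo.png" alt="Logo" />.</p>'
);
```

Working on parsed nodes avoids the main weakness of the str_replace() approach, which depends on exact attribute order and spacing inside each tag.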
