Strip HTML Tags


Published in: Ruby

This command replaces the PHP-based strip-tags function in TextMate's HTML bundle. It depends on two files from the Rails html-scanner library. Save them as node.rb and tokenizer.rb in your bundle's Support/lib folder (create the directories if they don't exist):

http://dev.rubyonrails.org/browser/trunk/actionpack/lib/action_controller/vendor/html-scanner/html/node.rb?format=raw
http://dev.rubyonrails.org/browser/trunk/actionpack/lib/action_controller/vendor/html-scanner/html/tokenizer.rb?format=raw
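
Before running the command, it can help to confirm that both files ended up in the right place. The following is a minimal sketch, not part of the original snippet: `missing_support_files` is a hypothetical helper name, and it assumes the files were saved under the names `tokenizer.rb` and `node.rb` that the script requires.

```ruby
# Hypothetical helper (not part of the bundle): list which of the required
# files are absent from <support_dir>/lib.
def missing_support_files(support_dir, names = %w[tokenizer.rb node.rb])
  lib_dir = File.join(support_dir, "lib")
  names.reject { |name| File.file?(File.join(lib_dir, name)) }
end

# TM_BUNDLE_SUPPORT is only set when TextMate runs a bundle command,
# so do nothing when the variable is absent.
support = ENV['TM_BUNDLE_SUPPORT']
if support
  missing = missing_support_files(support)
  warn "Missing from #{support}/lib: #{missing.join(', ')}" unless missing.empty?
end
```

If the warning names a file, save that file into Support/lib and run the check again.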


#!/usr/bin/env ruby -w
require ENV['TM_BUNDLE_SUPPORT'] + "/lib/tokenizer.rb"
require ENV['TM_BUNDLE_SUPPORT'] + "/lib/node.rb"

# Return html with all tags, comments, and <!...> declarations removed.
def strip_tags(html)
  # Nothing to strip if the input is empty or contains no markup.
  return html if html.empty? || !html.include?('<')
  output = ""
  tokenizer = HTML::Tokenizer.new(html)
  while (token = tokenizer.next)
    node = HTML::Node.parse(nil, 0, 0, token, false)
    # Keep text tokens; drop tags and anything that starts with "<!".
    output << token unless node.kind_of?(HTML::Tag) || token =~ /^<!/
  end
  output
end

# TextMate passes the selection (or document) on STDIN.
print strip_tags(STDIN.read)
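
To experiment with the same idea without the two Rails files, here is a rough sketch that uses only Ruby's standard `StringScanner`. It is a stand-in, not the html-scanner tokenizer: `simple_strip_tags` is a hypothetical name, and the sketch assumes tags never contain a literal `>` inside an attribute value, so it handles fewer edge cases than the command above.

```ruby
require 'strscan'

# Hypothetical stand-in for the html-scanner approach: walk the input,
# keep text runs, and skip comments, tags, and <!...> declarations.
# Assumption: no ">" appears inside attribute values.
def simple_strip_tags(html)
  return html if html.empty? || !html.include?('<')
  scanner = StringScanner.new(html)
  output = String.new
  until scanner.eos?
    next if scanner.skip(/<!--.*?-->/m)             # HTML comment
    next if scanner.skip(/<[!\/]?[A-Za-z][^>]*>/)   # tag or <!...> declaration
    # Otherwise copy a run of text, or a lone "<" that does not start a tag.
    output << (scanner.scan(/[^<]+/) || scanner.getch)
  end
  output
end
```

For example, `simple_strip_tags("<p>Hello <b>world</b></p>")` returns `"Hello world"`, and a bare less-than sign as in `"a < b"` is left untouched.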
