Posted By

noah on 11/13/07


Tagged

command Bash commandline validation crawler perl tidy textonly scraping crawling


Check linked pages for Tidy validation errors on the command line


Published in: Bash 


Given a list of HTML links (most likely a saved Google results page), check each linked page and report whether Tidy complains that its doctype declaration is missing.

Besides "DOCTYPE", other strings worth searching for in Tidy's output include "discarding", "lacks value", and "Error:".
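To look for several of these strings in one pass, the patterns can be handed to grep together. This is a minimal sketch, not part of the original snippet: the function name `tidy_complaints` is made up for illustration, and it assumes Tidy's messages (which `tidy -e` writes to stderr) have been redirected into its stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the lines of Tidy's report that contain
# one of the strings listed above.
# -F treats every pattern as a fixed string, so nothing needs escaping.
tidy_complaints() {
    grep -F -e 'DOCTYPE' -e 'discarding' -e 'lacks value' -e 'Error:'
}

# Usage (assumes lwp-request and tidy are installed):
#   lwp-request http://onemorebug.com | tidy -eq 2>&1 | tidy_complaints
```

Like any grep, the helper exits with a non-zero status when nothing matches, which is worth remembering if the calling script runs under `set -e`.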

    # Report Tidy errors and DOCTYPE complaints for every linked page on MY_DOMAIN.
    # Single quotes keep bash from expanding $1, $_ and $notify before Perl sees them.
    lwp-request -o links file:///SAVED_GOOGLE_RESULTS.htm | grep -P 'A\s*http://\w*\.MY_DOMAIN' | perl -pe 'm#A\s*(.*)#; $notify = qq{\n\t$1:\n}; $_ = qx{lwp-request $1 | tidy -eq 2>&1 | grep -e Error -e DOCTYPE}; $_ = $notify . $_ if $_' > report.txt

    # Alternate: print just the HTTP response code for linked pages that have my domain in the link
    lwp-request -o links http://onemorebug.com | perl -pe 'chomp; s#\w*\s*##; undef $_ unless m/onemorebug\.com/; $_ .= qq{\t} . qx{lwp-request -ds $_} if $_'

    # Old version
    lwp-request -o links file:///C:/SAVED_GOOGLE_RESULTS.htm | grep -P 'A\s*http://\w*\.MY_DOMAIN' | perl -pe 'm#A\s*(.*)#; $notify = qq{\t$1: }; $_ = qx{lwp-request $1 | tidy -e 2>&1 | grep DOCTYPE}; print $notify if $_'
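For anyone who finds the one-liner hard to follow, the same check can be sketched as a plain bash loop. This is an illustrative rewrite under stated assumptions, not the author's version: the function names `fetch_page`, `tidy_errors` and `report_pages` are invented here, and the sketch assumes `lwp-request` and `tidy` are on the PATH and that URLs arrive one per line on stdin.

```shell
#!/usr/bin/env bash
# Sketch only: all function names below are hypothetical.

# Fetch one page to stdout (assumes lwp-request from libwww-perl is installed).
fetch_page() {
    lwp-request "$1"
}

# Read HTML on stdin and print Tidy's matching complaints.
# tidy -e reports on stderr, hence the 2>&1.
tidy_errors() {
    tidy -eq 2>&1 | grep -e Error -e DOCTYPE
}

# Read one URL per line on stdin. For each page that draws complaints,
# print a header line followed by the complaints, as in report.txt.
report_pages() {
    local url complaints
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        # </dev/null stops the fetch from consuming the URL list on stdin.
        complaints=$(fetch_page "$url" </dev/null | tidy_errors) || true
        if [ -n "$complaints" ]; then
            printf '\n\t%s:\n%s\n' "$url" "$complaints"
        fi
    done
}

# Usage: strip the leading "A" tag name from each link line, then report.
#   lwp-request -o links file:///SAVED_GOOGLE_RESULTS.htm \
#     | grep -P 'A\s*http://\w*\.MY_DOMAIN' \
#     | sed 's/^A[[:space:]]*//' \
#     | report_pages > report.txt
```

Splitting the fetch and the check into separate functions makes it easy to swap in another downloader or a different set of grep patterns without touching the loop.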
