/ Published in: Bash
Given a list of HTML links (for example, a Google results page that has been saved locally) check each link that points to a page on a given domain, and report if Tidy complains that its doctype declaration is missing.
Besides DOCTYPE, other strings to search for include "discarding", "lacks value" and "Error:"
Remember to replace MY_DOMAIN with the actual domain you want to validate against.
Expand |
Embed | Plain Text
lwp-request -o links file:///SAVED_GOOGLE_RESULTS.htm|grep -P "A\s*http://\w*.MY_DOMAIN" | perl -pe "m#A\s*(.*)#; $notify = qq{\n\t$1:\n}; $_=qx{lwp-request $1|tidy -eq 2>&1|grep -e Error -e DOCTYPE}; $_ = $notify .$_ if $_" > report.txt #Alternate: print out just the HTTP response code for linked pages that have my domain in the link lwp-request -o links http://onemorebug.com|perl -pe "chomp; $_ =~ s#\w*\s*##; undef $_ unless m/onemorebug.com/; $_ .= qq{\t} . qx{lwp-request -ds $_} if $_" #old version lwp-request -o links file:///C:/SAVED_GOOGLE_RESULTS.htm|grep -P "A\s*http://\w*.MY_DOMAIN" | perl -pe "m#A\s*(.*)#; $notify = qq{\t$1: }; $_=qx{lwp-request $1|tidy -e 2>&1 | grep \"DOCTYPE\"}; print $notify if $_"
You need to login to post a comment.
