<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Snipplr</title>
<link>http://snipplr.com/language/bash/tags/scraping</link>
<description>Recent snippets posted on Snipplr.com</description>
<language>en-us</language>
<pubDate>Sat, 11 Oct 2008 06:49:35 GMT</pubDate>
<item>
<title>(Bash) LDAP/NTLM authentication on the command line with curl - noah</title>
<link>http://snipplr.com/view/5578/ldapntlm-authentication-on-the-command-line-with-curl/</link>
<description><![CDATA[ <p>Note that curl will not follow redirects unless the -L option is given. It prompts interactively for the password if the password is omitted from the -u option.</p> ]]></description>
<pubDate>Thu, 27 Mar 2008 11:33:30 GMT</pubDate>
<guid>http://snipplr.com/view/5578/ldapntlm-authentication-on-the-command-line-with-curl/</guid>
</item>
<item>
<title>(Bash) WGet entire site with wget -pkr - noah</title>
<link>http://snipplr.com/view/5094/wget-entire-site-with-wget-pkr/</link>
<description><![CDATA[ <p>Download and archive an entire Web site, starting with the given page and recursing down 1 level. To change the recursion depth, change the numeric argument given after -l.

This won't follow @import links in CSS.</p> ]]></description>
<pubDate>Sat, 16 Feb 2008 21:42:46 GMT</pubDate>
<guid>http://snipplr.com/view/5094/wget-entire-site-with-wget-pkr/</guid>
</item>
<item>
<title>(Bash) Scrape Google from the command line - noah</title>
<link>http://snipplr.com/view/4299/scrape-google-from-the-command-line/</link>
<description><![CDATA[ <p>This code is a proof of concept only; actually using it would violate Google's Terms of Service, which forbid scraping. It is published here for educational value only.

Hypothetically, the following command should return a list of the top 500 or so hits in Google for mysite.com.

Each result will be prefixed with a number, followed by a dot and some whitespace (Lynx adds these).</p> ]]></description>
<pubDate>Sun, 09 Dec 2007 21:16:58 GMT</pubDate>
<guid>http://snipplr.com/view/4299/scrape-google-from-the-command-line/</guid>
</item>
<item>
<title>(Bash) check linked pages for Tidy validation errors, on the command line - noah</title>
<link>http://snipplr.com/view/4130/check-linked-pages-for-tidy-validation-errors-on-the-command-line/</link>
<description><![CDATA[ <p>Given a list of HTML links (most likely a saved Google results page), check each linked page and report whether Tidy complains that its doctype declaration is missing.

Besides DOCTYPE, other strings to search for include "discarding", "lacks value", and "Error:".</p> ]]></description>
<pubDate>Tue, 13 Nov 2007 23:47:28 GMT</pubDate>
<guid>http://snipplr.com/view/4130/check-linked-pages-for-tidy-validation-errors-on-the-command-line/</guid>
</item>
<item>
<title>(Bash) Validate a list of Web pages with Tidy, on the command line - noah</title>
<link>http://snipplr.com/view/4129/validate-a-list-of-web-pages-with-tidy-on-the-command-line/</link>
<description><![CDATA[ <p></p> ]]></description>
<pubDate>Tue, 13 Nov 2007 22:54:37 GMT</pubDate>
<guid>http://snipplr.com/view/4129/validate-a-list-of-web-pages-with-tidy-on-the-command-line/</guid>
</item>
<item>
<title>(Bash) Download linked JPEGs from a Web page, on the command line - noah</title>
<link>http://snipplr.com/view/4063/download-linked-jpegs-from-a-web-page-on-the-command-line/</link>
<description><![CDATA[ <p>The following command will download all the files with a JPG extension that are linked from http://flickr.com.</p> ]]></description>
<pubDate>Fri, 02 Nov 2007 22:57:45 GMT</pubDate>
<guid>http://snipplr.com/view/4063/download-linked-jpegs-from-a-web-page-on-the-command-line/</guid>
</item>
</channel>
</rss>