Posted By

noah on 12/09/07


Tagged

search google results commandline iterator perl metrics aggregator lynx scraping analysis



Scrape Google from the command line


Published in: Bash 


This code is a proof of concept only. Actually using it would violate Google's TOS, which forbids scraping. It is published here for educational value only.

Hypothetically, the following command should return a list of the top 500 or so hits in Google for mysite.com.

Each result line will be prefixed with a number, followed by a dot and some whitespace (Lynx adds these when it numbers the links on a page).

  perl -e '$i=0;while($i<1000){open(WGET,qq/|xargs lynx -dump/);print WGET qq{http://www.google.com/search?q=site:mysite.com&hl=en&start=$i&sa=N\n};close(WGET);$i+=10}' | grep "//mysite\.com/"
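
The numbering Lynx adds can be stripped in a further pipeline stage. The following is only a minimal sketch, assuming each line looks like the entries Lynx prints in its link list (optional whitespace, digits, a dot, whitespace, then the URL). The helper name `strip_lynx_numbers` is hypothetical, and the sample URLs are made up so the example runs without touching the network.

```shell
#!/usr/bin/env bash

# strip_lynx_numbers: hypothetical helper, not part of the original snippet.
# Removes a leading "  12. " style prefix from every line read on stdin.
strip_lynx_numbers() {
  sed -E 's/^[[:space:]]*[0-9]+\.[[:space:]]+//'
}

# Made-up sample lines shaped like Lynx's numbered link list.
printf '   1. http://mysite.com/about\n  12. http://mysite.com/blog/\n' | strip_lynx_numbers
```

In practice the function would be appended to the end of the command above, after the `grep` stage. Lines that carry no numeric prefix pass through unchanged.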
