Posted by jerryvig on 01/16/12


Tagged: groovy, stats, alexa, competecom, webstats




Compete.com Webstats Scrape Groovy


Published in: Groovy
 

This script collects web statistics (webstats) from compete.com. It reads a text file listing the domains you want to analyze, one per line, and writes the compete.com webstats data for each domain to a CSV file.

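import com.gargoylesoftware.htmlunit.WebClient
import com.gargoylesoftware.htmlunit.BrowserVersion
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException

// Input: one domain per line. Output: one CSV row per data line, prefixed with the domain.
def domainList = new File("/root/Desktop/Morningstar/AlexaTop3000.txt").readLines()
def outFile = new File("/root/Desktop/Morningstar/CompeteStats3000.csv")
outFile.delete()  // start from an empty output file on every run

// FIREFOX_3_6 matches the HtmlUnit releases current in 2012; newer HtmlUnit
// releases no longer provide this constant, so substitute one yours offers.
def wc = new WebClient(BrowserVersion.FIREFOX_3_6)

domainList.each { rawLine ->
    def domainName = rawLine.trim()
    if (!domainName) return  // skip blank lines in the input file
    println domainName

    def url = "http://siteanalytics.compete.com/export_csv/${domainName}/"
    try {
        def page = wc.getPage(url)
        // getWebResponse() works for any page type HtmlUnit returns, and
        // readLines() also handles \r\n line endings.
        def pageLines = page.webResponse.contentAsString.readLines()
        // Skip the first four lines of each export (the preamble before the data rows).
        pageLines.eachWithIndex { line, i ->
            if (i > 3) {
                outFile.append("\"${domainName}\",${line}\n")
            }
        }
    } catch (FailingHttpStatusCodeException e) {
        // Log and continue instead of aborting the whole run on one failing domain.
        println "  skipped: HTTP ${e.statusCode}"
    }
    sleep(400)  // pause between requests so the server is not flooded
}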
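The per-domain transformation (drop the preamble lines, then prefix each remaining line with the quoted domain) can be pulled out of the network code so it can be checked offline. This is a minimal sketch: `toCsvRows` is a hypothetical helper name not found in the original script, it assumes four preamble lines just as the script does, and the sample input is made up purely for illustration.

```groovy
// Hypothetical helper (not part of the original script): converts one
// compete.com CSV export into output rows. It skips the first `skip` lines,
// as the script does, and prefixes each remaining line with the quoted domain.
List<String> toCsvRows(String domainName, String exportText, int skip = 4) {
    def rows = []
    exportText.readLines().eachWithIndex { line, i ->
        if (i >= skip) {
            rows << "\"${domainName}\",${line}".toString()
        }
    }
    return rows
}

// Made-up sample export: four preamble lines followed by two data lines.
def sample = "h1\nh2\nh3\nh4\na,1\nb,2\n"
toCsvRows("example.com", sample).each { println it }
```

Running this prints two rows, `"example.com",a,1` and `"example.com",b,2`. Because the helper returns a list instead of writing to a file, the main loop can pass each returned row to `outFile.append` unchanged.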
