Posted by halotis on 08/14/09

Tags: links, web, beautifulsoup, delicious, scraper


Get Del.icio.us links from a search


Published in: Python

URL: http://www.halotis.com/2009/07/31/find-links-on-del-icio-us-with-a-python-script/

Find great websites by scraping the links returned by a Del.icio.us (delicious.com) search.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/

"""
Scraper for the Del.icio.us search results page (SERP).

This pulls the links returned for a search query on http://del.icio.us.
"""

import re
import urllib
import urllib2

from BeautifulSoup import BeautifulSoup

def get_delicious_results(query, page_limit=10):
    """Return a list of (url, title) tuples from up to page_limit result pages."""
    page = 1
    links = []

    while page <= page_limit:
        # Collapse runs of whitespace, then URL-encode the query (spaces become %20).
        url = ('http://delicious.com/search?p=' + urllib.quote(' '.join(query.split())) +
               '&context=all&lc=1&page=' + str(page))
        req = urllib2.Request(url)
        html = urllib2.urlopen(req).read()
        soup = BeautifulSoup(html)

        # A link whose class ends in "next" means another results page exists.
        next_link = soup.find('a', attrs={'class': re.compile('.*next$', re.I)})

        # Each result is an <a> whose class contains "taggedlink";
        # links is a list of (url, title) tuples.
        links += [(link['href'], ''.join(link.findAll(text=True)))
                  for link in soup.findAll('a', attrs={'class': re.compile('.*taggedlink.*', re.I)})]

        if next_link:
            page += 1
        else:
            break

    return links

if __name__ == '__main__':
    links = get_delicious_results('halotis marketing')
    print links
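The script above requires Python 2 and BeautifulSoup 3. As a minimal sketch of the same idea on Python 3 using only the standard library, the version below swaps BeautifulSoup for `html.parser.HTMLParser` and builds the query string with `urllib.parse.urlencode`. It assumes the result pages still mark results with a class containing "taggedlink" and the pagination link with a class ending in "next", exactly as the original regexes do. The `fetch` parameter, `DeliciousResultParser`, and `build_search_url` are hypothetical names introduced here so the parsing and pagination logic can be exercised offline.

```python
import re
from html.parser import HTMLParser
from urllib.parse import quote, urlencode

# Same class patterns as the original script (case-insensitive).
TAGGED_RE = re.compile(r'.*taggedlink.*', re.I)
NEXT_RE = re.compile(r'.*next$', re.I)


class DeliciousResultParser(HTMLParser):
    """Collect (href, text) for <a> tags whose class matches TAGGED_RE and
    record whether a pagination link matching NEXT_RE is present."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.has_next = False
        self._current = None  # [href, text parts] while inside a tagged <a>

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        css = attrs.get('class') or ''
        if NEXT_RE.search(css):
            self.has_next = True
        if TAGGED_RE.search(css) and 'href' in attrs:
            self._current = [attrs['href'], []]

    def handle_data(self, data):
        # Text from nested tags (e.g. <b>) inside the link is kept too.
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._current is not None:
            href, parts = self._current
            self.links.append((href, ''.join(parts)))
            self._current = None


def build_search_url(query, page):
    # Hypothetical: mirrors the original script's URL parameters.
    # quote_via=quote encodes spaces as %20 rather than '+'.
    params = {'p': ' '.join(query.split()), 'context': 'all', 'lc': 1, 'page': page}
    return 'http://delicious.com/search?' + urlencode(params, quote_via=quote)


def get_delicious_results(query, fetch, page_limit=10):
    """Return (url, title) tuples from up to page_limit pages, stopping early
    when a page has no 'next' link. fetch is any callable: url -> HTML str."""
    links = []
    for page in range(1, page_limit + 1):
        parser = DeliciousResultParser()
        parser.feed(fetch(build_search_url(query, page)))
        parser.close()
        links.extend(parser.links)
        if not parser.has_next:
            break
    return links


if __name__ == '__main__':
    # Offline demo: a fake fetcher serving two canned result pages.
    pages = {
        build_search_url('halotis marketing', 1):
            '<a class="taggedlink" href="http://a.example/">A <b>site</b></a>'
            '<a class="pn next" href="#">Next</a>',
        build_search_url('halotis marketing', 2):
            '<a class="taggedlink" href="http://b.example/">B</a>',
    }
    print(get_delicious_results('halotis marketing', pages.__getitem__))
    # prints [('http://a.example/', 'A site'), ('http://b.example/', 'B')]
```

For a live request, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode('utf-8')`. Passing the download step in as a callable keeps the network code separate from the parsing, so each part can be swapped or tested on its own.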
