Posted By: eristoddle on 04/11/12


Tagged: soup, python, pinterest, beautiful



Pinterest Scraping with Python and BeautifulSoup


Published in: Python
 

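This requires:

* BeautifulSoup - http://www.crummy.com/software/BeautifulSoup/
* SoupSelect - http://code.google.com/p/soupselect/

pin_categories() collects the category links from the Pinterest home page, crawl_pin_category() and harvest_pins() collect the pin links on a category page, and grab_pin() reads a single pin's Open Graph and pinterestapp meta tags.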

```python
# Imports inferred from the calls below; the original listing omitted them.
# BeautifulSoup 3 and SoupSelect are Python 2 libraries.
import BeautifulSoup                # BeautifulSoup 3: BeautifulSoup.BeautifulSoup(html)
from soupselect import select       # select(soup, css_selector) -> list of tags
from pattern.web import URL         # assumption: URL(...).download() from Pattern's pattern.web


def pin_categories():
    """Return the category link hrefs from the Pinterest home page menu."""
    soup = BeautifulSoup.BeautifulSoup(URL("https://pinterest.com/").download())
    cat_list = []
    for c in select(soup, ".submenu a"):
        cat_list.append(c['href'])
    return cat_list


def crawl_pin_category(category):
    """Return the pin hrefs on the first page of a category."""
    # TODO: find next pages
    soup = BeautifulSoup.BeautifulSoup(URL("https://pinterest.com/" + category).download())
    return harvest_pins(soup)


def harvest_pins(soup):
    """Return the href of the "PinImage ImgLink" anchor in each .pin element."""
    return [p.find("a", {"class": "PinImage ImgLink"})['href'] for p in select(soup, ".pin")]


def grab_pin(pin_id):
    """Return a pin's meta values; pin_id is appended to https://pinterest.com, so it starts with '/'."""
    soup = BeautifulSoup.BeautifulSoup(URL("https://pinterest.com" + pin_id).download())

    def meta(prop):
        # Content of the first <meta property="..."> match; raises IndexError if absent.
        return select(soup, 'meta[property="%s"]' % prop)[0]['content']

    return {
        "url": meta("og:url"),
        "title": meta("og:title"),
        "description": meta("og:description"),
        "image": meta("og:image"),
        "pinboard": meta("pinterestapp:pinboard"),
        "pinner": meta("pinterestapp:pinner"),
        "source": meta("pinterestapp:source"),
        "likes": meta("pinterestapp:likes"),
        "repins": meta("pinterestapp:repins"),
        "comments": meta("pinterestapp:comments"),
        "actions": meta("pinterestapp:actions"),
    }
```
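grab_pin() indexes [0] on every select() result, so a single missing meta tag raises IndexError. Below is a minimal sketch of the same meta-tag extraction using only Python 3's standard library (html.parser) in place of BeautifulSoup and SoupSelect. It works on HTML you have already downloaded, returns None for missing tags, and its names (PIN_PROPERTIES, MetaPropertyParser, parse_pin) are hypothetical, not part of the original snippet.

```python
from html.parser import HTMLParser

# Meta properties read by grab_pin(); the part after ":" becomes the dict key.
PIN_PROPERTIES = [
    "og:url", "og:title", "og:description", "og:image",
    "pinterestapp:pinboard", "pinterestapp:pinner", "pinterestapp:source",
    "pinterestapp:likes", "pinterestapp:repins", "pinterestapp:comments",
    "pinterestapp:actions",
]


class MetaPropertyParser(HTMLParser):
    """Collects <meta property="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop, content = attrs.get("property"), attrs.get("content")
        # Keep only the first occurrence, mirroring select(...)[0].
        if prop and content is not None and prop not in self.properties:
            self.properties[prop] = content


def parse_pin(html):
    """Return grab_pin()-style fields from pin-page HTML; missing tags are None."""
    parser = MetaPropertyParser()
    parser.feed(html)
    parser.close()
    # "og:url" -> "url", "pinterestapp:likes" -> "likes", matching grab_pin()'s keys.
    return {p.split(":", 1)[1]: parser.properties.get(p) for p in PIN_PROPERTIES}
```

For example, parse_pin(html)["title"] gives the og:title content, or None if the page has no such tag, so a partially tagged pin page still yields a usable dict.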


Comments

Posted By: cornmacabre on April 6, 2013

Great post - this could provide really valuable functionality for SEO and content marketing. However, the import statements (import BeautifulSoup and so on) seem to be missing from this script, and I can't get the script to output anything.

I know it's been over a year since you posted this, but do you have any documentation or a full .py script for this? I'd really love to get this working and start tweaking it. It's a great example of what BeautifulSoup can do. Thanks!
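The functions in the snippet only return lists and dicts; nothing is printed unless a caller does it. Below is a minimal Python 3, standard-library-only sketch that reimplements harvest_pins() with html.parser and adds a small driver that prints the pin URLs it finds. fetch is a hypothetical stand-in for URL(...).download(), and PinLinkParser, harvest_pin_links and crawl_category are illustrative names, not part of the original snippet. It also simplifies harvest_pins(): it matches any link carrying both the PinImage and ImgLink classes without checking for an enclosing .pin element.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://pinterest.com/"  # base URL used by the original snippet


class PinLinkParser(HTMLParser):
    """Collects hrefs of <a> tags that carry both the PinImage and ImgLink classes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href")
        if href and "PinImage" in classes and "ImgLink" in classes:
            self.links.append(href)


def harvest_pin_links(html):
    """Return pin hrefs in document order."""
    parser = PinLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def crawl_category(fetch, category_path, limit=10):
    """Return up to `limit` absolute pin URLs from one category page.

    fetch is a hypothetical callable (URL -> HTML string); pass a real
    HTTP fetcher, or a stub for offline testing.
    """
    html = fetch(urljoin(BASE_URL, category_path))
    return [urljoin(BASE_URL, href) for href in harvest_pin_links(html)[:limit]]


if __name__ == "__main__":
    # Offline demo with canned HTML; prints each pin URL found.
    sample = ('<div class="pin"><a class="PinImage ImgLink" href="/pin/123/">'
              '<img src="x.jpg"></a></div>')
    for url in crawl_category(lambda _url: sample, "/all/"):
        print(url)
```

Injecting fetch keeps the parsing testable without network access; urljoin also avoids the doubled slash that plain string concatenation produces when an href already starts with "/".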
