Posted by ssteuteville on 07/23/15


Tagged: scrapy




Scrape Easy


Published in: Python
 

URL: https://github.com/ssteuteville/scrapyz

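Hey guys. I implemented a package, scrapyz, that makes writing simple Scrapy spiders much easier: you declare what to scrape in an inner Meta class instead of writing parse callbacks yourself. Here is some example code. The first spider pulls the listed fields from every .thing element on the Reddit front page. The second does the same, then follows each post's title link (detail_path) and scrapes the post body (detail_targets) from the linked page.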

```python
# The original snippet omits its imports; these module paths are an assumption,
# so confirm them against the scrapyz repository.
from scrapyz.core import CssTarget, GenericSpider, IndexDetailSpider
from scrapyz.util import absolute_url, join


class RedditSpider(GenericSpider):
    name = "reddit"
    start_urls = ["https://www.reddit.com/"]

    class Meta:
        # Each element matching this selector is scraped as one item.
        items = ".thing"
        # One field per target: CssTarget(field_name, css_selector).
        targets = [
            CssTarget("rank", ".rank::text"),
            CssTarget("upvoted", ".upvoted::text"),
            CssTarget("dislikes", ".dislikes::text"),
            CssTarget("likes", ".likes::text"),
            CssTarget("title", "a.title::text"),
            CssTarget("domain", ".domain > a::text"),
            CssTarget("datetime", ".tagline > time::attr(datetime)"),
            CssTarget("author", ".tagline > .author::text"),
            CssTarget("subreddit", ".tagline > .subreddit::text"),
            CssTarget("comments", ".comments::text"),
        ]


class RedditSpider2(IndexDetailSpider):
    name = "reddit2"
    start_urls = ["https://www.reddit.com/"]

    class Meta:
        # Link followed from each item to its detail page, made absolute.
        detail_path = CssTarget("detail_path", ".title > a::attr(href)", [absolute_url])
        # Fields scraped from the detail page; text fragments are joined.
        detail_targets = [
            CssTarget("content", ".usertext-body > div > p::text", [join]),
        ]
        items = ".thing"
        targets = [
            CssTarget("rank", ".rank::text"),
            CssTarget("upvoted", ".upvoted::text"),
            CssTarget("dislikes", ".dislikes::text"),
            CssTarget("likes", ".likes::text"),
            CssTarget("title", "a.title::text"),
            CssTarget("domain", ".domain > a::text"),
            CssTarget("datetime", ".tagline > time::attr(datetime)"),
            CssTarget("author", ".tagline > .author::text"),
            CssTarget("subreddit", ".tagline > .subreddit::text"),
            CssTarget("comments", ".comments::text"),
        ]
```
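In RedditSpider2, CssTarget takes a third argument ([absolute_url], [join]) that appears to be a list of post-processing functions applied to the extracted values. The snippet does not show how scrapyz defines these, so here is a minimal stdlib sketch of the idea under stated assumptions: Target, absolute_url and join below are hypothetical stand-ins (not scrapyz's code), each processor is assumed to take (value, page_url), and the CSS selector is only carried along, not evaluated, so the sketch runs without Scrapy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union
from urllib.parse import urljoin

# A raw extraction result: a single string or a list of text fragments.
Value = Union[str, List[str]]
# Hypothetical processor signature (an assumption): (value, page_url) -> value.
Processor = Callable[[Value, str], Value]


def absolute_url(value: Value, page_url: str) -> Value:
    """Hypothetical stand-in: resolve relative links against the page URL."""
    if isinstance(value, list):
        return [urljoin(page_url, v) for v in value]
    return urljoin(page_url, value)


def join(value: Value, page_url: str) -> Value:
    """Hypothetical stand-in: collapse text fragments into one stripped string."""
    if isinstance(value, list):
        return " ".join(part.strip() for part in value if part.strip())
    return value.strip()


@dataclass
class Target:
    """Hypothetical declarative target: field name, selector, processor chain."""
    name: str
    selector: str  # carried for illustration only; not evaluated here
    processors: List[Processor] = field(default_factory=list)

    def process(self, raw: Value, page_url: str) -> Value:
        # Apply processors in order, feeding each output into the next one.
        for processor in self.processors:
            raw = processor(raw, page_url)
        return raw


detail_path = Target("detail_path", ".title > a::attr(href)", [absolute_url])
content = Target("content", ".usertext-body > div > p::text", [join])

print(detail_path.process("/r/python/comments/abc/", "https://www.reddit.com/"))
# https://www.reddit.com/r/python/comments/abc/
print(content.process(["First paragraph. ", "", " Second."],
                      "https://www.reddit.com/r/python/comments/abc/"))
# First paragraph. Second.
```

Keeping processors as plain functions in a list means a field's cleanup is declared next to its selector, and new processors can be chained without subclassing.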
