Posted By

diggernaut on 10/29/16


Tagged

data web scraping extraction journalism diggernaut



El Paso County Sheriff's Office Blotter


Published in: Other
 

URL: https://www.diggernaut.com

Configuration for a Diggernaut digger that scrapes the El Paso County Sheriff's Office blotter. Written using the Excavator application.

    do:
    # Seed the link pool with the first blotter page.
    - link_add: 'http://www.epcsheriffsoffice.com/blotter'
    # Visit every link in the pool, including pager links added below.
    - walk:
        to: links
        do:
        # Each table row is one incident.
        - find:
            path: 'tbody > tr'
            do:
            - object_new: incident
            - find:
                path: ' td:nth-child(1)'
                do:
                - parse
                - object_field_set:
                    object: incident
                    field: call_number
            - find:
                path: ' td:nth-child(2) > .date-display-single:nth-child(1)'
                do:
                - parse
                - object_field_set:
                    object: incident
                    field: date
            - find:
                path: div .date-display-single
                do:
                - parse
                - object_field_set:
                    object: incident
                    field: time
            - find:
                path: ' td:nth-child(3)'
                do:
                - parse
                - object_field_set:
                    object: incident
                    field: address
            - find:
                path: ' td:nth-child(4)'
                do:
                - parse
                - object_field_set:
                    object: incident
                    field: grid
            - find:
                path: ' td:nth-child(5)'
                do:
                - parse
                - object_field_set:
                    object: incident
                    field: problem
            - find:
                path: ' td:nth-child(6)'
                do:
                - parse
                - object_field_set:
                    object: incident
                    field: disposition
            - object_save:
                name: incident
        # Queue the next page, if the pager has one.
        - find:
            path: .pager-next a
            do:
            - parse:
                attr: href
            - link_add
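
For readers who do not know the Diggernaut syntax, here is a rough Python sketch of the same extraction logic: one record per table row, six fields per record, plus the next-page link. It is not how Diggernaut works internally. The `SAMPLE` markup, all values in it, and the helper names `has_class`, `text_of` and `scrape_page` are made up for illustration. The sketch also assumes well-formed markup so the standard library XML parser can read it, and assumes the first `.date-display-single` element in the second cell holds the date and the second one holds the time.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one blotter page. All values are made up.
SAMPLE = """
<div>
  <table><tbody>
    <tr>
      <td>00-000001</td>
      <td><span class="date-display-single">01/01/2000</span>
          <div><span class="date-display-single">12:00</span></div></td>
      <td>100 EXAMPLE ST</td>
      <td>A1</td>
      <td>EXAMPLE PROBLEM</td>
      <td>EXAMPLE DISPOSITION</td>
    </tr>
  </tbody></table>
  <ul><li class="pager-next"><a href="/blotter?page=1">next</a></li></ul>
</div>
"""


def has_class(element, name):
    """True if the element's class attribute contains the given class name."""
    return name in element.get("class", "").split()


def text_of(element):
    """All text inside the element, with runs of whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def scrape_page(markup):
    """Return (list of incident dicts, href of the next page or None)."""
    root = ET.fromstring(markup)
    incidents = []
    for tbody in root.iter("tbody"):
        for row in tbody.findall("tr"):          # mirrors 'tbody > tr'
            cells = row.findall("td")
            if len(cells) < 6:
                continue                          # skip rows that are not incidents
            # Assumption: first stamp is the date, second is the time.
            stamps = [text_of(e) for e in cells[1].iter()
                      if has_class(e, "date-display-single")]
            incidents.append({
                "call_number": text_of(cells[0]),
                "date": stamps[0] if stamps else "",
                "time": stamps[1] if len(stamps) > 1 else "",
                "address": text_of(cells[2]),
                "grid": text_of(cells[3]),
                "problem": text_of(cells[4]),
                "disposition": text_of(cells[5]),
            })
    next_href = None
    for item in root.iter():                      # mirrors '.pager-next a'
        if has_class(item, "pager-next"):
            link = item.find(".//a")
            if link is not None:
                next_href = link.get("href")
            break
    return incidents, next_href


incidents, next_href = scrape_page(SAMPLE)
```

In the digger itself the `walk` over `links` does the paging: each `href` pushed by the final `link_add` is visited in turn. A Python version would need its own loop that fetches `next_href` until it comes back as `None`.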
