Posted by martinson on 02/15/18


Tagged: free web profile Business scraper instagram


Instagram Business Profile Scraper


Published in: Other

URL: https://www.diggernaut.com

To use this free scraper for Instagram business profiles, you need an account with the Diggernaut web scraping service. The scraper extracts contact details from business profiles and also indicates whether each profile is a business profile or not.

The scraper uses Instagram's mobile API, so you will need to supply an Instagram login and password. MAKE SURE YOU DON'T USE YOUR MAIN ACCOUNT. This API usage is unofficial, and you use it at your own risk.

To use it, log in to your Diggernaut account, create a project, create a digger, then click the "Add configuration" button and paste the scraper code below into it.

Set your Instagram username at line 8, your Instagram password at line 11, and the list of usernames you want to retrieve data for (as a comma-separated list) at line 14.
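As a rough illustration of how the comma-separated value from line 14 is consumed, the configuration splits it on commas, collapses repeated whitespace, and trims each entry (the `split`, `space_dedupe`, and `trim` steps). The Python sketch below mirrors that behaviour outside Diggernaut. The function name `parse_accounts` and the sample usernames are hypothetical, and skipping empty entries is an assumption added here, not something the configuration does.

```python
import re
from typing import List


def parse_accounts(raw: str) -> List[str]:
    """Split a comma-separated username list and clean each entry."""
    accounts = []
    for part in raw.split(","):
        # Collapse runs of whitespace into one space, then strip both ends,
        # roughly what space_dedupe followed by trim does in the config.
        cleaned = re.sub(r"\s+", " ", part).strip()
        # Assumption: empty entries (e.g. from a trailing comma) are skipped.
        if cleaned:
            accounts.append(cleaned)
    return accounts


if __name__ == "__main__":
    # Hypothetical usernames, used only to show the accepted input format.
    print(parse_accounts("shop_one, shop_two ,  shop_three"))
```

So extra spaces around the commas in the list should not matter, as long as each username itself is spelled correctly.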

Then save your configuration and run the digger. After some time you should be able to download the data.

  1. ---
  2. config:
  3. agent: Firefox
  4. debug: 2
  5. do:
  6. - variable_set:
  7. field: username
  8. value: YOUR_ACCOUNT_USERNAME_HERE
  9. - variable_set:
  10. field: password
  11. value: YOUR_ACCOUNT_PASSWORD_HERE
  12. - variable_set:
  13. field: accounts
  14. value: LIST OF USERNAMES YOU WANT TO EXTRACT, COMMA SEPARATED
  15. - walk:
  16. to: https://www.instagram.com/
  17. do:
  18. - find:
  19. path: body
  20. do:
  21. - parse:
  22. filter: window\._sharedData\s+\=\s+([^;]+);
  23. - normalize:
  24. routine: json2xml
  25. - to_block
  26. - find:
  27. path: config>csrf_token
  28. do:
  29. - parse
  30. - variable_set: token
  31. - walk:
  32. to:
  33. post: https://www.instagram.com/accounts/login/ajax/
  34. headers:
  35. x-csrftoken: <%token%>
  36. x-instagram-ajax: 1
  37. x-requested-with: XMLHttpRequest
  38. data:
  39. username: <%username%>
  40. password: <%password%>
  41. do:
  42. - find:
  43. path: status
  44. do:
  45. - parse
  46. - if:
  47. match: "fail"
  48. do:
  49. - cannot_login_probably_checkpoint_is_required
  50. - exit
  51. - find:
  52. path: authenticated
  53. do:
  54. - parse
  55. - if:
  56. match: "true"
  57. else:
  58. - wrong_login_or_password
  59. - exit
  60. - cookie_get: mid
  61. - variable_set: mid
  62. - cookie_get: rur
  63. - variable_set: rur
  64. - cookie_get: ds_user_id
  65. - variable_set: dsuserid
  66. - cookie_get: sessionid
  67. - variable_set: sessionid
  68. - variable_get: accounts
  69. - to_block
  70. - split:
  71. context: text
  72. delimiter: ','
  73. - find:
  74. path: div.splitted
  75. do:
  76. - parse
  77. - space_dedupe
  78. - trim
  79. - variable_set: account
  80. - walk:
  81. to: https://www.instagram.com/<%account%>/?__a=1
  82. do:
  83. - find:
  84. path: graphql > user > id
  85. do:
  86. - parse
  87. - variable_set: id
  88. - walk:
  89. to: https://i.instagram.com/api/v1/users/<%id%>/info/
  90. headers:
  91. X-IG-App-ID: 567067343352427
  92. X-IG-Capabilities: 3brDAw==
  93. X-IG-Connection-Type: WIFI
  94. X-IG-Connection-Speed: 3400
  95. X-IG-Bandwidth-Speed-KBPS: -1.000
  96. X-IG-Bandwidth-TotalBytes-B: 0
  97. X-IG-Bandwidth-TotalTime-MS: 0
  98. Cookie: mid=<%mid%>; csrftoken=<%token%>; rur=<%rur%>; ds_user_id=<%dsuserid%>; sessionid=<%sessionid%>; ig_or=;
  99. X-FB-HTTP-Engine: Liger
  100. Accept: '*/*'
  101. Accept-Language: en-US
  102. do:
  103. - find:
  104. path: body_safe > user
  105. do:
  106. - object_new: item
  107. - find:
  108. path: address_street
  109. do:
  110. - parse
  111. - space_dedupe
  112. - trim
  113. - object_field_set:
  114. object: item
  115. field: address_street
  116. - find:
  117. path: category
  118. do:
  119. - parse
  120. - space_dedupe
  121. - trim
  122. - object_field_set:
  123. object: item
  124. field: category
  125. - find:
  126. path: city_name
  127. do:
  128. - parse
  129. - space_dedupe
  130. - trim
  131. - object_field_set:
  132. object: item
  133. field: city_name
  134. - find:
  135. path: contact_phone_number
  136. do:
  137. - parse
  138. - space_dedupe
  139. - trim
  140. - object_field_set:
  141. object: item
  142. field: contact_phone_number
  143. - find:
  144. path: external_url
  145. do:
  146. - parse
  147. - space_dedupe
  148. - trim
  149. - object_field_set:
  150. object: item
  151. field: external_url
  152. - find:
  153. path: full_name
  154. do:
  155. - parse
  156. - space_dedupe
  157. - trim
  158. - object_field_set:
  159. object: item
  160. field: full_name
  161. - find:
  162. path: is_business
  163. do:
  164. - parse
  165. - space_dedupe
  166. - trim
  167. - object_field_set:
  168. object: item
  169. field: is_business
  170. - find:
  171. path: latitude
  172. do:
  173. - parse
  174. - space_dedupe
  175. - trim
  176. - object_field_set:
  177. object: item
  178. field: latitude
  179. - find:
  180. path: longitude
  181. do:
  182. - parse
  183. - space_dedupe
  184. - trim
  185. - object_field_set:
  186. object: item
  187. field: longitude
  188. - find:
  189. path: pk
  190. do:
  191. - parse
  192. - space_dedupe
  193. - trim
  194. - object_field_set:
  195. object: item
  196. field: id
  197. - find:
  198. path: public_email
  199. do:
  200. - parse
  201. - space_dedupe
  202. - trim
  203. - object_field_set:
  204. object: item
  205. field: public_email
  206. - find:
  207. path: public_phone_country_code
  208. do:
  209. - parse
  210. - space_dedupe
  211. - trim
  212. - object_field_set:
  213. object: item
  214. field: public_phone_country_code
  215. - find:
  216. path: public_phone_number
  217. do:
  218. - parse
  219. - space_dedupe
  220. - trim
  221. - object_field_set:
  222. object: item
  223. field: public_phone_number
  224. - find:
  225. path: username
  226. do:
  227. - parse
  228. - space_dedupe
  229. - trim
  230. - object_field_set:
  231. object: item
  232. field: username
  233. - find:
  234. path: zip
  235. do:
  236. - parse
  237. - space_dedupe
  238. - trim
  239. - object_field_set:
  240. object: item
  241. field: zip
  242. - object_save:
  243. name: item
  244. - sleep: 5
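
For readers who want to see what the first steps of the configuration do (lines 18 to 30), here is a minimal Python sketch of the same idea: find the `window._sharedData` JSON embedded in the page with the regular expression from line 22, then read `config.csrf_token` from it. The function name `extract_csrf_token` and the sample HTML are hypothetical. The sketch assumes the page still embeds that object, which may no longer hold for Instagram today. Note that the pattern stops at the first semicolon, so it would cut the JSON short if a semicolon appeared inside the data.

```python
import json
import re
from typing import Optional

# Same pattern as line 22 of the configuration.
SHARED_DATA_RE = re.compile(r"window\._sharedData\s+\=\s+([^;]+);")


def extract_csrf_token(html: str) -> Optional[str]:
    """Return config.csrf_token from the embedded _sharedData JSON, if any."""
    match = SHARED_DATA_RE.search(html)
    if match is None:
        return None
    try:
        shared_data = json.loads(match.group(1))
    except json.JSONDecodeError:
        # The capture was cut short or was not valid JSON.
        return None
    if not isinstance(shared_data, dict):
        return None
    config = shared_data.get("config")
    if not isinstance(config, dict):
        return None
    return config.get("csrf_token")


if __name__ == "__main__":
    # Hypothetical page fragment, shaped the way the config expects it.
    sample = '<script>window._sharedData = {"config": {"csrf_token": "abc123"}};</script>'
    print(extract_csrf_token(sample))
```

In the configuration this token is then sent as the `x-csrftoken` header of the login request and reused in the `Cookie` header of the mobile API call.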
