Published in: Other

URL: https://www.diggernaut.com
This config can be used with the Diggernaut service to scrape adidas.com and retrieve product information. Attention: you will need to use your own proxy for this digger, as Diggernaut's basic proxies don't work with adidas.com.
- You need to create a free account at diggernaut.com.
- Log in to your account.
- Create a project with any name and description you want.
- Open your new project by clicking it and create a new digger with any name.
- You will then be offered 3 options; choose the one that uses the meta-language.
- The config editor will open; copy and paste the config code and click the save button.
- Run your digger.
- Wait for completion.
- Download data.
- Schedule your runs if required.
---
config:
    proxy: #USE YOUR OWN PROXY HERE LIKE 1.1.1.1:8888
    agent: Chrome
    debug: 2
do:
- pool_clear: catalog
- pool_clear: pages
- walk:
    to: http://www.adidas.com/us/
    do:
    - find:
        path: div.contentasset
        slice: 0:2
        do:
        - find:
            path: ul>li>a
            do:
            - parse:
                attr: href
            - trim
            - if:
                match: \w+
                do:
                - normalize:
                    routine: url
                - link_add:
                    pool: catalog
- walk:
    to: links
    pool: catalog
    do:
    - sleep: 3
    - find:
        path: a.product-link:not(.design-starter-click)
        do:
        - parse:
            attr: href
        - trim
        - if:
            match: \w+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: pages
    - find:
        path: a.pagging-next-page
        slice: 0
        do:
        - parse:
            attr: href
        - trim
        - if:
            match: \w+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: catalog
    - cookie_reset
- walk:
    to: links
    pool: pages
    do:
    - sleep: 3
    - find:
        path: head
        do:
        - variable_clear: pid
        - object_new: product
        - eval:
            routine: js
            body: '(function (){var d = new Date(); return d.toISOString()})();'
        - object_field_set:
            object: product
            field: date
        - static_get: url
        - object_field_set:
            object: product
            field: url
        - filter:
            args: \/([A-Z0-9]+)\.html$
        - if:
            match: \w+
            do:
            - walk:
                to: https://www.adidas.com/api/products/<%register%>?sitePath=us
                do:
                - find:
                    path: body_safe>id
                    do:
                    - parse
                    - space_dedupe
                    - trim
                    - object_field_set:
                        object: product
                        field: sku
                    - variable_set: pid
                    - register_set: Adidas
                    - object_field_set:
                        object: product
                        field: brand
                - find:
                    path: body_safe>name
                    do:
                    - parse
                    - space_dedupe
                    - trim
                    - object_field_set:
                        object: product
                        field: name
                - find:
                    path: body_safe>product_description>text
                    do:
                    - parse
                    - space_dedupe
                    - trim
                    - object_field_set:
                        object: product
                        field: description
                - find:
                    path: body_safe>breadcrumb_list>text
                    do:
                    - parse
                    - space_dedupe
                    - trim
                    - if:
                        match: \w+
                        do:
                        - object_field_set:
                            object: product
                            field: category
                            joinby: "|"
                - find:
                    path: body_safe>view_list>image_url
                    do:
                    - parse
                    - if:
                        match: \w+
                        do:
                        - normalize:
                            routine: url
                        - object_field_set:
                            object: product
                            field: images
                            joinby: "|"
                - find:
                    path: body_safe>attribute_list>color,body_safe>product_link_list>default_color
                    do:
                    - parse
                    - space_dedupe
                    - trim
                    - if:
                        match: \w+
                        do:
                        - object_field_set:
                            object: product
                            field: variations
                            joinby: "|"
                - find:
                    path: body_safe>pricing_information>currentprice
                    do:
                    - parse
                    - object_field_set:
                        object: product
                        type: float
                        field: price
                    - register_set: USD
                    - object_field_set:
                        object: product
                        field: currency
                - object_save:
                    name: product
    - cookie_reset
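The key step in the last walk is turning a product page URL into a product API URL. The config takes the page URL, filters out the product ID with a regular expression, and substitutes it into the API address. Below is a minimal Python sketch of that step, for illustration only: it is not part of the digger and is not how Diggernaut runs it internally. The function names and the sample URL are hypothetical; the regular expression and the API address are copied from the config. The second helper approximates what `space_dedupe`, `trim` and `joinby: "|"` appear to do together when a field collects several values.

```python
import re

# Same pattern the config passes to `filter`: the product ID is the
# last path segment, right before ".html".
PRODUCT_ID_RE = re.compile(r"/([A-Z0-9]+)\.html$")

# API address copied from the config's inner `walk`; {pid} stands in
# for the <%register%> placeholder.
API_TEMPLATE = "https://www.adidas.com/api/products/{pid}?sitePath=us"


def api_url_for(page_url):
    """Return the API URL for a product page, or None if no ID is found."""
    match = PRODUCT_ID_RE.search(page_url)
    if match is None:
        # Plays the role of the `if: match: \w+` guard: skip this page.
        return None
    return API_TEMPLATE.format(pid=match.group(1))


def join_values(values, sep="|"):
    """Collapse whitespace, drop empty values, and join the rest with sep."""
    cleaned = [" ".join(value.split()) for value in values]
    return sep.join(value for value in cleaned if value)


# Hypothetical product URL, used only to exercise the pattern.
example_url = "https://www.adidas.com/us/some-shoe/AB1234.html"
print(api_url_for(example_url))
print(join_values(["Men", " Shoes ", ""]))
```

A URL that does not end in an uppercase ID followed by `.html` yields `None`, which corresponds to the digger skipping the API call for that page.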