Easy scraping and HTML parsing with PHP5 and XPath

 Published in: PHP

This example uses file_get_contents to retrieve remote HTML. From there, we can parse it using PHP5's DOMDocument and DOMXPath. XPath queries are easy to create using the Firefox extension "XPather".

    <?php
    // The URL you want to retrieve
    $my_url = '';
    $html = file_get_contents($my_url);
    if ($html === false) {
        die("Could not retrieve $my_url");
    }

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Put your XPath query here
    $my_xpath_query = "/html/body/div[@id='container']/div[@id='contents']/div[@class='list' and @id='wrapper']/div[@class='main' and position()=1]/div[contains(@class, 'news-summary')]/div[@class='news-body']/h3";
    $result_rows = $xpath->query($my_xpath_query);

    // Loop through the results (a DOMNodeList of DOMNode objects)
    foreach ($result_rows as $result_object) {
        // Print the value of each matched node's first child
        echo $result_object->childNodes->item(0)->nodeValue;
    }
    ?>
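
To try the same approach without fetching a live page, here is a minimal sketch that runs a query against an inline HTML string. The sample markup, the helper name extract_texts, and the shortened query are assumptions made for illustration and are not part of the original snippet. The query starts with // so it does not depend on the full path from /html, and it matches news-summary as a whole class token, so a class such as news-summary-archive is not picked up by accident.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = <<<HTML
<html><body>
<div id="container">
  <div class="news-summary featured"><div class="news-body"><h3>First headline</h3></div></div>
  <div class="news-summary"><div class="news-body"><h3>Second headline</h3></div></div>
  <div class="news-summary-archive"><div class="news-body"><h3>Old headline</h3></div></div>
</div>
</body></html>
HTML;

// Hypothetical helper: returns the trimmed text of every node matching $query.
function extract_texts($html, $query)
{
    // Suppress libxml warnings about imperfect markup while parsing,
    // then restore the previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        // DOMXPath::query() returns false for a malformed expression.
        throw new InvalidArgumentException("Invalid XPath expression: $query");
    }

    $texts = array();
    foreach ($nodes as $node) {
        // textContent gathers the text of the node and all its descendants.
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Pad the class attribute with spaces so "news-summary" only matches
// as a complete class name, not as a prefix of a longer one.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' news-summary ')]"
       . "/div[@class='news-body']/h3";

$headlines = extract_texts($html, $query);
print_r($headlines);
```

With the sample markup above, $headlines should hold "First headline" and "Second headline"; the third block is skipped because its class is news-summary-archive. Using textContent instead of childNodes->item(0)->nodeValue also avoids an error when a matched element is empty.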

Posted By: webdatascraping on July 15, 2014


This is a great tutorial on web scraping in PHP. Simple HTML DOM is also a really easy library for developing a PHP-based scraper that uses XPath.

Check out PDF scraping using PHP on my blog.

