Easy scraping and HTML parsing with PHP5 and XPath


Published in: PHP

This example uses file_get_contents to retrieve remote HTML (this requires allow_url_fopen to be enabled in php.ini), then parses it with PHP5's DOMDocument and DOMXPath classes. XPath queries are easy to create using the Firefox extension "XPather".
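Before fetching a live page, the DOMDocument/DOMXPath part can be tried on its own. The following is a minimal sketch, assuming a made-up inline HTML string (the markup and the `$sample_html` and `$headlines` names are illustrative, not taken from any real site):

```php
<?php
// Hypothetical sample markup, not taken from any real site
$sample_html = '<html><body>'
    . '<div class="news-body"><h3>First headline</h3></div>'
    . '<div class="news-body"><h3>Second headline</h3></div>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($sample_html);
$xpath = new DOMXPath($dom);

// Select every <h3> that is a direct child of a div with class "news-body"
$headlines = array();
foreach ($xpath->query("//div[@class='news-body']/h3") as $h3) {
    $headlines[] = $h3->nodeValue;
}

print_r($headlines);
```

A query that starts with `//` matches at any depth, so it does not need the full path from the document root.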


<?php
// The URL you want to retrieve
$my_url = 'http://www.digg.com';
$html = file_get_contents($my_url);
if ($html === false) {
    die("Could not retrieve $my_url");
}

// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Put your XPath query here
$my_xpath_query = "/html/body/div[@id='container']/div[@id='contents']/div[@class='list' and @id='wrapper']/div[@class='main' and position()=1]/div[contains(@class, 'news-summary')]/div[@class='news-body']/h3";
$result_rows = $xpath->query($my_xpath_query);

// Loop through the results (a DOMNodeList of the matching <h3> elements)
foreach ($result_rows as $result_object) {
    // Print the value of the first child node of each <h3>
    echo $result_object->childNodes->item(0)->nodeValue;
}
?>
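An absolute path such as the one above stops matching as soon as the page layout changes. One possible variation, sketched here under assumptions, uses a shorter query and passes a context node as the second argument to DOMXPath::query so that an inner query runs relative to each match. The `extract_links` helper and the sample markup are hypothetical and exist only for illustration:

```php
<?php
// Hypothetical helper: returns the text and href of every link found inside
// the elements matched by $container_query.
function extract_links($html, $container_query)
{
    // Collect libxml parse warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = array();
    foreach ($xpath->query($container_query) as $container) {
        // The second argument makes the ".//a" query relative to $container
        foreach ($xpath->query('.//a[@href]', $container) as $a) {
            $links[] = array(
                'text' => trim($a->nodeValue),
                'href' => $a->getAttribute('href'),
            );
        }
    }
    return $links;
}

// Made-up markup for illustration only
$sample_html = '<div class="news-summary"><div class="news-body">'
    . '<h3><a href="/story/1"> Example story </a></h3></div></div>'
    . '<div class="sidebar"><a href="/ad">Ad</a></div>';

$links = extract_links($sample_html, "//div[contains(@class, 'news-summary')]//h3");
print_r($links);
```

Only the link inside the matched `<h3>` is returned. The sidebar link is skipped because the inner query is scoped to each container node.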
