PHP spider that retrieves URL metadata and other information


Published in: PHP

Given a URL, this function retrieves the page title, meta description, keywords, favicon, and up to three images larger than 15,000 bytes that can be used as link thumbnails. It fetches the page with file_get_contents() and then extracts each field with regular expressions.


// Example usage: fetch and dump the metadata for one page.
print_r(getLinksInfo("http://www.rockit.it/articolo/825/nada-studio-report-quando-nasce-una-canzone"));

// Returns the title, meta description, meta keywords, favicon URL and up to
// three large images found at $url. Relies on the helper functions attr(),
// makeabsolute() and getRemoteFileSize(), which are not part of this snippet.
function getLinksInfo($url) {
    $data = array(
        'keywords'    => "",
        'description' => "",
        'title'       => "",
        'favicon'     => "",
        'images'      => array(),
    );

    // file_get_contents() returns false on failure; return the empty defaults.
    $web_page = file_get_contents($url);
    if ($web_page === false) return $data;

    // Modifiers: U = ungreedy, i = case-insensitive, s = dot also matches newlines.
    if (preg_match('#<title[^>]*>(.*)</title>#Uis', $web_page, $title_array)) {
        $data['title'] = trim($title_array[1]);
    }

    preg_match_all('#<meta[^>]*>#is', $web_page, $meta_array);
    foreach ($meta_array[0] as $meta) {
        $name = strtolower(attr($meta, "name"));
        if ($name == 'description') $data['description'] = attr($meta, "content");
        if ($name == 'keywords') $data['keywords'] = attr($meta, "content");
    }

    preg_match_all('#<link[^>]*>#is', $web_page, $link_array);
    foreach ($link_array[0] as $link) {
        $rel = strtolower(attr($link, "rel"));
        // Accept both the legacy "shortcut icon" and the standard "icon" values.
        if ($rel == 'shortcut icon' || $rel == 'icon') {
            $data['favicon'] = makeabsolute($url, attr($link, "href"));
        }
    }

    // Keep the first three images whose remote size exceeds 15000 bytes.
    preg_match_all('#<img[^>]*>#is', $web_page, $imgs_array);
    foreach ($imgs_array[0] as $img) {
        if ($src = attr($img, "src")) {
            $src = makeabsolute($url, $src);
            if (getRemoteFileSize($src) > 15000) $data['images'][] = $src;
        }
        if (count($data['images']) > 2) break;
    }

    return $data;
}
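
The function above calls three helpers, attr(), makeabsolute() and getRemoteFileSize(), whose definitions are not included in this snippet. What follows is only a minimal sketch of what they might look like, not the author's versions. It assumes that attribute values are quoted, that relative URLs do not contain "../" segments, and that the remote server sends a Content-Length header.

```php
<?php
// Hypothetical helper implementations; the originals are not shown in this snippet.

// Returns the value of attribute $name inside the HTML tag $tag, or "" if absent.
// Assumes the value is wrapped in single or double quotes.
function attr($tag, $name) {
    $pattern = '#\s' . preg_quote($name, '#') . '\s*=\s*(["\'])(.*?)\1#is';
    return preg_match($pattern, $tag, $m) ? $m[2] : "";
}

// Turns $rel into an absolute URL using $base as the reference page.
// Simplified: does not collapse "../" or "./" segments.
function makeabsolute($base, $rel) {
    if ($rel === "") return "";
    // Already absolute (has a scheme such as http: or https:).
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $rel)) return $rel;

    $p = parse_url($base);
    $scheme = isset($p['scheme']) ? $p['scheme'] : 'http';
    $host = isset($p['host']) ? $p['host'] : '';
    $root = $scheme . '://' . $host . (isset($p['port']) ? ':' . $p['port'] : '');

    // Protocol-relative URL: reuse the scheme of the base page.
    if (substr($rel, 0, 2) === '//') return $scheme . ':' . $rel;
    // Root-relative path.
    if ($rel[0] === '/') return $root . $rel;

    // Path-relative: resolve against the directory of the base path.
    $path = isset($p['path']) ? $p['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $rel;
}

// Returns the size in bytes reported by the Content-Length header, or 0 if unknown.
function getRemoteFileSize($url) {
    $headers = @get_headers($url, true);
    if ($headers === false) return 0;
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-length'])) return 0;
    $len = $headers['content-length'];
    // After redirects the header can appear several times; take the last value.
    if (is_array($len)) $len = end($len);
    return (int) $len;
}
```

With these definitions in the same file, getLinksInfo() runs as written. Note that getRemoteFileSize() issues one extra HTTP request per candidate image, so pages with many small images can be slow to process.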

URL: http://www.barattalo.it/2010/01/12/bot-that-retrieves-url-meta-data-info/
