Posted By

ginoplusio on 01/12/10


Tagged

title bot spider meta filegetcontents description webbot



PHP spider that retrieves URL metadata and other information


Published in: PHP

URL: http://www.barattalo.it/2010/01/12/bot-that-retrieves-url-meta-data-info/

Given a URL, this function retrieves the page title, meta description, keywords, favicon, and an array of up to three images (each larger than 15,000 bytes) to use for links. It fetches the page with file_get_contents and then extracts the data with regular expressions.

    // Note: attr(), makeabsolute() and getRemoteFileSize() are helper functions
    // that this snippet calls but does not define; they must be provided separately.
    print_r(getLinksInfo("http://www.rockit.it/articolo/825/nada-studio-report-quando-nasce-una-canzone"));

    function getLinksInfo($url) {
        // Default values, returned as-is if the page cannot be fetched
        $data = array(
            'keywords'    => "",
            'description' => "",
            'title'       => "",
            'favicon'     => "",
            'images'      => array(),
        );

        $web_page = @file_get_contents($url);
        if ($web_page === false) return $data;

        // Page title: first <title> element, if any
        if (preg_match('#<title[^>]*>(.*)</title>#Uis', $web_page, $title_match)) {
            $data['title'] = trim($title_match[1]);
        }

        // Meta description and keywords
        preg_match_all('#<meta([^>]*)(.*)>#Uis', $web_page, $meta_array);
        foreach ($meta_array[0] as $meta) {
            $name = strtolower(attr($meta, "name"));
            if ($name == 'description') $data['description'] = attr($meta, "content");
            if ($name == 'keywords') $data['keywords'] = attr($meta, "content");
        }

        // Favicon: accept both rel="shortcut icon" and rel="icon"
        preg_match_all('#<link([^>]*)(.*)>#Uis', $web_page, $link_array);
        foreach ($link_array[0] as $link) {
            $rel = strtolower(attr($link, "rel"));
            if ($rel == 'shortcut icon' || $rel == 'icon') {
                $data['favicon'] = makeabsolute($url, attr($link, "href"));
            }
        }

        // Images: keep the first three whose remote size exceeds 15000 bytes
        preg_match_all('#<img([^>]*)(.*)/?>#Uis', $web_page, $imgs_array);
        $imgs = array();
        foreach ($imgs_array[0] as $img) {
            if ($src = attr($img, "src")) {
                $src = makeabsolute($url, $src);
                if (getRemoteFileSize($src) > 15000) array_push($imgs, $src);
            }
            if (count($imgs) > 2) break;
        }
        $data['images'] = $imgs;

        return $data;
    }
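
getLinksInfo() calls three helpers that are not defined in the listing above: attr(), makeabsolute() and getRemoteFileSize(). The versions below are only a sketch of what they might look like, not the author's originals. They assume PHP 7 or later, that attr() returns an empty string when the attribute is missing, that makeabsolute() only needs to handle common http(s) cases, and that the remote server reports a Content-Length header.

```php
<?php
// Hypothetical helper: return the value of attribute $name inside one HTML tag,
// or "" if the attribute is not present. Handles "..." , '...' and unquoted values.
function attr($tag, $name) {
    $pattern = '#\s' . preg_quote($name, '#')
             . '\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))#i';
    if (preg_match($pattern, $tag, $m)) {
        // Only one alternative matches; its capture is the last array element.
        return $m[count($m) - 1];
    }
    return "";
}

// Hypothetical helper: resolve $rel against the page URL $base.
// Simplified: a trailing slash in the result is dropped.
function makeabsolute($base, $rel) {
    if ($rel === "") return $base;
    // Already absolute (has its own scheme)
    if (is_string(parse_url($rel, PHP_URL_SCHEME))) return $rel;

    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    $origin = $scheme . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');

    // Protocol-relative URL: reuse the scheme of the base page
    if (substr($rel, 0, 2) === '//') return $scheme . ':' . $rel;
    // Root-relative path
    if ($rel[0] === '/') return $origin . $rel;

    // Path-relative: start from the directory of the base path
    $path = $b['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    $segments = array();
    foreach (explode('/', $dir . $rel) as $seg) {
        if ($seg === '..') {
            array_pop($segments);          // go up one directory
        } elseif ($seg !== '.' && $seg !== '') {
            $segments[] = $seg;
        }
    }
    return $origin . '/' . implode('/', $segments);
}

// Hypothetical helper: size in bytes as reported by the Content-Length header,
// or 0 if the request fails or the header is missing.
function getRemoteFileSize($url) {
    $headers = @get_headers($url, true);
    if ($headers === false) return 0;
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-length'])) return 0;
    $len = $headers['content-length'];
    // With redirects there is one value per response; use the last one
    if (is_array($len)) $len = end($len);
    return (int) $len;
}
```

For example, makeabsolute('http://example.com/a/b/page.html', '../img/x.png') resolves to http://example.com/a/img/x.png. Note that get_headers() issues a full request per image by default, so on image-heavy pages the size check is the slowest part of the function.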
