Posted By

Tamedo on 11/23/07


Tagged

curl scrape



Scrape with curl PT 2


Published in: PHP
 

<html>
<?php
// Set up the cURL handle
$ch = curl_init( "http://www.metacritic.com/video/" );

// Fake out the User Agent
curl_setopt( $ch, CURLOPT_USERAGENT, "Internet Explorer" );

// Start the output buffering so the page printed by curl_exec()
// is captured instead of being sent to the browser
ob_start();

// Get the HTML from MetaCritic
curl_exec( $ch );
curl_close( $ch );

// Get the contents of the output buffer, then discard the buffer
$str = ob_get_contents();
ob_end_clean();

// Get just the list sorted by name
preg_match( "/\<DIV ID=\"sortbyname1\"\>(.*?)\<\/DIV\>/is",
    $str, $byname );

// Fall back to an empty string if the list was not found
$listhtml = isset( $byname[0] ) ? $byname[0] : "";

// Get each of the movie entries
preg_match_all( "/\<SPAN.*?>(.*?)\<\/SPAN\>.*?\<A.*?\>(.*?)\<BR\>/is",
    $listhtml, $moviedata );

// Work through the raw movie data
$movies = array();
for( $i = 0; $i < count( $moviedata[1] ); $i++ )
{
    // The score is ok already
    $score = $moviedata[1][$i];

    // We need to remove tags from the title and decode
    // the HTML entities
    $title = $moviedata[2][$i];
    $title = preg_replace( "/<.*?>/", "", $title );
    $title = html_entity_decode( $title );

    // Then add the movie to the array
    $movies []= array( $score, $title );
}
?>
<body>
<table>
<tr>
<th>Name</th><th>Score</th>
</tr>
<?php foreach( $movies as $movie ) { ?>
<tr>
<td><?php echo htmlspecialchars( $movie[1] ); // re-encode the decoded title ?></td>
<td><?php echo( $movie[0] ); ?></td>
</tr>
<?php } ?>
</table>
</body>
</html>
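
The extraction step can be tried without hitting the network. The sketch below wraps the snippet's two regexes in a function and runs them on a small sample string. Both the function name `extract_movies` and the sample markup are assumptions made up for illustration. The sample only mimics the shape the regexes expect (a `sortbyname1` DIV holding SPAN, A, BR entries) and is not captured from the real page. It also swaps in `strip_tags()` for the tag-removing `preg_replace`, which should behave the same for simple markup like this.

```php
<?php
// Hypothetical helper: returns a list of [score, title] pairs found in $html.
function extract_movies(string $html): array
{
    // Isolate the "sorted by name" block; give up quietly if it is missing
    if (!preg_match('/<DIV ID="sortbyname1">(.*?)<\/DIV>/is', $html, $byname)) {
        return [];
    }

    // Each entry: a score inside a SPAN, then a link, terminated by a BR
    preg_match_all(
        '/<SPAN.*?>(.*?)<\/SPAN>.*?<A.*?>(.*?)<BR>/is',
        $byname[1],
        $entries,
        PREG_SET_ORDER
    );

    $movies = [];
    foreach ($entries as $entry) {
        // Drop leftover tags from the title, then decode HTML entities
        $title = html_entity_decode(trim(strip_tags($entry[2])));
        $movies[] = [trim($entry[1]), $title];
    }
    return $movies;
}

// Invented sample markup (assumption), shaped like the regexes expect
$sample = '<DIV ID="sortbyname1">'
        . '<SPAN CLASS="green">81</SPAN> <A HREF="/a"><B>Tom &amp; Jerry</B></A><BR>'
        . '<SPAN CLASS="red">35</SPAN> <A HREF="/b">Another Film</A><BR>'
        . '</DIV>';

$movies = extract_movies($sample);
var_export($movies);
```

Keeping the parsing in its own function makes it easy to re-check the patterns against saved copies of the page when the site's markup changes, which is the usual way a regex-based scraper breaks.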
