Posted By

wa-rufio on 12/19/11


Tagged

mysql php DOM xml extract baseball mlb pitchfx boxscore


Versions (?)

Extract PitchFX from MLB.com


 / Published in: PHP
 

Open PHP file and enter User Credentials for localhost. The default dates are set to negotiate the 2011 season, but change the date variables to match your desired season data. The start date should be the day before opening day, and the end date the last day of the regular season.

Append ignored attributes to conditional statement on line 81 to match your pitchfx table structure.

  1. <?php
  2. $uname = '';
  3. $pswrd = '';
  4. $con = mysql_connect("localhost", $uname, $pswrd);
  5. if (!$con)
  6. {
  7. die('Could not connect: ' . mysql_error());
  8. }
  9. mysql_select_db("pitchfx",$con) or die('Error while selecting db');
  10.  
  11.  
  12. /*Debugging purposes
  13. $ob_file = fopen('test.txt' , 'w');
  14. function ob_file_callback($buffer)
  15. {
  16.   global $ob_file;
  17.   fwrite($ob_file,$buffer);
  18. }
  19. ob_start('ob_file_callback');*/
  20.  
  21. $start_date = '2011-03-30'; //Opening day -1 day
  22. $end_date = '2011-10-03'; //Last regular season game
  23. $cur_d = $start_date;
  24.  
  25. //Open dedicated XML stream
  26. $opts = array(
  27. 'http' => array(
  28. 'user_agent' => 'PHP libxml agent',
  29. )
  30. );
  31.  
  32. $context = stream_context_create($opts);
  33. libxml_set_streams_context($context);
  34.  
  35. $name_p = array();
  36. $value_p = array();
  37. $p_seq = array();
  38. while ($cur_d != $end_date) {
  39. $cur_d = date ("Y-m-d", strtotime ($cur_d . "+1 day"));
  40. $year = date("Y",strtotime($cur_d));
  41. $month = date("m",strtotime($cur_d));
  42. $day = date("d",strtotime($cur_d));
  43. $xmldoc = new DOMDocument();
  44. $url = 'http://gd2.mlb.com/components/game/mlb/year_' . $year . '/month_' . $month . '/day_' . $day . '/scoreboard.xml';
  45. $xmldoc->load($url);
  46. $doc = $xmldoc->documentElement;
  47. $game = $doc->getElementsbyTagName('game');
  48. foreach($game as $g){
  49. $gid = $g->getattribute('id');
  50. $t_url = 'http://gd2.mlb.com/components/game/mlb/year_' . $year . '/month_' . $month . '/day_' . $day . '/gid_' . $gid . '/inning/inning_all.xml';
  51. $xmldoc = new DOMDocument();
  52. $xmldoc->load($t_url);
  53. if(!$xmldoc->load($t_url)){
  54. continue;
  55. }
  56. $doc = $xmldoc->documentElement;
  57. $event = $doc->getElementsbyTagName('inning');
  58. $away = $event->item(0)->getattribute('away_team');
  59. $home = $event->item(0)->getattribute('home_team');
  60.  
  61. foreach($event as $ev){
  62. $inning = $ev->getattribute('num');
  63. $atbat = $ev->getElementsbyTagName('atbat');
  64. foreach($atbat as $ab){
  65. if ($ab->parentNode->nodeName == 'top'){
  66. $top = 1;
  67. }
  68. else {
  69. $top = 0;
  70. }
  71. $ab_id = $ab->getattribute('batter');
  72. $p_id = $ab->getattribute('pitcher');
  73. $ab_hand = $ab->getattribute('stand');
  74. $p_hand = $ab->getattribute('p_throws');
  75. $pitch = $ab->getElementsbyTagName('pitch');
  76. foreach ($pitch as $p){
  77. unset($name_p);
  78. unset($value_p);
  79. unset($p_seq);
  80. foreach($p->attributes as $a) {
  81. if ($a->nodeName == 'tfs' || $a->nodeName == 'tfs_zulu' || $a->nodeName == 'mt' || $a->nodeName=='cc' ) {
  82. continue;
  83. }
  84. $value_p[] = $a->nodeValue;
  85. $name_p[] = $a->nodeName;
  86. $p_seq[] = $p->getattribute('type');
  87. }
  88. $sql = "INSERT INTO pitchfx.pitches (inning,date_id,top_bot,away,home,pitch_id,pitch_hand,ab_id,ab_hand," . implode($name_p,',') . ",pitch_seq) VALUES ('$inning','$gid','$top','$away','$home','$p_id','$p_hand','$ab_id','$ab_hand','" . implode($value_p,"','") . "','" . implode($p_seq) . "')";
  89. mysql_query($sql,$con) or die ('Error executing query. ' . mysql_error() . "\n");
  90. }
  91. }
  92. }
  93. }
  94. }
  95. //ob_end_flush();
  96. //fclose($ob_file);
  97. ?>

Report this snippet  

You need to login to post a comment.