Posted By

caioariede on 04/02/09


Tagged

php html title



Get page title


Published in: PHP
 

URL: http://caioariede.com/

This function tries to fetch the title of the page at the given URL by reading its HTML <title> tag. The title is returned as UTF-8. On success it returns array(TRUE, 'ok', $title); on failure it returns array(FALSE, $error, NULL), where $error is one of 'not-valid-url', 'non-html', 'error', 'timeout' or 'empty'.
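Because the function reports failure through its return value instead of throwing, callers have to check the first element before using the title. A minimal sketch of that handling is shown here. describe_title_result() is a hypothetical helper that is not part of the snippet, and its messages are only illustrative; the status codes are the ones the function returns.

```php
<?php

// Hypothetical helper (not part of the original snippet): turns the
// array(success, status, title) result into a printable message.
function describe_title_result(array $result)
{
    list($ok, $status, $title) = $result;

    if ($ok) {
        return 'Title: ' . $title;
    }

    // Status codes returned by get_page_title() on failure.
    // The wording of each message is an assumption made for this example.
    $messages = array(
        'not-valid-url' => 'The URL does not start with http://, https:// or www.',
        'non-html'      => 'The response has no HTML Content-Type.',
        'error'         => 'The URL could not be opened.',
        'timeout'       => 'The connection timed out while reading.',
        'empty'         => 'No title text was found.',
    );

    return isset($messages[$status])
        ? $messages[$status]
        : 'Unknown status: ' . $status;
}

// Hand-written results, so the example runs without network access.
echo describe_title_result(array(TRUE, 'ok', 'Example Domain')), "\n";
echo describe_title_result(array(FALSE, 'timeout', NULL)), "\n";
```

In real use the argument would be the return value of get_page_title($url).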

<?php

function get_page_title($url)
{
    if (!preg_match('#^(https?://|www\.)#i', $url))
        return array(FALSE, 'not-valid-url', NULL);

    // get_headers() and fopen() need a scheme, so add one to bare "www." URLs
    if (stripos($url, 'www.') === 0)
        $url = 'http://' . $url;

    // follow redirects, with a cap so a redirect loop cannot hang the script
    for ($i = 0; $i < 10; $i++)
    {
        $headers = @get_headers($url, TRUE);
        if ($headers === FALSE)
            return array(FALSE, 'error', NULL);
        // header names are case-insensitive
        $headers = array_change_key_case($headers, CASE_LOWER);
        if (empty($headers['location']))
            break;
        $url = is_array($headers['location'])
            ? end($headers['location'])
            : $headers['location'];
    }

    if (empty($headers['content-type']))
        return array(FALSE, 'non-html', NULL);
    $type = is_array($headers['content-type'])
        ? end($headers['content-type'])
        : $headers['content-type'];
    if (stripos($type, 'html') === FALSE)
        return array(FALSE, 'non-html', NULL);

    if (!$fp = @fopen($url, 'r'))
        return array(FALSE, 'error', NULL);

    $t = FALSE;   // TRUE while inside <title>...</title>
    $buf = '';    // tag currently being read, e.g. "<title"
    $tbuf = '';   // title text collected so far
    while (!feof($fp))
    {
        $c = fgetc($fp);
        $info = stream_get_meta_data($fp);
        if ($info['timed_out'])
        {
            fclose($fp);
            return array(FALSE, 'timeout', NULL);
        }
        if ($c === FALSE)
            break;

        if ($c == '<') $buf = $c;
        elseif ($c == '>' && $buf !== '')
        {
            if ($t && strcasecmp($buf, '</title') == 0)
                break;
            elseif (strcasecmp($buf, '<title') == 0)
            {
                $t = TRUE;
                $tbuf = '';
            }
            elseif (strcasecmp($buf, '</head') == 0 || strcasecmp($buf, '<body') == 0)
                break;
            // the tag is complete, so stop appending to it
            $buf = '';
        }
        elseif ($buf !== '')
            $buf .= $c;
        elseif ($t)
            $tbuf .= $c;
    }
    fclose($fp);

    $tbuf = trim($tbuf);
    if ($tbuf === '')
        return array(FALSE, 'empty', NULL);

    // convert to UTF-8 if needed. utf8_encode()/utf8_decode() are deprecated
    // since PHP 8.2, so this uses mbstring. Bytes that are not valid UTF-8 are
    // assumed to be ISO-8859-1, which is what utf8_encode() assumed as well.
    if (!mb_check_encoding($tbuf, 'UTF-8'))
        $tbuf = mb_convert_encoding($tbuf, 'UTF-8', 'ISO-8859-1');

    // the text is UTF-8 at this point, so decode entities once with that charset
    $tbuf = html_entity_decode($tbuf, ENT_QUOTES, 'UTF-8');
    $tbuf = preg_replace('#[\n\t\r]+#', ' ', $tbuf);
    return array(TRUE, 'ok', $tbuf);
}

?>
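To try the title extraction and entity decoding without touching the network, the same idea can be applied to an HTML string that is already in memory. This is only a sketch under stated assumptions: extract_title_from_html() is a hypothetical name, it uses a regular expression rather than the character-by-character scan in get_page_title(), and it assumes the input is already UTF-8.

```php
<?php

// Hypothetical helper (not part of the original snippet): returns the
// trimmed, entity-decoded <title> text of an HTML string, or NULL when
// there is no non-empty title. Assumes $html is already UTF-8.
function extract_title_from_html($html)
{
    // i: case-insensitive, s: dot also matches newlines,
    // (.*?) is non-greedy so it stops at the first </title>.
    if (!preg_match('#<title\b[^>]*>(.*?)</title\s*>#is', $html, $m)) {
        return NULL;
    }

    $title = html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');

    // Collapse runs of ASCII whitespace into single spaces.
    $title = trim(preg_replace('#[ \t\r\n]+#', ' ', $title));

    return $title === '' ? NULL : $title;
}

$html = "<html><head><TITLE>\n Caf&eacute; &amp; Bar\n</TITLE></head><body></body></html>";
var_dump(extract_title_from_html($html));
```

A regular expression is easier to read than a hand-written scanner, but it needs the whole document in memory, whereas get_page_title() stops reading as soon as it reaches the closing title tag.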
