Posted By

bingjian on 10/11/09


Tagged

google scholar



Extract citations from Google Scholar


Published in: Ruby
 

#!/usr/local/bin/ruby
require 'net/http'
require 'uri'

# Strip leftover markup from a scraped title and expand the ellipsis entity.
def escape_title(s)
  s.gsub(/(<\/a>)|(&nbsp;)/i, '').gsub(/&hellip;/i, '...')
end

# Fetch one results page of papers citing `id`, starting at result offset
# `start`, and return up to `max_num` titles.
def get_citation_list(id, start = 0, max_num = 10)
  # URI.escape was removed in Ruby 3.0; encode_www_form builds the query instead.
  path = '/scholar?' + URI.encode_www_form(start: start, cites: id)
  papers = []
  Net::HTTP.start('scholar.google.com') do |http|
    s = http.request(Net::HTTP::Get.new(path)).body
    # Skip past the "... citing ..." banner so only result entries remain.
    localizer = s.index('</b> citing <b>')
    return papers unless localizer
    s = s[(localizer + 20)..-1].to_s
    s = s.gsub(/(<b>)|(<\/b>)/i, '')
    pos = 0
    max_num.times do
      # Each title is the text between the last '>' and the closing </h3>.
      pos2 = s.index('</h3>', pos)
      break unless pos2
      pos1 = s.rindex('>', pos2 - 5)
      break unless pos1
      papers << escape_title(s[pos1 + 1, pos2 - pos1 - 1])
      pos = pos2 + 10
    end
  end
  papers
end

abort "usage: #{$0} \"paper title\"" if ARGV.empty?

puts ' *** Googling paper "%s" *** ' % ARGV[0]
path = '/scholar?' + URI.encode_www_form(q: "\"#{ARGV[0]}\"", num: 1)

Net::HTTP.start('scholar.google.com') do |http|
  s = http.request(Net::HTTP::Get.new(path)).body
  pos1 = s.index('Cited by ')
  if pos1
    pos2 = s.index('</a>', pos1 + 9)
    citation_num = Integer(s[pos1 + 9, pos2 - pos1 - 9])
    # The citation ID sits in the link's "cites=<id>&amp;..." query string.
    pos3 = s.rindex('cites', pos1)
    pos4 = s.index('amp', pos3)
    citation_id = s[pos3 + 6, pos4 - pos3 - 7]
    puts ' -- Google Scholar Citation ID: %s' % citation_id
    puts ' -- Cited by the following %d papers:' % citation_num
    papers = []
    # Walk the result pages ten at a time; the last page may be partial.
    0.step(citation_num - 1, 10) do |offset|
      papers += get_citation_list(citation_id, offset, [10, citation_num - offset].min)
    end
    papers.each_with_index { |p, i| puts "[#{i + 1}] #{p}" }
  else
    puts 'no citation found!'
  end
end
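
The script finds the citation count and ID by counting characters from fixed markers, which breaks as soon as the page markup shifts. The sketch below shows one possible alternative, assuming the link still looks roughly like `cites=<digits>...>Cited by <n><`. The helper names `parse_cited_by` and `page_plan` and the sample HTML fragment are made up for illustration and are not part of the original snippet.

```ruby
# Hypothetical helper: pull the citation ID and count out of a results page
# with a regular expression instead of index arithmetic.
# Returns nil when the page has no "Cited by" link.
def parse_cited_by(html)
  m = html.match(/cites=(\d+)[^>]*>Cited by (\d+)</)
  return nil unless m
  { id: m[1], count: Integer(m[2]) }
end

# Hypothetical helper: list [offset, page_size] pairs covering `total`
# results, `per_page` at a time; the last page may be shorter.
def page_plan(total, per_page = 10)
  (0...total).step(per_page).map { |offset| [offset, [per_page, total - offset].min] }
end

# Made-up fragment shaped like the markup the script expects.
sample = '<a href="/scholar?cites=1234567890&amp;hl=en">Cited by 25</a>'
p parse_cited_by(sample)
p page_plan(25)
```

Each pair from `page_plan` could be passed as the `start` and `max_num` arguments of `get_citation_list`, so a count that is an exact multiple of ten does not trigger an extra, empty page request.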
