Posted by inkdeep on 01/26/09

Tagged: ruby pdf count Gem pdf-reader



Recurse directory tree and count pages in all pdf files using pdf-reader gem


Published in: Ruby

I had a directory tree with around 4000 PDF files and needed a total page count, so I semi-rolled this. I swiped the counter code from the gem README: http://github.com/yob/pdf-reader/tree/master

It could be more self-contained; as is, I run it from irb:

>> require 'total_pages'
>> pagetotal = TotalPages.new
>> pagetotal.count('/my/pdf/directory')

I added a rescue that prints the path of any file that fails to open, or that doesn't conform to the PDF specification and causes pdf-reader to raise an error. Without it the script quits at the first bad file, and that sucks when you're trying to count pages in thousands of files.

require 'rubygems'
require 'pdf/reader'

class TotalPages

  # Returns the total page count of every PDF found under dir
  def count(dir)
    @conv_directory = dir
    ## I output the directory argument as a test with the below line -
    ## mostly to make sure that passing '.' gets current dir
    # puts @conv_directory
    recurse_and_count
  end

  def directory
    @conv_directory
  end

  # Every path under the directory, at any depth
  def directory_tree
    Dir["#{directory}/**/*"]
  end

  def recurse_and_count
    total = 0
    directory_tree.each do |item|
      case File.stat(item).ftype
      when 'file'
        if File.extname(item).downcase == ".pdf"
          begin
            receiver = PageReceiver.new
            PDF::Reader.file(item, receiver, :pages => false)
            total += receiver.pages
          rescue StandardError
            # Print the path of the file that could not be counted and carry on
            p item
          end
        end
      end
    end
    total
  end

end

# receiver = PageReceiver.new
# pdf = PDF::Reader.file("somefile.pdf", receiver, :pages => false)
class PageReceiver
  attr_accessor :pages

  # Called when page parsing ends
  def page_count(arg)
    @pages = arg
  end
end
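
The walk-and-rescue part of the snippet can also be sketched without the gem, which makes it easier to try out on its own. This is only a rough sketch, not the original method: `count_pdf_pages` is a hypothetical helper name, the page counting is handed in as a block, and failures are collected in an array instead of being printed as they happen.

```ruby
require 'find'

# Hypothetical helper (not part of pdf-reader): walks dir, yields each
# PDF path to the block, and sums the integer the block returns.
# A file whose block raises is recorded in failures instead of
# stopping the run.
def count_pdf_pages(dir)
  total = 0
  failures = []
  Find.find(dir) do |path|
    # Skip directories, broken symlinks and anything that is not a .pdf
    next unless File.file?(path) && File.extname(path).downcase == '.pdf'
    begin
      total += Integer(yield(path))
    rescue StandardError => e
      failures << [path, e.message]
    end
  end
  [total, failures]
end
```

As far as I know, with pdf-reader 1.0 or later the block could simply be `{ |path| PDF::Reader.new(path).page_count }`, which would replace the receiver class entirely. Treat that as an assumption to check against the version of the gem you have installed. Returning the failures alongside the total means you can look over the bad files once at the end rather than picking them out of scrolling output.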
