Recurse directory tree and count pages in all pdf files using pdf-reader gem


Published in: Ruby

I had a directory tree with around 4000 PDF files and needed a total page count, so I semi-rolled this. I swiped the counter code from the gem README:
http://github.com/yob/pdf-reader/tree/master

It could be more self-contained. As is, I run it from irb:

`>> require 'total_pages'`
`>> pagetotal = TotalPages.new`
`>> pagetotal.count('/my/pdf/directory')`

I added a rescue to print info on a file if it fails to open, or if it doesn't conform to the PDF specification and causes pdf-reader to raise an error. Without this the script will quit, and that sucks when you're trying to count pages in thousands of files.
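The rescue-and-continue idea can be tried on its own, without pdf-reader. This is only a minimal sketch: `sum_skipping_failures` is a hypothetical helper (it is not part of the snippet or the gem), and the `fake_counts` hash stands in for real PDF files. It sums whatever the block returns for each item and records the items whose block raises, instead of stopping.

```ruby
# Hypothetical helper, not part of pdf-reader: sums the block's result for
# each item and collects the items that raised instead of aborting the loop.
def sum_skipping_failures(items)
  total = 0
  failures = []
  items.each do |item|
    begin
      total += yield(item)
    rescue StandardError => e
      # Remember the failure and move on to the next item
      failures << [item, e.class.name]
    end
  end
  [total, failures]
end

# Stand-in for a page counter: Integer(nil) raises TypeError for the "broken" file
fake_counts = { "a.pdf" => 3, "broken.pdf" => nil, "b.pdf" => 5 }
total, failures = sum_skipping_failures(fake_counts.keys) do |name|
  Integer(fake_counts[name])
end

puts "total pages: #{total}"
failures.each { |name, error| puts "skipped #{name} (#{error})" }
```

Collecting the failures in a list, rather than only printing them as they happen, makes it easy to review the problem files once the whole run has finished.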


require 'rubygems'
require 'pdf/reader'

class TotalPages

  def count(dir)
    @conv_directory = dir
    ## I output the directory argument as a test with the below line -
    ## mostly to make sure that passing '.' gets current dir
    # puts @conv_directory
    recurse_and_count
  end

  def directory
    @conv_directory
  end

  # Every entry (files and directories) below the starting directory
  def directory_tree
    Dir["#{directory}/**/*"]
  end

  def recurse_and_count
    total = 0
    directory_tree.each do |item|
      begin
        # Only regular files with a .pdf extension (any case) are counted
        next unless File.file?(item) && File.extname(item).downcase == ".pdf"
        receiver = PageReceiver.new
        PDF::Reader.file(item, receiver, :pages => false)
        total += receiver.pages
      rescue StandardError => e
        # Print the offending file and keep going instead of quitting
        puts "#{item}: #{e.class}: #{e.message}"
      end
    end
    total
  end

end

# receiver = PageReceiver.new
# pdf = PDF::Reader.file("somefile.pdf", receiver, :pages => false)
class PageReceiver
  attr_accessor :pages

  # Called when page parsing ends
  def page_count(arg)
    @pages = arg
  end
end
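
The file selection can also be checked separately from the counting. This is a minimal sketch using only Ruby's standard library: `pdf_paths` is a hypothetical helper name, and the temporary directory with empty files exists only to make the example self-contained.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: every regular file under dir whose extension is .pdf,
# compared case-insensitively, returned in sorted order.
def pdf_paths(dir)
  Dir.glob(File.join(dir, "**", "*")).select do |path|
    File.file?(path) && File.extname(path).downcase == ".pdf"
  end.sort
end

found = nil
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "sub", "deeper"))
  FileUtils.mkdir_p(File.join(root, "folder.pdf")) # a directory, so it is skipped
  %w[one.pdf sub/two.PDF sub/deeper/three.pdf sub/notes.txt].each do |rel|
    FileUtils.touch(File.join(root, rel))
  end
  # Strip the temporary root so the result is easy to read
  found = pdf_paths(root).map { |path| path.sub("#{root}/", "") }
end

puts found
```

Note that `**/*` does not match hidden (dot) files unless `File::FNM_DOTMATCH` is passed to `Dir.glob`. Also, the receiver-based `PDF::Reader.file` call comes from the gem README at the time; as far as I know, pdf-reader 1.0 and later expose the count directly as `PDF::Reader.new(path).page_count`, so check which API your installed version provides.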
