Published in: Ruby
I had a directory tree with around 4000 PDF files and needed a total page count, so I semi-rolled this. I swiped the counter code from the pdf-reader gem README:
http://github.com/yob/pdf-reader/tree/master
It could be more self-contained; as is, I run it from irb:
`>> require 'total_pages'`
`>> pagetotal = TotalPages.new`
`>> pagetotal.count('/my/pdf/directory')`
I added a rescue that prints the path of any file that fails to open, or that doesn't conform to the PDF specification and causes pdf-reader to raise an error. Without it the script quits on the first bad file, and that sucks when you're trying to count pages in thousands of files.
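The rescue-per-file idea can be tried on its own, without any PDFs. This is only a sketch: `sum_skipping_failures` is a made-up helper name, and `Integer(s)` stands in for the real page counter because it raises on bad input much like a parser would on a broken file.

```ruby
# Hypothetical helper (not part of pdf-reader): adds up the block's result
# for every item and remembers which items made the block raise.
def sum_skipping_failures(items)
  total = 0
  failed = []
  items.each do |item|
    begin
      total += yield(item)
    rescue StandardError => e
      # Record the bad item and keep going instead of aborting the whole run.
      failed << [item, e.class.name]
    end
  end
  [total, failed]
end

# Stand-in for counting pages: Integer() raises ArgumentError on "oops".
total, failed = sum_skipping_failures(%w[3 oops 4]) { |s| Integer(s) }
puts "total: #{total}"
failed.each { |item, error| puts "skipped #{item} (#{error})" }
```

Collecting the failures in an array, rather than only printing them, makes it easy to re-check just the bad files afterwards.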
    # total_pages.rb
    require 'rubygems'
    require 'pdf/reader'

    class TotalPages
      # Returns the total page count of every PDF found under dir.
      def count(dir)
        @conv_directory = dir
        ## I output the directory argument as a test with the below line -
        ## mostly to make sure that passing '.' gets current dir
        # puts @conv_directory
        recurse_and_count
      end

      def directory
        @conv_directory
      end

      # Every entry (files and directories) below the starting directory.
      def directory_tree
        Dir["#{directory}/**/*"]
      end

      def recurse_and_count
        total = 0
        directory_tree.each do |item|
          begin
            case File.stat(item).ftype
            when 'file'
              if File.extname(item).downcase == ".pdf"
                receiver = PageReceiver.new
                PDF::Reader.file(item, receiver, :pages => false)
                total += receiver.pages
              end
            end
          rescue
            # Print the path of any file that could not be read, then carry on.
            p item
          end
        end
        total
      end
    end

    # receiver = PageReceiver.new
    # pdf = PDF::Reader.file("somefile.pdf", receiver, :pages => false)
    class PageReceiver
      attr_accessor :pages

      # Called when page parsing ends
      def page_count(arg)
        @pages = arg
      end
    end
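The directory walk and the case-insensitive `.pdf` check can also be tested without pdf-reader installed. The sketch below uses only the standard library; `pdf_paths` is a hypothetical helper name, and the temp-directory layout exists purely for illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: recursively lists regular files under dir whose
# extension is .pdf in any letter case, sorted by path.
def pdf_paths(dir)
  Dir.glob(File.join(dir, '**', '*')).select do |path|
    File.file?(path) && File.extname(path).downcase == '.pdf'
  end.sort
end

# Build a throwaway tree with two PDFs (one upper-case) and one other file.
found = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'sub'))
  FileUtils.touch(File.join(root, 'a.pdf'))
  FileUtils.touch(File.join(root, 'sub', 'B.PDF'))
  FileUtils.touch(File.join(root, 'notes.txt'))
  pdf_paths(root).map { |path| File.basename(path) }
end

puts found.inspect
```

Passing `'.'` as the directory works the same way, since the glob pattern is built relative to whatever path is given.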