Published in: Perl

#!/usr/bin/env perl
# updatepdf.pl
# parses a bibtex file and updates linked pdf file's metadata with bibtex data

use strict;
use warnings;
use Getopt::Std;
use Text::BibTeX;
use PDF::API2;

# option -x forces overwriting existing metadata
# option -f provides the field name which contains the linked PDF file's path
# default PDF file field name to "local-url"
my %options;
getopts('xf:', \%options);
my $field = $options{'f'} || 'local-url';

# get input bibtex file's name from the command-line argument
my $in_file = shift @ARGV
    or die "Usage: $0 [-x] [-f field] file.bib\n";
my $bib = Text::BibTeX::File->new($in_file)
    or die "Cannot open $in_file: $!\n";

# loop through bibtex entries
while (my $entry = Text::BibTeX::Entry->new($bib)) {

    # skip non-regular entries
    next unless $entry->parse_ok && $entry->metatype == BTE_REGULAR;

    # read local file field from the bibtex entry
    next unless $entry->exists($field);
    my $pdf_file = $entry->get($field);

    # skip file if doesn't exist or not a .pdf
    unless (-f $pdf_file && $pdf_file =~ /\.pdf$/i) {
        warn "Skipping $pdf_file\n";
        next;
    }

    # get PDF's info
    my $pdf  = PDF::API2->open($pdf_file);
    my %info = $pdf->info();

    # set author and title fields; existing values are kept unless -x is given
    $info{'Author'} = $entry->get('author') if $options{'x'} || !$info{'Author'};
    $info{'Title'}  = $entry->get('title')  if $options{'x'} || !$info{'Title'};

    # write
    $pdf->info(%info);
    $pdf->update();
}
Comments

Unfortunately, the script above depends on PDF::API2, which only supports Adobe PDF version 1.4. If you have a version 1.5 PDF, PDF::API2 will not be able to process the metadata.
To mitigate this problem, use another tool (like "pdfjam" from TeXLive) to downgrade the PDF to 1.4 (e.g., "pdfjam FILE --preamble '\pdfminorversion=4' --outfile OUTFILE") and then use the above script. Alternatively, the above script can be modified to shell out to pdfjam instead of PDF::API2 to update the metadata.
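A minimal sketch of that workaround in Perl, under a few assumptions: the sub names are hypothetical, the version is read only from the file's `%PDF-x.y` header line, and the pdfjam arguments are exactly the ones quoted in the comment above. It has not been tested against every pdfjam release.

```perl
use strict;
use warnings;

# Return the version string from the "%PDF-x.y" header, or undef if absent.
# Only the header is inspected; nothing else in the file is parsed.
sub pdf_header_version {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    read($fh, my $head, 16);
    close($fh);
    return defined $head && $head =~ /^%PDF-(\d+\.\d+)/ ? $1 : undef;
}

# Build the pdfjam argument list quoted in the comment (nothing is run here).
sub pdfjam_downgrade_cmd {
    my ($in, $out) = @_;
    return ('pdfjam', $in, '--preamble', '\pdfminorversion=4',
            '--outfile', $out);
}

# Run pdfjam only when the header reports a version newer than 1.4.
# Returns 1 if a downgraded copy was written to $out, 0 otherwise.
sub downgrade_if_needed {
    my ($in, $out) = @_;
    my $version = pdf_header_version($in);
    return 0 unless defined $version;
    my ($major, $minor) = split /\./, $version;
    return 0 unless $major > 1 || ($major == 1 && $minor > 4);
    # List-form system() bypasses the shell, so no extra quoting is needed.
    system(pdfjam_downgrade_cmd($in, $out)) == 0
        or die "pdfjam failed on $in\n";
    return 1;
}
```

In the script above, one could call `downgrade_if_needed($pdf_file, $tmp_file)` before `PDF::API2->open` and open `$tmp_file` when it returns 1. Here `$tmp_file` is a placeholder for a scratch path of your choosing.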
A similar tool is available at:
http://phaseportrait.blogspot.com/2010/12/tools-for-combining-bibtex-pdfs-and-e.html
It is more complex in that it traverses a directory tree looking for PDFs whose file names begin with a prefix matching a BibTeX key, and it updates each PDF it finds. Moreover, it can use information from the directory tree to automatically generate collections on a Sony Reader. It can also automatically convert PDFs to the compatible 1.4 version that is most appropriate for the Amazon Kindle (and other readers with primitive PDF support?).
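The prefix-matching idea can be sketched in a few lines of Perl. This is not the linked tool's code: the sub name `match_pdfs_to_keys` is hypothetical, matching is case-sensitive, and trying longer keys first is an assumption made here so that a key such as `smith2010b` is not swallowed by `smith2010`.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# Map each BibTeX key to the PDFs under $root whose file names start with it.
# Returns a hash reference: key => [ list of matching paths ].
sub match_pdfs_to_keys {
    my ($root, @keys) = @_;
    my %matches;
    # Longest keys first, so the most specific prefix claims the file.
    my @by_length = sort { length($b) <=> length($a) || $a cmp $b } @keys;
    find({
        no_chdir => 1,    # $_ holds the full path inside the callback
        wanted   => sub {
            return unless -f $_ && /\.pdf$/i;
            my $name = basename($_);
            for my $key (@by_length) {
                if (index($name, $key) == 0) {
                    push @{ $matches{$key} }, $File::Find::name;
                    last;
                }
            }
        },
    }, $root);
    return \%matches;
}
```

The keys could be collected with `$entry->key` while looping over the entries as in the script above, and each matched path could then be passed through the same PDF::API2 update step.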