Posted by iblis on 01/18/09

Tagged: pdf metadata bibtex



Update PDF metadata from BibTeX data file


Published in: Perl
 

    #!/usr/bin/env perl
    # updatepdf.pl
    # Parses a BibTeX file and updates each linked PDF file's metadata with the BibTeX data.

    use strict;
    use warnings;

    use Getopt::Std;
    use Text::BibTeX;   # exports the metatype constants such as BTE_REGULAR
    use PDF::API2;

    # Option -x forces overwriting of existing metadata.
    # Option -f names the field that holds the linked PDF file's path
    # (defaults to "local-url").
    my %options;
    getopts('xf:', \%options) or die "Usage: $0 [-x] [-f field] [file.bib]\n";
    my $field = $options{'f'} || 'local-url';

    # Read the BibTeX file named on the command line, or STDIN if none is given.
    my $in_file = shift || '<&STDIN';
    my $bib = Text::BibTeX::File->new($in_file);

    # Loop through the BibTeX entries.
    while (my $entry = Text::BibTeX::Entry->new($bib)) {
        # Skip entries that failed to parse and non-regular entries
        # (@comment, @preamble, @string).
        next unless $entry->parse_ok && $entry->metatype == BTE_REGULAR;

        # Read the local file field from the BibTeX entry.
        next unless $entry->exists($field);
        my $pdf_file = $entry->get($field);

        # Skip the file if it doesn't exist or isn't a .pdf.
        if ($pdf_file !~ m{\.pdf$}i || ! -e $pdf_file) {
            warn "Skipping $pdf_file\n";
            next;
        }

        # Open the PDF and read its Info dictionary.
        # Skip (rather than abort on) files that PDF::API2 cannot parse.
        my $pdf = eval { PDF::API2->open($pdf_file) };
        unless ($pdf) {
            warn "Skipping $pdf_file: ", ($@ || "cannot open\n");
            next;
        }
        my %info = $pdf->info();

        # Set the Author and Title fields; existing values are replaced only with -x.
        $info{'Author'} = $entry->get('author')
            if $entry->exists('author') && ($options{'x'} || !defined $info{'Author'});
        $info{'Title'} = $entry->get('title')
            if $entry->exists('title') && ($options{'x'} || !defined $info{'Title'});

        # Write the metadata back to the original file.
        $pdf->info(%info);
        $pdf->update();
    }
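
The -x overwrite rule in the loop above can be pulled out into a small pure function so it can be checked without opening any PDF. This is a minimal sketch in core Perl; merge_info is a hypothetical helper, not part of the original script, and it assumes the BibTeX values have already been read into a plain hash. Like the script, it copies a BibTeX value into the Info hash only when forced or when that Info field is undefined.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the script's -x rule: copy each BibTeX field
# into the PDF Info hash only if $force is set or the Info field is undefined.
# Returns a new hash; the input Info hash is left untouched.
sub merge_info {
    my ($info, $bib, $force) = @_;
    my %map    = (author => 'Author', title => 'Title');   # BibTeX field => PDF Info key
    my %merged = %$info;
    for my $bib_field (sort keys %map) {
        my $pdf_key = $map{$bib_field};
        next unless exists $bib->{$bib_field};
        $merged{$pdf_key} = $bib->{$bib_field}
            if $force || !defined $merged{$pdf_key};
    }
    return \%merged;
}

# Demo with made-up values.
my %info = (Title => 'Untitled', Producer => 'pdfTeX');
my %bib  = (author => 'A. Author and B. Author', title => 'A Paper');

my $kept   = merge_info(\%info, \%bib, 0);   # like running without -x
my $forced = merge_info(\%info, \%bib, 1);   # like running with -x
print "without -x: $kept->{Author} | $kept->{Title}\n";
print "with -x:    $forced->{Author} | $forced->{Title}\n";
```

Returning a copy instead of editing the hash in place keeps the original Info values available, for example to report what -x would overwrite.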


Comments

Posted By: tpavlic on January 19, 2011

Unfortunately, the script above depends on PDF::API2, which only supports Adobe PDF version 1.4. If you have a version 1.5 PDF, PDF::API2 will not be able to process the metadata.
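
One way to spot such files before handing them to PDF::API2 is to read the version from the file header, which begins with "%PDF-1.x". This is a minimal core-Perl sketch; pdf_header_version is a hypothetical helper name. Note that from PDF 1.4 on, a /Version entry in the document catalog may declare a later version than the header, so the header is only a hint.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the version string from a PDF's "%PDF-M.m"
# header, or undef if the file does not start with such a header.
sub pdf_header_version {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $read = read($fh, my $head, 16);
    close $fh;
    return undef unless $read;
    return $head =~ /\A%PDF-(\d+\.\d+)/ ? $1 : undef;
}

# Demo: write two header-only temporary files and classify them.
for my $ver ('1.4', '1.5') {
    my ($fh, $name) = tempfile(SUFFIX => '.pdf', UNLINK => 1);
    binmode $fh;
    print {$fh} "%PDF-$ver\n";
    close $fh;
    my $v = pdf_header_version($name);
    printf "%s: header %s, %s\n", $name, $v,
        ($v <= 1.4 ? 'OK for the script' : 'downgrade first');
}
```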

To mitigate this problem, use another tool (like "pdfjam" from TeX Live) to downgrade the PDF to 1.4 (e.g., "pdfjam FILE --preamble '\pdfminorversion=4' --outfile OUTFILE") and then use the above script. Alternatively, the above script can be modified to shell out to pdfjam instead of PDF::API2 to update the metadata.
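
From Perl, the pdfjam downgrade could be driven with the list form of system, which passes each argument straight to pdfjam without a shell, so the backslash in \pdfminorversion=4 needs no extra quoting. This is a minimal sketch: pdfjam_downgrade_cmd and downgraded_name are hypothetical helpers, the "-1.4.pdf" output naming is an assumption, and the pdfjam arguments are exactly those of the command above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for
#   pdfjam FILE --preamble '\pdfminorversion=4' --outfile OUTFILE
# Perl single quotes keep the backslash literal.
sub pdfjam_downgrade_cmd {
    my ($in, $out) = @_;
    return ('pdfjam', $in, '--preamble', '\pdfminorversion=4', '--outfile', $out);
}

# Hypothetical naming scheme: paper.pdf -> paper-1.4.pdf
# (dies rather than risk overwriting a file that lacks a .pdf suffix).
sub downgraded_name {
    my ($in) = @_;
    (my $out = $in) =~ s/\.pdf\z/-1.4.pdf/i or die "Not a .pdf file: $in\n";
    return $out;
}

my $in  = 'paper.pdf';
my @cmd = pdfjam_downgrade_cmd($in, downgraded_name($in));
print join(' ', @cmd), "\n";

# To actually run it (requires pdfjam on the PATH):
# system(@cmd) == 0 or die "pdfjam failed: $?\n";
```

Writing to a new file instead of replacing the input keeps the original PDF intact if pdfjam fails partway.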

A similar tool is available at:

http://phaseportrait.blogspot.com/2010/12/tools-for-combining-bibtex-pdfs-and-e.html

It is more complex in that it traverses a directory tree looking for PDFs whose file names begin with a BibTeX key, and it updates every matching PDF it finds. It can also use the directory structure to generate collections on a Sony Reader automatically, and it can convert PDFs to version 1.4, the most compatible choice for the Amazon Kindle (and other readers with primitive PDF support?).
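
The key-prefix matching described above could be sketched with the core File::Find module. This is not the linked tool's code: find_pdfs_by_key is a hypothetical function, and it assumes a PDF belongs to a key when its file name starts with that key (case-sensitively) and ends in ".pdf". When one key is a prefix of another (say smith09 and smith09a), a file can match both, which a real tool would have to resolve.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical sketch: map each BibTeX key to the PDFs under $root whose
# file names begin with that key.
sub find_pdfs_by_key {
    my ($root, @keys) = @_;
    my %found = map { ($_ => []) } @keys;
    find({
        no_chdir => 1,                      # $_ holds the full path
        wanted   => sub {
            return unless -f $_ && /\.pdf\z/i;
            my $name = basename($_);
            for my $key (@keys) {
                push @{ $found{$key} }, $File::Find::name
                    if index($name, $key) == 0;
            }
        },
    }, $root);
    @$_ = sort @$_ for values %found;       # stable order for reporting
    return \%found;
}

# Demo on a throwaway directory tree with made-up file names.
my $dir = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($dir, 'sub'));
for my $f ('smith09-paper.pdf', 'sub/jones10.pdf', 'notes.txt') {
    my $path = File::Spec->catfile($dir, split m{/}, $f);
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}
my $hits = find_pdfs_by_key($dir, 'smith09', 'jones10', 'missing00');
for my $key (sort keys %$hits) {
    printf "%s: %d file(s)\n", $key, scalar @{ $hits->{$key} };
}
```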
