Posted By

noah on 11/21/08


Tagged

pdf text document convert adobe



Convert a PDF to text with Perl


Published in: Perl

URL: http://search.cpan.org/dist/CAM-PDF/

Converts the PDF 'example.pdf' to plain text.

This converts a single page of the document: the one whose page number is passed to getPageContentTree on line 8 (pages are numbered from 1). Change that argument to convert a different page. It's been a while since I've used this, so YMMV.

  1. #!/perl/bin/perl -w
  2. use CAM::PDF;
  3. use CAM::PDF::PageText;
  4.  
  5. my $filename = "example.pdf";
  6.  
  7. my $pdf = CAM::PDF->new($filename) or die "Cannot open $filename: $CAM::PDF::errstr\n";
  8. my $pageone_tree = $pdf->getPageContentTree(1);  # page number to convert; pages are numbered from 1
  9. print CAM::PDF::PageText->render($pageone_tree);
  10.  
  11. # Note: I had to install CAM::PDF::PageText by hand; CPAN did not install it when I installed CAM::PDF.
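
To convert every page rather than just one, the same two calls can be wrapped in a loop over the page count. The following is only a sketch: pdf_pages_to_text is a hypothetical helper name (not part of CAM::PDF), and it assumes that numPages returns the page count and that a page with no usable content stream yields an undefined tree.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;
use CAM::PDF::PageText;

# Hypothetical helper (not part of CAM::PDF): returns the text of every
# page in the document, one list element per page.
sub pdf_pages_to_text {
    my ($pdf) = @_;
    my @pages;
    for my $pagenum (1 .. $pdf->numPages()) {
        my $tree = $pdf->getPageContentTree($pagenum);
        # Assumption: a page without parsable content gives no tree;
        # keep an empty string so list positions still match page numbers.
        push @pages, defined $tree ? CAM::PDF::PageText->render($tree) : '';
    }
    return @pages;
}

# Convert each PDF named on the command line and print its text.
for my $filename (@ARGV) {
    my $pdf = CAM::PDF->new($filename)
        or die "Cannot open $filename: $CAM::PDF::errstr\n";
    print join("\n", pdf_pages_to_text($pdf)), "\n";
}
```

Saved under a name of your choice (say pdf2text.pl), it would be run as `perl pdf2text.pl example.pdf`. Returning a list of pages instead of printing inside the loop keeps the page boundaries available to the caller.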
