Posted By

iblis on 08/27/09


Tagged

pdf batch crop


Split (crop) double page PDFs in two


 / Published in: Perl
 

Each page is split vertically into two pages. The crop, trim and media boxes are set accordingly, and the text layer should be preserved. Takes the path to a PDF file as its argument and produces a cropped PDF in the same location.

  1. #!/usr/bin/env perl
  2. use strict; use warnings;
  3. use PDF::API2;
  4.  
  5. my $filename = shift || 'test.pdf';        # path to the PDF, first argument
  6. my $oldpdf = PDF::API2->open($filename);
  7. my $newpdf = PDF::API2->new;
  8.  
  9. for my $page_nb (1..$oldpdf->pages) {
  10.     my ($page, @cropdata);
  11.  
  12.     $page = $newpdf->importpage($oldpdf, $page_nb);   # first copy: left half
  13.     @cropdata = $page->get_mediabox;                  # (x0, y0, x1, y1)
  14.     $cropdata[2] /= 2;                                # right edge moves to the middle
  15.     $page->cropbox(@cropdata);
  16.     $page->trimbox(@cropdata);
  17.     $page->mediabox(@cropdata);
  18.  
  19.     $page = $newpdf->importpage($oldpdf, $page_nb);   # second copy: right half
  20.     @cropdata = $page->get_mediabox;
  21.     $cropdata[0] = $cropdata[2] / 2;                  # left edge moves to the middle
  22.     $page->cropbox(@cropdata);
  23.     $page->trimbox(@cropdata);
  24.     $page->mediabox(@cropdata);
  25. }
  26.  
  27. (my $newfilename = $filename) =~ s/(.*)\.(\w+)$/$1.clean.$2/;
  28. $newpdf->saveas($newfilename);             # no single quotes: they would not interpolate
  29.  
  30. __END__
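
Several comments below ask where the output ends up. As a minimal sketch, the substitution on line 27 can be tried on its own, without PDF::API2. The helper name `clean_name` and the sample file names are made up for illustration; only the regular expression comes from the script.

```perl
use strict;
use warnings;

# Derive the output name the way line 27 of the script does:
# insert ".clean" before the final extension.
# (clean_name is a hypothetical helper, not part of the original script.)
sub clean_name {
    my ($filename) = @_;
    (my $newfilename = $filename) =~ s/(.*)\.(\w+)$/$1.clean.$2/;
    return $newfilename;
}

print clean_name('scans/book.pdf'), "\n";   # scans/book.clean.pdf
print clean_name('my.scan.v2.pdf'), "\n";   # my.scan.v2.clean.pdf
print clean_name('noextension'), "\n";      # noextension (pattern does not match)
```

The directory part is kept, so the new name points next to the input file. A name without an extension comes back unchanged, so writing to it would target the input file itself.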


Comments

Posted By: roninxyz on March 17, 2010

I have spent the last few hours learning about Perl and scripting on my Mac to be able to run this script, and I have to say that I was extremely skeptical that I'd be able to figure out anything as an almost complete newbie. Fortunately it didn't take long before I realized that PDF::API2 was a module, which sent me scrambling to figure out how to install those. But, with good instruction on the 'net, I've got this thing up and running. What a great experience and so useful! Thank you for this!

Posted By: iblis on June 28, 2010

You're welcome. I should have indeed made it clear that one needs PDF::API2...

On the Mac, you can also crop left and then right pages with Preview and re-assemble them with this script

Posted By: martienne on October 24, 2010

It works nicely for smaller files... but it doesn't seem to work on larger files, for example one with 24 pages didn't work but one with 4 pages worked. The pages all come out blank for the bigger files.
Any idea?

Posted By: iblis on November 9, 2010

@martienne I used the script on large files (300+ pages) without any problem (the only issue is increase in size). What platform are you running Perl on? Where do your PDFs come from (scan, OCRed scan...)? Do they already have the crop box set?

Posted By: andreapisac on July 22, 2011

Hi there,

I would really like to try this code. Someone on the Adobe Forum posted a step-by-step guide for total beginners (like myself) on how to install Perl modules (for Mac users). I followed that and it worked fine with no problems. Then they said to fire the script (meaning this one) in order to get this great feature, but this assumed that people know how to fire scripts. Could someone be so kind as to explain to a beginner what to do once PDF::API2 is installed? Thanks a million! Andrea

Posted By: jlhkrans on December 10, 2011

Thank you very much.

If you have done some left and right cropping in the PDF before, you should change lines 14 and 21 to:

  14. $cropdata[2] = ($cropdata[2] + $cropdata[0]) / 2;
  21. $cropdata[0] = ($cropdata[2] + $cropdata[0]) / 2;
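
To see why the midpoint matters, here is a small worked sketch that runs both calculations on the same box. The box values are a made-up example of a page cropped to x = 100..700, and the helper names `split_original` and `split_midpoint` are hypothetical. No PDF module is needed for the arithmetic.

```perl
use strict;
use warnings;

# Original arithmetic (lines 14 and 21): fine when the left edge is at 0.
sub split_original {
    my @box   = @_;                         # (x0, y0, x1, y1)
    my @left  = @box;  $left[2] /= 2;
    my @right = @box;  $right[0] = $right[2] / 2;
    return (\@left, \@right);
}

# Midpoint arithmetic from the comment above: split halfway between both edges.
sub split_midpoint {
    my @box = @_;
    my $mid = ($box[0] + $box[2]) / 2;
    return ([$box[0], $box[1], $mid, $box[3]],
            [$mid,    $box[1], $box[2], $box[3]]);
}

# Hypothetical page that was cropped to x = 100..700 (600 points wide).
my @cropped = (100, 0, 700, 500);
for my $split (\&split_original, \&split_midpoint) {
    my ($left, $right) = $split->(@cropped);
    print "left: @$left  right: @$right\n";
}
# left: 100 0 350 500  right: 350 0 700 500   (halves 250 and 350 wide)
# left: 100 0 400 500  right: 400 0 700 500   (halves 300 and 300 wide)
```

With a left edge of 0 both versions give the same result, which is presumably why the original works on uncropped scans.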

Posted By: teoric on July 19, 2014

Nice snippet!

When I run it, PDF::API2 does not copy the cropbox on import, so that I have to take it from the old page:

my $original_page = $oldpdf->openpage($page_nb);
my @cropdata = $original_page->get_cropbox;

Posted By: Jordan_J on July 29, 2014

Thank you so much for creating this, iblis!

I had scoured the Internet for an easy way to split PDF files and was super happy when I found your script. I think I've almost gotten it to work but would be very grateful for your help decoding the problem. If I can get it working, I'll enthusiastically pass the script and instructions along to many others.

I knew nothing about perl at the outset but believe I managed to install it, along with the Command Line Tools and PDF::API2, correctly. (I definitely managed to generate the "Hello World" tutorial output.)

However, when I run your script on a PDF file, my Terminal pauses for several seconds, as though it's processing it, but then just gives me a new prompt to enter another command, as though I had just pressed "enter" instead. And when I open the PDF file, nothing has changed.

Does this indicate something I'm doing wrong? Should there be some sort of output in Terminal, if I've done everything correctly? Or, does your script create a new PDF file that I need to track down somewhere?

If you're able to offer any feedback on this, I'd be super grateful as I have a ton of PDFs I'd like to use your script on if I can figure it out.

Thanks so much in advance, and for having made the script.

Cheers.

Posted By: Jordan_J on July 29, 2014

P.S. Never mind!

Your script works brilliantly! I was just too slow (or excited) to realize that your script creates a new file called "$newfilename" (which I see is in the script itself).

Apologies for the stupid question but maybe what I wrote will help others.

Thanks again for creating the script which is a huge help to me and I'm sure many others.

Posted By: Jordan_J on July 29, 2014

Hi again,

I wanted to post a quick follow-up on how I resolved two issues that arose as I used this script, in case it's helpful to others:

Problem 1: PDFs saved using more recent Acrobat versions (evidently 9 and later) use something called a "cross-reference stream," which is not supported by PDF::API2. If this occurs, the output will inform you that this is a "Known Issue," which you can read about here: http://search.cpan.org/~ssimms/PDF-API2-2.022/lib/PDF/API2.pm#___top

Resolution: If you simply save your PDF as an earlier version of Acrobat, the script will work. For example, on mine, which is Acrobat XI, I go to File-->Save As Other...-->Reduce Size PDF...-->Save as Acrobat 5.0 and Later. That does the trick! (Saving the file as anything later than 5.0 doesn't seem to work for me.)

Problem 2: I had already cropped many of my PDFs that I now want to split. The original script (as jlhkrans and teoric also noted) does not carry over the cropping specifications into the new file.

Resolution: teoric's additional script (see above) works for me, but given that I'm a newbie at this, it took me a while to figure out just where to paste it (and that, for me anyway, it required a very minor modification). But I eventually figured it out and will paste my full, modified script for others in a subsequent post.

Posted By: Jordan_J on July 29, 2014

I planned to paste the full script I use to maintain my crop specifications, but I'm not sure how to do so with the line numbers included, so I'll just explain it instead.

Again, I used teoric's script (above), but with the very small modification that I deleted the "my" at the beginning of the second line (in front of "@cropdata").

I then pasted teoric's script in the following way (two steps):

Step 1: I pasted the first part (my $original_page = $oldpdf->openpage($page_nb);) into line 10, without pasting over or eliminating iblis' original line 10. Thus, lines 9-11 in my modified script are now:

  9:   for my $page_nb (1..$oldpdf->pages) {
  10:  my $original_page = $oldpdf->openpage($page_nb);
  11:  my ($page, @cropdata);

Step 2: I pasted teoric's second line of script (but without the "my" in it) into line 14 AND line 21, this time pasting over and thus eliminating iblis' original script in those lines (i.e., line 13 and line 20 of iblis' original script). Thus, lines 13-15 in my modified script are now:

  13:  $page = $newpdf->importpage($oldpdf, $page_nb);
  14:  @cropdata = $original_page->get_cropbox;
  15:  $cropdata[2] /= 2;

And lines 20-22 of my modified script are now:

  20:  $page = $newpdf->importpage($oldpdf, $page_nb);
  21:  @cropdata = $original_page->get_cropbox;
  22:  $cropdata[0] = $cropdata[2] / 2;

As you can see, line 14 and line 21 are a pasted version of teoric's script (but without the "my" at the beginning).

As I said, this modified script works perfectly for me to preserve and carry over how I have cropped my pages from the original to the new, split PDF file.
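
For anyone who finds the line-by-line pasting hard to follow, the whole modification might look roughly like this as one script. It is a sketch rather than a tested replacement. It reads the crop box from the original page as teoric suggests, and it also folds in jlhkrans' midpoint arithmetic, which is an addition to the steps above. The helper names `halves` and `split_pdf` are made up, and loading PDF::API2 with `require` inside the function is just a choice so that the arithmetic helper has no dependency. The PDF::API2 calls are the ones already used in this thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the left and right halves of a box (x0, y0, x1, y1),
# split at the horizontal midpoint (jlhkrans' arithmetic).
sub halves {
    my ($x0, $y0, $x1, $y1) = @_;
    my $mid = ($x0 + $x1) / 2;
    return ([$x0, $y0, $mid, $y1], [$mid, $y0, $x1, $y1]);
}

sub split_pdf {
    my ($filename) = @_;
    require PDF::API2;    # loaded lazily; must be installed to split a file

    my $oldpdf = PDF::API2->open($filename);
    my $newpdf = PDF::API2->new;

    for my $page_nb (1 .. $oldpdf->pages) {
        # teoric's fix: take the crop box from the original page
        my @box = $oldpdf->openpage($page_nb)->get_cropbox;

        # import the page twice, once per half
        for my $half (halves(@box)) {
            my $page = $newpdf->importpage($oldpdf, $page_nb);
            $page->cropbox(@$half);
            $page->trimbox(@$half);
            $page->mediabox(@$half);
        }
    }

    (my $newfilename = $filename) =~ s/(.*)\.(\w+)$/$1.clean.$2/;
    $newpdf->saveas($newfilename);
    return $newfilename;
}

# Only runs when a file name is given on the command line.
print split_pdf($ARGV[0]), "\n" if @ARGV;
```

Unlike the original, this version does nothing when called without an argument instead of falling back to 'test.pdf'.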

Many thanks again to iblis and to teoric for their scripts!

Posted By: Jordan_J on July 29, 2014

One more quick problem and resolution I came across, in case it's helpful to other newbies like myself:

Problem: For Mac users, the default TextEdit application is very problematic for Perl because it changes the font and/or encoding of a bunch of things, which Terminal then does not read and interpret accurately.

Resolution: Download and use TextMate instead. So far it works without a hitch for me. And it is free to download.

Also, for newbies, this page offers an excellent intro tutorial for Perl:

http://wardley.org/computers/perl/intro4mac.html

And this page offers very clear steps on installing perl modules using CPAN:

http://triopter.com/archive/how-to-install-perl-modules-on-mac-os-x-in-4-easy-steps/

Cheers.

Posted By: Jordan_J on August 2, 2014

One more quick follow-up on using iblis' script on PDFs that you've already cropped. Two things:

First, I realized after posting what I did above that jlhkrans' snippet IS important to have when you've cropped a PDF file left or right (thanks, jlhkrans!). So paste it over iblis' script just as s/he says, even when you also use teoric's snippet as I described above.

Second, in the end I think there's actually an easier, simpler work-around than doing any of that, which also allows you to use iblis' original script in full just as it is. This approach simply requires using the script below to permanently delete whatever you've cropped in your PDF file, before using iblis' script.

The main issue with cropped files is that, for some strange and infuriating reason, Acrobat doesn't actually provide a way to delete what you crop. It basically just hides it, which can cause a whole lot of headache for subsequent modifications, including this.

Fortunately, a friendly individual was kind enough to create and provide an AppleScript that instantly does just this. That is, it completely deletes whatever you've cropped and resets your cropbox settings to zero. Once you've done that, you can use iblis' script just as it is given above.

To use the AppleScript (which I'll paste below), you open your AppleScript Editor (in Applications-->Utilities). Next, paste the following script into the top half of the screen and it will do what I've described to whatever PDF file you have open:

NOTE: Make sure that only the file you want to delete the croppings from is open when using this script.

tell application "Adobe Acrobat Pro"
    tell active doc
        repeat with i from 1 to count of pages
            tell page i
                set cbox to crop box
                set media box to cbox
            end tell
        end repeat
    end tell
end tell


Next, click "Run" and, voila. Your PDF should now be ready to be split using iblis' original script.

In short, this should solve any cropping issues in the easiest way possible.
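
For those without Acrobat Pro, the same box assignment (media box set to the crop box on every page) can be sketched in Perl with the PDF::API2 calls already used in this thread. This is an untested sketch. The function name `media_to_crop` and the file names in the usage line are placeholders, and it only mirrors the box assignment. Whether the hidden content is actually discarded, as described for the Acrobat route, is not something this sketch establishes.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Set each page's media box to its crop box, mirroring the AppleScript's
# "set media box to cbox". Writes to a separate output file.
# (media_to_crop is a hypothetical name; PDF::API2 must be installed.)
sub media_to_crop {
    my ($infile, $outfile) = @_;
    require PDF::API2;    # loaded here so the file compiles without the module

    my $pdf = PDF::API2->open($infile);
    for my $page_nb (1 .. $pdf->pages) {
        my $page = $pdf->openpage($page_nb);
        my @cbox = $page->get_cropbox;     # (x0, y0, x1, y1)
        $page->mediabox(@cbox);
    }
    $pdf->saveas($outfile);
    return $outfile;
}

# Usage (placeholder names): perl media_to_crop.pl in.pdf out.pdf
media_to_crop(@ARGV[0, 1]) if @ARGV == 2;
```

The output of this step would then be fed to the splitting script in place of the original file.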

Here's where I got the AppleScript from:

http://macproductionartist.wordpress.com/2009/04/01/really-cropping-pages-in-acrobat/

Cheers

Posted By: Jawle on October 30, 2014

Hello, this should be exactly what I'm looking for, thanks!

However I'm very much a newb and am not sure exactly which pieces to replace with file and directory names on my computer. I always get:

Execution of splitPDF.pl aborted due to compilation errors.

Basically, I'm unsure which lines require me to replace part of the code with the pdf filename I want to split.

Thanks.

Posted By: Jawle on November 1, 2014

Figured it out! You can find a clear solution here on perlmonks: http://www.perlmonks.org/?node_id=1105521
