Posted By

markt22 on 10/12/17


Tagged

PDFBox



PDFBox: Extract all text from a document


Published in: Java
 

Opens an existing PDF, extracts all of its text with PDFTextStripper, and prints the text to standard output.

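import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper; // PDFBox 2.x package

/**
 * @param args the command line arguments
 * @throws java.io.IOException if the PDF cannot be read or parsed
 */
public static void main(String[] args) throws IOException {
    // The path is left empty in the snippet; set it to the PDF to read.
    File file = new File("");

    // try-with-resources closes the document even if extraction fails.
    try (PDDocument document = PDDocument.load(file)) {
        PDFTextStripper pdfStripper = new PDFTextStripper();

        // getText returns the text of every page as one string.
        String text = pdfStripper.getText(document);
        System.out.println(text);
    }
}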
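
The snippet leaves the file path empty, so loading will fail until a real path is supplied. One possible way to fill it in is to take the path from the command line arguments and check it before handing it to PDFBox. The sketch below uses only the standard library; the class name PdfPathArgs and the helper resolvePdf are hypothetical names chosen for illustration and are not part of PDFBox or of the original snippet.

```java
import java.io.File;

class PdfPathArgs {
    /**
     * Hypothetical helper: returns the file named by the first argument,
     * or throws if the argument is missing or is not a readable regular file.
     */
    static File resolvePdf(String[] args) {
        if (args == null || args.length == 0 || args[0].isEmpty()) {
            throw new IllegalArgumentException("usage: <path-to-pdf>");
        }
        File file = new File(args[0]);
        if (!file.isFile() || !file.canRead()) {
            throw new IllegalArgumentException("not a readable file: " + file);
        }
        return file;
    }

    public static void main(String[] args) {
        // In the snippet above, this File would replace new File("").
        File file = resolvePdf(args);
        System.out.println("Would extract text from: " + file.getAbsolutePath());
    }
}
```

Failing early with a clear message is a design choice here; the check only confirms that the file exists and is readable, not that it is a valid PDF.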
