PDFBox is a Java API for extracting and highlighting text in PDF files (Adobe Acrobat documents).
Text extraction (from their site):
PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. PDFBox also includes several command line utilities.
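As a rough illustration of the text extraction described above, here is a minimal sketch. It assumes PDFBox 2.x is on the classpath and uses its `PDDocument.load` and `org.apache.pdfbox.text.PDFTextStripper`; in PDFBox 3.x, documents are loaded through `Loader.loadPDF` instead. The class name `ExtractText` and the command-line argument are placeholders for this example, not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical example class; assumes the PDFBox 2.x jar is on the classpath.
public class ExtractText {

    // Loads the PDF, extracts the text of all pages, and closes the document.
    static String extractText(File pdf) throws IOException {
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ExtractText <file.pdf>");
            return;
        }
        System.out.println(extractText(new File(args[0])));
    }
}
```

The try-with-resources block matters here: a `PDDocument` holds file resources and should be closed once the text has been read.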
PDF highlighting (from their site):
There are cases when you might want to highlight text in a PDF document. For example, if the PDF is the result of a search request you might want to highlight the word in the resulting PDF document. There are several ways this can be achieved, each method varying in complexity and flexibility.
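Whichever highlighting method is used, the search term first has to be located in the document's text. The sketch below shows only that step, on text that has already been extracted. It uses plain Java rather than any PDFBox highlighting API, and the class and method names (`TermLocator`, `findMatches`) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds where a search term occurs in already-extracted text.
public class TermLocator {

    // Returns the start offsets of all non-overlapping, case-insensitive matches.
    static List<Integer> findMatches(String text, String term) {
        List<Integer> starts = new ArrayList<>();
        if (term.isEmpty()) {
            return starts; // nothing to search for
        }
        for (int i = 0; i + term.length() <= text.length(); i++) {
            // regionMatches with ignoreCase=true compares in place, so offsets
            // always refer to the original text.
            if (text.regionMatches(true, i, term, 0, term.length())) {
                starts.add(i);
                i += term.length() - 1; // continue after this match
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "PDF search: find the pdf word";
        System.out.println(findMatches(text, "pdf")); // prints [0, 21]
    }
}
```

These are character offsets in the extracted text. Turning them into regions on the page is the part where the methods mentioned above differ in complexity and flexibility.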