PDFBox is an API for extracting and highlighting text from Adobe Acrobat documents (AKA PDF files).
Text Extraction (from their site)
PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. PDFBox also includes several command line utilities.
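As a rough illustration of the text extraction described above, here is a minimal sketch. It assumes the PDFBox 2.x API (PDDocument.load and org.apache.pdfbox.text.PDFTextStripper) and that the PDFBox jar is on the classpath; PDFBox 3.x loads documents through Loader.loadPDF instead, and other versions may differ. The class name PdfTextExtractor is made up for this example.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class, not part of PDFBox itself.
public class PdfTextExtractor {

    // Returns the plain text of every page in the given PDF file.
    // Assumes the PDFBox 2.x API; adjust the load call for other versions.
    public static String extractText(File pdfFile) throws IOException {
        // PDDocument is Closeable, so try-with-resources releases the file.
        try (PDDocument document = PDDocument.load(pdfFile)) {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfTextExtractor <file.pdf>");
            return;
        }
        System.out.println(extractText(new File(args[0])));
    }
}
```

In a custom pipeline stage you would probably call extractText on each fetched file and pass the returned string along to whatever does your indexing.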
PDF Highlighting (from their site)
There are cases when you might want to highlight text in a PDF document. For example, if the PDF is the result of a search request you might want to highlight the word in the resulting PDF document. There are several ways this can be achieved, each method varying in complexity and flexibility.
FreewareFiles.com
We've recently discovered FreewareFiles.com, which has tools you may be able to use with PDF files in your custom pipeline stage, in OpenPipeline, or just for poking around.
These folks offer quite a few free and low-cost file conversion tools for PDF, which would be handy to have if you were writing a web spider that needed to handle PDF files.
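If you are writing such a spider, one small piece you will likely need is a way to tell whether a fetched file is a PDF at all before handing it to a converter. The sketch below is one possible approach using only the Java standard library: it checks for the "%PDF-" marker that PDF files begin with. The class and method names (PdfSniffer, looksLikePdf) are invented for this example, and it only inspects the very first bytes, so treat it as a quick filter rather than real validation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for a spider; not taken from any library.
public class PdfSniffer {

    // PDF files start with this ASCII marker, followed by a version number.
    private static final byte[] PDF_MAGIC =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the data begins with the PDF marker.
    public static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads just enough bytes from the stream to check the marker.
    // Note: this consumes the bytes it reads from the stream.
    public static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int filled = 0;
        while (filled < head.length) {
            int n = in.read(head, filled, head.length - filled);
            if (n < 0) {
                return false; // stream ended before a full marker was read
            }
            filled += n;
        }
        return looksLikePdf(head);
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] html = "<html></html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(pdf));  // prints true
        System.out.println(looksLikePdf(html)); // prints false
    }
}
```

Because the stream version consumes the bytes it inspects, a spider would either re-open the resource afterwards or wrap it in a BufferedInputStream and use mark and reset around the check.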