We've recently discovered PDFMiner, a PDF parser and analyzer written entirely in Python. More precisely, PDFMiner is a suite of tools that includes a parser, a text renderer, and text extraction utilities. Yusuke Shinyama, who is at NYU, has brought these capabilities together in two Python programs: pdf2txt and dumppdf. PDFMiner also supports multi-byte languages with an additional map file. The programs let you find the precise location of text within a PDF, which is a major advantage over many other PDF tools you'll find.
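To give a feel for what locating text on a page can look like, here is a minimal sketch. It assumes the community-maintained pdfminer.six fork is installed (an assumption on our part, since the original PDFMiner release may expose a different API), and the helper names iter_text_boxes and describe are ours, not part of the library. Coordinates are reported in PDF points with the origin at the bottom-left corner of the page.

```python
from typing import Iterator, Tuple

# (x0, y0, x1, y1) in PDF points, origin at the bottom-left of the page
BBox = Tuple[float, float, float, float]


def iter_text_boxes(pdf_path: str) -> Iterator[Tuple[int, BBox, str]]:
    """Yield (page number, bounding box, text) for each text block in a PDF.

    Hypothetical helper: assumes pdfminer.six is installed. The import is
    done lazily so that describe() below works without the library.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_number, page_layout in enumerate(extract_pages(pdf_path), start=1):
        for element in page_layout:
            # Skip figures, lines, and other non-text layout objects
            if isinstance(element, LTTextContainer):
                yield page_number, element.bbox, element.get_text().strip()


def describe(page_number: int, bbox: BBox, text: str) -> str:
    """Format one text block and its position as a single readable line."""
    x0, y0, x1, y1 = bbox
    return f"page {page_number} ({x0:.1f}, {y0:.1f})-({x1:.1f}, {y1:.1f}): {text}"


if __name__ == "__main__":
    import sys

    # Usage: python locate_text.py some_file.pdf
    for item in iter_text_boxes(sys.argv[1]):
        print(describe(*item))
```

Run against a PDF, this prints one line per text block with its page number and bounding box, which is the kind of positional information a pipeline stage could use to tell headers, footers, and body text apart.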
PDFMiner even has a cool PDF to HTML conversion demo page that shows what kinds of things you can do with it, although some of the PDFs we tried did not display with true fidelity. Nonetheless, the tools provide insight into the kinds of things you may be able to do with PDF files in a custom pipeline stage in OpenPipeline, or just for poking around.
The Subversion repository is maintained at Google's code.google.com, and you can download the PDFMiner source from www.unixuser.com.