Krugle searches through your Java, C and C# source code, inside all of your company's source code control systems (such as Visual SourceSafe, CVS and Subversion), to help programmers find specific or similar pieces of code, so that coders are encouraged to reuse existing code rather than reinvent the wheel time after time. They have native parsers for dozens of programming languages, and searchers can adjust how exact a match they want, with Krugle understanding constructs such as for-loops and if-then-else in many languages. We saw their pitch at ESS West and thought it looked pretty useful, especially for larger coding shops.
From their site:
Krugle Enterprise creates a comprehensive, searchable library of all the source code and related information in your organization. It provides answers to costly code maintenance and development problems previously unsolvable because of information boundaries around source code.
Krugle Enterprise eliminates unwanted code duplication and makes developers more proficient with existing code. This results in significant time to market, quality and cost advantages.
CATMaker is an interesting tool for medical text. It helps generate a summary of a case and a "bottom line" assessment. http://www.cebm.net/index.aspx?o=1216
Summary tools can be very helpful when used with search engines. They can help give a better summary in a results list, or when used at index time, give the search engine a smaller, more relevant set of text to search over.
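To make the index-time idea concrete, here is a minimal sketch of an extractive summarizer that keeps only the highest-scoring sentences of a document before it is handed to the indexer. It is not how CATMaker or any particular search engine works; the word-frequency scoring, the tiny stopword list, and the names `summarize`, `split_sentences` and `tokenize` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was",
             "with", "for", "on", "that", "this", "it", "as", "at", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase words and numbers, with stopwords removed."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document-wide frequency of the sentence's words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

The string returned by `summarize` could be indexed in place of the full text, or stored alongside it and shown as the snippet in a results list. A production summarizer would need better sentence splitting and scoring than this.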
The good folks at Parse-O-Matic are offering a free tool for file conversions. It includes a powerful scripting language that lets you set up automatic conversions for any number of files, then leave the tool to do its work while you move on to other tasks, or just go home early! Check out the sample script language.
Is your file in the wrong format? Instead of rekeying it, reformat it with Parse-O-Matic Free Edition — a flexible, programmable data file converter. Avoid the frustrating restrictions of "point and click" converters that almost do the job; with the Parse-O-Matic Free Edition, your scripts tell the program precisely what you want to do.
Sample applications: Edit a text file automatically. Copy valid data; repair or skip bad data. Rearrange a print file. Expedite migration of legacy systems. Export a comma-separated-value (CSV) file for import to a database (such as Access, SQL Server, FileMaker, Oracle, Paradox, ODBC) or a spreadsheet table (Excel, Calc, Quattro). Select and correct data from a mailing or customer list (first name, last name, street address, city, phone number and so on). Generate mail merge sets. Modify character strings into uppercase, lowercase or mixed case. Calculate totals up to 18 digits long.
Planning a data warehouse system? You can split or reorganize files per your specification (regular expressions supported), or mine printed reports for essential information.
Input formats: Read, extract, analyze and reorganize data fields from flat files such as: text (ASCII from Windows, Unix/Linux or Mac, EBCDIC from a mainframe, plus log files from web servers, process control devices and scientific instruments); binary ("hex"); fixed length and variable length records; tab/null/comma-delimited; Windows clipboard.
Working with large files? The Parse-O-Matic Free Edition can convert, filter and transform files of almost any size. If you have enough room on your hard disk to copy it, then you can probably parse it.
Output formats: Almost any record or file format, including HTML and XML. You can also write text to the Windows clipboard so it can be directly pasted into other applications.
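For readers who have not seen this kind of scripted conversion, the sketch below shows the general idea in Python: read fixed-length records, skip bad ones, fix the case of names, and write a CSV file. It is not Parse-O-Matic's script language, and the field layout, the validity rule, and the names `LAYOUT`, `parse_record`, `is_valid` and `convert` are hypothetical.

```python
import csv
import io

# Hypothetical fixed-width layout: (field name, start column, end column).
LAYOUT = [("first_name", 0, 10), ("last_name", 10, 22), ("phone", 22, 34)]


def parse_record(line):
    """Slice one fixed-width line into a dict of trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


def is_valid(record):
    """Illustrative rule: require a last name and a digits-and-dashes phone."""
    phone = record["phone"]
    phone_ok = bool(phone) and all(c.isdigit() or c == "-" for c in phone)
    return bool(record["last_name"]) and phone_ok


def convert(lines):
    """Return (csv_text, skipped_count) for an iterable of fixed-width lines."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[name for name, _, _ in LAYOUT],
                            lineterminator="\n")
    writer.writeheader()
    skipped = 0
    for line in lines:
        record = parse_record(line.rstrip("\n"))
        if not is_valid(record):
            skipped += 1  # skip bad data rather than copying it
            continue
        # Normalize names to mixed case before writing.
        record["first_name"] = record["first_name"].title()
        record["last_name"] = record["last_name"].title()
        writer.writerow(record)
    return out.getvalue(), skipped
```

Passing an open file object to `convert` processes it one line at a time, so the input never has to fit in memory, although this sketch does build the whole CSV output as a single string.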
We've recently discovered PDFMiner, a PDF parser and analyzer written entirely in Python. PDFMiner is actually a suite of tools including a parser, a text renderer, and tools for extracting text. Yusuke Shinyama, who is at NYU, has brought together a number of capabilities in the form of two Python programs: pdf2txt and dumppdf. PDFMiner also supports multi-byte languages with an additional map file. The programs let you find the precise location of text within a PDF, which is a major advantage over many other PDF tools you'll find.
PDFMiner even has a cool PDF to HTML conversion demo page where you can see what kinds of things it can do, although some of the PDFs we tried did not display with true fidelity. Nonetheless, the tools provide insight into the kinds of things you may be able to do with PDF files in your custom pipeline stage, in OpenPipeline, or just for poking around.
FormatFactory converts between various multimedia audio and video file formats. This might be useful for search engines trying to do audio mining (speech to searchable text).
From their website:
FormatFactory is a multifunctional media converter. Provides functions below: All to MP4/3GP/MPG/AVI/WMV/FLV/SWF. All to MP3/WMA/MMF/AMR/OGG/M4A/WAV. All to JPG/BMP/PNG/TIF/ICO/.... Rip DVD to video file. MP4 files support iPod/iPhone/PSP format. Source files support RMVB.
FormatFactory's Feature: 1 support converting all popular video,audio,picture formats to others. 2 Repair damaged video and audio file. 3 Reducing Multimedia file size. 4 Support iphone,ipod multimedia file formats. 5 Picture converting supports Zoom,Rotate/Flip,tags. 6 DVD Ripper.