Concept Extractor Suite

Imagine having a full record built automatically. You choose the fields or data elements you need. We do the rest. The Data Harmony Concept Extractor suite includes Automatic Summarization, Metadata Extractor, Thesaurus Master, and M.A.I. These tools work together to take full-text documents in PDF, a Microsoft Office format (Word, Excel, PowerPoint), or Sun Open Office and convert them to fully metatagged records. The resulting bibliographic citation, with an abstract and subject indexing from the thesaurus, is then available as a full XML golden record for deposit in a database, web CMS, or document management system. The citation can go through editorial review to bring it up to publication level, or it can be used as is for an accurate indication of the contents of an entire collection. The suite can be run in batch mode on large legacy data sets or interactively as each item is submitted to the repository.