Metadata Extractor - Expert Metadata Management

The Data Harmony® Metadata Extractor is a metadata management tool that automatically builds a document record. Metadata Extractor converts raw, unstructured, or semi-structured information into structured information. It works with any digital document, such as HTML Web pages, office documents, and PDFs. It extracts not only common entities such as dates, names, and numbers, but also custom, client-specific metadata fields such as titles, publication dates, and document types.

Allows for Customized Metadata Fields

Metadata Extractor uses innovative technology that enables users to define virtually any metadata field or entity to be extracted from a document. Positional and formatting information is fed into an inference engine, which the program uses to work out where each field sits and extract it.
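As a rough illustration of how positional and formatting cues can drive field extraction, here is a minimal Python sketch. It is not Data Harmony code and does not reflect the product's actual inference engine. The `Line` record, the `extract_fields` function, the `header_window` parameter, and the two rules (largest font near the top is the title, first ISO-style date is the publication date) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical representation of one line of a parsed document."""
    text: str
    index: int        # positional cue: 0 is the first line of the document
    font_size: float  # formatting cue, e.g. recovered from HTML or PDF layout


# Assumed date format for this sketch: YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_fields(lines, header_window=5):
    """Apply two illustrative rules and return the fields that were found."""
    fields = {}

    # Rule 1 (position + formatting): among the first few non-empty lines,
    # treat the one with the largest font as the title.
    header = [ln for ln in lines if ln.index < header_window and ln.text.strip()]
    if header:
        fields["title"] = max(header, key=lambda ln: ln.font_size).text.strip()

    # Rule 2 (pattern): the first date-like string is the publication date.
    for ln in lines:
        match = DATE_PATTERN.search(ln.text)
        if match:
            fields["publication_date"] = match.group(0)
            break

    return fields


sample = [
    Line("ACME Quarterly", 0, 10.0),
    Line("Trends in Controlled Vocabularies", 1, 18.0),
    Line("Published 2020-03-15", 2, 10.0),
    Line("Body text starts here.", 3, 10.0),
]
result = extract_fields(sample)
```

A real inference engine would presumably weigh many such cues together. The sketch only shows how a user-defined field can be tied to where text appears and how it is formatted.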

Integrated with the existing Data Harmony tools for metadata management such as MAIstro and M.A.I., Metadata Extractor uses domain knowledge (from thesauri, ontologies, authority files) as well as positional inference. An author's name, for example, can be recognized in various versions (with middle initial or middle name) but recorded in the preferred standardized format.
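The author-name example can be sketched in a few lines of Python. This is an illustrative assumption, not the way MAIstro or M.A.I. store or match authority data: the `AUTHORITY` table, the `name_key` and `normalize_author` functions, and the crude surname-plus-first-initial matching rule are all hypothetical.

```python
import re

# Hypothetical authority file: lookup key -> preferred standardized form.
AUTHORITY = {
    ("smith", "j"): "Smith, John A.",
}


def name_key(raw):
    """Reduce a name variant to (surname, first initial), both lowercase."""
    if "," in raw:
        # "Surname, Given ..." order
        surname, given = raw.split(",", 1)
        given_tokens = re.sub(r"\.", " ", given).split()
        surname = surname.strip()
    else:
        # "Given ... Surname" order
        tokens = re.sub(r"\.", " ", raw).split()
        surname, given_tokens = tokens[-1], tokens[:-1]
    first_initial = given_tokens[0][0].lower() if given_tokens else ""
    return (surname.lower(), first_initial)


def normalize_author(raw):
    """Return the preferred form if the authority file has one, else the input."""
    return AUTHORITY.get(name_key(raw), raw.strip())


variants = ["John A. Smith", "John Allen Smith", "Smith, John"]
normalized = [normalize_author(v) for v in variants]
```

All three variants resolve to the single preferred entry, while a name with no authority record is passed through unchanged. Matching on surname and first initial alone would conflate different people in practice, so a production authority file needs stricter rules.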

Document Conversion Made Simple

Metadata Extractor can convert legacy documents, once they have been run through an OCR program, into structured records, or it can automatically populate a check-in form for a document repository. It can also be combined with M.A.I.'s automatic document indexing capabilities to supply rich metadata descriptors for the document from a controlled vocabulary. The user simply validates and uploads the record, then moves on immediately to the next document.

Metadata Extractor is a must-have tool to add to your metadata management strategy.