M.A.I.™ Components

Whether you run a human or an automatic indexing system, the quality of document indexing depends on the quality of the individual components (including the human one) and on how smoothly they work together.

Machine Aided Indexer (M.A.I.) consists of four components that work together to deliver high-quality document indexing from the start and to keep it accurate as your vocabulary grows in size and complexity:

  • Rule Builder
  • Rule Base
  • Concept Extractor
  • Statistics Collector

Rule Builder

The Rule Builder is an interactive module in which the user creates the rules used in automatic indexing. Rules are written with a large selection of rule-language conditions, including proximity (within three words, in the same sentence, within 250 words, etc.), format (all caps, truncate, wildcards, etc.), location (begin sentence, end sentence, in title, etc.), and match. In the Rule Builder, the user can also edit, extend, and review existing rules, or search for a set of terms or rules.
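To make the idea of a proximity condition concrete, here is a minimal sketch of how a "within N words" rule could be evaluated. The `ProximityRule` structure and the `rule_fires` function are hypothetical illustrations, not the actual M.A.I. rule syntax, and the sketch assumes that "within three words" means the two words are at most three word positions apart.

```python
import re
from dataclasses import dataclass


# Hypothetical rule structure for illustration only; this is NOT
# the real M.A.I. rule language.
@dataclass
class ProximityRule:
    term: str     # indexing term to suggest when the rule fires
    word_a: str   # first word to look for
    word_b: str   # second word to look for
    within: int   # maximum distance between the two words, in word positions


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rule_fires(rule: ProximityRule, text: str) -> bool:
    """Return True if word_a and word_b occur at most `within` words apart."""
    words = tokenize(text)
    positions_a = [i for i, w in enumerate(words) if w == rule.word_a]
    positions_b = [i for i, w in enumerate(words) if w == rule.word_b]
    return any(abs(i - j) <= rule.within for i in positions_a for j in positions_b)


rule = ProximityRule(term="Climate change", word_a="climate", word_b="warming", within=3)
print(rule_fires(rule, "Global warming is driving climate policy."))  # True
print(rule_fires(rule, "The climate report, released after years of study, mentions warming."))  # False
```

Format and location conditions could be modeled the same way, as additional predicates that must all hold before the rule suggests its term.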

Rule Base

The Rule Base is the collection of rules and valid terms used to index the data set automatically. It is built with the Rule Builder and used by the Concept Extractor to select suggested indexing terms.
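Because the Rule Base pairs rules with valid vocabulary terms, a simple way to picture it is a structure that refuses rules for terms outside the vocabulary and can report which terms still lack a rule. This is a hypothetical sketch: the `RuleBase` class, its methods, and the phrase-based rules are illustrative assumptions, and real M.A.I. rule files use their own format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical rule base for illustration; not the M.A.I. file format.
@dataclass
class RuleBase:
    valid_terms: set[str]
    # Maps each valid term to the trigger phrases whose presence suggests it.
    rules: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))

    def add_rule(self, term: str, phrase: str) -> None:
        """Attach a trigger phrase to a term, rejecting terms not in the vocabulary."""
        if term not in self.valid_terms:
            raise ValueError(f"{term!r} is not a valid vocabulary term")
        self.rules[term].append(phrase.lower())

    def terms_without_rules(self) -> set[str]:
        """Return the valid terms that no rule can suggest yet."""
        return {t for t in self.valid_terms if not self.rules.get(t)}


rb = RuleBase(valid_terms={"Solar energy", "Wind energy", "Hydropower"})
rb.add_rule("Solar energy", "photovoltaic")
rb.add_rule("Solar energy", "solar panel")
rb.add_rule("Wind energy", "wind turbine")
print(rb.terms_without_rules())  # {'Hydropower'}

try:
    rb.add_rule("Nuclear energy", "reactor")
except ValueError as err:
    print(err)  # 'Nuclear energy' is not a valid vocabulary term
```

Checking rules against the vocabulary at build time keeps the Concept Extractor from ever suggesting a term that is not part of the controlled vocabulary.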

Concept Extractor

The Concept Extractor compares the text of each document with the Rule Base and presents the suggested terms to the user, who can accept them for indexing or discard them. (Alternatively, it can apply indexing terms automatically.) It recognizes all of the conditions set in the Rule Base and also recognizes data in tagged strings so that this data can be given special treatment.
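As one illustration of how tagged strings and a location condition such as "in title" might interact, the sketch below splits simply tagged text into sections and checks whether a phrase occurs inside the title. The `<title>...</title>` markup, the `tagged_sections` helper, and the `in_title` function are all hypothetical; M.A.I.'s actual handling of tagged data may differ.

```python
import re


def tagged_sections(text: str) -> dict[str, str]:
    """Split text marked up as <tag>...</tag> into {tag: content}.

    If a tag appears more than once, the last occurrence wins.
    """
    pairs = re.findall(r"<(\w+)>(.*?)</\1>", text, flags=re.DOTALL)
    return {tag.lower(): body for tag, body in pairs}


def in_title(phrase: str, text: str) -> bool:
    """Hypothetical location condition: does the phrase occur in the <title> section?"""
    title = tagged_sections(text).get("title", "")
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None


doc = "<title>Offshore Wind Turbine Costs</title><body>Solar panels are mentioned here.</body>"
print(in_title("wind turbine", doc))  # True
print(in_title("solar panel", doc))   # False
```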

Once the Concept Extractor selects the terms, it ranks them and presents the most frequently mentioned ones to the user for review. It presents the top 20 terms, a number that can be changed for individual applications.
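The select-then-rank step can be sketched as counting how often each term's trigger phrases occur in a document and keeping the top N, with N defaulting to 20 as described above. This is a minimal sketch under the assumption that ranking is by raw mention count; the `suggest_terms` function and the phrase-list rule format are hypothetical, and M.A.I.'s actual ranking may weigh other factors.

```python
import re
from collections import Counter


def suggest_terms(text: str, rules: dict[str, list[str]],
                  top_n: int = 20) -> list[tuple[str, int]]:
    """Count mentions of each term's trigger phrases and return the top_n terms."""
    lowered = text.lower()
    counts = Counter()
    for term, phrases in rules.items():
        for phrase in phrases:
            # \b anchors keep a phrase like "wind" from matching inside "window".
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            hits = len(re.findall(pattern, lowered))
            if hits:
                counts[term] += hits
    # most_common() orders terms from most to least frequently mentioned.
    return counts.most_common(top_n)


rules = {
    "Solar energy": ["solar panel", "photovoltaic"],
    "Wind energy": ["wind turbine", "wind farm"],
    "Energy storage": ["battery"],
}
text = ("The wind farm added a battery bank. Each wind turbine near the wind farm "
        "feeds the battery, and a small solar panel array helps.")
print(suggest_terms(text, rules))
# [('Wind energy', 3), ('Energy storage', 2), ('Solar energy', 1)]
print(suggest_terms(text, rules, top_n=2))
# [('Wind energy', 3), ('Energy storage', 2)]
```

Making `top_n` a parameter mirrors the per-application setting mentioned above.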

Statistics Collector

The Statistics Collector keeps a record of the documents processed by the Concept Extractor, together with the terms M.A.I. suggested and the terms the editor chose. It stores this information and groups it into three categories: M.A.I. suggestions; terms the editor chose for indexing that M.A.I. did not suggest (misses); and terms M.A.I. suggested that the editor did not select (noise).

The “make statistics” function creates a list of “miss” and “noise” terms that the user can review when revising existing rules and building new ones. By using the Statistics Collector together with the Rule Builder, the user can collect the terms that need a rule and review the relevant text for meaning and context, steadily improving the accuracy of the system.
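Misses and noise, as defined above, are set differences between the suggested and chosen terms for each document, and a "make statistics" style report can tally them across all processed documents. The sketch below is a hypothetical illustration: the `DocumentRecord` structure and the `make_statistics` function are assumptions, not M.A.I.'s actual storage format or function.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical per-document record for illustration.
@dataclass
class DocumentRecord:
    doc_id: str
    suggested: set[str]  # terms M.A.I. suggested
    chosen: set[str]     # terms the editor assigned


def make_statistics(records: list[DocumentRecord]) -> tuple[Counter, Counter]:
    """Tally misses and noise across all processed documents."""
    misses, noise = Counter(), Counter()
    for rec in records:
        misses.update(rec.chosen - rec.suggested)  # chosen but never suggested
        noise.update(rec.suggested - rec.chosen)   # suggested but not chosen
    return misses, noise


records = [
    DocumentRecord("doc1", suggested={"Solar energy", "Wind energy"},
                   chosen={"Solar energy", "Energy policy"}),
    DocumentRecord("doc2", suggested={"Wind energy", "Hydropower"},
                   chosen={"Hydropower", "Energy policy"}),
]
misses, noise = make_statistics(records)
print(misses.most_common())  # [('Energy policy', 2)]
print(noise.most_common())   # [('Wind energy', 2)]
```

In this toy data, the repeated miss suggests "Energy policy" needs a new rule, and the repeated noise suggests the rule for "Wind energy" fires too broadly.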


MAI Batch GUI is the graphical user interface for M.A.I. batch processing, included in M.A.I. and MAIstro™ software installations beginning with Data Harmony® Version 3.9.

Written by

Data Harmony