MAIChem™

The MAIChem program finds chemical names in text documents such as patents or journal articles. This is a challenge because the number of potential chemical compounds is unlimited and a particular compound can be named in a variety of ways. It is thus impractical to simply match a fixed list of names against the text.

MAIChem's approach is to process the text against regular expressions that match typical chemical morphemes, such as "hydro" or "amine," to see whether they occur in words. This works well as a first approximation, but on its own it cannot distinguish non-chemical words from legitimate chemical names.
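As a rough illustration of this kind of morpheme scan, here is a minimal Python sketch. It is not MAIChem's actual implementation: the morpheme list, the word pattern, and the function name find_candidates are all assumptions made up for the example.

```python
import re

# Hypothetical morpheme list; the patterns MAIChem really uses are not
# described in this document.
MORPHEMES = ["hydro", "amine", "sulf", "chlor", "meth", "eth"]
MORPHEME_RE = re.compile("|".join(MORPHEMES), re.IGNORECASE)

# Simplified notion of a "word": a letter followed by letters, digits or hyphens.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9\-]*")


def find_candidates(text):
    """Return (word, morphemes) pairs for every word containing a morpheme."""
    candidates = []
    for match in WORD_RE.finditer(text):
        word = match.group()
        hits = [m.group().lower() for m in MORPHEME_RE.finditer(word)]
        if hits:
            candidates.append((word, hits))
    return candidates


print(find_candidates("Hydrophobia and methylamine"))
```

Both "Hydrophobia" and "methylamine" come back as candidates here, which shows why a morpheme match alone is only a first approximation.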

Following this initial analysis, additional algorithms are applied, for example to differentiate between the morpheme "hydro" in non-chemical words such as "hydrophobia" and in legitimate chemical names such as "hydrogen sulfate." With all potential chemical morphemes in a document identified, MAIChem uses the morphemes as building blocks to distinguish chemical names from non-chemical text strings. The system also generates a list of synonyms or variations on the names. Your knowledge workers get technical documents tagged with chemical names, for:

• Review and analysis

• Automatic categorization

• Discovery

• Connecting to structure diagramming software

• Loading into a search system
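To make the idea of filtering candidates and tagging a document more concrete, the Python sketch below uses a deliberately crude stand-in for the additional algorithms: a word is tagged only when known morphemes cover most of its letters. The morpheme list, the 0.6 threshold, the <chem> tag format, and the function names are all hypothetical. MAIChem's real rules are not described here, and this heuristic would still mislabel some ordinary words (for example "method").

```python
import re

# Hypothetical morpheme list and threshold, chosen only for illustration.
MORPHEMES = ["hydro", "amine", "meth", "sulf", "chlor", "ate", "ide"]
MORPHEME_RE = re.compile("|".join(MORPHEMES), re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z]+")
MIN_COVERAGE = 0.6  # fraction of a word's letters that morphemes must cover


def coverage(word):
    """Fraction of the word's characters that belong to matched morphemes."""
    covered = sum(len(m.group()) for m in MORPHEME_RE.finditer(word))
    return covered / len(word)


def tag_chemicals(text):
    """Wrap words that look like chemical names in a made-up <chem> tag."""
    def wrap(match):
        word = match.group()
        if coverage(word) >= MIN_COVERAGE:
            return f"<chem>{word}</chem>"
        return word
    return WORD_RE.sub(wrap, text)


print(tag_chemicals("Hydrophobia differs from methylamine"))
```

In this example "methylamine" is tagged because "meth" and "amine" cover 9 of its 11 letters, while "Hydrophobia" is left alone because "hydro" covers only 5 of 11.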

MAIChem can be set up to run as a stand-alone program or over a network, with clients connecting through a web browser.