Semantic Fingerprinting – Author Name Disambiguation Service

Semantic Fingerprinting is a managed Web service for scholarly publishers that disambiguates author names and affiliations by leveraging semantic metadata in an existing publishing pipeline. It also serves as an indexing tool that captures authors’ research profiles to improve the descriptive metadata attached to their documents.

What the software does

This new Web service extension suggests the most appropriate subject terms from a publisher’s controlled vocabulary (a taxonomy or thesaurus) to describe a contributing author, their affiliated institutions and other relevant information about the person – based on analysis of document text.

Semantic Fingerprinting generates keyword metadata that’s been enriched with an author’s research profile for documents moving through the content management system (CMS) – it adds the author’s ‘semantic fingerprint.’ The Data Harmony semantic fingerprint captures specific scientific/academic research topics, accurately reflecting subject areas covered by the author’s publications.
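As a rough illustration of the idea, a semantic fingerprint can be pictured as a tally of the controlled-vocabulary terms assigned to an author’s documents. The sketch below is not Data Harmony code: the record layout (`authors`, `subject_terms`) and the function name `build_fingerprints` are assumptions made up for this example, and the real service may weight or select terms quite differently.

```python
from collections import Counter, defaultdict


def build_fingerprints(documents):
    """Tally subject terms per author across all of their documents.

    `documents` is a hypothetical list of dicts with 'authors' and
    'subject_terms' keys; this layout is assumed for illustration only.
    """
    fingerprints = defaultdict(Counter)
    for doc in documents:
        for author in doc["authors"]:
            # Each indexed term on the document counts once toward the author.
            fingerprints[author].update(doc["subject_terms"])
    return dict(fingerprints)


# Invented sample data, not taken from any real collection.
documents = [
    {"authors": ["Chen, L."],
     "subject_terms": ["Plasma physics", "Magnetic confinement"]},
    {"authors": ["Chen, L.", "Ortiz, M."],
     "subject_terms": ["Plasma physics", "Tokamaks"]},
]

fingerprints = build_fingerprints(documents)
top_terms = fingerprints["Chen, L."].most_common(1)
```

Here `top_terms` holds the term that appears most often across the author’s documents, which is the kind of signal a fingerprint is meant to surface.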

Semantic Fingerprinting interface

Powered by M.A.I.™ (Machine Aided Indexer) and driven by the source files

The process is powered by M.A.I.™ and driven by a publisher’s source files: research articles, journals, conference proceedings, theses or other scholarly/academic documents.

The Semantic Fingerprinting Web service mines a publisher’s document collection to build a database of named authors and affiliated institutions, then expands those authority lists over time. The author/affiliation database determines which semantic algorithms M.A.I. deploys to match names in incoming content objects.
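The general pattern of an authority list can be sketched in a few lines: known authors are stored under a normalized form of their name, and incoming names are either matched to exactly one record or set aside for review. This is a minimal sketch under stated assumptions, not the service’s actual matching logic; the record tuples, the helper names `normalize_name`, `build_authority` and `match_incoming`, and the simple exact-match rule are all hypothetical.

```python
import unicodedata


def normalize_name(name):
    """Fold case, accents and punctuation so simple name variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(cleaned.lower().split())


def build_authority(records):
    """Map each normalized name to the {author_id: affiliation} records that share it."""
    authority = {}
    for author_id, name, affiliation in records:
        authority.setdefault(normalize_name(name), {})[author_id] = affiliation
    return authority


def match_incoming(names, authority):
    """Resolve names with exactly one candidate; leave the rest for human review."""
    resolved, unresolved = {}, []
    for name in names:
        candidates = authority.get(normalize_name(name), {})
        if len(candidates) == 1:
            resolved[name] = next(iter(candidates))
        else:
            unresolved.append(name)  # unknown or ambiguous name
    return resolved, unresolved


# Invented sample records: (author_id, name, affiliation).
records = [
    ("A1", "García, María", "University X"),
    ("A2", "Smith, J.", "Institute Y"),
    ("A3", "Smith, J.", "Institute Z"),
]
authority = build_authority(records)
resolved, unresolved = match_incoming(
    ["Garcia, Maria", "Smith, J.", "Okafor, N."], authority
)
```

In this toy run the accent-free spelling resolves to a single record, while the ambiguous name and the unknown name both land in the unresolved list.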

Software features – leveraging semantic patterns to optimize metadata information

Human review is required to resolve remaining author names after the Semantic Fingerprinting application completes its initial entity disambiguation pass on a document. The user interface supports an interactive approach that’s boosted with clues extracted from the input text.

How the process operates: The Semantic Fingerprinting interface presents a list of unresolved entities that Data Harmony identified as probable names but could not match correctly in the author/affiliation database. A human reviewer works through the list, using the ‘Search authors’ pane to find the correct person’s name. The pane supports several kinds of search parameters.

When you click on a questionable entity from the list displayed in the ‘Remaining authors’ pane, the ‘Author Review’ field displays corresponding information for that entity. Semantic Fingerprinting has captured related information about questionable entities so you can view those clues in ‘Author Review’ for effective name resolution.

When the M.A.I. Concept Extractor finds clues during one disambiguation pass, the Semantic Fingerprinting application retains the related information, which increases the accuracy of resolving that author’s name in future searches. Technically, once a questionable entity is resolved, the semantic fingerprint attaches the pertinent subject terms to a single author name entity.
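One way to picture how retained subject terms could help with a later resolution is to score each candidate author by the overlap between their stored terms and the clue terms found in the new document. The sketch below uses Jaccard similarity, which is a swapped-in, generic technique chosen for illustration; the source does not say which measure the application uses, and the names `jaccard` and `rank_candidates` and the sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(clue_terms, candidate_fingerprints):
    """Order candidate author IDs by term overlap with the document's clue terms."""
    scored = [
        (author_id, jaccard(clue_terms, terms))
        for author_id, terms in candidate_fingerprints.items()
    ]
    # Highest score first; ties broken by author ID for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Invented data: clue terms from one document and two same-named candidates.
clue_terms = {"Tokamaks", "Plasma physics"}
candidate_fingerprints = {
    "A2": {"Plasma physics", "Tokamaks", "Magnetic confinement"},
    "A3": {"Organic chemistry", "Catalysis"},
}
ranked = rank_candidates(clue_terms, candidate_fingerprints)
```

A ranking like this would only be a suggestion for the reviewer; the final decision still rests with the person resolving the entity.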

Moving beyond metadata! Your organization can use the semantic fingerprints developed for different author groups to build a knowledge base supporting smarter search and retrieval, CMS applications, research communities and marketing campaigns.

Access Innovations customizes implementation of the Web service extension

Access Innovations provides customization and administration services during configuration for the Semantic Fingerprinting Web service extension.

The graphical user interfaces (GUIs) and entity-matching algorithms are adjustable, because every data set requires a targeted approach. Regular monitoring of the output is recommended to maintain an optimal accuracy level for name entity disambiguation as new semantic patterns appear in the data stream.
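Monitoring of this kind could be as simple as comparing a sample of automatic matches against reviewer-confirmed answers and flagging batches that fall below a target. The following is a minimal sketch under assumptions: the pair layout `(predicted_id, confirmed_id)`, the function names, and the 0.95 threshold are illustrative choices, not values or interfaces taken from the product.

```python
def disambiguation_accuracy(reviewed):
    """Share of automatic matches that a human reviewer confirmed as correct.

    `reviewed` is a hypothetical list of (predicted_id, confirmed_id) pairs.
    """
    if not reviewed:
        raise ValueError("no reviewed matches to evaluate")
    correct = sum(1 for predicted, confirmed in reviewed if predicted == confirmed)
    return correct / len(reviewed)


def needs_attention(reviewed, threshold=0.95):
    """Flag a batch whose accuracy drops below the chosen (assumed) threshold."""
    return disambiguation_accuracy(reviewed) < threshold


# Invented review sample: three confirmed matches and one correction.
sample = [("A1", "A1"), ("A2", "A2"), ("A3", "A7"), ("A4", "A4")]
accuracy = disambiguation_accuracy(sample)
flagged = needs_attention(sample)
```

Tracking this figure per batch over time would show whether new patterns in the data stream are eroding match quality.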

Author disambiguation flowchart

Written by

Data Harmony