Common Tasks Using Metaphrase® Java™ API

Introduction

This document provides an introduction to the Java-based API to Metaphrase, and some suggestions for building an application which uses it.

All classes are in the package COM.Lexical.Metaphrase.

The examples given below all use <>'s to indicate the Concept with a given name, as distinguished from the name itself (a String); e.g. <common cold> for the Concept in the UMLS Metathesaurus with Concept ID "C0009443" and named by the string "common cold".

Getting Started

Metaphrase Guide's primary function is to support user-directed navigation from informal phrases (strings) to authoritative vocabulary, towards achieving semantic normalization of data in clinical information systems. As explained in the Metaphrase User Guide, this is accomplished by a combination of lexical and semantic navigation.

The information in the Metaphrase Thesaurus is organized around the Concept class. A Concept is uniquely identified by a Concept ID (java.lang.String conceptID()). Concepts are used to represent a semantic unit, or "meaning," in the Metaphrase Thesaurus. The particular names associated with a given concept will change over time, and concepts may occasionally be removed; but as long as the meaning of the concept does not change, there will be only one Concept ID for each meaning named in the thesaurus.[1]


The first step in using this API is to instantiate an implementation of the abstract class Metaphrase--currently RMIMetaphrase. For example:

  metaphrase = new RMIMetaphrase("//" + address + "/RemoteMetaphrase", database, username, password);

For evaluation purposes, address should be set to demo.lexical.com. The second argument names the database to use; set it to null to access the default database.

Within the constructor, a call is made to the server to load some commonly used objects, such as all the Sources and SemanticTypes.

A Typical Interaction

A typical Metaphrase interaction begins with a lexical search. Lexical navigation is supported by the Enumeration matches(java.lang.String, int) method of Metaphrase, which takes a String and an int limit and returns an Enumeration of Matches, each of which identifies a matching Concept (Concept concept()).

Depending on the intent of individual applications, different types of information about each Match can be displayed to the user. For applications in which a user is trying to locate a Concept, either all the Matches to a given Concept should be presented together, or only the first (the preferred Term) should be shown. For example, in the '98 Metaphrase Thesaurus, a search for "munch." will return two different Matches for Concept C0026785, <Munchausen's syndrome>--one for the preferred Term and one for the spelling with two h's. Rather than displaying them as separate items in a list, either one should be dropped or both forms should be shown together as a single item. Also, users may or may not wish to see an indication of the semanticTypes() or semanticClass() of the Concept for each Match.
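The grouping described above can be sketched as follows. This is a minimal illustration, not code from the Metaphrase distribution: the nested Concept and Match interfaces are stand-ins that model only the conceptID() and concept() accessors named in this document, and MatchGrouping and groupByConcept are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MatchGrouping {
    // Stand-ins (assumptions) for the classes in COM.Lexical.Metaphrase;
    // only the accessors named in this document are modelled.
    interface Concept { String conceptID(); }
    interface Match { Concept concept(); }

    /**
     * Groups Matches by the Concept ID of the Concept they identify.
     * Groups appear in the order in which each Concept is first seen,
     * and Matches keep their original order within a group.
     */
    static Map<String, List<Match>> groupByConcept(Enumeration<? extends Match> matches) {
        Map<String, List<Match>> groups = new LinkedHashMap<>();
        while (matches.hasMoreElements()) {
            Match match = matches.nextElement();
            String id = match.concept().conceptID();
            groups.computeIfAbsent(id, key -> new ArrayList<>()).add(match);
        }
        return groups;
    }
}
```

An interface could then show one list item per map entry, using either the first Match in each group or all of them together. With the real API, the elements of the Enumeration returned by matches(...) would presumably need to be cast to Match first.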

For many applications, users will also want to be able to bring up the definitions() for each of the Concepts.

Using Relationships

Semantic navigation can be supported in a number of different ways, but the most common means is the use of Relationships. The Relationship[] relationships() method of Concept returns an array of the relationships this Concept has to other Concepts, each of which can be accessed by the Concept concept2() method of Relationship (inherited from Association).

For example, a user who enters "diabetes" will often be looking for a "narrower" Concept, such as <Diabetes insipidus> or <Diabetes mellitus>. In the '98 Metaphrase Thesaurus, these two concepts can be obtained via "RN" Relationships of the Concept .

In fact, the user who enters "diabetes" may be looking for an even narrower Concept, such as <Insulin dependent diabetes mellitus>, which is narrower than <Diabetes mellitus>. This must be provided for in the user interface.

A user may even be looking for something whose name they do not know, cannot remember, or cannot spell, but for which they do know a related Concept. For example, such a user might type "diabetes" to get to <scleredema>, which has an "RO" Relationship from <Diabetes mellitus>. An interface with rich navigational capabilities makes such interactions possible.
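A user interface that offers this kind of multi-step navigation needs some way to walk outward from a starting Concept. The sketch below does so breadth-first, using only the relationships() and concept2() methods described above. The nested interfaces are stand-ins for the real classes, and RelationshipWalk and reachable are hypothetical names. Because this document does not name the accessor that exposes a Relationship's type (such as "RN" or "RO"), the filter is passed in as a Predicate rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class RelationshipWalk {
    // Stand-ins (assumptions) for the classes in COM.Lexical.Metaphrase.
    interface Concept {
        String conceptID();
        Relationship[] relationships();
    }
    interface Relationship { Concept concept2(); }

    /**
     * Collects the Concepts reachable from start in at most maxDepth steps,
     * following only Relationships accepted by keep. Each Concept is
     * reported once, nearest first; start itself is not included.
     */
    static List<Concept> reachable(Concept start, Predicate<Relationship> keep, int maxDepth) {
        List<Concept> found = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(start.conceptID());
        List<Concept> frontier = new ArrayList<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<Concept> next = new ArrayList<>();
            for (Concept current : frontier) {
                for (Relationship r : current.relationships()) {
                    if (!keep.test(r)) {
                        continue;
                    }
                    Concept target = r.concept2();
                    // Set.add returns false for Concept IDs already visited.
                    if (seen.add(target.conceptID())) {
                        found.add(target);
                        next.add(target);
                    }
                }
            }
            frontier = next;
        }
        return found;
    }
}
```

Tracking visited Concept IDs keeps the walk from looping if two Concepts are related to each other in both directions.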

Accessing Sources

Some applications require that the user, or perhaps even a software program, navigate to one or more names or codes in a particular Source. For example, a "coding" application could be written to help navigate from clinical language or concepts to ICD9/CM codes for billing purposes. While Metaphrase is not sufficient for automating this particular task, an appropriate Metaphrase Enabled interface could help people perform it more efficiently and consistently.

If starting from free text, the first step would be a lexical search to get to an appropriate Concept, or set of Concepts, as above.

In other cases, the starting point will be coded items from some other vocabulary source, such as SNOMED International. In this case, one could use the source code(s) to get to the nearest concept(s). For example, given the SNOMED code "DB-61000", the following method would take the code and a Metaphrase instance and return the relevant thesaurus Concept:

  // Look up the SNOMED International Source, then the Partition named by
  // the code, then the Concept nearest to that Partition.
  Concept fromSNMI(String code, Metaphrase m)
      throws MetaphraseException {
    Source snmi = m.source("SNMI97");
    Partition partition = snmi.partition(code);
    return partition.concept();
  }

Or, more succinctly (if less clearly):

  Concept fromSNMI(String code, Metaphrase m)
      throws MetaphraseException {
    return m.source("SNMI97").partition(code).concept();
  }

Note the use of the Partition class. A code is simply a String; a Partition is the informational and semantic unit which the code names in a particular Source. One way of obtaining a Partition instance is through the Partition partition(String code) method of Source. Thus, a Partition has two primary attributes: a source() and a code(). From a Partition, the concept() method is the most reliable method for obtaining a semantically proximate Concept.


Once one or more Concepts have been obtained, a source code can be found by navigating to the Partition it names (introduced above). The most straightforward method for obtaining Partitions from a Concept is, oddly enough, Partition[] partitions(Source source). Unfortunately, this method often returns nothing (i.e., an empty array). In general, one needs to deal with Atoms.

An Atom is an occurrence of a name (a String) in a Source. For Sources which group multiple names under codes, an Atom can also be described as an association between a name and a Partition. Most sources also differentiate between different types of names (often called "terms" within the context of a single vocabulary), e.g., Preferred Terms, Synonyms, Abbreviations, Main Headings, and Entry Terms. These different types of Atoms are identified in the thesaurus by a two-letter coded attribute called termgroup, or termgp(). The examples above are given termgroups "PT", "SY", "AB", "MH", and "ET", respectively. The atoms() method of Partition returns the Atoms which fall under the Partition's code.

Each Atom is also associated with a single concept(). Different Atoms from the same Partition may be associated with different Concepts; and a single Concept may occasionally be associated with Atoms from more than one Partition within a single Source. This is because most sources, such as ICD-9, CPT, and MeSH, are classifications, and frequently group more than one meaning under the same code. And even those that try not to, such as SNOMED and Read, will sometimes have a different view of "conceptness" from each other or from the Metaphrase Thesaurus.

The full list of Atoms that are associated with a given Concept can be accessed by the method Atom[] atoms(). A method Atom[] atoms(Source source) is also provided; it naturally returns the subset of atoms() which are from the given Source. It is actually from these associations that the partitions() association is derived--i.e., partitions(source) returns exactly the set of Partitions associated with the Atoms in atoms(source).

In cases where atoms(source) is empty for a given Source, an additional method Atom[] neighborhoodAtoms(Source source) is provided. It returns the Atoms associated with any Concept in the "semantic neighborhood" of this Concept, that is, the set of Concepts to which this Concept has relationships. Of course, calling neighborhoodAtoms(source) is much simpler, and more efficient, than writing a loop over the relationships from a Concept.
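For comparison, the hand-written loop that neighborhoodAtoms(source) replaces might look roughly like the sketch below. It uses only relationships(), concept2(), and atoms(Source) as described in this document. The nested interfaces are stand-ins for the real classes, and NeighborhoodLoop and atomsOfRelatedConcepts are hypothetical names. Whether the real method removes duplicates or orders its results this way is not stated here, so treat both as assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class NeighborhoodLoop {
    // Stand-ins (assumptions) for the classes in COM.Lexical.Metaphrase.
    interface Source { }
    interface Atom { }
    interface Relationship { Concept concept2(); }
    interface Concept {
        Relationship[] relationships();
        Atom[] atoms(Source source);
    }

    /**
     * Gathers the Atoms from the given Source for every Concept that the
     * given Concept has a Relationship to. Each Atom is reported once,
     * in the order first encountered.
     */
    static List<Atom> atomsOfRelatedConcepts(Concept concept, Source source) {
        Set<Atom> result = new LinkedHashSet<>();
        for (Relationship r : concept.relationships()) {
            for (Atom atom : r.concept2().atoms(source)) {
                result.add(atom);
            }
        }
        return new ArrayList<>(result);
    }
}
```

Against a remote server, a loop like this may cost one round trip per related Concept, which is one plausible reason the built-in method is the more efficient choice.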

There is also a method Atom[] neighborhoodAtoms(Source source, String[] termgps), which returns the subset of neighborhoodAtoms(source) whose termgroup is one of those given. E.g., when navigating to MeSH, one might want to search for just the termgroups {"MH","HT","EP","EN"}, eliminating the inversions, lexical variants, and supplementary chemicals.
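The same kind of termgroup filtering can be applied on the client side to any array of Atoms, for instance the result of atoms(source). The sketch below assumes that termgp() returns the two-letter code as a String. The Atom interface is a stand-in for the real class, and TermgroupFilter and withTermgroups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TermgroupFilter {
    // Stand-in (assumption) for COM.Lexical.Metaphrase.Atom; termgp() is
    // assumed to return the two-letter termgroup code as a String.
    interface Atom { String termgp(); }

    /** Returns the Atoms whose termgroup is one of termgps, in their original order. */
    static List<Atom> withTermgroups(Atom[] atoms, String[] termgps) {
        Set<String> wanted = new HashSet<>(Arrays.asList(termgps));
        List<Atom> kept = new ArrayList<>();
        for (Atom atom : atoms) {
            if (wanted.contains(atom.termgp())) {
                kept.add(atom);
            }
        }
        return kept;
    }
}
```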

Finally, nearestAtoms(Source source) is provided as shorthand for "atoms(source) if any, otherwise neighborhoodAtoms(source)".
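Spelled out, that shorthand amounts to something like the following sketch. It assumes that atoms(source) signals "none" with an empty array rather than null. The nested interfaces are stand-ins for the real classes, and NearestAtoms and nearest are hypothetical names.

```java
class NearestAtoms {
    // Stand-ins (assumptions) for the classes in COM.Lexical.Metaphrase.
    interface Source { }
    interface Atom { }
    interface Concept {
        Atom[] atoms(Source source);
        Atom[] neighborhoodAtoms(Source source);
    }

    /**
     * Returns the Concept's own Atoms from the Source if there are any,
     * and otherwise falls back to the Atoms of its semantic neighborhood.
     */
    static Atom[] nearest(Concept concept, Source source) {
        Atom[] direct = concept.atoms(source);
        return direct.length > 0 ? direct : concept.neighborhoodAtoms(source);
    }
}
```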



[1] To the best of the ability of the thesaurus maintainers.

If you cannot find the answer to your API question, or have a comment or suggestion, please send email to metaphrase-support@lexical.com.

Copyright © 1999, 2000 Lexical Technology, Inc. All rights reserved.

Lexical, Metaphrase, and the Metaphrase Enabled logo are registered trademarks of Lexical Technology, Inc.