Variations and Limitations of the Lexicon

The following shows a sentence being entered in the combo-box at the top of the analyses form.

When that sentence is entered, it is transmitted to the natural language processing (NLP) services.  Normally, the NLP services respond with a number of derivations (i.e., full parses) of the sentence.  In this case, however, the word “pachyderms” is not recognized as being in, or derived from any word in, the NLP services' lexicon.
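The idea of a word being neither in the lexicon nor derived from a word in it can be illustrated with a minimal sketch.  Everything here is an assumption made for illustration: the toy lexicon, the function names, the sample sentence, and the naive suffix stripping are hypothetical and do not describe how the NLP services actually look up or derive words.

```python
import re

# Hypothetical toy lexicon: base form -> coarse parts of speech.
TOY_LEXICON = {
    "elephant": {"noun"},
    "are": {"verb"},
    "large": {"adjective"},
}

def candidate_forms(word):
    """Yield the word itself plus naive guesses at its base form."""
    w = word.lower()
    yield w
    if w.endswith("es") and len(w) > 2:
        yield w[:-2]          # e.g. "boxes" -> "box"
    if w.endswith("s") and len(w) > 1:
        yield w[:-1]          # e.g. "elephants" -> "elephant"

def unknown_words(sentence, lexicon=TOY_LEXICON):
    """Return tokens that are neither in the lexicon nor derivable from it."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    return [t for t in tokens
            if not any(form in lexicon for form in candidate_forms(t))]

print(unknown_words("Elephants are large pachyderms"))  # ['pachyderms']
```

In this sketch “Elephants” is accepted because stripping the plural suffix yields a known base form, whereas “pachyderms” has no known base form and is reported as unknown.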

Unknown but Common or Domain-Specific Words

If there are words in your text that you feel should be in the general lexicon or which you would like in a domain-specific lexicon, please contact Automata, Inc.

Such cases are not uncommon and are easy to accommodate, so please feel free to suggest additions to the lexicon if the unknown word handling described here is not sufficiently convenient or if others would benefit from the additions you would like to suggest.

The lexicons in use include many thousands of proper names, including forenames and surnames, names of cities and countries, and various other proper nouns.  By default, they are part of the general lexicon.  There are also vocabularies that are more specific to biology, medicine, the law, and other domains, several of which are included in the general vocabulary.  Additional words (including multi-word entries, such as compound proper nouns) are easy to accommodate in the general or domain-specific vocabularies.

Thus, you are not limited to unknown word handling as described here, but it is an immediate solution that suffices for rare words.

Interactive Unknown Word Handling

Assuming the setting for clarifying unknown words is checked in the settings dialog:

The following dialog will be presented to request part of speech information about the unrecognized word:

We assist the NLP services (and provide some information that will benefit others) by specifying the parts of speech that apply to this word in the sentence at hand.

In this case, we could select that it is a noun or, more precisely, that the word is either common or plural:

 

Assuming we selected one of common or plural, we could then select that it is both common and plural:

When we click OK, parsing continues:

If permitted, the software will accumulate responses to this dialog and reuse them. 

For example, if another sentence had the word “pachyderms”, the dialog might be presented as follows:

A box that is checked indicates that something is definitively the case, whereas a box that is filled but not checked indicates only that it is a possibility.  In this case, the selections will function equivalently.
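The accumulation and reuse of responses, together with the three box states, can be sketched as follows.  This is only an illustration under stated assumptions: the class and feature names are hypothetical, and the rule that a remembered selection reappears as "filled" (a possibility) is an assumption about the behavior described above, not the software's documented logic.

```python
from collections import defaultdict
from enum import Enum

class BoxState(Enum):
    EMPTY = "empty"        # nothing known about this feature
    POSSIBLE = "filled"    # remembered from an earlier response
    DEFINITE = "checked"   # confirmed by the user for this sentence

# Hypothetical subset of the dialog's part-of-speech boxes.
FEATURES = ("noun", "common", "plural", "verb", "adjective")

class ResponseStore:
    """Accumulates the boxes checked for each unknown word."""

    def __init__(self):
        self._seen = defaultdict(set)

    def record(self, word, checked_features):
        """Remember the boxes the user checked for this word."""
        self._seen[word.lower()].update(checked_features)

    def initial_states(self, word):
        """Box states to show when the word is encountered again."""
        remembered = self._seen.get(word.lower(), set())
        return {f: BoxState.POSSIBLE if f in remembered else BoxState.EMPTY
                for f in FEATURES}

store = ResponseStore()
store.record("pachyderms", {"noun", "common", "plural"})
states = store.initial_states("Pachyderms")
print(states["plural"].value, states["verb"].value)  # filled empty
```

Keeping remembered selections as possibilities rather than certainties leaves room for the same word to be used differently in a later sentence.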

Best Practices

Checking the most detailed combination of boxes that apply to a use of a word tends to reduce the number of derivations for a given sentence.  This can simplify disambiguation if the specifications are precisely correct, not only for the word, but for how it is used in the sentence.
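Why a more detailed selection tends to leave fewer derivations can be shown with a small sketch.  The data model is an assumption for illustration only: each candidate derivation is reduced to the set of features it assigns to the unknown word, and the feature names and the function are hypothetical.

```python
def compatible_derivations(derivations, checked):
    """Keep derivations whose reading of the word includes every checked box."""
    return [d for d in derivations if checked <= d]

# Hypothetical readings of one unknown word across candidate derivations.
readings = [
    frozenset({"noun", "common", "plural"}),
    frozenset({"noun", "common", "singular"}),
    frozenset({"noun", "proper", "plural"}),
    frozenset({"verb"}),
]

coarse = compatible_derivations(readings, {"noun"})
precise = compatible_derivations(readings, {"noun", "common", "plural"})
print(len(coarse), len(precise))  # 3 1
```

The same filter also shows the risk: if a checked box does not match how the word is actually used in the sentence, the correct derivation is among those discarded.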

Generally, less experienced users should work with shorter sentences and choose a single checkbox, especially one of the top-level, coarse parts of speech.


Copyright © 2018 by Automata, Inc., all rights reserved.  The Linguist and KnowBuddy are trademarks of Automata, Inc. and are patent-pending.