Disambiguation with Semantic Discriminants

The image below shows the analyses tab of KnowBuddy with the settings dialog button to the right.

The analyses form contains one or more analysis forms, each on its own tab.  The above image shows a single analysis form for the sentence “the nucleus of a cell is inside its membrane”.

Within the analysis form are two tabs and a toolbar.  One tab lists the 8 derivations (i.e., full parses) for the sentence.  The tab being displayed shows 14 of the 94 clauses that discriminate among the 8 parses.

The toolbar has two combo-boxes on the right, one of which indicates that these 14 clauses are “semantic” rather than syntactic or lexical.  Typically, we recommend disambiguating using semantic clauses.[i]

“Reading” Clauses

Here is one way in which the above may be interpreted.  To disambiguate the sentence, we decide which of these makes the most or least sense.  We left-click to approve and right-click to “veto”.

1.        “a” means that the cardinality of “cell” is 1

2.        “inside” refers to a place

3.        “cell” is used as a nominal

4.        “membrane” is used as a nominal

5.        “nucleus” is used as a nominal

6.        “its” indicates that “membrane” is possessed by “its”

a.       which reads awkwardly due to the use of a possessive pronoun (the meaning is that whatever “its” refers to possesses the membrane).

7.        “inside” is used prepositionally

8.        “inside” complements “is” and is qualified by some “membrane” (i.e., “inside” relates “membrane” to the situation described by “is”)

9.        “inside” complements “nucleus” and is qualified by some “membrane” (i.e., “inside” relates some “membrane” to some “nucleus”)

10.    “inside” is qualified by “membrane” but what it complements is elided (e.g., unexpressed or otherwise undetermined)

11.    “of” relates some “cell” to some “nucleus”

12.    “its” is used as a pronoun

13.    “is” is used verbally as a form of ‘be’, the subject of which is some “nucleus” and the direct object of which is some “membrane”

14.    “is” is used verbally as a form of ‘be’, the subject of which is referenced by “nucleus” and the direct object of which is unexpressed or otherwise undetermined

These expressions are intended to be somewhat precise but they are not completely precise.  For example, we have not clearly distinguished “membrane” from what is intended as the semantic reference of membrane.  Nonetheless, they hopefully assist with interpretation.

Simplifying Disambiguation

As mentioned above, semantic constraints are recommended for disambiguation.  In the image above, however, almost half of the lines presented are checked.  Checked items are implied given the derivations that survive at any point in disambiguation.  In the case shown above, there are 8 derivations, and for each checked clause the number of derivations that use it is shown as 8 in the usages column.  To remove this distraction we can uncheck the checkbox icon on the analysis form’s toolbar (to the left of the combo-box showing “semantic”).
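
The relationship between the usages column and the checked (implied) clauses can be sketched as follows.  This is only an illustration under stated assumptions, not KnowBuddy's implementation: each derivation is modelled as a plain set of clause labels, and the labels, data, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each derivation is a set of clause labels.
derivations = [
    {"nominal cell", "nominal nucleus", "inside complements is"},
    {"nominal cell", "nominal nucleus", "inside complements nucleus"},
    {"nominal cell", "nominal nucleus", "inside complement elided"},
]

def usages(derivations):
    """Count how many surviving derivations use each clause."""
    counts = Counter()
    for derivation in derivations:
        counts.update(derivation)
    return counts

def split_implied(derivations):
    """Separate clauses used by every derivation from those that discriminate."""
    counts = usages(derivations)
    total = len(derivations)
    implied = {clause for clause, n in counts.items() if n == total}
    discriminating = {clause for clause, n in counts.items() if n < total}
    return implied, discriminating

implied, discriminating = split_implied(derivations)
```

In this sketch a clause whose usage count equals the number of surviving derivations cannot help choose among them, which is why hiding such clauses shortens the list without losing any choices.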

Disambiguating using Clauses

Generally, disambiguating proceeds by skimming the set of clauses and selecting the first one you see in which you are utterly confident.  The first pass down the list should not take more than a few seconds.  If it takes longer than 5 or 10 seconds, contemplate your approach further as discussed below.

In this case, skimming down the list: cardinality is not intuitive; location seems suspiciously obvious; and “inside” without a complement is clearly wrong (a veto by right-clicking would be fine, but vetoing is typically less productive than affirmation).

The choice as to whether “inside” complements “is” or “nucleus” takes too long to ponder (for a first pass), but it’s clear that the one with the unexpressed or undetermined 1st argument could be safely vetoed.

The verbal relating nucleus and membrane is the most obvious and safest constraint to approve by left-clicking.

Here we are left to contemplate only whether the meaning of “a” in this sentence is the same as “one” in the sentence “one cell has a nucleus”.

The answer is clear, so we veto the remaining clause by right-clicking.

As a result, we have eliminated grammatical ambiguity, as shown by the derivations tab having only 1 of 8 original parses remaining.

The tautology (truth) symbol (⊤) indicates what we selected, the negation symbol (¬) indicates what we vetoed, and the implication sign (⇒) indicates what was inferred from our actions.
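
The effect of approving and vetoing can be sketched as filtering a list of parses.  This is a minimal model under stated assumptions, not the tool's actual algorithm: parses are sets of hypothetical clause labels, the four toy parses are invented for illustration, and only clauses that hold in every surviving parse are marked as inferred.

```python
def approve(parses, clause):
    """Keep only the parses that contain the approved clause."""
    return [p for p in parses if clause in p]

def veto(parses, clause):
    """Keep only the parses that do not contain the vetoed clause."""
    return [p for p in parses if clause not in p]

def mark(remaining, approved, vetoed):
    """Label approved and vetoed clauses, plus clauses inferred from what survives."""
    marks = {clause: "⊤" for clause in approved}
    marks.update({clause: "¬" for clause in vetoed})
    if remaining:
        # A clause present in every surviving parse follows from our choices.
        for clause in set.intersection(*map(set, remaining)):
            marks.setdefault(clause, "⇒")
    return marks

# Hypothetical toy parses (not the 8 real derivations).
parses = [
    {"be: nucleus, membrane", "inside complements is", "cardinality of cell is 1"},
    {"be: nucleus, membrane", "inside complements is"},
    {"be: nucleus, ?", "inside complements nucleus", "cardinality of cell is 1"},
    {"be: nucleus, ?", "inside complement elided"},
]

remaining = approve(parses, "be: nucleus, membrane")
remaining = veto(remaining, "cardinality of cell is 1")
marks = mark(
    remaining,
    approved={"be: nucleus, membrane"},
    vetoed={"cardinality of cell is 1"},
)
```

With this toy data, one approval and one veto leave a single parse, and the clause "inside complements is" is marked as inferred without ever having been clicked.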

Structure of Clauses

Each of the clauses above has the following structure:

1.        a “functor” followed by a number of arguments within parentheses

2.        the functor typically is composed of a type and a predicate

3.        the arguments are quoted text, question marks, or an ellipsis symbol

The functor types used above include:

·           cardinality

·           location

·           nominal

·           possessive

·           prepositional

·           pronoun

·           verbal

There is no more detailed predicate than the functor type for the following (each of which has the text of the word that motivated the predicate as its 1st argument):

·           cardinality

·           possessive

·           pronoun

The other clauses have a predicate within parentheses after the functor:

·           location(place) qualifies the type “location” with “place”

o   another possibility is “temporal”,

o   or even “unspecified”, as in something being “at X”, where X can be a place or time

·           the 3 nominals appear redundant because the singular form of the simple noun is used in all 3 cases

o   the quoted text shown as their first argument would be different than the predicate if they were plural, for example

·           the 2 prepositional clauses, as is most frequently the case, have the unquoted text that occurred in the utterance as their predicate

o   this may not be the case depending on case or punctuation

o   it may also not be the case if there are distinct predicates for multiple senses of the preposition

·           the verbal clause has the infinitive form of the verb as its predicate

For most predicates, the text of the input which suggested the clause is shown as the 1st argument.  There are exceptions, however.
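
The structure described in this section can be captured in a small data type.  The sketch below is an assumption-laden illustration: the class, its field names, the rendered text format, and the example argument lists are all hypothetical and are not taken from KnowBuddy's display or API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Clause:
    """A clause: a functor (type plus optional predicate) applied to arguments."""
    type: str                  # e.g. "location", "verbal", "pronoun"
    predicate: Optional[str]   # e.g. "place", "be"; None when the type suffices
    args: Tuple[str, ...]      # quoted text, "?", "??", or "..."

    def functor(self) -> str:
        """Return the type, qualified by the predicate in parentheses if present."""
        if self.predicate is None:
            return self.type
        return f"{self.type}({self.predicate})"

    def render(self) -> str:
        """Render in an assumed textual form: functor(arg, arg, ...)."""
        return f"{self.functor()}({', '.join(self.args)})"

# Illustrative instances; the argument lists are guesses for demonstration only.
pronoun = Clause("pronoun", None, ('"its"',))
location = Clause("location", "place", ('"inside"',))
verbal = Clause("verbal", "be", ('"is"', '"nucleus"', "?"))
```

Keeping the type and the predicate as separate fields makes it straightforward to group clauses by type (for example, all nominals) while still distinguishing predicates within a type.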

Types of Arguments

The arguments to clauses are generally quoted text or one of the following:

·           ? indicates an unexpressed or undetermined argument

·           ?? indicates an expected argument that is missing

o   ?? is relatively rare and even more so among the clauses of a properly disambiguated sentence

o   ?? may indicate something odd about the sentence or that permissive parsing (e.g., robust, fragmented text) is in effect

·           ... indicates something more complex than ? typically refers to

o   e.g., conjunctions frequently have ... arguments that correspond to a sentence or clause
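
The argument kinds listed above can be told apart with a simple check.  This is a hedged sketch: the function name and the returned labels are hypothetical, and it assumes arguments arrive as strings in which quoted text is wrapped in straight or curly double quotes.

```python
def classify_argument(arg: str) -> str:
    """Classify a clause argument by the notation described above."""
    arg = arg.strip()
    if arg == "??":
        return "missing"        # an expected argument that is missing
    if arg == "?":
        return "unexpressed"    # unexpressed or undetermined
    if arg in ("...", "…"):
        return "complex"        # something more complex, e.g. a clause
    if len(arg) >= 2 and arg[0] in '"“' and arg[-1] in '"”':
        return "text"           # quoted text from the input
    return "unknown"

kinds = [classify_argument(a) for a in ['"membrane"', "?", "??", "..."]]
```

Checking for ?? before ? is not strictly needed with exact comparison, but it keeps the rarer and more noteworthy case visible at the top.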

Types of Clauses

The following table shows the variety of clauses that may appear.  Those marked basic are all that are important for most users.

Those with unspecified usage commonly arise in practice but do not need to be considered to be productively successful with the software.

Those marked intermediate or advanced occur infrequently and require more skill to interpret. They are rarely needed during disambiguation but may be useful in downstream applications.

[i] Sometimes it may be helpful to look at lexical discriminants.  And if no parses are found, for example because a time-limited effort to parse a particularly long and complex sentence ran out, only lexical and syntactic discriminants may be available.  This is more advanced usage than is addressed here, however.

