Language Analysis

Now that we have a basic understanding of the units of language, we can begin to examine how a computer can process them. Luger and Stubblefield (1998) identify several key analysis methods for language understanding:

Linguists have classically preferred rigid, structured analysis techniques, such as grammar and word order, to study language. Computer scientists have found that these techniques are not flexible enough to process "ungrammatical sentences", slang, and garbled input, so AI researchers have developed other approaches.

They have introduced more flexible data structures and parallel parsing techniques, which allow several analysis techniques to run concurrently and pool their results.

Production rules (IF-THEN rules based on logic, from which some understanding of the input text can be derived) and semantic networks have been used to give systems more ways of interpreting their input.
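
To make the IF-THEN idea concrete, here is a minimal sketch of a forward-chaining production system. The fact strings (such as `word:ran`), the `Rule` class, the `forward_chain` function and the three rules are hypothetical illustrations, not a published rule set. A rule fires when every one of its conditions is already a known fact, and firing adds its conclusion to working memory; this repeats until nothing new can be derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: frozenset  # IF part: every condition must already be a known fact
    conclusion: str        # THEN part: the fact added when the rule fires


def forward_chain(facts, rules):
    """Fire rules repeatedly until no rule adds a new fact (forward chaining)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conditions <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


# Hypothetical rules over facts extracted from the input "John ran away".
RULES = [
    Rule(frozenset({"word:ran"}), "event:run"),
    Rule(frozenset({"word:ran"}), "tense:past"),
    Rule(frozenset({"event:run", "word:away"}), "event:leave"),
]

derived = forward_chain({"word:john", "word:ran", "word:away"}, RULES)
print(sorted(derived))
```

The third rule can only fire after the first has added `event:run`; chaining rules in this way lets simple facts about words build up into facts about events.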

Semantic networks are a general representational technique, and they are used in NLP for several different purposes (Beardon et al., 1991). One of the most powerful is the representation of type hierarchies (or knowledge hierarchies), which allow an object to acquire the properties of the more general types above it through a process of inheritance. See a graphical example.
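
The sketch below shows inheritance in a small type hierarchy. The node names, properties and the `lookup` function are hypothetical, and the network is reduced to `is-a` links plus a table of local properties per node. A lookup walks up the `is-a` chain until it finds the property, so a more specific node can override a value it would otherwise inherit.

```python
# Hypothetical type hierarchy: each node points to its more general parent.
IS_A = {
    "canary": "bird",
    "penguin": "bird",
    "bird": "animal",
}

# Properties stored locally on each node.
PROPERTIES = {
    "animal": {"breathes": True, "has_skin": True},
    "bird": {"can_fly": True, "has_feathers": True},
    "canary": {"colour": "yellow"},
    "penguin": {"can_fly": False},  # overrides the value inherited from bird
}


def lookup(node, prop):
    """Return the value of prop for node, searching up the type hierarchy."""
    current = node
    while current is not None:
        if prop in PROPERTIES.get(current, {}):
            return PROPERTIES[current][prop]
        current = IS_A.get(current)  # move to the parent type, if any
    return None  # property not found anywhere in the hierarchy


print(lookup("canary", "can_fly"))   # True (inherited from bird)
print(lookup("penguin", "can_fly"))  # False (local override)
print(lookup("canary", "breathes"))  # True (inherited from animal)
```

Storing "can fly" once on `bird` rather than on every bird is what makes the hierarchy compact; exceptions are recorded only where they occur.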

All these techniques share the same focus: processing input language and ascertaining as many facts as possible from it. Some common processing goals are determining:

  • what objects were involved

  • what occurred

  • when it occurred

  • what the outcome was

Morphology

Morphological analysis helps determine the role of a word in a sentence by analysing the effect of its prefixes and suffixes, which gives information about tense, number, and part of speech.
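
As a rough illustration, the sketch below strips at most one prefix and one suffix from a word and records the grammatical information each affix signals. The affix tables, the `MIN_STEM` threshold and the `affix_features` function are all assumptions made for this example.

```python
# Hypothetical affix tables mapping each affix to the information it signals.
PREFIXES = {"un": {"negated": True}, "re": {"repeated": True}}
SUFFIXES = {
    "ing": {"pos": "verb", "aspect": "progressive"},
    "ed": {"pos": "verb", "tense": "past"},
    "ness": {"pos": "noun"},
    "ly": {"pos": "adverb"},
    "s": {"number": "plural"},  # on verbs this is really 3rd-person singular
}
MIN_STEM = 3  # refuse to strip an affix if fewer than 3 letters would remain


def affix_features(word):
    """Strip at most one prefix and one suffix; return (remainder, features)."""
    word = word.lower()
    features = {}
    for prefix, info in PREFIXES.items():
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            features.update(info)
            word = word[len(prefix):]
            break
    # Try longer suffixes first so that "ness" is not mistaken for "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            features.update(SUFFIXES[suffix])
            word = word[: -len(suffix)]
            break
    return word, features


print(affix_features("unlocked"))  # ('lock', {'negated': True, 'pos': 'verb', 'tense': 'past'})
print(affix_features("kindness"))  # ('kind', {'pos': 'noun'})
```

The rules are deliberately naive: "reading" is wrongly split into re + ading, and irregular forms such as "ran" carry no affix at all. This is why the MA steps below check candidate stems against dictionaries.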

Morphological analysis

Morphological analysis (MA) means processing word forms without considering their context. Popov defines a word form as "that part of a text which lies between two blanks (punctuation marks are also considered word forms)".
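
Popov's definition translates almost directly into a regular expression: a word form is either a run of characters containing no blank and no punctuation mark, or a single punctuation mark. The punctuation set below is an assumption, and apostrophes are deliberately kept inside word forms so that "didn't" stays whole.

```python
import re

# A run of characters with no blank and no punctuation mark, or a single
# punctuation mark (the punctuation set is an assumption for this sketch).
WORD_FORM = re.compile(r'[^\s.,;:!?"()]+|[.,;:!?"()]')


def word_forms(text):
    """Split text into word forms in the sense of Popov's definition."""
    return WORD_FORM.findall(text)


print(word_forms("Poor John ran away, didn't he?"))
# ['Poor', 'John', 'ran', 'away', ',', "didn't", 'he', '?']
```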

Typical steps in MA

  1. searching for a word form in the dictionary
  2. distinguishing the stem of the word
  3. searching for the stem in the dictionary of stems
  4. word-combination processing
  5. pre-syntax
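
Steps 1 to 3 can be sketched as follows; steps 4 and 5 (word-combination processing and pre-syntax) are not modelled. The two dictionaries, the list of endings and the `analyse` function are hypothetical stand-ins for the large lexicons a real analyser would use. The whole form is looked up first so that irregular forms are caught, and an ending is only accepted if what remains is a known stem.

```python
# Hypothetical dictionaries; a real analyser would load large lexicons.
FORM_DICTIONARY = {  # irregular word forms listed whole: form -> (stem, features)
    "ran": ("run", {"tense": "past"}),
    "men": ("man", {"number": "plural"}),
}
STEM_DICTIONARY = {"walk": "verb", "run": "verb", "dog": "noun", "man": "noun"}
ENDINGS = {"ing": {"aspect": "progressive"}, "ed": {"tense": "past"}, "s": {"number": "plural"}}


def analyse(form):
    """Steps 1-3 of MA: whole-form lookup, stem distinction, stem lookup."""
    form = form.lower()
    # Step 1: search for the word form itself in the dictionary.
    if form in FORM_DICTIONARY:
        stem, features = FORM_DICTIONARY[form]
        return {"stem": stem, "pos": STEM_DICTIONARY[stem], **features}
    if form in STEM_DICTIONARY:  # an uninflected form is its own stem
        return {"stem": form, "pos": STEM_DICTIONARY[form]}
    # Step 2: distinguish a candidate stem by removing a known ending.
    for ending, features in ENDINGS.items():
        if form.endswith(ending):
            stem = form[: -len(ending)]
            # Step 3: accept the split only if the stem is in the stem dictionary.
            if stem in STEM_DICTIONARY:
                return {"stem": stem, "pos": STEM_DICTIONARY[stem], **features}
    return None  # unknown word form: would need further processing


print(analyse("Ran"))      # {'stem': 'run', 'pos': 'verb', 'tense': 'past'}
print(analyse("walking"))  # {'stem': 'walk', 'pos': 'verb', 'aspect': 'progressive'}
print(analyse("reading"))  # None ("read" is not in this toy stem dictionary)
```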

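With most European languages, sentence analysis is traditionally divided into morphological, syntactic and semantic analyses. Analysing Asian languages is a very different and difficult process because of how those languages are structured; written Chinese and Japanese, for example, do not separate words with blanks, so even identifying word forms is a problem in its own right.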

The processor is given goals for its analysis. Common goals include:

  1. identifying words
  2. determining those which correspond to events
  3. distinguishing and processing nominal groups
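
A toy version of these three goals might look like the sketch below. The lexicon and the `analyse_goals` function are hypothetical, and two simplifications are assumed: every verb is treated as an event word, and a nominal group is taken to be any run of determiners and adjectives ending in a noun.

```python
# Hypothetical toy lexicon mapping words to parts of speech.
LEXICON = {
    "the": "DET", "a": "DET",
    "old": "ADJ", "poor": "ADJ",
    "man": "NOUN", "dog": "NOUN", "john": "NOUN",
    "chased": "VERB", "ran": "VERB", "saw": "VERB",
}


def analyse_goals(sentence):
    # Goal 1: identify the words (punctuation handling kept deliberately crude).
    words = sentence.lower().rstrip(".!?").split()
    tags = [LEXICON.get(w, "UNKNOWN") for w in words]
    # Goal 2: treat verbs as the words that correspond to events.
    events = [w for w, t in zip(words, tags) if t == "VERB"]
    # Goal 3: a nominal group is any determiners/adjectives followed by a noun.
    groups, current = [], []
    for w, t in zip(words, tags):
        if t in ("DET", "ADJ"):
            current.append(w)
        elif t == "NOUN":
            groups.append(" ".join(current + [w]))
            current = []
        else:
            current = []  # anything else breaks off a partial group
    return words, events, groups


words, events, groups = analyse_goals("The old man chased a dog.")
print(events)  # ['chased']
print(groups)  # ['the old man', 'a dog']
```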

Grammar and Syntax

The rules of grammar can give us information about the events taking place. We can determine how many objects were affected, and whether the action took place in the past, will take place in the future, or only has a chance of happening. Because language is fuzzy, classical language analysis techniques cannot provide the depth of understanding that humans achieve; grammar is but one way for a machine to get closer to that understanding.
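
The time and likelihood cues mentioned above can be approximated from auxiliary verbs and verb endings. This is only a sketch: the cue-word sets and the `time_and_modality` function are assumptions, and a real system would rely on a full grammar and lexicon rather than keyword matching.

```python
# Hypothetical cue words; a real system would rely on a full grammar and lexicon.
MODALS = {"may", "might", "could", "would", "can"}
FUTURE_AUXILIARIES = {"will", "shall"}
IRREGULAR_PAST = {"ran", "saw", "went", "was", "were", "did", "had"}


def time_and_modality(sentence):
    """Classify a simple clause as 'possible', 'future', 'past' or 'present'."""
    words = sentence.lower().rstrip(".!?").split()
    if any(w in MODALS for w in words):
        return "possible"  # a modal verb marks an action that only might happen
    if any(w in FUTURE_AUXILIARIES for w in words):
        return "future"
    if any(w in IRREGULAR_PAST or w.endswith("ed") for w in words):
        return "past"  # crude: any word ending in -ed counts as past tense
    return "present"


for s in ["John walked away.", "John will walk away.", "John might walk away.", "John walks away."]:
    print(s, "->", time_and_modality(s))
```

The `-ed` test would misfire on words such as "red", which illustrates the point above: surface rules get a machine only part of the way to understanding.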

Immediate Constituent Analysis (IC)

This type of analysis was pioneered by Bloomfield (Crystal, 1971), who showed how a sentence can be split into two immediate constituents. For example, he took the sentence "Poor John ran away" and first split it into a subject and a predicate:

Subject: Poor John
Predicate: ran away

These were in turn split into Poor and John, and ran and away. Bloomfield was thus one of the first to see a sentence not as a sequence of words but as a series of layers of constituents, and tree diagrams began to be used to represent language structure visually.
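
Bloomfield's layering maps naturally onto a binary tree. In the sketch below a constituent is either a single word or a pair of immediate constituents; the `Constituent` type and the `bracket` and `layers` functions are illustrative names, not part of any standard notation.

```python
from typing import Tuple, Union

# A constituent is either a single word or a pair of immediate constituents.
Constituent = Union[str, Tuple["Constituent", "Constituent"]]

POOR_JOHN: Constituent = (("Poor", "John"), ("ran", "away"))  # (subject, predicate)


def bracket(c: Constituent) -> str:
    """Render a constituent as nested brackets, one bracket pair per layer."""
    if isinstance(c, str):
        return c
    left, right = c
    return f"[{bracket(left)} {bracket(right)}]"


def layers(c: Constituent, depth: int = 0) -> list:
    """List each constituent indented by its depth, like a sideways tree diagram."""
    lines = ["  " * depth + bracket(c)]
    if not isinstance(c, str):
        for child in c:
            lines.extend(layers(child, depth + 1))
    return lines


print(bracket(POOR_JOHN))  # [[Poor John] [ran away]]
print("\n".join(layers(POOR_JOHN)))
```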

Strengths: gives a first look at the structure of language
Weaknesses: it does not consider grammatical relationships.

It cannot relate active and passive sentences: it does not show that "That man saw John’s mother" and "John’s mother was seen by that man" mean almost the same thing.

‘Deep’ Syntax

Deep syntax is a much better way to represent a sentence than immediate constituent analysis. Deep syntax trees (see below) allow a sentence to be stored in a more systematic and flexible way. Their structure makes it easy to convert between passive and active voice and between different tenses, and it also facilitates translation into other languages.

Figure: A deep syntax tree for the sentence "John seems to know the answer"
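
The advantage over IC analysis can be shown with a flat predicate-argument record standing in for a deep syntax tree, which is a simplification. One deep structure (predicate, agent, patient, tense) generates both the active and the passive sentence from the IC example, and changing one field changes the tense. The `DeepClause` class, the verb table and the `active`/`passive` functions are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeepClause:
    predicate: str  # lemma of the main verb
    agent: str      # deep subject: who performs the action
    patient: str    # deep object: who or what the action affects
    tense: str      # "past" or "present"


# Hypothetical verb table: lemma -> (past, past participle, 3rd-person singular present)
VERB_FORMS = {"see": ("saw", "seen", "sees")}


def capitalise(text):
    return text[0].upper() + text[1:]  # unlike str.capitalize, keeps "John's" intact


def active(c):
    past, _, present = VERB_FORMS[c.predicate]
    verb = past if c.tense == "past" else present
    return capitalise(f"{c.agent} {verb} {c.patient}")


def passive(c):
    _, participle, _ = VERB_FORMS[c.predicate]
    auxiliary = "was" if c.tense == "past" else "is"  # assumes a singular patient
    return capitalise(f"{c.patient} {auxiliary} {participle} by {c.agent}")


clause = DeepClause("see", "that man", "John's mother", "past")
print(active(clause))   # That man saw John's mother
print(passive(clause))  # John's mother was seen by that man
print(active(replace(clause, tense="present")))  # That man sees John's mother
```

Because both surface sentences come from the same record, their near-identical meaning is explicit in the representation rather than something the reader must infer.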

Semantics

In general, semantics is the study of meaning. A machine has to analyse any input data in great detail in order to deduce some meaning from it, splitting the sentence into its syntactic components layer by layer. Often a sentence has more than one possible meaning, so the machine must either guess, using experience or heuristics, or determine the most appropriate meaning from the sentences before and after it. Because it has to take into account not only the meaning of the sentence but also the wider discourse, the machine needs to support multiple parsing.
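
One simple way to choose a meaning "according to the sentences before and after it" is a simplified Lesk-style overlap count, swapped in here as an illustration rather than taken from the sources above. Each candidate sense has a short gloss, and the sense whose gloss shares the most content words with the sentence and its neighbours wins. The sense inventory, stopword list and `choose_sense` function are hypothetical.

```python
import re

# Hypothetical sense inventory: each sense of "bank" has a short gloss.
SENSES = {
    "bank": {
        "bank/finance": "institution that keeps money deposits and makes loans",
        "bank/river": "sloping land beside a river or stream of water",
    }
}
STOPWORDS = {"a", "an", "and", "the", "of", "or", "to", "on", "in", "he", "she",
             "her", "his", "it", "was", "is", "so", "that"}


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def choose_sense(word, sentence, context):
    """Pick the sense whose gloss shares most words with the sentence and its neighbours."""
    window = content_words(" ".join([*context, sentence]))
    window.discard(word)  # the ambiguous word itself tells us nothing
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(window & content_words(gloss))
        if overlap > best_overlap:  # ties keep the earlier-listed sense
            best_sense, best_overlap = sense, overlap
    return best_sense


print(choose_sense("bank", "He sat on the bank.",
                   ["The river was high after the rain.", "The water moved quickly."]))
print(choose_sense("bank", "She went to the bank.",
                   ["She needed money to buy a house."]))
```

The first call picks the river sense (it shares "river" and "water" with the context) and the second picks the financial sense (it shares "money"). With no overlap at all, the sketch simply falls back to the first listed sense, which is one of the heuristics a real system would need to refine.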

Pragmatics

In broad terms, pragmatics is the way that the setting of the sentence in a discourse is used to determine its correct interpretation. The key features of pragmatics are context and reference. These will be discussed later under Inference.
