[Note. This is a project proposal that was written in 1989. Published here for the first time because it has some historical relevance ... which is to say, some of the ideas I am working on today were already contained in this 1989 paper.]
The eventual goal of this research is to construct a PDP model of high-level aspects of cognition. This is clearly an ambitious target to aim for, so there are a number of sub-goals within this overall plan. One intermediate goal is to provide an account of the process of sentence comprehension, and the lowest-level goal is to address some issues relating to the use of lexical, syntactic and semantic information during the comprehension of a sentence. Frazier, Clifton and Randall (1983) conducted some experiments on the processing of sentences containing filler-gap dependencies, which they see as bearing crucially on these issues. In particular, they interpret their results as showing that people initially use a heuristic (the "Most Recent Filler" strategy) to build a viable phrase marker for a sentence containing multiple filler-gap dependencies and that they postpone the use of semantic information until a later stage.
It is my intention to construct a PDP system that can comprehend sentences of the sort used in the Frazier et al. experiments, and which demonstrates the same effects that they observed, but without making any of the assumptions about the rule-governed nature of parsing that lie behind the analyses given by Frazier et al. Before expanding on these claims, it is necessary to give a detailed outline of these filler-gap experiments and the theoretical considerations that follow from them.
Frazier, Clifton and Randall (1983) on Filler-Gap Dependencies.
Consider the following sentence:
(1) The mayor is the crook who the police chief tried ____ to leave town with ____ .
In a deep structure analysis of this sentence, certain noun phrases (the "fillers") would be located at the positions indicated by the underlined spaces (the "gaps"). In this case the basic actions that the sentence describes are that the police chief leaves town and that she leaves town with the crook, so the second filler belongs to the first gap and the first filler belongs to the second. This would be described by Frazier et al. as a "Recent Filler" sentence because when the first gap is encountered, in a left-to-right scan, it needs to be coindexed with the most recently encountered filler. It is not always the case that the most recent filler is the appropriate one; take, for example:
(2) The mayor is the crook who the police chief forced ____ ____ to leave town.
Both gaps in this sentence take the same filler, namely "the crook", and this is not the most recent one. This example would be classed as a "Distant Filler" sentence.
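The contrast between (1) and (2) can be made concrete with a small sketch. The following toy code is my own construction (the paper contains no implementation): it applies a crude recency heuristic to the fillers of the two example sentences, succeeding on the Recent Filler case and failing on the Distant Filler one.

```python
def recent_filler_parse(fillers, n_gaps):
    """Assign each gap, left to right, the most recently seen
    unassigned filler (a crude stand-in for the heuristic)."""
    available = list(fillers)            # fillers in order of appearance
    return [available.pop() for _ in range(n_gaps)]

fillers = ["the crook", "the police chief"]

# (1) Recent Filler: "... tried ____ to leave town with ____"
# Correct assignment: gap 1 = the police chief, gap 2 = the crook.
print(recent_filler_parse(fillers, 2) == ["the police chief", "the crook"])
# -> True: the heuristic happens to be right.

# (2) Distant Filler: "... forced ____ ____ to leave town"
# Correct assignment: both gaps take "the crook".
print(recent_filler_parse(fillers, 2) == ["the crook", "the crook"])
# -> False: the heuristic misassigns, and reanalysis is needed.
```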
Frazier et al. presented their subjects with sentences such as these and asked them to make a judgment, as soon as possible after the appearance of the last word, as to whether or not they had comprehended the sentence. They indicated to their subjects that they should make a "no" response if they felt that they would normally have gone back and scanned the sentence a second time, and that a "yes" response should indicate that they had "got it" immediately. The words were shown one at a time in rapid succession on a computer screen (the description given by the authors implies that each word was overwritten by its successor), with the final word distinguished by a full stop.
What this task is designed to probe is an initial stage in the sentence comprehension process in which a surface structure representation is computed for the sentence, before any deeper analysis is made of its meaning or significance. Frazier et al. regard it as a plausible assumption that such a surface structure parse takes place, given that when we read a sentence like
(3) John didn't deny that Sam left.
we can easily make a snap judgment about whether or not the sentence was syntactically correct, without having to go through what seems to be a much more time-consuming process of sorting out the interaction of the negatives.
The experimental sentences all involved multiple filler-gap dependencies, so that when the processing mechanism encounters the first gap, it has more than one potential filler for it. In all cases it is not until the final word of the sentence arrives that the mechanism can be sure about the proper assignment of fillers to gaps. Frazier et al. measured the time taken from the onset of the final word to the point at which the subject responded, and they also examined the percentage of "got it" responses that were made. In this way they hoped to find out what effect the different sorts of filler-gap constructions had on the processing that had to be done, after the disambiguation point, in order to complete a surface structure parse of the sentence. In particular, the experimental data were meant to address the following two questions:
If, when the parser is part of the way through a sentence, it is faced with more than one possible analysis of the string of words received so far (as is the case when a gap is encountered in a sentence with multiple filler-gap dependencies), what does it do? Does it (i) pursue all of the rival analyses in parallel thereafter, (ii) stop trying to analyse the phrase structure until such time as disambiguating information arrives, or (iii) pick one possible analysis and pursue that until it is confirmed, or until disconfirmatory information forces the processor to backtrack and recompute the analysis?
Structure in the Timing of Information Use.
Is it the case that, when a parsing decision is to be made, all of the different categories of information that may bear upon that decision are available to the processor, or are different sorts of information used at different points in the process of sentence comprehension?
With regard to the first question, it should be noted that there is already evidence that the processor makes an interim choice of phrase structure analysis when faced with an ambiguity, so that it sometimes has to backtrack when the choice later turns out to be wrong (cf. garden-path sentences). There are also indications that when the processor comes to a gap for which there are two potential fillers, it uses the heuristic of assigning the most recent filler to the gap. Frazier et al. hoped to ascertain from their experiments whether this Recent Filler strategy was executed in an on-line fashion, during the initial surface structure parse of the sentence. If this were the case, then it would be further evidence that the processor chooses to pursue one of the possible analyses of the sentence, when faced with an ambiguity, even though the analysis may later have to be discarded.
The experiments helped to decide this issue in the following way. If the subjects initially assigned the most recent filler to the first gap in a Distant Filler sentence (such as (2)), then the arrival of the final (disambiguating) phrase should cause a good deal of processing to take place as the assignment is recomputed. This should have an effect on both the comprehension time (which should be longer) and the percentage of "got it" responses (which should be smaller) in these Distant Filler sentences, as compared with the Recent Filler sentences, where the strategy would be consistent with the proper assignment of fillers and gaps. This effect was indeed found in the data.
Turning now to the second question, that of the timing of information use, Frazier et al. noted that the verb that comes just before the first gap in their sentences can be of three sorts. Some verbs are consistent only with a recent filler construction (try, agree, start - as in example (1)); some are consistent only with a distant filler construction (force, persuade, allow - as in (2)); whilst others (want, expect, beg) can countenance either form, as in the examples below:
(4) Recent Filler:
The mayor is the crook who the police chief wanted ____ to leave town with ____ .

(5) Distant Filler:
The mayor is the crook who the police chief wanted ____ ____ to leave town.
In the context of this experiment, they dub the latter sort of verb "ambiguous", whilst the other sorts are labelled "unambiguous" - all three types were used in the sentences presented to subjects. If the sentence processing mechanism encounters an ambiguous verb just before the first gap, then this represents a genuine ambiguity (what Frazier et al. refer to as a "horizontal ambiguity"), and we would expect the Most Recent Filler strategy to be used under these circumstances, if it was being used at all. If, on the other hand, the verb just before the gap is an unambiguous one, then the processor could, in principle, make use of control information about the verb to make the proper assignment to the gap, without having to resort to the Most Recent Filler strategy. If the processor were not to make use of that information, then this would be a case of what they call a "vertical ambiguity" - a situation where the processor behaves as though it does not have enough information when in fact the information is present in the sentence.
Frazier et al. compared the size of the Recent Filler effect (the difference in comprehension times, and in percentages of indicated comprehension, between the Recent Filler and Distant Filler sentences) when the verbs were ambiguous and unambiguous, and found no significant difference. They concluded that a vertical ambiguity did occur, and hence that there was evidence that the use of at least some information (semantic control information about the verbs) was delayed in the process of sentence comprehension.
There was another aspect of the experiment that touched on the question of the use of semantic control information in the unambiguous verbs. On a randomly selected one third of those trials when subjects indicated that they had grasped the sentence, they were asked a question about the content of the sentence. The responses were significantly more accurate when the verb was an unambiguous one, thus indicating that on a longer time-scale the control information associated with the verbs did help subjects to comprehend the sentences.
One final factor included in the design of the experiment was the presence or absence of a relative pronoun just after the first potential filler:
(6) The mayor is the crook (who) the police chief tried ____ to leave town with ____ .
This was done in order to test what Frazier et al. put forward as a generalised interpretation of the Most Recent Filler strategy. They suggest that it is not simply the most recent filler that is chosen but, rather, the most salient, and that recency is only one (albeit perhaps the most important) of a number of factors that contribute to salience. They argue that the presence of a relative pronoun could make the first filler more salient than it might otherwise have been, because it marks it as an obligatory filler. By increasing the salience of the Distant Filler in this way, it might be expected to be more often chosen to fill the first gap, contrary to the Most Recent Filler strategy. It was in fact found that the size of the Recent Filler effect was significantly reduced when the relative pronoun was present, thus helping to confirm the more general formulation of the strategy in terms of salience rather than recency.
There is a slight problem with this last result, however. In footnote 4 (p. 207) the authors acknowledge that they looked (post hoc) at some other factors that ought to have influenced the salience of the distant filler: they argued that introducing the distant filler with a demonstrative "there"-phrase ("There goes the police chief who...") or with an equative copula ("The mayor is the crook who...") should make it more salient than when it is not focussed ("Everyone likes the woman who..."). Comparing the size of the Recent Filler effect in these "focussed" and "unfocussed" sentences, they found no difference. Attractive as the salience interpretation of the Recent Filler effect is, it is clear that further work would have to be done in order to establish exactly what factors determine the choice of filler.
Current Explanations of the Recent Filler Strategy.
What needs to be accounted for is the fact that some information that could be useful to the processor in determining filler-gap assignments is not deployed at the time that an initial structural analysis of the incoming sentence takes place, and that instead the processor first of all uses a superficial heuristic to make a rough-and-ready assignment of fillers to gaps.
Frazier et al. mention two previously advanced explanations for the Most Recent Filler strategy before going on to discuss an explanation of their own. The two previous explanations are (i) that the structure of the language processor is such that verb control information is simply not available to the mechanism that produces the initial surface structure parse, or (ii) that verb control information simply takes a long time to look up or compute, and so arrives too late for the surface structure parser to make use of it.
The alternative offered by Frazier et al. amounts to this: that it is computationally more costly to sort out a filler-gap assignment rigorously at the time that the problem arises than to make a quick assignment on the basis of superficial information already available. To do the job properly would involve considering a wide variety of different types of information, and even then the answer might only be that more information (not due until later) was required in order to sort out the problem. Contrast this with the fact that in English the majority of sentences in which multiple filler-gap dependencies occur are of the Recent Filler type, and it becomes clear that if the processor initially assumes that the recent filler is the appropriate one for the first gap (and perhaps - should the salience version of the strategy turn out to be correct - takes any stress or marking of the first potential filler as an indication that the usual strategy should be overridden), then it will make a correct assignment most of the time. If the decision is later contradicted - and this need only require superficial information about the sentence, such as the fact that an obligatory filler has been left unassigned - it can simply reverse its earlier decision. This kind of backtracking is still computationally rather cheap, compared with the cost of a wholesale examination of all of the conceivable grammatical constraints that could be relevant to a filler-gap assignment.
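This two-stage trade-off can be sketched schematically. In the toy code below (the control lexicon, function names and dictionary representation are inventions of mine, not anything from Frazier et al.), a cheap recency-based first pass is made immediately, and a later filtering stage backtracks only when verb control information contradicts it.

```python
# Hypothetical control lexicon (invented for illustration): which filler
# each verb actually licenses for its first gap.
CONTROL = {"tried": "recent", "wanted": "either", "forced": "distant"}

def superficial_assign(fillers):
    """Stage 1: the recency heuristic - the first gap takes the most
    recent filler, the second gap takes the distant one."""
    return {"gap1": fillers[-1], "gap2": fillers[0]}

def filter_and_repair(assignment, verb, fillers):
    """Stage 2: verb control information acts as a filter; backtrack
    (recompute the assignment) only if the constraint is violated."""
    if CONTROL.get(verb) == "distant":
        repaired = {"gap1": fillers[0], "gap2": fillers[0]}
        return repaired, True          # cheap reversal of the first pass
    return assignment, False           # first pass survives the filter

fillers = ["the crook", "the police chief"]
first_pass = superficial_assign(fillers)

print(filter_and_repair(first_pass, "tried", fillers)[1])   # False: no backtracking
print(filter_and_repair(first_pass, "forced", fillers)[1])  # True: reanalysis forced
```

The point of the sketch is only that the repair step is a cheap local swap, whereas consulting the control lexicon on every gap up front would have to happen for every sentence, most of which do not need it.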
They are not suggesting that the semantic information is never used, of course. They characterise their proposal as a "superficial-assignment-with-filtering" model, to emphasise the idea that the more complex grammatical constraints are, in the course of time, used as a kind of "filter" on the superficially generated representation that emerges from the early stage of sentence processing. If the grammatical constraints contradict some aspect of the structural representation, backtracking activity takes place in order to re-assign the fillers.
An Outline of a Generalised PDP System.
The basic elements of the PDP model that will be used to account for these effects are outlined in this section (a more detailed account will be available shortly: see the schedule in section 5). Briefly, mental representations are modelled as configurations of "elements" that move around in a network of processors. The processors' only function is to support the elements - the processors themselves are not specialised for different types of task - and the network is both locally connected and densely populated with processors. The consequence of this architecture is that from the point of view of the elements, the network effectively looks like a "space" within which they can move: the fact that it is constructed out of discrete computing units is of little consequence. This space is referred to here as the "foreground". Information from the outside world is first of all pre-processed, and then delivered to the periphery of the foreground. An analogous chain of connections takes information away from a separate region of the foreground periphery towards the muscles that effect motor output.
The properties of the system are best described by looking at processes going on at three different levels: basic interactions, dynamic relaxation and the learning mechanisms.
At the basic level, the elements of the system have three important properties. First, an element is a program that can move around the foreground, rather than being fixed in one position, as is the case in a "regular" connectionist system. Second, elements can appear and disappear. There are many more elements than processors, but one processor can only host one element at a time, so only a small subset of the total is "active" at any time. Active elements are continually being "killed" (returned to the dormant pool) and replaced by new elements. The third property is that these elements can effectively form "bonds" with their neighbours. If two elements come within one link of one another, they exchange information, and as a result may try to repel, alter or destroy one another, or they may couple together so as to move in synchrony thereafter. These aspects of the element behaviour give the model something of a "molecular soup" flavour. (For another example of a PDP system with this kind of structure, see Hofstadter, 1983).
A high level semantic representation corresponds to a complex of elements; by the analogy just introduced, a "molecule". When the system is presented (at its peripheral nodes) with a sensory input - for example, a sequence of words - it tries to "understand" what it sees. This happens by a process of "dynamic relaxation" - the elements move around, alter their state and cause other elements to be activated, in an attempt to settle into a configuration that minimises one or more parameters (this is "relaxation" proper). The process is properly described as "dynamic" relaxation because a completely stable configuration never actually develops: new complexes of elements are always emerging from old, and then relaxing towards further configurations. This approach, then, characterises the emergence of a semantic representation of a sentence as a dynamic relaxation from an initial configuration of elements representing words (on the network periphery), through intermediate states involving the activation of elements that capture syntactic information, to a final (more or less stable) configuration that corresponds to a "model" (in the sense used by Johnson-Laird (1983)) of the significance of the sentence.
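The settling process just described can be caricatured as a greedy minimisation loop. The sketch below is purely illustrative (the energy function and proposal rule are my own inventions, far simpler than anything the model would need): two bonded elements drift about, and a change is kept only when the parameter being minimised - here just their separation - falls.

```python
import random

def relax(config, energy, propose, steps=200):
    """Greedy relaxation: accept a proposed local change whenever it
    lowers the energy of the configuration."""
    for _ in range(steps):
        candidate = propose(config)
        if energy(candidate) < energy(config):
            config = candidate
    return config

random.seed(1)
# Example preference: two "bonded" elements want to sit close together.
energy = lambda cfg: abs(cfg[0] - cfg[1])           # separation of the pair
propose = lambda cfg: [c + random.choice([-1, 1]) for c in cfg]
final = relax([0, 9], energy, propose)
print(energy(final) <= 9)   # energy never increases under relaxation
```

"Dynamic" relaxation differs from this in that the loop would never terminate: new complexes keep emerging and relaxing towards further configurations.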
The process of dynamic relaxation is driven by the activity of individual elements in the network. An element represents some aspect of the world - an object, a relation or some other regularity - out of which an internal "model" of the world can be built. What drives the relaxation process is the fact that each element has "preferences" about the kind of other elements that it would like to see in its vicinity, and perhaps also about the sequence of events going on around it. These preferences are not static: they are modified by the experiences that the element has each time that it comes into the network. The processes that shape the character of elements in this way are responsible for the system's ability to learn. Two such processes are generalisation and abstraction. Generalisation involves an element which develops as follows: it is initially happy with a certain configuration of elements; then, on some later occasion it finds that it also fits well with another configuration of elements; it will accordingly bias its preferences towards any elements that the two configurations had in common. Abstraction has to do with groups of elements that recognise that they have been together on previous occasions: as a consequence, they create a new element that has preferences that approximate to the "outward-looking" preferences of elements on the periphery of the group.
These are necessarily somewhat sketchy characterisations of abstraction and generalisation: more detailed investigations of these learning mechanisms will be carried out as part of the proposed work.
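As a first approximation, the generalisation mechanism might nevertheless be rendered in code. In the sketch below (the preference weights and the representation of configurations are assumptions of mine), an element that has fitted two different configurations strengthens its preference for whatever the two had in common.

```python
def generalise(preferences, config_a, config_b, boost=1.0):
    """Raise the preference weight for any element type that both
    configurations contain; leave the others untouched."""
    shared = set(config_a) & set(config_b)
    return {elem: weight + (boost if elem in shared else 0.0)
            for elem, weight in preferences.items()}

prefs = {"noun": 0.5, "verb": 0.5, "determiner": 0.5}
# The element fitted ["noun", "verb"] on one occasion and
# ["noun", "determiner"] on a later one, so only the shared
# "noun" preference is strengthened.
after = generalise(prefs, ["noun", "verb"], ["noun", "determiner"])
print(after)   # {'noun': 1.5, 'verb': 0.5, 'determiner': 0.5}
```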
A PDP Account of the Recent Filler Strategy.
What do syntactic rules correspond to in a PDP system of the sort described above? The (short) answer is that they are transient, intermediate elements that help to group the lexical elements together into configurations that can then go on to form the appropriate semantic structure. They are rather like catalysts that manipulate the linear chain of lexical elements into a particular folded shape, so that the appropriate further transformation can take place. There is a subtle but important difference between this view of syntactic rules and the more widespread one in which they are regarded as a way of specifying the unique phrase marker that sanctions the "grammaticality" of the sentence.
In this kind of system the elements that correspond to the meanings of words are quite separate from the elements that are associated with the written (or spoken) forms of the words. The meaning of a sentence is a configuration built from semantic elements, but this can only be constructed after two preliminary phases have occurred: first, the string of lexical elements must be assembled; then the appropriate syntactic structure must become attached to that lexical string. As the syntactic configuration develops some stable structure, so the semantic elements can be called into the foreground and assembled into a unique model of the situation portrayed by the sentence. It is important to note that there is a natural sequencing involved here, whereby semantic information does not play much of a role in the construction of the syntactic structure, even though the two are not located in modular "compartments" of the sentence comprehension mechanism. Under most circumstances, then, it seems likely that semantic information would not be used until relatively late in the process of sentence comprehension.
With regard to the Most Recent Filler strategy, two things can be said. First, it should be noted that the function of the system's elements is to detect "microfeatural regularities" in the sensory input. Thus, if it were noticed that the most recent filler was usually the appropriate one to put in a gap, an element would develop to characterise that regularity, and its future role, once it had become strong enough, would be (a) to get called in to the foreground when multiple gaps were detected (or suspected) and (b) to try to "bind" the first gap to the nearest filler.
A second point of interest is that the concept of "nearness" is an important one in the formalism being proposed. The foreground is a locally connected network of processors, and this local connectivity represents one of the basic assumptions on which the whole approach rests - without it any analysis of the properties of the system would be even harder than it already is. With this in mind, notice that there are two senses in which the most recent filler is "nearer" to the gap: it is nearer along the line of lexical items itself, and it is also nearer along the time dimension (the network is effectively four-dimensional). When bonds form between elements they preferentially involve nearby elements, simply because an element trying to form a bond is more likely to bump into a near neighbour than a far one. If, on the other hand, some new syntactic factor (the presence of a relative pronoun, say) is introduced, it may alter the situation by actively looking to form a bond with the first gap that appears. The contrast between the two situations is this: without the relative pronoun, it is the gap-element that most urgently goes out looking for a bond, and in that case the bond tends to be with the nearest potential filler. With the relative pronoun, on the other hand, the first filler bonds to a relative-pronoun-element, which in turn extends a potential bond forward toward the later part of the sentence, so that when the gap is identified it is much more likely to bond to the relative-pronoun-element than to the recent filler (see figure 1). Subsequently, the relative pronoun serves to bring the distant filler element and the first gap element together directly - having done this it goes out of the foreground.
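This account of bond formation can be caricatured deterministically. In the sketch below (the dictionary representation and all names are invented here), a gap-element simply bonds to the nearest candidate, and the relative-pronoun-element acts as a proxy that stations the distant filler right next to the gap.

```python
def choose_bond(gap_position, candidates):
    """Bond the gap to the nearest candidate element - a deterministic
    stand-in for 'more likely to bump into a near neighbour'."""
    return min(candidates, key=lambda c: abs(c["position"] - gap_position))

# Without a relative pronoun, only the two fillers compete for the gap.
fillers = [{"name": "distant filler", "position": 5},
           {"name": "recent filler", "position": 8}]
print(choose_bond(9, fillers)["name"])   # recent filler: it is nearer

# With a relative pronoun, the pronoun-element (already bonded to the
# distant filler) extends a potential bond right up to the gap.
proxy = {"name": "pronoun-element (distant filler)", "position": 9}
print(choose_bond(9, fillers + [proxy])["name"])   # the proxy wins
```

On this picture the Recent Filler effect falls out of proximity alone, and the relative pronoun shrinks it by changing which element is effectively nearest, which is the qualitative pattern the experiments found.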
Frazier, L., Clifton, C., & Randall, J. (1983). Filling Gaps: Decision principles and structure in sentence comprehension. Cognition, 13, 187-222.
Hofstadter, D.R. (1983). The Architecture of Jumbo. Proceedings of the International Machine Learning Workshop. Monticello, Illinois.
Johnson-Laird, P.N. (1983). Mental Models. Cambridge University Press.