Clocks

As a class, interval clocks can be defined as processes that change monotonically as a function of the passage of time and that can act as discriminative stimuli for the animal. In most of the models described below, the clock is entirely internal, receiving no input from the outside world. In fact, information from the outside world is specifically considered to be a contaminant to the pure timing function, in spite of the fact that humans and other animals will use such information at any opportunity.

State-based clocks are perhaps the type of clock closest to standard mechanical clocks. The clock steps through a fixed series of discrete states, with the current active state serving as an indicator of how much time has passed since the clock started. One metaphor for this state-based timing is a counter moving along a chessboard. If the counter always starts from the same place and moves at a consistent rate, the counter's position can be used as a measure of time.
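
To make the chessboard metaphor concrete, the following minimal sketch (written in Python purely for illustration; the number of states and the step rate are arbitrary assumptions of mine, not values from any published model) advances a counter through a fixed series of states at a constant rate, so that the identity of the currently active state can be read off as a measure of elapsed time.

# Minimal sketch of a state-based clock: a counter steps through a fixed
# series of discrete states at a constant rate, so the identity of the
# currently active state indexes how much time has passed since the clock
# was started.

class StateClock:
    def __init__(self, n_states=20, steps_per_state=5):
        self.n_states = n_states                # length of the "chessboard"
        self.steps_per_state = steps_per_state  # pace of the advancing process
        self.reset()

    def reset(self):
        """Return the counter to the first state (the time marker arrives)."""
        self.ticks = 0

    def step(self):
        """Advance one time step; states change at a consistent rate."""
        self.ticks += 1

    @property
    def state(self):
        """The currently active state serves as the estimate of elapsed time."""
        return min(self.ticks // self.steps_per_state, self.n_states - 1)


clock = StateClock()
clock.reset()
for t in range(30):
    clock.step()
print("state after 30 steps:", clock.state)   # 30 // 5 -> state 6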

The best-known exemplar of a state-based clock is the behavioral theory of timing (BeT) (Killeen and Fetterman, 1988). Explicitly proposed as an alternative to the more cognitive model of scalar expectancy theory (SET) (Gibbon, 1977; Gibbon and Church, 1990), BeT hypothesizes that animals keep track of time during an interval by going through a series of discrete behavioral states. For example, a rat might touch its nose, run to the back corner of the cage, stand up, and turn around, by which point the right amount of time would have elapsed for it to press the lever and receive a reinforcer. While this example uses large motions, it is perfectly possible that these behavioral states are not visible to an outside observer (e.g., tense the left calf muscle, tense the right, clamp the jaw muscles, etc.). Studies that observed animals in order to map out these behavioral states have achieved only indifferent success (Lejeune et al., 1998), finding consistent probabilistic patterns but not the consistent deterministic patterns of behavior required for timing under BeT. These experiments are also excruciatingly time-intensive for the experimenters, or more truthfully, for their students and lab assistants.

Two interesting variants of the state-based models are Armando Machado's revised connectionist version of BeT (Machado, 1997) and the diffusion timing models (Higa et al., 1991). The Machado model puts BeT on a sounder mathematical footing and extends it to a wider range of experimental results. It is discussed extensively in the next section.

The diffusion models produced by Staddon and his associates are an extension of sensory generalization models (e.g., Guttman and Kalish, 1956). The animal progresses along a fixed series of states over the course of the interval, just as in other state-based timing models. When the animal is reinforced, the current state gains a quantity of activation that then diffuses along the series of states. This diffused activation causes the model to increase its responding as it nears the reinforcement interval, as seen in animal data. In this model, early or late responding is due to temporal generalization rather than error. These models offer detailed predictions of transient timing phenomena, including learning effects and changes in schedules.
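
A rough sketch of this diffusion idea follows (again in Python; the nearest-neighbour smoothing rule, the diffusion rate, and the number of states are illustrative assumptions of mine, not the published model's equations). Reinforcement deposits activation at the state that was active at reward time, and that activation then spreads to neighbouring states, so responding generalizes to nearby times.

import numpy as np

# Illustrative sketch of diffusion-based temporal generalization: the state
# active at reward time gains activation, which then spreads to neighbouring
# states, so states near the reinforced one also come to drive responding
# (early or late responding as generalization rather than error).

n_states = 50
activation = np.zeros(n_states)
reinforced_state = 30          # state occupied when the reward arrives (assumed)

def diffuse(act, rate=0.25, steps=200):
    """Spread activation along the series with a simple nearest-neighbour kernel."""
    act = act.copy()
    for _ in range(steps):
        left = np.roll(act, 1)
        left[0] = act[0]                 # reflecting boundary at the start
        right = np.roll(act, -1)
        right[-1] = act[-1]              # reflecting boundary at the end
        act = act + rate * (0.5 * (left + right) - act)
    return act

activation[reinforced_state] += 1.0      # reward strengthens the current state
activation = diffuse(activation)

# Response strength as the animal progresses through states 0..49:
for s in (10, 25, 30, 35, 45):
    print(f"state {s:2d}: response strength {activation[s]:.3f}")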

It should be noted that all of these state models include the hidden assumption of a process that bumps the model to the next state at a consistent rate. In a state-based model, the current time is represented by which state the model is in at the moment. For that internal representation to change, the model must move to the next state. This process might be considered similar to the pacemaker process discussed below. In some state-based models, the rate at which the model advances to the next state is a result of the time between reinforcements and is therefore responsible for the scalar property (Killeen and Fetterman, 1988). Experiments to test this assumption have been done, focusing on varying the time between reinforcements without changing the interval to be timed (Bizo and White, 1995). Changes in the interreinforcement interval were found to have no effect on interval timing, and this is generally taken as evidence against state-based timing models. Of course, it may be that the mechanism for adjusting the pace of changing between states is simply more complex than described here and is not fooled by this methodology.

These state-based clocks are perhaps the least likely to be an accurate description of the biological basis of interval timing. These models require a very complex, preexisting special-purpose architecture, and like all previous timing models, they ignore the issue of how the animal learns the relationship between the time marker stimulus and the series of behavioral states.

The connectionist SET model of Church and Broadbent (1990) can be considered a type of complex state-based model, but because it is based on a system of neural oscillators, it will be covered in a later section.

Trace-based clocks are the equivalent of an hourglass, with time measured as a continuous smooth change in a single value. This is most commonly instantiated as a decay process; the time marker causes the value to jump up and then decay over time, and the model learns to associate a certain level of the trace with the reward. A physical metaphor for this process is a leaky bucket. Each time the time marker is presented, the bucket is filled to a preset level and the water begins to leak out. If one notes the level of water in the bucket when the reward is presented, it provides a measure of how much time has passed between the timing signal and the reward. These buckets are sometimes referred to as leaky integrators.
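
A leaky integrator of this sort can be written in a few lines. The sketch below (the decay rate and initial level are arbitrary illustrative values) fills the "bucket" when the time marker arrives and lets it decay exponentially, so the current level can be read as a measure of elapsed time.

# Minimal leaky-integrator ("leaky bucket") trace clock: the time marker sets
# the trace to a fixed level and the trace then decays exponentially; the
# level observed at reward time is what the model would learn to associate
# with reinforcement.

class LeakyTrace:
    def __init__(self, decay=0.95, initial_level=1.0):
        self.decay = decay                  # fraction retained on each time step
        self.initial_level = initial_level
        self.level = 0.0

    def on_time_marker(self):
        """The time marker 'fills the bucket' to a preset level."""
        self.level = self.initial_level

    def step(self):
        """One time step of leakage."""
        self.level *= self.decay


trace = LeakyTrace()
trace.on_time_marker()
for t in range(1, 61):
    trace.step()
    if t in (10, 30, 60):
        print(f"t = {t:2d}  trace level = {trace.level:.4f}")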

This single-bucket model has been implemented in several versions (e.g., Schmajuk, 1999; the single-integrator variant of the multiple-timescale model (MTS) presented by Higa and Staddon, 1999b), but a single integrator has several inherent problems. Most importantly, it does not scale properly. At longer durations, the change in the level of the trace becomes extremely small and difficult to measure. This means that a single-trace model would not be able to explain how animals time very long intervals (Gallistel, 1999).

Secondly, there is the issue of the relationship between the strength of the time marker and the strength of the trace. If the animal was trained to start timing when it heard a 30-dB beep and was later tested with a 60-dB beep, one could reasonably suppose the resulting timing trace would be more intense and thus would take longer to decay to the previously learned level, as shown in Figure 2.1. Experimental evidence suggests that the opposite is true: increasing the intensity of the signal causes the animals to time faster. In Leonard and Monteau (1971), subjects were presented with a tone and then, a second later, a puff of air was directed at the subjects' eyes. The subjects learned to blink at the correct interval after the tone in order to shield their eyes from the puff of air. If trained with one tone intensity and then tested with a louder tone, subjects blinked sooner than they did normally. This can be taken as evidence that the traces are built up by the timing stimulus over the course of the interval, rather than presented in a single lump that then decays. This is a strong point in favor of neural network models of timing, since building activation over time is the natural dynamic of such models.

FIGURE 2.1 Activity traces as a function of response strength. If subjects were trained on an interval schedule at one signal strength (the lower curve) and then tested with a more intense signal (the upper curve), a simple trace model of timing would predict that it would take longer for the activity trace to decay to the previously learned level, causing the subjects to respond too late.

These troubles are solved by a more sophisticated type of trace model incorporating multiple traces, such as the spectral timing model (STM) (Grossberg and Schmajuk, 1989) and the multiple-timescale model (Higa and Staddon, 1999b). Each trace decays at its own rate and the model learns to associate a pattern of traces with reinforcement. Because different parts of the model time at different rates, the model never has to adjust its basic functioning to match the timescale of the current task. The portion of the model best suited to the current timescale becomes the primary clock, without any explicit change of scale.
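
In spirit (the decay rates and readout below are illustrative assumptions of mine, not the published STM or MTS equations), a multiple-trace clock is simply a bank of leaky integrators with different rates, and learning consists of associating the resulting pattern with reinforcement.

import numpy as np

# Sketch of a multiple-trace clock: a bank of leaky integrators, each with its
# own preset decay rate, is started by the time marker.  The pattern across
# the bank at reward time is what would be associated with reinforcement;
# traces whose timescale matches the interval carry most of the information.

decay_rates = np.array([0.99, 0.97, 0.9, 0.7, 0.4])   # preset, spanning timescales
traces = np.zeros_like(decay_rates)

def on_time_marker():
    traces[:] = 1.0            # every trace is filled by the marker

def step():
    traces[:] *= decay_rates   # each trace leaks at its own rate


on_time_marker()
for t in range(1, 41):
    step()
    if t in (5, 20, 40):
        print(f"t = {t:2d}  traces = {np.round(traces, 3)}")
# A learning rule (e.g., one associative weight per trace) would then be
# trained to predict reward from this pattern; that part is omitted here.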

However, the above solution creates a new concern, one more of elegance and parsimony than of functionality. In both STM and MTS, there is an array of integrators with different decay rates. The range of timescale invariance shown by these two models is a function of the range of these decay rates, which are preset as parameters before the model is run. If the model is to be able to learn to time 4-h intervals, it must already have an integrator with an appropriate decay rate ready to go from the beginning. This is usually circumvented by having a large array of these integrators and a correspondingly wide range of decay rates. This is not necessarily a bad thing, but it requires a great deal of redundancy, and in some models (such as the spectral timing model), these rate parameters must be preset to a very precise set of values, making the model less robust.

At first glance, trace models would seem to automatically solve the assignment-of-credit problem. Whenever any stimulus is presented, the stimulus creates a memory trace that then decays. Only the correct time marker stimulus will consistently decay to the same point in each trial, and therefore it will be the stimulus most strongly associated with reward. However, there is also the issue of when the traces decay. Water running out of a leaky bucket is a constant process, limited only by the absence of water in the bucket. The water cannot leak out faster or slower, nor can it pause and hold its level. A study by Roberts (1981) showed that animals could pause their timing during the interval and then resume without any decay if that was what was required by the task. A trace model could be envisioned with an additional learning process that acts as the equivalent of a thumb over the hole in the bucket, but that thumb would then need to solve the assignment-of-credit problem in order to be able to perform its task. Trace models are particularly suited to neural network modeling and thus can be considered the most biologically plausible of the various breeds of timing models.

Pacemaker-accumulator systems lie somewhere in between trace- and state-based timing models. Like a trace-based system, they have an accumulator whose contents are a measure of time. However, a pacemaker adds time to this accumulator in discrete pulses, in a fashion reminiscent of a state-based model. A common metaphor for this type of clock is a leaky faucet dripping into a bucket. If the bucket starts out empty and the faucet drips at a consistent rate, the level of water in the bucket can serve as a measure of time.

The most popular pacemaker-accumulator model is the scalar expectancy theory (Gibbon, 1977; Gibbon et al., 1984; also see Treisman, 1963). In this model, the timing signal closes a switch that allows pulses from the pacemaker to begin building up in the accumulator. When the model is reinforced, the level in the accumulator is recorded. In subsequent trials, the model responds whenever the current level in the accumulator approximates the stored value.
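
The sketch below illustrates this pacemaker-accumulator logic in the barest possible form (the pulse probability, response threshold, and random seed are illustrative assumptions; the sources of scalar variance that SET specifies are omitted). The switch is closed for the duration of the interval, pulses accumulate, the count is stored at reinforcement, and on later trials the model responds whenever the running count approximates the stored value.

import random

# Bare-bones pacemaker-accumulator sketch in the spirit of SET: while the
# switch is closed, pacemaker pulses accumulate; at reinforcement the count is
# stored in reference memory; on later trials the model responds whenever the
# running count is close enough to the remembered value.

PULSE_PROB = 0.5          # chance of a pacemaker pulse on any time step
THRESHOLD = 0.15          # respond when |current - stored| / stored < THRESHOLD

def run_trial(duration, stored_count=None):
    accumulator = 0
    responses = []
    for t in range(duration):
        if random.random() < PULSE_PROB:   # a pacemaker pulse reaches the accumulator
            accumulator += 1
        if stored_count is not None:
            if abs(accumulator - stored_count) / stored_count < THRESHOLD:
                responses.append(t)
    return accumulator, responses


random.seed(0)
# Training trial: reward at the end of a 40-step interval; store the count.
stored, _ = run_trial(40)
# Test trial: respond whenever the accumulator approximates the stored value.
_, response_times = run_trial(80, stored_count=stored)
print("stored count:", stored)
if response_times:
    print("responding from step", response_times[0], "to step", response_times[-1])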

SET has proven to be an enduring and flexible model of interval timing, due to its information-processing approach. By separating the timing mechanism into discrete subunits with clearly defined roles and properties, SET provides a theoretical framework for reducing timing behavior to manageable components (see Church, 1999; this volume; Church et al., 1991; Gibbon, 1992; Gibbon and Church, 1984).

2.2.2 Previous Work

2.2.2.1 Connectionist Models

The current project has two sets of direct predecessors. First, there are previous neural network timing models, which offer insights into possible neural timing mechanisms, and the concept that timing and learning can both be accomplished within a single substrate. These models typically use neural nodes linked together in complex and rather specialized ways, a theoretical entanglement that the current project will hopefully avoid.

The spectral timing model of Grossberg and Schmajuk (1989) is certainly the closest model to the current project. It consists of a four-layer neural network in which there is a single input node, a single output node, and two middle layers that produce a predesigned cascade of timing signals.

This cascade is a series of approximately normal curves, each peaking at a different point during the interval. These curves are distributed logarithmically to provide equally accurate timing all along the interval. The model learns to associate the proper curves in the cascade with the reward.
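
The following short sketch generates a cascade of this general shape (the curve widths, peak spacing, and Gaussian form are my own illustrative choices, not the actual Grossberg and Schmajuk dynamics): a set of roughly bell-shaped traces peaking at logarithmically spaced times, any of which could then be weighted by a learning rule.

import numpy as np

# Illustrative cascade of timing traces: a set of roughly bell-shaped curves,
# each peaking at a different, logarithmically spaced point in the interval.
# This shows only the shape of such a cascade, not the published spectral
# timing equations.

timesteps = np.arange(0, 200)
peak_times = np.geomspace(5, 150, num=8)        # logarithmic spacing of peaks
widths = 0.3 * peak_times                        # wider curves at longer delays

cascade = np.array([np.exp(-0.5 * ((timesteps - p) / w) ** 2)
                    for p, w in zip(peak_times, widths)])

# Each row of `cascade` is one trace; a learning rule would weight the traces
# whose peaks coincide with the time of reinforcement.
reward_time = 60
print("trace activations at the reward time:",
      np.round(cascade[:, reward_time], 2))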

STM's primary strength lies in its use of multiple independent timing traces. This design avoids the problems of a single trace described earlier: the inherent fixed timescale and the problematic relationship between signal strength and timing. The use of multiple parallel timing traces also gives the system a measure of robustness, because any individual trace could fail or be damaged while the timing system as a whole continues to operate.

The spectral timing model's weaknesses lie in two closely related problems. Most importantly, STM depends on the middle layers having precisely distributed parameter values. Each of the hidden nodes is created with a unique fixed parameter that determines the scale at which the node works. The model never learns anything about scale; the timing traces exist at their preset scales regardless of the model's experiences and training.

Secondly, in STM learning only occurs in the associational weights between the timing traces and the output node. The model does not learn the association between the input node and the cascade of timing traces; that is also preset. Essentially, the input layer and the cascade serve as a fixed clock, without learning or adjusting to current circumstances. In order for the cascade system to work, there can be only one timing signal that triggers the cascade. In order to time two independent signals, the entire system has to be duplicated.

The connectionist SET model (Church and Broadbent, 1990) offers a very different use of neural mechanisms for timing. In this model, the clock is represented as a series of oscillators that alternate back and forth between -1 and 1. These oscillators range from very fast (changing every few milliseconds) to very slow (days, weeks, or even years), allowing the model to work at a wide range of timescales. These oscillators are reset at the onset of each trial, and therefore the total state of all the oscillators serves as a measure of the interval since the trial started.
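
A minimal illustration of such a clock follows (the square-wave form, the particular periods, and the readout are illustrative assumptions, not the details of the Church and Broadbent implementation): a bank of oscillators with widely spaced periods is reset at trial onset, and the joint sign pattern of the bank serves as the representation of elapsed time.

import numpy as np

# Sketch of an oscillator-based clock in the spirit of Church and Broadbent
# (1990): a bank of oscillators with very different periods, all reset at the
# start of the trial, alternates between +1 and -1.  The joint state of the
# bank (its sign pattern) serves as the representation of elapsed time.

periods = np.array([2, 4, 8, 16, 32, 64, 128])   # fast to slow oscillators

def oscillator_state(t):
    """Sign pattern of all oscillators t steps after trial onset (reset at t = 0)."""
    phase = (t % periods) / periods
    return np.where(phase < 0.5, 1, -1)

for t in (3, 20, 100):
    print(f"t = {t:3d}  state = {oscillator_state(t)}")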

One of the primary strengths of this model is the fact that oscillators do exist in the animal body. Heartbeats, breathing rhythms, circadian cycles — there are all sorts of rhythms within the body that can be used to help make timing more accurate. Other timing models try to ignore these cycles, considering them contaminants to the pure timing function. Timing experiments do their best to make these cycles less salient. The connectionist SET model embraces these cycles and incorporates them into the timing function. Supporting this idea are experiments that suggest that timing may advance in a sinusoidal fashion, rather than a linear one (Crystal et al., 1997).

While these oscillators are very neural, the learning and memory systems of this model are not. The state of the array of oscillators is stored, retrieved, and acted upon in a representational rather than an associational manner. This is a product of the original SET's information-processing approach to timing, and the alternative has not been explored. This is not to say that neural learning and memory systems could not be used with an oscillator-based clock. One could easily imagine a three-layer neural network performing all the functions of the Church and Broadbent model. An input layer triggers the oscillator layer, and a series of weights between the oscillators and the output layer learns to associate the appropriate oscillators with reward.

To the best of my knowledge, this modified oscillator model has not been proposed before. This approach was not used in the current project because it requires the oscillators to be special-purpose timing neurons that function according to different rules than the rest of the network. This is both less elegant and a digression from the central concept of this project, that of using the most general model possible.

Another closely related model is Armando Machado's connectionist behavioral theory of timing (CBeT) (Machado, 1999). This model is made up of a series of nodes linked in a one-directional fashion. Activation is introduced into the first node by the timing signal and spreads along the series over the course of the interval. Each node has a degree of association with reward, and the model's output is the sum of each node's activation multiplied by its association. At reinforcement, the nodes' associations are strengthened in proportion to their activation, such that the most active nodes gain the most association with reward. CBeT can time because activation always diffuses along the series at the same rate; thus the same nodes tend to be active at the same points in each trial. The nodes most associated with the reward are those that tend to be active at the reinforcement interval; therefore, the model's output is greatest at that point. The distinctive bell-shaped response curve is a natural product of the diffusion process. A similar process is responsible for the success of the diffusion-based timing models (e.g., Higa and Staddon, 1997).
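
A toy sketch of this style of model follows (the spread rule, learning rate, and network size are illustrative assumptions of mine, not Machado's exact equations): activation enters the first node at the time marker and flows along the chain, each node carries an association with reward that is strengthened in proportion to its activation at reinforcement, and the response strength is the sum of activation times association.

import numpy as np

# Sketch in the spirit of CBeT: activation enters the first node at the time
# marker and spreads along a one-directional chain of nodes.  At reinforcement
# (the end of a training trial), associations grow in proportion to each
# node's current activation; response strength is activation times association.

n_nodes = 30
spread = 0.2            # fraction of activation passed forward each step
learning_rate = 0.1
associations = np.zeros(n_nodes)

def run_trial(interval, learn=True):
    activation = np.zeros(n_nodes)
    activation[0] = 1.0                        # the time marker starts the chain
    for _ in range(interval):
        forward = spread * activation          # activation passed to the next node
        shifted = np.roll(forward, 1)
        shifted[0] = 0.0                       # nothing wraps around from the last node
        activation = activation - forward + shifted
    if learn:                                  # reinforcement at the end of the interval
        associations[:] += learning_rate * activation
    return float(activation @ associations)    # response strength at this elapsed time


for _ in range(50):                            # train on a 40-step interval
    run_trial(40)
print("response strength at t = 20:", round(run_trial(20, learn=False), 3))
print("response strength at t = 40:", round(run_trial(40, learn=False), 3))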

The CBeT model solves many of the fundamental objections to BeT. Its firm mathematical basis allows it to make quantitative predictions that match the existing animal data very well. Unlike the original BeT model, this model does not need a separate process that bumps it from state to state. The states (nodes) are not exclusive, and activation constantly diffuses across them. This also explains the mixed results of attempts to observe a consistent series of behavioral states, as described above in the analysis of BeT. The fact that we have never been able to observe a consistent monotonic chain of behaviors leading up to reinforcement is not a problem for this version of BeT. The current time is encoded by the relative activation of many behavioral states, not just one. Which particular behavior is exhibited at any given time can be probabilistic without affecting the model's sense of where it is in the interval. This is a step away from the strict behaviorist position that informed the creation of BeT, but not a particularly controversial one.

The primary flaw with CBeT is that there is no good way to solve the assignment-of-credit problem. The system only works if activation is introduced solely at the first node and then proceeds uninterrupted along the series. There is no direct connection between the presentation of the time marker and the reinforcement, so the learning algorithm required to assign credit to the proper timing signal would need to be quite complex. Does every signal presented to the model diffuse across the nodes independently? Does every signal have its own series of nodes? This flaw is closely related to the model's difficulty in timing multiple sources independently. For multiple signals to be timed, there must be either multiple series of states or multiple "flavors" of activation spreading across the same series. Either way, it requires that the timing system be duplicated as a whole. This duplication seems unlikely in the face of research indicating that all signals are timed to some degree. Davis et al. (1989) showed evidence that there was temporal learning on the very first pairing of a light and a shock. For that to be true, the subjects must have started timing the light before they knew that any consequence would be associated with it.

2.2.2.2 Learning Models

There are existing attempts to model timing as a special case of a more general learning system. Arcediano and Miller (2002) attempted to connect classical conditioning theory to interval timing and made a strong case for the idea that timing is a fundamental part of all conditioning, not just of the more complex timing procedures. They argue that for any sort of conditioning to occur, the animal must have a sense of the temporal relationships between stimulus and stimulus or stimulus and response. While persuasive, they do not provide a coherent mathematical model for these ideas, and so the ideas are not readily testable in a quantitative fashion. Balsam et al. (2002) support this idea and provide evidence that animals' knowledge about temporal properties exists well before it is demonstrated in their behavior.

Dragoi et al. (in press) propose a theory of timing that is both clockless and mathematically explicit, and that takes a unique approach to the modeling problem. Like the current project, their model is adapted from a general learning model (e.g., Dragoi and Staddon, 1999). Rather than start with a clock or even a particular mechanism such as neural networks, their theory rests on the relationship between two fundamental principles that would hold in a wide variety of learning mechanisms, and it shows that any model reflecting those principles will be able to demonstrate basic timing phenomena.

The first principle is that responses compete, with the strongest response being the only one emitted by the animal. Second, the competitive strength of a response is a function of its recent reinforcement history. In this model, the common peak-interval timing function is not produced by anticipation of the reinforcement, but by the fact that the operant response becomes more successful in competition over the course of the trial. Unfortunately, this model is somewhat undercut by the fact that it adjusts its parameters as a direct function of the time between reinforcements. This adjustment is necessary for the model to scale, but it is a significant blemish when compared to the elegance of the rest of the model. Significantly, the model does not include any stimulus learning. It does solve one aspect of the assignment-of-credit problem in that it can determine which of its responses is responsible for the rewards it receives. However, it cannot use any time marker but the previous reinforcement, and this prevents it from modeling a number of important interval timing procedures. The model also requires many versions of itself operating in parallel in order to produce animal-like response functions.
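
Purely as an illustration of how those two principles interact (the response set, exponential memory, and reinforcement schedule below are my own assumptions, not the Dragoi et al. equations), the following toy sketch lets responses compete, emits only the strongest, and updates each response's competitive strength from its recent reinforcement history.

import numpy as np

# Toy illustration of the two principles: (1) responses compete and only the
# strongest is emitted, and (2) a response's competitive strength tracks its
# recent reinforcement history.

responses = ["lever_press", "grooming", "exploring"]
strength = np.ones(len(responses)) * 0.1   # initial competitive strengths
memory = 0.9                               # weight given to older reinforcement

def emit():
    """Only the currently strongest response is emitted."""
    return int(np.argmax(strength))

def update(emitted, reinforced):
    """Strength is an exponentially weighted average of recent reinforcement."""
    strength[emitted] = memory * strength[emitted] + (1 - memory) * float(reinforced)


rng = np.random.default_rng(1)
for trial in range(200):
    r = emit()
    # Assume only lever pressing is ever reinforced, and only sometimes.
    reinforced = (responses[r] == "lever_press") and (rng.random() < 0.5)
    update(r, reinforced)

print({name: round(float(s), 3) for name, s in zip(responses, strength)})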

2.3 PROPOSAL

2.3.1 Principles

The following principles define the goals and design parameters for this project.

2.3.1.1 An Adapted General Learning Model

The starting point for the model should be a well-known general learning model. Instead of first creating a clock and allowing it to dictate the shape of the rest of the model, this project will start with a preexisting model of animal learning.

2.3.1.2 Minimal Changes from the General

If changes are necessary for the model to show timing behavior, they should be as small as possible. The model should remain a learning model used for timing rather than a clock model that learns.

2.3.1.3 Maintain General Learning Performance While Also Allowing Timing

The general learning aspects of the model must be preserved through any changes that are required. If these are compromised, it obviates the entire point of the project.

2.3.1.4 Robust Parameters

The model should be robust, functioning well under a wide range of parameter values. While it is inevitable that some particular set of values will generate the best performance, the model should still produce basic timing behavior within a reasonable range of values.

2.3.2 Architecture

For my starting point, I chose the most common general learning model: a three-layer neural network trained by backpropagation. The choice of a neural network model was made because of its ubiquity, not because of any particular property of neural networks or of backpropagation. The purpose of this work is to discover whether it is possible for timing to occur in a general learning model, and a neural network trained by backpropagation is the single most common exemplar of that category.

As to whether this project could be repeated using other learning mechanisms, there is no a priori reason why another learning mechanism such as a reinforcement learning model (Sutton and Barto, 1998) could not also learn to time. Some learning models may do better or worse at timing, but I suspect (but do not assert) that any sufficiently general model could produce the basic results. The key properties necessary to timing are dynamic changes within the model over the course of the interval and the ability to perceive and respond to those changes.

Neural networks are made up of extremely stylized neurons that are referred to as nodes or units. Each node receives inputs, either from other nodes or the outside world, and those inputs contribute to the node's level of activation. The node then passes on some portion of that activation to the nodes it is connected to. The effect of one node upon another is regulated by connection weights. It is these weights that are adjusted as the network learns, eventually reaching a configuration that allows the network to respond to each stimulus with an appropriate output.

Two primary modifications were required to allow the model to time in a fashion similar to that of existing timing models. First, the level of activation in the nodes is carried over from time step to time step. It is common in neural network models for each unit's new state to be purely a function of its inputs, without reference to its current state. For timing to be possible, activation needs to build over time. This is not an uncommon change, but it is a step away from the prototypical artificial neural network. A similar modification was used in the spectral timing model and in many other mainstream neural network models.
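
The difference this modification makes can be seen in a few lines of code (the carry-over factor and weights below are arbitrary illustrative values; the exact update rule used in the model is not reproduced here). With the standard memoryless update, a constant input produces the same activation on every step; with carry-over, activation builds over the course of the interval and so provides a time-varying internal state.

import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Standard update: a node's new activation depends only on its current inputs.
def standard_update(inputs, weights):
    return logistic(np.dot(weights, inputs))

# Modified update (illustrative form): part of the previous activation is
# retained, so activation can build up over the course of an interval.
def carryover_update(prev_activation, inputs, weights, retain=0.8):
    return retain * prev_activation + np.dot(weights, inputs)


weights = np.array([0.05])
constant_input = np.array([1.0])
a = 0.0
for t in range(1, 31):
    a = carryover_update(a, constant_input, weights)
    if t in (5, 15, 30):
        print(f"t = {t:2d}  carry-over activation = {a:.3f}  "
              f"memoryless activation = {standard_update(constant_input, weights):.3f}")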

This modification is similar to a type of neural network called a recurrent neural network, as shown in Figure 2.2. In these networks, the hidden layer is fed back into itself, its state from the previous time step acting as part of the input layer for the next. This grants the network the ability to respond to its own internal state.

FIGURE 2.2 A recurrent neural network. In a recurrent neural network, the previous activity levels of the hidden nodes are used as inputs. This allows the network to use its own internal state as a stimulus and can serve as a form of short-term memory.

Second, on each time step the typical neural network node is essentially on or off, firing or not firing, with its output filtered through a sigmoid function of varying steepness. In this model, a node's output is instead directly proportional to its activation, which provides the next layer with much more finely grained information. Again, an identical modification was used in the spectral timing model.

The general timing model consists of three layers: an input layer, a hidden layer, and an output layer. Each layer consists of one or more nodes, and each node is connected to every node in the next layer. The layout of the network and sample activation curves for the nodes in each layer over the course of an interval are shown in Figure 2.3. Nodes in the input layer are each directly connected to an outside stimulus, the node's activation being 1 when that stimulus is present and 0 when it is not, as illustrated in Figure 2.4, Equation 1. In a typical simulation, there were two nodes in the input layer, one of which represented the timing signal and was 1 during the interval and 0 otherwise. The other represented background stimuli and was always 1.
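
To pull these pieces together, the sketch below shows one possible forward pass through such a network under the assumptions just described: two input nodes (the timing signal and an always-on background node), a hidden layer whose activation carries over from step to step, and an output read directly from the hidden activations. The layer sizes, carry-over factor, and random weights are illustrative assumptions, and the backpropagation training of the weights is omitted.

import numpy as np

# Minimal forward-pass sketch of the architecture described above: an input
# layer with one timing-signal node and one always-on background node, a
# hidden layer whose activation carries over from step to step, and an output
# layer whose value is proportional to the hidden activations it reads.

rng = np.random.default_rng(0)
n_hidden = 10
w_in = rng.normal(scale=0.1, size=(n_hidden, 2))    # input -> hidden weights
w_out = rng.normal(scale=0.1, size=(1, n_hidden))   # hidden -> output weights
retain = 0.8                                         # activation carried over per step

def trial(interval, total_steps):
    hidden = np.zeros(n_hidden)
    outputs = []
    for t in range(total_steps):
        signal = 1.0 if t < interval else 0.0        # timing-signal input node
        inputs = np.array([signal, 1.0])             # plus the always-on background node
        hidden = retain * hidden + w_in @ inputs     # carry-over hidden update
        outputs.append((w_out @ hidden).item())      # output proportional to activation
    return outputs


out = trial(interval=40, total_steps=60)
print("output at steps 10, 40, 60:",
      [round(out[i - 1], 3) for i in (10, 40, 60)])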
