The Journal of Neuroscience, July 30, 2003, 23(17):6713-6727
A Two-Stage Unsupervised Learning Algorithm Reproduces Multisensory Enhancement in a Neural Network Model of the Corticotectal System
Thomas J. Anastasio (1, 2) and Paul E. Patton (2)
(1) Department of Molecular and Integrative Physiology and (2) Beckman Institute, University of Illinois at Urbana/Champaign, Urbana, Illinois 61801
Abstract
Multisensory enhancement (MSE) is the augmentation of the response to
sensory stimulation of one modality by stimulation of a different modality. It
has been described for multisensory neurons in the deep superior colliculus
(DSC) of mammals, which function to detect, and direct orienting movements
toward, the sources of stimulation (targets). MSE would seem to improve the
ability of DSC neurons to detect targets, but many mammalian DSC neurons are
unimodal. MSE requires descending input to DSC from certain regions of
parietal cortex. Paradoxically, the descending projections necessary for MSE
originate from unimodal cortical neurons. MSE, and the puzzling findings
associated with it, can be simulated using a model of the corticotectal
system. In the model, a network of DSC units receives primary sensory input
that can be augmented by modulatory cortical input. Connection weights from
primary and modulatory inputs are trained in stages one (Hebb) and two
(Hebb-anti-Hebb), respectively, of an unsupervised two-stage algorithm.
Two-stage training causes DSC units to extract information concerning
simulated targets from their inputs. It also causes the DSC to develop a
mixture of unimodal and multisensory units. The percentage of DSC multisensory
units is determined by the proportion of cross-modal targets and by primary
input ambiguity. Multisensory DSC units develop MSE, which depends on unimodal
modulatory connections. Removal of the modulatory influence greatly reduces
MSE but has little effect on DSC unit responses to stimuli of a single
modality. The correspondence between model and data suggests that two-stage
training captures important features of self-organization in the real
corticotectal system.
Key words: superior colliculus; multisensory integration; unsupervised learning; corticotectal system; neural network model; self-organization; Hebbian learning; anti-Hebbian learning
Introduction
Integration of input from multiple senses is critical to survival in a
complex environment. Research in sensory neurobiology is shifting from a focus
on single sensory systems to consideration of interactions between sensory
systems (Stein and Meredith, 1993; Stein, 1998). The most studied loci of multisensory convergence in
mammals are the deep layers of the superior colliculus (DSC). Findings on DSC
neurons raise intriguing questions about multisensory integration.
The DSC functions to detect sensory targets and initiates orienting
movements toward them (Robinson, 1972; Wurtz and Goldberg, 1972; Sparks and
Hartwich-Young, 1989). DSC neurons are organized topographically
according to the location of their receptive fields
(Middlebrooks and Knudsen, 1984; Meredith and Stein, 1990; Meredith et al., 1991).
Many DSC neurons receive sensory input of multiple
modalities (Wallace and Stein, 1996), and the receptive fields of the same neuron for different
modalities overlap (Meredith and Stein, 1986b, 1996; Kadunce et al., 2001).
Multisensory DSC neurons can exhibit multisensory enhancement (MSE), which is
the augmentation of the response to stimulation of one modality by stimulation
of a different modality (King and Palmer, 1985; Meredith and Stein, 1986a;
Wallace et al., 1996, 1998).
The DSC receives visual, auditory, and somatosensory input from a variety
of subcortical and extraprimary cortical sources
(Sparks and Hartwich-Young, 1989; Wallace et al., 1993). MSE depends on input from specific regions of parietal
cortex. Inactivation of these regions reduces MSE, while often only minimally
affecting the responses of DSC neurons to stimulation of a single modality
(Wallace and Stein, 1994; Jiang et al., 2001). It would
be parsimonious to suppose that the cortical signal required for MSE is
multisensory. Paradoxically, the parietal projections necessary for MSE
originate from unimodal neurons (Wallace et al., 1993). The question of how unimodal cortical projections
could produce multisensory enhancement in the DSC remains unanswered.
Multisensory integration apparently improves the ability of DSC neurons to
detect targets (Stein et al., 1988, 1989; Wilkinson et al., 1996; Jiang et al., 2002). However,
not all DSC neurons are multisensory. In the cat, approximately one-half of
DSC neurons are multisensory, and, in the monkey, only approximately
one-quarter, despite the availability of input of multiple modalities
(Wallace and Stein, 1996). The
question of why some DSC neurons are multisensory while others are not also
remains open.
This paper describes a model of the corticotectal system that simulates MSE
and provides possible answers to these questions. It consists of an array of
DSC units receiving unimodal primary and modulatory inputs and is trained in
two stages that are unsupervised, local, and neurobiologically plausible.
Primary inputs are trained first using the (Hebbian) self-organizing map
algorithm. Modulatory inputs are trained second, using a novel Hebb-anti-Hebb
rule. Training produces a mixture of unimodal and multisensory DSC units and
causes the DSC to extract information concerning simulated targets from its
inputs. The trained network can simulate MSE, with unimodal modulatory inputs
preferentially augmenting DSC unit responses to stimulation of multiple
modalities. The correspondence between model and data suggests that the
self-organization of the real corticotectal system may involve mechanisms
analogous to the two-stage algorithm.
Materials and Methods
The corticotectal network model represents neurons in the DSC and the
sensory inputs they receive from subcortical and cortical sources (see
Introduction). Inputs to DSC units are of two types, primary and modulatory.
Primary inputs provide weighted connections to DSC units, and modulatory
inputs augment the values of primary weights. Primary and modulatory inputs
are specific for visual, auditory, or somatosensory modalities and can be
activated by targets having the corresponding sensory attributes. Primary and
modulatory input weights are trained in two separate stages of unsupervised
learning. Training of primary connections produces a mixture of unimodal and
multisensory DSC units, whereas training of modulatory connections produces
MSE. Two-stage training causes the DSC to extract information from its sensory
inputs concerning targets. The two sequential stages of training are inspired
by DSC development, in which multisensory neurons first appear and later
become capable of MSE, with onset of MSE corresponding to onset of parietal
cortical influence (Wallace and Stein, 1997, 2000, 2001). A diagram of the
corticotectal model is shown in Figure 1. Pseudocode for the two-stage algorithm is given in
Table 1. A list of all variables and mathematical notation used in the paper is given in
Table 2.

Figure 1. Schematic of the corticotectal model that produces multisensory enhancement
in the DSC. A, The DSC is represented as a 10 x 10 grid of
units. Primary inputs represent unimodal, excitatory projections from the
visual (V), auditory (A), or somatosensory (S) systems. Modulatory inputs
represent unimodal visual, auditory, or somatosensory projections from
parietal cortex. Before stage-one training, each DSC unit receives primary
input of all three modalities. Stage-one training causes DSC units to become
specialized for specific modalities or modality combinations. As an example, a
unit that receives primary input from the visual and auditory systems after
stage-one training is shown in B. B, Before stage-two training, each
primary connection may potentially receive modulatory input of all three
modalities (solid and dashed lines), but stage-two training is restricted by
the modality-matching and cross-modality constraints (see Materials and
Methods). After stage-two training under these constraints, the unit shown can
receive only visual and auditory modulatory input, with the primary visual
connection modulated by the auditory modulatory input, and the primary
auditory connection modulated by the visual modulatory input (solid
lines).
Architecture and activation of the network
The DSC units (nz = 100) are arranged in a
square 10 x 10 grid representing a small patch in the DSC. Neurons in
the DSC have large overlapping receptive fields
(Middlebrooks and Knudsen, 1984; Meredith and Stein, 1986b, 1990, 1996;
Meredith et al., 1991; Wallace et al., 1996; Kadunce et al., 2001). We
therefore assume that the units in the model patch have overlapping receptive
fields and that the entire DSC patch receives input from the same small region
of external space. Primary and modulatory inputs can be driven by targets
appearing in this region.
Characterization of the target. The target T is
arbitrarily assumed to be present one-half of the time and absent one-half of
the time. When present, a target can exhibit any combination of visual (V),
auditory (A), and somatosensory (S) attributes. The target has eight states
t (t = 0, 1, 2,..., 7), corresponding to the target-absent
state of no sensory attributes plus the seven possible attribute combinations.
The target-absent state (V = 0, A = 0, S = 0) has probability
P(T = 0) = 1/2. For simplicity, the modality-specific
(single modality) targets together are assigned total probability
ps, whereas the cross-modal (multiple modality)
targets together are assigned total probability pc, where
ps + pc = 1/2.
For present targets, the three modality-specific states (V = 1, A = 0, S = 0),
(V = 0, A = 1, S = 0), and (V = 0, A = 0, S = 1) have probabilities
P(T = 1) = P(T = 2) =
P(T = 3) = ps/3, and the four
cross-modal states (V = 1, A = 1, S = 0), (V = 1, A = 0, S = 1), (V = 0, A =
1, S = 1), and (V = 1, A = 1, S = 1) have probabilities P(T
= 4) = P(T = 5) = P(T = 6) =
P(T = 7) = pc/4. The eight
target-state probabilities sum to one.
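The eight-state target distribution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `STATES`, `target_probabilities`, `sample_target`, and the `ratio` parameter (ps divided by pc) are hypothetical and do not come from the paper.

```python
import random

# Attribute combinations (V, A, S) for target states t = 0..7, in the order
# listed in the text.
STATES = [
    (0, 0, 0),                                    # t = 0: target absent
    (1, 0, 0), (0, 1, 0), (0, 0, 1),              # t = 1..3: modality-specific
    (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),   # t = 4..7: cross-modal
]

def target_probabilities(ratio=2.0):
    """Return [P(T = 0), ..., P(T = 7)].

    `ratio` is ps / pc (hypothetical parameter name). The target is present
    one-half of the time, so ps + pc = 1/2.
    """
    pc = 0.5 / (1.0 + ratio)
    ps = 0.5 - pc
    return [0.5] + [ps / 3.0] * 3 + [pc / 4.0] * 4

def sample_target(probs, rng=random):
    """Draw one target state index t according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = target_probabilities(ratio=2.0)
t = sample_target(probs, random.Random(0))
attributes = STATES[t]  # (V, A, S) flags of the sampled target
```

With a two-to-one proportion, each modality-specific state receives probability 1/9 and each cross-modal state 1/24.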
Activation of primary and modulatory inputs. Primary inputs are
represented by a set of nx = 3 random variables
Xj(j = 1, 2, 3). Modulatory inputs are
represented by a set of ny = 3 random variables
Yk(k = 1, 2, 3). Each of the three
primary or modulatory inputs is specific for one of the three sensory
modalities: visual, auditory, or somatosensory. Variables
xj and yk denote
specific instances of Xj and
Yk. The variables Xj
and Yk represent whole populations of sensory
neurons.
For simplicity, each discrete random variable
Xj or Yk is the sum
r over a different set of n = 20 binary random variables.
Each of the binary variables in a set is specific for the same sensory
modality as the random variable Xj or
Yk that represents it. The individual binary
variables take value zero or one depending only on their activation
probabilities. Activation probabilities are either driven or spontaneous,
depending on whether or not the target presents the modality specific to the
set. For simplicity, all 60 binary variables represented by the three primary
inputs Xj have the same driven and spontaneous
activation probabilities of px1 and
px0 (where
px1 >
px0). Likewise, all 60 binary
variables represented by the three modulatory inputs
Yk have the same driven and spontaneous
activation probabilities of py1 and
py0 (where
py1 >
py0).
Characterization of primary and modulatory inputs. Because they
each represent the sum of n = 20 binary random variables, the
Xj and Yk can assume
any discrete value between 0 and 20. Because the individual binary variables
in each sum are independent, the Xj and
Yk are binomially distributed. The general
formula for the binomial distribution b(n, p) is as follows
(Appelbaum, 1996):
$$P(r) = \binom{n}{r} p^{r} (1 - p)^{n - r} \qquad (1)$$
where p is the probability that any binary random variable takes
value one. To use Equation 1 to describe the likelihood distributions of the
primary and modulatory inputs, probability p can be replaced by the
corresponding activation probability. Thus, the target drives primary input
Xj when it presents the modality specific to
Xj, and the driven likelihood for
Xj is distributed as b(n,
px1). Similarly, the spontaneous
likelihood for primary input Xj is distributed as
b(n, px0). The driven and
spontaneous likelihoods for modulatory input Yk
are distributed as b(n, py1)
and b(n, py0), respectively.
All of the Xj and Yk
are distributed independently of one another given the state of the target.
The binomial distribution affords a simple way to model a sensory input that
represents the combined contribution of many individual inputs. The binomial
approximates the Poisson distribution when n is large and p
is small (Hoel et al., 1971).
The Poisson distribution has been used to represent sensory inputs of
different modalities in previous models of MSE (Anastasio et al., 2000).
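The spontaneous and driven likelihoods can be tabulated directly from the binomial formula. The sketch below is illustrative; the function names `binomial_pmf` and `likelihood` are hypothetical, and the activation probabilities are example values taken from the text.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability that exactly r of n independent binary variables are active."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def likelihood(n, p):
    """Full likelihood P(r) for r = 0..n under activation probability p."""
    return [binomial_pmf(r, n, p) for r in range(n + 1)]

n = 20
spontaneous = likelihood(n, 0.1)   # primary input, target attribute absent
driven = likelihood(n, 0.6)        # primary input, target attribute present
silent = likelihood(n, 0.0)        # modulatory input under spontaneous conditions
```

With p = 0 the likelihood collapses to a point mass at r = 0, which matches the statement that a modulatory input of zero has probability one under spontaneous conditions.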
For any given primary input Xj or modulatory
input Yk, the difference between the driven and
spontaneous likelihoods can be quantified using the Kullback-Leibler
divergence measures Dx and
Dy, respectively, as follows (Cover and Thomas, 1991):
 | (2) |
 | (3) |
Dx or Dy is used to
quantify the amount of separation between the driven and spontaneous
likelihoods of the primary and modulatory inputs, respectively. When
Dx or Dy is small,
the corresponding input is ambiguous with respect to the presence of a target.
When Dx or Dy is
large, the input is better able to indicate the presence of a target. A
spontaneous likelihood and a series of driven likelihoods for the primary
input are illustrated in Figure
2. The divergence measures associated with the spontaneous and
driven activation probabilities used for the primary and modulatory inputs are
enumerated in Table 3.

Figure 2. Input likelihoods P(r) modeled as binomial distributions b(n,
p) (Eq. 1), where r is the number of the n = 20 binary
variables that are active. The primary input spontaneous likelihood (solid
curve) has activation probability p =
px0 = 0.1. The three primary input driven
likelihoods have activation probabilities p =
px1 of 0.3, 0.6, or 0.9 (dashed, dot-dashed, or
dotted curves, respectively). For the modulatory input, the driven likelihood
has activation probability p = py1 = 0.1
(solid curve), whereas the spontaneous likelihood has activation probability
p = py0 = 0. Thus, a modulatory input of
zero has probability one under spontaneous conditions.
Mutual information between target and primary or modulatory
inputs. The mutual information between the target and the input provides
a measure of the amount of target information contained by the input. Mutual
target-input information can be compared with the information content of the
target alone. The information content of the event T = t is
defined as I(T = t) = -log2[P(T = t)] (Cover and Thomas, 1991). The
average information content of the target, equal to target entropy
H(T), is given as follows:
$$H(T) = -\sum_{T} P(T) \log_2 P(T) \qquad (4)$$
where T = t is written simply as T for notational
clarity. When the proportion of modality-specific to cross-modal targets is 2,
target entropy H(T) equals 2.32 bits
(Table 3).
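As a check on the stated entropy, the sum can be evaluated by hand for a two-to-one proportion. The target is present one-half of the time, so ps = 1/3 and pc = 1/6. Each of the three modality-specific states then has probability 1/9, and each of the four cross-modal states has probability 1/24:

```latex
\begin{aligned}
H(T) &= \tfrac{1}{2}\log_2 2 \;+\; 3\cdot\tfrac{1}{9}\log_2 9 \;+\; 4\cdot\tfrac{1}{24}\log_2 24 \\
     &\approx 0.500 + 1.057 + 0.764 \\
     &\approx 2.32 \text{ bits}
\end{aligned}
```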
The DSC receives target information from its primary and modulatory inputs
of all three modalities. The entire primary or modulatory input can be
represented by random vectors X = [X1,
X2, X3] or Y =
[Y1, Y2, Y3]. The
ability of the entire primary or modulatory input to convey target information
to the DSC can be quantified as mutual information I(T;
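X) or I(T; Y) as follows (Cover and Thomas, 1991):
$$I(T; \mathbf{X}) = \sum_{T} \sum_{\mathbf{X}} P(T, \mathbf{X}) \log_2 \frac{P(T, \mathbf{X})}{P(T)\,P(\mathbf{X})} \qquad (5)$$
$$I(T; \mathbf{Y}) = \sum_{T} \sum_{\mathbf{Y}} P(T, \mathbf{Y}) \log_2 \frac{P(T, \mathbf{Y})}{P(T)\,P(\mathbf{Y})} \qquad (6)$$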
Mutual information measures associated with the spontaneous and driven
activation probabilities used for the primary and modulatory inputs are
enumerated in Table 3. When the
primary input spontaneous and driven activation probabilities are widely
separated, as when px0 = 0.1 and
px1 = 0.9,
Dx is large and I(T;
X) = H(T), indicating that the input
contains complete information about the target
(Table 3).
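Because each input takes only 21 values, the mutual information between the target and the three primary inputs can be computed by exact enumeration. The sketch below uses the standard definition of mutual information and assumes a two-to-one proportion of modality-specific to cross-modal targets. All function and variable names are hypothetical, and it is not the authors' code.

```python
from itertools import product
from math import comb, log2

N = 20
STATES = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
          (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
P_T = [0.5] + [1.0 / 9.0] * 3 + [1.0 / 24.0] * 4  # assumes ps/pc = 2

def pmf(n, p):
    return [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0.0)

def mutual_information(p0, p1):
    """I(T; X) in bits for spontaneous/driven activation probabilities p0/p1."""
    spont, driven = pmf(N, p0), pmf(N, p1)
    info = 0.0
    for x in product(range(N + 1), repeat=3):
        # Inputs are independent given the target state, so the likelihood
        # of x is a product over the three modalities.
        joint = []
        for t, attrs in enumerate(STATES):
            like = 1.0
            for xj, present in zip(x, attrs):
                like *= driven[xj] if present else spont[xj]
            joint.append(P_T[t] * like)
        p_x = sum(joint)
        for t, p_tx in enumerate(joint):
            if p_tx > 0.0:
                info += p_tx * log2(p_tx / (P_T[t] * p_x))
    return info

h_t = entropy(P_T)
i_clear = mutual_information(0.1, 0.9)      # widely separated likelihoods
i_ambiguous = mutual_information(0.1, 0.3)  # overlapping likelihoods
```

Under these assumptions `i_clear` should come out very close to `h_t`, while `i_ambiguous` should be smaller because the likelihoods overlap.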
Activation of model DSC units. Activities of the
nz = 100 DSC units are represented by random
variables Zi (i = 1, 2,..., 100).
Variables zi denote specific instances of
Zi. The activity of a specific DSC unit
zi is computed as the weighted sum of the primary
inputs to the DSC unit, passed through the sigmoidal squashing function:
 | (7) |
The
symbol indicates the update that occurs with each new target
presentation. The variable ø = 10 is a tonic bias input. It represents
inhibitory influences on the DSC from structures such as the substantia nigra
(see Discussion). The sigmoid squashing function simulates the threshold and
saturation properties of biological neurons. The parameter
=
adjusts the sensitivity of the squashing function. The variable
wij represents the modulated weights of the
connections to each DSC unit zi from each primary
input xj. These weights are computed using the
following formula:
 | (8) |
Each uij is the unmodulated weight of the
connection from primary input j to DSC unit i. Each
vijk is the modulatory weight onto connection
uij, from modulatory input k with
activity yk.
The learning algorithm
Each of the three primary inputs initially projects to every DSC unit. Each
of the three modulatory inputs has a potential connection to every primary
weight. The primary and modulatory weights are trained in two separate stages
of unsupervised learning (Table
1). For both stages, training occurs only when the target is
present.
Stage-one unsupervised learning. In the first stage, the primary
weights uij are trained using the self-organizing
map (SOM) algorithm (Willshaw and von der Malsburg, 1976; Kohonen, 1982, 1988;
Haykin, 1999). First-stage
training causes the model DSC to represent the primary inputs (see Results).
The SOM involves selection of DSC units via competition and cooperation,
followed by Hebbian modification of primary weights. The SOM is used here in
its standard form.
The primary weights uij are initially set to
small random values drawn from a uniform distribution in the range 0 to 0.1.
The modulatory weights are fixed at vijk = 0
during stage-one training. At each iteration, a new target T is
chosen according to target-state probabilities P(T =
t) for t = 1, 2, 3,..., 7. The target state determines
whether each primary input Xj will be spontaneous
or driven, and values xj are each drawn randomly
from a binomial distribution with activation probability
px0 or
px1, respectively. The DSC unit
responses zi to the primary inputs are then
computed using Equation 7, where the wij are
unmodulated, so that wij =
uij. The DSC unit with the largest response is
identified as the winner. The indices of the winning DSC unit and of its
neighbors in the 10 x 10 grid, up to 25 units in all, form subset h. The
neighborhood zh is constructed such that the
winning DSC unit has activity 1, the eight nearest neighbors have activity
0.3, and the 16 neighbors once removed have activity 0.1. The neighborhood
respects the boundaries of the 10 x 10 grid, so that only DSC units near
the center of the grid have the full complement of 25 units (the winner plus 24 neighbors). The primary
weights to each DSC unit in zh undergo the
following Hebbian update:
 | (9) |
where
is the learning rate that is decremented after each iteration.
The primary weights to each DSC unit zh are then
normalized so that
.
Training for 5000 iterations with a learning rate of 0.1 decrementing to 0.01
was found to produce stable results. The primary connections are pruned after
completion of stage-one training. Any weight uij
below threshold u is set to zero, where
threshold u = 0.4. The weights are renormalized after
pruning. Primary weight pruning is necessary because the normalization
procedure used does not set weights to zero.
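The neighborhood construction used in stage one can be sketched as follows. The sketch assumes that the eight nearest neighbors and the 16 neighbors once removed are the first and second square rings around the winner, which is consistent with the stated counts. The function name `neighborhood` and its arguments are hypothetical.

```python
GRID = 10

def neighborhood(winner_row, winner_col, grid=GRID):
    """Map (row, col) -> activity for the winner and its in-grid neighbors."""
    activity_by_ring = {0: 1.0, 1: 0.3, 2: 0.1}
    z_h = {}
    for dr in range(-2, 3):
        for dc in range(-2, 3):
            r, c = winner_row + dr, winner_col + dc
            # Respect the grid boundaries: units outside the grid are skipped.
            if 0 <= r < grid and 0 <= c < grid:
                ring = max(abs(dr), abs(dc))
                z_h[(r, c)] = activity_by_ring[ring]
    return z_h

center = neighborhood(5, 5)   # full neighborhood
corner = neighborhood(0, 0)   # truncated by the grid boundary
```

A winner near the center yields 25 units (1 + 8 + 16), whereas a winner in a corner yields only 9.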
Stage-two unsupervised learning. The modulatory weights
vijk are trained in the second stage. The
modulatory inputs represent neurons in parietal cortex that project to the DSC
and produce MSE (see Introduction). Because the parietal inputs are unimodal
but only enhance cross-modal DSC neuron responses
(Wallace and Stein, 1994; Jiang et al., 2001), they are
assumed to modulate inputs from other sources rather than to excite DSC
neurons directly (see Results and Discussion). Electrophysiological evidence
indicates that the modalities of the parietal inputs to a given DSC neuron
usually match the modalities received by that neuron from other sources
(Wallace et al., 1993). These
findings can be used to infer two constraints on parietal-DSC connectivity:
modality-matching and cross-modality. According to the modality-matching
constraint, a DSC neuron may only receive modulatory inputs of the same
modalities as those of its primary inputs. According to the cross-modality
constraint, a modulatory input may only affect primary inputs of modalities
different from its own (Fig. 1B). Stage two is designed to train the modulatory
connections to produce MSE while respecting the constraints inferred from
corticotectal neurobiology.
In the second stage, the modulatory weights
vijk are trained using a novel algorithm based on
correlation and anti-correlation between modulatory inputs, primary inputs,
and DSC units. The modulatory weights are initially set to zero. The state of
a new target is chosen at each iteration (t = 1, 2, 3,..., 7), and
the primary inputs are activated according to target state as described for
stage-one training. During stage two, the modulatory inputs are also activated
according to the sensory attributes of the chosen target. DSC unit responses
to primary and modulatory input are determined using Equations 7 and 8.
The second-stage training algorithm requires a determination of whether
each primary input, modulatory input, and DSC unit is active or inactive. A
primary input xj is active when xj exceeds threshold x, a
modulatory input is active when yk exceeds threshold y, and a DSC unit is
active when zi exceeds threshold z. Primary and modulatory thresholds
x and y are set at the
integer nearest to the intersection point of the corresponding spontaneous and
driven likelihood distributions (Fig. 2). Because the likelihoods are determined by the spontaneous and
driven activation probabilities, each pair of activation probabilities is
associated with a different threshold. Thresholds for the primary inputs are
as follows: for px0 = 0.1 and px1 = 0.3, threshold x = 4; for px0 = 0.1 and
px1 = 0.6, threshold x = 6; and for px0 = 0.1 and px1 = 0.9, threshold x = 10.
For the modulatory inputs, where py0 = 0 and py1 = 0.1, threshold y is 0.
Because the probability distributions for
the Zi cannot be specified, threshold
z is set empirically.
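The stated primary thresholds can be reproduced by equating the two binomial likelihoods and solving for r. The binomial coefficient cancels, leaving r* = n ln[(1 - p0)/(1 - p1)] / ln[p1(1 - p0) / (p0(1 - p1))]. The sketch below treats r as continuous before rounding, which is an assumption about how the intersection point was located. The function name is hypothetical.

```python
from math import log

def intersection_threshold(n, p0, p1):
    """Integer nearest to the crossing of the b(n, p0) and b(n, p1) likelihoods."""
    if p0 == 0.0:
        # The spontaneous likelihood is a point mass at r = 0; the expression
        # below tends to 0 as p0 approaches 0.
        return 0
    r_star = n * log((1 - p0) / (1 - p1)) / log(p1 * (1 - p0) / (p0 * (1 - p1)))
    return round(r_star)

primary = [intersection_threshold(20, 0.1, p1) for p1 in (0.3, 0.6, 0.9)]
modulatory = intersection_threshold(20, 0.0, 0.1)
```

Under this reading, the three primary cases give 4, 6, and 10, and the modulatory case gives 0, in agreement with the values listed above.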
The modulatory weights vijk are trained
according to the following stage-two rules. If a DSC unit and a modulatory
input are both active, then increment the modulation of inactive primary
inputs and decrement the modulation of active primary inputs by a fixed
amount. If a modulatory input is active but a DSC unit is not, then decrement
the modulation of all primary inputs, active and inactive, by twice that
amount. The size of this increment depends primarily on the number of stage-two iterations.
Increments and decrements are cumulative, but the modulatory weights
themselves are constrained to be positive. Accumulation is accomplished by
means of dummy variables dijk that take positive
or negative values. The needed operations are summarized as follows:
 | (10) |
 | (11) |
 | (12) |
 | (13) |
Although these rules train modulatory rather than direct connection weights,
Rules 10 and 11 are essentially anti-Hebbian (see Discussion). Rules 10 and 11
enforce the cross-modality constraint, whereas Rule 12 enforces the
modality-matching constraint. Rule 13 simply specifies that a modulatory
weight is left unchanged if the associated modulatory input is inactive. The
actual modulatory weights take the values of the corresponding dummy variables
if they are positive and take the value zero otherwise:
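$$v_{ijk} = d_{ijk} \quad \text{if } d_{ijk} > 0 \qquad (14)$$
$$v_{ijk} = 0 \quad \text{if } d_{ijk} \le 0 \qquad (15)$$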
Modulatory weights can continue to grow as stage-two training proceeds, so
they are bounded at an upper limit of one.
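The verbal stage-two rules can be sketched for a single dummy variable as follows. This is an illustration of the prose description only, not a transcription of the paper's equations. The names `update_dummy`, `modulatory_weight`, and `delta` (the increment size) are hypothetical.

```python
def update_dummy(d, z_active, y_active, x_active, delta):
    """Return the updated dummy variable for one (DSC unit, primary, modulatory) triple."""
    if not y_active:
        return d                  # modulatory input inactive: leave unchanged
    if not z_active:
        return d - 2.0 * delta    # DSC unit inactive: decrement regardless of x
    if x_active:
        return d - delta          # DSC and modulatory active, primary active
    return d + delta              # DSC and modulatory active, primary inactive

def modulatory_weight(d):
    """Weight is the dummy variable clipped to the range [0, 1]."""
    return min(max(d, 0.0), 1.0)

d = 0.0
for _ in range(3):
    d = update_dummy(d, z_active=True, y_active=True, x_active=False, delta=0.25)
v = modulatory_weight(d)
```

Keeping the signed accumulator separate from the clipped weight lets decrements cancel earlier increments even while the weight itself sits at zero.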
Assessing the trained model
To assess the effects of training on the model DSC, the responses of DSC
units to their primary and modulatory inputs are measured, and the amount of
target information extracted by DSC units from their inputs is estimated.
Quantifying MSE in the model. To simulate experiments on
multisensory DSC neurons, the responses of DSC units are examined as the
levels of primary inputs are increased systematically from zero to n
= 20 or held fixed at the mean of their spontaneous likelihood
npx0. The modulatory input levels are
varied in a similar manner, but they are scaled to reflect their smaller
dynamic range. The responses in the bimodal case can be used to compute
percentage multisensory enhancement (%MSE) values using the following formula
(Meredith and Stein, 1986a):
$$\%\mathrm{MSE} = \frac{\mathrm{CM} - \mathrm{SM}_{\max}}{\mathrm{SM}_{\max}} \times 100 \qquad (16)$$
where CM is the cross-modal response and SMmax is the larger of the
two modality-specific responses. By this definition, enhancement occurs
whenever CM > SMmax. Enhancement is subadditive when the
cross-modal response is smaller than the sum of the two modality-specific
responses, and supra-additive when it is greater than the sum. The effects of
the modulatory inputs can be examined by comparing %MSE values at specific
levels of primary input, after various modulatory connections have been
removed. This is intended to simulate the effects of experiments in which
cortical input to the DSC from specific regions is selectively inactivated
(Wallace and Stein, 1994; Jiang et al., 2001).
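The enhancement measure and the additivity classification can be sketched as below. The sketch assumes the commonly used form of the %MSE index, the cross-modal response minus the larger modality-specific response, expressed as a percentage of the latter. The function names are hypothetical, and the response values are arbitrary examples.

```python
def percent_mse(cm, sm_a, sm_b):
    """Percentage enhancement of the cross-modal response over the larger
    modality-specific response. Undefined when both are zero."""
    sm_max = max(sm_a, sm_b)
    return (cm - sm_max) / sm_max * 100.0

def additivity(cm, sm_a, sm_b):
    """Compare the cross-modal response with the sum of the two
    modality-specific responses."""
    total = sm_a + sm_b
    if cm > total:
        return "supra-additive"
    if cm < total:
        return "subadditive"
    return "additive"

example_mse = percent_mse(0.75, 0.25, 0.5)    # 50.0
example_kind = additivity(0.75, 0.25, 0.25)   # "supra-additive"
```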
Mutual information between target and DSC units. DSC unit
responses vary sigmoidally between zero and one. Even if the
nz = 100 DSC units were binarized by thresholding,
they would still have 2^100 different states potentially available
to convey target information. Complete characterization of the mutual
information between DSC units and the target is impossible because it would
require determination of the joint probability of each of these DSC states and
each target state. Instead, the information gain attributable to training is
measured between the target and the number of DSC units whose responses
zi exceed threshold I = 0.3:
 | (17) |
The joint probability between the target and the number of suprathreshold
DSC unit responses is estimated computationally
by presenting many targets of various states, computing the number of
suprathreshold responses for each of
them, and binning these values. The joint probability distribution is
estimated from the resulting histogram by scaling. The marginal distributions
of the target and of the number of suprathreshold responses are computed from the estimated
joint distribution. The estimated probabilities are used to calculate an
estimate of the target information gained by the DSC network:
 | (18) |
This information gain measure is a crude estimate of the true mutual
information between the target and the DSC network. However, it is adequate
for the purpose of showing that the two-stage, unsupervised learning algorithm
does train the model DSC to extract information concerning the target from its
inputs.
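The histogram-based estimate described above can be sketched as a mutual information calculation on a table of counts. The sketch assumes `counts[t][m]` holds the number of presentations of target state t that produced m suprathreshold DSC units. The function name and the toy tables are hypothetical.

```python
from math import log2

def information_from_counts(counts):
    """Estimate mutual information (bits) from a 2-D table of joint counts."""
    total = sum(sum(row) for row in counts)
    n_cols = len(counts[0])
    # Marginals are obtained from the scaled joint histogram.
    p_t = [sum(row) / total for row in counts]
    p_m = [sum(row[m] for row in counts) / total for m in range(n_cols)]
    info = 0.0
    for t, row in enumerate(counts):
        for m, c in enumerate(row):
            if c > 0:
                p_tm = c / total
                info += p_tm * log2(p_tm / (p_t[t] * p_m[m]))
    return info

fully_informative = information_from_counts([[50, 0], [0, 50]])
uninformative = information_from_counts([[25, 25], [25, 25]])
```

In the first toy table the count identifies the target state exactly, giving one bit. In the second the count is independent of the target, giving zero bits.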
Results
The two-stage learning algorithm is used to train the corticotectal model
in an unsupervised manner as simulated targets are presented to it. Pseudocode
for the training algorithm is given in
Table 1, and a schematic of the
model is shown in Figure 1. DSC
units receive two types of random inputs, primary and modulatory. Primary
inputs make direct, excitatory connections onto DSC units, whereas modulatory
inputs can augment the primary weights (Eq. 8). Primary and modulatory weights
are trained during stage one and stage two of the algorithm, respectively.
Both stages of training are associated with interesting emergent properties. A
mixture of unimodal and multisensory DSC units arises spontaneously from
stage-one training. Multisensory enhancement arises spontaneously from
stage-two training, and removal of modulatory connections has a much greater
effect on DSC unit responses to cross-modal (combined modality) stimuli than
on responses to modality-specific (single modality) stimuli.
Simulating modality specialization in the DSC
The simulations in this section use stage one of the two-stage algorithm to
show how the proportion of modality-specific to cross-modal targets, and the
information content of the inputs, could influence the percentage of
multisensory neurons in the DSC. In the model, the modality selectivity of an
individual DSC unit depends on the weights of its primary input connections. A
visual-auditory DSC unit, for example, would receive nonzero weights from the
visual and auditory primary inputs and zero weight from the somatosensory
primary input (Fig. 1B). The modality selectivity of individual DSC units in
the model is determined entirely by stage one of the two-stage algorithm,
because stage two respects the modality selectivity established by stage
one.
Stage one is based on the SOM algorithm (see Materials and Methods). The
SOM is a neurobiologically plausible and well established computational tool
for modeling the activity-dependent refinement of sensory maps in the nervous
system. This algorithm naturally produces a mixture of unimodal and
multisensory DSC units. DSC units of similar modality selectivity are
colocalized by the SOM, and it is possible that such an arrangement is
superimposed on the broader spatial map in the DSC. The overall organization
of the DSC is beyond the scope of this study. The focus here is on the
percentage of multisensory DSC units produced by the SOM in a small patch of
the DSC. The proportion of unimodal to multisensory DSC units depends on
several factors, including the primary weight threshold
u, the proportion of modality-specific to
cross-modal targets, and the information content of the primary inputs.
The SOM in stage one causes the primary weight vectors to represent the
primary input vectors by distributing the weight vectors approximately evenly
among the input vectors. There are 100 primary weight vectors
[ui1, ui2, ui3], one vector for each of the 100
DSC units with activities zi (i = 1, 2,..., 100). The primary input vectors X are simply the values
[x1, x2, x3] that
are chosen randomly, on each trial, for the visual, auditory, and
somatosensory primary inputs X1, X2,
and X3. Because the primary weights determine the modality
selectivity of DSC units, factors that influence the distribution of primary
input vectors can affect the modality selectivity of DSC units established by
stage one.
Changing the information content of the primary inputs can affect the
distribution of primary input vectors, even if the proportion of
modality-specific to cross-modal targets is held constant. Primary input
vectors used to train the model during stage one and the resulting primary
weight vectors are illustrated in Figure
3. For A-C in Figure
3, the proportion of modality-specific to cross-modal targets is
two to one (ps = 0.34 and
pc = 0.17). The spontaneous activation
probability px0 equals 0.1 in
A-C, and the information content of the primary inputs is altered by
changing only the driven activation probability. The driven activation
probability px1 increases from 0.3
(Fig. 3A) to 0.6
(Fig. 3B) to 0.9
(Fig. 3C). Primary
input information content increases and ambiguity decreases as the driven
activation probability increases (Table
3). This affects the clustering of the primary input vectors.

Figure 3. Stage-one training causes primary weight vectors to cluster with primary
input vectors. The distinctiveness of the clusters depends on primary input
ambiguity. In A-C, there are twice as many modality-specific as
cross-modal targets, and the spontaneous primary input activation probability
px0 equals 0.1. The primary input becomes less
ambiguous as the driven activation probability
px1 is increased from 0.3 (A) to 0.6
(B) to 0.9 (C) (Table
3). Clusters of primary input vectors (circles) become
progressively more distinct. This causes more primary weight vectors (plus
signs) to adopt a distinctly unimodal pattern. V, Visual; A, auditory; S,
somatosensory.
The primary weight vectors, which are normalized as part of stage-one
training, are plotted as plus signs in
Figure 3. For comparison, the
primary input vectors are also normalized before they are plotted as circles.
In Figure 3A, in which
the primary inputs are the most ambiguous and have the lowest information
content, the input vectors are evenly distributed. Predominantly unimodal
inputs are located in the corners, in which primary input of one of the three
modalities is near one, whereas the other two are near zero. They are rare in
Figure 3A. The primary
weight vectors are scattered approximately evenly among the input vectors, and
almost all are located in multisensory regions. As the primary inputs are made
less ambiguous (Fig.
3B,C) and their information content goes up, the input
vectors form distinct clusters. Clusters of unimodal primary inputs are pushed
out farther into the corners. DSC units with primary weight vectors drawn into
these clusters would have predominantly unimodal response characteristics.
Figure 3 illustrates that
stage-one training produces a greater percentage of predominantly unimodal DSC
units when primary input information content is high. These results suggest
that the percentage of unimodal DSC neurons in a given species could, at least
in part, reflect the information content of the inputs it receives during the
formation and refinement of its sensory maps.
It is clear from Figure 3
that primary weight vectors fall into clusters that are predominantly
unimodal, bimodal, or trimodal. Although the primary weights from some inputs
may be very small, none are zero. The DSC units could all be considered
trimodal, because their weights are nonzero from all three primary inputs.
Designating all DSC units as trimodal, however, would obscure the fact that
primary weights are distributed throughout the input space in predominantly
unimodal, bimodal, and trimodal regions. To alleviate this problem, small
weights are pruned after stage-one training. Pruning is accomplished by
setting to zero all primary weights uij that are
less than the primary weight threshold
θu. Pruning
corresponds to a process of activity-dependent synapse elimination, such as
that described for the formation of retinotopic maps in the superior
colliculus (and optic tectum) and in other processes
(Katz and Shatz, 1996; Lichtman et al., 1999). As a
result of the removal of weak primary weights, many DSC units become unimodal
or bimodal. The effects of pruning vary depending on both
θu and modality-specific target probability
ps (where pc =
0.5 - ps). The effect of changes in these
two variables on the percentage of multisensory (bimodal and trimodal) DSC
units produced by stage-one training is shown in
Figure 4. Multisensory DSC
units are those that have nonzero weights from two or three primary inputs
after pruning.
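The pruning step and the resulting classification of units can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the variable name theta_u (standing for the primary weight threshold), and the three example weight vectors are assumptions chosen only for illustration.

```python
def prune(weights, theta_u):
    """Set to zero every primary weight that is less than the threshold."""
    return [[w if w >= theta_u else 0.0 for w in row] for row in weights]


def modality_counts(pruned):
    """Count the nonzero primary weights (modalities) kept by each unit."""
    return [sum(1 for w in row if w > 0.0) for row in pruned]


def percent_multisensory(counts):
    """Percentage of units with nonzero weights from two or three inputs."""
    return 100.0 * sum(1 for c in counts if c >= 2) / len(counts)


# Hypothetical, roughly unit-length weight vectors [visual, auditory, somatosensory].
example_weights = [
    [0.98, 0.15, 0.12],  # predominantly unimodal
    [0.70, 0.70, 0.14],  # predominantly bimodal
    [0.58, 0.58, 0.57],  # trimodal
]

pruned = prune(example_weights, theta_u=0.4)
counts = modality_counts(pruned)
multisensory = percent_multisensory(counts)
```

With a threshold of 0.4, the three example units keep one, two, and three nonzero weights, respectively, so two of the three count as multisensory. Raising the threshold can only remove weights, which is why the percentage of multisensory units cannot increase as the threshold increases.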
Figure 4 shows that the
percentage of multisensory DSC units decreases as
θu
increases. This happens simply because more primary weights are eliminated as
the threshold increases. The DSC is 100% multisensory for low values of
θu, regardless of modality-specific target
probability. However, the percentage of multisensory DSC units falls faster
with increases in
θu as modality-specific target
probability increases (and as cross-modal target probability decreases). This
result confirms the expectation that stage one will establish more
multisensory connections when training involves a greater number of
cross-modal targets.
The data in Figure 4 are
generated using primary inputs of intermediate ambiguity
(px0 = 0.1 and
px1 = 0.6). The results are
qualitatively similar when the primary inputs are made more ambiguous by
decreasing the driven activation probability
px1 to 0.3 or made less ambiguous by
increasing it to 0.9 (data not shown). The main difference is that the
decrease in the percentage of multisensory DSC units as
θu increases, at all levels of modality-specific
target probability, is somewhat slower with more ambiguous primary input and
faster with less ambiguous primary input.
The results suggest that the percentage of multisensory DSC neurons in a
particular species may depend on several factors, including the proportion of
cross-modal targets it encounters in its particular environmental niche. Cats,
which hunt at night, may encounter more cross-modal targets than monkeys,
which forage during the day. This likely difference in the proportion of
cross-modal targets between cats and monkeys may explain why cats have a
higher percentage of multisensory DSC neurons than monkeys
(Wallace and Stein, 1996; Wallace et al., 1996, 1998).
It is also possible that sensory systems are noisier in cats than in
monkeys. As such, sensory input to the DSC would be more ambiguous, and carry
less target information, in cats than in monkeys. The model suggests that such
a difference, if present, could contribute to the difference in the percentage
of multisensory DSC neurons between cats and monkeys. In the model, any
relative proportion of unimodal to multisensory DSC units can be obtained by
appropriate choice of primary weight threshold, primary input activation
probabilities, and proportion of modality-specific to cross-modal targets. In
the brain, self-organization of the corticotectal network probably involves
activity-independent, genetically prespecified molecular mechanisms, as well
as the activity-dependent processes modeled here. Presumably, the percentage
of multisensory DSC neurons produced by these combined processes confers
behavioral advantage to a species, considering such factors as its
environmental niche and the properties of its sensory systems.
Simulating the parietal projection to the DSC
The corticotectal circuitry that gives rise to MSE in the model is
consequent on model architecture and on training during stage two of the
two-stage algorithm. Stage two is based on a novel
correlation-anti-correlation rule. Experimental observations on MSE guided the
design of the model and of the correlation-anti-correlation rule.
The parietal neurons that produce MSE in the DSC are themselves unimodal
(Wallace et al., 1993). If
unimodal parietal neurons directly excited DSC neurons, then inactivation of
the relevant parietal neurons should substantially reduce modality-specific as
well as cross-modal DSC neuron responses. For many DSC neurons, however,
parietal inactivation reduces MSE with little or no effect on
modality-specific responses (Wallace and Stein, 1994; Jiang et al., 2001).
The model therefore postulates an indirect, modulatory
mechanism whereby inputs representing unimodal parietal projections could
produce enhancement of cross-modal but not modality-specific responses. In the
model, primary inputs directly excite DSC units and represent inputs from a
variety of subcortical and cortical structures. Modulatory inputs,
representing parietal projections only, do not directly excite DSC units but
act by augmenting primary inputs. In some studies, parietal inactivation was
found to reduce modality-specific responses in some DSC neurons
(Clemo and Stein, 1986; Meredith and Clemo, 1989; Wallace and Stein, 1994). This
can easily be accounted for by postulating that parietal cortex is a source of
primary, as well as modulatory, input to some DSC units.
Constraints on learning, in addition to constraints imposed by model
architecture, ensure that the performance of the trained model conforms to
experimental observation. For modulatory inputs to enhance cross-modal but not
modality-specific DSC unit responses, modulatory connections should be made
only when a modulatory input and a primary input are of different modalities.
This is the cross-modality constraint. Imposing the cross-modality constraint
alone would not be sufficient to ensure that modulatory connections in the
model are consistent with experimental observations on parietal projections to
DSC. Under the cross-modality constraint, all DSC units could still receive
every one of the three modalities, either as primary or modulatory input. All
DSC units would be trimodal, which is inconsistent with observation.
Maintenance of the DSC unit modality selectivities established during
stage-one training requires that the modalities of the modulatory connections
received by a DSC unit should match the modalities of the primary inputs
received by that unit. This is the modality-matching constraint, which is
supported by orthodromic activation studies
(Wallace et al., 1993).
Together, the cross-modality and the modality-matching constraints ensure that
a primary input connection onto a multisensory DSC unit will receive a
modulatory connection only if the modality of the modulatory input is
different from that of the primary input but the same as that of another
primary input connection onto the DSC unit
(Fig. 1B). Modulatory
connections, established by the correlation-anti-correlation rule as designed,
are successfully restricted by these constraints over broad ranges of model
parameters.
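The combined effect of the two constraints can be made concrete with a short Python sketch. It only enumerates which modulatory connections the constraints permit for a unit with a given set of primary modalities. It does not show how training arrives at them, and the function name and the (modulatory, primary) pair representation are illustrative assumptions.

```python
def allowed_modulatory_connections(primary_modalities):
    """Return the allowed (modulatory modality, primary modality) pairs.

    Modality-matching constraint: the modulatory modality must be one of
    the modalities the unit already receives as primary input.
    Cross-modality constraint: the modulatory modality must differ from
    the modality of the primary input connection it modulates.
    """
    primary = set(primary_modalities)
    return {(m, p) for m in primary for p in primary if m != p}


unimodal = allowed_modulatory_connections({"V"})
bimodal = allowed_modulatory_connections({"V", "A"})
trimodal = allowed_modulatory_connections({"V", "A", "S"})
```

Under these assumptions a unimodal unit receives no modulatory connections, a visual-auditory unit receives two (visual modulation of its auditory primary input, and auditory modulation of its visual primary input), and a trimodal unit receives six.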
The correlation-anti-correlation rule (see Materials and Methods) can be
summarized as follows. If a DSC unit and a modulatory input are both active,
then decrease the modulation of active primary inputs and increase the
modulation of inactive primary inputs. If a modulatory input is active but a
DSC unit is inactive, then decrease the modulation of all primary inputs. The
critical parameters for stage-two training include the thresholds
θx for the primary inputs,
θy for the modulatory inputs, and
θz for the DSC units. These thresholds are needed
for the algorithm to decide whether or not the associated model elements are
active. For the primary and modulatory inputs, thresholds are set at the
integer nearest the intersection points of the corresponding spontaneous and
driven likelihoods (see Materials and Methods). Stage-two training depends on
the spontaneous and driven activation probabilities of the primary
(px0,
px1) and modulatory
(py0,
py1) inputs, both because they
determine input likelihoods and because they affect correlations among inputs
and DSC units that in turn affect the behavior of the
correlation-anti-correlation rule. The DSC unit threshold
θz cannot be set on the basis of likelihoods because
the likelihood distributions of DSC unit responses are not known. Stage two
also depends on modality-specific target probability
ps. These factors interact in a complex way, but
certain regularities in the operation of stage two can be identified.
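The verbal summary of the rule given above can be sketched in Python as a function that returns only the direction of each weight change. This is not the update equation from Materials and Methods: the step size, the strict "greater than threshold" test for activity, and the choice to leave weights unchanged when the modulatory input is inactive (a case the summary does not address) are all assumptions made for illustration.

```python
def is_active(value, threshold):
    """Assumed activity decision: an element is active if it exceeds its threshold."""
    return value > threshold


def modulation_update(dsc_active, mod_active, primary_active, step=0.01):
    """Signed change in the modulation of each primary input onto one DSC unit.

    primary_active holds one boolean per primary input; step is a
    hypothetical step size, not a parameter taken from the paper.
    """
    if not mod_active:
        # Assumption: no change when the modulatory input is inactive.
        return [0.0 for _ in primary_active]
    if dsc_active:
        # Decrease modulation of active primary inputs,
        # increase modulation of inactive primary inputs.
        return [-step if active else step for active in primary_active]
    # Modulatory input active but DSC unit inactive: decrease all.
    return [-step for _ in primary_active]


both_active = modulation_update(True, True, [True, False, False], step=1.0)
dsc_silent = modulation_update(False, True, [True, False, False], step=1.0)
```

In this sketch a modulatory input that is repeatedly active together with the DSC unit, while a given primary input is inactive, accumulates modulation of that primary input. That matches the qualitative behavior the summary describes.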
Figure 5 plots numbers of
DSC units receiving nonzero modulatory weights for those trained networks in
which all modulatory connections respect the cross-modality and
modality-matching constraints. The primary input spontaneous activation
probability px0 is fixed at 0.1 in
A-C. The primary input is made less ambiguous by increasing the
driven activation probability px1 from
0.3 (Fig. 5A) to 0.6
(Fig. 5B) to 0.9
(Fig. 5C). The primary
input threshold
θx is correspondingly increased from
4 to 6 to 10. Stage two produces large numbers of allowed modulatory
connections for θz values around 0.2, regardless of
primary input ambiguity. For the less ambiguous primary inputs
(px1 = 0.6 and
px1 = 0.9), allowed modulatory
connections fail to develop when modality-specific target probability
ps is lower than
0.2. The dependency on
ps is not as critical for the most ambiguous
primary input (px1 = 0.3). The
unavoidable errors in deciding primary input activation in the ambiguous case
may actually work to advantage, but only for values of
θz around 0.2. The region over which stage two
produces large numbers of allowed modulatory weights (those that respect the
constraints) grows larger as the ambiguity of the primary input decreases.
This is attributable to an improved ability to decide DSC unit activation in
the less ambiguous networks.
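The primary input thresholds quoted above (4, 6, and 10) can be reproduced with a short Python sketch of the "integer nearest the intersection of the spontaneous and driven likelihoods" rule. The sketch assumes that each input is binomially distributed with 20 trials. That distribution and trial count are not stated in this section. They are assumptions that happen to reproduce the quoted thresholds, and the function name is hypothetical.

```python
import math


def likelihood_threshold(p_spont, p_driven, n_trials=20):
    """Integer nearest the crossing point of two binomial likelihoods.

    Setting p0^k (1 - p0)^(n - k) equal to p1^k (1 - p1)^(n - k) and solving
    for real-valued k gives k = n * a / (a + b), where
    a = ln((1 - p0) / (1 - p1)) and b = ln(p1 / p0).
    """
    if not 0.0 < p_spont < p_driven < 1.0:
        raise ValueError("requires 0 < p_spont < p_driven < 1")
    a = math.log((1.0 - p_spont) / (1.0 - p_driven))
    b = math.log(p_driven / p_spont)
    return int(round(n_trials * a / (a + b)))


# Spontaneous probability 0.1; driven probability 0.3, 0.6, 0.9.
thresholds = [likelihood_threshold(0.1, p1) for p1 in (0.3, 0.6, 0.9)]
# thresholds == [4, 6, 10]
```

The function deliberately rejects a spontaneous probability of zero, as used for the modulatory inputs, because the log-ratio is undefined there. In that case any nonzero input value can only arise from the driven distribution.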

Figure 5. The abundance of correct modulatory weights resulting from stage-two
training depends sensitively on DSC unit threshold
θz and on modality-specific target probability
ps. In A-C, the spontaneous primary
input activation probability px0 equals 0.1. The
primary input becomes less ambiguous as driven activation probability
px1 is increased from 0.3 (A) to 0.6
(B) to 0.9 (C) (Table
3). The primary input threshold θx is
increased from 4 (A) to 6 (B) to 10 (C). In
A-C, the modulatory input spontaneous and driven activation
probabilities py0 and
py1 are 0 and 0.1, respectively, and the
modulatory input threshold θy is 0.
Modality-specific target probability ps is varied
from 0 to 0.5 in steps of 0.025. Ten networks receive stage-one training for
5000 iterations at each ps value. Primary weights
are pruned at θu = 0.4. DSC unit activity
threshold θz is varied from 0 to 1 in steps of 0.05.
Each of the 10 stage-one trained networks, at each
ps value, receives stage-two training for 5000
iterations at each θz value. This yields 10 trained
networks for each combination of ps and
θz. Sets of 10 containing any misdirected modulatory
weights (i.e., modulatory weights not respecting the modality-matching and
cross-modality constraints) are excluded. For sets of 10 containing no such
errors, the mean number of DSC units receiving modulatory connections is
computed. Each panel plots the number of units, in error-free networks, that
receive modulatory input. For px1 = 0.3
(A), stage two works best when 0.2 ≤ θz ≤ 0.3 and ps ≥ 0.15; for
px1 = 0.6 (B), when 0.2 ≤ θz ≤ 0.55 and ps ≥ 0.23; and for
px1 = 0.9 (C), when 0.2 ≤ θz ≤ 0.8 and
ps ≥ 0.23. The number of error-free networks
is greater for unambiguous than for ambiguous primary inputs.
In Fig. 5A-C, the
spontaneous and driven activation probabilities for the modulatory inputs are
py0 = 0 and
py1 = 0.1, and the modulatory input
threshold is
θy = 0. The ability of stage two to
produce allowed modulatory connections is insensitive to the actual values of
the modulatory input spontaneous and driven activation probabilities, so long
as the modulatory likelihoods are well separated and decisions concerning
modulatory input activation are reliable. These results demonstrate that the
production of allowed modulatory connections using the
correlation-anti-correlation rule depends on reliable decisions concerning
input and DSC unit activation. The correlation-anti-correlation rule is robust
when reliable activation decisions can be made.
The ability of the correlation-anti-correlation rule to produce modulatory
connections that are consistent with experimental observations on the
projection from parietal cortex to DSC is illustrated in
Table 4. This table presents
the modulatory connectivity produced by the model under two conditions
(Table 4, top, middle) and
compares it with experimental results on descending parietal connections to
DSC neurons (Table 4, bottom)
from an orthodromic activation study
(Wallace et al., 1993). The
columns of each section, labeled at the top, indicate the seven possible
modality selectivities of DSC units (or neurons), as classified by the
modalities of their primary inputs (or by the modalities to which the neuron
responds, for the experimental data). The rows of each section, labeled at the
left side, indicate the eight possible sets of unimodal modulatory inputs (or
descending parietal inputs, for the experimental data). The number of units
(or neurons) receiving the designated combinations of input are indicated as a
percentage of the total number of DSC units in the model (or of the total
number of neurons recorded, for the experimental data). For the model, DSC
unit numbers are determined on the basis of 10 runs. The total percentage of
units (or neurons) of each modality selectivity is indicated in the last row
of each section. The total number of units (or neurons) receiving each set of
modulatory (or descending) inputs is indicated in the rightmost column of each
section. Wallace et al. (1993)
reported the descending projections to DSC from two visual parietal
structures: the lateral suprasylvian sulcus (LS) and the anterior ectosylvian
visual area (AEV). To facilitate comparison with model results, data from
these two structures have been grouped together as visual in
Table 4, bottom.
For the model results (Table
4, top, middle), the primary input activation probabilities are
px0 = 0.1 and
px1 = 0.6, and the modulatory input
activation probabilities are py0 = 0
and py1 = 0.1. The modality-specific
target probability is set at ps = 0.34
(pc = 0.17). The stage-one pruning threshold is
set at θu = 0.4, and the stage-two thresholds are set at θx = 6,
θy = 0, and θz = 0.2. In the first network
(Table 4, top), stage-two
training is run for 5000 iterations and produces all of the allowed modulatory
connections with no errors. In the second network
(Table 4, middle), stage-two is
run for only 50 iterations, and not all of the allowed modulatory connections
are made. In some cases, bimodal DSC units receive a modulatory connection
from only one or the other of the modulatory inputs that could connect to them
or receive no modulatory connection at all. Likewise, trimodal DSC units
sometimes receive modulatory connections from only two of the three modulatory
inputs that could connect to them. This pattern of absent connections is
consistent with experimental observation
(Table 4, bottom). There is one
notable difference between the modeling results of
Table 4, middle, and the
experimental results of Table
4, bottom. Some unimodal DSC neurons apparently receive input of
the same modality from parietal cortex. As suggested above, it is possible
that these inputs would be primary rather than modulatory.
In the model, a modulatory input that fails to provide an allowed
modulatory connection is often associated with a weak primary input of the
corresponding modality. This results because the weak primary input usually
fails to activate the DSC unit (i.e., bring its activity over threshold
θz) when it alone is activated by a
modality-specific target. The consequence is that the DSC unit and the
modulatory input of that modality are not consistently active together, and
that modulatory input cannot establish connections to inactive primary inputs
of other modalities. Bimodal DSC neurons have been studied that cannot be
activated by input of one modality but show enhancement if input of that
modality is presented with input of a different modality
(Meredith and Stein, 1986a; Stein and Meredith, 1993). The
presumption might be that the weaker modality provides the modulatory input.
The model does not exclude this possibility. Instead, it predicts the
existence of bimodal DSC neurons for which the stronger modality provides both
strong primary and strong modulatory input, and enhancement occurs as a result
of strong modulation of the weaker primary input in the event of a cross-modal
stimulus. This prediction should be testable using available experimental
techniques.
Information gain attributable to stage-one and stage-two training
It has been shown theoretically that the SOM algorithm not only forms maps
but also causes output units to extract information from their inputs
(Linsker, 1988a,b).
Training with stage one (the SOM), and to a lesser extent stage two, causes
the DSC to extract a substantial amount of target information from its inputs.
Information gain by the DSC depends on the percentage of multisensory DSC
units and actually decreases as the percentage of multisensory DSC units
increases past a certain level.
The DSC response to a target is characterized simply as the number of
DSC units that, on target presentation, show activity exceeding threshold
θI (see Materials and Methods). A value of θI = 0.3 is chosen, although the
results are similar over a range of θI. The target information gain, that is,
the mutual information between the target T and the number of suprathreshold
DSC unit activities, is computed (Eq. 18) and
compared for various network configurations.
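As a rough illustration of the quantity being computed, the following Python sketch evaluates the standard mutual information, in bits, between two discrete variables from a joint table of probabilities or counts. It uses the textbook definition and is not claimed to reproduce Eq. 18 term by term. The function name and the two toy joint tables are assumptions for illustration only.

```python
import math


def mutual_information(joint):
    """Mutual information in bits from a joint table.

    joint[t][n] is the probability (or count) of target state t occurring
    together with n suprathreshold DSC units. Counts are normalized here.
    """
    total = sum(sum(row) for row in joint)
    p_t = [sum(row) / total for row in joint]
    p_n = [sum(row[n] for row in joint) / total for n in range(len(joint[0]))]
    info = 0.0
    for t, row in enumerate(joint):
        for n, value in enumerate(row):
            if value > 0:
                p = value / total
                info += p * math.log2(p / (p_t[t] * p_n[n]))
    return info


# Toy case 1: the response identifies each of four equiprobable targets.
perfect = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Toy case 2: the response is independent of the target.
independent = [[1, 1], [1, 1]]

info_perfect = mutual_information(perfect)          # 2 bits
info_independent = mutual_information(independent)  # 0 bits
```

In practice the joint table would be estimated by presenting many targets to a trained network and tallying, for each target state, how many DSC units exceed the response threshold.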
The corticotectal model can be used to explore the relationship between
target information gain and the relative proportion of unimodal to
multisensory DSC units. The model is retrained 10 times from a random initial
condition. Manipulating the primary weight threshold
θu varies the percentage of multisensory DSC units
as a result of stage-one training. The threshold
θu
is increased in steps of 0.05 to produce percentages of multisensory DSC units
ranging from 0 to 100%. Stage-two training follows but does not alter the
modality selectivity established for DSC units during stage-one (see above).
Target information gain at the DSC is computed before and after stage-two
training. The results are plotted in Figure
6.

Figure 6. Two-stage training causes the DSC to extract most of the target information
content of the primary inputs, especially when the DSC contains a mixture of
unimodal and multisensory units. The percentage of multisensory DSC units is
varied by manipulating the primary weight threshold
θu. Ten networks receive stage-one training for 5000
iterations (px0 = 0.1,
px1 = 0.6, ps =
0.34, and pc = 0.17). Each network is then pruned
using θu varying from 0 to 1 in steps of 0.005. This
produces mean percentages of multisensory DSC units over the 10 networks
ranging from 0 to 100%. Each pruned network receives stage-two training for
5000 iterations (py0 = 0,
py1 = 0.1, θx = 6,
θy = 0, and θz = 0.2). For
each of the 10 networks associated with each θu
value, both before and after stage-two training, the mutual information
between the target and the number of suprathreshold DSC unit responses is
computed (Eqs. 17 and 18; θI = 0.3). The mean
information gain after stage-one and stage-two training is plotted against the
mean percentages of multisensory units. The mutual information between the
target and the primary inputs (2.27 bits; dashed line; Eq. 5) is nearly as
high as the information content of the target (2.32 bits; dot-dashed line; Eq.
4). Stage-one training causes the DSC to extract a large amount of target
information (triangles), and stage-two (stars) causes a small increase in this
amount. The increase is significant when the percentage of multisensory units
is 60% or larger (t test, 0.05 significance level). The mutual
information between target and DSC is nearly as large as the mutual
information between target and primary inputs, but only for percentages of
multisensory DSC units between 10 and 50%. The mutual information between
target and DSC decreases steadily as the percentage of multisensory DSC units
increases above 50%. As it decreases, the mutual information between target and
DSC approaches that of a uniformly trimodal DSC, with (0.80 bits; square) and
without (0.77 bits; circle) modulatory connections. Variability in the DSC
response after two-stage training keeps DSC information content above that of
the uniformly trimodal DSC.
For more than one-half of the range of percentage multisensory DSC units,
target information gain is almost as high as the information content of the
primary inputs. For the example explored in
Figure 6, the primary inputs
are of intermediate ambiguity, with spontaneous and driven activation
probabilities of px0 = 0.1 and
px1 = 0.6, respectively. The mutual
information between the primary inputs and the target is 2.27 bits
(Table 3). The information
content of the target is 2.32 bits. Thus, the primary input in this case
contains almost complete target information. The modulatory inputs have
spontaneous and driven activation probabilities of
py0 = 0 and
py1 = 0.1, respectively. The mutual
information between the modulatory inputs and the target (1.74 bits) is lower
than that of the primary inputs in this case. Modulatory inputs can increase
the estimate of DSC information gain by producing MSE and helping DSC unit
activities exceed threshold
θI. Stage two provides a
small increase in information gain that is significant when the percentage of
multisensory DSC units is 60% or larger (t test, 0.05 significance
level).
The most striking feature of the plot in
Figure 6 is that target
information gain at the DSC is highest for percentages of multisensory DSC
units between 10 and 50% and falls steadily as the percentage of multisensory
DSC units rises above 50%. Insight into this result can be obtained through
comparison with a DSC network in which all of the units are trimodal and
receive primary connections of identical weight of all three modalities. To
make a uniformly trimodal DSC, all primary weights are set to 1/√3
(uij = 1/√3 for all i and
j). This sets the lengths of the primary weight vectors to one, to
match the lengths of the normalized primary weight vectors produced by stage
one. Stage-two training can start from the uniform, trimodal primary weight
configuration. The target information gain of the uniformly trimodal DSC
network is only 0.77 bits without modulatory input. It increases to only 0.80
bits with modulatory input. The target information gain of the DSC network
trained from a random state with the two-stage algorithm approaches this low
level as the percentage of multisensory DSC units increases.
These results demonstrate that a uniformly trimodal DSC network,