To explore the timing and the underlying neural dynamics of visual perception, we analyzed the relationship between the manual reaction time (RT) to the onset of a visual stimulus and the time course of the evoked neural response simultaneously measured by magnetoencephalography (MEG). The visual stimuli were a transition from incoherent to coherent motion of random dots and an onset of a chromatic grating from a uniform field, which evoke neural responses in different cortical sites. For both stimuli, changes in median RT with changing stimulus strength (motion coherence or chromatic contrast) were accurately predicted, with a stimulus-independent postdetection delay, from the time that the temporally integrated MEG response crossed a threshold (integrator model). In comparison, the prediction of RT was less accurate from the peak MEG latency, or from the time that the nonintegrated MEG response crossed a threshold (level detector model). The integrator model could also account for, at least partially, intertrial changes in RT or in perception (hit/miss) to identical stimuli. Although we examined MEG–RT relationships mainly for data averaged over trials, the integrator model could show some correlations even for single-trial data. The model predictions deteriorated when only early visual responses presumably originating from the striate cortex were used as the input to the integrator model. Our results suggest that the perceptions for visual stimulus appearances are established in extrastriate areas [around MT (middle temporal visual area) for motion and around V4 (fourth visual area) for color] ∼150–200 ms before subjects manually react to the stimulus.
Although our understanding of visual perception is progressing rapidly, we still do not know exactly when and where visual perception is established in the human brain. Psychophysically, the timing of human visual perception has been studied by observing simple reaction time (RT), defined as the interval between the onset of a target stimulus and the initiation of the subject's immediate motor response to the target appearance. Importantly, RT reflects the completion of a perceptual process in the sense that the sensory system reaches a decision to trigger a motor response. However, the decision timing is not easy to specify, because the time required for postperceptual processing (including motor response) is generally unknown. A promising way to overcome this limitation is to associate RT with visually evoked human brain activities, which can be measured noninvasively with electroencephalography (EEG) or with magnetoencephalography (MEG).
Previous attempts to relate RT with EEG or MEG responses generally failed in quantitatively predicting RT from the peak latency of the evoked response (Osaka and Yamamoto, 1978; Musselwhite and Jeffreys, 1985; Mihaylova et al., 1999; Patzwahl and Zanker, 2000; Kawakami et al., 2002; Vassilev et al., 2002). These results are not surprising, however, given that there is no theoretical reason to expect the peak latency to correspond to the completion of a perceptual process. The present study investigated two models of neural decision with regard to how well they predict the variations in the RT of human subjects from simultaneously measured MEG responses. One is the level detector model (Grice, 1968), in which the motor response is triggered when the amplitude of the neural activity exceeds a threshold level. The other is the integrator model (Carpenter, 1981; Hanes and Schall, 1996; Cook and Maunsell, 2002), in which the motor response is triggered when the temporally integrated magnitude of the neural activity exceeds a threshold level.
We examined the neural activities presumably originating from different cortical sites evoked by different visual attributes: the onset of coherent motion from incoherent motion of random dots, which evokes a late MEG component (latency, 200–300 ms) localized in the extrastriate cortex in the vicinity of the middle temporal visual area (MT)/fifth visual area (V5) (Bundo et al., 2000), and the onset of a chromatic grating from a uniform field, whose early MEG component (latency, 100–140 ms) has been localized in the striate cortex, presumably the primary visual area (V1) (Fylan et al., 1997), and whose second component has been localized in V4 (the fourth visual area) (Kuriki et al., 2005). We analyzed not only the variability of RT depending on stimulus parameters (motion coherence and chromatic contrast) but also the intertrial variability across different responses (e.g., hit and miss) for identical stimuli. Although the MEG–RT relationship was examined mainly for data averaged over trials, single-trial correlations were also tested.
We found that the integrator model could almost perfectly account for the stimulus-dependent variability in RT and at least partially for the intertrial variability. Our results suggest that the accumulated activities in the visual areas (especially extrastriate areas) give the plausible timing of visual perception, which is ∼150–200 ms before the subjects manually react to the stimulus.
Materials and Methods
RT and the MEG response for coherent motion onset and chromatic grating onset were measured in experiments 1 and 2, respectively.
Eight subjects (aged 24–30 years) participated in experiment 1, and four different subjects (aged 21–44 years) participated in experiment 2. All subjects were healthy and had normal or corrected-to-normal vision.
The visual motion stimuli of experiment 1 were generated using DirectX 9.0 SDK (Microsoft, Redmond, WA), and projected by a digital light processing (DLP) projector (V-1100Z; PLUS, Tokyo, Japan) onto a translucent screen (40 × 30°) located 140 cm from the subjects. On the dark background, white random dots (80 cd/m², each subtending 0.16 × 0.16°) were presented in the left hemifield. The right edge of the 18 × 30° stimulus area was 2° left of fixation (see Fig. 1a). The dot density was 10%. A stimulus sequence consisted of random motion followed by coherent motion. During the random motion period, all dots moved randomly in one of eight directions. During the coherent motion period, a given proportion of total dots moved horizontally, whereas the remaining dots moved randomly in any of the seven remaining directions. The proportion of coherent dots (coherence level) was 20, 40, or 80%. The direction of coherent motion was randomly chosen between rightward and leftward with equal probability to avoid adaptation to a specific direction. The speed of each dot was 8°/s (a jump of 8 arcmin on every frame at 60 Hz). The lifetime of each dot was 50 ms (three frames). A fixation marker (+) was continuously presented at the center of the screen.
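The direction assignment and per-frame displacement described above can be illustrated with a minimal sketch. It is not the stimulus code used in the experiment (which was written with the DirectX SDK). The function names, the direction index convention, and the assumption that the eight directions are evenly spaced at 45° are hypothetical.

```python
import random

N_DIRECTIONS = 8            # assumption: eight directions spaced 45 deg apart
RIGHTWARD, LEFTWARD = 0, 4  # assumption: index 0 is 0 deg, index 4 is 180 deg


def assign_directions(n_dots, coherence, coherent_dir, rng):
    """Return one direction index per dot for the coherent-motion period.

    A fraction `coherence` of the dots gets `coherent_dir`; every other dot
    gets one of the seven remaining directions chosen at random.
    """
    n_coherent = round(n_dots * coherence)
    others = [d for d in range(N_DIRECTIONS) if d != coherent_dir]
    directions = [coherent_dir] * n_coherent
    directions += [rng.choice(others) for _ in range(n_dots - n_coherent)]
    rng.shuffle(directions)
    return directions


def step_arcmin(speed_deg_per_s, frame_rate_hz):
    """Displacement per video frame in arcmin (1 deg = 60 arcmin)."""
    return speed_deg_per_s * 60.0 / frame_rate_hz


def lifetime_ms(n_frames, frame_rate_hz):
    """Dot lifetime in ms for a given number of frames."""
    return n_frames * 1000.0 / frame_rate_hz
```

With the values given in the text, `step_arcmin(8, 60)` gives the 8 arcmin jump per frame and `lifetime_ms(3, 60)` gives the 50 ms lifetime.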
The visual stimuli in experiment 2 were vertical red–green gratings presented in the quadrant visual field (see Fig. 5a). They were generated using a VSG 2/3 stimulus generator (Cambridge Research Systems, Cambridge, UK) and projected by a DLP projector (U2-1130; PLUS) onto a translucent screen located 150 cm from the subjects. The stimulus (8.3 × 8.3°) was centered at 5.15° (horizontally and vertically) from the center of the display (26.6 × 19.7°). The red and green components of the grating and the background were kept photometrically isoluminant at 120 cd/m². The CIE chromaticity coordinates of the background were (0.310, 0.316), and those of the grating were sinusoidally modulated along the L-M cone contrast axis (Derrington et al., 1984). The chromatic contrast of the stimulus was 20, 30, 45, 67, or 100%, defined as a percentage of the maximum value attainable with our projector [(0.361, 0.292) for red and (0.250, 0.344) for green]. The luminance and the chromaticity of the stimulus were calibrated using a spectrophotometer (CS1000; Minolta, Osaka, Japan); the average difference between the target values of the luminance and the chromaticity (see above) and the actual values used in the experiments was 1% (SD, 0.75). The spatial frequency was 1 cpd. A fixation marker (+) was continuously presented at the center of the display. For the individual subjects, the visual field in which the MEG response was the largest (determined in a preliminary experiment) was tested.
MEG recordings and measurements of RT.
In experiment 1, brain magnetic fields were recorded in a magnetically shielded room using a whole-head MEG system (PQ2440R; Yokogawa, Tokyo, Japan) with 230 axial gradiometers (∂Bz/∂z) and 70 × 3 vector sensors with one axial (∂Bz/∂z) and two planar gradiometers (∂Bx/∂z, ∂By/∂z). Data were sampled at 625 Hz with a 200 Hz low-pass filter and a 0.3 Hz high-pass filter. In a trial, random motion was presented for a duration randomly varied from 1500 to 2500 ms between trials, and then coherent motion was presented for 300 ms. The stimulus sequence of the next trial immediately started without pause. MEG responses were recorded from 200 ms before until 1000 ms after the coherent motion onset. Subjects were instructed to block a laser beam guided by an optical fiber with their fingertip and to react to the onset of coherent motion as fast as possible after they detected a stimulus by raising the finger. RT was measured as the time between the stimulus onset and the subject's response with a precision of 1 ms. The motion coherence of the stimulus was varied randomly across trials. For each coherence level, each subject ran 200 trials.
In experiment 2, the magnetic fields were measured in a magnetically shielded room using a whole-head 201-channel MEG system (SBI-200; Shimadzu, Kyoto, Japan). The MEG system and procedures have been described in detail previously (Ohtani et al., 2002a). Data were sampled at 1024 Hz with a 100 Hz low-pass filter and a 1.0 Hz high-pass filter. In a trial, after a uniform prestimulus period lasting for 1500–2500 ms, a grating stimulus was presented for 500 ms. MEG responses were recorded from 100 ms before until 923 ms after the stimulus onset. Subjects started each trial by covering the end of the optical fiber (used as a switch) with their fingertip and responded by opening the optical fiber as fast as possible after they detected a stimulus. RT was measured as the time between the stimulus onset and the subject's response with a precision of 1 ms. The chromatic contrast of the stimulus was varied randomly across trials. For each contrast level, each subject ran 100 trials.
Neither experiment used strict procedures to prevent subjects from predicting onset timing, such as using catch trials or an exponential signal onset distribution. Therefore, there may be a concern that, when the prestimulus period was longer than expected, subjects would attempt to make manual responses as fast as possible by guessing the probable timing of the stimulus onset without waiting for the actual motion onset. However, because the false alarm rate (proportion of trials in which subjects made a manual response before motion onset) was very low (see Results), it was very unlikely that the subjects took such a strategy.
For each subject, the median RT of each condition was compared with the latency predicted from the models applied to the MEG response of the corresponding trials. We tested four models: the peak detector model (a stimulus is detected when the MEG response reaches its peak amplitude); the level detector model (a stimulus is detected when the MEG response exceeds a given threshold); the full integrator model (a stimulus is detected when the fully integrated MEG response exceeds a given threshold); and the leaky integrator model (a stimulus is detected when the leaky integrated MEG response exceeds a given threshold). The models assume that changes in stimulus amplitude do not influence the mean time required for the postdetection process (Miller et al., 1999). There was no free parameter in the peak detector model. For the latter three models, we estimated the threshold value for each subject that best accounted for the simple RTs in the sense that the slope of the regression line between the RT and detection latency was closest to 1.
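The threshold estimation can be sketched as a search over candidate thresholds, shown here for the full integrator model. This is a minimal illustration under stated assumptions, not the authors' fitting procedure: the grid search, the function names, and the rule of skipping thresholds that some condition never reaches are all hypothetical choices.

```python
def full_integrator_latency(response, dt_ms, threshold):
    """Time (ms) at which the running integral first reaches the threshold.

    Returns None if the threshold is never reached.
    """
    total = 0.0
    for i, value in enumerate(response):
        total += value * dt_ms
        if total >= threshold:
            return i * dt_ms
    return None


def regression_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def fit_threshold(responses, median_rts, dt_ms, candidates):
    """Pick the candidate threshold whose RT-versus-latency slope is closest to 1.

    responses: one response time course per stimulus condition.
    median_rts: median RT (ms) for the same conditions.
    Returns (threshold, slope, mean postdetection delay in ms), or None.
    """
    best = None
    for thr in candidates:
        lats = [full_integrator_latency(r, dt_ms, thr) for r in responses]
        # Skip thresholds that some condition never reaches, or that give
        # identical latencies (the slope would then be undefined).
        if any(lat is None for lat in lats) or len(set(lats)) < 2:
            continue
        slope = regression_slope(lats, median_rts)
        delay = sum(rt - lat for rt, lat in zip(median_rts, lats)) / len(lats)
        if best is None or abs(slope - 1.0) < abs(best[1] - 1.0):
            best = (thr, slope, delay)
    return best
```

The same search applies to the level detector and leaky integrator models by swapping the latency function. The returned delay is the mean difference between RT and detection latency across conditions.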
The models used the time course of the root mean square (RMS) of all sensor channels or the visual response component extracted by the signal space projection (SSP) analysis (Tesche et al., 1995). For SSP, two signal space components (spatial distributions of MEG responses) were used, each of which was expected to dominantly represent visual and motor responses. Unless otherwise stated, the former was defined by the spatial distribution of the MEG responses averaged over trials with respect to the stimulus onset, evoked by the strongest stimulus, and obtained at the RMS peak latency or at 160 ms only for chromatic stimuli. The latter was defined by the spatial distribution of the MEG responses averaged with respect to finger movement [response (or reaction)-averaged MEG] and obtained at its RMS peak latency (around the time of manual response). The time course of each component was calculated by projecting a MEG response in each stimulus condition onto the signal space defined by the two components.
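The projection step can be sketched as a least-squares fit of each sensor-space sample onto the two component vectors. This sketch assumes that the projection is an ordinary least-squares decomposition onto two possibly non-orthogonal basis vectors; the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def project_two_components(sample, visual, motor):
    """Least-squares weights (visual, motor) of one sensor-space sample.

    Solves the 2 x 2 normal equations so that
    weight_v * visual + weight_m * motor is the closest point to `sample`
    within the plane spanned by the two component vectors.
    """
    vv, mm, vm = dot(visual, visual), dot(motor, motor), dot(visual, motor)
    sv, sm = dot(sample, visual), dot(sample, motor)
    det = vv * mm - vm * vm
    if det == 0:
        raise ValueError("component vectors are collinear")
    weight_v = (sv * mm - sm * vm) / det
    weight_m = (sm * vv - sv * vm) / det
    return weight_v, weight_m


def component_time_courses(data, visual, motor):
    """Project every time sample; `data` is a list of sensor vectors."""
    pairs = [project_two_components(s, visual, motor) for s in data]
    return [p[0] for p in pairs], [p[1] for p in pairs]
```

Fitting both components jointly, rather than projecting onto the visual vector alone, keeps motor-related activity from leaking into the visual time course when the two spatial patterns overlap.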
To minimize the contribution of small noisy fluctuations appearing before the onset of stimulus-evoked response, the full and leaky integrator models integrated response values (RMS or SSP component) only when the response exceeded the average plus 1 SD of the response during the pretrigger period. In addition, to minimize the contribution of ongoing noise superimposed on the signal, the model integrated [RMS(t)^2 − RMS(pretrigger average)^2]^(1/2) or [SSP(t) − SSP(pretrigger average)].
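The gating and baseline correction for the RMS input can be sketched as follows. The function name and the `n_pre` argument (number of pretrigger samples) are hypothetical, and the use of the population SD is an assumption because the text does not specify which SD estimator was used.

```python
import math
import statistics


def integrator_input_rms(rms, n_pre):
    """Baseline-corrected, gated RMS input for the integrator models.

    rms: RMS time course, with the first `n_pre` samples pretrigger.
    Samples at or below (pretrigger mean + 1 SD) contribute zero; samples
    above it contribute sqrt(rms(t)^2 - baseline^2).
    """
    pre = rms[:n_pre]
    baseline = statistics.fmean(pre)
    gate = baseline + statistics.pstdev(pre)  # assumption: population SD
    out = []
    for value in rms:
        if value > gate:
            out.append(math.sqrt(value * value - baseline * baseline))
        else:
            out.append(0.0)
    return out
```

For an SSP component, the same gate would apply, with the corrected value `value - baseline` replacing the square-root expression.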
Predicting stimulus-dependent variation of RT to coherent motion onsets
In experiment 1, we measured simple manual RTs and MEG responses of eight human subjects to the onset of coherent motion from incoherent random motion (Fig. 1a). Previous studies suggest that a motion stimulus of this type selectively activates higher visual areas, including MT/V5 (Bundo et al., 2000; Lam et al., 2000; Maruyama et al., 2002; Nakamura et al., 2003).
When the subjects made a manual response between 200 and 800 ms from the onset of a coherent motion, we considered that they correctly detected the stimulus (hit). The average and SDs across subjects of the percentage of the hit trials were 31.5 ± 16.5% for 20% coherence, 83.5 ± 11% for 40% coherence, and 94.5 ± 7.8% for 80% coherence. The error type was mostly “miss,” and the false alarm rate was <1%.
Figure 1b shows typical superimposed waveforms and RT histograms (subject K.A.). Although the MEG response for the 20% coherence was relatively small, those for the 40 and 80% coherences had two or three prominent peaks. The first peak, whose latency was ∼300 ms, can be regarded as the response evoked by motion onset, whereas the second peak, whose latency (430–500 ms) nearly corresponded to the median RT, probably reflects the neural activities related to finger movement.
Because the number of hit trials was significantly smaller for 20% coherence than for 40 or 80% coherence, the averaged MEG response for this condition was noisier than the others. This was found to degrade the prediction of RT when we used the time course of RMS values (shown by black curves below the superimposed waveforms) as the model input (supplemental Fig. 1, available at www.jneurosci.org as supplemental material). To reduce the effect of noise, we applied SSP (Tesche et al., 1995) to the MEG response obtained for each coherence level. Another merit of this technique is that it enables us to selectively extract the visually evoked component from the total evoked MEG response. As basis vectors of SSP, we used two signal space components, one for visual response and the other for motor response. The former component was defined as the spatial distribution of the MEG response for 80% coherence at the first RMS peak latency. The iso-contour map (in the red frame in Fig. 1b) suggests activation in the temporo-occipital areas of both hemispheres, which is consistent with previous MEG (Bundo et al., 2000; Lam et al., 2000; Maruyama et al., 2002; Nakamura et al., 2003) and functional magnetic resonance imaging studies (Rees et al., 2000). The latter component was defined as the spatial distribution of the peak MEG response averaged with respect to manual responses. The iso-contour map (in the blue frame) suggests activation in the left motor or somatosensory area (subjects reacted using a right-hand finger). In Figure 1, red and blue curves show the time course of visual and motor components, respectively.
To link RT with the time course of MEG response, the four models were applied to the visual response extracted by SSP. Figure 2a shows the time course of the visual response for each coherence level. In the peak detector model, the detection latency is defined as the latency of the first peak observed between 200 and 400 ms, which is indicated by short vertical lines around peaked responses in Figure 2a. In the level detector model (Grice, 1968), coherent motion onsets are detected when the visual response crosses a threshold. A horizontal blue line shows the threshold of the level detector model, which was optimized to best account for the variation in RTs. In the full integrator model (Carpenter, 1981; Hanes and Schall, 1996; Cook and Maunsell, 2002), coherent motion onsets are detected when temporally integrated visual responses without any leaks cross a threshold. Figure 2b shows the time courses of the fully integrated visual response for each coherence level, with a red horizontal line showing the optimized threshold of the integrator model. We also tested the leaky integrator model (Usher and McClelland, 2001; Cook and Maunsell, 2002), in which MEG responses were convolved with a low-pass exponential filter exp(−t/τ). We fixed τ at 100 ms (Cook and Maunsell, 2002), but the model performance was not very sensitive to this parameter. Note also that the level detector and full integrator models correspond to the two extreme cases of the leaky integrator model (τ = 0 and ∞ ms, respectively). Figure 2c shows the time courses of the output of the leaky integrator, with a purple horizontal line showing the optimized threshold.
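A discrete-time sketch of the leaky integrator makes the two limiting cases explicit. The recursive update below is one common way to implement a convolution with a decaying exponential; it is an illustrative assumption rather than the authors' implementation, and the function names are hypothetical.

```python
import math


def leaky_integrate(response, dt_ms, tau_ms):
    """Convolve `response` with a decaying exponential of time constant tau.

    tau_ms = math.inf gives the full integrator (no leak); tau_ms = 0 gives
    an output proportional to the input itself, i.e. the level detector.
    """
    if tau_ms == 0:
        decay = 0.0
    else:
        decay = math.exp(-dt_ms / tau_ms)  # equals 1.0 when tau_ms is inf
    out, state = [], 0.0
    for value in response:
        state = state * decay + value * dt_ms
        out.append(state)
    return out


def first_crossing_ms(signal, dt_ms, threshold):
    """Latency (ms) of the first sample at or above threshold, else None."""
    for i, value in enumerate(signal):
        if value >= threshold:
            return i * dt_ms
    return None
```

For example, `first_crossing_ms(leaky_integrate(x, dt, 100.0), dt, thr)` would give the detection latency of the leaky integrator model with τ fixed at 100 ms, where `x`, `dt`, and `thr` stand for a response time course, its sampling interval, and a fitted threshold.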
Figure 2d shows the relationship between RT and detection latencies estimated by the four models. For all of the models, we assume that a manual response follows stimulus detection with a postdetection delay that is not affected by stimulus amplitude. Therefore, if a model could successfully predict RT, the slope of RT versus the predicted latency function should be 1, and the postdetection delay should not be a negative, nor unrealistically small or large positive, value. Figure 2d shows that, for this subject, the slope of the regression line was smaller than 1 for the peak detector model (0.231) and for the level detector model (0.540), whereas it was close to 1 for the full integrator model (1.01) and the leaky integrator model (0.861).
Figure 2, e and f, shows the slopes of the regression line and postdetection delays for all subjects. The delay value was estimated as the difference between the actual RT and the detection latency estimated by each model, averaged over all three coherence levels. For the peak detector model, the slopes were much smaller than 1 (0.190 ± 0.092), which is in agreement with previous reports (Osaka and Yamamoto, 1978; Musselwhite and Jeffreys, 1985; Mihaylova et al., 1999; Patzwahl and Zanker, 2000; Kawakami et al., 2002; Vassilev et al., 2002). For the level detector model, the slopes were increased but were not sufficiently close to 1 (0.810 ± 0.122). In contrast, for the full integrator model, the slope was nearly 1 (0.977 ± 0.060), and the estimated postdetection delays converged to a reasonable range (169 ± 17.5 ms). The leaky integrator model was also successful (slope, 0.960 ± 0.132). These results indicate that the full or leaky integrator models can successfully predict the variation in RT to motion onset with changing stimulus strength (motion coherence) from the MEG response.
Predicting intertrial variation of RT to coherent motion onsets
The RT changed from one trial to another even when the stimulus parameter (motion coherence) was kept constant. To test whether such an intertrial variation can also be predicted from the MEG response, we classified MEG responses for each coherence level according to the behavioral response. For 20% coherence, which was detected in ∼30% of the trials, the trials in which subjects correctly detected the stimulus onset (hit trials) and those in which subjects could not detect the onset (miss trials) were separately averaged. For 40 or 80% coherence, which was detected in most of the trials, the MEG responses in trials in which RTs were shorter than the median RT and those in trials in which they were longer than the median RT were separately averaged. The SSP-extracted visual component of the classified MEG response for subject K.A. is shown in Figure 3a. It was analyzed by the full integrator model (Fig. 3b) and by the leaky integrator model (Fig. 3c). The threshold for each model (indicated by horizontal lines) was the same as that estimated to account for the stimulus-dependent variation in RT. The prediction of the intertrial variation can be improved, although only slightly, if we allow readjustment of the threshold parameter. However, we used a common threshold to show how the models with the same parameter values predict the stimulus-dependent and the intertrial variability of RT.
Figure 3, b and c, suggests that the integrator models can explain intertrial variation in the RT of subject K.A. As for the response to 20% coherence, the visual response was larger for the hit trials (Fig. 3a, solid purple curve) than for the miss trials (broken purple curve), and for both integrator models, the response for the hit trials exceeded the threshold, whereas that for the miss trials did not. This relationship was found in the results of five of the eight subjects, including K.A. (The model could account for the results for another subject if we allowed readjustment of the threshold parameter.)
As for the response to 40 or 80% coherence, the MEG amplitude was larger and the detection latencies estimated by the full or leaky integrator models shorter for the shorter RT trials (solid yellow or green lines) than for the longer RT trials (broken yellow or green lines). Although the estimated latency differences (short solid and dotted vertical lines of the same color) were quantitatively smaller than those for actual RT (long solid and dotted vertical lines of the same color), this does not necessarily mean a failure of the models. Because the intertrial variation in RT arises not only in the detection process but also in the postdetection process, it is theoretically impossible for the models to fully account for the difference between the shorter and longer RT groups. To be more specific, one cannot tell whether a trial classified into the longer RT group indeed had a long detection latency or just a long postdetection delay. Because our models do not take into account the variation in the postdetection delay, they tend to underestimate the intertrial variation as the relative contribution of the postdetection variation increases.
To evaluate the amount of intertrial variation that the models can account for, we calculated the ratio of the estimated perceptual latency difference between the shorter-RT group and the longer-RT group to the actual RT difference between the two groups. As noted above, perfect estimation of the perceptual time variation does not necessarily result in a unit value. Figure 3d shows that, for both the full and leaky integrator models, the ratios averaged over subjects were ∼0.45 for 40% coherence and ∼0.3 for 80% coherence. This implies that the integrator models can account for, at least partially, the intertrial variation in RT from MEG responses. Reduction of the ratio for the stronger stimulus might be ascribable to the reduction of the variation in the perceptual time, which in turn increases the relative contribution of the variation in postdetection delay, given that the variation of postdetection delay is not strongly affected by stimulus amplitude. Note that the total RT variation decreased as the stimulus coherence was increased. (See distributions of RT in Fig. 1b.)
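A small simulation can illustrate why this ratio falls below 1 even when the detection latency is known exactly. The sketch assumes RT is the sum of an independent detection latency and postdetection delay; the function name and the Gaussian parameters used in the usage example are hypothetical and are not estimates from the data.

```python
import random
import statistics


def explained_ratio(detect_latencies, rts):
    """Latency difference between slow and fast RT groups over the RT difference.

    Trials are split at the median RT; trials exactly at the median are
    left out of both groups.
    """
    med = statistics.median(rts)
    fast = [i for i, rt in enumerate(rts) if rt < med]
    slow = [i for i, rt in enumerate(rts) if rt > med]
    lat_diff = (statistics.fmean([detect_latencies[i] for i in slow])
                - statistics.fmean([detect_latencies[i] for i in fast]))
    rt_diff = (statistics.fmean([rts[i] for i in slow])
               - statistics.fmean([rts[i] for i in fast]))
    return lat_diff / rt_diff


# Usage: equal variability in detection latency and postdetection delay.
rng = random.Random(1)
detect = [rng.gauss(250.0, 20.0) for _ in range(2000)]   # hypothetical values
delay = [rng.gauss(170.0, 20.0) for _ in range(2000)]    # hypothetical values
rt = [d + p for d, p in zip(detect, delay)]
ratio_variable_delay = explained_ratio(detect, rt)
ratio_fixed_delay = explained_ratio(detect, [d + 170.0 for d in detect])
```

With a fixed delay the ratio is 1, whereas with equally variable detection and postdetection stages it is roughly one-half, so a ratio well below 1 is compatible with an accurate estimate of detection latency.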
Another factor that might have affected the intertrial variability of RT was a change in the hazard rate (the probability that the signal was about to occur) across trials. Because we randomly determined the onset timing of each trial from a uniform distribution without using catch trials, the hazard rate increased over time during the prestimulus (random motion) period. This could result in an increase in the degree of subjects' expectancy of stimulus occurrence for trials that happened to have relatively long prestimulus periods. The intertrial variation in the hazard rate could have introduced additional variations in manual RT (Luce, 1986), as well as in cortical responses (Janssen and Shadlen, 2005), which the present models may not be capable of dealing with. Note, however, that the effect of the hazard rate is expected to be small (Luce, 1986; Janssen and Shadlen, 2005). In addition, because there was no difference in the hazard rate among different coherence levels, this factor did not affect the stimulus-dependent change in median RT discussed in the last section.
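For a uniform onset distribution, the rise of the hazard rate has a closed form: with onsets uniform on [a, b], the density is 1/(b − a) and the probability that the onset has not yet occurred at time t is (b − t)/(b − a), so the hazard rate is 1/(b − t). The sketch below evaluates this for the 1500–2500 ms prestimulus window of experiment 1; the function name and default arguments are hypothetical.

```python
def uniform_hazard(t_ms, lo_ms=1500.0, hi_ms=2500.0):
    """Hazard rate (per ms) of stimulus onset at time t_ms.

    Onset times are assumed uniform on [lo_ms, hi_ms). The hazard is the
    onset density divided by the probability that onset has not yet
    occurred, which simplifies to 1 / (hi_ms - t_ms).
    """
    if t_ms >= hi_ms:
        raise ValueError("onset has certainly occurred by hi_ms")
    if t_ms < lo_ms:
        return 0.0
    return 1.0 / (hi_ms - t_ms)
```

Under these assumptions the hazard rate doubles between 1500 and 2000 ms and is ten times its initial value at 2400 ms.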
Predicting total variation in RTs to coherent motion onsets
We also tried predicting the total RT variation by binning MEG responses from all the coherence conditions based on RT, following the analysis by Cook and Maunsell (2002). Specifically, the four models were used to predict the RT variation across five bins from the visual response extracted by SSP. The results (shown in supplemental Fig. 2, available at www.jneurosci.org as supplemental material) indicate that the averaged slopes of the regression lines for the full and leaky integrator models (0.793 and 0.729, respectively) were slightly lower than those for the stimulus-dependent variation, but higher than those for the intertrial variation. This is a sensible result because the total variation consisted of these two components. The reason we could not obtain prediction performance as good as that reported by Cook and Maunsell (2002) is not clear at present, but we suspect it is not only because we used MEG rather than single unit recording but also because the RT variation of our human subjects to a 20–80% coherence change was only about one-half of that of monkeys to a 12.5–25% change, which might have reduced the relative contribution of the predictable stimulus-dependent variation in the total variation in the present case.
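The binning step can be sketched as follows. The text specifies five RT-based bins but not how the bin edges were chosen, so the equal-count bins and the use of the median RT per bin are assumptions, as is the function name.

```python
import statistics


def bin_by_rt(rts, responses, n_bins=5):
    """Sort trials by RT, split them into equal-count bins, and average.

    rts: one RT per trial (all coherence conditions pooled).
    responses: one response time course per trial, all of equal length.
    Returns a list of (median RT, averaged response) pairs, fastest bin first.
    Requires at least n_bins trials.
    """
    order = sorted(range(len(rts)), key=lambda i: rts[i])
    n = len(order)
    n_samples = len(responses[0])
    bins = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        bin_rt = statistics.median([rts[i] for i in idx])
        mean_resp = [sum(responses[i][k] for i in idx) / len(idx)
                     for k in range(n_samples)]
        bins.append((bin_rt, mean_resp))
    return bins
```

Each averaged response would then be passed to a model to obtain a detection latency, and the latencies regressed against the bin RTs.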
Predicting RTs of single-trial responses to coherent motion onsets
So far, we have been analyzing the MEG responses averaged over trials. Although the MEG response in a single trial was very noisy, we found that the full and leaky integrator models were powerful enough to reveal clear correlations between the RT and MEG response even for single-trial data. The extraction of the visual component by SSP effectively improved the signal-to-noise (S/N) ratio of single-trial MEG responses. The relationship between RT and detection latency for each single trial is shown in Figure 4, a (full integrator model) and b (leaky integrator model). The results show that both the stimulus-dependent variability and intertrial variability of RT could be predicted from the integrator models, and the slope of the regression line was significantly larger than zero not only for all trials but also for each motion coherence (Fig. 4c,d). Figure 4, e and f, shows the proportion of hit trials in which integrated MEG responses crossed a threshold against the proportion of miss trials in which integrated MEG responses crossed a threshold, for each motion coherence. For this analysis, we used the same threshold used for the analysis of RT. Most of the points are above the line of y = x, indicating that the MEG response obtained in each single trial could also predict whether subjects would make a manual response (hit/miss), although the power of discrimination was not high. Calculation of the probability that an ideal observer could accurately discriminate hit/miss of each trial on the basis of MEG responses (cf. Britten et al., 1996) also suggested some predictability of subjects' responses (for details, see supplemental Fig. 3, available at www.jneurosci.org as supplemental material).
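An ideal-observer probability of this kind is commonly computed as the area under the ROC curve, which equals the probability that a randomly chosen hit trial has a larger response value than a randomly chosen miss trial. The sketch below assumes that definition and a single scalar per trial (for example, the peak integrated response); the details of the supplemental analysis are not given in the text, and the function name is hypothetical.

```python
def roc_area(hit_values, miss_values):
    """Area under the ROC curve for discriminating hit from miss trials.

    Counts the fraction of (hit, miss) pairs in which the hit value is
    larger, with ties counted as one-half. A value of 0.5 means chance
    discrimination and 1.0 means perfect discrimination.
    """
    wins = 0.0
    for h in hit_values:
        for m in miss_values:
            if h > m:
                wins += 1.0
            elif h == m:
                wins += 0.5
    return wins / (len(hit_values) * len(miss_values))
```

A value only modestly above 0.5 would correspond to the limited but nonzero discrimination power described above.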
The successful prediction of RT and hit/miss from single-trial (not-averaged) MEG is an important step toward the application of this technique to the brain–computer interface (BCI). (See Discussion for details.)
Predicting stimulus-dependent variation of RT to chromatic grating onsets
In experiment 2, we measured the manual RTs and the MEG responses of four human subjects to the onset of a chromatic grating (Fig. 5a). It has been suggested that color is processed by neural substrates separate from those processing motion (Zeki, 1978, 1993). It is also known that a chromatic grating evokes a stable MEG response (peak at 100–140 ms) localized in the striate cortex (Fylan et al., 1997; Anderson et al., 1999; Ohtani et al., 2002b; Kuriki et al., 2005). Because the chromatic grating onset evokes cortical activity that is different in both its pathway and anatomical hierarchy from that evoked by the coherent motion onset, it is a good stimulus for testing the general applicability of the models.
For all of the chromatic contrast levels, the subjects detected the grating onsets (i.e., made a manual response between 200 and 800 ms from the onset of a stimulus) in almost all trials (>98%). The false alarm rate was <1%.
Figure 5b shows the typical averaged MEG responses [superimposed waveforms of all sensors, and the time course of RMS (black) and SSP components (red and blue)] and RT histograms for each contrast level (subject Y.O.). The most prominent peak (M1) of the RMS value was found at the latency of 100–120 ms. The dipole analysis suggests that this activity mainly reflects neural responses in the primary visual area (Ohtani et al., 2002b). Subsequent peaks appearing at 150–300 ms presumably reflect neural activities in higher visual areas (Kuriki et al., 2005).
As in experiment 1, we tested how well the four models can predict RT from MEG response. The model input was the time course of the RMS values. Unlike in experiment 1, because the hit rate was nearly perfect even for the lowest chromatic contrast, the number of averaged trials, and thus the S/N of RMS values, was similar for different stimulus conditions.
Figure 6a shows the typical time course of the RMS value (subject Y.O.). The first prominent peak corresponds to the perceptual latency of the peak detector model. Figure 6, b and c, shows the cases in which the full and leaky integrator models were applied. In Figure 6d, the detection latency estimated by each model is compared with the actual RT. For this subject, the slope of the regression line was much smaller than 1 for the peak and level detector models, but it was very close to 1 for the full and leaky integrator models.
Figure 6, e and f, shows the slopes of the regression line and postdetection delays for all subjects. For the peak detector and level detector models, the slopes were too shallow for most of the subjects. For the full and leaky integrator models, the slope was close to 1 for all subjects. The postdetection delay for the leaky integrator model (164 ms) was close to the value estimated in experiment 1, whereas that for the full integrator model was significantly smaller (80.7 ms). This indicates that the leaky integrator model best accounted for the variation in RT to chromatic gratings.
We also examined the case in which the model input was the initial visual component of the MEG response extracted by SSP. As basis vectors of projection, we used the spatial distribution of the stimulus-averaged MEG response for 100% contrast at RMS peak latency of ∼100 ms and that of the peak MEG response averaged with respect to manual response. Figure 7a shows the time course of the SSP-extracted response (subject Y.O.). The result for each model (Fig. 7b,c) suggests that even for the best model (the full integrator model), the slope was smaller than 1. Given that early visual response originating mainly in V1 was extracted from the evoked MEG response (Fylan et al., 1997; Anderson et al., 1999; Ohtani et al., 2002b), this result suggests that V1 responses alone cannot fully explain the variation in RT.
This suggestion is further supported by the second SSP-based analysis, in which the basis vector for the visual component was sampled at 160 ms from the stimulus onset, not at the response peak at ∼100 ms. The extracted visual component (Fig. 7d) changes more slowly than the initial peak component (Fig. 7a). The result (Fig. 7e,f) shows a great improvement in RT prediction for both the full and leaky integrator models. The prediction performance was comparable with that obtained using RMS values as model inputs (Fig. 6). It is likely that the sampling delay of ∼60 ms significantly reduced the contribution of rapid striate responses to the SSP-extracted visual responses, whereas it increased the contribution of the following extrastriate responses. Indeed, for three of four subjects, the spatial distribution obtained at 160 ms was clearly different from that obtained at the peak latency of ∼100 ms, and a previous study (Kuriki et al., 2005) localized the dipole of the chromatic response at ∼160 ms in the extrastriate area. We chose the latency of 160 ms simply because some subjects showed a secondary MEG response peak at about this latency, but we confirmed that the accuracy of the model predictions was not greatly affected by a small change of this parameter.
In summary, the integrator models can accurately predict the stimulus-dependent change in RT to chromatic grating onsets from the time course of the MEG responses, provided that the responses mainly reflect activity in the higher visual areas related to color processing.
Predicting intertrial variation of RT to chromatic grating onsets
To examine the correlation of the MEG response with intertrial RT variation, we separately averaged the MEG responses in trials in which RTs were shorter than the median RT and those in trials in which RTs were longer than the median RT. Figure 8, a–c, shows the raw, fully integrated, and leaky integrated visual responses, respectively, obtained in the faster and slower trials for subject Y.O. The visual responses were extracted using SSP in which the response at 160 ms was used as a basis vector. The horizontal lines indicate the thresholds estimated to account for the stimulus-dependent RT variation. For this subject, the integrated responses for the chromatic contrasts higher than 20% were larger for shorter RT trials (solid lines) than for longer RT trials (broken lines), and the detection latencies of the full and leaky integrator models were shorter for the shorter RT trials. Figure 8d shows the averaged ratio of the estimated perceptual latency difference between the shorter RT group and the longer RT group to the actual RT difference between the two groups. The ratio was larger for the 20 and 30% contrast conditions, whereas for the higher chromatic contrasts it was ∼0.2. A possible interpretation of this relatively low prediction performance is that intertrial variation in this experiment arose mainly in the postdetection process.
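The median split and the prediction ratio described above can be sketched as follows. The helper names `median_split` and `prediction_ratio` and all numbers are hypothetical; trials whose RT equals the median are simply left out here, which is an assumption about how ties are handled.

```python
from statistics import median

def median_split(rts):
    """Return indices of trials faster and slower than the median RT.

    Trials exactly at the median are excluded (assumed tie handling).
    """
    m = median(rts)
    fast = [i for i, rt in enumerate(rts) if rt < m]
    slow = [i for i, rt in enumerate(rts) if rt > m]
    return fast, slow

def prediction_ratio(lat_fast, lat_slow, rt_fast, rt_slow):
    """Predicted latency difference divided by the actual RT difference."""
    return (lat_slow - lat_fast) / (rt_slow - rt_fast)

# Toy RTs in ms (not data from the study).
rts = [300.0, 340.0, 320.0, 380.0, 360.0]
fast, slow = median_split(rts)
print(fast, slow)

# Toy group values: the model latency differs by 10 ms
# while the group RTs differ by 50 ms.
print(prediction_ratio(150.0, 160.0, 310.0, 360.0))
```

A ratio near 1 would mean that the model latency accounts for the whole RT difference between the two groups; a ratio near 0.2 leaves most of that difference to other stages, which fits the interpretation that the variation arose mainly after detection.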
Predicting total variation in RTs to chromatic grating onsets
The results of predicting total RT variation from the late SSP component indicated that the leaky integrator model could predict ∼60–70% of the RT variation across five bins (supplemental Fig. 4, available at www.jneurosci.org as supplemental material). The prediction ratio was lower than that of stimulus-dependent variation but higher than that of intertrial variation, which is consistent with the results of experiment 1. With regard to this analysis, the prediction was improved when the model integrated the RMS response (data not shown).
Predicting RTs of single-trial responses to chromatic grating onsets
Finally, we applied single-trial analysis to the chromatic responses. The detection latency predicted from the RMS value was not correlated with RT, presumably because of the low S/N of the single-trial RMS value. When the initial visual component at ∼100 ms was extracted using SSP based on the response peak (Fig. 7a–c), the predicted detection latency showed a small but significant correlation with RT when the responses for all stimulus contrasts were included in the correlation analysis (supplemental Fig. 5, available at www.jneurosci.org as supplemental material). We also analyzed the visual component extracted using SSP based on 160 ms activity (Fig. 7d–f), but the result was worse, presumably because the later component is relatively small (compare Fig. 7a,d) and the S/N was not high enough.
As the strength of the visual stimulus increased, the manual RT of the human subjects to the onset of the stimulus decreased. We showed for the first time that this variation is quantitatively predicted from the difference in the time course of MEG responses, both for the onset of coherent motion (Fig. 2) and for the onset of a chromatic grating (Figs. 6, 7). We also found that variations in the MEG response are correlated with intertrial variations in RT to the same stimulus strength (Figs. 3, 8). Although we examined MEG–RT relationships mainly for data averaged over trials, we found some correlations even for single-trial data (Fig. 4 and supplemental Fig. 5, available at www.jneurosci.org as supplemental material). The key to our successful prediction is the use of a model in which the perceptual timing is defined as the time when the temporally integrated MEG response crosses a threshold. The model is similar to that used by Cook and Maunsell (2002) to account for the behavioral RT of monkeys to the onset of coherent motion from the time course of the population response of neurons in MT and ventral intraparietal (VIP) areas. However, given a number of differences between the neural activities measured by MEG and those measured by single-unit recordings, it was not obvious from previous single-unit studies whether temporal integration models would be able to account for RTs from MEG responses. Successful prediction of RTs from a noninvasive measurement of human cortical activity is an important advancement for understanding the neural correlates of the perceptual decision process in the human brain. In addition, it is potentially useful for developing brain–computer interfaces based on noninvasive measurement of cortical activities.
The full and leaky integrator models in the present study predict RTs more accurately than the level detector model, which defines the perceptual latency as the time when the raw (nonintegrated) MEG response crosses a threshold. Although some EEG studies have suggested that the latency of the peak response changes in parallel with the manual RT (Vaughan et al., 1966; Jaskowski et al., 1990), most EEG/MEG studies (Osaka and Yamamoto, 1978; Musselwhite and Jeffreys, 1985; Mihaylova et al., 1999; Patzwahl and Zanker, 2000; Kawakami et al., 2002; Vassilev et al., 2002), including the present one (Figs. 2 and 6), indicate that the change in the peak latency is too small to account for the change in RT (peak detector model). Furthermore, theoretical considerations also suggest that peak detection is not a good candidate for the mechanism of perceptual decision. First, neural implementation of peak detection is not easy. Second, it is hard to decide in real time whether a given peak is a global maximum or a local one. Third, the peak latency estimation is not robust against noise. Introducing a threshold detection process can solve the first two problems because such a process is easy to implement in neural mechanisms; appropriate setting of the threshold would exclude the contribution of irrelevant local peaks. As for the noise problem, the level detector model often makes an erroneous detection in the presence of a transient noise. In comparison, the integrator models (the leaky integrator model in particular) are more robust to noise, because only the consecutively accumulated signals cross a threshold. Therefore, both the empirical evidence and theoretical considerations suggest that the integrator models are appropriate for the perceptual decision process.
From the data currently available, we cannot specify the time constant of the temporal integrator. In most cases, both the full and leaky (τ = 100 ms) integrator models predict the data equally well. We also found that a continuous change in τ affects the prediction performance only slightly (data not shown). In experiment 1, the slopes of the regression lines for the full and leaky integrator models were very similar, whereas the leaky integrator model worked better than the full integrator model in experiment 2. Because temporal integration of neural responses without any leaks is not physiologically plausible, the leaky integrator model would be the best candidate for the visual detection mechanism in the human brain.
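The difference between the level detector and the two integrator models can be sketched with a simple discrete-time threshold-crossing routine. This is a minimal illustration, assuming a forward Euler update for the leaky integrator; the function name `detection_latency`, the step-shaped toy response, and the threshold values are hypothetical and are not the parameters fitted in the study.

```python
def detection_latency(response, dt_ms, threshold, tau_ms=None, integrate=True):
    """Return the first time (ms) at which the decision variable
    reaches the threshold, or None if it never does.

    integrate=False : level detector (raw response vs. threshold)
    tau_ms=None     : full integrator (no leak)
    tau_ms=100.0    : leaky integrator with a 100 ms time constant
    """
    acc = 0.0
    for i, x in enumerate(response):
        if integrate:
            leak = acc / tau_ms if tau_ms is not None else 0.0
            acc += dt_ms * (x - leak)  # forward Euler step (assumed)
            value = acc
        else:
            value = x
        if value >= threshold:
            return i * dt_ms
    return None

# Toy response sampled at 1 ms: silent for 50 ms, then constant.
response = [0.0] * 50 + [1.0] * 200

level = detection_latency(response, 1.0, threshold=0.5, integrate=False)
full = detection_latency(response, 1.0, threshold=20.0)
leaky = detection_latency(response, 1.0, threshold=20.0, tau_ms=100.0)
print(level, full, leaky)
```

With the same threshold, the leak makes the decision variable grow more slowly than in the full integrator, so the leaky model crosses later. In both integrator models a weaker response takes longer to accumulate to the threshold, which is how they can produce latency changes larger than the shift of the raw response itself.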
The integrator models make a perceptual decision based on the time course of the SSP or RMS component, which reflects population activities in a broad area. On the other hand, even if the neurons in the human brain indeed make a perceptual decision based on a similar integration rule, they likely monitor more localized neural activities. Additionally, some of those activities may not be detectable by MEG sensors because MEG is sensitive only to the neural activities tangential to the surface of the brain. Furthermore, MEG mainly measures EPSPs, rather than action potentials, although it is believed that these potentials are highly correlated. Despite these limitations, however, the present findings provide profound insights into when and where the neural evidence for a perceptual decision (visual detection) is formed in the human brain.
In the stimulus used in experiment 1, a transition from random motion to coherent motion is detected by global-motion-sensitive neurons in higher motion areas (e.g., MT), but not by local-motion-sensitive neurons in lower visual areas (e.g., V1) (Newsome and Pare, 1988; Britten et al., 1992). The integrator models accurately predicted the RT variation using the time course of the SSP-extracted activity in higher motion areas. This suggests that the temporal integration of population responses in the higher motion areas is essential for a perceptual decision of the onset of visual motion. On the other hand, in the stimulus used in experiment 2, the onset of a chromatic grating is detected by chromatic-sensitive neurons in the color-processing pathway. The integrator model could not accurately predict RT using the time course of activity in lower visual areas extracted by SSP, whereas it could using the time course of SSP-extracted activity in which the extrastriate activity is presumably dominant. This also supports the importance of activity of the higher visual areas in the detection of a visual signal, although we cannot exclude the possibility of a subsidiary contribution from the lower visual areas. It would be of interest to see in future how well the integrator model can predict RTs to the onset of a chromatic stimulus that is designed to selectively activate higher visual areas.
Although the estimation of the postdetection delay was affected by model parameters, the result was in the range between 150 and 250 ms in many cases. This delay, which includes the latency for motor command, motor execution, and myoelectric activity, is in good agreement with the estimations of motor preparation delay for MT (264 ms) and VIP (196 ms) by Cook and Maunsell (2002). It is possible to interpret the present findings as suggesting that the best guess of the time that a given visual stimulus is detected in the observer's brain is ∼150–250 ms before the observer manually responds to the stimulus. This is ∼300–450 ms from the onset of coherent motion and ∼150–260 ms from the onset of chromatic grating. Additional examination of the neural activities around this timing would reveal the neural correlates of perceptual decisions. It should be noted, however, that perceptual decisions to deliver manual responses are not necessarily accompanied or followed by conscious awareness.
The successful estimation of perception from MEG responses might be used for a brain–computer interface (BCI), a communication system that works using brain signals, not peripheral nerves and muscles (Pfurtscheller and Lopes da Silva, 1999; Wolpaw and McFarland, 2004). Our results suggest the possibility that we might be able to replace general behavioral responses, such as keyboard or button operation, with the measurement of brain activity. Whereas the variation in RT includes not only the variation in detection latency but also that in the latency for motor execution (postdetection delay), the latency predicted by sensory MEG signals mainly reflects the detection latency. Therefore, an interface using brain responses could be more accurate and faster than one using behavioral responses. This application will be especially useful for tasks requiring urgent action. Admittedly, MEG is not well suited to a practical BCI, because the equipment is large and expensive and does not allow users to move, but similar predictions of behavioral timing could be made using EEG signals, which can be measured far more easily than MEG.
K.A. was partially supported by Grant-in-Aid for Japan Society for the Promotion of Science Fellows. N.G. was supported in part by the National Institute of Information and Communications Technology. Y.O. was partially supported by Grant-in-Aid for Scientific Research (B) 16330141.
Correspondence should be addressed to Kaoru Amano, Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan.