Abstract
Temporal integration of visual motion has been studied extensively within the frontoparallel plane (i.e., 2D). However, the majority of motion occurs within a 3D environment, and it is unknown whether the principles from 2D motion processing generalize to more realistic 3D motion. We therefore characterized and compared temporal integration underlying 2D (left/right) and 3D (toward/away) direction discrimination in human observers, varying motion coherence across a range of viewing durations. The resulting discrimination-versus-duration functions followed three stages, as follows: (1) a steep improvement during the first ∼150 ms, likely reflecting early sensory processing; (2) a subsequent, more gradual benefit of increasing duration over several hundreds of milliseconds, consistent with some form of temporal integration underlying decision formation; and (3) a final stage in which performance ceased to improve with durations beyond ∼1 s, consistent with an upper limit on integration. As previously found, improvements in 2D direction discrimination with time were consistent with near-perfect integration. In contrast, 3D motion sensitivity was lower overall and exhibited a substantial departure from perfect integration. These results confirm that there are overall differences in sensitivity for 2D and 3D motion that are consistent with a difference between binocular and dichoptic sensory mechanisms. They also reveal a difference at the integration stage, in which 3D motion is not accumulated as perfectly as in the 2D motion model system.
Introduction
Perceptual decision making is often explained in terms of the temporal integration of noisy sensory evidence (Barlow, 1958; Laming, 1968; Burr and Santoro, 2001). In a well-studied random dot direction discrimination task (Newsome and Paré, 1988), performance reflects near-perfect integration of noisy sensory signals over hundreds of milliseconds (Gold and Shadlen, 2003; Palmer et al., 2005; Kiani et al., 2008). Previous work has focused on discriminations in which the axis of motion is on the frontoparallel plane. For example, subjects may be asked to discern whether the dots translate left versus right or up versus down. Little, if anything, is known about whether the conclusions of such work in the frontoparallel (i.e., 2D) plane extend to the processing of motion through depth—motions toward versus away from the observer—hereafter called “3D motion.”
By measuring the effects of both the strength and duration of motion, prior work (Burr and Santoro, 2001; Gold and Shadlen, 2001; Palmer et al., 2005; Kiani et al., 2008) has shown that 2D direction discrimination depends on the time-integrated motion over some nontrivial time range (between 250 and 2000 ms), during which discrimination sensitivity improves with the square root of viewing duration. Although this decision integration stage has received primary focus, it is likely to be bounded by earlier and later stages where sensitivity follows different dependencies on duration. For very short durations, sensitivity can increase more steeply with duration, reflecting the properties of early sensory-processing stages (Bair and Movshon, 2004). For very long durations, sensitivity can stop improving with viewing duration and instead saturate (Watamaniuk and Sekuler, 1992; Burr and Santoro, 2001).
To compare the temporal integration of 2D and 3D motion, we measured accuracy as a function of stimulus duration and motion coherence for stimuli that were identical except for the axis of motion. Consistent with prior work, discrimination accuracy was typically higher for 2D than for corresponding 3D conditions (Tyler, 1971; Brooks and Stone, 2006). Sensitivity for both 2D and 3D direction improved rapidly over the first ∼150 ms of viewing duration, followed by a more gradual improvement from 150 ms to ∼1 s, until ultimately saturating. Although it is unlikely that motion processing relies on three strictly separate stages, this pattern suggests distinguishable phases that are interpretable as an early sensory period, a later decision stage involving some form of temporal integration, and a terminal period in which performance was essentially constant.
During the early sensory period, 3D sensitivity was lower than 2D sensitivity but increased with duration in a similar manner, implying a lower signal-to-noise ratio for 3D sensory mechanisms. During the decision phase, 3D sensitivity was still lower than 2D, but increased with a shallower slope, indicating a less-than-perfect mechanism for integrating sensory evidence over time. Finally, 3D sensitivity stopped improving at a slightly later time, but still at a lower level than for 2D motion. Together, these results suggest that both sensory and decision components of discriminating 3D motion direction cannot be parsimoniously explained by what is known about frontoparallel motion processing, motivating further work to explain both the sensory and decision differences.
Materials and Methods
General procedure.
Data were collected from five observers (four males and one female; age range, 24–50 years; including three of the authors), all with good stereopsis and normal or corrected-to-normal vision. Experiments were undertaken with the written consent of each observer, and all procedures were approved by the University of Texas at Austin Institutional Review Board.
We characterized direction discrimination for 2D and 3D motion using a random dot kinematogram inspired by Newsome and Paré (1988), which was stereoscopically generalized so that motion could be controlled either on the x-axis (leftward/rightward; 2D) or the z-axis (toward/away; 3D; Czuba et al., 2010). 2D frontoparallel motion was generated by presenting the same motion direction to each eye. At high coherences, this generated a percept of many dots at various depths moving leftward/rightward. For 3D motion, opposite directions of motion were presented in the two eyes. At high coherences, this generated a percept of many dots at different depths flowing toward/away through a cylinder (Fig. 1A). Most monocular properties of the stimuli were therefore identical, allowing us to compare 2D and 3D sensitivity in common stimulus units of motion coherence. Each subject completed between 10 and 20 sessions (mean, 15 ± 5 sessions). 2D and 3D motion types were presented in separate experimental runs. A total of 73,800 trials were collected across the five observers.
Each trial began with a 300 ms presentation of a fixation cross, followed by a motion stimulus of variable viewing duration and motion coherence. Observers reported the perceived direction of motion with a keypress. Auditory feedback was provided 700 ms after the response, and the next trial began 400 ms later. Within each run (360–600 pseudorandomized trials), stimuli were drawn from one of six motion coherences and two directions (rightward or leftward for 2D motion; toward or away for 3D motion). Motion coherence was defined as the proportion of coherently moving dots. Coherences were 3%, 6%, 12%, 25%, 50%, and 100%; for one especially sensitive subject, the 100% coherence level was replaced with 1.5%.
Each coherence/direction parameter combination was presented over a range of durations. In the first round of data collection, durations were selected from a truncated exponential distribution to approximate a flat hazard rate (minimum of 33 ms, i.e., two monitor frames at 60 Hz; maximum of either 1.2 or 1.5 s). The distribution was divided into deciles from which durations were randomly selected. A total of 69,000 trials was collected across all observers in this round.
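The decile-stratified sampling scheme described above can be sketched as follows. The exponential time constant (tau) and all names are illustrative assumptions, since the paper does not report the distribution's rate parameter:

```python
import numpy as np

def sample_durations(n_trials, min_dur=0.033, max_dur=1.5, tau=0.4, seed=None):
    """Stratified sampling from an exponential truncated to [min_dur, max_dur].

    The truncated CDF range is divided into deciles; each trial draws a
    uniform point within a randomly chosen decile, approximating the
    exponential shape (and hence a roughly flat hazard rate) even for
    modest trial counts. tau is an assumed value, not from the paper.
    """
    rng = np.random.default_rng(seed)
    # CDF values of the exponential at the truncation bounds
    lo = 1.0 - np.exp(-min_dur / tau)
    hi = 1.0 - np.exp(-max_dur / tau)
    # pick a decile per trial, then a uniform point within that decile
    decile = rng.integers(0, 10, size=n_trials)
    u = (decile + rng.random(n_trials)) / 10.0
    p = lo + u * (hi - lo)
    # invert the exponential CDF to get durations in seconds
    return -tau * np.log(1.0 - p)
```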
We performed a second round of data collection using longer viewing durations (uniform distribution, 300–6000 ms). The two rounds were combined after verifying that performance did not differ at overlapping durations (likelihood ratio test, p > 0.999). All subjects completed two runs with these longer durations for both 2D and 3D, yielding 4800 trials across the five observers.
Display and stimulus.
Stimuli were presented on a single linearized 42 inch LCD monitor (60 Hz, 1920 × 1080 resolution; LC-42D64U, Sharp) viewed through a 70 cm optical path of a mirror stereoscope. The monitor was confirmed to produce reliable refreshes and luminance additivity, and was driven by a Mac Pro computer with an NVIDIA GeForce 8800 GT video card. The stereoscopic stimuli (described in detail previously by Czuba et al., 2010) were generated using the Psychophysics Toolbox (Brainard, 1997) and MATLAB (MathWorks) version 2012a. Monocular half-images were presented separately on the left and right halves of the display, with a septum and baffles. Each half-image (subtending 30°) had a white fixation dot in the center of a small central square (1.0°), with horizontal (black) and vertical (red) nonius lines located 2° off-center of each monocular half-image. To further aid fixation and binocular alignment, static 1/f noise texture was presented in the background of the fixation square and stimulus aperture.
The random dot stimulus (6° diameter aperture, centered 5° above fixation) consisted of 40 uniformly distributed and binocularly paired moving dots (average density, 1.4 dots/degree²; Michelson contrast of 0.3 on a middle gray background). Half of the dots were dark, and half were bright (19.5 and 36.5 cd/m², respectively). Individual dots subtended a visual angle of 9 arcmin, and each dot moved at a monocular speed of 1.0°/s, with a maximum lifetime of 250 ms. Dots reaching the edge of the stimulus volume before their lifetime expired were "wrapped" to the opposite end. Dot disparities were constrained to uniformly span a cylindrical volume ±0.7° from the plane of fixation. For 2D conditions, each dot was at a fixed disparity and moved along the frontoparallel plane; for 3D conditions, each dot moved along the z-axis within a volume that spanned the same range of disparities as the 2D conditions. The different wrapping contingencies decreased 3D dot lifetimes by one to two video frames compared with 2D dots, but this small lifetime difference had no measurable consequence on direction discrimination performance over time, as assessed in other experiments (data not shown) and in line with previous studies (Scase et al., 1996; Festa and Welch, 1997).
Data analysis.
Data were analyzed in Python, using the pandas, numpy, scipy, and pypsignifit packages (Fründ et al., 2011). Figures were generated in MATLAB (MathWorks) version 2014a. All code is available at https://huklab.github.io/3d-integ.
Each subject contributed at least 3400 trials in a given condition (2D or 3D), although the exact number of trials varied because subjects completed different numbers of runs. To compute the average across all subjects, we sampled 3400 trials (with replacement) from each subject and then combined the data across all five subjects, resulting in 17,000 trials contributing to the averages. In all analyses, we discarded trials with viewing durations or motion coherences that were not shown to every subject, leaving a total of 61,500 trials (57,500 from the initial round of data collection, 4000 from the second round). All reported confidence intervals (CIs) were generated as bootstrapped estimates of ±1 SEM (i.e., the central 68.2%).
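The equal-weight pooling scheme for the group average can be sketched as follows (a minimal illustration with stand-in trial records; function and variable names are ours):

```python
import numpy as np

def resample_pooled_trials(trials_by_subject, n_per_subject=3400, seed=None):
    """Equal-weight pooling of trials across subjects.

    Each element of trials_by_subject is an array of one subject's trials
    (here, integer indices stand in for full trial records). Sampling a
    fixed number of trials with replacement from every subject prevents
    subjects who completed more runs from dominating the pooled average.
    """
    rng = np.random.default_rng(seed)
    pooled = [rng.choice(trials, size=n_per_subject, replace=True)
              for trials in trials_by_subject]
    return np.concatenate(pooled)

# five subjects with unequal trial counts, as in the experiment
subjects = [np.arange(n) for n in (5000, 3600, 4200, 3400, 4800)]
pooled = resample_pooled_trials(subjects, seed=0)  # 5 x 3400 = 17,000 trials
```

Repeating this resampling many times also yields the bootstrapped ±1 SEM confidence intervals (the central 68.2% of the resampled estimates).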
To assess the effect of motion coherence (x) on accuracy, we fit psychometric functions of logistic form to trials from each duration bin, with lower bound 50% and upper bound (1 − λ), as follows:

P(x) = 0.5 + (0.5 − λ) / (1 + e^(−(x − m)/w))
This is a three-parameter fit where m and w represent the midpoint and width of the curve, respectively, along with an additional parameter specifying a stimulus-independent lapse rate, λ. These parameters were fit using the Bayesian inference framework in pypsignifit, with priors m ≈ normal(0, 5), w ≈ normal(1,3), λ ≈ β(1.5, 12).
We extracted the 75% motion coherence threshold as the numerical inverse of the fitted psychometric function where P(x) = 0.75. Error bars were calculated from 10,000 bootstrapped thresholds.
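A minimal stand-in for this fitting and threshold-extraction procedure is sketched below, using a maximum-likelihood fit in scipy rather than the Bayesian machinery of pypsignifit (note that pypsignifit's exact parameterization of the width w may differ from this simple logistic, and all names here are ours):

```python
import numpy as np
from scipy.optimize import brentq, minimize

def psychometric(x, m, w, lam):
    """Logistic with a 50% lower bound and a (1 - lam) upper bound."""
    return 0.5 + (0.5 - lam) / (1.0 + np.exp(-(x - m) / w))

def fit_psychometric(coh, correct):
    """Maximum-likelihood estimate of (m, w, lam) from binary responses."""
    def nll(params):
        p = np.clip(psychometric(coh, *params), 1e-6, 1 - 1e-6)
        return -np.sum(correct * np.log(p) + (1 - correct) * np.log(1 - p))
    res = minimize(nll, x0=[0.2, 0.1, 0.02],
                   bounds=[(-1.0, 1.0), (1e-3, 2.0), (0.0, 0.2)])
    return res.x

def coherence_threshold(m, w, lam, level=0.75):
    """Numerically invert the fitted function at P(x) = level."""
    return brentq(lambda x: psychometric(x, m, w, lam) - level, -1.0, 1.0)
```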
To characterize motion sensitivity as a function of duration, we divided the distribution of motion durations into 20 equal quantiles, computed a threshold for each bin, defined sensitivity as the inverse of threshold, and fit sensitivity values versus duration using a continuous trilimb function on logarithmic coordinates. The slope of the third line was fixed to zero. Given the constraint of continuity, sensitivity as a function of stimulus duration, with five free parameters, becomes the following:

S(t) = S0 · (t1→2/t2→3)^m2 · (t/t1→2)^m1,  for t < t1→2
S(t) = S0 · (t/t2→3)^m2,  for t1→2 ≤ t < t2→3
S(t) = S0,  for t ≥ t2→3
where the elbow points t1→2 and t2→3 specify the transition times between the three line segments; m1 and m2 are the slopes of the first and second phases (recall that the third phase is fixed at zero slope); and S0 is the asymptotic sensitivity (i.e., accuracy in the third phase).
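The trilimb function described above can be implemented as a continuous piecewise-linear function of log duration (a sketch under the stated definitions; variable names are ours):

```python
import numpy as np

def trilimb(t, m1, m2, t12, t23, s0):
    """Continuous three-limb sensitivity function in log-log coordinates.

    Slopes m1 (early sensory phase) and m2 (integration phase) apply below
    the elbow points t12 and t23; the third limb is flat at the asymptotic
    sensitivity s0. Continuity at the elbows fixes the intercepts, leaving
    five free parameters in total.
    """
    logt = np.log10(np.asarray(t, dtype=float))
    log_s0 = np.log10(s0)
    log_t12, log_t23 = np.log10(t12), np.log10(t23)
    # work backward from the asymptote so the segments join continuously
    seg2 = log_s0 + m2 * (logt - log_t23)            # t12 <= t < t23
    seg1 = (log_s0 + m2 * (log_t12 - log_t23)
            + m1 * (logt - log_t12))                 # t < t12
    log_s = np.where(logt >= log_t23, log_s0,
                     np.where(logt >= log_t12, seg2, seg1))
    return 10.0 ** log_s
```

Fitting then amounts to minimizing the squared error between log-transformed sensitivity estimates and this function over the five parameters.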
We could not confidently obtain sensitivity estimates in the two shortest duration bins of the 3D condition because accuracies in this range did not reach the threshold value of 75% correct. These data points were excluded from analysis.
Results
Subjects stereoscopically viewed a random dot motion stimulus and discriminated between two possible directions of motion. In the 2D motion (frontoparallel) condition, the coherent dots moved either leftward or rightward. In the 3D motion (motion-through-depth) condition, coherent dots moved through depth either toward or away from the observer (Fig. 1A). From trial to trial, we manipulated the proportion of coherently moving dots similarly in both 2D and 3D motion conditions, allowing sensitivity comparisons between the two motion types in common units of motion coherence.
Direction discrimination in 2D and 3D depends on motion coherence and duration. A, Apparatus and task. Subjects viewed the stimulus through a mirror stereoscope. Motion appeared in a circular aperture above the fixation cross. In the frontoparallel 2D motion condition, the two monocular views of the moving dots were identical except for fixed horizontal disparities. In the 3D motion condition, paired dots in the two monocular images moved in opposite directions from one another, consistent with motion-through-depth. Motion was presented at variable coherence values and for a variable duration. In this example, 2D motion is to the right; 3D motion is away from the observer (the correct answer is circled in green). B, Psychometric function for a single subject (left) and for all five subjects (right), for both motion conditions. The proportion of correct responses as a function of motion coherence for 2D (blue) and 3D (red), over all stimulus durations. Thresholds are indicated on the x-axis. C, D, Surface plot shows the joint influence of motion coherence and motion viewing duration (x and y “floor” axes) for 2D (C) and 3D (D) direction discrimination accuracy (height). E, F, Slices from the surfaces can be taken to make sample psychometric functions for more conventional visualizations of these dependencies for 2D sensitivity (E) and 3D sensitivity (F). Each psychometric function was computed for a particular duration range, color coded to match the corresponding location on the surface plot. Only 4 functions are illustrated for clarity. Error bars of ±1 SEM are often smaller than the rendered data points.
Accuracy for both 2D and 3D direction discrimination improved with motion coherence (Fig. 1B shows the overall psychometric functions, combining all stimulus durations). Consistent with previous reports of “stereomotion suppression,” 2D direction discrimination was superior to 3D (Tyler, 1971; Brooks and Stone, 2006). Average discrimination thresholds were 11% coherence for 2D motion and 24% coherence for 3D motion (68% CIs, 10.8–11.0 and 23.4–24.1, respectively).
Discrimination accuracy depends on motion coherence and viewing duration
Viewing duration also affected discrimination accuracy for both 2D and 3D motion. Accuracies for 2D and 3D motion (as a function of both duration and coherence) are shown as surfaces in Figure 1, C and D, respectively. Accuracy increased with either motion coherence or duration. The effect of duration on the psychometric function (i.e., slices of the surface parallel to the motion coherence axis) is illustrated for four sample duration ranges (Fig. 1E,F).
For 2D motion, accuracy increased steeply as a function of motion coherence and reached perfect or near-perfect levels, even at viewing durations as short as 67–83 ms (Fig. 1E, green curve). Thresholds were 4%, 8%, 15%, and 24% coherence for the four sample duration ranges shown, confirming the systematic dependence of sensitivity on duration. This pattern also held for 3D motion, with thresholds of 12%, 18%, 40%, and 83% for the same four duration ranges, despite thresholds being overall higher than those for matched 2D conditions.
Temporal integration of 2D and 3D motion
To more completely compare 2D and 3D temporal integration, we analyzed direction discrimination sensitivity (inverse threshold) across the full range of viewing durations (from a minimum of two video frames to a maximum of 6 s). These sensitivity-versus-duration functions are shown in Figure 2A. For both 2D and 3D motion direction discrimination, sensitivity first increased steeply, followed by a more gradual increase over an intermediate range, capped by an asymptotic sensitivity level. For simplicity, we fitted trilimb piecewise linear functions to these curves. Although not a mechanistic model (nor an endorsement of cleanly discrete stages), the three linear regimes loosely map onto the following three distinguishable phases: first, a brief sensory phase where small increases in duration can have dramatic effects on direction sensitivity; second, a decision stage, where prolonged sampling of the stimulus benefits performance, given some form of temporal integration of noisy sensory evidence; and third, a final regime where prolonged viewing has no additional effect on performance. The trilimb fits describe these stages with five free parameters: slopes for the first (m1) and second (m2) limbs, transition times between the first and second (t1→2) and second and third limbs (t2→3), and an asymptotic sensitivity value (S0).
2D and 3D motion sensitivity exhibit different patterns of dependency on duration. A, Sensitivity versus stimulus viewing duration for 2D motion (blue) and 3D motion (red), on logarithmic axes. Each data point is a sensitivity value derived from a psychometric function within a given stimulus duration range. The trilimb function fit describes the three stages of integration. The second stage (decision integration) is emphasized in gray. B, Best fitting slope values for the integration stage limb for 2D and 3D motion (perfect integration of noisy evidence indicated by the dashed line; slope, 0.5). Error bars indicate ±1 SEM. C, Individual subject integration stage slopes, for 2D motion (abscissa) and 3D motion (ordinate). D, Comparison of a trilimb and bilimb fits for 2D motion (left) and 3D motion (right). Top, The two fits applied to data from A. Bottom, Residuals for both fits (up until performance saturation). The dashed line represents the elbow point in the trilimb fit, at which a systematic change in bilimb residuals is evident (from underestimation to overestimation of slopes).
The most obvious difference between the two functions, the vertical shift, captures the higher overall sensitivity for 2D motion over 3D motion. Despite this large offset in overall sensitivity, the transition times between stages were quite similar between 2D and 3D conditions. The median transition times from the first phase to the second, t1→2, were 136 ms for 2D and 171 ms for 3D (CIs, 86–149 and 141–194, respectively). Median t2→3 values were 983 ms for 2D and 1267 ms for 3D (CIs, 983–1040 and 1199–3110, respectively). Thus, for both types of motion, the early sensory stage appears to persist for ∼150 ms, followed by a decision phase that lasts until ∼1 s, after which sensitivity ceases to benefit from longer viewing duration.
Integration of 3D motion is different than 2D motion
Despite similarities in the timings of transition points among the three stages, the slopes and intercepts revealed differences between 2D and 3D motion. The putative sensory stage had higher sensitivities for 2D than for 3D motion, but had similar m1 values of 1.23 and 1.30 (CIs, 1.11–1.54 and 1.10–1.60, respectively). The offset points to a lower sensory signal-to-noise ratio for 3D motion, but the similar slopes suggest commonality in temporal summation of the sensory mechanisms.
Of primary interest is the difference between integration slopes during the second (putative decision) stage (Fig. 2B). 2D motion had an m2 value of 0.47, whereas the m2 value for 3D motion was significantly shallower (0.31; CIs, 0.41–0.53 and 0.20–0.35, respectively). The 2D motion slope was close to 0.5 (Fig. 2B, dashed line), which is indicative of near-perfect integration of noisy evidence and is consistent with the results of prior studies of frontoparallel direction discrimination (Palmer et al., 2005; Kiani et al., 2008). The shallower 3D motion slope, in contrast, does not uniquely specify a particular integration mechanism but is clearly distinct from the frequently observed near-perfect integration for 2D motion. This difference was statistically reliable in four of five subjects when fit individually (Fig. 2C).
3D sensitivity did not continue to improve with viewing duration to levels comparable to those for 2D motion. Even though, if anything, the second phase of evidence accumulation for 3D motion may have continued slightly longer than that for 2D (hitting an asymptote at 1267 ms, compared with 983 ms for 2D), the final level of performance (S0) for 3D (8.3 coherence−1) was still significantly lower than that for 2D (22.4 coherence−1; CIs, 8.0–8.8 and 21.5–23.6, respectively).
Support for the three-stage descriptive model
We have favored trilimb fits with two separate integration stages before saturation (Neri et al., 1998), but it is also conventional to consider a simpler bilimb fit with a single integration phase followed by an asymptotic regime (Watamaniuk and Sekuler, 1992; Burr and Santoro, 2001). We therefore compared the previously used bilimb and our proposed trilimb functions directly by fitting each to the data and evaluating the fitting errors (Fig. 2D). For both 2D and 3D motion, the bilimb fit resulted in substantially larger residuals than the trilimb fit. This is not surprising given that the trilimb function has more parameters, but closer inspection reveals that the residuals for the bilimb fit exhibit a systematic pattern, beginning with a large negative lobe followed by a positive one. This pattern is not apparent for the trilimb fits. The structure of these residuals implies that bilimb fits may underestimate the steepness of the early sensory stage and, more critically, can overestimate the steepness of the decision phase. The first ∼150 ms of motion processing appear distinct enough from decision-related temporal integration to warrant modeling the two phases separately in quantitative fits. Although we do not propose trilinear fits as a biophysically plausible model, we do favor them as a tractable descriptive form. That said, other functions yield similar results. Saturating exponential fits also reveal that 2D integration is closer to perfect than 3D integration (time constants, 291 and 413 ms; CIs, 260–324 and 377–457 ms, respectively). Furthermore, a model-free comparison that does not enforce distinct phases also supports differential integration: the log-difference between 2D and 3D sensitivities (across all durations for which thresholds could be estimated) is approximately linear with nonzero slope (slope, 0.34; CI, 0.32–0.35).
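The model-free comparison can be sketched as follows, with illustrative synthetic sensitivities (not the paper's data) whose power-law exponents mimic the reported integration-stage slopes:

```python
import numpy as np

def log_sensitivity_difference_slope(durations, sens_2d, sens_3d):
    """Model-free check for differential integration.

    Regresses the log-difference of 2D and 3D sensitivity on log duration.
    A nonzero slope means the two sensitivity curves diverge with viewing
    time, without assuming discrete integration phases.
    """
    x = np.log10(np.asarray(durations, dtype=float))
    y = np.log10(np.asarray(sens_2d)) - np.log10(np.asarray(sens_3d))
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# illustrative power laws: 2D grows with exponent 0.47, 3D with 0.31,
# so their log-difference grows linearly with slope 0.47 - 0.31 = 0.16
t = np.logspace(np.log10(0.15), 0.0, 10)   # 150 ms to 1 s
s2d = 20.0 * t ** 0.47
s3d = 6.0 * t ** 0.31
slope, _ = log_sensitivity_difference_slope(t, s2d, s3d)  # -> 0.16
```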
Discussion
Our characterization of 3D direction discrimination revealed several important differences from the well-established near-perfect integration of 2D (frontoparallel) motion. First, 3D sensitivity was lower than 2D sensitivity across all viewing durations, consistent with a lower sensory signal-to-noise ratio. Second, decision formation for 3D motion deviated significantly from lossless accumulation of noisy evidence. A similar distinction has been noted for complex motions that were not stereoscopic, but were consistent with motion through depth (Burr and Santoro, 2001).
Although our results do not definitively reveal why the discrimination of motion through depth does not rely on the near-perfect temporal integration that is so often found for frontoparallel motion, the differential ecological importance of toward/away versus left/right motions may underlie this stark difference. Motion directly approaching the head captures attention automatically (Lin et al., 2009), warrants immediate action, and may thus be more amenable to processing on very brief time scales. Put another way, humans might not integrate perfectly beyond 200 ms simply because there is rarely pressure to do so; any required action will have already been initiated. On the other hand, although less perfect integration for 3D motion was evident in all subjects, it certainly remains possible that 3D integration might approach 2D integration with more extensive training or instruction.
Relatedly, our 3D motion stimuli contained both disparity-based and velocity-based cues to motion through depth, which have been shown to operate in different regimes with respect to speed and eccentricity (Czuba et al., 2010). It is possible that they also exhibit a different reliance on temporal integration. Future work will be required to understand the differential contribution of these cues to the integration of motion through depth, the impact of monocular cues (e.g., changes in size and looming), and the neural mechanisms subserving this process. Our work here raises the intriguing possibility that 3D motion perception may differentially depend on different sources of information as a function of time and the integration demands of the task at hand. It now seems imperative to understand why signals that are likely to have more direct behavioral and ecological importance are decided upon by a distinct integration scheme that is mathematically less ideal.
Footnotes
This work was supported by National Eye Institute Grant R01-EY-020592 to A.C.H., L.K.C., and Adam Kohn (Associate Professor, Albert Einstein College of Medicine, NY). L.N.K. is supported by a Howard Hughes Medical Institute International Student Research Fellowship.
The authors declare no competing financial interests.
Correspondence should be addressed to Leor N. Katz, Center for Perceptual Systems, A8000, Austin, TX 78712. leor.katz@utexas.edu