Abstract
The common marmoset (Callithrix jacchus), a small-bodied New World primate, offers several advantages to complement vision research in larger primates. Studies in the anesthetized marmoset have detailed the anatomy and physiology of its visual system (Rosa et al., 2009), while studies of auditory and vocal processing have established its utility for awake and behaving neurophysiological investigations (Lu et al., 2001a,b; Eliades and Wang, 2008a,b; Osmanski and Wang, 2011; Remington et al., 2012). However, a critical unknown is whether marmosets can perform visual tasks under head restraint. Head restraint has been essential for studies in macaques, enabling both accurate eye tracking and head stabilization for neurophysiology. In one set of experiments, we compared the free viewing behavior of head-fixed marmosets to that of macaques and found that their saccadic behavior is comparable across a number of saccade metrics and that saccades target similar regions of interest, including faces. In a second set of experiments, we applied behavioral conditioning techniques to determine whether the marmoset could control fixation for liquid reward. Two marmosets could fixate a central point and ignore peripheral flashing stimuli, as needed for receptive field mapping. Both marmosets also performed an orientation discrimination task, exhibiting a saturating psychometric function with reliable performance and shorter reaction times for easier discriminations. These data suggest that the marmoset is a viable model for studies of active vision and its underlying neural mechanisms.
Introduction
For decades, the rhesus macaque has been the dominant model for studying the neural correlates of visual perception. The anatomy and physiology of its visual system are well established, as are behavioral tasks that link neuronal signaling to perceptual discrimination and decision making (Felleman and Van Essen, 1991; Parker and Newsome, 1998). A key gap in our understanding stems from the limited set of tools for manipulating the activity of neuronal populations, a critical step for establishing their causal role in behavior. Transgenic lines, such as Cre-driver lines, and virus-based optogenetic techniques have made this level of control possible in the mouse (Livet et al., 2007; Cardin, 2012). However, the mouse brain differs substantially from both human and nonhuman primate brains. Mice rely primarily on whisking to sense their surroundings, have limited color vision, and have low spatial resolution vision that lacks the strong foveal representation characteristic of primates. Although they do make rapid eye movements, these do not serve to bring regions of interest onto the fovea (Sakatani and Isa, 2007). The common marmoset (Callithrix jacchus), a small-bodied New World primate, provides a good model of human vision and can perform a variety of visual discrimination and cognitive tasks (Roberts et al., 1990; Maclean et al., 2001; Derrington et al., 2002; Barefoot et al., 2003; Clarke et al., 2004, 2011; Spinelli et al., 2004; Yamazaki et al., 2011; Nakako et al., 2013). Marmosets mature quickly and breed readily in captivity (Rylands, 1993), so they are amenable to the kinds of genetic manipulations used in mice and have been key to the development of the first primate transgenic lines (Sasaki et al., 2009; Okano et al., 2012). Thus the marmoset holds the potential to revolutionize the tools used for studying vision and higher cognition in the nonhuman primate.
A critical question, however, is whether the marmoset can perform visual tasks when the head is stabilized. Head restraint is required for standard neuronal recordings and, in visual studies, for controlling eye movements. The effect of head restraint on marmoset eye movements has not been systematically studied. Because New World primates, including marmosets, use head movements more than other primates (McCrea and Gdowski, 2003; Burkart and Heschl, 2006, 2007), it is unclear whether head restraint would disrupt their visual behavior or make them unresponsive. A still greater unknown is whether marmosets can learn to deliberately control their fixation and perform visual discrimination tasks under head fixation.
In the first set of experiments, we tracked eye movements and compared the free viewing behavior of head-restrained marmosets with that of head-restrained macaques. We then applied operant conditioning techniques that have been optimized for macaques to determine whether marmosets could control their fixation and make perceptual decisions. We find that their visual behavior is comparable to that of macaques and that they can learn to perform tasks using eye movements. The marmoset thus represents a viable alternative to the macaque as a model system for studying visual cognition and oculomotor behavior.
Materials and Methods
Many of the methods used for habituating marmosets to sitting in a primate chair, behavioral training, and single-unit neurophysiology were pioneered by studies of auditory processing in Xiaoqin Wang's laboratory at the Johns Hopkins University (Lu et al., 2001a,b; Wang et al., 2005; Eliades and Wang, 2008a,b; Osmanski and Wang, 2011; Remington et al., 2012; Roy and Wang, 2012; Osmanski et al., 2013). Details of the design of the primate chair, behavioral conditioning under head restraint, and the stability of neuronal recordings in head-fixed marmosets are available elsewhere (Lu et al., 2001a,b; Remington et al., 2012). In the current study, we introduce methods for accurate eye tracking and for training in visual tasks requiring accurate fixation control. These methods will be essential for studies of visual processing in the awake marmoset.
Eye position calibration and stimulus presentation.
To obtain accurate eye tracking, head posts were surgically implanted in two marmosets. They served to stabilize the head during experimental sessions. Eye movements were also collected from two macaques that had been implanted for earlier studies. Surgical procedures have been described previously in macaques (Reynolds et al., 1999) and in marmosets (Lu et al., 2001b). All procedures with marmosets were performed in the laboratory of Cory Miller under the approval of the Institutional Animal Care and Use Committee at the University of California, San Diego. All procedures with macaques were performed in the laboratory of John Reynolds under the approval of the Institutional Animal Care and Use Committee at the Salk Institute for Biological Studies. All procedures conformed to National Institutes of Health guidelines. All marmoset and macaque subjects were male.
Eye position was continuously monitored with an infrared eye-tracking system (120 Hz, ETL-200 ISCAN) for two marmoset and two macaque subjects. This system identifies the darkest regions of the infrared image, which correspond to the pupil (as seen for a marmoset in Fig. 1A). After a threshold is applied to select the dark pixels of the pupil (Fig. 1B, white), the center of mass of the pupil is computed to estimate the direction of gaze. Accurate eye tracking benefits from zooming and focusing so that the pupil fills the imaged area. An example of the pupil of a macaque is shown for comparison in Figure 1C. While the macaque pupil is well contained in the ocular orbit, the marmoset pupil can be so large that it becomes occluded at the edges. To constrict the pupil, it was important to provide a bright display for viewing. In addition, it was critical to position the marmoset's head to face the center of the monitor, so that the pupil was at the center of the orbit when fixating centrally. This maximizes the range of eye positions that the eye tracker can measure.
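The dark-pupil principle described above can be illustrated in a few lines. This is a minimal sketch, not the ISCAN algorithm: `pupil_center` and `synthetic_frame` are hypothetical helpers, the threshold value is arbitrary, and the synthetic frame stands in for a real infrared image. The clipped example shows why an occluded marmoset pupil biases the center-of-mass estimate.

```python
import numpy as np

def pupil_center(frame, threshold):
    """Centroid (row, col) of pixels darker than `threshold` (hypothetical helper)."""
    mask = frame < threshold            # dark pixels are taken to be the pupil
    if not mask.any():
        return None                     # no pupil pixels, e.g. during a blink
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()     # unweighted center of mass of the mask

def synthetic_frame(center_rc, radius, shape=(100, 100)):
    """Bright background with a dark disk standing in for the imaged pupil."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    frame = np.full(shape, 200.0)
    frame[(yy - center_rc[0]) ** 2 + (xx - center_rc[1]) ** 2 <= radius ** 2] = 20.0
    return frame

# A pupil well inside the image: the centroid recovers the true center.
center = pupil_center(synthetic_frame((40, 60), 15), threshold=50)
# A pupil clipped by the image edge (as when a large pupil is occluded):
# the centroid is pulled away from the true center at column 95.
clipped = pupil_center(synthetic_frame((40, 95), 15), threshold=50)
```

Because the estimate is an unweighted centroid, any part of the pupil hidden by the orbit or eyelid shifts it toward the visible side, which is one reason to keep the pupil centered in the orbit.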
Imaging of the marmoset pupil and calibration of eye position. A, B, Image of the marmoset pupil before (A) and after (B) thresholding dark pixels. C, Images of a macaque pupil after thresholding. D–F, A subset of images used for eye position calibration with fixations (shown in red) and eye traces (in yellow) superimposed. Calibration images consisted of discrete patches of marmoset faces against a uniform gray background.
Static color images were presented on the computer monitor to calibrate eye position and assess free viewing behavior. The monitor was adjusted for a high background luminance to constrict the pupil (Sony SDM X95F, 1024 × 768 pixels, 60 Hz; a value of 150 on all guns gave 90 cd/m², and the black background gave 4.6 cd/m²). To linearize the monitor, the luminance as a function of each gun value was measured in steps of 15 LUT values with a photometer (PR-701; Photo Research). The data were well fit by quadratic functions, \(L_R(x) = 0.1061x + 0.0010x^{2}\), \(L_G(x) = 0.3200x + 0.0005x^{2}\), and \(L_B(x) = 0.0024x + 0.0002x^{2}\), which were used to linearize the luminance of the monitor. Each marmoset subject sat upright viewing the monitor from a distance of 44 cm in a dark enclosure covered by black drapes. Most visual experiments use a viewing distance of 57 cm for convenience, because 1 cm on the monitor then corresponds to one visual degree. However, our initial design of the marmoset behavior rig was more compact, giving a viewing distance of 44 cm. At this distance, 0.77 cm on the monitor corresponded to one visual degree and contained 21.8 pixels on the horizontal and 21.2 pixels on the vertical. Due to constraints of the macaque chair, the macaques viewed the monitor at 47 cm, at which 0.82 cm on the monitor corresponded to one visual degree. The images were rescaled to give an identical size in visual degrees. The stimuli were RGB images of natural scenes obtained from Google Images. They were framed against a gray background in a 960 × 720 pixel area, which spanned ±22 degrees on the horizontal axis and ±17 degrees on the vertical axis. The total stimulus set included five images used for calibration of pupil coordinates and 25 natural images including marmosets, macaques, and humans to assess free viewing behavior.
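The linearization and viewing-distance arithmetic can be checked with a short sketch. It assumes the quadratic fits quoted above map a LUT value x to luminance in cd/m² with no offset term, and inverts them with the quadratic formula; the helper names are hypothetical. It also computes the on-screen size of one visual degree at 44 and 47 cm, recovering the 0.77 and 0.82 cm values and the rescaling factor (in screen centimeters) between the two rigs.

```python
import math

# Quadratic gun fits from the text: L(x) = a*x + b*x**2, luminance in cd/m^2.
GUN_FITS = {"R": (0.1061, 0.0010), "G": (0.3200, 0.0005), "B": (0.0024, 0.0002)}

def gun_luminance(gun, x):
    """Predicted luminance of one gun at LUT value x."""
    a, b = GUN_FITS[gun]
    return a * x + b * x * x

def gun_value_for_luminance(gun, lum):
    """Invert the quadratic: the non-negative root of b*x**2 + a*x - lum = 0."""
    a, b = GUN_FITS[gun]
    return (-a + math.sqrt(a * a + 4.0 * b * lum)) / (2.0 * b)

def cm_per_degree(distance_cm):
    """On-screen extent of 1 degree of visual angle centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(0.5))

marmoset_cm = cm_per_degree(44.0)          # ~0.77 cm per degree
macaque_cm = cm_per_degree(47.0)           # ~0.82 cm per degree
image_scale = macaque_cm / marmoset_cm     # images ~1.068x larger (cm) on the macaque rig
red_x = gun_value_for_luminance("R", gun_luminance("R", 150.0))  # round trip
```

A linearized lookup table could then be built by evaluating `gun_value_for_luminance` at evenly spaced target luminances for each gun.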
The eye tracker was calibrated using a subset of images that attracted the marmosets' gaze to discrete locations. This set of calibration images consisted of small, framed marmoset faces against a gray background. Examples of three calibration images are shown in Figure 1, D–F, with eye traces and fixations (shown as red points) superimposed. Calibration images were used off-line to align the pupil position in the eye-tracking system to the coordinates of the viewed images. Each image was displayed for 20 s and pupil position was recorded. The coordinates were then centered, rotated, and scaled on each axis to overlay the eye position estimates on the points of interest in the calibration image. The same scaling parameters were applied across the full set of calibration images to obtain the best overall fit. Outliers in eye position beyond the image area, typically caused by blinks, were removed. Once the coordinates were aligned, fixations were registered as points where the eye dwelled within a 0.5 visual degree window, comparing the eye position in the first and second halves of a 150 ms window. As seen for the examples in Figure 1, D–F, the clustering of eye position indicated good registration with the discrete image patches. There is modest distortion (<5%) at the edges of the viewable area, where the linearity of the calibration begins to break down. If the pupil is not well contained in the orbit, this linear approximation breaks down earlier because the pupil is occluded by the edges of the orbit and the eyelid. Thus, positioning the pupil so that it is centered in the orbit when viewing the central fixation point is important for maximizing the range over which the calibration is accurate.
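The off-line alignment and the fixation criterion can be sketched as follows. Two assumptions are ours: a general least-squares affine map (which also permits shear) stands in for the center, rotate, and scale step, and the dwell rule is read as "the mean positions of the two 75 ms halves of a 150 ms window differ by less than 0.5 degrees". All function names are hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_affine(pupil_xy, screen_xy):
    """Least-squares 2-D affine map from raw pupil coordinates to screen degrees.

    Assumption: a general affine transform (translation, rotation, per-axis
    scale, and shear) stands in for the center/rotate/scale step in the text.
    """
    X = np.column_stack([pupil_xy, np.ones(len(pupil_xy))])  # N x 3 design matrix
    coef, *_ = np.linalg.lstsq(X, screen_xy, rcond=None)     # 3 x 2 coefficients
    return coef

def apply_affine(coef, pupil_xy):
    return np.column_stack([pupil_xy, np.ones(len(pupil_xy))]) @ coef

def fixation_mask(xy_deg, fs_hz=1000.0, window_ms=150.0, tol_deg=0.5):
    """Mark the center sample of each 150 ms window whose first- and second-half
    mean positions differ by less than tol_deg (our reading of the dwell rule)."""
    n = int(round(window_ms * fs_hz / 1000.0))
    half = n // 2
    mask = np.zeros(len(xy_deg), dtype=bool)
    for start in range(len(xy_deg) - n + 1):
        first = xy_deg[start:start + half].mean(axis=0)
        second = xy_deg[start + half:start + n].mean(axis=0)
        mask[start + half] = np.linalg.norm(second - first) < tol_deg
    return mask

# Calibration check: raw pupil positions generated by a known affine map.
rng = np.random.default_rng(0)
pupil = rng.uniform(0, 640, size=(20, 2))
true_M, true_t = np.array([[0.05, 0.01], [-0.01, 0.06]]), np.array([-3.0, 2.0])
screen = pupil @ true_M + true_t
coef = fit_affine(pupil, screen)

# Fixation check: 300 ms at (0, 0), then a jump to (5, 0) for 300 ms, at 1 kHz.
trace = np.vstack([np.zeros((300, 2)), np.tile([5.0, 0.0], (300, 1))])
fix = fixation_mask(trace)
```

Samples near the jump are not marked as fixation because the two half-window means straddle the change in position.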
Detection of eye movements.
Raw eye-position traces were filtered to reduce imaging noise before saccadic eye movements were identified. Eye traces were first passed through a median filter that reduces higher frequency jitter (median filter width ±3 samples, giving ±25 ms at 120 Hz). Traces were then spline interpolated, resampled at 1000 Hz, and smoothed with a second-order noncausal Butterworth filter (−3 dB at 50 Hz). Examples of the raw and smoothed traces for a marmoset subject are shown for horizontal and vertical eye position in Figure 2A and at higher magnification in Figure 2E (black dots are raw data, solid lines are smoothed). Using intervals of steady fixation, we assessed the precision of the eye-tracking system from the variability in eye position. The root mean squared error was 0.094 and 0.088 degrees for the two marmosets and 0.234 and 0.208 degrees for the two macaques. After filtering, it was reduced to 0.054 and 0.041 degrees for the marmosets and to 0.099 and 0.086 degrees for the macaques.
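The filtering chain above can be sketched with NumPy and SciPy. This is an illustration under assumptions, not the original code: edge padding for the median filter and a cubic spline for interpolation are our choices, and because `filtfilt` runs the filter forward and backward, the combined response at the 50 Hz design cutoff is about −6 dB rather than −3 dB. `smooth_eye_trace` and `rms_about_mean` are hypothetical names.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import butter, filtfilt

def smooth_eye_trace(t_s, pos_deg, fs_out=1000.0, cutoff_hz=50.0, half_width=3):
    # 1) Median filter over +/-3 samples (7-sample kernel); edge padding keeps
    #    the ends of the trace from being pulled toward zero.
    padded = np.pad(pos_deg, half_width, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half_width + 1)
    med = np.median(windows, axis=1)
    # 2) Cubic-spline interpolation and resampling at 1000 Hz.
    t_new = np.arange(t_s[0], t_s[-1], 1.0 / fs_out)
    resampled = CubicSpline(t_s, med)(t_new)
    # 3) Second-order Butterworth low-pass applied forward and backward
    #    (zero phase, i.e. noncausal).
    b, a = butter(2, cutoff_hz, btype="low", fs=fs_out)
    return t_new, filtfilt(b, a, resampled)

def rms_about_mean(x):
    """Root mean squared deviation of a fixation segment from its mean."""
    return float(np.sqrt(np.mean((x - np.mean(x)) ** 2)))

rng = np.random.default_rng(0)
t = np.arange(240) / 120.0                        # 2 s sampled at 120 Hz
raw = 1.5 + 0.1 * rng.standard_normal(t.size)     # steady fixation plus tracker noise
t_new, smooth = smooth_eye_trace(t, raw)
_, flat = smooth_eye_trace(t, np.full(t.size, 1.5))
```

Comparing `rms_about_mean` before and after smoothing on steady-fixation segments mirrors the raw versus filtered precision figures reported above.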
Detection of eye movements. A, Raw traces for horizontal (in red) and vertical eye position (in blue) during free viewing over 2 s. Vertical green lines indicate the start and end of each identified saccade. B, C, The velocity and acceleration over time with dashed lines indicating thresholds for saccade detection. D, The percentage improvement fitting a logistic function at each time compared with a spline of equal parameters (dashed line indicates threshold). E, Close-up of the raw eye position (black dots), the smoothed traces (solid lines), and the fit logistic (dashed lines).
Saccadic eye movements were detected from the filtered traces using velocity and acceleration criteria similar to those of previous studies (Krauzlis and Miles, 1996). The first two criteria implement thresholds for saccade velocity (10 degrees/s) and acceleration (1000 degrees/s²). Velocity and acceleration are shown on a log scale for the example traces in Figure 2, B and C. Velocity was computed from the smoothed eye traces and was then itself smoothed (second-order noncausal Butterworth filter, −3 dB at 50 Hz) before acceleration was computed. The velocity profile was searched for local peaks that crossed the velocity threshold. If the acceleration threshold was crossed within 75 ms before or after the velocity peak, the event was marked as a candidate saccade.
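A sketch of the velocity and acceleration stage, assuming two-dimensional eye speed (the text does not say whether horizontal and vertical velocities were combined) and using `scipy.signal.find_peaks` for the local-peak search. `candidate_saccades` is a hypothetical name; its output is only a list of candidates for the logistic test described in the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def candidate_saccades(x_deg, y_deg, fs=1000.0, vel_thresh=10.0,
                       acc_thresh=1000.0, acc_window_ms=75.0):
    dt = 1.0 / fs
    # Eye speed (deg/s) from the already smoothed position traces.
    speed = np.hypot(np.gradient(x_deg, dt), np.gradient(y_deg, dt))
    # Smooth the speed before differentiating (2nd-order Butterworth, 50 Hz, zero phase).
    b, a = butter(2, 50.0, btype="low", fs=fs)
    accel = np.abs(np.gradient(filtfilt(b, a, speed), dt))   # deg/s^2
    # Local speed peaks above the velocity threshold.
    peaks, _ = find_peaks(speed, height=vel_thresh)
    half = int(round(acc_window_ms * fs / 1000.0))
    # Keep peaks with an acceleration crossing within +/-75 ms.
    keep = [p for p in peaks
            if accel[max(0, p - half):p + half + 1].max() > acc_thresh]
    return np.array(keep, dtype=int)

t = np.arange(1000) / 1000.0                       # 1 s at 1 kHz
x = 5.0 / (1.0 + np.exp(-(t - 0.5) / 0.005))       # one 5 degree saccade at 0.5 s
y = np.zeros_like(t)
cands = candidate_saccades(x, y)
```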
Applying velocity and acceleration criteria alone can result in large numbers of false alarms due to imaging noise in infrared eye trackers. Thus we applied a third test that is robust to small-amplitude jitter in eye traces (Mitchell et al., 2007). We required that any marked saccade also be well fit by a logistic function that models a jump in eye position within a 150 ms window. To be considered a saccade, the logistic model had to explain the variance 50% better than a spline model having an equal number of parameters. The percentage improvement for this model is shown over time in Figure 2D. The logistic fit was also used to estimate the start and end times of the saccades, providing a more robust estimate than the crossing of an acceleration threshold. The start and end of each saccade are indicated by green vertical lines in Figure 2.
The fitted logistic function had five parameters. The first three fit the mean, linear, and quadratic trends of eye position over the 150 ms window (second-order spline). The remaining two fit the width and the amplitude of a logistic function centered in the time window. This function was compared against a spline with an equal number of parameters (fourth-order spline, five parameters including the mean). The logistic and spline models were fit independently for horizontal and vertical eye position, and the error was evaluated over both dimensions. An example logistic fit for a saccade is shown in Figure 2E (eye traces indicated by points, logistic fit by dashed lines). The start and end times of each saccade were taken as the times −3 and +3 width units from the center of the fitted logistic function (Fig. 2E, green vertical lines). The saccade amplitude was taken as the change in eye position between the identified start and end of the saccade.
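The logistic-versus-spline test can be sketched as below. Because the mean, linear, quadratic, and amplitude terms enter the logistic model linearly, a grid search over the width plus ordinary least squares is enough to fit it. This fitting strategy, the 0.5 ms width grid, the definition of improvement as (SSE_poly − SSE_logistic)/SSE_poly, and the use of the larger-amplitude axis to set the start and end times are all our assumptions; a degree-4 polynomial plays the role of the five-parameter spline, and `classify_window` is a hypothetical name.

```python
import numpy as np

def _fit_logistic(u, pos, widths):
    """Grid search over width; offset, linear, quadratic, amplitude by least squares."""
    best = None
    for w in widths:
        step = 1.0 / (1.0 + np.exp(-u / w))            # logistic centered in the window
        X = np.column_stack([np.ones_like(u), u, u ** 2, step])
        coef, *_ = np.linalg.lstsq(X, pos, rcond=None)
        sse = float(np.sum((pos - X @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, w, coef[3])
    return best                                         # (sse, width, amplitude)

def _fit_poly_sse(u, pos, degree=4):
    coef = np.polyfit(u, pos, degree)                   # 5 parameters incl. the mean
    return float(np.sum((pos - np.polyval(coef, u)) ** 2))

def classify_window(t_s, x, y, threshold=0.5, widths_ms=np.arange(1.0, 20.5, 0.5)):
    center = 0.5 * (t_s[0] + t_s[-1])
    half = 0.5 * (t_s[-1] - t_s[0])
    u = (t_s - center) / half                           # normalized time in [-1, 1]
    widths = widths_ms / 1000.0 / half
    fx, fy = _fit_logistic(u, x, widths), _fit_logistic(u, y, widths)
    sse_log = fx[0] + fy[0]                             # error over both dimensions
    sse_poly = _fit_poly_sse(u, x) + _fit_poly_sse(u, y)
    improvement = (sse_poly - sse_log) / sse_poly
    # Start/end at +/-3 widths, using the axis with the larger fitted jump.
    w = fx[1] if abs(fx[2]) >= abs(fy[2]) else fy[1]
    start, end = center - 3 * w * half, center + 3 * w * half
    amp = np.hypot(np.interp(end, t_s, x) - np.interp(start, t_s, x),
                   np.interp(end, t_s, y) - np.interp(start, t_s, y))
    return improvement > threshold, improvement, start, end, amp

rng = np.random.default_rng(1)
t = np.arange(-75, 76) / 1000.0                   # 150 ms window at 1 kHz, centered on 0
jump = 1.0 / (1.0 + np.exp(-t / 0.004))           # logistic step, 4 ms width
x = 0.2 + 4.0 * jump + 0.05 * rng.standard_normal(t.size)
y = -0.1 + 2.0 * jump + 0.05 * rng.standard_normal(t.size)
is_sacc, gain, start, end, amp = classify_window(t, x, y)
noise = classify_window(t, 0.05 * rng.standard_normal(t.size),
                        0.05 * rng.standard_normal(t.size))
```

On pure noise the logistic gains little over the polynomial, so small jitter is rejected, while a genuine step in eye position is fit far better by the logistic.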
These methods perform well at automatically identifying saccades of amplitudes larger than half a degree of visual arc, and can detect smaller saccades, though less reliably. Eye traces were processed automatically and each saccade verified by manual inspection, eliminating any clear false alarms resulting from blinks or other noise events.
The relationship between saccade amplitude and peak velocity (i.e., the main sequence) was sampled across the total set of identified saccades in all natural image scenes. It was fit by least-squares regression on the peak velocity using the function \(y = (ax)^{n}\), where y is the peak velocity, x is the saccade amplitude, a is the slope parameter, and n is an exponent capturing the saturation that is observed for larger amplitude saccades.
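One way to perform the main-sequence fit is sketched below, assuming SciPy's `curve_fit` for the nonlinear least-squares step on peak velocity and a log-log line fit only to seed the initial guess. The noise-free example data are synthetic, not measurements, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def main_sequence(amplitude, a, n):
    """Peak velocity model y = (a * x) ** n from the text."""
    return (a * amplitude) ** n

def fit_main_sequence(amplitudes_deg, peak_vel_deg_s):
    # Initial guess from a straight line in log-log space:
    # log y = n*log a + n*log x.
    slope, intercept = np.polyfit(np.log(amplitudes_deg), np.log(peak_vel_deg_s), 1)
    p0 = (np.exp(intercept / slope), slope)
    # Least squares on peak velocity itself (not on its logarithm).
    params, _ = curve_fit(main_sequence, amplitudes_deg, peak_vel_deg_s, p0=p0)
    return params  # (a, n)

amps = np.linspace(0.5, 15.0, 50)                 # saccade amplitudes (deg)
vels = main_sequence(amps, 40.0, 0.8)             # synthetic peak velocities (deg/s)
a_hat, n_hat = fit_main_sequence(amps, vels)
```

An exponent n below 1 captures the saturation of peak velocity at larger amplitudes.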
Normalizing face regions against regions of matched eccentricity and contrast energy.
To quantify and compare the preference for targeting faces in free viewing between macaques and marmosets, we computed the number of saccade end points that fell inside labeled face regions across a subset of 15 natural images that contained human, macaque, or marmoset faces. We also computed the probability of making another saccade inside the same face region once fixation was in that region (i.e., a refixation), as well as the duration of each fixation. The probability of targeting a face in an image, however, could be influenced simply by how well the face is centered in the image and by its focus or contrast, which reflect the preferences of the human photographer. To control for this possibility, we compared the labeled face regions against regions of equal size that were matched for eccentricity from the image center and for total contrast energy (which reflects both luminance contrast and focus). Each face was manually labeled by a circular area that enclosed it (see Results; Fig. 4A, green circle). To compute the contrast energy in that area, image color was ignored and the luminance of each pixel was computed from the monitor lookup table. Each grayscale face was windowed with a Gaussian (σ = 2/3 the radius of the labeled region), the mean luminance was subtracted, and the fast Fourier transform (FFT) was computed. The absolute value of the FFT was averaged as a function of radius (i.e., as a function of spatial frequency). The net contrast energy was taken as the energy summed across spatial frequencies. To identify image patches of matched size and contrast, we searched in 12 degree steps of polar angle around a circle of matched eccentricity, sampling radial locations from −25 to +25% of the original eccentricity in steps of 12.5%. The four best regions, as assessed by their match in contrast energy, were recorded for comparison (see Results; Fig. 4B, labeled in blue).
Labeled face regions were not included in the analysis if matching regions with net contrast energy within 25% of the original could not be identified. In total, 23 instances of faces with regions of matched contrast were included from the set of 15 images containing faces. One face (1 of 24) was excluded from analysis because no region of matched contrast energy could be identified at the same matched eccentricity.
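The contrast-energy matching can be sketched with NumPy's FFT. Assumptions: square grayscale patches centered on each region, one-pixel-wide frequency rings for the radial average, subtraction of the patch mean before windowing (so a uniform patch yields zero energy, whereas the text lists windowing first), and a rule that all four best matches must lie within 25% of the face's energy. All names are hypothetical, and candidates overlapping the face itself would still need to be discarded.

```python
import numpy as np

def contrast_energy(gray_patch, radius_px):
    """Net contrast energy of a square luminance patch centered on a region."""
    n = gray_patch.shape[0]
    yy, xx = np.mgrid[0:n, 0:n] - (n - 1) / 2.0
    sigma = 2.0 / 3.0 * radius_px                     # window sigma from the text
    window = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    # Mean removed before windowing (assumption; see lead-in).
    spectrum = np.fft.fftshift(np.fft.fft2((gray_patch - gray_patch.mean()) * window))
    amp = np.abs(spectrum)
    # Radial average of |FFT| in one-pixel-wide rings, i.e. per spatial frequency.
    fy, fx = np.mgrid[0:n, 0:n] - n // 2
    ring = np.hypot(fx, fy).astype(int).ravel()
    radial = np.bincount(ring, weights=amp.ravel()) / np.bincount(ring)
    return float(radial.sum())                        # summed across frequencies

def candidate_centers(face_xy, image_center_xy):
    """12 degree polar-angle steps x radial scales of 75-125% (12.5% steps)."""
    dx, dy = np.subtract(face_xy, image_center_xy)
    ecc = np.hypot(dx, dy)
    centers = []
    for scale in np.arange(0.75, 1.25 + 1e-9, 0.125):
        for ang in np.deg2rad(np.arange(0, 360, 12)):
            centers.append((image_center_xy[0] + scale * ecc * np.cos(ang),
                            image_center_xy[1] + scale * ecc * np.sin(ang)))
    return np.array(centers)

def best_matches(face_energy, candidate_energies, k=4, tol=0.25):
    """Indices of the k closest energies, or None if any differs by more than tol."""
    rel = np.abs(np.asarray(candidate_energies, float) - face_energy) / face_energy
    idx = np.argsort(rel)[:k]
    return idx if np.all(rel[idx] <= tol) else None

n = 32
xx = np.mgrid[0:n, 0:n][1]
grating = np.sin(2 * np.pi * xx / 8.0)
e_flat = contrast_energy(np.full((n, n), 0.5), radius_px=12)
e_low = contrast_energy(0.5 + 0.1 * grating, radius_px=12)
e_high = contrast_energy(0.5 + 0.2 * grating, radius_px=12)
centers = candidate_centers((10.0, 0.0), (0.0, 0.0))
matches = best_matches(1.0, [1.1, 0.9, 2.0, 1.2, 0.8, 5.0])
```

Because the FFT is linear, doubling the contrast of a patch doubles this energy measure, which makes the 25% matching tolerance a tolerance on relative contrast.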
Behavioral training in fixation with peripheral flashing.
Marmosets were trained in daily sessions (4–5 d each week) between 9 A.M. and 3 P.M. A small metal tube (25 gauge stainless steel) delivered liquid rewards through a computer-controlled solenoid (Christ Instruments). The reward tube was carefully positioned within 1 mm of the upper lip so that liquid dripped into the mouth without the tube pressing against the top teeth. No food or water control was imposed during most training. However, in one subject we implemented food control for a single week to evaluate whether it improved performance. The liquid reward consisted of one part marshmallow blended with three parts warm water. In one subject, strawberry Nesquik was added after initial training with marshmallow liquid alone failed. Each drop of liquid reward delivered through the metal tube was ∼0.01–0.02 ml. If comfortably positioned in the primate chair, marmosets would consume between 5 and 15 ml in a daily session while working several hundred (300–800) trials.
Each daily session began with calibration of eye position using a simple preferential looking task. A single circular fixation point (0.1 visual degrees radius) was flashed intermittently (100 ms on and 200 ms off) on a blank gray screen (Fig. 7A). To ensure that attention was drawn to the central point, on each flash the fixation point was replaced, with a probability of one-third, by the image of a face (chosen at random from 60 different images of marmosets, macaques, or humans; Fig. 7B). Because marmosets naturally look at faces, especially novel ones, this provided an excellent stimulus to draw their gaze for initial calibration of the eye-tracking system at the central location. The experimenter delivered liquid reward manually by pressing a button while the initial calibration at center was established.
Once the initial center calibration was complete, reward was delivered when the estimated eye position fell inside a defined fixation window. Computer control was handled by the National Institute of Mental Health (NIMH) Cortex software. The initial reward required fixation only within a large window 2 degrees in radius, chosen to tolerate modest measurement errors during refinement of the calibration. Each time the marmoset acquired the flashing central point or face image within that larger window, the flashing stimulus was replaced with a constant point (white circle, 0.1 degree radius) for 250 ms (with no peripheral flashing stimuli; Fig. 7C,D). If eye position was held within the fixation window over that brief period, the originally flashed face reappeared for viewing as a visual reward, and three drops of juice were simultaneously delivered over a 500 ms interval (Fig. 7E). The experimenter could manually vary the position of the fixation point from trial to trial in 5 degree horizontal or vertical steps, allowing further refinement of the horizontal and vertical gains of the eye-tracking system. Calibration of the eye-tracking system typically required <30 trials and resulted in <1 ml of juice delivered to the subject. Once calibration was complete, the fixation window was reduced in size to ensure accurate fixation (0.5–1.0 degree radius).
Marmosets were first trained in a simple fixation task that included longer hold periods with no distracting peripheral stimuli. In each trial, fixation was initially acquired as described above with a required minimum hold period of 250 ms, and the hold period was then extended from 250 to 2500 ms over training. Behavioral shaping began with the hold period drawn at random from a uniform distribution between the minimum and maximum hold periods (initially 250–400 ms). Each time the marmoset completed two consecutive trials successfully, the minimum and maximum hold durations were each incremented by 150 ms (up to 2350–2500 ms), whereas each failure decremented them by 150 ms. If eye position remained within the fixation window for the entire hold period, the fixation point turned black, 2–6 drops of liquid reward were delivered as an image of a face was shown, and a bell sound was played. Larger numbers of drops were given for longer hold periods. If the marmoset broke fixation before the end of the hold period, the trial was terminated without the final reward.
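The hold-period shaping rule behaves like a two-up/one-down staircase on a 150 ms wide range. A minimal sketch, assuming the range never drops below the initial 250–400 ms or rises above 2350–2500 ms; `HoldStaircase` is a hypothetical class, and the scaling of reward with hold duration is omitted.

```python
import random

class HoldStaircase:
    """Shift a [lo, hi] hold-period range (ms) up after two successes, down after a failure."""

    def __init__(self, lo=250, hi=400, step=150, max_hi=2500):
        self.min_lo = lo                 # never go below the starting range
        self.lo, self.hi = lo, hi
        self.step, self.max_hi = step, max_hi
        self.streak = 0                  # consecutive successes since the last change

    def draw(self, rng=random):
        """Hold duration for the next trial, uniform over the current range."""
        return rng.uniform(self.lo, self.hi)

    def update(self, success):
        if success:
            self.streak += 1
            if self.streak == 2:         # two in a row: lengthen the hold range
                self.streak = 0
                if self.hi + self.step <= self.max_hi:
                    self.lo += self.step
                    self.hi += self.step
        else:                            # any failure: shorten the hold range
            self.streak = 0
            if self.lo - self.step >= self.min_lo:
                self.lo -= self.step
                self.hi -= self.step

stair = HoldStaircase()
first_hold = stair.draw()
for _ in range(40):
    stair.update(True)
after_successes = (stair.lo, stair.hi)
stair.update(False)
after_one_failure = (stair.lo, stair.hi)
```

Because two successes are needed per step up but one failure steps down, the range tends to settle near hold durations the animal completes on most trials.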
Once each marmoset held fixation for a period >800 ms, we began introducing flashed peripheral stimuli that had to be ignored. The peripheral stimuli began flashing 300 ms into the hold period and continued to its end (Fig. 7C,D). The stimuli were Gabor images (2 degree diameter, random orientation, 2 cycles/degree, Gaussian windowed with σ = 0.5 degrees) flashed at 60 Hz, each with a duration of 16.7 ms, at randomized locations in a rectangular region (Fig. 7D, narrow field) centered at 7 degrees eccentricity in the lower right quadrant and spanning ±3 degrees around that center with uniform sampling. Stimuli were first introduced at low contrast (5–20% Michelson contrast) and increased to higher contrasts (40 and 80% Michelson contrast) once the marmoset held fixation for at least 800 ms at the given contrast. Once the marmoset was able to maintain fixation with high-contrast stimuli, the field of flashing stimuli was centered at fixation and extended to span ±12 degrees on the vertical and horizontal axes, excluding the 2 degrees around fixation (Fig. 7E, wide field). We measured how long marmosets were able to maintain fixation (median duration and the upper 50–90% range of the distribution of held durations) over the course of training. Confidence intervals for the median fixation duration in each session were computed by bootstrapping.
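The session-level statistics and the wide-field stimulus placement can be sketched as follows. We assume a percentile bootstrap with 2000 resamples (the text does not specify the bootstrap variant) and read "excluding the 2 degrees around fixation" as rejecting positions within a 2 degree radius; both function names are hypothetical, and the durations are made up for illustration.

```python
import numpy as np

def bootstrap_median_ci(durations_ms, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median hold duration."""
    rng = np.random.default_rng(seed)
    x = np.asarray(durations_ms, dtype=float)
    resamples = rng.choice(x, size=(n_boot, x.size), replace=True)
    boot_medians = np.median(resamples, axis=1)
    lo, hi = np.percentile(boot_medians, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(np.median(x)), float(lo), float(hi)

def wide_field_location(rng, half_extent=12.0, exclude_radius=2.0):
    """Uniform position within +/-12 deg that is more than 2 deg from fixation."""
    while True:
        xy = rng.uniform(-half_extent, half_extent, size=2)
        if np.hypot(xy[0], xy[1]) > exclude_radius:   # rejection sampling
            return xy

median_ms, ci_lo, ci_hi = bootstrap_median_ci(np.arange(100, 2100, 100))
rng = np.random.default_rng(1)
flashes = np.array([wide_field_location(rng) for _ in range(500)])
```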
Behavioral training in an orientation discrimination task.
To test whether marmosets can be trained to perform a demanding perceptual task, we measured their ability to find a target among distracters. Each trial began as described above for fixation trials, but after the marmoset held fixation for a variable period (200–400 ms), six equally spaced gratings (2 degrees in diameter, Gabor σ = 0.5 degrees, spatial frequency of 1, 1.4, or 2 cycles/degree, random spatial phase) were presented at 6 degrees eccentricity (see Results; Fig. 8A–D). Five of the six gratings were horizontally oriented, while one was tilted either clockwise or counterclockwise (0, 2, 4, 8, 12, 16, 32, or 45 degrees; a 0 degree tilt corresponds to a target-absent trial). After 250 ms, the fixation point turned black and disappeared, cueing the marmoset to make a saccade to the location of the tilted grating. The marmoset was rewarded if it held fixation through the initial 250 ms stimulus display, and additional reward was delivered upon a saccade to the correct target. Trials terminated if the first saccade went to a nontarget. Reward increased with task difficulty from two drops for easy discriminations (45 degrees) to six drops for the hardest discriminations (2 degrees). Reward was given at random (1/6 probability) on target-absent trials (0 degree tilt). The spatial frequency, tilt, and target location were chosen at random on each trial. In initial training, only easy discriminations (45 or 90 degree tilts) were included. To reduce naturally occurring spatial biases, it was also important to ensure that the same target location was never used on consecutive trials. When performance on easy discriminations exceeded 60% correct, harder discriminations were added until the full set was sampled. To ensure that the marmoset was not discouraged by task difficulty, easy discriminations (32 and 45 degrees) were sampled twice as often as more difficult ones.
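A sketch of the per-trial randomization for the discrimination task. The tilt list, spatial frequencies, six locations at 6 degrees eccentricity, double weighting of 32 and 45 degree tilts, and the no-repeat location rule come from the text; the sampling weight for target-absent (0 degree) trials, the angular positions of the six gratings, and the function names are our assumptions. Reward sizes are not modeled.

```python
import numpy as np

TILTS = [0, 2, 4, 8, 12, 16, 32, 45]        # degrees; 0 = target absent
# Easy tilts (32, 45) get double weight; the weight of 1 for 0 deg is an assumption.
WEIGHTS = np.array([2.0 if t in (32, 45) else 1.0 for t in TILTS])
SPATIAL_FREQS = [1.0, 1.4, 2.0]             # cycles/degree
N_LOCATIONS, ECCENTRICITY = 6, 6.0          # six gratings at 6 degrees eccentricity

def grating_positions():
    """Centers (deg) of six equally spaced gratings; starting at 0 deg is assumed."""
    ang = np.deg2rad(np.arange(N_LOCATIONS) * 360.0 / N_LOCATIONS)
    return np.column_stack([ECCENTRICITY * np.cos(ang), ECCENTRICITY * np.sin(ang)])

def next_trial(rng, prev_location=None):
    """Draw tilt, spatial frequency, and a target location that differs from the last."""
    tilt = int(rng.choice(TILTS, p=WEIGHTS / WEIGHTS.sum()))
    sign = int(rng.choice([-1, 1]))         # clockwise or counterclockwise
    allowed = [k for k in range(N_LOCATIONS) if k != prev_location]
    return {"tilt_deg": sign * tilt,
            "sf_cpd": float(rng.choice(SPATIAL_FREQS)),
            "location": int(rng.choice(allowed))}

rng = np.random.default_rng(0)
trials, prev = [], None
for _ in range(2000):
    trial = next_trial(rng, prev)
    trials.append(trial)
    prev = trial["location"]
```

With six locations, chance performance on target-present trials is 1/6, which is useful context for the 60% criterion used before adding harder tilts.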
Results
Free viewing behavior in macaques and marmosets
Previous reports indicated that New World monkeys use head movements to a much greater extent in their orienting movements than humans or macaques (McCrea and Gdowski, 2003). As such, it was unclear if marmosets would remain visually active while under head restraint. Thus we first assessed their viewing of natural scene images that included regions of interest such as marmoset, macaque, or human faces. Earlier studies have shown marmosets are highly social and use gaze information from conspecifics in a head-free context (Burkart and Heschl, 2006, 2007), but no prior studies have tracked fixation under head restraint.
We found that both marmoset subjects were highly active in exploring natural scenes and, like macaques, used saccadic eye movements to target regions of interest. Figure 3 shows two examples of natural scene viewing for a marmoset subject (left) and a macaque subject (right). Yellow lines indicate the scan path of the eyes, with red points marking fixations longer than 200 ms. Marmosets were not passive under head restraint; they clearly targeted regions of interest, including macaque and marmoset faces.
Scanpaths of natural images including marmosets and macaques. A, C, Scanpaths for a marmoset subject for two images (red points indicating fixations with traces shown in yellow). B, D, Scanpaths shown in the same format for a macaque subject.
To quantify the extent to which marmosets target faces in free viewing, and how this compares with macaques, we assessed the number of saccades made into regions containing faces. A bias toward viewing faces was evident across the set of 15 natural scenes that contained faces, as exemplified in Figure 3. However, part of this bias could reflect the choices of the human photographer, who would tend to center faces in the image and keep them in focus, giving their features higher luminance contrast.
To control for this possibility, we labeled each face in the natural scenes and searched for nonface regions of equal size and matched eccentricity that had the same net contrast energy (see Materials and Methods). In Figure 4A, an example image containing a face (labeled in green) is shown along with four nonface regions at similar (within 25%) eccentricity and matching contrast energy (labeled in blue). The contrast energy of these regions as a function of spatial frequency is shown in Figure 4B. As is characteristic of most natural images, they exhibited a 1/f drop in energy with increasing frequency. While other subtle trends differed between species, both showed a strong preference to target faces in natural viewing. A marmoset subject's scan paths for this example image are shown in Figure 4C; they clearly target the labeled face over the matching nonface regions. Across the set of images containing labeled faces, both marmoset and macaque subjects made significantly more saccades with end points falling in the labeled face regions (Fig. 4D, bars shown in color for each subject) than in the matching nonface regions (shown in black; Wilcoxon rank sum test, p < 0.05). Further, once a region was acquired, the following saccade was more likely to land in the same region if it was a face rather than a nonface region (Fig. 4E; Wilcoxon rank sum test, p < 0.05).
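The face-targeting measures can be computed from saccade end points and the labeled circles. In this sketch the refixation probability is taken over consecutive end-point pairs, and SciPy's `mannwhitneyu` (equivalent to the Wilcoxon rank sum test) compares per-image counts. The scan path and counts are made up purely to exercise the code, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def endpoints_in_circle(endpoints_xy, center_xy, radius):
    """Boolean mask of saccade end points inside a labeled circular region."""
    offsets = np.asarray(endpoints_xy, float) - np.asarray(center_xy, float)
    return np.hypot(offsets[:, 0], offsets[:, 1]) <= radius

def refixation_probability(endpoints_xy, center_xy, radius):
    """P(next end point is in the region | current end point is in the region)."""
    inside = endpoints_in_circle(endpoints_xy, center_xy, radius)
    in_now = inside[:-1]
    if not in_now.any():
        return float("nan")                 # region never entered
    return float(inside[1:][in_now].mean())

# Toy scan path in degrees; the region is a 1 degree circle at the origin.
path = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (0.2, 0.1), (0.1, 0.1)]
n_inside = int(endpoints_in_circle(path, (0.0, 0.0), 1.0).sum())
p_refix = refixation_probability(path, (0.0, 0.0), 1.0)

# Per-image saccade counts (illustrative numbers only) compared by rank sum.
face_counts = [5, 6, 7, 8, 6, 7, 9, 5]
nonface_counts = [1, 0, 2, 1, 0, 1, 2, 1]
stat, p_value = mannwhitneyu(face_counts, nonface_counts, alternative="two-sided")
```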
Macaques and marmosets target image patches with faces more than nonface regions of matched contrast and eccentricity. A, Natural image containing a labeled face (green circle) and four regions of matched eccentricity and contrast. B, The contrast energy plotted on log scale as a function of spatial frequency for the example face patch (in green) and four nonface regions (in blue). C, Marmoset scanpaths of the example image (same format as Fig. 3). D, Average number of saccades landing within face and matched nonface regions over the 20 s presentation interval. E, The probability of a second saccade landing within the same face or nonface region.
Comparison of saccade metrics and oculomotor range
We next compared basic eye movement metrics between the two species, based on saccades identified during viewing of the full set of natural scenes. Human and macaque eye movements are characterized by a nearly linear relation between saccade amplitude and peak velocity, a relationship known as the main sequence, which reflects that saccade duration is roughly constant regardless of amplitude. The main sequence for a marmoset and a macaque was very similar for saccade amplitudes of 15 degrees or less, as illustrated in Figure 5, A and B. The data points were fit by a linear relation raised to an exponent (see Materials and Methods). The macaque showed a trend toward greater saturation in velocity for larger amplitude saccades, as indicated by a smaller exponent parameter, but overall the distributions were highly overlapping for the two marmoset and two macaque subjects (Fig. 5C). Further, the distributions of intervals between saccadic eye movements were highly overlapping, with median intervals between 230 and 280 ms (Fig. 5D). Saccade durations were also highly overlapping, with medians of 27–28 ms (Fig. 5E). The distribution of saccade amplitudes showed a preference for nearby targets in both species, though the macaque subjects' distributions were skewed toward larger amplitude saccades (Fig. 5F). This difference can in part be explained by differences in oculomotor range, which we consider in more detail next.
Marmosets and macaques exhibit similar saccade metrics. A, B, Relation between saccade amplitude and peak velocity (the main sequence) for a marmoset and a macaque, for saccades of 15 degrees or less. C, The 95% ranges of the main sequence data for two marmosets and two macaques overlap. D–F, Distributions of intersaccade intervals (D), saccade durations (E), and saccade amplitudes (F) for two marmoset subjects (blue and light blue) and two macaques (red and orange).
One key difference in the oculomotor behavior of the two species is the range over which they move their eyes from the default position of rest. In each behavioral session, the face was positioned toward the monitor such that the default rest position of the eyes would target the center of the viewed image as closely as possible. Macaques explored regions of the visual scene beyond this position of rest more readily than marmosets did. To quantify this difference, we computed the density of fixation points across the full set of natural images (smoothed with a Gaussian kernel, σ = 1.5 visual degrees). The normalized density of fixation locations for a marmoset and a macaque is shown in Figure 6, A and B, respectively. The marmoset's fixations largely remained within a 10 degree radius of the center of the larger area containing the viewed natural scene (indicated by the thin white square), while the macaque explored regions out to its boundary. To further quantify this range, we computed the fixation density as a function of radial distance from the center of mass of the fixation locations. Both marmoset subjects showed a steeper drop-off in their sampled range (largely <10 degrees) than the macaques (Fig. 6C). This more limited oculomotor range is consistent with recent findings in another New World monkey, the squirrel monkey (McCrea and Gdowski, 2003; Heiney and Blazquez, 2011). It may reflect that New World species rely more on head movements to direct their gaze (McCrea and Gdowski, 2003), which could be advantageous because their smaller heads have much less inertia to overcome than those of macaques (Heiney and Blazquez, 2011). Humans exhibit larger oculomotor ranges that are more comparable to those of macaques (Tweed and Vilis, 1990).
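The two density measures described above can be sketched as follows. Only the kernel width (σ = 1.5 degrees) comes from the text; the isotropic kernel density estimate, the annulus-based radial profile (fraction of fixations per square degree), and the function names `smoothed_density`, `center_of_mass`, and `radial_density` are hypothetical choices made for illustration.

```python
import math

SIGMA_DEG = 1.5  # Gaussian smoothing width stated in the text (visual degrees)

def smoothed_density(fixations, x, y, sigma=SIGMA_DEG):
    """Kernel density of fixation positions at (x, y), in 1/deg^2.

    Each fixation contributes an isotropic 2-D Gaussian; dividing by the
    number of fixations makes the surface integrate to 1.
    """
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    total = 0.0
    for fx, fy in fixations:
        d2 = (x - fx) ** 2 + (y - fy) ** 2
        total += norm * math.exp(-d2 / (2.0 * sigma ** 2))
    return total / len(fixations)

def center_of_mass(fixations):
    """Mean (x, y) of all fixation positions."""
    n = len(fixations)
    return (sum(f[0] for f in fixations) / n, sum(f[1] for f in fixations) / n)

def radial_density(fixations, edges):
    """Fraction of fixations per unit area in annuli around the center of mass.

    edges: increasing radii in degrees; one value is returned per
    annulus [edges[i], edges[i + 1]).
    """
    cx, cy = center_of_mass(fixations)
    radii = [math.hypot(fx - cx, fy - cy) for fx, fy in fixations]
    n = len(fixations)
    out = []
    for r0, r1 in zip(edges[:-1], edges[1:]):
        count = sum(1 for r in radii if r0 <= r < r1)
        area = math.pi * (r1 ** 2 - r0 ** 2)
        out.append(count / n / area)
    return out
```

Normalizing each annulus by its area keeps the profile from rising artificially with radius simply because outer rings cover more of the screen; a steeper fall-off then reflects a genuinely narrower sampled range.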
The oculomotor range of marmosets is more limited than that of macaques. A, Density of fixation positions during natural image viewing for a marmoset, shown in color scale. White rectangle indicates the area of the viewed image. B, Density of fixation positions over the same set of images for a macaque. C, Density of fixation positions as a function of radial distance from the center of mass for two marmosets and two macaques.
Marmoset performance in visual fixation tasks
To map neuronal receptive fields, animals need to maintain fixation while mapping stimuli are presented. We therefore examined to what extent marmosets can maintain fixation on a central point while peripheral stimuli are flashed. Marmosets were initially trained to look at a fixation point flashed against a blank background, which was intermittently replaced with an image of a face to draw their gaze to the central location (Fig. 7A,B). After the central target was acquired, a single fixation point remained on for 250 ms or longer (Fig. 7C–E). Between 2 and 6 drops of juice were delivered as reward for maintaining fixation through the hold period, with more drops for longer durations. Successful fixation throughout the hold period was also rewarded with a second presentation of the face image. The hold period was varied according to a staircase procedure to keep the probability of completing a trial at or above 70%. Once the marmoset had acquired the task on a blank background, we began to introduce flashing peripheral Gabor stimuli at low contrast in a region of the lower right quadrant (Fig. 7D). The contrast of the stimuli was increased over training, and the flashed stimuli were extended to cover a broader region (Fig. 7E).
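The staircase rule itself is not specified here, so the sketch below assumes a weighted up/down staircase on the required hold duration: the duration rises by a fixed step after a completed trial and falls by a larger step after a broken fixation. The expected change is zero when p · up = (1 − p) · down, that is, at p = down / (up + down), so steps of 30 and 70 ms settle near 70% completion. The step sizes, the 250–2500 ms bounds (loosely based on the ranges in the Fig. 7 legend), the linear 2–6 drop reward rule, and all names are hypothetical.

```python
# Hypothetical parameters; only the 70% target and the 2-6 drop range come from the text.
MIN_HOLD_MS = 250    # lower bound on required hold duration
MAX_HOLD_MS = 2500   # upper bound on required hold duration
STEP_UP_MS = 30      # increment after a completed trial
STEP_DOWN_MS = 70    # decrement after a broken fixation

def next_hold(current_ms, completed):
    """Weighted up/down staircase on the required hold duration.

    Equilibrium completion rate p satisfies p * STEP_UP_MS == (1 - p) * STEP_DOWN_MS,
    i.e. p = STEP_DOWN_MS / (STEP_UP_MS + STEP_DOWN_MS) = 0.7.
    """
    step = STEP_UP_MS if completed else -STEP_DOWN_MS
    return max(MIN_HOLD_MS, min(MAX_HOLD_MS, current_ms + step))

def drops_for_hold(hold_ms):
    """Map hold duration linearly onto 2-6 juice drops (rounded half up)."""
    frac = (hold_ms - MIN_HOLD_MS) / (MAX_HOLD_MS - MIN_HOLD_MS)
    return 2 + int(4 * frac + 0.5)
```

Because the step after a failure is larger than the step after a success, the required duration drifts upward only while the animal completes more than about 70% of trials, which keeps the task challenging without letting the completion rate collapse.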
Marmosets hold fixation on a central point as peripheral stimuli are flashed. A, B, Fixation trials begin with a central point or a face flashed on and off. A face is flashed with 1/3 probability to draw gaze. After fixation is acquired, it is held for 250 ms (C) and then peripheral Gabor stimuli are flashed (D–E) for a variable duration (250–2500 ms). Flashed stimuli appear in the lower right quadrant early in training (shown in D) and in a wider region later in training (shown in E). If fixation is held, the face reappears centrally and juice reward is delivered (F). G, H, Increases in hold durations over training are shown for two marmosets. Each point plots the median fixation duration (with its 95% confidence interval shown in color, and the upper 50–90% range of the data in gray). Increasingly difficult versions of the task are indicated by the color of each point (red, blank background; purple, low-contrast stimuli, narrow field; green, high-contrast stimuli, narrow field; blue, high-contrast stimuli, wide field). Fixation window size is indicated by symbol shape (square, 1.5 degree radius; circle, 1.0 degree radius; triangle, 0.75 degree radius). I, Fixation performance for larger (circle, 1.0 degree radius) and smaller (diamond, 0.5 degree radius) fixation windows over a week of food control in Marmoset B.
To quantify how quickly marmosets learned the fixation task, we measured the median duration that fixation was held over the course of training. As shown in Figure 7G, Marmoset B did not progress beyond brief fixation holds of under 250 ms for almost 20 d. Such brief holds are in line with typical fixation durations in free viewing, suggesting no deliberate control of fixation in the task. However, on some trials the subject did hold fixation for longer periods, ranging from 500 to 900 ms (as indicated by the upper 50–90% range of durations, shown by thin vertical gray lines; Fig. 7G). This subject did not progress past the 250 ms threshold (horizontal dashed line) until a more appetitive reward was identified (labeled at day 19). It then acquired the task first with the blank background (red points), advanced through low- to high-contrast flashing peripheral stimuli (purple and green points), and finally acquired the task with high-contrast stimuli flashed over the full field (blue points). As fixation training progressed, the initial fixation window (1.5 degree radius; Fig. 7G, square symbols) was reduced in size (1.0 degree radius; circle symbols). Marmoset B held fixation for a median duration of 982 ms in the final task after 30 d of training, 10–11 d after an appetitive reward was identified. A second marmoset, Marmoset P (Fig. 7H), learned the initial task more rapidly and in fact performed blocks of trials with median holds beyond 250 ms on the first day of training (Fig. 7H, two leftmost points in red). Marmoset P was rapidly promoted to more difficult versions of the task and held fixation for a median duration of 934 ms in the final task after 10–11 d (comparable to Marmoset B once an appetitive reward had been identified). Although these fixation durations are shorter than is typical for macaques (2–3 s), they would be sufficient to map receptive fields given enough trials.
Both marmoset subjects were able to complete between 150 and 250 trials of fixation in each daily session.
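The summary statistics plotted for each session (a median hold duration, its 95% confidence interval, and the upper 50–90% range of the data) can be sketched as below. How the confidence interval was computed is not stated in this section; the sketch assumes a percentile bootstrap of the median with linear-interpolated percentiles, and the names `percentile` and `summarize_holds` and the example durations are hypothetical.

```python
import random

def percentile(values, q):
    """Linear-interpolated percentile of a sample, q in [0, 100]."""
    s = sorted(values)
    if not s:
        raise ValueError("empty sample")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def summarize_holds(durations_ms, n_boot=2000, seed=0):
    """Median hold, bootstrap 95% CI of the median, and 50th-90th percentile range.

    The CI resamples the trials with replacement n_boot times and takes the
    2.5th and 97.5th percentiles of the resampled medians (an assumed method).
    """
    rng = random.Random(seed)
    n = len(durations_ms)
    boot_medians = []
    for _ in range(n_boot):
        resample = [durations_ms[rng.randrange(n)] for _ in range(n)]
        boot_medians.append(percentile(resample, 50))
    return {
        "median": percentile(durations_ms, 50),
        "ci95": (percentile(boot_medians, 2.5), percentile(boot_medians, 97.5)),
        "range_50_90": (percentile(durations_ms, 50), percentile(durations_ms, 90)),
    }
```

For example, hypothetical holds of 120, 180, 200, 240, 600, and 900 ms give a median of 220 ms and a 50–90% range of 220–750 ms; the long upper tail is what the gray range in Figure 7G, H is meant to expose when the median alone stays below threshold.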
Macaques typically learn fixation tasks in a few days to a week. While the second marmoset (Fig. 7H) learned comparably fast, the other subject was clearly much slower (Fig. 7G). However, if learning is counted from the time the juice was changed to a preferred flavor, this subject also learned comparably fast. It is also worth noting that macaques are usually water controlled before fixation training, whereas neither of these marmoset subjects was food or water controlled. The faster learning of the second subject (Fig. 7H) could reflect that he was 5–10% underweight at the start of training, which may have increased his motivation. This second marmoset continued training for over 10 months and eventually achieved hold durations comparable to those of macaques (median 2172 ms; Fig. 7H, rightmost points). This highly trained subject was also able to work longer sessions, completing 600–800 trials in a discrimination task in which the fixation hold periods were comparatively short (<500 ms), as described below.
To further examine whether the weaker performance of the first subject (Marmoset B) in fixation tasks could have reflected poor motivation, we implemented food control over a week. The subject's weight dropped by 5% in the first 3 d of food control but then stabilized. Around the same time, the amount of liquid reward consumed in daily sessions increased dramatically from 3 to 10 ml, and the subject began working much longer sessions, increasing from 150–250 trials to 400–600 trials. Fixation performance over that week is shown in Figure 7I for tasks with larger fixation windows (1 degree radius) and with tighter fixation windows (0.5 degree radius), the latter being needed for accurate receptive field mapping. Fixation duration improved for both larger and tighter windows, with performance slightly worse for tighter windows. Thus food control can improve performance substantially, providing fixation accuracy adequate to map receptive fields in early visual areas.
Marmoset performance in an orientation discrimination task
A critical question for the use of marmosets in visual neuroscience is whether they can learn to perform more demanding perceptual judgments with reliability similar to that of macaques. We first tested whether the more highly trained of the two marmoset subjects could discriminate the location of a target Gabor grating that differed in orientation from uniformly oriented distracters. Each trial was initiated when the marmoset fixated a small flashing central point (Fig. 8A). After the marmoset held fixation at the center for a variable period (Fig. 8B), six equally spaced gratings (2 degrees in diameter, Gabor σ = 0.5 degrees, spatial frequency of 1, 1.4, or 2 cycles/degree, random spatial phase) were presented at 5 degrees eccentricity (Fig. 8C). Five of the six gratings were horizontal, while one was slightly tilted (±0, 2, 4, 8, 12, 16, 32, or 45 degrees). After 250 ms, the fixation point disappeared, cueing the marmoset to make a saccade to the location of the tilted grating (Fig. 8D). If the marmoset made a saccade to a nontarget first, the trial was aborted with no reward. The marmoset first trained for 14 daily sessions with only the easiest discriminations, then for 10 sessions in which progressively harder discriminations were added to the sampled set, followed by the last 8 sessions, which are reported in Figure 8E and F. The marmoset discriminated a 4 degree tilt above chance, and performance was reliable (>80% correct) for tilts larger than 16 degrees (Fig. 8E). Reaction times were also longer for more difficult discriminations (Fig. 8F). These results demonstrate that a marmoset can perform a visual search task. Its psychometric function shows reliable performance for easy discriminations, indicating that the animal understood the task, and falls to chance with increasing task difficulty.
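With one target among six gratings, a random choice would be correct on about 1 trial in 6, so a saturating psychometric function for this task should start near 1/6 and rise toward 1. The section does not state which psychometric model or statistical test was used; the sketch below assumes a Weibull function with guess rate 1/6 and an optional lapse rate, plus a one-sided exact binomial test against chance. The names `weibull_psychometric` and `binomial_p_above_chance` and all parameter values are hypothetical; no fitted values from this study are implied.

```python
import math

N_STIMULI = 6
CHANCE = 1.0 / N_STIMULI  # one target among six gratings

def weibull_psychometric(tilt_deg, alpha, beta, lapse=0.0, guess=CHANCE):
    """Assumed Weibull model of proportion correct versus target tilt.

    P(t) = guess + (1 - guess - lapse) * (1 - exp(-(|t| / alpha) ** beta)).
    Clockwise and counterclockwise tilts are treated symmetrically.
    """
    t = abs(tilt_deg)
    return guess + (1.0 - guess - lapse) * (1.0 - math.exp(-((t / alpha) ** beta)))

def binomial_p_above_chance(n_correct, n_trials, p=CHANCE):
    """One-sided exact binomial test: P(X >= n_correct), X ~ Binomial(n_trials, p)."""
    return sum(
        math.comb(n_trials, k) * p ** k * (1.0 - p) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )
```

Under this model a 0 degree tilt yields exactly chance performance, and the lapse parameter lets the upper asymptote sit below 100% correct, as is common when an animal occasionally breaks fixation or guesses on easy trials.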
Marmoset performance in an orientation discrimination task. Each task trial began with a flashing central point (A), which was acquired by fixation and held for 200–400 ms (B). Then six peripheral stimuli were presented, one of which differed from a horizontal orientation by a slight tilt clockwise or counterclockwise (C). The marmoset was given one drop of reward for holding fixation for 250 ms, and additional drops for making a saccade to the target that differed in orientation (D). Percentage correct performance (E, G) and reaction times (F, H) as a function of target orientation difference for Marmoset P (E, F) and Marmoset B (G, H). RT, response time.
A key question is whether marmosets can routinely learn to perform more challenging tasks such as the orientation discrimination task described above. Training of the weaker subject, Marmoset B, proceeded much more slowly than that of Marmoset P. During the first two training sessions, Marmoset B was unable to reliably select targets differentiated by 45 degree or 90 degree tilts, easy discriminations that would normally act as pop-out stimuli in the search array. We therefore first trained the subject to select an obvious target distinguished by a contrast difference. Training proceeded for 33 sessions in which the target was labeled by a 20–40% contrast increase and then for another 8 sessions with a 5–10% contrast increase. Performance was above chance but hovered between 40 and 60% correct. However, when food control was implemented, the subject improved rapidly, performing above chance for easier discriminations with no contrast difference within the first 2 d and progressing to harder discriminations over the next 2 d. Performance over the last three sessions of the week of food control is shown in Figure 8G and H. This subject's psychophysical performance was highly similar to that of Marmoset P, differing mainly in slower reaction times overall, which may reflect that he waited for the fixation point to disappear before making his judgment. The improvement in this subject with food control suggests that marmosets can indeed be routinely trained in these kinds of tasks, provided motivation is sufficient.
Discussion
The common marmoset, a New World monkey, offers several advantages for studies of visual neurophysiology. These include a lissencephalic (flat) cortex, which makes it well suited to recording with linear depth arrays as well as flat planar arrays. While the anatomy and physiology of the marmoset visual system have been studied extensively in anesthetized animals (Rosa et al., 2009), it has been unclear whether marmosets can perform visual tasks under head restraint. Establishing this is a crucial step in developing the marmoset as a model for visual neuroscience, because task performance under head restraint is required both for accurate eye tracking and for standard neurophysiological methods. Validating these methods is in turn essential for the further development of the species as a model of human vision and for the development of transgenic lines.
In the first set of experiments presented in the current study, we found that marmoset free viewing behavior was highly comparable to that of macaques. Marmosets actively parse visual scenes with saccadic eye movements that target the same kinds of regions of interest as macaques, such as other primates' faces, paralleling eye movements in humans (Yarbus, 1967; Hayhoe and Ballard, 2005). Marmosets display different facial expressions for fear, aggression, or submission (Stevenson and Rylands, 1988), and a recent study reports that the facial expressions of conspecifics influence marmoset behavior (Kemp and Kaplan, 2013). Our findings support the importance of face analysis through targeted eye movements resembling those of other primates.
The metrics of marmoset eye movements during free viewing were also highly comparable to those of macaques. Marmosets exhibited a similar relation between amplitude and peak velocity (the main sequence), a similar distribution of intersaccade intervals, and a similar distribution of saccade durations. One key difference was that marmosets explored a more limited range of positions around the central position of rest. Correspondingly, marmosets made saccades of smaller average amplitude than macaques. This limited oculomotor range is consistent with that reported for another New World primate, the squirrel monkey (Heiney and Blazquez, 2011). Recognizing this limited range may be important for training marmosets in tasks that require central fixation, as they may exhibit greater spatial bias or even become unresponsive if stimuli are placed far from their natural position of rest. Nonetheless, marmosets do actively explore scenes up to 10 degrees from the central position, which is more than adequate for most studies of visual processing and eye movements.
In the second set of experiments, we used conditioning techniques common in macaque studies to assess to what extent marmoset eye movements can be brought under deliberate control. Whereas macaques in natural conditions exhibit a variety of covert gaze control behaviors to avoid direct eye contact, which can constitute a threat gesture, no equivalent covert behavior has been documented in marmosets. It was therefore not obvious that marmosets would possess the same degree of control over their fixation behavior. Our results demonstrate that they can learn to control fixation and can use it to make perceptual judgments for liquid reward. Marmosets thus represent a viable alternative to the macaque and other larger primates for studying visual behavior.
During training, marmosets were provided with a second potential reward: the presentation of a face on completion of correct trials. Behavior during free viewing suggested that marmosets were naturally engaged by face stimuli. During subsequent behavioral conditioning, we used a face stimulus to draw gaze to the central fixation location and then presented the face again for inspection at the end of correct trials, accompanied by liquid reward. The liquid reward was not contingent on viewing the face, but both subjects viewed face stimuli with prolonged fixations at the end of trials. Thus, face stimuli may act as a secondary form of reward. Further studies will be necessary to ascertain the relative value of these two potential rewards over training.
Behavioral studies with macaques have typically used fluid control to motivate subjects. One concern with marmosets is that their smaller size could make them susceptible to rapid dehydration under fluid control. In our studies, food control improved performance. One marmoset subject began training while slightly underweight and learned the fixation task more quickly than the other subject. To test the role of food control, we implemented it in the other subject for a single week and found clear improvements in both the fixation and orientation discrimination tasks. Thus our findings in Marmoset P, which involved only limited food control, likely represent a lower bound on marmoset behavioral performance.
The present findings demonstrate that marmosets exhibit interesting free viewing behaviors that are comparable to macaques, and thus will be of value as a model of active vision. However, their ability to perform tasks that include extended periods of fixation is relatively limited in comparison with macaques. For example, Marmoset P maintained fixation reliably for 2 s intervals, but was only able to perform 150–250 trials of fixation in each daily session. Macaques can maintain fixation for 3 s or more, perform several hundred trials in daily sessions, and can do so for tasks that also demand covert attention to peripheral targets (Mitchell et al., 2007). While macaques have proven reliable in learning highly constrained psychophysical tasks that involve extended stable fixation periods, it is arguable that those conditions are unrealistic in the context of natural vision and that there would be much value in developing new paradigms for more natural contexts. It is notable that the same marmoset subject who struggled with extended fixation could easily perform 600–800 trials in an orientation search task. Though this task demanded subtle discriminations of orientation, it only required brief fixations under 500 ms. Further work is required to determine the set of paradigms appropriate for marmosets, and how higher cognitive tasks might be embedded in more natural viewing conditions that avoid unrealistic periods of stable fixation. Given progress made in studies of natural auditory behaviors in marmosets (Eliades and Wang, 2008a,b; Miller et al., 2009; Miller and Wren Thomas, 2012), it may be that this nonhuman primate species is particularly appropriate for studies of neural processing in more naturalistic contexts.
A key advantage of the marmoset is the feasibility of developing transgenic lines that enable selective expression of proteins, allowing activation of neural circuits through opsins expressed using CRE-dependent viruses, as is currently done in mice. Building transgenic lines requires both large numbers of animals and a species that produces new generations in short periods of time. Although some success has been achieved in the macaque (Yang et al., 2008), it has been difficult to achieve germ line transmission, in which genetic material is passed on to subsequent generations, because the macaque's maturation period is relatively long. Marmosets mature more rapidly, reaching sexual maturity in 12–18 months, and they breed readily in captivity (5 month gestation, typically giving birth to twins). Further, because of their small size (0.3–0.5 kg compared with 10–15 kg for macaques), it is much easier to maintain large colonies at affordable cost. These advantages have made it possible to build the first transgenic primates with germ line transmission (Sasaki et al., 2009). Development of further lines could bring the optogenetic tools now being used fruitfully to dissect circuits of the mouse brain to bear on understanding the brains of primates.
Although the macaque's visual anatomy and physiology have been studied in greater detail, studies over the last two decades have laid a solid foundation for research in the marmoset. Marmosets have been used to study auditory and vocal processing in awake, behaving animals for over a decade (Lu et al., 2001a,b; Barbour and Wang, 2003; Bendor and Wang, 2005; Wang et al., 2005; Eliades and Wang, 2008a,b), and the necessary techniques for their handling and behavioral conditioning in auditory tasks have been established (Osmanski and Wang, 2011; Remington et al., 2012). The anatomy and physiology of the marmoset visual system have been examined in detail in anesthetized animals (Kaas et al., 1978; Huerta et al., 1986; Krubitzer and Kaas, 1990; Rosa and Tweedale, 2000, 2005; Bourne et al., 2002; Solomon et al., 2002; Collins et al., 2005; Roe et al., 2005; Szmajda et al., 2005; Rosa et al., 2009; Yu et al., 2010; Martin et al., 2011; Solomon et al., 2011; Valverde Salzmann et al., 2012; Chaplin et al., 2013), including a stereotaxic atlas of the marmoset brain (Paxinos et al., 2012), and noninvasive techniques for anatomical and functional imaging have recently been established (Belcher et al., 2013; Liu et al., 2013; Papoti et al., 2013), providing a sound basis for continued study with invasive techniques in the awake, behaving animal. One key advantage of the marmoset compared with the macaque is its lissencephalic (flat) cortex. All visual and oculomotor cortical areas are accessible on the surface of the brain, facilitating optical imaging and recording with planar array electrodes. The flat cortex also makes it possible to ensure perpendicular entry of linear array electrodes, which facilitates identification of laminar position using current source density analysis (Mitzdorf, 1985; Schroeder et al., 1998; Chen et al., 2007).
Our current findings demonstrate that the marmoset can perform visual tasks under head restraint, which will make it of interest to a larger neuroscience community.
Footnotes
This research was supported by a seed grant from the Kavli Institute for Brain and Mind at the University of California, San Diego, the Gatsby Charitable Foundation, and R01DC012087. Thanks to Cecilia Chow for assistance in training marmosets. Thanks to Camille Toarmino, Cindy Kyi, and Wren Thomas for assistance during surgeries and in the care of marmosets. Thanks to Katie Williams for assistance in the care of macaques. Thanks to Saeko Morita for the innovative idea of marshmallow juice.
The authors declare no competing financial interests.
Correspondence should be addressed to Jude F. Mitchell, Systems Neurobiology Laboratory, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037-1099. jude{at}salk.edu