Abstract
Recent studies have challenged the ventral/“what” and dorsal/“where” two-visual-processing-pathway view by showing the existence of “what” and “where” information in both pathways. Is the two-pathway distinction still valid? Here, we examined how goal-directed visual information processing may differentially impact visual representations in these two pathways. Using fMRI and multivariate pattern analysis in three experiments on human participants (57% female), we manipulated whether color or shape was task-relevant and how the two features were conjoined, and examined shape-based object category decoding in occipitotemporal and parietal regions. We found that object category representations in all the regions examined were influenced by whether or not object shape was task-relevant. This task effect, however, tended to decrease as the task-relevant and -irrelevant features became more integrated, reflecting the well-known object-based feature encoding. Interestingly, task relevance played a relatively minor role in driving the representational structures of early visual and ventral object regions: they were driven predominantly by variations in object shape. In contrast, the effect of task was much greater in dorsal than in ventral regions, with object category and task relevance both contributing significantly to the representational structures of the dorsal regions. These results show that, whereas visual representations in the ventral pathway are more invariant and reflect “what an object is,” those in the dorsal pathway are more adaptive and reflect “what we do with it.” Thus, despite the existence of “what” and “where” information in both visual processing pathways, the two pathways may still differ fundamentally in their roles in visual information representation.
SIGNIFICANCE STATEMENT Visual information is thought to be processed in two distinct pathways: the ventral pathway, which processes “what” an object is, and the dorsal pathway, which processes “where” it is located. This view has been challenged by recent studies revealing the existence of “what” and “where” information in both pathways. Here, we found that goal-directed visual information processing differentially modulates shape-based object category representations in the two pathways. Whereas ventral representations are more invariant to task demands, reflecting what an object is, dorsal representations are more adaptive, reflecting what we do with the object. Thus, despite the existence of “what” and “where” information in both pathways, visual representations may still differ fundamentally between the two pathways.
Introduction
The visual system has traditionally been divided into a ventral/“what” and a dorsal/“where” pathway (Mishkin et al., 1983). However, both monkey and human studies over the last two decades have reported robust representations of a variety of “what” information in the dorsal pathway (Sereno and Maunsell, 1998; Sawamura et al., 2005; Janssen et al., 2008; Konen and Kastner, 2008; Liu et al., 2011; Christophel et al., 2012; Hou and Liu, 2012; Ester et al., 2015; Xu and Jeong, 2015; Bettencourt and Xu, 2016; Bracci et al., 2016; Freud et al., 2016; Jeong and Xu, 2016). These findings challenge the two-pathway view and argue for a convergence between the two pathways. Are visual representations in the two pathways indeed similar or do they still differ in fundamental ways?
The existence of top-down attentional control signals in parietal cortex (Corbetta and Shulman, 2002; Yantis and Serences, 2003) and task modulation of parietal visual responses (e.g., Toth and Assad, 2002; Todd and Marois, 2004; Freedman and Assad, 2006; Xu and Chun, 2006; Gottlieb and Snyder, 2010; Xu, 2010; Liu et al., 2011; Jeong and Xu, 2013; Xu and Jeong, 2015; Shomstein and Gottlieb, 2016; Bracci et al., 2017) suggest that task may exert a stronger impact on visual representations in the dorsal than the ventral pathway. However, some studies have failed to find such an effect (Konen and Kastner, 2008; Harel et al., 2014). Attention can also modulate visual responses in human early visual areas (Gandhi et al., 1999; Martínez et al., 1999; Somers et al., 1999) and ventral object-processing areas (O'Craven et al., 1999; Murray and Wojciulik, 2004; Reddy et al., 2009; see also Çukur et al., 2013; Harel et al., 2014). Nevertheless, the impact of task is not always found in these regions either (e.g., Kourtzi and Kanwisher, 2000; Yi et al., 2004; Peelen et al., 2009; Bracci et al., 2017). There are thus conflicting reports within each visual pathway regarding the impact of task on visual representation and the degree to which the two pathways differ in this regard.
Task can affect visual representation by prioritizing the processing of task-relevant information through selective attention, giving priority to a position in space (space-based attention) (Posner, 1980), an object in the scene (object-based attention) (Scholl, 2001), or a feature within an object (feature-based attention) (Maunsell and Treue, 2006). The degree to which task-irrelevant information is processed may depend on how it is conjoined with the task-relevant information: whether the two share the same position, object, or feature. For example, in ventral regions, object shape representation was degraded when attention was directed away from the object to the central fixation (Murray and Wojciulik, 2004) but unaffected when attention was directed to another feature of the same object (O'Craven et al., 1999). This suggests that the stronger the conjunction between the task-relevant and -irrelevant features, the weaker the task effect. Yet, existing studies have not taken this into account when comparing dorsal and ventral regions and have instead either focused on a single manipulation (Xu and Jeong, 2015; Bracci et al., 2017) or averaged results from a number of different manipulations (Harel et al., 2014). Consequently, a variety of results have been obtained. To understand the precise impact of task on visual representation in the two pathways, the strength of the conjunction between the task-relevant and -irrelevant information must be taken into account.
Here, across three experiments, using fMRI pattern decoding, we varied the strength of color and shape conjunction, from partially overlapping, to overlapping but on separate objects, to being fully integrated. We compared shape-based object category representation in the two visual pathways when shape was and was not task-relevant. Based on prior findings, we predict a greater task effect in the dorsal than ventral pathway and a decrease in task effect with stronger integration of the task-relevant and -irrelevant features. Our results confirm these predictions.
Materials and Methods
Participants
A total of 7 healthy adults (4 females), 18–35 years of age, with normal color vision and normal or corrected-to-normal visual acuity participated in all three experiments. All participants gave their informed consent before the experiments and received payment for their participation. The experiments were approved by the Committee on the Use of Human Subjects at Harvard University.
Experimental design and procedures
Experiment 1: color on the object and background.
In this experiment, we used gray-scaled object images from 8 object categories (faces, bodies, houses, cats, elephants, cars, chairs, and scissors). These categories were chosen as they covered a good range of natural object categories encountered in our everyday visual environment and were the typical categories used in previous investigations of object category representations in ventral visual cortex (e.g., Haxby et al., 2001; Kriegeskorte et al., 2008). For each object category, 10 unique exemplar objects were selected (Fig. 1A). These exemplars varied in identity, pose (for cats and elephants), expression (for faces), and viewing angle to reduce the likelihood that object category decoding would be driven by the decoding of any particular exemplar. Objects were placed on a light gray background and covered with a semitransparent colored square subtending 9.24° of visual angle (Fig. 1B, leftmost image). Thus, both the object and the background surrounding the object were colored. On each trial, the color of the square was selected from a list of 10 different colors (blue, red, light green, yellow, cyan, magenta, orange, dark green, purple, and brown). Participants were instructed to view the images while fixating on a centrally presented red dot subtending 0.46° of visual angle. To ensure proper fixation, eye movements were monitored in all three experiments using an SR Research EyeLink 1000 eye tracker. Because of technical problems, eye tracking data for one participant in Experiment 2 and one in Experiment 3 were not properly recorded; these participants were not included in the eye tracking data analysis.
In a block design paradigm, participants performed a one-back repetition detection task when the exact same object exemplar repeated back to back or the exact same color repeated back to back (Fig. 1C). In each block, 10 colored exemplars from the same object category were presented sequentially, each for 200 ms followed by a 600 ms fixation period between the images (Fig. 1C). In half of the runs, participants attended to the object shapes and ignored the colors, and pressed a response button whenever the same object repeated back to back. Two of the objects in each block were randomly selected to repeat. In the other half of the runs, participants attended to the colors and ignored object shapes and detected a one-back repetition of the colors, which occurred twice in each block.
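For illustration, one way to construct such a block sequence is sketched below in MATLAB. This is a minimal sketch for the reader's convenience, not the actual experiment code; the function and variable names are ours, and it assumes that the two back-to-back repetitions are included within the 10 presentations of a block.

    % Minimal sketch (not the original experiment code): build one
    % blockLen-item sequence containing exactly nRepeats back-to-back
    % repetitions, e.g., makeOneBackBlock(1:10, 10, 2)
    function seq = makeOneBackBlock(exemplarIDs, blockLen, nRepeats)
    nUnique = blockLen - nRepeats;               % unique presentations
    base = exemplarIDs(randperm(numel(exemplarIDs), nUnique));
    % Pick non-adjacent insertion points so that two repeats cannot
    % merge into a triple repetition
    pos = [];
    while isempty(pos) || any(diff(pos) < 2)
        pos = sort(randperm(nUnique - 1, nRepeats));
    end
    seq = base;
    for p = fliplr(pos)                          % insert back to front
        seq = [seq(1:p), seq(p), seq(p+1:end)];
    end
    end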
Each experimental run consisted of 1 practice block at the beginning of the run and 8 experimental blocks, 1 for each of the 8 object categories. The stimuli for the practice block were chosen randomly from 1 of the 8 categories, and data from the practice block were excluded from further analysis. Each block lasted 8 s. There was a 2 s fixation period at the beginning of the run and an 8 s fixation period after each stimulus block. The presentation order of the object categories was counterbalanced across runs for each task. To balance the presentation order of the two tasks, the task changed every other run, with the order reversed halfway through the session so that, for each participant, neither task was presented, on average, earlier than the other. Each participant completed one session of 32 runs, 16 for each of the two tasks. Each run lasted 2 min 26 s.
Experiment 2: color on the dots over the object.
The stimuli and paradigm used in this experiment were similar to those of Experiment 1, except that, instead of both the object and the background being colored, a set of 30 semitransparent colored dots, each with a diameter subtending 0.93° of visual angle, was placed on top of the object, covering the same spatial extent as the object (Fig. 1B, middle). This ensured that participants attended approximately the same spatial envelope whether they attended the object shape or the color in the two tasks. Other details of the experiment were identical to those of Experiment 1.
Experiment 3: color on the object.
The stimuli and paradigm used in this experiment were similar to those of Experiment 1, except that only the objects were colored (Fig. 1B, rightmost image), making color more integrated with shape in this experiment than in the previous two. Participants thus attended to different features of the same object when doing the two tasks. Other details of the experiment were identical to those of Experiment 1.
Localizer experiments
The regions we examined in this study included topographic regions in occipital cortex (V1–V4) and along the intraparietal sulcus (IPS), including V3a, V3b, and IPS0–2 (Sereno et al., 1995; Swisher et al., 2007; Silver and Kastner, 2009) (Fig. 2A). We also included functionally defined object-selective regions in both pathways. In the dorsal pathway, we selected two parietal regions previously shown to be involved in object selection and encoding/storage, respectively, with one located in the inferior and the other in the superior part of the IPS (henceforth referred to, for simplicity, as inferior and superior IPS, respectively) (Xu and Chun, 2006, 2009; see also Todd and Marois, 2004) (Fig. 2B). In the ventral pathway, we selected regions in lateral occipital (LO) (Malach et al., 1995; Grill-Spector et al., 1998) (Fig. 2C) and posterior fusiform (pFs) (Grill-Spector et al., 1998) (Fig. 2D) cortex, whose responses have been shown to correlate with successful visual object detection and identification (e.g., Grill-Spector et al., 2000; Williams et al., 2007) and damage to which has been linked to visual object agnosia (Goodale et al., 1991; Farah, 2004).
All the localizer experiments conducted here used previously established protocols, and the details of these protocols are reproduced here for the reader's convenience.
To localize topographic visual field maps, we followed standard topographic mapping techniques (Sereno et al., 1995; Swisher et al., 2007) and optimized our parameters to reveal the maps in parietal cortex (Swisher et al., 2007). A 72° polar angle wedge swept across the entire screen, with a sweeping period of 55.467 s and 12 cycles per run. The entire display subtended 23.4 × 17.6° of visual angle. The wedge contained a colored checkerboard pattern that flashed at 4 Hz. Participants were asked to detect a dimming in the polar angle wedge. Each participant completed 4–6 runs, each lasting 11 min and 5.6 s.
To identify superior IPS, we used a visual short-term memory (VSTM) paradigm first developed by Todd and Marois (2004). As in Xu and Jeong (2015), in an event-related design, participants viewed a sample display consisting of 1–4 everyday objects and, after a delay, judged whether a new probe object matched the category of the object at the same position in the sample display. A match occurred on half of the trials. Objects were gray-scaled images from four categories (shoes, bikes, guitars, and couches). In the sample display, objects could be placed above, below, to the left, or to the right of the central fixation, ∼4.0° away from fixation (center to center). Four dark-gray rectangular placeholders, subtending 4.5° × 3.6°, marked all the possible object positions and were present throughout the trial. The entire display subtended 12° × 12°. Each trial lasted 6 s and consisted of a fixation period of 1000 ms, a sample display period of 200 ms, a delay of 1000 ms, a test display period of 2500 ms in which participants provided their responses, and a feedback period of 1300 ms. Each run contained 15 trials for each set size and 15 fixation trials in which only the fixation dot appeared for 6 s. The trial order was predetermined using a counterbalanced trial history design (Todd and Marois, 2004; Xu and Chun, 2006). Two filler trials appeared at the beginning and one at the end of each run for practice and trial history balancing purposes. Each participant completed two runs of this localizer, each lasting 8 min.
To identify inferior IPS, we used the procedure first developed by Xu and Chun (2006). As in Xu and Jeong (2015), in a block design paradigm, participants viewed blocks of sequentially presented object and noise images. In the object blocks, gray-scaled images from four categories of everyday objects (shoes, bikes, guitars, and couches) were presented sequentially. In the noise blocks, phase-scrambled and unrecognizable versions of the same object images were presented sequentially. Each image subtended 12° × 12° of visual angle. Each experimental block lasted 16 s and contained 20 images, each presented for 500 ms and followed by a 300 ms blank period. Participants performed a motion direction discrimination task, reporting with a key press the direction (vertical or horizontal) of a spatial jitter that occurred randomly twice in each block. Participants completed two runs of this localizer, each containing 8 blocks of object images and 8 blocks of noise images. The presentation order of the different stimulus blocks was balanced following Epstein and Kanwisher (1998). An 8 s fixation period was inserted at the beginning, middle, and end of each run. Each run lasted 4 min 40 s.
To localize the LO and pFs regions involved in visual object processing, we followed the procedure described by Kourtzi and Kanwisher (2000). In a block design paradigm, black-and-white photographs of male and female faces, indoor and outdoor scenes, common objects (e.g., cars, tools, and chairs), and phase-scrambled versions of the common objects were presented. Each image subtended ∼12° × 12° of visual angle. In each stimulus block, 20 images from the same category were shown sequentially, each presented for 750 ms and followed by a 50 ms blank display. Participants detected a slight spatial jitter, which occurred randomly twice per block, and reported it with a key press. Each run contained four 16 s blocks for each stimulus category as well as three 8 s fixation blocks inserted at the beginning, middle, and end of the run. Participants completed two runs of this localizer experiment, each lasting 4 min 40 s.
MRI methods
MRI data were collected using a Siemens MAGNETOM Trio, A Tim System 3T scanner with a 32-channel receiver array head coil. Participants lay on their backs inside the MRI scanner and viewed the back-projected display through an angled mirror mounted inside the head coil. The display was projected using an LCD projector at a refresh rate of 60 Hz and a spatial resolution of 1024 × 768. An Apple MacBook Pro laptop was used to generate the stimuli and collect the motor responses. All stimuli were created using MATLAB and Psychtoolbox (Brainard, 1997), except for the topographic mapping stimuli, which were created using VisionEgg (Straw, 2008).
A high-resolution T1-weighted structural image (1.0 × 1.0 × 1.3 mm) was obtained from each participant for surface reconstruction. For all functional scans, T2*-weighted gradient-echo, echo-planar sequences were used. For the three main experiments, 33 axial slices parallel to the AC-PC line (3 mm thick, 3 × 3 mm in-plane resolution with 20% skip) were collected covering the whole brain (TR = 2 s, TE = 29 ms, flip angle = 90°, matrix = 64 × 64). For the LO/pFs and inferior IPS localizer scans, 30–31 axial slices parallel to the AC-PC line (3 mm thick, 3 × 3 mm in-plane resolution with no skip) were collected covering the occipital, parietal, and posterior temporal lobes (TR = 2 s, TE = 30 ms, flip angle = 90°, matrix = 72 × 72). For the superior IPS localizer scans, 24 axial slices parallel to the AC-PC line (5 mm thick, 3 × 3 mm in-plane resolution with no skip) were collected covering most of the brain, except the anterior temporal and frontal lobes (TR = 1.5 s, TE = 29 ms, flip angle = 90°, matrix = 72 × 72). For topographic mapping, 42 slices oriented slightly off-parallel to the AC-PC line (3 mm thick, 3.125 × 3.125 mm in-plane resolution with no skip) were collected covering the whole brain (TR = 2.6 s, TE = 30 ms, flip angle = 90°, matrix = 64 × 64). Different slice prescriptions were used for the different localizers to remain consistent with the parameters we used in previous studies. Because the localizer data were projected into the volume view and then onto each participant's flattened cortical surface, the exact slice prescriptions used had minimal impact on the final results.
Data analysis
fMRI data were analyzed using FreeSurfer (https://surfer.nmr.mgh.harvard.edu), fsfast (Dale et al., 1999), and in-house MATLAB code. The LibSVM software package (Chang and Lin, 2011) was used for support vector machine (SVM) analyses. fMRI data preprocessing included 3D motion correction, slice timing correction, and linear and quadratic trend removal. No spatial smoothing was applied.
Region of interest (ROI) definitions
Topographic maps in both ventral and dorsal regions were defined following the procedure outlined by Swisher et al. (2007). We identified V1, V2, V3, V3a, V3b, V4, IPS0, IPS1, and IPS2 separately in each participant (Fig. 2A). We could not reliably identify IPS3 and IPS4 in all our participants and consequently limited our investigation to IPS0–IPS2. Following Todd and Marois (2004), superior IPS was identified in each participant using that participant's behavioral VSTM capacity K score (Cowan, 2001) (Fig. 2B). The statistical threshold for selecting superior IPS voxels was set to p < 0.001 (uncorrected) for 2 of the participants. This threshold was relaxed to 0.05 (uncorrected) in 3 participants and to 0.1 (uncorrected) in 2 participants to obtain at least 100 voxels across the two hemispheres. This produced an ROI ranging from 111 to 707 voxels, with an average of 238 voxels across participants. Following Kourtzi and Kanwisher (2000) and Xu and Chun (2006), inferior IPS (Fig. 2B), LO (Fig. 2C), and pFs (Fig. 2D) were defined as clusters of voxels in the inferior portion of the IPS, the lateral occipital cortex, and the ventral occipital cortex, respectively, that responded more to intact than to scrambled object images (p < 0.001, uncorrected).
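For reference, the K score for a display of N items is computed as K = N × (H − F), where H is the hit rate and F the false-alarm rate in the VSTM task (Cowan, 2001). For example, a hit rate of 0.85 and a false-alarm rate of 0.15 at set size 4 yield K = 4 × (0.85 − 0.15) = 2.8.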
Multivariate pattern analysis (MVPA)
For each experiment and each task, we first performed a GLM analysis in each participant and obtained the β value for each category in each voxel of the brain and in each run. We then used the β values from all the voxels in each ROI as the fMRI response pattern for that ROI in that run. To remove response amplitude differences between categories, runs, and ROIs, we z-transformed the β values across all the voxels in an ROI for each category in each run. The resulting normalized data had a mean amplitude of 0 and an SD of 1. Following Kamitani and Tong (2005), we used a linear SVM and a leave-one-run-out cross-validation procedure to calculate the pairwise category decoding accuracy in each ROI separately for each task. Because pattern decoding results can vary with the total number of voxels in an ROI, it is important to equate the number of voxels across ROIs when comparing them. To do this, we selected the 75 most informative voxels from each ROI using a t test analysis (Mitchell et al., 2004). Specifically, during each SVM training and testing iteration, we selected the 75 voxels with the lowest p values for discriminating between the two conditions of interest in the training data. An SVM was trained and tested only on these voxels. The decoding accuracies for all 28 pairs of comparisons were then pooled to determine the average decoding accuracy for each task in each ROI. Finally, we compared the average decoding accuracy across the two tasks to determine how goal-directed visual processing modulated object category representations in each ROI. Results from each participant were then combined to perform group-level statistical analyses.
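To make this pipeline concrete, a minimal sketch of one such pairwise decoding analysis is given below, written against the LibSVM MATLAB interface (svmtrain/svmpredict; Chang and Lin, 2011). It illustrates the procedure just described and is not the code used for the analyses reported here; the function and variable names are for illustration only, and the run-wise β patterns for the two categories are assumed to be stored as nRuns × nVoxels matrices.

    % Sketch: leave-one-run-out pairwise decoding for one ROI and one
    % category pair. betasA, betasB: nRuns x nVoxels matrices of z-scored
    % beta patterns for categories A and B (one row per run).
    function acc = pairwiseDecode(betasA, betasB, nKeep)    % nKeep = 75
    nRuns = size(betasA, 1);
    nCorrect = 0;
    for testRun = 1:nRuns
        trainIdx = setdiff(1:nRuns, testRun);
        % Select the nKeep most informative voxels from the TRAINING data
        % only: lowest two-sample t test p values, computed per voxel
        [~, p] = ttest2(betasA(trainIdx, :), betasB(trainIdx, :));
        [~, order] = sort(p, 'ascend');
        vox = order(1:nKeep);
        trainData = [betasA(trainIdx, vox); betasB(trainIdx, vox)];
        trainLab  = [ones(nRuns - 1, 1); -ones(nRuns - 1, 1)];
        testData  = [betasA(testRun, vox); betasB(testRun, vox)];
        model = svmtrain(trainLab, trainData, '-s 0 -t 0 -q'); % linear SVM
        pred  = svmpredict([1; -1], testData, model);
        nCorrect = nCorrect + sum(pred == [1; -1]);
    end
    acc = nCorrect / (2 * nRuns);                % chance = 0.5
    end

Note that the voxels are selected anew within each training fold rather than once on all the data; this keeps the voxel selection independent of the test data and avoids circularity.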
To determine whether or not similar object category representations were formed in the two tasks, we performed cross-task decoding by training the classifier to discriminate pairs of categories in the shape task and testing its ability to discriminate the same pairs of categories in the color task. Above-chance decoding in this analysis would indicate that the representations formed in the two tasks shared a significant amount of similarity, enough to allow the classifier to generalize across the two tasks. We also compared these cross-task decoding results with those obtained in within-task decoding, in which both training and testing were done within the shape task. Significantly greater performance in within- than cross-task decoding would indicate that the representations formed in the two tasks still differed. Because training and testing were always done on different runs for both types of decoding and there was no order effect in task presentation (see Experimental design and procedures), any difference between within- and cross-task decoding could only reflect a difference between the object category representations formed in the two tasks.
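Cross-task decoding follows the same logic, except that training patterns come from the shape task and test patterns from the color task. A sketch under the same assumptions as above, with vox denoting the voxels selected on the training data:

    % Sketch: train on shape-task runs, test on a left-out color-task run.
    % shapeA/shapeB and colorA/colorB: nRuns x nVoxels patterns per task.
    pred = zeros(2, nRuns);
    for testRun = 1:nRuns
        trainIdx  = setdiff(1:nRuns, testRun);
        trainData = [shapeA(trainIdx, vox); shapeB(trainIdx, vox)];
        trainLab  = [ones(nRuns - 1, 1); -ones(nRuns - 1, 1)];
        testData  = [colorA(testRun, vox); colorB(testRun, vox)];
        model = svmtrain(trainLab, trainData, '-s 0 -t 0 -q');
        pred(:, testRun) = svmpredict([1; -1], testData, model);
    end
    crossAcc = mean(pred(:) == repmat([1; -1], nRuns, 1));

Because the training and test patterns always come from different runs, any drop of crossAcc below the corresponding within-task accuracy reflects a difference between the representations formed in the two tasks rather than run overlap.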
To further examine whether the object category representations formed in the two tasks differed, we measured task decoding in each ROI. Using SVM, we obtained the decoding accuracy for the same object category across the two tasks. To do so, adjacent runs in which different tasks were performed were paired together. In each iteration of the leave-one-out procedure, the classifier was trained to discriminate responses to the same object category across the two tasks in all but one pair of runs and then tested on the left-out pair of runs. This was done for each object category, and the results were then averaged across categories.
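A sketch of this task-decoding analysis for a single object category, under the same assumptions (hypothetical variable names; the two runs of each adjacent shape/color pair are always held out together):

    % Sketch: decode task (shape vs. color) for one object category.
    % shapePat, colorPat: nPairs x nVoxels patterns for that category,
    % with row i taken from the i-th pair of adjacent shape/color runs.
    nCorrect = 0;
    for testPair = 1:nPairs                      % leave one run pair out
        trainIdx  = setdiff(1:nPairs, testPair);
        trainData = [shapePat(trainIdx, :); colorPat(trainIdx, :)];
        trainLab  = [ones(nPairs - 1, 1); -ones(nPairs - 1, 1)];
        testData  = [shapePat(testPair, :); colorPat(testPair, :)];
        model = svmtrain(trainLab, trainData, '-s 0 -t 0 -q');
        pred  = svmpredict([1; -1], testData, model);
        nCorrect = nCorrect + sum(pred == [1; -1]);
    end
    taskAcc = nCorrect / (2 * nPairs);   % then averaged across categories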
When results from each participant were combined to perform group-level statistical analyses, the p values reported throughout the manuscript were corrected for multiple comparisons using the Benjamini–Hochberg procedure, with the false discovery rate controlled at q < 0.05 (Benjamini and Hochberg, 1995). In the analysis of the 13 ROIs, the correction was applied across 13 comparisons; in the analysis of the three representative regions, it was applied across three comparisons.
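The Benjamini–Hochberg step-up procedure itself takes only a few lines to implement; the sketch below is a minimal reference implementation (not the code used for the analyses reported here) that flags which of m p values survive at level q:

    % Minimal Benjamini-Hochberg step-up procedure (Benjamini and
    % Hochberg, 1995). pvals: vector of m p values; q: FDR level (0.05).
    function sig = bhFDR(pvals, q)
    m = numel(pvals);
    [pSorted, order] = sort(pvals(:), 'ascend');
    thresh = (1:m)' / m * q;                 % BH critical values i*q/m
    k = find(pSorted <= thresh, 1, 'last');  % largest i: p(i) <= i*q/m
    sig = false(m, 1);
    if ~isempty(k)
        sig(order(1:k)) = true;              % reject hypotheses 1..k
    end
    end

For the ROI analysis, pvals would contain the 13 p values from the 13 ROIs; for the representative-region analysis, it would contain three.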
In each experiment, to visualize how the similarities between the object categories were captured by a given brain region and how this similarity structure was modulated by task, we constructed, from the group-level pairwise SVM category decoding accuracies, a 16 × 16 dissimilarity matrix spanning both tasks (i.e., 2 tasks × 8 categories, yielding 16 rows and 16 columns). Figure 1D shows an example dissimilarity matrix. Each cell of this matrix contains the classification accuracy for discriminating one category-task pairing from another category-task pairing. For example, the cell in the second row and third column of this matrix shows the classification accuracy for discriminating between cars and cats in the color task, and the cell in the second row and second column shows the accuracy for discriminating cars between the two tasks. To obtain a meaningful dissimilarity matrix for the multidimensional scaling (MDS) analysis, we then subtracted 0.5 from all cell values, yielding a dissimilarity matrix with diagonal values of 0. To avoid negative values, we replaced all values <0 with 0. On average, only 10% of the cells had such values, and those cells had an average value of −0.11 (note that Fig. 1D shows an example dissimilarity matrix before this modification). This modification did not significantly change the average decoding results described above. This matrix served as the input to the MDS analysis (Shepard, 1980). The similarity structure of the 8 object categories across the two tasks was then projected onto a 2D surface on which the distance between categories reflected their dissimilarity, with more similar categories appearing closer together.
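The construction of this matrix and its projection can be sketched as follows (again for illustration rather than the exact code used here), assuming the group-level 16 × 16 accuracy matrix is available as acc and that MATLAB's Statistics Toolbox function cmdscale is installed:

    % Sketch: 2D classical MDS of the 16 x 16 category-by-task structure.
    % acc: symmetric matrix of pairwise decoding accuracies (chance = 0.5)
    % covering 8 categories x 2 tasks.
    D = acc - 0.5;                    % accuracy -> dissimilarity
    D(1:size(D, 1) + 1:end) = 0;      % set the diagonal to 0
    D = max(D, 0);                    % replace negative values with 0
    D = (D + D') / 2;                 % enforce exact symmetry
    [Y, eigvals] = cmdscale(D);       % classical MDS (Shepard, 1980)
    plot(Y(:, 1), Y(:, 2), 'o');      % 2D projection of the 16 conditions

The first two columns of Y correspond to the two dimensions capturing most of the representational variance, that is, the kind of 2D projection shown in Figure 3.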
Results
In the present study, using fMRI pattern decoding, we compared shape-based object category representation in both the ventral and dorsal visual processing pathways when shape was task-relevant and when it was not. To take into account how the conjunction between the task-relevant and -irrelevant features may affect attentional selection, we systematically varied across three experiments the strength of color and shape conjunction, from partially overlapping, to overlapping but on separate objects, to being fully integrated.
We used images from 8 object categories (faces, bodies, houses, cats, elephants, cars, chairs, and scissors) (Fig. 1A). These categories were chosen as they covered a good range of natural object categories encountered in our everyday visual environment and were the typical categories used in previous investigations of object category representations in ventral visual cortex (e.g., Haxby et al., 2001; Kriegeskorte et al., 2008). Ten unique exemplars from the same object category were shown sequentially within a block of trials (Fig. 1A). These exemplars varied in identity, pose (for faces, cats, and elephants), expression (for faces), and viewing angle to reduce the likelihood that object category decoding would be driven by the decoding of any particular exemplar or shape. To systematically vary how the task-relevant and -irrelevant features were conjoined, in Experiment 1, a semitransparent colored square, in 1 of 10 colors, was superimposed on the object and the surrounding background, making both the objects and their immediate background colored (Fig. 1B, leftmost image). In Experiment 2, to increase the conjunction between color and shape, a set of semitransparent colored dots in 1 of 10 colors was shown on top of the objects, sharing the same spatial envelope as the objects (Fig. 1B, middle image). In Experiment 3, to fully integrate color and shape, each object appeared in 1 of 10 colors (Fig. 1B, rightmost image). Thus, from Experiments 1 to 3, the color feature became increasingly integrated with object shape, going from partial to complete overlap with the object shape. In different runs of the experiment, participants either attended to the object shapes and performed a shape one-back repetition detection task or attended to the colors and performed a color one-back task (Fig. 1C).
Response amplitude measures
We first examined response amplitude measures. In Experiment 1, responses were higher in the color than in the shape task in V1 and V2 (t(6) = 2.66, p < 0.05, and t(6) = 2.13, p = 0.09, respectively; both corrected for multiple comparisons using the Benjamini–Hochberg procedure with the false discovery rate set at q < 0.05, as were all subsequent t tests), did not differ between tasks in either V3 or V3a (t values <1.71, p values >0.15, corrected), and reversed direction, being higher in the shape than in the color task, in higher ventral regions V4, LO, and pFs (t values >3.74, p values <0.05, corrected) and in dorsal regions, including V3b, IPS0–2, inferior IPS, and superior IPS (t values >2.66, p values <0.05, corrected). In Experiments 2 and 3, no region examined showed a task difference (t values <1.59, p values >0.6, corrected; for the details of the statistical results, see Table 1).
In Experiment 1, because both the objects and their surrounding background were colored, the color task likely drew participants' attention to a larger spatial area, including the colored background unoccupied by the objects, whereas the shape task likely drew attention only to the location occupied by the object. A space-based attention account could thus explain why early visual areas, with their smaller receptive field sizes than later areas, showed an increase in response in the color task, when the attended area was larger. A space-based attention account could also explain why, in the shape-sensitive regions, responses were higher in the shape than in the color task: attention to the colored background in the color task would not put the shape at the focus of spatial attention and thus would not activate these regions as strongly as when the shape was the focus of attention in the shape task. In Experiments 2 and 3, because color and shape occupied the same spatial envelope, both were at the focus of spatial attention in both tasks. This could explain why responses no longer differed between the two tasks in the shape-sensitive regions in those two experiments. Overall, across the three experiments, average fMRI response amplitude measures revealed very little difference between the two tasks across the two visual pathways.
Object category decoding and MDS analysis of category representations
Because a lack of response amplitude difference does not imply a lack of difference in representation (e.g., Kamitani and Tong, 2006; Liu et al., 2011), to better understand whether object category representations differ between the two pathways, we examined multivariate responses in each experiment. To remove differences in response amplitudes across different brain regions, we z-normalized the response amplitudes across all the voxels in a region for each object category in each run. Using a linear SVM classifier, we obtained pairwise decoding accuracy for each pair of object categories within and across tasks from the 75 most informative voxels in each ROI (Mitchell et al., 2004) (Fig. 1D).
To directly visualize how goal-directed visual processing would impact object category representation in a brain region, using the pairwise decoding accuracy for each pair of object categories as input, we performed a classical MDS analysis (Shepard, 1980) and projected the two dimensions that captured most of the representational variance among the categories onto a 2D surface, with the distance between each pair of categories on this surface reflecting the dissimilarity between them. To facilitate comparisons among early visual, ventral, and dorsal regions, we selected V1, pFs, and superior IPS as three representative regions and plotted the MDS results for these regions (Fig. 3). V1 was chosen for early visual areas because it is the first cortical stage of visual information processing; pFs was chosen for ventral regions because of its role in visual shape processing and detection (Grill-Spector et al., 2000; Williams et al., 2007); and superior IPS was chosen for dorsal regions because of its ability to represent a variety of visual features (Xu and Jeong, 2015; Bettencourt and Xu, 2016; Jeong and Xu, 2016). In the MDS plots, while the categories from the two tasks appeared to be spread out to a similar extent in V1 and pFs in all three experiments, the spread was much greater for the shape than the color task in superior IPS in Experiments 1 and 2. This suggests that task modulated superior IPS more strongly than V1 or pFs in terms of how distinctively object categories were represented relative to one another within a task. Additionally, across all three experiments, while object category representations from the two tasks overlapped extensively in V1 and pFs, they were completely separated in superior IPS (i.e., the separation between the red and blue clusters in Fig. 3, rightmost panels).
These two observations suggest that goal-directed visual processing played a more dominant role in determining the distinctiveness of object category representations in superior IPS than in V1 or pFs. Whereas the representational structures of V1 and pFs predominantly reflect the differences among object categories, that of superior IPS reflects both the differences among object categories and the goal of visual processing. This suggests that object representation is more adaptive and task-sensitive in dorsal than in ventral regions, a difference that likely reflects a fundamental distinction in how visual information is represented in the two processing pathways. In the results presented below, we provide detailed decoding analyses to quantify these observations.
Object category decoding in each task
To quantify our MDS observation that goal-directed visual processing had a greater impact on dorsal than early visual or ventral regions in terms of how distinctive object categories may be represented relative to each other within a task, in this analysis, for each brain region examined, we averaged the pairwise object category decoding over all possible object pairs in each task and compared the decoding accuracy between the two tasks. This analysis also allowed us to quantify how the strength of color and shape conjunction would impact decoding in the two tasks.
In all three experiments (Fig. 4), above-chance object category decoding was observed in all early visual, ventral, and dorsal ROIs in both tasks (for a detailed report of the statistical results, see Table 2). Direct comparison between tasks revealed significantly or marginally greater category decoding in the shape than in the color task in Experiment 1 in all ROIs. In Experiment 2, this comparison was not significant in early visual regions V1, V2, and V3, ventral region V4, or dorsal regions V3b and IPS0, but was significant in the other, higher-level dorsal and ventral regions. In Experiment 3, decoding did not differ between the two tasks in any of the ROIs (Table 2).
To compare the decoding results across early visual, ventral, and dorsal regions, we focused on our three representative regions, namely, V1, pFs, and superior IPS. We subtracted the decoding accuracy in the color task from that in the shape task to quantify the amount of task modulation of object representations in each region. We then performed a two-way repeated-measures ANOVA with experiment and region as factors. We found significant effects of experiment (F(2,12) = 9.03, p < 0.01) and region (F(2,12) = 11.26, p < 0.01), but no interaction between the two (F(4,24) = 1.63, p = 0.2). The effect of experiment was driven by an overall significantly or marginally stronger task effect in Experiment 1 than in either Experiment 2 or 3, with no difference between the latter two (for the detailed statistical results of the pairwise comparisons, see Fig. 5). The effect of region was driven by greater task modulation in superior IPS than in either pFs or V1, with no difference between the latter two (Fig. 5). Within each individual experiment (Fig. 5), in both Experiments 1 and 2, the effect of task was significantly or marginally greater in superior IPS than in either V1 or pFs, with no difference between the latter two. In Experiment 3, however, the effect of task did not differ among the three brain regions.
In summary, when object shape was task-relevant, object category could be decoded in all the ventral and dorsal regions examined in all three experiments. When color and shape partially overlapped in Experiment 1, attention to color reduced object category decoding in all the regions, with the effect being greater in the dorsal than ventral pathway. When color appeared on a set of dots overlapping the objects and sharing the same spatial envelope in Experiment 2, attention to color did not affect object category decoding in early visual areas but still affected decoding in higher ventral and dorsal regions, again with the effect being greater (marginally significant) in the dorsal than ventral pathway. When color appeared on the objects in Experiment 3, attention to color, by and large, did not impact object category decoding across brain regions. Thus, consistent with our observation from the MDS plots (Fig. 3), goal-directed visual processing had a greater impact on superior IPS than V1 or pFs in terms of how distinctive object categories may be represented relative to each other within a task. However, this only occurred when color and object shape were not fully conjoined (Fig. 4A,B). When they were integrated to form a single coherent object, differences between tasks and between brain regions were no longer present (Fig. 4C; for more supporting evidence on this, see the next two sets of analyses).
Despite the variability of the exemplars used for each object category, we found above-chance category decoding in early visual areas. This indicates the presence of low-level features indicative of category membership despite changes in viewpoint. This is not surprising, as many categories may be differentiated based on low-level features even across a viewpoint change, such as curvature and the presence of unique features (e.g., the large round outline of a face/head, the protrusion of the limbs in animals) (Rice et al., 2014). In any case, above-chance category decoding in all the regions examined allowed us to systematically compare how visual representations were differentially modulated by goal-directed visual processing.
Cross-task decoding of object category
Although significant object category decoding existed in both tasks, it was unclear whether or not similar representations were formed in the two tasks. To test this, in this analysis, we performed cross-task decoding by training the SVM classifier using the data from the shape task and testing it on the data from the color task. If similar representations were formed in the two tasks, then above chance cross-task decoding would be expected. Moreover, if representations were identical in the two tasks, we expected to see no drop in cross-task decoding compared with within-task decoding in which training and testing were both done with the data from the shape task (within-task decoding is essentially the category decoding in the shape task from the previous analysis). Any drop in performance would indicate some difference in the representations formed in the two tasks.
In all three experiments (Fig. 6), above chance cross-task decoding was observed in all the ROIs (for a detailed report of the statistical results, see Table 3). Comparison between cross- and within-task decoding accuracy in Experiment 1 revealed lower cross- than within-task decoding in all the ROIs. In Experiment 2, lower cross- than within-task decoding was observed in higher ventral and dorsal regions, but not in early visual regions. In Experiment 3, the difference between cross- and within-task decoding disappeared in most ROIs, except in IPS1, which showed higher within- than cross-task decoding, and V2, which showed an opposite pattern of results (Table 3).
Next, we calculated the amount of task modulation in this analysis by subtracting cross- from within-task decoding accuracy in the three representative regions V1, pFs, and superior IPS. A two-way repeated-measures ANOVA showed significant effects of experiment (F(2,12) = 29.79, p < 0.001) and region (F(2,12) = 22.54, p < 0.001), and a significant interaction between the two (F(4,24), p = 0.0079). The effect of experiment was driven by greater task modulation in Experiment 1 than in either Experiment 2 or 3, and greater task modulation in Experiment 2 than in Experiment 3 (for the detailed statistical results of the pairwise comparisons, see Fig. 7). The effect of region was driven by greater task modulation in superior IPS than in either V1 or pFs, and marginally greater task modulation in pFs than in V1 (Fig. 7). Within each individual experiment, superior IPS exhibited significantly or marginally greater task modulation than either V1 or pFs in all three experiments. Task modulation in pFs did not differ from that in V1 in Experiments 1 and 3 but was higher than that in V1 in Experiment 2 (Fig. 7).
Overall, across the three experiments, significant cross-task decoding was observed in all the ROIs, indicating the existence of similar object category representations whether or not object shape was directly attended and task-relevant. Nevertheless, compared with within-task decoding, cross-task decoding was significantly weaker when object shape and color did not completely overlap in Experiment 1 in all the early visual, ventral, and dorsal regions examined. When object shape and color shared a spatial envelope, but did not completely overlap in Experiment 2, although the difference between within- and cross-task decoding remained in higher ventral and most dorsal regions, it was absent in early visual areas. In Experiment 3, when object shape and color were fully integrated, the difference between the two types of decoding was largely absent in most regions examined. Direct comparisons among the three representative regions revealed a stronger task effect in superior IPS than in either V1 or pFs, even when features were fully integrated in Experiment 3.
It is worth noting that, in a given brain region, when the within-task decoding accuracy was lower for the color than shape task, it was usually accompanied by a drop in cross-task decoding accuracy when the classifier was trained with the shape task data and tested on the color task data. This suggests that attention to color in these cases likely prevented detailed object shape processing and resulted in degraded category representations.
Results from both object category decoding and cross-task decoding showed that object category representations in both early visual and ventral regions were significantly influenced by goal-directed visual information processing, such that, within a task, object category representations were more distinctive with respect to each other when they were task-relevant than when they were not. This is consistent with previous reports showing attentional modulations of visual responses in these brain regions (Gandhi et al., 1999; Martínez et al., 1999; O'Craven et al., 1999; Somers et al., 1999; Murray and Wojciulik, 2004; Reddy et al., 2009; see also Çukur et al., 2013; Harel et al., 2014). Nevertheless, this task effect was not captured by our MDS plots (Fig. 3), which projected the representational structure of a brain region based on the two dimensions that captured most of the representational variance. In the MDS plots, the categories from the two tasks appeared to be spread out to a similar extent in V1 and pFs in all three experiments. This suggests that goal-directed visual processing likely plays a relatively minor role in driving the representational structure of early visual and ventral object regions. On the other hand, consistent with the decoding results from superior IPS, in the MDS plots, the spread of the categories was much greater for the shape than the color task in superior IPS in Experiments 1 and 2. This suggests that object category information and goal-directed visual processing both played important roles in shaping the representational structures of the dorsal regions.
The decoding of task-related information
In addition to modulating how distinctive object categories may be represented relative to each other within a given task, in the MDS plots, task also appeared to separate object category representation into two clusters based on the task performed in superior IPS, but not in V1 or pFs. To quantify this observation, in this analysis, we compared task decoding as well as the relative strength of task and category decoding across the two pathways.
To obtain task decoding accuracy, we asked the classifier to decode the two instances of each object category across the two tasks and then averaged the decoding performance over all the categories. Given that category decoding was overall stronger in the object shape than the color task in all the regions examined, we took the accuracy from the object shape task as our measure of the strength of category decoding. The presence of task separation of object category representation in superior IPS in the MDS plot (Fig. 3) would predict a stronger task than category decoding or equally strong task and category decoding in the dorsal regions, whereas the absence of this separation in V1 and pFs in the MDS plot would predict a much stronger category than task decoding in early visual and ventral regions.
In Experiment 1, task decoding was significant in all the regions examined. Task decoding was lower than category decoding in all early visual and ventral regions as well as in V3a and inferior IPS; did not differ significantly from category decoding in V3b, IPS0, and IPS1; and reversed direction, being higher than category decoding, in IPS2 and superior IPS (Fig. 8A; for a detailed report of the statistical results, see Table 4). In Experiment 2, task decoding disappeared in V1 and V2 but was still present in all other regions. Task decoding in this experiment was lower than category decoding in all regions, except for IPS1, IPS2, and superior IPS (Fig. 8B; Table 4). In Experiment 3, task decoding was significant in all regions, except for V1, V2, V3, V4, and V3a (Fig. 8C; Table 4). Task decoding in this experiment was lower than category decoding in all regions, except for IPS1, IPS2, and superior IPS (Fig. 8C; Table 4).
To directly compare task decoding in early visual, ventral, and dorsal regions, we examined responses from the three representative brain regions, V1, pFs, and superior IPS. A two-way repeated-measures ANOVA with experiment and region as independent variables and task decoding accuracy as the dependent variable revealed a significant effect of experiment (F(2,12) = 26.84, p < 0.001) and region (F(2,12) = 63.03, p < 0.001), but no interaction between the two (F(4,24) = 1.98, p = 0.13). The effect of experiment was driven by higher task decoding in Experiment 1 than in either Experiment 2 or Experiment 3, and higher task decoding in Experiment 2 than Experiment 3 (for the detailed statistical results of the pairwise comparisons, see Fig. 9A). The effect of region was driven by higher task decoding in superior IPS than in either V1 or pFs, and higher task decoding in pFs than V1. Within each individual experiment, in both Experiments 1 and 3, task decoding was greater in superior IPS than in either V1 or pFs, and greater in pFs than V1. In Experiment 2, V1 had lower task decoding than either pFs or superior IPS, with no difference between the latter two (Fig. 9A).
We also compared the relative strength of task and category decoding by subtracting the task decoding accuracy from the category decoding accuracy in the three representative regions and performed a two-way repeated-measures ANOVA with experiment and region as factors. We found significant effects of experiment (F(2,12) = 49.93, p < 0.001) and region (F(2,12) = 79.67, p < 0.001), and a significant interaction between the two (F(4,24) = 5.32, p < 0.01). The effect of experiment was driven by a smaller difference between category and task decoding in Experiment 1 than in either Experiment 2 or 3, with no difference between the latter two (for the detailed statistical results of the pairwise comparisons, see Fig. 9B). The effect of region was driven by a smaller difference between category and task decoding in superior IPS than in either V1 or pFs, with no difference between the latter two (Fig. 9B). Within individual experiments, the difference between category and task decoding was smaller in superior IPS than in either V1 or pFs in all three experiments. In Experiment 2, this difference was also smaller in pFs than in V1; this effect, however, was not significant in Experiments 1 and 3 (Fig. 9B).
Overall, these results showed a much stronger category than task representation in early visual regions and ventral regions, but a stronger or equally strong task and category representation in higher dorsal regions. They provided a quantitative description of the task separation of object category representations seen in superior IPS but not in V1 or pFs in the MDS plots (Fig. 3).
Behavioral results
In all three experiments, participants were instructed to detect a repetition in either object shape or color, and speeded responses were never emphasized. We thus recorded only detection rates and not response speed. Detection rates were high in all three experiments (color and shape tasks, respectively): 92.7 ± 0.06 and 96.8 ± 0.05 in Experiment 1, 94.5 ± 0.05 and 94.9 ± 0.05 in Experiment 2, and 93.1 ± 0.06 and 97.1 ± 0.04 in Experiment 3. A two-way repeated-measures ANOVA with task and experiment as factors showed a significant effect of task (F(1,6) = 10.39, p < 0.05), no significant effect of experiment (F(2,12) = 0.13, p = 0.88), and a significant interaction between the two (F(2,12) = 8.83, p < 0.01). Within individual experiments, the detection rate was higher for the shape than the color task in both Experiments 1 and 3 (t values >3.5, p values <0.05), but not in Experiment 2 (t(6) = 1.16, p = 0.29).
This pattern of behavioral results did not track fMRI decoding accuracy across tasks and experiments. In both Experiments 1 and 3, behavioral performance was lower for the color than the shape task, yet the decoding results of these two experiments differed significantly across a number of measures, with a between-task difference observed in Experiment 1 but not in Experiment 3. In Experiment 2, even though there was no difference in behavioral task performance, we still observed a task difference in decoding in a number of brain regions. Thus, there was no consistent relationship between task difficulty and fMRI decoding, and it is unlikely that behavioral task performance directly contributed to the observed fMRI decoding results.
We also collected eye position data during the MRI scan sessions. Participants were able to maintain fixation throughout the experiments. We analyzed the eye position data after removing saccades. To correct for eye movement measurement drifts across experimental runs, we measured the median deviation in eye position during the stimulus presentation blocks relative to the baseline conditions in each task, participant, and experiment. Across experiments and tasks, the deviation in eye position did not exceed 0.55° in either the horizontal or vertical direction. Additionally, these deviations did not vary systematically across tasks in any of the three experiments (t values <2.1, p values >0.09).
Discussion
Recent studies have reported a convergence between the human ventral and dorsal visual processing pathways in their abilities to represent visual information, challenging the validity of the two-pathway distinction first laid out by Mishkin et al. (1983). Because the dorsal pathway has long been implicated in attention-related processing, here we examined whether or not goal-directed visual information processing may differentially impact visual representations in the two pathways. To take into account the conjunction strength between the task-relevant and -irrelevant features and its influence on attentional selection and goal-directed visual processing, using fMRI MVPA, we systematically varied across three experiments the strength of color and shape conjunction, from partially overlapping, to overlapping but on separate objects, to being fully integrated. We compared shape-based object category representations in two tasks when object shape was task-relevant and when it was not. We found that object category representations in early visual, ventral, and dorsal regions examined were all significantly influenced by whether or not object shape was task-relevant. This task effect, however, tended to decrease when task-relevant and -irrelevant features were more integrated, reflecting object-based feature encoding. Most significantly, we found that dorsal visual representations exhibited a greater sensitivity to goal-directed visual information processing than those in early visual and ventral regions, such that object category representations became more distinctive from each other when they became task-relevant. This was found in the comparison of object category decoding between the two tasks, the comparison between within- and cross-task decoding, direct task decoding, and the comparison between task and object category decoding. Additionally, these results showed a much stronger category than task representation in early visual and ventral regions, but a stronger or equally strong task and category representation in higher dorsal regions. Our MDS analysis further illustrated that, whereas the representational structures of early visual and ventral regions were predominantly shaped by object categories regardless of their task relevance, that of a dorsal region was jointly determined by both object category and task relevance.
Harel et al. (2014) used three physical and three conceptual tasks and a correlational analysis in an event-related design. By averaging comparisons over all six tasks, Harel et al. (2014) observed task-independent object representations in early visual cortex, LO, and parietal cortex, but task-dependent representations in pFs. They additionally reported task decoding among the physical tasks in early visual cortex, LO, pFs, and parietal cortex. Although the second set of results is consistent with our findings (Table 4; Fig. 8), the first set differs. With a simple perceptual task, we observed task-dependent visual representations in the dorsal regions for loosely conjoined features, but not for fully integrated features, because of object-based processing. Some of the physical tasks and all the conceptual tasks used by Harel et al. (2014) invoked object-based processing. Thus, averaging over all tasks likely overemphasized object-based processing and washed out the task effect we observed here. Our use of a block design and an SVM classifier, as well as the inclusion of only two perceptual tasks, increased power and placed us in a better position to detect the impact of task on visual representation in both the ventral and dorsal regions (Coutanche et al., 2016).
By examining responses in an action task and an object category task using fMRI MVPA, Bracci et al. (2017) reported greater task modulation of visual representations in parietal than in occipitotemporal regions, with no representation of the task-irrelevant information found in the parietal region. Although their overall conclusion is consistent with ours, their use of an event-related design likely weakened the effect and prevented them from observing the representation of task-irrelevant information in the dorsal region. It is also possible that the representation of high-level semantic category and action information is less automatic and more under top-down control in Bracci et al. (2017) than the representation of the shape-based object category information studied here. In any event, the present study shows that task-irrelevant visual information can be represented in dorsal regions, albeit much more weakly than in early visual and ventral regions.
Previous monkey neurophysiology (Stoet and Snyder, 2004) and human fMRI MVPA decoding studies (Woolgar et al., 2011a,b) have shown that task rules can be represented in posterior parietal cortex (PPC). In all of these studies, the representation of a task rule involved a unique stimulus-response mapping. Because PPC has been regarded as an interface between sensory and motor processing that facilitates sensorimotor transformations (Andersen and Cui, 2009), the representation of stimulus-response mapping rules in PPC is expected. In the present study, the two tasks involved identical motor responses, yet we still found robust representation of task information. Thus, task information independent of stimulus-response mapping can also be represented in PPC.
Attention to specific visual features can increase the gain of the neuronal responses to these features (electrophysiology studies: e.g., Motter, 1994; McAdams and Maunsell, 1999; Reynolds et al., 2000; and human imaging studies: e.g., Wojciulik et al., 1998; Serences et al., 2004; Baldauf and Desimone, 2014), and produce a tuning shift of the neuronal responses (Connor et al., 1997; David et al., 2008) or fMRI voxels (Çukur et al., 2013). Both of these changes could result in increased selectivity to the attended feature and thus more distinctive fMRI response patterns (Peelen et al., 2009; Reddy et al., 2009). This could explain the overall better object category decoding in our shape task and a drop in this decoding in the color task.
In our study, we used photographs of real-world objects. Because both the shapes and the colors used were easily discriminable and salient, our one-back repetition detection tasks were fairly easy. This low target-processing load likely encouraged shape-based object category processing even when shape was task-irrelevant and could explain why, in all three experiments, we observed significant object category decoding during the color task. Similarly, in another study, highly engaging face and gazebo distractors shown during the delay period of a VSTM task were also found to be encoded in dorsal regions (Bettencourt and Xu, 2016). This may also explain why passive viewing or attention to fixation did not modulate shape representation in an adaptation study (Konen and Kastner, 2008). These results suggest that highly salient task-irrelevant information is difficult to ignore, especially when its processing poses no measurable cost to that of the task-relevant feature. This is consistent with monkey neurophysiology studies showing the prominent role of saliency in driving parietal responses (e.g., Gottlieb et al., 1998; see also Constantinidis and Steinmetz, 2005; Bisley and Goldberg, 2006). Nevertheless, the present study showed that dorsal regions exhibited stronger filtering of task-irrelevant information than ventral regions, consistent with prior findings using fMRI response amplitude measures (e.g., Xu, 2010; Jeong and Xu, 2013).
Depending on the goal of the observer, the unit of attentional selection can be location (Posner, 1980), feature (Maunsell and Treue, 2006), or object (Scholl, 2001). In Experiment 1, when diverting attention away from the object shape was accompanied by a change in the spatial extent of attention, object category representation decreased in all regions examined. Thus, both ventral and dorsal regions were sensitive to the location of visual information and could partially filter out irrelevant information at an unattended location. This is consistent with the presence of topographic maps throughout the ventral and dorsal regions (Silver and Kastner, 2009) and the report of strong location representation in higher ventral regions (e.g., Schwarzlose et al., 2008; Zhang et al., 2015; Hong et al., 2016). In Experiment 2, when the spatial extent of attention was fixed, attending to another stimulus at the same location decreased object category representation only in higher ventral and dorsal regions, but not in early visual areas. Thus, processing in early visual areas seems to be largely space-based such that all information at the attended locations is automatically encoded regardless of its task relevance and how features are conjoined. Dorsal regions and higher ventral regions, on the other hand, can select which visual object to encode and partially filter out information from an unattended object at the same location. This was confirmed in Experiment 3 when attention was directed to another feature of the same object. Here, diverting attention away from the object shape did not affect object category representation in most regions examined. This is consistent with the characteristics of object-based encoding effects reported in both prior behavior and imaging studies (e.g., Duncan, 1984; Luck and Vogel, 1997; O'Craven et al., 1999).
Overall, our results showed that, whereas ventral regions encode visual information in a more invariant manner, with goal-directed visual processing playing a relatively minor role, dorsal regions adaptively combine object and task information to support goal-directed visual information representation. Thus, whereas ventral regions are more concerned with “what an object is,” dorsal regions care more about “what we do with it.” These results likely illustrate a fundamental difference in the roles the two visual processing pathways play in visual information representation.
Footnotes
This work was supported by National Institutes of Health Grant 1R01EY022355 to Y.X. We thank Katherine Bettencourt for assistance in localizing the parietal topographic maps and Michael Cohen for some of the images used in the experiment.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Maryam Vaziri-Pashkam or Yaoda Xu, Vision Sciences Laboratory, Department of Psychology, Harvard University, 33 Kirkland Street, Cambridge, MA 02138. mvaziri@fas.harvard.edu or xucogneuro@gmail.com