Our visual experience is generally not of isolated objects, but of scenes, where multiple objects are interacting. Such interactions (e.g., a watering can positioned to pour water toward a plant) have been shown to facilitate object identification compared with when the objects are depicted as not interacting (e.g., a watering can positioned away from the plant) (Green and Hummel, 2004, 2006). What is the neural basis for this advantage? Recent fMRI studies have identified the lateral occipital cortex (LO) as a potential neural origin of this behavioral benefit, as LO showed greater responses to object pairs depicted as interacting than to pairs depicted as not interacting (Kim and Biederman, 2010; Roberts and Humphreys, 2010). However, it is possible that LO was modulated by an attention-sensitive region, the intraparietal sulcus (IPS), which sometimes showed a pattern of responses similar to that of LO in the Kim and Biederman (2010) investigation. To test this hypothesis, we delivered transcranial magnetic stimulation (TMS) to human subjects' LO and IPS while they detected a target object that was or was not interacting with another object to form a scene. TMS delivered to LO but not IPS abolished the facilitation in identifying interacting objects compared with noninteracting depictions observed in the absence of TMS, suggesting that it is LO and not IPS that is critical for the coding of object interactions.
Objects in our environment typically appear in scenes, where they tend to be interacting with other objects. These relations are automatically processed and have been shown to affect the perceptibility of the objects themselves (Biederman et al., 1974, 1982; Green and Hummel, 2006). For example, Green and Hummel (2006) showed that object recognition is facilitated when a pair of objects is depicted as interacting (e.g., a pitcher positioned to be pouring into a glass) to form a scene compared with when the objects are not interacting (e.g., the pitcher positioned to be pouring away from the glass). This behavioral benefit, which we term the scene-facilitation effect, has been shown in other behavioral tasks including visual search (Green and Hummel, 2004) and cued recall (Epstein et al., 1960).
Recent fMRI studies have shown that pairs of objects depicted as interacting elicit greater activity than noninteracting pairs in the lateral occipital cortex (LO) (Kim and Biederman, 2010; Roberts and Humphreys, 2010), a region critical for shape-based object recognition (Malach et al., 1995; James et al., 2003), making LO a candidate locus for the origin of the scene-facilitation effect. However, the intraparietal sulcus (IPS) also showed, although less consistently, a pattern of responses similar to that of LO (Kim and Biederman, 2010), leaving open the possibility that activity in LO is dependent on IPS (or vice versa).
IPS has been shown to be sensitive to visual attentional demands (Wojciulik and Kanwisher, 1999; Kanwisher and Wojciulik, 2000), and this activity can, in turn, modulate activity in ventral visual areas (Kastner et al., 1999; Martínez et al., 1999). That IPS activity is specific to visual processing and not driven by any effortful task has been shown by Wojciulik and Kanwisher (1999). An overlapping region in IPS was active across tasks including peripheral shifting, object matching, and a nonspatial conjunction task, but it was not active when the same group of subjects engaged in a language task. To the extent that interacting objects could attract more attention (as such pairs may elicit more interpretation), it is possible that the greater LO activity to interacting than noninteracting objects shown in the previous fMRI studies was dependent on IPS.
To determine whether LO and/or IPS might be critical for processing object interactions, we compared performance in a target detection task when offline theta burst repetitive transcranial magnetic stimulation (TMS) was delivered to the right LO (rLO) or the right IPS (rIPS) and when TMS was not delivered. TMS was delivered only to the right hemisphere because lesions to the right hemisphere produce greater deficits in scene processing than lesions to the left (Milner, 1958). TMS administered to a particular region of interest (ROI) can temporarily disrupt normal processing, allowing a test of whether that region is essential for a specific cognitive process. If normal LO or IPS activity is required for producing the scene-facilitation effect, TMS delivered to LO or IPS would be expected to abolish that effect.
Materials and Methods
Twelve subjects (nine men, mean age = 22.8 years, range: 19–28 years; all were right handed and all had normal or corrected-to-normal vision) who were native Chinese speakers from the National Central University participated in the experiment. Participants received monetary compensation, were screened for safety, and gave informed consent in accordance with procedures approved by the local ethics committee.
Stimuli and procedures.
Stimuli were selected from a set of 46 line drawings of individual objects. These were combined, pairwise, to make 23 different two-object interacting scenes. The noninteracting scenes were created by mirror reversing either one or both of the objects in each scene (Fig. 1a). Each object subtended an average of 2° × 2° and the center of each scene was presented 4.5° either to the left or right of central fixation.
Subjects performed the target detection task, modified from that of Green and Hummel (2006), where on each trial, two objects were depicted as either interacting (Inter) or not interacting (No-Inter). Subjects responded with button presses if a target label (written in Chinese characters) matched one of the two simultaneously presented objects (Fig. 1b). Critically, the target label was shown after the object pairs so as to not bias the subjects to look for the target object when the stimuli appeared. The objects were shown either in the left visual field (LVF), contralateral to the TMS site, or the right visual field (RVF), ipsilateral to the stimulation site, chosen randomly with equal probability. The object pairs were followed by a mask, which was created by randomly selecting four objects, rotating them 90°, and dividing each object into an 8 × 8 grid, whose cells were then shuffled and randomly selected for one of the 64 positions. Fifty percent of the trials were target-match trials. Participants were instructed to maintain central fixation and to respond as quickly and as accurately as possible.
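The mask construction described above can be illustrated with a short sketch. This is only one possible reading of the procedure, not the authors' stimulus code: it assumes square images stored as 2D lists whose side length is divisible by 8, it pools the cells of the four rotated objects and draws one cell for each of the 64 positions, and all function names are hypothetical.

```python
import random

GRID = 8  # 8 x 8 grid, i.e., 64 cells per object


def rotate90(img):
    """Rotate a 2D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def split_cells(img, grid=GRID):
    """Cut a square image into grid x grid equally sized cells (row-major order)."""
    size = len(img) // grid
    return [
        [row[c * size:(c + 1) * size] for row in img[r * size:(r + 1) * size]]
        for r in range(grid)
        for c in range(grid)
    ]


def make_mask(objects, rng, n_objects=4, grid=GRID):
    """Build a scrambled mask from n_objects randomly chosen object images."""
    chosen = rng.sample(objects, n_objects)
    # Assumption: cells from all chosen objects are pooled before shuffling.
    pool = [cell for obj in chosen for cell in split_cells(rotate90(obj), grid)]
    rng.shuffle(pool)
    cells = pool[:grid * grid]  # one cell for each of the 64 positions
    size = len(cells[0])
    mask = []
    for r in range(grid):
        for y in range(size):
            row = []
            for c in range(grid):
                row.extend(cells[r * grid + c][y])
            mask.append(row)
    return mask


# Toy data (hypothetical): six 16 x 16 "objects", each filled with its own index.
objects = [[[k] * 16 for _ in range(16)] for k in range(6)]
mask = make_mask(objects, random.Random(0))
print(len(mask), len(mask[0]))  # 16 16
```

With real line drawings, the same steps would operate on pixel arrays instead of the toy index-filled images used here.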
Each participant completed three sessions (No-TMS, TMS to LO, and TMS to IPS) across 2 different days. The TMS sessions were separated by 1 d and the order of the sessions was counterbalanced across subjects. During each session, participants completed two blocks, each consisting of 368 trials, lasting ∼12 min each. Before the first session, participants were given practice trials with 16 new objects not included in the main experiment.
To confirm that target detection in our experimental setup would be more accurate when object pairs are shown as interacting compared with noninteracting, as had been reported by Green and Hummel (2006), we ran a preliminary study with 14 subjects that replicated their results with higher d′ values for interacting (3.2) than noninteracting (2.9) depictions, t(13) = 3.61, p = 0.003.
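For readers unfamiliar with the sensitivity measure, the following sketch shows the standard signal detection computation, d′ = z(hit rate) − z(false alarm rate), and the difference score between interacting and noninteracting depictions. It is a minimal illustration, not the authors' analysis code: the log-linear correction for extreme rates is an assumption that is not stated in the text, and the function names are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate)."""
    # Assumption (not from the source): log-linear correction, adding 0.5 to
    # each count, so that rates of exactly 0 or 1 do not give infinite z scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def scene_facilitation(d_inter, d_no_inter):
    """Difference in d' between interacting and noninteracting depictions."""
    return d_inter - d_no_inter


# Pilot means reported above: 3.2 (interacting) and 2.9 (noninteracting).
print(round(scene_facilitation(3.2, 2.9), 2))  # 0.3
```

A positive difference indicates better detection for interacting pairs, which is the direction of the benefit reported by Green and Hummel (2006).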
fMRI parameters and ROI localization.
Functional and anatomical MRI scans were performed for each subject to determine coordinates for the TMS sites using a Siemens MAGNETOM Trio 3T scanner with a 12-channel coil at the MRI Research Center, National Yang-Ming University. One anatomical (T1-weighted scan with the MPRAGE sequence with TR = 1950 ms, TE = 2.26 ms, 1 × 1 × 1 mm voxels, and 160 sagittal slices) and two functional scans (using a T2*-weighted echo planar sequence with TR = 2000 ms, TE = 30 ms, FOV = 192, flip angle = 90°, voxel size = 3 × 3 × 3 mm, and 33 transversal slices) were run for each subject.
Functional localizer runs were composed of sixteen 12 s blocks that alternated among intact objects, places, faces, and scrambled images. Each image subtended a visual angle of 6° × 6° and was presented centrally. Subjects were asked to passively view the stimuli. For each subject, rLO (Fig. 2a) was defined by the contrast of objects minus scrambled images, with a t-map threshold of p < 0.05 (Bonferroni corrected), in the dorsal–caudal part of the right occipitotemporal region. Right IPS (Fig. 2b) was similarly defined as those voxels showing the same contrast but in the intraparietal region.
This method of defining IPS was similar to that used by Xu and colleagues (Xu and Chun, 2006; Xu, 2008). By defining IPS this way, we ensured that the targeted region was the part of the parietal cortex that is specifically sensitive to object processing. Indeed, almost identical IPS loci have been identified as being involved in different kinds of visual attention tasks (Wojciulik and Kanwisher, 1999). Given the spatial resolution of TMS, it is highly likely that the region of IPS that we were stimulating overlapped to a large extent with the region implicated in visual attention. Although we did not use an attention task to functionally localize IPS, it is likely that merely viewing the intact objects would engage visual attention to a greater extent than viewing scrambled meaningless images because it is impossible to refrain from identifying meaningful objects. This point was documented by Smith and Magee (1980), who showed that when required to classify a word, “shirt,” for example, as an article of clothing, subjects evidenced marked Stroop-like interference when the word was presented against a picture from another category (e.g., a sofa).
The fMRI scans were preprocessed and analyzed using BrainVoyager (Brain Innovation). All functional images were coregistered to each individual subject's anatomical scan. The anatomical scans were transformed into Talairach coordinates (Talairach and Tournoux, 1988), on which the statistical contrasts were performed to define the ROIs. Peak activation coordinates for rLO and rIPS were transformed back into each individual subject's native space using FSL software (FMRIB). Before each TMS session, each participant's TMS ROI site was coregistered to the anatomical MRI scan using the Brainsight system (Rogue Research) and the Polaris infrared tracking system (Northern Digital).
The mean peak Talairach coordinates across subjects for rLO (39.8, −71.5, −6.5) and rIPS (26.8, −76.9, 29.8) were comparable to previously reported coordinates (Xu and Chun, 2006; Kim et al., 2009; Kim and Biederman, 2010).
TMS and data analysis.
A Magstim Super Rapid Stimulator was used to deliver TMS pulses to LO and IPS using a figure-of-eight coil with a diameter of 70 mm. The theta burst stimulation protocol was the same as that used in the study by Vallesi et al. (2007), with the following parameters: three pulses given at 50 Hz every 200 ms (at 0 ms, 20 ms, and 40 ms followed by 160 ms of rest) for 20 s. This resulted in a total of 300 pulses per session, with the effect of TMS expected to last ∼20–30 min (Huang et al., 2005; Nyffeler et al., 2006). A single threshold of 40% of the maximal stimulator output (2 tesla) was used based on past studies showing reliable TMS effects across a wide range of tasks (Liu et al., 2010; Chao et al., 2011) and because motor cortex excitability does not provide a good guide to TMS thresholds in other cortical regions (Stewart et al., 2001). The coil handle was placed at each ROI pointed upward and parallel to the midsagittal plane.
A repeated measures 3 × 2 × 2 ANOVA with the factors TMS site (No-TMS, LO, and IPS), visual field (LVF vs RVF), and relation (Inter vs No-Inter) was run on the d′ and reaction time (RT) data.
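The pulse count follows directly from the stated parameters, as the following sketch illustrates. It uses only the values given in the text (three pulses at 50 Hz, one burst every 200 ms, 20 s of stimulation); the function name and its arguments are hypothetical, and the code does not describe the stimulator's actual control software.

```python
def theta_burst_times_ms(duration_s=20.0, burst_period_ms=200,
                         pulses_per_burst=3, pulse_rate_hz=50):
    """Return pulse onset times (ms) for a continuous theta burst train."""
    inter_pulse_ms = 1000 // pulse_rate_hz                 # 50 Hz -> 20 ms apart
    n_bursts = int(duration_s * 1000) // burst_period_ms   # 20 s / 200 ms -> 100
    return [
        burst * burst_period_ms + pulse * inter_pulse_ms
        for burst in range(n_bursts)
        for pulse in range(pulses_per_burst)
    ]


times = theta_burst_times_ms()
print(times[:6])   # [0, 20, 40, 200, 220, 240]
print(len(times))  # 300
```

One hundred bursts of three pulses each give the 300 pulses per session reported above.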
Results
There was no overall difference in performance on the target detection task across the different stimulation conditions, F(2,10) = 0.74, p = 0.5. Consistent with previous findings (Green and Hummel, 2006, and the pilot results), subjects were more accurate (higher d′ values) when object pairs were depicted as interacting than noninteracting, showing a positive scene-facilitation effect, F(1,11) = 5.86, p = 0.03 (Fig. 3). Across both visual fields, the accuracy of the interacting and noninteracting conditions depended on the TMS site, producing a two-way interaction of TMS site and relation, F(2,10) = 11.94, p = 0.002. The amount of the scene-facilitation effect varied across TMS site and visual field, producing a significant three-way interaction, F(2,10) = 4.57, p = 0.04. The interaction of TMS site and visual field was not significant, F(2,10) = 0.20, p = 0.82.
A within-subjects contrast analysis revealed that the magnitude of the scene-facilitation effect was significantly reduced when TMS was delivered to LO (Δd′ = −0.18) compared with the No-TMS (Δd′ = 0.23) condition, F(1,11) = 14.15, p < 0.01. The scene-facilitation effect did not differ between No-TMS and IPS sessions (Δd′ = 0.25), F(1,11) < 1.00. Whereas the magnitude of the scene-facilitation effects did not differ across visual fields in the No-TMS session (Δd′ = 0.25 for LVF and Δd′ = 0.22 for RVF), it was significantly lower for the LVF trials (Δd′ = −0.41) than for the RVF trials (Δd′ = 0.04), when TMS was delivered to LO, F(1,11) = 9.60, p = 0.01. The comparison between the magnitudes of the differences of the scene-facilitation effects across visual fields for IPS (Δd′ = 0.23 for LVF and Δd′ = 0.26 for RVF) and No-TMS was not significant, F(1,11) < 1.00.
Critically, the positive scene-facilitation effect observed in the baseline condition was maintained when TMS was delivered to IPS but was completely lost when TMS was delivered to LO.
Although the mean accuracy after TMS was delivered to IPS was higher than that of the No-TMS condition, these differences were not close to reaching significance and thus likely reflect noise. The statistical comparison for the LVF trials between the Inter No-TMS and Inter IPS conditions was t(11) < 1.00, p = 0.46; that between the LVF trials for the No-Inter No-TMS and No-Inter IPS conditions was t(11) < 1.00, p = 0.41. Because the No-TMS, LO, and IPS conditions were divided into three sessions across 2 different days, effects of practice and sessions likely contributed to the variability of performance across these conditions but not to the Inter and No-Inter comparisons, which were run in the same session.
The differences in RTs across conditions were very small, but subjects were reliably faster for the interacting (859 ms) than the noninteracting (864 ms) conditions, F(1,11) = 9.97, p = 0.009. RT was also marginally faster for the LVF (859 ms) than RVF trials (865 ms), F(1,11) = 4.61, p = 0.06. There were no other reliable effects on RT (0.30 < p values < 0.76).
Discussion
In the absence of TMS, accuracy of the target detection task was greater when the object pairs were shown as interacting than noninteracting. The benefit of interactions, which previously has been shown to depend on the grouping of the two objects into a single percept (Green and Hummel, 2004, 2006; Riddoch et al., 2010), is consistent with various behavioral studies demonstrating that object recognition is subject to contextual influences (Biederman, 1972; Palmer, 1975; Biederman et al., 1982). The application of TMS to LO, but not IPS, abolished the benefit of the interactions, thus demonstrating that LO, and not IPS, is a critical region for the coding of object interactions.
The coding of object relations in LO, and not IPS, is consistent with studies with parietal lobe patients who manifest extinction. Whereas these patients show failure to report one of two simultaneously presented objects, they show significant recovery of extinction when the two objects are depicted as interacting with each other (Riddoch et al., 2003). The recovery from extinction depends on the functional relationship between the objects (e.g., a pencil positioned toward a ruler to draw a line) and not the semantic associations or the statistical regularities in which they tend to co-occur in real life (e.g., a pencil and a pen). It is possible that an intact LO allows organization of the two objects into one perceptual unit by encoding the spatial relations between the two objects.
The direction of the scene-facilitation effect for the contralateral trials after TMS was delivered to LO was opposite (negative) to that of the No-TMS condition, meaning that not only was the facilitation from the interactions lost but there was a decrement of performance when object pairs were shown as interacting compared with when they were depicted as noninteracting. One, admittedly speculative, account of this effect is that the relations in the Inter condition presented familiar patterns between objects that invited interpretations that could be readily achieved in the absence of TMS. These interpretations could have facilitated object identification. With TMS delivered to LO, however, subjects might have experienced greater difficulty in achieving such interpretations with a resultant cost in their capacity for identifying the individual objects. Such interpretive difficulty is suggested by the finding by Milner (1958) that right temporal lesions reduced the ability of subjects to detect scene anomalies.
TMS delivered to LO had a larger effect on scenes presented to the contralateral than the ipsilateral visual field, an effect consistent with the greater representation of objects presented in the contralateral than ipsilateral visual fields in that area (Grill-Spector et al., 1998; Tootell et al., 1998).
IPS is generally considered to be part of an attentional network that is selectively sensitive to shape responses (Wojciulik and Kanwisher, 1999; Denys et al., 2004), which can selectively modulate responses in ventral visual areas associated with enhanced processing (Kastner et al., 1999; Martínez et al., 1999). TMS delivered to IPS did not change the behavioral benefit of coding of object interactions. Whatever general attentional effect IPS has on LO, it does not seem to affect the coding of interactions between objects.
Activity in IPS has also been implicated in visuomotor responses to static objects that have implied action (Sakata et al., 1995; Grèzes and Decety, 2002). To the extent that interacting objects afford more effective action than noninteracting objects, action affordances could be the source of the scene-facilitation effect. Our results suggest that should action affordance be the source, it is not driven by IPS activity. Because IPS is likely a subregion of a potentially large number of areas involved in processing visual action affordances (Riddoch et al., 2003; Culham and Valyear, 2006), future investigation of other regions is warranted to assess their possible roles in the processing of action relatedness.
LO is the earliest region in the ventral pathway where intact shapes are distinguished from their scrambled counterparts (Malach et al., 1995), and it is critical for shape-based object recognition (James et al., 2003). TMS delivered to LO previously has been shown to disrupt processes associated with recognition of individual objects as assessed by object naming and identity matching (Chouinard et al., 2009; Pitcher et al., 2009), but little is known about its functioning with multiple objects. Whereas the scene-facilitation effect was invariant to TMS when delivered to IPS, TMS delivered to LO completely abolished the benefit of scene interactions. This study thus provides strong evidence for LO's critical role in the processing of such relations, consistent with fMRI studies showing sensitivity to object interactions in LO (Kim and Biederman, 2010; Roberts and Humphreys, 2010). The coding of interobject relations is thus not relegated to an attentional cortical region, such as IPS, but occurs at the same cortical locus and likely is simultaneous with the processing of object shape.
This work was supported by National Science Foundation (NSF) Grant 10-15645 to J.G.K.; NSF Grants 04-20794, 05-31177, and 06-17699 to I.B.; and National Science Council (NSC) Grants 97-2511-S-008-005-MY3, 99-2410-H-008-022-MY3, and 98-2517-S-004-001-MY3 to C.-H.J. C.-H.J. was supported by the NSC, Taiwan Grant 98-2918-I-008-011, and the Fulbright scholarship. We thank Neil Muggleton for his helpful discussion about the work and Jiaxin Yu for his assistance throughout the data collection process.
The authors declare no competing financial interests.
Correspondence should be addressed to any of the following: Jiye G. Kim, Department of Psychology, University of Southern California, 3620 McClintock Avenue, Los Angeles, CA 90089; Dr. Irving Biederman, Neuroscience Program, University of Southern California, 3641 Watt Way, Los Angeles, CA 90089; or Dr. Chi-Hung Juan, Institute of Cognitive Neuroscience, National Central University, Jhongli 320, Taiwan.