Abstract
Autistic traits span a wide spectrum of behavioral departures from typical function. Despite the heterogeneous nature of autism spectrum disorder (ASD), there have been attempts at formulating unified theoretical accounts of the associated impairments in social cognition. A class of prominent theories capitalizes on the link between social interaction and visual perception: effective interaction with others often relies on discrimination of subtle nonverbal cues. It has been proposed that individuals with ASD may rely on poorer perceptual representations of other people's actions as returned by dysfunctional visual circuitry and that this, in turn, may lead to less effective interpretation of those actions for social behavior. It remains unclear whether such perceptual deficits exist in ASD: the evidence currently available is limited to specific aspects of action recognition, and the reported deficits are often attributable to cognitive factors that may not be strictly visual (e.g., attention). We present results from an exhaustive set of measurements spanning the entire action processing hierarchy, from motion detection to action interpretation, designed to factor out effects that are not selectively relevant to this function. Our results demonstrate that the ASD perceptual system returns functionally intact signals for interpreting other people's actions adequately; these signals can be accessed effectively when autistic individuals are prompted and motivated to do so under controlled conditions. However, they may fail to exploit them adequately during real-life social interactions.
Introduction
Autism spectrum disorder (ASD) is diagnosed exclusively on a behavioral basis and is associated with impaired skills for social interaction (Lord et al., 2000). Current theoretical accounts hypothesize that it may derive from poor perceptual recognition or interpretation of other people's actions (Simmons et al., 2009). Previous experimental research on this question has focused on sensitivity to detection of biological motion (BM) within point-light displays but has yielded conflicting results (Blake et al., 2003; Hubert et al., 2007; Atkinson, 2009; Murphy et al., 2009; Koldewyn et al., 2010; Saygin et al., 2010; Jones et al., 2011; Nackaerts et al., 2012; Rutherford and Troje, 2012). There are several possible causes for these apparent inconsistencies in the literature.
First, inadequate experimental controls mean that group differences not specific to either ASD or the capacity for motion processing may generate effects. For example, impairments affecting any stage of visual processing before that concerned with the detection of BM may affect action processing (Neri et al., 2007). Similarly, some experimental tasks place high demands on attention, working memory, and decision-making capacity; these could feasibly be affected by impairments of executive function in ASD (Hill, 2004).
Second, only a specific aspect of BM perception might be disrupted in autism, making detection of group differences task dependent. One hypothesis is that BM perception relies on a capacity for perception of the gestalt so that, although perception of whole figures is disrupted, detection of individual joint movement is intact (Happé and Frith, 2006; Mottron et al., 2006; Simmons et al., 2009). Alternatively, knowledge of action could enhance perception via feedback, and this mechanism could be impaired in autism (Klin et al., 2003). A third hypothesis is that the temporal patterns of motion that lend moving objects a sense of animacy (e.g., the Heider and Simmel tasks, 1936; Viviani and Stucchi, 1992) are critical to BM perception, and processing of these patterns is impaired in autism (Castelli et al., 2000, 2002; Rutherford et al., 2006).
In this study, by comparing typically developing (TD) and ASD adolescents with normal intelligence quotient (IQ), we sought to rectify these limitations in two ways. First, we controlled for nonspecific effects by including an inversion condition (Pavlova and Sokolov, 2000; Neri et al., 2007). A marked effect of inversion is one of the longest established features of BM perception from point-light displays (Sumi, 1984; Troje and Westhoff, 2006). Therefore, any deficit in BM perception should affect detection in an upright stimulus more than in an inverted stimulus. Second, we performed a comprehensive set of experimental manipulations spanning the action processing hierarchy, with each experiment focusing on a specific cognitive function required for the detection of BM. This program was deployed in a consistent, cross-checked manner, adopting a common set of tools, measurements, and logic across the board. Overall, our behavioral results showed a nonsignificant trend toward impaired performance in ASD, but performance was remarkably similar between groups after factoring out any nonspecific effects with an inverted control condition.
Materials and Methods
We settled on six experiments, each designed to test for a deficit of a specific aspect of action perception in autism. All experiments used point-light displays and a binary choice design. First, we probed the basic capacity to differentiate between BM and non-BM (Experiment 1, see below). Next, we sought to measure the following: the capacity to discriminate robotic from natural motion of local joint movements (Experiment 2); the capacity to discriminate one form of action from another (Experiment 3); the two-stage hierarchical integration of local information (limbs) into full body agents (Experiment 4); the higher-level capacity to distinguish two agents who act in temporal synchrony from two who do not (Experiment 5); and generic attention to BM signals (Experiment 6).
Stimulus
Point-light action sequences depicted ∼20 s of fighting or dancing at a sampling rate of 60 Hz; each sequence tracked 26 joint trajectories (13 per agent: head, shoulders, elbows, wrists, hips, knees, and feet). Details of how these sequences were acquired are available from previous publications (Neri et al., 2006, 2007; Luu and Levi, 2013).
Participant data
The research was ethically approved by the North of Scotland Research Ethics Committee. Participants were included if they had an IQ >75 and no known visual impairment after correction with refractive lenses. Participants were adolescent males (mean ± SD age: ASD, 16.09 ± 2.24 years; TD, 15.54 ± 2.15 years; see Fig. 1B).
IQ was assessed with the Wechsler Abbreviated Scale of Intelligence (Wechsler, 1999) and was in the normal range for all individuals (mean ± SD: ASD, 103.14 ± 11.59; TD, 104.79 ± 9.14; for individual IQ results, see Fig. 1A).
All ASD participants had an existing clinical diagnosis of ASD and were recruited at dedicated units within schools that specifically catered to ASD (Bölte et al., 2008). The existing diagnosis was verified by Autism Diagnostic Interview (Revised; Lord et al., 2000) with severity at time of testing indexed by total score on the social responsiveness scale (SRS; Constantino and Gruber, 2005). Scores showed no overlap between groups (mean ± SD: ASD, 107.95 ± 26.56; TD, 13.79 ± 9.82; Fig. 1).
We recruited 26 ASD participants and 22 TD participants in total. It was not practically feasible to recruit every participant for every task given the minimum amount of testing time required from each participant and the constraints associated with the maximum temporal window available for data collection in any given session. Instead, we sampled from the group we had available at the mutual convenience of researchers and participants. In Experiments 1 and 5, there were 18 participants for each group. In Experiments 2–4 and 6, 15 participants from each group took part.
Experimental setup
Participants sat in front of a laptop with a 13.1 inch LCD screen (resolution, 1024 × 840 pixels; refresh rate, 60 Hz); viewing distance was loosely controlled between 80 and 120 cm (no strict viewing distance was enforced, but participants were instructed to remain seated in front of the display in upright posture and were monitored continuously to verify they did so). We ensured that the experiment took place in an environment that was both suitable for undertaking visual psychophysical experiments (quiet, moderate lighting, no distraction) and comfortable/familiar for the participant (unfamiliar environments may affect performance in autistic populations).
General methodology
We now describe protocol details that applied to most experiments and later highlight relevant departures. Task structure conformed to the two-alternative forced-choice (2AFC) design or one-interval variants with symmetric binary choice (Green and Swets, 1966). Observers saw two intervals on each trial, presented in random order and separated by a 0.5 s gap. The “target” interval showed a 1.5 s segment selected randomly from the original fighting sequence (see the example in Fig. 2A), whereas the “nontarget” interval showed a scrambled version of another segment from the same sequence (see the example in Fig. 2C). Participants were asked to indicate the target interval by pressing one of two keys. Each experiment consisted of two sessions of 150 trials per participant.
Joint trajectories were sampled by 12 dots (size, ∼3 mm) with a limited lifetime of 150 ms (Neri et al., 1998); half the dots were bright (100% contrast), and half were dark on a gray background (luminance, ∼30 cd/m2). The fighting scene spanned ∼20 × 13 cm (width × height). Size/luminance details are approximate because it was often necessary to test observers in variable environments where they felt most comfortable (see above).
Outcome variables
The primary outcome variable for the first three experiments was that of noise tolerance (Neri et al., 1998). Intervals of action sequences were masked by noise dots (each created by randomly sampling frames from a joint from the original action sequence and plotting it on a random location on the screen). The number of noise dots was varied in linear steps (Fig. 3A–D) to derive full psychometric curves (Fig. 3E–G). In the second set of experiments, designed to investigate reliance on global versus local features (Experiment 4) and sensitivity to interaction between agents (Experiment 5), we used scrambling thresholds rather than noise dots (Neri et al., 2006). Joint trajectories in the nontarget sequence were shifted randomly in time either on a limb-by-limb basis (Fig. 2E,F) or between agents (Fig. 2G,H), and the amount of phase scrambling was varied. In the final experiment (6), designed to probe attention, the outcome variable was duration of contrast change.
Threshold estimation
Our goal was to extend our measurements to a wide class of stimuli and manipulations. The potential challenges associated with an experimental program of this kind are illustrated by the psychometric curves in Figure 3E–G. In view of the large numbers of trials required and the consequently high demands placed on participants, characterization of full psychometric curves has rarely been attempted before with ASD participants (Koldewyn et al., 2010). We found threshold measurements to be occasionally comparable with those obtained in TD participants [Fig. 3, compare E (TD) with G], but more often ASD participants generated noisier data (example in Fig. 3F) despite their IQ being within normal range (Fig. 1A). The parameters we finally adopted were the result of extensive piloting to maximize the robustness of our procedures. Thresholds were estimated by averaging the noise intensity values associated with a performance range between 60% and 90% of correct responses (Baldassi et al., 2006). This procedure allowed us to estimate thresholds from data that were too noisy to support robust fitting. Effects of conditions were tested for within groups using paired t tests. Group differences were tested with an unpaired t test comparing the log ratio of upright/inverted thresholds across participants.
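The threshold and group-comparison procedure described above can be illustrated with a minimal sketch. It assumes that a threshold is the plain mean of the tested noise levels whose proportion correct falls between 60% and 90%, and that the unpaired t test uses a pooled variance; neither detail is stated in the text. The function names and the data are hypothetical.

```python
import math


def estimate_threshold(levels, proportion_correct, lo=0.60, hi=0.90):
    """Mean of the tested levels whose proportion correct lies in [lo, hi].

    Returns None when no tested level falls inside the range.
    """
    in_range = [
        level
        for level, pc in zip(levels, proportion_correct)
        if lo <= pc <= hi
    ]
    if not in_range:
        return None
    return sum(in_range) / len(in_range)


def log_ratio(upright_threshold, inverted_threshold):
    """Base-10 log of the upright/inverted threshold ratio for one participant."""
    return math.log10(upright_threshold / inverted_threshold)


def unpaired_t(a, b):
    """Student's t statistic and degrees of freedom (pooled variance: an assumption)."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2


# Hypothetical single-participant data: noise-dot counts and proportion correct.
noise_dots = [0, 10, 20, 30, 40, 50]
upright_pc = [0.98, 0.95, 0.88, 0.76, 0.62, 0.51]
inverted_pc = [0.93, 0.80, 0.64, 0.55, 0.52, 0.49]

upright_thr = estimate_threshold(noise_dots, upright_pc)    # levels 20, 30, 40
inverted_thr = estimate_threshold(noise_dots, inverted_pc)  # levels 10, 20
inversion_effect = log_ratio(upright_thr, inverted_thr)
```

In a group comparison, one `inversion_effect` value per participant would be collected for each group and the two lists passed to `unpaired_t`.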
Individual experiments
Experiment 1: biological motion detection.
Participants were asked to discriminate between a BM sequence (target) and a randomized motion sequence derived from the original sequence (nontarget). The target sequence was a randomly selected 1.5 s clip from the ∼20 s original sequence (Fig. 2A,B). The nontarget sequence (also 1.5 s duration) was generated by selecting each joint randomly from a different time point in the original sequence, such that animate motion dynamics were maintained but coherence was lost (Fig. 2C,D). There were two experimental conditions (mixed within blocks): upright and inverted. On inverted trials, both target and nontarget stimuli were flipped upside-down.
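As an illustration of the stimulus construction just described, the following sketch builds a target clip, a temporally scrambled nontarget clip, and an inverted version. It is not the original stimulus code: the data layout (a list of frames, each a list of (x, y) joint positions), the function names, the choice of distinct start frames per joint, and the toy sequence are all assumptions.

```python
import random

FRAME_RATE_HZ = 60
CLIP_FRAMES = int(1.5 * FRAME_RATE_HZ)  # a 1.5 s clip is 90 frames


def target_clip(sequence, start, n_frames=CLIP_FRAMES):
    """Intact clip: all joints are read from the same time window."""
    return [list(frame) for frame in sequence[start:start + n_frames]]


def scrambled_clip(sequence, rng, n_frames=CLIP_FRAMES):
    """Nontarget clip: each joint is read from its own time window.

    Every joint keeps its natural dynamics, but joints no longer move
    coherently with one another.
    """
    n_joints = len(sequence[0])
    last_start = len(sequence) - n_frames
    # Distinct start frames, one per joint (an assumption about "different").
    starts = rng.sample(range(last_start + 1), n_joints)
    clip = [
        [sequence[starts[j] + t][j] for j in range(n_joints)]
        for t in range(n_frames)
    ]
    return clip, starts


def invert(clip, screen_height):
    """Flip a clip upside-down by mirroring y about the horizontal midline."""
    return [[(x, screen_height - y) for (x, y) in frame] for frame in clip]


# Toy sequence standing in for ~20 s of motion capture with 26 joints.
sequence = [
    [(float(j), float(t % 100)) for j in range(26)]
    for t in range(20 * FRAME_RATE_HZ)
]
rng = random.Random(0)
target = target_clip(sequence, start=300)
nontarget, starts = scrambled_clip(sequence, rng)
inverted_target = invert(target, screen_height=100.0)
```

On an inverted trial, `invert` would be applied to both the target and the nontarget clip.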
Experiment 2: original versus robotic motion.
This experiment was almost identical to Experiment 1, except the inverted condition was replaced by a “robotic” condition: the motion of each joint was undersampled and linearly interpolated, thus removing the animate characteristic of motion trajectories seen in BM. Consequently, dots moved in straight lines at constant speeds (Fig. 4B). We then corrected for low-level motion cues (linear interpolation “slows” the speed of individual joints as they take a more direct route) by matching the average joint velocity to the original sequence.
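The undersample-and-interpolate manipulation can be sketched as follows. This is a minimal illustration under assumptions: the undersampling step, the function names, and the toy trajectory are hypothetical, and the velocity-matching correction is not reproduced because the text does not specify how it was applied. The sketch only shows why linear interpolation lowers mean speed on a curved path.

```python
import math


def robotic_trajectory(points, step):
    """Keep every `step`-th sample and linearly interpolate between kept samples.

    The final sample is always kept so that the trajectory ends where the
    original does.
    """
    n = len(points)
    out = []
    for i in range(n):
        k0 = (i // step) * step          # previous kept sample
        k1 = min(k0 + step, n - 1)       # next kept sample
        if k1 == k0:
            out.append(points[k0])
            continue
        w = (i - k0) / (k1 - k0)
        (x0, y0), (x1, y1) = points[k0], points[k1]
        out.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return out


def mean_speed(points, frame_rate_hz=60):
    """Path length divided by duration, in position units per second."""
    length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return length * frame_rate_hz / (len(points) - 1)


# Toy curved trajectory for a single joint (a parabola).
original = [(float(t), float((t - 4) ** 2)) for t in range(9)]
robotic = robotic_trajectory(original, step=4)

original_speed = mean_speed(original)
robotic_speed = mean_speed(robotic)  # lower: straight lines are shorter than the curve
```

The ratio `original_speed / robotic_speed` indicates how much the robotic version would need to be sped up to match the original on average.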
Experiment 3: action discrimination.
We asked participants to perform explicit discrimination between a fighting and a dancing action (Fig. 5A,B). In this experiment, we departed from the 2AFC methodology by only presenting one 2.5 s sequence per trial (randomly selected between fighting and dancing). We corrected for the slightly slower motion cues in the dancing sequence by matching the average velocity between the two sequences. Participants were asked to indicate whether the action type of the presented sequence was fighting or dancing. There were upright and inverted conditions, occurring exactly as described in Experiment 1.
Experiment 4: limb scrambling.
To examine the possibility that a capacity to detect a coherent whole might lend controls an advantage in detecting BM, Experiment 4 retained the BM dynamics of individual joint movements but removed coherence by temporally dephasing the limbs (Fig. 2E,F). This manipulation was achieved by assigning to each limb a unique starting point with respect to the original sequence (Fig. 2, compare B with F). Participants were asked to select the target sequence, in which limbs were intact, as opposed to the nontarget sequence, in which the limbs were scrambled to varying degrees (Neri, 2009). Stimulus duration was 2 s.
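A minimal sketch of temporal dephasing by limb is given below. It assumes that the degree of scrambling is set by a maximum shift in frames and that each limb receives a distinct random offset; the text specifies neither. The joint indices, limb grouping, and function name are hypothetical.

```python
import random


def dephase_limbs(sequence, limbs, start, n_frames, max_shift, rng):
    """Read each limb from its own start frame while keeping joints within a limb in step.

    `limbs` is a list of joint-index groups. Joints not listed in any group
    keep the common start frame. With max_shift == 0 the clip is intact.
    """
    n_joints = len(sequence[0])
    if max_shift == 0:
        offsets = [0] * len(limbs)
    else:
        # Distinct offsets so that every limb gets a unique starting point.
        offsets = rng.sample(range(-max_shift, max_shift + 1), len(limbs))
    joint_start = [start] * n_joints
    for limb, offset in zip(limbs, offsets):
        for j in limb:
            joint_start[j] = start + offset
    clip = [
        [sequence[joint_start[j] + t][j] for j in range(n_joints)]
        for t in range(n_frames)
    ]
    return clip, offsets


# Toy sequence for one agent with 13 joints; index 0 stands for the head.
sequence = [[(float(j), float(t)) for j in range(13)] for t in range(1200)]
LIMBS = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]  # hypothetical grouping

rng = random.Random(1)
scrambled, offsets = dephase_limbs(
    sequence, LIMBS, start=300, n_frames=120, max_shift=60, rng=rng
)
intact, _ = dephase_limbs(
    sequence, LIMBS, start=300, n_frames=120, max_shift=0, rng=rng
)
```

Because the grouping is just a list of joint indices, the same function could shift any other set of joints as a unit.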
Experiment 5: agent scrambling.
The two agents in our sequences interact in a meaningful way through either dancing or fighting, and action interpretation of one agent enhances sensitivity to the action pattern associated with the other agent (Neri et al., 2006). In the same way that point lights within an individual generate a percept of coherent motion as a result of being commonly related to a single action sequence, so it is with two individuals related to one another by a common activity. If a disruption of the ability to perceive coherence causes impaired BM perception in ASD, then coherence at this higher level should be a highly sensitive measure. However, the above-detailed experiments (1–4) do not probe the ability to detect inter-agent interaction. We designed a manipulation that shifted all joints of one agent forward or backward in time relative to the other agent (Fig. 2G,H), allowing us to vary the degree to which the two agents acted in synchrony with one another. Consequently, the meaningful link between one agent's actions and the other agent's actions (e.g., if one agent punches, the other agent attempts to block the punch) was lost in the scrambled sequence. Participants were asked to identify the synchronized (target) sequence (Fig. 2, A,B vs G,H). Successful discrimination was specifically dependent on detection of inter-agent interaction and could not be achieved by relying on the cues that potentially supported previous tasks because intact body fragments, as well as full agents, were delivered by both target and nontarget sequences (Fig. 2, compare A,B with G,H). In this experiment, we also departed from the general protocol by ensuring that agents were clearly distinct from one another: all joints for one agent were bright (100% contrast), whereas all dots for the other agent were dark. All joints were also continuously displayed for the entire duration of the stimulus (no limited-lifetime sampling). Stimulus duration was 2.5 s.
Experiment 6: generic attention.
Group differences in studies of BM perception in ASD could potentially be generated by differences in attentional capacities. To test for a potential role of generic attentional resources, we briefly reduced the contrast (from 100% to 50%) of three randomly selected target joints on the two agents at a random time point throughout stimulus presentation (see Fig. 7A,B) and asked observers to report whether the target joints were brighter (light gray) or darker (dark gray) than the background. We then varied the time period during which the change was applied and estimated threshold duration for performing this task (see Fig. 7C). The contrast change was well above threshold visibility; therefore, task difficulty was dependent on the capacity for sustained voluntary attention (Corbetta and Shulman, 2002) required to monitor the entire 2.5 s sequence on every trial, so as to not miss the change when it occurs. One interval was presented on each trial. There were no noise dots, and the 16 sampling dots had longer limited lifetime (250 ms).
Results
Results of paired tests and group comparisons are shown in Table 1. Threshold measurements for Experiment 1 are shown in Figure 3H: the ability to discriminate intact versus scrambled BM sequences is lost with fewer masking noise dots when the display is inverted (data points lie above the diagonal equality line), and the magnitude of this effect is similar for both ASD and TD groups (Fig. 3H, filled and open symbols, respectively).
In Experiment 2, we observed no substantial change in noise tolerance thresholds when switching from the original (Fig. 4A) to the robotic (Fig. 4B) stimuli for both ASD and TD populations (Fig. 4C, data points scatter around unity line), indicating that the local motion patterns specifically associated with biological movement are processed similarly by ASD and TD visual systems. In Experiment 3, which required actions (fighting vs dancing) to be discriminated from one another (Fig. 5A vs B), clear inversion effects were similarly detected in both groups (Fig. 5C). The same result was obtained for Experiment 4, in which participants with ASD showed a similar susceptibility to the effects of limb scrambling and the degree to which this was affected by inversion (Fig. 6E). In Experiment 5, ASD and TD groups demonstrated comparable ability to detect inter-agent interaction and a similar degree of impairment with inversion (Fig. 6F). Finally, in Experiment 6, both groups showed similar thresholds for identifying a brightness change applied to a random subset of the joints (Fig. 7C).
Overall comparison of results
Finally, we considered that a subtle deficit of BM perception could exist that was undetected in separate experiments but that may become evident if all results were combined. We investigated this by normalizing thresholds within each experiment and collating overall results. We found a nonsignificant trend toward poorer thresholds for both upright and inverted conditions in the ASD group (t(128) = −1.8844, p = 0.062; Fig. 8A,B), but upright/inverted log ratios were virtually identical (t(128) = −0.2184, p = 0.858; Fig. 8C). The overall drop in sensitivity with inversion we measured across experiments and groups was ∼½ log unit, in close agreement with previous estimates (Neri et al., 2007).
Discussion
We designed a battery of experiments that sought to comprehensively test the hypothesis that the ability to detect BM in autism is impaired. None of our experiments revealed any significant group differences. Rather, we found clear evidence of an inversion effect in several experiments for both groups, which is indicative of intact action perception in ASD. We emphasize that the observed lack of measurable differences between TD and ASD populations is not a consequence of poor resolving power associated with our protocols: it is not that we failed to measure any effect (e.g., deficit) in either TD or ASD populations; to the contrary, we reliably measured inversion effects across several experiments, yet those measured effects were of similar magnitude for TD and ASD participants (Fig. 8C).
When we combined data across our large dataset, we did find a (nonsignificant) trend toward a group difference (Fig. 8A,B, rightward-pointing arrows). Several possibilities might be considered to account for this suggestive result (besides the possibility that it may represent a chance finding). Visual noise theories suggest a more generalized impairment of visual perception in autism deriving from increased neural noise in the visual cortex (Simmons et al., 2009; Dinstein et al., 2010). The absence of group differences in upright tasks argues against this interpretation, although we emphasize that our findings are most specific to the question of action perception. Another possibility is that it stems from differences in executive function between groups (see below). Finally, action processing might only be affected in autism in certain ways, so that specific experiments might be required to measure any resulting deficit. With relation to the latter possibility, we selectively examined three separate functions that might generate specific group differences.
First, we considered the notion that animacy detection might be impaired in ASD: some research has shown abnormal perception of “animate” or life-like kinematics in autism (Rutherford et al., 2006; Cook et al., 2009), whereas other research has suggested that individuals with autism display atypical motor kinematics relative to a TD population (Cook et al., 2013, 2014). We did not measure any effect of joint kinematics in either group, indicating that the dynamics of individual dot movements are not critical to the detection of an overall BM. Therefore, the possibility that life-like kinematics might contribute to group differences is a moot point.
Second, we considered whether the concept of “weak central coherence” (WCC) might be important in BM perception. WCC theory proposes that individuals with autism deploy greater attentional resources to local details as opposed to global details and are impaired at retrieving a coherent whole percept (Plaisted, 2003; Happé and Frith, 2006; Mottron et al., 2006). In Experiment 4, we used a manipulation that disrupted whole coherence while retaining animacy of individual limbs (Neri, 2009). Both groups were equally susceptible to this disruption, and inversion effects were also similar (Fig. 6E). Evidently, a capacity for detecting and using coherence was present in both groups to a similar degree. In Experiment 5, we investigated the capacity to use the information carried by the meaningful interaction between two agents. The associated manipulation probed coherence at an even more global level than the integration of limbs into whole bodies (Neri et al., 2006; Luu and Levi, 2013); therefore, it should be sensitive to relatively small deficits in coherence detection. However, we again found good evidence for intact processing of inter-agent communication signals (Fig. 6F).
Third, we considered that knowledge of action could be a factor. Some theoretical frameworks for understanding autism, such as the “enactive mind approach” (Klin et al., 2003) or mirror neuron theory (Williams et al., 2001; Williams, 2008), propose that perception is tightly linked to action knowledge and associated top-down influences, particularly in relation to developmental processes. Such theories would predict that a capacity for action recognition would enhance action detection. Once again, we found no group differences for recognizing action type, and we measured inversion effects indicative of positive performance in both groups (Fig. 5C). Finally, we looked for attentional differences associated with our stimuli and found no differences in capacity for sustained attention (Fig. 7C).
Together, our experiments provide strong evidence for intact BM perception in autism. Importantly, by investigating different stages of the action-processing hierarchy in a single population and by manipulating a single set of stimuli in several different ways, our experimental program contains several internal controls that add robustness to our conclusions.
Our findings are arguably at odds with the group differences reported for fMRI signals associated with BM perception (Kaiser et al., 2010) and behavioral demonstrations that infants with autism do not attend to action kinematics or show the same preference to action as matched TD infants (Klin et al., 2009). Differences in results between studies highlight important aspects of our findings. We measured the capacity to detect BM under conditions in which attention to the stimuli was maximized, whereas Klin et al. (2009) measured preference for attending to BM stimuli rather than a capacity to detect them. Kaiser et al. (2010) also did not control for attentional effects, and these have been shown to play an important role in generating group differences for other social stimuli, such as faces (Hadjikhani et al., 2007).
The issues discussed above highlight the potential importance of executive function in BM recognition. At the theoretical level, the enactive mind approach (Klin et al., 2003) proposes that the mechanism controlling attention to social stimuli is disrupted in autism rather than a capacity to detect them at the perceptual level. From the practical perspective of experimental design, we planned our study to minimize any effects of differences in motivation or capacity to maintain attention, and our final experiment (Fig. 7) suggests that we achieved our goal. However, it remained a possibility that executive function could still influence our results. We further factored out any residual role for executive function deficits by normalizing our upright-display measurements with corresponding inverted-display measurements. Generalized attentional deficits or limitations associated with executive function (e.g., working memory, decision making) will have equal effect on these two conditions and would cancel out in the upright/inverted comparison. Therefore, the inversion effects we consistently measured across our experimental program reflect genuine changes in perceptual sensitivity for discriminating our BM stimuli. By replicating previously reported effects (Neri et al., 2007), they also demonstrate that our approach is robust and supports accurate psychophysical threshold measurements.
Another important difference is that Klin et al. (2009) report findings in infants, whereas we report on adolescents. This raises a question as to whether the capacity to detect BM might have a developmental aspect to it (Freire et al., 2006) and whether we might have detected group differences had we used a younger population. Evidently, we are unable to answer this question definitively using the results from this study, but we are not aware of any relevant published measurements, and our own estimates of the inversion effect (upright/inverted log ratios) do not correlate significantly (p > 0.05) with age in either population over the (admittedly limited) range we tested (12–19 years). Again, any developmental model would need to disentangle the capacity for BM detection from the development of executive function, which influences experimental task compliance and utilization in higher cognition. Given that a capacity for detecting BM is evident in very young infants (Simion et al., 2008), it would seem likely that executive function would place a bottleneck on the class of threshold measurements used in our study.
Conclusion
Our results demonstrate that individuals with ASD possess intact, functioning neural circuitry for perceptual processing of socially relevant visual signals (Dinstein et al., 2010): when they look at other people under controlled, well-motivated conditions, their perceptual system returns functionally intact signals for interpreting those people's actions adequately. However, it remains the case that individuals with autism may still fail to attend to those signals or may not act on them for the purpose of typical social interaction.
Footnotes
This work was supported by funding from the Medical Research Council and the Royal Society. We thank the participants, the teaching staff from Dyce Academy and Meldrum Academy, and Aberdeen's Autism Awareness Association for their support of this research.
Correspondence should be addressed to James Cusack, Division of Applied Medicine (Psychiatry), University of Aberdeen, Clinical Research Centre, Royal Cornhill Hospital, Aberdeen, AB25 2ZD UK. j.cusack{at}abdn.ac.uk