Abstract
Due to capacity limits on perception, conditions of high perceptual load lead to reduced processing of unattended stimuli (Lavie et al., 2014). Accumulating work demonstrates the effects of visual perceptual load on visual cortex responses, but the effects on auditory processing remain poorly understood. Here we establish the neural mechanisms underlying “inattentional deafness,” the failure to perceive auditory stimuli under high visual perceptual load. Participants performed a visual search task of low (target dissimilar to nontarget items) or high (target similar to nontarget items) load. On a random subset (50%) of trials, irrelevant tones were presented concurrently with the visual stimuli. Brain activity was recorded with magnetoencephalography, and time-locked responses to the visual search array and to the incidental presence of unattended tones were assessed. High, compared to low, perceptual load led to increased early visual evoked responses (within 100 ms from onset). This was accompanied by reduced early (∼100 ms from tone onset) auditory evoked activity in superior temporal sulcus and posterior middle temporal gyrus. A later suppression of the P3 “awareness” response to the tones was also observed under high load. A behavioral experiment revealed reduced tone detection sensitivity under high visual load, indicating that the reduction in neural responses was indeed associated with reduced awareness of the sounds. These findings support a neural account of shared audiovisual resources, which, when depleted under load, lead to failures of sensory perception and awareness.
Significance Statement
The present work clarifies the neural underpinning of inattentional deafness under high visual load. The findings of near-simultaneous load effects on both visual and auditory evoked responses suggest shared audiovisual processing capacity. Temporary depletion of shared capacity in perceptually demanding visual tasks leads to a momentary reduction in sensory processing of auditory stimuli, resulting in inattentional deafness. The dynamic “push–pull” pattern of load effects on visual and auditory processing furthers our understanding of both the neural mechanisms of attention and of cross-modal effects across visual and auditory processing. These results also offer an explanation for many previous failures to find cross-modal effects in experiments where the visual load effects may not have coincided directly with auditory sensory processing.
Introduction
Much research has shown that the level of perceptual processing load in a task (perceptual load) is an important determinant of the magnitude of neural responses to task-irrelevant stimuli. Tasks of high perceptual load (e.g., visual search in multielement arrays) are associated with significantly reduced visual cortex response to task-irrelevant stimuli that can elicit a robust response in conditions of low load (e.g., search for a feature “pop-out”; Rees et al., 1997; Handy et al., 2001; Yi et al., 2004; Schwartz et al., 2005). These findings are explained by the limited-capacity model offered in the load theory of attention (Lavie, 2005). Load modulations have been found throughout visual cortex and subcortical structures, from the lateral geniculate nucleus and superior colliculus (Rees et al., 1997; O'Connor et al., 2002), through striate and extrastriate cortex (V1–V4 and MT: Rees et al., 1997; Schwartz et al., 2005), to category-selective regions, which respond to meaningful stimuli [parahippocampal place area (Yi et al., 2004), inferior temporal cortex (Pinsk et al., 2004)]. ERP studies confirm that these effects are apparent in early cortical evoked responses 70–80 ms after stimulus onset (Kelly et al., 2008; Rauss et al., 2009) and throughout later peaks and frequency-locked responses (Handy et al., 2001; Parks et al., 2011, 2013).
Load-induced modulation of visual responses leads to the phenomenon of “inattentional blindness”: observers fail to notice unattended stimuli when these are presented during task conditions of high perceptual load (Cartwright-Finch and Lavie, 2007). This occurs even when subjects are instructed to detect any additional stimuli beyond the task set (for review, see Macdonald and Lavie, 2008; Carmel et al., 2011; Lavie et al., 2014). The inherent limits in perceptual capacity underlying these behavioral effects are believed to relate to a central limited attention resource (Rees et al., 1997; Dehaene and Changeux, 2011), and this raises the possibility of a cross-modal perceptual load effect, whereby increased perceptual load in a visual task may result in reduced auditory cortex responses to task-unrelated auditory stimuli. Consistent with this premise, previous behavioral findings demonstrated reduced detection sensitivity for auditory tones presented during visual tasks of high perceptual load (load-induced inattentional deafness; Macdonald and Lavie, 2011; Raveh and Lavie, 2015). However, the neural underpinnings of these effects remain unknown. Specifically, it is not understood whether inattentional deafness results from a modulation of early sensory (auditory cortical) activity or of processing at a later stage (e.g., processes that bring signals to subjective awareness).
We recorded participants' brain activity with magnetoencephalography (MEG) while they performed a visual search task under different levels of perceptual load and assessed time-locked responses both to the visual search task and to the occasional incidental presence of unattended tones (Fig. 1). The results reveal a time-specific suppression of early auditory evoked responses, followed by a later suppression of the P3 “awareness” response, in conditions of high compared to low visual perceptual load, suggesting that early auditory and visual sensory processes access a central, shared neural resource.
Materials and Methods
Experiment 1
Participants
Fourteen paid participants (eight female; mean age, 28.3 years; SD, 4.5 years) took part in the main MEG experiment. Nine further participants (four female; mean age, 28.7 years; SD, 4.1 years) participated in a passive viewing control condition (for details, see below, Apparatus and stimuli and Procedure). One of the participants from the main experiment was excluded from analysis because they showed a poor neural response to the auditory stimuli when presented in isolation. All participants were right-handed, had normal or corrected to normal vision, and reported normal hearing and no history of neurological disorders. The experimental protocol was approved by the University College London research ethics committee.
Apparatus and stimuli
The experiment was run using MATLAB 7.12 and Cogent 2000 (http://www.vislab.ucl.ac.uk/cogent.php). The magnetic signals were recorded continuously (600 Hz sampling rate; 100 Hz hardware low-pass filter) using a CTF-275 MEG system (axial gradiometers; 274 channels; 30 reference channels; VSM MedTech) in a magnetically shielded room. Subjects were seated in an upright position, with the visual stimuli projected onto a screen placed ∼52 cm from the participants' eyes. Sounds were presented via tubephones (E-A-RTONE 3A 10 Ω, Etymotic Research) inserted into the ear canal. Presentation latencies for auditory (∼15 ms) and visual (∼17 ms) stimuli were measured using a microphone and photodiode, and data were adjusted so that the evoked responses reported are appropriately aligned to the onset of stimulus presentation.
The visual task consisted of six items spaced equally around a circle centered at fixation and subtending a 1.9° viewing angle. The background of the display was dark gray (red, 77; green, 77; blue, 77); the letters and fixation cross appeared in white. The target letters were X or Z, in equal proportion, both measuring 0.6 × 0.6°. The nontarget letters in the high-load condition were the letters K, W, V, N, and M (all the same size as the target letters), and those in the low-load condition were smaller Os (0.2 × 0.2°) as used in previous load research (Forster and Lavie, 2008; Raveh and Lavie, 2015) to maximize the difference in search load. The positions of the letters were randomized on each trial so that the target had an equal probability of occurring in each position. The low-load nontarget letters were easily distinguishable from the targets based on low-level visual features such as line curvature and orientation, resulting in target pop-out. In the high-load condition, distinguishing nontargets from targets requires binding together the low-level features of line orientation with spatial location. This conjunction search involves more visual processing than the basic feature search (Treisman and Gelade, 1980) and therefore induces a higher visual perceptual load. A passive viewing control condition (see below, Procedure) was included to verify that the slight difference between the visual displays in the high- and low-load conditions was not mediating the load effect. If load is the critical factor, as we predict, MEG responses should not differ between displays in the passive viewing condition.
The auditory stimuli were 100-ms-long, diotically presented, pure tones with frequencies of 500, 1000, 1500, or 2000 Hz and an envelope shaped by 10 ms raised cosine ramps. The loudness of the tones was adjusted for each participant individually (see below, Procedure). During the main experiment, tones were presented randomly on 50% of trials. Overall, equal proportions of each of the four possible frequencies were used, and tones were selected randomly on each trial to discourage participants from narrowing their auditory attention to a certain frequency band.
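As an illustration of the stimulus description above, the following is a minimal sketch of how a 100 ms pure tone with 10 ms raised-cosine onset and offset ramps could be synthesized. The audio sampling rate (`fs=44100`) and the function name `make_tone` are assumptions made for this example; the text does not report them, and the per-participant level adjustment is not modeled.

```python
import math

def make_tone(freq_hz, dur_s=0.100, ramp_s=0.010, fs=44100):
    """Return a unit-amplitude pure tone with raised-cosine on/off ramps.

    fs is an assumed audio sampling rate (not reported in the text).
    """
    n = int(round(dur_s * fs))        # total number of samples
    n_ramp = int(round(ramp_s * fs))  # samples in each ramp
    samples = []
    for i in range(n):
        if i < n_ramp:
            # Onset ramp: envelope rises smoothly from 0 to 1.
            env = 0.5 * (1 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:
            # Offset ramp: mirror image of the onset ramp.
            env = 0.5 * (1 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        else:
            env = 1.0
        samples.append(env * math.sin(2 * math.pi * freq_hz * i / fs))
    return samples

# One tone per frequency used in the experiment.
tones = {f: make_tone(f) for f in (500, 1000, 1500, 2000)}
```

Because the tones are diotic, the same sample vector would be sent to both ears.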
Procedure
Thresholds for each of the pure tone frequencies were determined for each participant at the beginning of the experiment using an adaptive staircase procedure. The staircase gave an estimate of the 79.4% correct point on the psychometric function (Levitt, 1971), and these thresholds were increased by 12 dB to produce the tones used for the remainder of the experiment. Participants were informed that they may hear some sounds during the experiment that are part of the head localizing process, and the threshold procedure was used to determine the most suitable sound level for this process.
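The 79.4% figure can be made concrete with a short calculation. The sketch below assumes a three-down, one-up rule, which is the transformed up-down rule in Levitt (1971) that converges on this point; the text itself does not name the rule. It also shows the amplitude ratio implied by a 12 dB increase. Both function names are hypothetical.

```python
def staircase_target(n_down=3):
    """Convergence point of an n-down, 1-up staircase.

    The track is stable when the probability of n consecutive correct
    responses equals 0.5, i.e. p ** n_down == 0.5.
    """
    return 0.5 ** (1.0 / n_down)

def db_to_amplitude_ratio(db):
    """Convert a level change in dB to a linear amplitude ratio."""
    return 10 ** (db / 20.0)

p_correct = staircase_target(3)      # about 0.794
gain = db_to_amplitude_ratio(12.0)   # about 3.98 times the threshold amplitude
```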
The main experiment consisted of eight blocks of 64 trials each, four low load and four high, with the order of blocks counterbalanced between participants. No feedback was given during experimental blocks, but at the end of each block, participants were provided with a score of percentage correct on the visual task, to encourage engagement. Blocks lasted for ∼4 min each, and participants were allowed to take breaks between blocks when needed.
Figure 1 shows a schematic diagram of the trial structure. Each trial began with a fixation cross presented at the center of the screen for 1000 ms. Subsequently, a visual search array of either low (Fig. 1a) or high load (Fig. 1b) was presented for 100 ms. The very brief presentation of the visual display in combination with the heightened computational demands of the conjunction-search task results in high perceptual load. On 50% of the trials, the visual display was accompanied by a 100 ms auditory tone, which was time locked with the visual display. A blank screen was then presented for 1900 ms, during which participants were to make a speeded response on a button box regarding the identity of the visual target (using their right hand, button 1 for X and button 2 for Z).
Before and after the main experimental blocks, MEG data were collected from a short block (4 min) that contained only auditory stimuli and served to characterize the auditory response per se with no concurrent visual activation. Each block consisted of 200 presentations of pure tones, 50 at each of the four frequencies used in the experiment, with interstimulus intervals randomly distributed between 700 and 1500 ms in 100 ms increments. Participants fixated at the center of the screen and did not respond to the tones. Participants were told that these blocks were run to help localize their brain in the scanner.
To ensure that any effects on evoked responses found in the main group were not trivially due to differences in the visual stimulus displays between load conditions, an additional group of naive subjects participated in a control session in which they viewed the visual stimuli but did not perform any task. No tones were presented to this group. Blocks consisted of 200 presentations of the visual displays (one block contained the displays from the low-load task and the other contained the displays from the high-load task; block order counterbalanced across subjects) and lasted 3 min each.
Analysis
Preprocessing.
The data were epoched into 700 ms intervals, including 200 ms preonset, and baseline corrected to the preonset interval. Epochs with amplitudes above 3 pT (∼6% of trials) were considered to contain artifacts and discarded. A PCA-based, denoising source separation (DSS; de Cheveigné and Parra, 2014) routine was applied to each stimulus condition [auditory alone (A), visual alone low load (VL), visual alone high load (VH), visual and auditory low load (AVL), visual and auditory high load (AVH)] to remove 50 Hz electrical noise and extract stimulus-locked activity (maximize reproducibility across trials). Scree plots indicated that the first three components were considerably more repeatable than the others, but to ensure all activity of interest was retained, a conservative selection of the 12 most repeatable components in each condition was projected back to sensor space (de Cheveigné and Parra, 2014).
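The epoching, baseline correction, and amplitude-based rejection steps can be sketched for a single channel as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the function name `epoch_and_clean` is hypothetical, the rejection criterion is applied after baseline correction (the text does not specify the order), and the DSS stage is not included.

```python
def epoch_and_clean(signal, onsets, fs=600, pre_s=0.2, post_s=0.5,
                    reject_t=3e-12):
    """Cut one channel into epochs, baseline correct, and drop artifacts.

    signal   : list of samples in tesla
    onsets   : stimulus onset positions, as sample indices
    reject_t : rejection threshold in tesla (3 pT)
    """
    n_pre = int(round(pre_s * fs))    # 120 samples at 600 Hz
    n_post = int(round(post_s * fs))  # 300 samples at 600 Hz
    epochs = []
    for onset in onsets:
        start, stop = onset - n_pre, onset + n_post
        if start < 0 or stop > len(signal):
            continue  # epoch would run off the recording
        seg = signal[start:stop]
        # Baseline: mean of the preonset interval.
        baseline = sum(seg[:n_pre]) / n_pre
        seg = [x - baseline for x in seg]
        # Assumed order: reject on the baseline-corrected amplitude.
        if max(abs(x) for x in seg) > reject_t:
            continue
        epochs.append(seg)
    return epochs
```

Each retained epoch has 420 samples, i.e. 700 ms at the 600 Hz sampling rate.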
Separating auditory and visual responses.
In the main experiment, auditory stimuli were always presented concurrently with the visual search array. Although primary auditory and visual cortices are spatially distinct, MEG sensors at one site may pick up activity from distant neural sources due to spread of magnetic fields. This means that data recorded from temporal sensors may be contaminated by visual activity, and any differences between responses under low and high load might include some of the effects on visual processing. To separate the auditory and visual responses, a second stage of DSS analysis (de Cheveigné and Parra, 2014) was applied to each subject's data. This analysis was designed to identify components that differ between audiovisual trials (AV; tone-present trials) and visual-only trials (V; tone-absent trials). For details of the covariance matrices used to define this criterion, see the analysis by de Cheveigné and Parra (2014). The first two components were found to explain the vast majority of the variance between V and AV trials and were therefore projected back into sensor space and analyzed as the auditory response (Fig. 2B). The remaining components (i.e., those that explained very little of the variance between V and AV trials) were projected back to sensor space and analyzed as the visual response (Fig. 2D). Importantly, this analysis is “blind” in that it does not separate the data based on prior expectations of the time course or source of auditory or visual responses, but implicitly defines the auditory response as activation that is observed in the AV but not V condition.
The DSS reweighting was calculated over the entire data set (collapsed over low- and high-load blocks; all channels), and this same channel reweighting was then applied to both high- and low-load data sets. Similar analyses (not reported here) where the reweighting was calculated based on only the low-load or only the high-load data (and then applied to both) showed the same pattern of results, as did an analysis which applied the low-load and high-load reweighting independently. Source localization (Fig. 2B,D, right) confirmed that the procedure was indeed successful at isolating auditory and visual activity. As further confirmation, we directly calculated both auditory and visual responses from trials where they were not presented together: visual responses were taken from the trials (50%) that did not contain a tone, and auditory responses were taken from the blocks at the beginning and end of the experiment, where the tones were presented alone. Figure 2 shows the auditory and visual responses derived from the DSS (B, D) alongside responses to the tones alone (A, B, condition A) and the visual display alone (C). The fact that the shape, timing, and localization of the auditory responses derived by DSS (Fig. 2B, conditions AVL, AVH) match very well with those from the tone alone blocks (B, condition A) and that, similarly, the visual component derived from DSS (D) matches the responses of visual alone trials (C) provides further confirmation that the DSS procedure was successful.
The analysis of load effects reported in the results section refers to data from the AV trials, with auditory and visual responses derived from DSS, so that data are reflective of processing while the stimuli were competing for processing resources. However, the load effect in the visual alone trials was also analyzed (data not reported) and showed very similar results (Fig. 2C,D).
Channel selection.
In condition A (auditory alone), the auditory M100 (aM100) onset response (cf. Hari, 1990; Roberts et al., 2000; Fig. 2A) was identified for each subject as a source/sink pair located over the temporal region of each hemisphere on the individual scalp maps. The M100 current source is generally robustly localized to the upper banks of the superior temporal gyrus in both hemispheres (Hari, 1990; Pantev et al., 1996; Lütkenhöner and Steinsträter, 1998). For each subject, the 40 most strongly activated channels at the peak of the aM100 (20 in each hemisphere) were considered to best reflect activity in the auditory cortex, and thus selected for the analysis of the data in the main experiment. Similarly, using the data for the VL and VH conditions (visual alone stimuli; collapsed over load conditions), the visual M100 (vM100), which dominates the evoked response (Fig. 2C), was identified for each subject as a source/sink pair over the occipital lobe (cf. Hashimoto et al., 1999). The source of the vM100 (and its EEG counterpart, the vP1) is reliably located within the striate cortex (for review, see Tobimatsu and Celesia, 2006). For each subject, the 40 channels (20 in each hemisphere) that showed the strongest activity during the vM100 response were then selected for analysis of visual responses in the main experiment. Importantly, the DSS analysis detailed above was based on data from all channels; the channel selection as described here was used, after the application of DSS, for deriving the root mean square (RMS) activation measure (see below, Evoked responses), time-frequency data, and subsequent statistics.
Evoked responses.
For each condition, in each hemisphere, the RMS of the field strength across the 20 selected channels was calculated for each sample point. The time course of the RMS, reflecting the instantaneous power of neural responses, is used as a measure of the dynamics of brain responses. For illustrative purposes, the group RMS (RMS of individual subject RMSs) is plotted in Figures 2 and 3, but statistical analysis was always performed across subjects, independently for each hemisphere.
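The RMS measure described above amounts to the following computation, sketched here with hypothetical function names. The input layout (a list of channels, each a list of samples) is an assumption made for the example.

```python
import math

def rms_across_channels(data):
    """RMS over channels at each sample point.

    data : list of channels, each an equal-length list of field values
    Returns one RMS value per sample point.
    """
    n_ch = len(data)
    n_t = len(data[0])
    return [math.sqrt(sum(ch[t] ** 2 for ch in data) / n_ch)
            for t in range(n_t)]

def group_rms(subject_rms):
    """Group RMS: the RMS of the individual subjects' RMS time courses."""
    return rms_across_channels(subject_rms)
```

For two channels with values `[3.0, 0.0]` and `[-3.0, 4.0]`, the RMS time course is 3.0 at the first sample and the square root of 8 at the second, showing that the measure discards polarity.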
To compare responses between load conditions (AVL vs AVH, for the auditory and visual responses), the difference between the (squared) RMSs for high and low load were calculated for each participant and subjected to bootstrap resampling (1000 iterations, balanced; Efron and Tibshirani, 1993). The difference was judged to be significant if the proportion of bootstrap iterations that fell above/below zero was >99% (i.e., p < 0.01) for 10 or more adjacent samples (16 ms). The bootstrap analysis was run over the entire epoch (200 ms preonset to 500 ms postonset); all significant intervals identified in this way are indicated in Figure 3.
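The significance criterion described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses ordinary resampling of subjects with replacement rather than the balanced bootstrap cited in the text, and the function name, input layout, and fixed random seed are assumptions.

```python
import random

def bootstrap_significant(diffs, n_boot=1000, alpha=0.01, min_run=10, seed=0):
    """Flag samples where the high-minus-low difference is reliably nonzero.

    diffs : one list per subject of squared-RMS differences over time
    A sample is flagged if more than (1 - alpha) of the bootstrap means
    fall on one side of zero; only runs of at least min_run adjacent
    flagged samples are kept.
    """
    rng = random.Random(seed)
    n_sub, n_t = len(diffs), len(diffs[0])
    above = [0] * n_t
    below = [0] * n_t
    for _ in range(n_boot):
        # Resample subjects with replacement (plain, not balanced, bootstrap).
        sample = [diffs[rng.randrange(n_sub)] for _ in range(n_sub)]
        for t in range(n_t):
            total = sum(s[t] for s in sample)
            if total > 0:
                above[t] += 1
            elif total < 0:
                below[t] += 1
    thresh = (1 - alpha) * n_boot
    flagged = [above[t] > thresh or below[t] > thresh for t in range(n_t)]
    # Keep only runs of at least min_run adjacent flagged samples.
    sig = [False] * n_t
    t = 0
    while t < n_t:
        if not flagged[t]:
            t += 1
            continue
        end = t
        while end < n_t and flagged[end]:
            end += 1
        if end - t >= min_run:
            for k in range(t, end):
                sig[k] = True
        t = end
    return sig
```

At the 600 Hz sampling rate, the default `min_run=10` corresponds to the roughly 16 ms minimum duration stated in the text.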
Though the time-domain auditory and visual responses (Figs. 2, 3) were drawn from different subsets of sensors, there was a small amount of overlap in some participants. However, the load effects reported remained the same when any overlapping sensors were removed. Moreover, when the auditory responses were derived only from the more frontal sensors (which have the opposite polarity to occipital activity) versus the posterior sensors (which have the same polarity as occipital activity), the load effect remained the same in both cases, indicating that the effects reported are not trivially due to an imperfect separation of auditory and visual responses.
Time-frequency analysis.
A time-frequency analysis was conducted to examine potential load-induced oscillatory effects. This analysis was based on data that had been preprocessed to remove noise, but had not undergone the second stage of DSS analysis to separate auditory and visual responses, since this process is focused on enhancing evoked activity and may remove induced oscillatory activity. Data for each trial were converted to time-frequency space using a Morlet wavelet transform with seven cycles, across frequencies from 5 to 40 Hz. For each participant, the power spectra were then averaged over trials in each condition separately across the temporal and occipital sensors (selected for each individual as described above). A mixed-design ANOVA was used to compare the power spectra between conditions; subject was entered as a random factor, and load (high vs low) and trial type (V vs AV) as fixed factors. The results reported refer to the main effect of load in this analysis and are significant at the p < 0.05 level after FWE (familywise error) correction. There were no significant interactions between load and trial type.
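To illustrate the single-trial transform, here is a minimal sketch of a seven-cycle complex Morlet wavelet applied at one frequency. The normalization, the truncation of the wavelet at three standard deviations, and the zero treatment of the signal edges are assumptions made for this example; the text does not report these details, and the function name is hypothetical.

```python
import cmath
import math

def morlet_power(signal, fs, freq, n_cycles=7):
    """Time course of power at one frequency via a complex Morlet wavelet.

    The Gaussian envelope has standard deviation n_cycles / (2 * pi * freq)
    seconds, so higher frequencies use shorter wavelets.
    """
    sigma_t = n_cycles / (2 * math.pi * freq)
    half = int(round(3 * sigma_t * fs))  # truncate at +/- 3 SD (assumed)
    wavelet = [
        cmath.exp(2j * math.pi * freq * k / fs)
        * math.exp(-((k / fs) ** 2) / (2 * sigma_t ** 2))
        for k in range(-half, half + 1)
    ]
    norm = sum(abs(w) for w in wavelet)  # assumed amplitude normalization
    power = []
    for t in range(len(signal)):
        acc = 0j
        for j, w in enumerate(wavelet):
            idx = t + j - half
            if 0 <= idx < len(signal):  # samples beyond the edges count as zero
                acc += signal[idx] * w.conjugate()
        power.append(abs(acc / norm) ** 2)
    return power
```

Repeating the call for each frequency of interest between 5 and 40 Hz and averaging the resulting power over trials gives one time-frequency map per condition.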
Source localization.
Activity within the time windows identified in the RMS analysis was localized using the multiple sparse priors method (Litvak and Friston, 2008). Inversions were based on all MEG channels and used a single shell head model and group constraints. Second-level analyses consisted of T contrasts (p < 0.05, FWE corrected) to compare activation between load conditions. Where comparisons between load conditions were used, the contrasts were defined in the direction indicated by the RMS data, i.e., low > high or vice versa.
The inversions reported were performed on the data after the DSS analysis had been applied and therefore largely reflect sources of time-locked activity. Source analysis based on nonprocessed data was also performed to identify any additional (potentially nontime locked) sources. This analysis did not reveal any further significant activity.
Experiment 2
Participants
Eleven paid participants took part in the behavioral study. One was excluded due to an exceptionally high false alarm rate (mean, 63%; for all included participants mean, 8%). For the remaining 10 participants (seven female), ages ranged from 18 to 29 years (mean, 22.2; SD, 3.3). All participants had normal or corrected to normal vision and reported normal hearing. The experimental protocol was approved by the University College London Research Ethics Committee.
Apparatus, stimuli, and procedure
The experiment was run on a Dell PC with a 13 inch monitor using MATLAB 7.12 and Cogent 2000 (http://www.vislab.ucl.ac.uk/cogent.php). A viewing distance of 57 cm was maintained throughout using a chin rest. Sounds were presented using the same tubephones as in the MEG study.
The visual and auditory stimuli were identical to those in Experiment 1 except that the number of trials in each experimental block was reduced to 56. Subjects performed the same visual task as in Experiment 1 while also performing a detection task on the simultaneous auditory stimuli. They were informed that the visual task was their priority and made speeded responses to the visual search task (right hand, “0” for X and “2” for Z). After the visual response, a screen prompted participants to respond with whether or not they had heard a tone (left hand, “A” for sound absent, “S” for sound present). To familiarize participants with the dual task procedure, they completed a series of short demos before the main experiment.
Results
Experiment 1
Behavioral
Mean task reaction time was increased in the high-load (mean, 826 ms; SD, 105) compared to low-load conditions (mean, 648 ms; SD, 84; t(1,12) = −13.1; p < 0.001). Accuracy rates were reduced from low (mean, 98%; SD, 1.8) to high load (mean, 88%; SD, 3.5; t(1,12) = 10.3; p < 0.001). These findings confirm that the load manipulation was effective. There was no effect of the presence of the auditory stimulus on either accuracy (low load, t(1,12) = 0.5, p = 0.62; high load, t(1,12) = −0.1, p = 0.95) or reaction times (low load, t(1,12) = −1.3, p = 0.22; high load, t(1,12) = 1.0, p = 0.35) in the visual task. Thus, participants did not seem to pay attention to the tones, as instructed (see Materials and Methods, Procedure).
MEG
Visual search task.
Responses to the visual search task are displayed in Figure 3. The high-load response showed a significantly greater amplitude than the low-load response during the interval 83–123 ms after onset in the left hemisphere and 88–137 ms after onset in the right hemisphere. This difference occurred during the rising slope and peak of the vM100 onset response. These findings are consistent with previous studies of visual feature versus conjunction searches (Leonards et al., 2006; Painter et al., 2014), which show stronger activation for conjunction searches in visual cortex during this time frame. The increased activity is likely to reflect the higher level of perceptual processing involved in the high- compared to the low-load conditions. Source localization demonstrated that the load effect was associated with increased activation in the right extrastriate visual cortex, bilateral supramarginal gyrus, the left postcentral gyrus, the right superior parietal lobule, and intraparietal sulcus (p < 0.05, FWE corrected; Fig. 3B, Tables 1, 2). These findings are in line with previous fMRI studies of visual perceptual load (Rees et al., 1997; Donner et al., 2002), which show increased activity in the visual cortex and areas of the attention network under greater visual load.
In the later portion of the trial, the low-load condition showed an increased amplitude compared to the high-load condition, likely reflecting the preparatory response decision and selection components of the task (e.g., greater decision certainty and faster response selection), which are associated with greater behavioral accuracy in low compared to high load. The data from a passive-viewing control group confirmed that these effects were specific to the load task demands; when passively viewed, there were no significant differences between brain responses evoked by the low- and high-load visual stimuli (Fig. 3A).
Auditory responses.
Auditory responses are shown in Figure 3C. Response amplitude was significantly reduced in the high- relative to low-load condition during the latter portion of the aM100 response (130–160 ms after onset in both hemispheres). The sites associated with this modulation were the superior temporal sulcus (STS) and posterior middle temporal gyrus (MTG), both of which showed reduced activity under high compared to low load (p < 0.05, FWE corrected; Fig. 3, Tables 1, 2).
An additional load effect was observed later in the trial, between 228 and 288 ms after onset, where right hemisphere auditory responses showed a peak in the low- but not high-load condition (Fig. 3C, Table 1). Response peaks with a latency in this range are identified as P3 (Mecklinger et al., 1998; Kluge et al., 2011), known as the “awareness positivity” response, with generators in Heschl's gyrus and superior temporal gyrus (STG; Opitz et al., 1999) and frontal regions (Comerchero and Polich, 1999). The P3 is observed as a positive polarity response in EEG. Its MEG counterpart is usually associated with a topography similar to the aM100, as is also found in the present data. The overall pattern is therefore consistent with a P3 response occurring under low but not high load. The difference in amplitude between load conditions in this time range is associated with increased activity in the STG bilaterally in low compared to high load (p < 0.05, FWE corrected; Fig. 3, Tables 1, 2).
Time-frequency analysis.
Time-frequency analysis on visual (occipital) channels revealed a significant effect of load on prestimulus oscillatory power: from the beginning of the epoch (−200 ms) to −93 ms, there was increased power at 8–9 Hz in low- compared to high-load trials. A similar effect was seen between 335 and 485 ms after stimulus presentation, with increased power at 8 Hz in low compared to high load. These effects are consistent with numerous reports of alpha suppression during active attention (Fu et al., 2001; Kelly et al., 2006). Oscillatory power was also found to be higher in low load from 400 ms after stimulus presentation to the end of the epoch (500 ms) for frequencies between 15 and 30 Hz. This is likely to reflect response selection and preparatory motor responses (Zhang et al., 2008), which occur earlier with low load. The same analysis on auditory (temporal) channels revealed no significant effect of load on oscillatory power. No trends were observed even at a more lenient significance threshold, suggesting that the effects on evoked activity were not caused by preemptive, global suppression of auditory cortical activity under high load.
Experiment 2
The findings from Experiment 1 suggest a plausible explanation for the neural effects underlying inattentional deafness: under high visual perceptual load, fewer processing resources are available to the auditory system, leading to a reduction in the sensory processing of sounds such that they do not reach awareness. To confirm that the reduced P3 is associated with reduced awareness of the tones under high visual load, a further behavioral study, based on the same paradigm as used in the MEG experiment, was run. Participants performed a dual task, monitoring the visual display and subsequently (see Materials and Methods) reporting whether a tone was presented.
Performance on the visual task paralleled that observed in Experiment 1. The visual search task showed increased reaction times from low (mean, 710 ms; SD, 114) to high load (mean, 883 ms; SD, 124; t(1,9) = −7.3; p < 0.001). As in Experiment 1, there was also a reduction in accuracy from low load (mean, 94%; SD, 6.8) to high load (mean, 83%; SD, 5.7; t(1,9) = −5.3; p < 0.001). These findings confirm that visual search task performance was equivalent to that found for this task in the scanner. In keeping with the instructions to treat the auditory detection task as secondary, there was no effect of the presence of the auditory stimulus on either accuracy (low load, t(1,9) = −0.7, p = 0.49; high load, t(1,9) = −0.8, p = 0.42) or reaction times (low load, t(1,9) = 0.6, p = 0.54; high load, t(1,9) = 0.2, p = 0.84) on the primary visual search task.
In the auditory detection task, participants showed significantly reduced detection accuracy rates under high-load (mean, 88%; SD, 5.8) compared to low-load (mean, 92%; SD, 5.3; t(1,9) = 2.9; p < 0.05) conditions, as well as reduced detection sensitivity (d′) under high load (mean, 2.6; SD, 0.7) compared to low load (mean, 3.2; SD, 0.7; t(1,9) = 3.7; p < 0.01). Their response criterion (β) did not differ significantly between low (mean, 3.2; SD, 3.4) and high load (mean, 1.3; SD, 0.8; t(1,9) = 2.0; p = 0.07). These data demonstrate that the load task used in Experiment 1 does indeed impact rates of awareness and conversely inattentional deafness—listeners were less likely to detect the tones when these were presented during the high-load task, relative to a low-load visual task (Raveh and Lavie, 2015).
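For readers unfamiliar with the signal detection measures reported above, the sketch below shows the standard equal-variance Gaussian formulas for detection sensitivity (d′) and the likelihood-ratio criterion (β). The function name and the example rates are hypothetical and are not data from this study; whether any correction was applied for hit or false alarm rates of exactly 0 or 1 is not stated in the text, and none is applied here.

```python
import math
from statistics import NormalDist

def dprime_beta(hit_rate, fa_rate):
    """Equal-variance signal detection measures.

    d' = z(H) - z(F)
    beta = exp((z(F)**2 - z(H)**2) / 2), the likelihood ratio at the criterion.
    Rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_h, z_f = z(hit_rate), z(fa_rate)
    d_prime = z_h - z_f
    beta = math.exp((z_f ** 2 - z_h ** 2) / 2)
    return d_prime, beta

# Illustrative rates only: a symmetric observer has beta equal to 1.
d, b = dprime_beta(0.9, 0.1)
```

Because d′ separates sensitivity from response bias, a drop in d′ under high load with no reliable change in β points to reduced perceptual sensitivity rather than a shift in willingness to report a tone.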
Discussion
The present findings establish the neural underpinnings of inattentional deafness under load. Auditory cortical responses to irrelevant tones presented during performance of a visual search task were clearly found in conditions of low load in the task, whereas higher perceptual load in the task reduced the evoked response both at an early stimulus-processing stage (reflected in the modulation of the aM100) and during later processing (revealed by reduced amplitude of the P3) associated with awareness. Results from a behavioral experiment that assessed the effects of the visual search task on auditory detection confirmed reduced detection sensitivity of the auditory stimuli under the high-load (vs low-load) conditions. Thus, focusing attention on a perceptually demanding visual task leads to reduced availability of neural resources required for perception of a simple auditory stimulus, resulting in reduced detection and leading to the experience of inattentional deafness.
Importantly, the timing and reversed direction of the load effects on visual and auditory responses are precisely what load theory predicts on the basis of shared capacity. The increased visual processing in high load was only apparent during a specific, early portion (vM100) of the evoked response, and it was during this time that the auditory response was lower under high compared to low visual load. This is supported by time-frequency analysis, which established no effects of auditory suppression in either prestimulus or poststimulus stages. Thus, rather than simply dampening all auditory activity in conditions of high visual load, the cost to auditory processing occurs only when the visual system incurs a very high demand for perceptual resources. This trade-off between visual and auditory activation is consistent with the capacity-limited “push–pull” mechanism envisaged in load theory (Pinsk et al., 2004; Lavie, 2005) and supports the conclusion that the task involves a temporally focused draw on perceptual resources shortly after the presentation of the display. That the impact of increased perceptual load on auditory responses occurred during the vM100 onset response suggests that the sharing of perceptual resources has an early locus, occurring even during stimulus encoding.
The finding that the effects of load modulation were localized in associative auditory cortex (e.g., MTG/STS) offers a plausible account of the source of shared audiovisual capacity. Since associative auditory cortex is known to mediate integration of audiovisual stimuli (for review, see Calvert, 2001), the finding that the evoked response to the tone in this area was significantly modulated by visual perceptual load suggests these regions of associative auditory cortex as a possible site for cross-modal audiovisual capacity limits.
Increased visual load also abolished the P3 (∼230–290 ms after onset), a response commonly hypothesized to reflect awareness of a stimulus (Comerchero and Polich, 1999; Kok, 2001). The response seen here under low load is likely to be a P3a, which is a relatively early P3 associated with involuntary shifts of attention (Comerchero and Polich, 1999). This is consistent with the load theory proposal that attention is involuntarily allocated to task-irrelevant stimuli in conditions of low load that leave spare capacity.
Although the P3a is typically associated with a frontal source (McCarthy et al., 1997; Escera et al., 1998; Opitz et al., 1999), it has also been shown that P3a generated by an auditory stimulus receives a contribution from auditory cortex (Escera et al., 1998; Opitz et al., 1999). The P3 response identified here appears consistent with this temporal generator, both in terms of its timing and the source in STG. Whereas the frontal generators of the P3a are known to be affected by attention (Comerchero and Polich, 1999; Koivisto et al., 2009), the contribution from auditory cortex was previously believed to be automatic (Escera et al., 1998). The present results, demonstrating a load effect on STG activity during the P3a, suggest that this auditory cortical contribution to the P3a may also be sensitive to the effects of perceptual load.
The amplitude of the P3 to oddball stimuli in a secondary task has been used to quantify resource sharing during dual-task conditions (Isreal et al., 1980; Wickens et al., 1983; Kramer et al., 1985), with larger P3 responses to the secondary task when it receives a higher priority. The data presented here highlight that even when stimuli are entirely irrelevant to the task, the P3 may be a useful measure of the availability of “leftover” processing resources, as determined by the level of perceptual load in the attended task.
The present findings shed light on the controversy over whether visual load can affect auditory processing (Otten et al., 2000; Müller et al., 2002; Dyson et al., 2005; Restuccia et al., 2005; Muller-Gass et al., 2006, 2007; Sculthorpe et al., 2008; Parks et al., 2011; Chait et al., 2012). The mixed findings could be due to the fact that, as demonstrated here, the load effect is time sensitive. Depending on the specifics of the task, the peak effect of visual load could occur earlier or later during the processing of the auditory stimuli, or even miss the window of auditory processing altogether, which is particularly likely when extended visual tasks are used. Furthermore, previous paradigms often did not require consistently focused attention to the visual task, and offered the opportunity for task switching to the auditory stimuli. The paradigm used in the present experiments, where the visual task required brief, very focused attention and coincided directly with the auditory stimuli, is ideally suited for revealing and elaborating the temporal dynamics of visual attentional load on the neural processing and subsequent awareness of concurrently presented acoustic input.
Our findings give crucial insight into the mechanism of sensory processing in the brain. In situations with numerous sources of sensory information, limits on our perceptual resources can cause our system to become overloaded, leading to reduced processing of stimuli that are not directly relevant to the current task and resulting in inattentional blindness and deafness. That these limits apply across sensory systems has implications for models of attention, and also the understanding of perception and behavior in busy, real-life situations, when multisensory information competes for processing resources.
Footnotes
This study was supported by a Wellcome Trust project grant to M.C. (093292/Z/10/Z) and an Economic and Social Research Council studentship to K.M. (ES/J500185/1). We thank the radiographer team at the UCL Wellcome Trust Centre for Neuroimaging for excellent MEG technical support and Alain de Cheveigné for assistance with the DSS data analysis.
The authors declare no competing financial interests.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
Correspondence should be addressed to Maria Chait, UCL Ear Institute, 322 Grey's Inn Road, London WC1X 8EE, UK. m.chait@ucl.ac.uk
This article is freely available online through the J Neurosci Author Open Choice option.