Abstract
The debate on the neural basis of multitasking costs revolves around neural overlap between concurrently performed tasks. Recent evidence suggests that training-related reductions in representational overlap in fronto-parietal brain regions predict multitasking improvements. Cognitive theories assume that overlap of task representations may lead to unintended information exchange between tasks (i.e., crosstalk). Modality-based crosstalk was suggested as a source for multitasking costs in multisensory settings. Robust findings of increased costs for certain modality mappings may be explained by crosstalk between the stimulus modality in one task and sensory action consequences in the concurrently performed task. Whether modality-based crosstalk emerges from representational overlap in general fronto-parietal multitasking regions or in modality-specific regions is not yet known. In this functional neuroimaging study, we investigate neural overlap during multitasking performance in humans, focusing on modality compatibility by employing multivariate pattern analysis and modality-specific practice interventions in three groups (total N = 54, 24 females). We observed significant differences between modality compatible and modality incompatible single-task representations, specifically in the auditory cortex but not in fronto-parietal regions. Notably, improved auditory decoding accuracy for modality incompatible tasks was predictive of performance gains in the corresponding dual task, along with the complete elimination of modality-specific dual-task costs. This predictive relationship was evident only in the group practicing modality incompatible mappings, suggesting that specific practice on task sets with modality overlap influenced both neural representations and subsequent multitasking performance. This study contributes to the integration of cognitive theory and neuroscience and clarifies the role of task representations in dual-task interference.
Significance Statement
In a society dominated by multitasking, understanding its neurocognitive basis and plasticity is crucial for many everyday activities. We investigate the neural mechanisms behind multitasking limitations, offering insights for targeted cognitive interventions. The study builds upon established theories of cognitive multitasking and imaging research, addressing the concept of modality-based crosstalk, the unintended exchange of modality-based information between tasks. Through functional brain imaging and pattern analysis, we examined how neural task representations contribute to performance costs in dual tasks with varying degrees of modality overlap. Notably, our findings demonstrate a practice-related decrease in neural overlap that is associated with substantial multitasking improvements, specifically in the auditory cortex, emphasizing the contribution of sensory regions to flexible multidimensional task representations.
Introduction
Human limitations in multitasking are significant and can lead to safety-relevant consequences in everyday life, for example, when using a mobile phone while driving. A long-held debate relates to the question of whether performance costs in multitasking emerge based on the neural overlap of concurrently performed tasks (Klingberg, 1998; Just et al., 2001). Recent theories focusing on fundamental computational dilemmas (i.e., sharing vs. separation of neural representations) support the idea that representational overlap constrains human multitasking (Badre et al., 2021; Musslick and Cohen, 2021; Garner and Dux, 2023), consistent with evidence from a multivariate imaging study (Garner and Dux, 2015). This study revealed that multitasking training reduces the overlap of concurrent task representations in fronto-parietal brain regions, which in turn predicts training improvements in multitasking (Garner and Dux, 2015).
The overlap of task representations can result in the unintended exchange of information between tasks, called central crosstalk (Navon and Miller, 1987; Logan and Gordon, 2001; Koch, 2009; Janczyk et al., 2014). Crosstalk may lead to between-task benefits or interference at different task levels, which is supported by behavioral and neural research (Lien and Proctor, 2002; Halvorson and Hazeltine, 2015; Koch et al., 2018; Paas Oliveros et al., 2023). Recently, modality-based crosstalk was suggested to underlie increased multitasking costs when comparing dual tasks with different modality mappings (Hazeltine et al., 2006; Schacherer and Hazeltine, 2020). For example, visual-manual and auditory-vocal (i.e., modality compatible) modality mappings produce consistently lower dual-task costs than visual-vocal and auditory-manual (i.e., modality incompatible) mappings (Fig. 1B) (Hazeltine et al., 2006; Stelzel et al., 2006; Göthe et al., 2016). Modality-based crosstalk refers to interference between the stimulus modality in one task (e.g., an auditory stimulus) and the sensory action consequences (e.g., the auditory action effect of a vocal response) in the concurrently performed task, even though the stimulus and response modalities themselves do not overlap between tasks.
Study design, modality mappings and behavioral results. A: Overall study design with two sessions. Resting state data and subjective items (before and after each part) will be reported elsewhere. B: The upper part of the figure shows the stimulus-response pairings for the modality compatible mapping, comprising a visual stimulus with the manual response and the auditory stimulus combined with the vocal response. For each response, the corresponding natural action effect is depicted as well. Note that the action effect of the manual response is not exclusively visual but also somatosensory. Likewise, the action effect of the vocal response is typically auditory (i.e., hearing oneself speaking) but also somatosensory (i.e., feeling one’s mouth move). Even though participants might not hear their own voice very clearly due to earphones and scanner noise, the important aspect is the learned and anticipated action effect, which is mainly auditory for the vocal response. The lower part depicts the modality incompatible mapping. The visual stimulus is paired with a vocal response and the auditory stimulus with a manual response. In this condition, the match between action-effect modality and stimulus modality occurs between tasks, potentially causing interference due to higher overlap. C: The graph shows distributions, boxplot, mean (black dot) and individual performance per modality mapping, timepoint and intervention group. Almost all individuals show the robust difference in multitasking costs between the modality compatible and modality incompatible mapping. Multitasking cost is the difference between dual-task and single-task performance, measured as BIS (an integration of reaction times and accuracies); a cost of 0 means no difference between single- and dual-task performance. Comparing the Pre and Post timepoints, only the incompatible intervention group eliminated this robust difference after the practice intervention (middle panel on the right). All pairwise t-tests are corrected with the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995).
So far, the modality-based crosstalk assumption has been supported primarily by behavioral research (Hazeltine et al., 2006; Göthe et al., 2016; Schacherer and Hazeltine, 2020, 2021). It remains unclear how modality-based crosstalk arises at the neural level and how this is affected by multitasking practice. Specifically, it is unknown whether representational overlap is present in general multitasking-related fronto-parietal regions, as identified by previous research on multitasking training (Garner and Dux, 2015) and by a meta-analysis (Worringer et al., 2019), or rather in modality-specific sensory brain regions.
Assuming that crosstalk of modality-specific task features most likely involves modality-specific sensory regions rather than supramodal regions (Garner and Dux, 2015), we hypothesized higher neural overlap between tasks with a modality incompatible mapping in sensory brain regions related to the sensory action effects of the responses. To test this, we used multivariate pattern analysis (MVPA) to decode single-task representations in functional magnetic resonance imaging (fMRI) data.
Additionally, we expected that the degree of modality overlap contributes to multitasking performance. Based on the training literature (Garner and Dux, 2015), we predicted a practice-related decrease of neural overlap in sensory regions, specifically for modality incompatible tasks. Participants completed single and dual tasks during fMRI measurements before and after a dual-task practice intervention. The sample was randomly split into three practice-intervention groups (one per modality mapping, plus a passive control group); the two active groups completed the same dual tasks as during the fMRI measurements for 80 min (Fig. 1A).
Previewing our results, we replicate the elimination of the substantial difference in behavioral dual-task costs between modality mappings after practicing the modality incompatible mapping (Mueckstein et al., 2022). We found a significant difference in decoding accuracy between the modality incompatible and the modality compatible mappings in the auditory region of interest. This supports the assumption of differences in representational overlap between modality mappings, thus extending previous behavioral and neural findings in the field. Additionally, only for participants who completed the modality incompatible practice was a selective decrease in neural overlap between the modality incompatible tasks in the auditory region positively associated with individual performance improvements.
Methods
This study was pre-registered prior to data analyses (https://osf.io/whpz8). Accordingly, sections in the methods are mostly copied from the preregistration and shortened. We explicitly report any deviations.
Participants
The total sample of this study consisted of 71 healthy right-handed adults aged 18 to 30 years with German as their first language (or a comparable level) and normal or corrected-to-normal vision. Exclusion criteria were any neurological or psychiatric diseases, current medical conditions that could potentially influence brain functions, past or present substance abuse (alcohol and drugs), a self-reported weakness in distinguishing left and right, and common contraindications for MRI scanning. Participants were excluded from the specific analysis if their head movement exceeded the threshold of 25% of volumes with framewise displacement > 0.4 mm, if they committed more than 30% errors per run in more than three single-task runs (one localizer run), or if the error rate during the practice intervention was higher than 50%. For the dual-task performance, due to the high error rate, we deviated from the pre-registration protocol and limited the 30% criterion to trials in which both stimuli were presented on the same side (i.e., congruent trials, averaged for both modality mappings) to ensure that participants were still on task, as incongruence of stimulus information between tasks (i.e., stimuli presented on different sides) increased task difficulty in addition to modality compatibility (error rate for congruent stimuli: M = 17.25, SD = 14.90; incongruent stimuli: M = 44.92, SD = 30.04). An overview of the specific exclusion numbers and reasons for each analysis can be found on OSF (https://osf.io/5xbd3). All three groups were very similar in age and gender distribution (∼50% female) (see Table 1). All participants gave their written informed consent before the first session of the study and could choose between 60 € and course credit as reimbursement after completing all sessions. The ethics committee of the Freie Universität approved the study, which was conducted in accordance with the Declaration of Helsinki.
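For illustration, a minimal R sketch of how the head-motion criterion could be checked, assuming fMRIPrep confound files with a framewise_displacement column (the directory path and the per-run aggregation are assumptions; the study applied the criterion per participant and analysis):

# Flag runs in which more than 25% of volumes exceed 0.4 mm framewise displacement.
confound_files <- list.files("derivatives/fmriprep",                      # hypothetical path
                             pattern = "desc-confounds_timeseries\\.tsv$",
                             recursive = TRUE, full.names = TRUE)
prop_high_fd <- sapply(confound_files, function(f) {
  fd <- read.delim(f, na.strings = "n/a")$framewise_displacement          # first volume is NA
  mean(fd > 0.4, na.rm = TRUE)                                            # proportion of high-motion volumes
})
exclude_run <- prop_high_fd > 0.25                                        # exclusion flag per run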
Participants' age and sex ratio per intervention group and analysis
Experimental overview
Participants completed three sessions; the first session was held online and included behavioral and cognitive measures, which will be reported elsewhere. A detailed description of this first session can be found in the preregistration of another project (https://osf.io/nfpqv). The remaining two sessions (each 2.5–3 h) took place at the Cognitive Center for Neuroscience Berlin (CCNB). During the second session (Session 1 in Fig. 1A), participants started outside the scanner with a short familiarization with the tasks (256 trials, 32 per single task, 64 per dual task) before they completed the session in the MRI. Participants returned to the CCNB for the final session (Session 2 in Fig. 1A) after a minimum of five and a maximum of nine days, at the same time of day. They repeated the shortened familiarization (128 trials, 16 per single task, 32 per dual task) and continued with the group-specific practice intervention for 80 min before they finished the study with the Post part in the scanner.
Behavioral tasks
Participants performed sensorimotor choice reaction tasks, either as single or dual tasks, with varying modality mappings (modality compatible or modality incompatible, compare Fig. 1B). In the visual domain, the presented stimuli were a white square (56.8 × 56.8 pixels) on a black background at six different positions (top, center, bottom), three to the right of a white fixation cross (41.1 × 41.1 pixels, thickness 9.9 pixels) and three to the left. In the auditory domain, stimuli were pure tones at three different frequencies (200, 450, and 900 Hz), presented to either the right or the left ear. In the dual-task blocks, stimuli were presented simultaneously (SOA = 0 ms). Participants had to respond to the side of the stimuli by pressing a button with their right or left index finger and/or by saying the German word for “right” or “left”. The pairing of stimulus and response modality determines the modality mapping. The combination of visual-manual and auditory-vocal is considered modality compatible and the combination of visual-vocal and auditory-manual modality incompatible. Consequently, there was no overlap in either response or stimulus modality within each dual-task condition.
Additionally, we manipulated the task difficulty in the single-task runs by adding visual noise to the stimuli, increasing the distance between the fixation cross and the stimulus, and reducing the contrast between the stimulus and the background. For the auditory stimulus, we also added white noise and reduced the volume of the tone relative to the noise. We only included the easy blocks in the behavioral analysis and used the difficulty manipulation as a control analysis for the MVPA. Stimulus material is provided online (https://osf.io/w9hsu/). For each participant, we randomized the order of the dual-task runs (first modality compatible vs. first modality incompatible mapping) and the block positions of the single tasks within each run. Within each block, each stimulus was presented equally often and in random order. To prevent a systematic confound from the visual appearance of the task instruction shown at the beginning of each block, we presented task instructions as either a small picture or a text, with different pictures and different fonts for each block to prevent any repetition in visual appearance. Deviating from the preregistration, we decided against separate analyses of reaction times and accuracy rates and instead used the balanced integration score (BIS; Liesefeld and Janczyk, 2019). The combined BIS parameter has the advantage of controlling for a potential speed-accuracy trade-off, as shown by Liesefeld and Janczyk (2019). Individuals might prefer different strategies, focusing either more on accuracy or more on speed; using a combined parameter accounts for those potential differences. Additionally, analyzing only one parameter increases statistical power and reduces complexity, thus increasing the clarity of the analyses compared with separate analyses of reaction time and accuracy. The BIS parameter is calculated as the difference between z-standardized accuracies and z-standardized reaction times, such that higher values indicate better performance.
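For illustration, a minimal R sketch of the BIS computation on simulated trial data (column names and the exact standardization scheme are assumptions following Liesefeld and Janczyk, 2019):

library(dplyr)

# simulated trial-level data (hypothetical structure and column names)
set.seed(1)
trial_data <- data.frame(
  subject   = rep(1:10, each = 40),
  condition = rep(rep(c("compatible", "incompatible"), each = 20), 10),
  rt        = rnorm(400, mean = 700, sd = 100),      # reaction time in ms
  correct   = rbinom(400, 1, 0.9)                    # 1 = correct, 0 = error
)

bis_scores <- trial_data %>%
  group_by(subject, condition) %>%
  summarise(rt  = mean(rt[correct == 1]),            # mean RT of correct trials
            acc = mean(correct),                     # proportion correct
            .groups = "drop") %>%
  mutate(bis = as.numeric(scale(acc)) - as.numeric(scale(rt)))   # z(accuracy) - z(RT)

In this scheme, higher BIS values reflect better performance, so single-minus-dual differences yield positive dual-task costs (see Results).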
All statistical analyses and plotting were done in R [version 4.2.2; R Core Team (2020)] with RStudio [version 2023.12.1; RStudio Team (2019)] and the tidyverse package [version 2.0.0; Wickham et al. (2019)]. The manuscript was created with the papaja package [version 0.1.2; Aust and Barth (2023)].
Practice intervention
The practice intervention was completed outside the scanner and consisted only of dual-task trials, using the same stimuli, responses, and presentation timing as in the Pre and Post measurements (compare the previous section, Behavioral Tasks). For the compatible intervention group, only modality compatible dual-task trials were presented, and for the incompatible intervention group, only modality incompatible dual-task trials. Participants worked through seven runs, each consisting of four blocks with 64 trials per block (a total of 1792 trials). After each run, participants were asked about their subjective state in terms of focus, motivation, fatigue, and frustration. After completing the intervention, participants entered the scanner and completed another run of the intervention tasks (256 trials in total) without brain activity being measured. Participants in the passive control group paused for 80 min, meaning they were instructed not to engage in cognitively demanding tasks but were otherwise free to do what they liked. After the break, they started immediately with the Post part in the scanner.
MRI session
The first fMRI session consisted of a resting state scan and twelve task runs in a block design. Participants started with a 10-min resting state scan with eyes open, followed by two runs of a localizer task, used to define the regions of interest (ROIs), each including single and dual tasks in both modality mappings. Each localizer run contained six blocks with 16 trials per block. In the Pre part, the first two task runs after the localizer contained only dual-task trials; each run was assigned to one modality mapping and consisted of 128 trials. The remaining eight runs contained only single-task trials in both modality mappings, with an easy and a difficult version of the tasks. In each run, every combination of task, modality mapping, and difficulty occurred only once, resulting in eight blocks with 16 trials per block. All stimuli were presented for 200 ms, followed by a response interval of 1500 ms and an inter-stimulus interval of 200 ms. Each run concluded with an 8 s fixation period. The session lasted about 2.5 h. The Post fMRI session after the practice intervention was the same as the Pre session, starting with the two dual-task runs, followed by eight single-task runs. The session lasted about 3 h.
MRI data acquisition
Due to a scanner upgrade at the imaging center, the data were acquired with two different scanners. For both scanners, the same head coil and parameters were used. Each participant completed both sessions in the same scanner. The first 25 participants (10× passive intervention, 6× modality compatible intervention, 6× modality incompatible intervention, 3× only Pre-measurement) were measured with a Siemens Magnetom TIM TRIO syngo 3T and the remaining participants with a Siemens Magnetom 3.0T Prisma, both with a 32-channel head coil. At the end of the first session, a high-resolution T1-weighted structural image was acquired with 176 interleaved slices, 1 mm isotropic voxels; TE = 2.52 ms, TR = 1900 ms, FoV = 256 × 256 × 176 mm. Functional runs consisted of 139 whole-brain echo-planar images of 37 interleaved slices for the localizer task and the dual-task runs, and 183 whole-brain echo-planar images for each single-task run. Each functional run was acquired with 3 mm isotropic voxels, TE = 30 ms, TR = 2000 ms, flip angle = 75°, FoV = 192 × 192 × 133 mm. After each dual-task run, a gradient-echo field map was acquired (3 mm isotropic voxels, TE1 = 4.92 ms and TE2 = 7.38 ms; TR = 400 ms; FoV = 192 × 192 × 133 mm, flip angle = 60°). Participants received auditory stimuli via MRI-compatible headphones (S14, SensiMetrics, USA). Visual stimuli were projected on a screen at the end of the bore, which participants could view through a mirror attached to the head coil. Vocal responses were recorded via an MRI-compatible microphone (Optimic MEG, Optoacoustics, Israel) and manual responses via MRI-compatible 4-button bimanual response boxes (HHSC-2 × 2, Current Designs, USA).
MRI univariate data analysis and ROI definition
Data were converted into BIDS format using dcm2bids (version 2.1.6; Boré et al., 2023) and preprocessed using fMRIPrep [version 21.0.2; Esteban et al. (2019)], including 3D motion correction and slice-time correction. The BIDS-converted raw data were uploaded to OpenNeuro, including a tsv file indicating the scanner type for each participant (https://doi.org/10.18112/openneuro.ds005038.v1.0.1). All functional data were aligned to a generated reference image, co-registered, and transformed to standard space. Anatomical T1-weighted data were resampled into standard MNI space. For more details, please see the output script generated by fMRIPrep, provided in the preregistration. BOLD runs of the localizer task were smoothed in SPM12 with an 8 mm FWHM Gaussian kernel. We used SPM12 to conduct the first-level analysis on all normalized BOLD runs using a block design and a general linear model, separately for the localizer runs and the single-task runs, the latter also separately for each timepoint (Pre and Post). In the localizer model, we included six motion parameters (3× rotation, 3× translation) and framewise displacement, a combined measure of head movement, as regressors of no interest. For each participant, statistical parametric maps with contrasts between the stimulus modalities (visual vs. auditory), response modalities (vocal vs. manual), and single vs. dual task were generated. For the group analysis, the individual maps were averaged and tested voxel-wise with a one-sample t-test for the defined contrasts. A cluster-wise FWE-corrected significance threshold (p = .05) at the voxel level was used. Note that we restricted the cluster selection to the frontal lobe for the single vs. dual-task contrast, as previous dual-task studies (Schubert and Szameitat, 2003; Stelzel et al., 2006; Worringer et al., 2019) consistently showed frontoparietal activity when contrasting dual and single tasks. As the focus of the ROI analysis was on the sensory regions, we selected only one frontoparietal cluster, namely the one with the highest peak in the lateral frontal cortex. Additionally, we added the fronto-parietal-subcortical cubic ROIs defined in a previous study on multitasking training (Garner and Dux, 2015) to compare our task-specific ROIs with those task-independent ones. Beta images used to define the group-based activity clusters were uploaded to NeuroVault (https://identifiers.org/neurovault.collection:16842).
The resulting highest activation clusters for each contrast (visual, auditory, manual, vocal, and frontal) were used to define the ROIs. The clusters served as boundaries to determine the group-based activation peak. This peak voxel was defined as the center of a 10 mm sphere (compare Table 2 for peak coordinates), yielding a group-based sphere for each contrast. Post hoc, we added an individual differences approach and also identified individual peaks within the group clusters. We are aware that the sample size is not ideal for an individual differences approach, but we included several control measurements (i.e., comparing results at the individual level with the group sphere and group cluster) to ensure the robustness of the findings. In preparation for the MVPA, the single-task model included only regressors for each single-task combination (visual-manual, visual-vocal, auditory-manual, auditory-vocal, each at both difficulty levels) and no motion regressors.
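For illustration, a minimal R sketch of constructing such a 10 mm spherical ROI on a 3 mm grid (the peak coordinate, grid dimensions, and voxel-to-MNI mapping are assumptions; the actual spheres were centered on the SPM group peaks listed in Table 2):

# Build a 10 mm spherical mask around a peak coordinate on a 3 mm isotropic MNI grid (brute force for clarity).
peak <- c(-54, -22, 8)                        # hypothetical peak in MNI mm
dims <- c(61, 73, 61)                         # assumed volume dimensions at 3 mm resolution
vox2mm <- function(i, j, k)                   # assumed voxel-index-to-MNI mapping
  c(-90, -126, -72) + (c(i, j, k) - 1) * 3

mask <- array(FALSE, dims)
for (i in seq_len(dims[1])) for (j in seq_len(dims[2])) for (k in seq_len(dims[3])) {
  mask[i, j, k] <- sqrt(sum((vox2mm(i, j, k) - peak)^2)) <= 10   # keep voxels within 10 mm of the peak
}
sum(mask)                                     # number of voxels in the sphere (~155 at 3 mm resolution)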
Center coordinates for spherical ROIs in MNI space
Multivoxel pattern analysis
After performing the first-level analysis on the single-task runs, we submitted the resulting subject-specific beta images (per run and single-task combination, based on the block design) to The Decoding Toolbox (Hebart et al., 2014) to create individual decoding maps for each modality mapping. We used the default methods of The Decoding Toolbox: support vector machines (SVMs) as the decoding method, a leave-one-run-out cross-classification, and an ROI analysis. This resulted in one decoding accuracy value for each ROI (auditory, visual, vocal, manual, frontal) and each modality mapping (modality compatible and modality incompatible), with a chance level of 50%. We compared our results with the spherical ROIs defined by the group activation and with the whole activation clusters as ROIs to ensure that our results do not depend on the small spheres (compare Fig. 3). We also included a whole-brain searchlight (radius 11 mm) to rule out the possibility that brain regions not identified in the univariate analysis contained information about the modality mapping. This was not the case: only auditory regions differed significantly between the two modality mappings (Fig. 4). To further ensure that our results were not influenced by differences in task difficulty or type of instruction (instruction as text or image), we employed a cross-classification across the different difficulty levels and types of instruction, respectively. Specifically, we trained the classifiers to decode visual-manual-easy vs. auditory-vocal-difficult single tasks and tested them on visual-manual-difficult vs. auditory-vocal-easy (and analogously for the modality incompatible mapping). This procedure eliminates the influence of task difficulty on the decoding accuracy between the two tasks. In both analyses, the difference between the modality mappings in the auditory region remained significant (pairwise t-tests, corrected with the Benjamini–Hochberg procedure; Benjamini and Hochberg, 1995). These results rule out the explanation that the classifier merely differentiated between difficulty levels or instruction types.
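The decoding itself was run with The Decoding Toolbox in MATLAB; purely as an illustration of the leave-one-run-out logic, a minimal R sketch with a linear SVM on simulated ROI patterns (all names, dimensions, and data are assumptions):

library(e1071)

set.seed(2)
n_runs <- 8; n_vox <- 100
betas  <- matrix(rnorm(2 * n_runs * n_vox), nrow = 2 * n_runs)   # one pattern per run and single task
task   <- factor(rep(c("visual_manual", "auditory_vocal"), times = n_runs))
run    <- rep(seq_len(n_runs), each = 2)

acc <- sapply(seq_len(n_runs), function(r) {
  train <- run != r                                               # leave one run out
  fit   <- svm(x = betas[train, ], y = task[train], kernel = "linear", cost = 1)
  mean(predict(fit, betas[!train, ]) == task[!train])             # accuracy on the held-out run
})
mean(acc)                                                         # ROI decoding accuracy (chance = 0.5)

In the actual analysis, one such accuracy value per ROI and modality mapping entered the group-level comparisons reported below.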
Results
Influence of modality overlap on behavior and practice-related changes
We assessed behavioral performance as the balanced integration score (BIS) of reaction times and accuracies (Liesefeld and Janczyk, 2019) to account for the dependence of the two parameters and to obtain a single parameter for the subsequent correlational analyses with the neural decoding measure (see Table 3 for reaction times and error rates; statistics and graphs are available at https://osf.io/tahds). Dual-task costs were calculated as the difference in BIS between single and dual tasks, with higher values indicating higher dual-task costs (i.e., slower responses and higher error rates in dual tasks than in single tasks).
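For illustration, a minimal R sketch of the dual-task cost computation and the Benjamini–Hochberg-corrected pairwise comparisons shown in Figure 1C, using simulated BIS values (structure and column names are assumptions):

library(dplyr)
library(tidyr)

set.seed(3)
bis_data <- expand.grid(subject   = 1:18,
                        group     = c("compatible practice", "incompatible practice", "passive"),
                        timepoint = c("Pre", "Post"),
                        mapping   = c("compatible", "incompatible"),
                        task_type = c("single", "dual")) %>%
  mutate(bis = rnorm(n()))                      # simulated BIS per design cell

dt_costs <- bis_data %>%
  pivot_wider(names_from = task_type, values_from = bis) %>%
  mutate(cost = single - dual)                  # higher values = higher dual-task costs

tests <- dt_costs %>%
  group_by(group, timepoint) %>%
  summarise(p = t.test(cost[mapping == "compatible"],
                       cost[mapping == "incompatible"], paired = TRUE)$p.value,
            .groups = "drop") %>%
  mutate(p_adj = p.adjust(p, method = "BH"))    # Benjamini-Hochberg correction across tests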
Reaction times and error rates per task type, modality mapping, timepoint and intervention group
At baseline, all three groups showed a robust behavioral effect of modality mapping, F(1, 54) = 178.92, p < .001,
Neural overlap of task representations
We investigated the overlap of task representations at baseline in task-relevant regions with MVPA on subject-specific beta images from the first-level single-task analysis. We trained linear SVMs using a leave-one-run-out cross-classification (implemented in The Decoding Toolbox; Hebart et al., 2014) to distinguish between the two single tasks from fMRI activity patterns, separately for the modality compatible and the modality incompatible mapping (compare Fig. 1B). Task-relevant regions were defined by task-related univariate clusters in two separate localizer runs in which participants performed the same single and dual tasks as in the main experiment in a block design (Fig. 1A). We contrasted the input modalities (visual vs. auditory), the output modalities (manual vs. vocal), and single vs. dual tasks (frontal region as multitasking-specific region) to create five task-relevant clusters per hemisphere (gray clusters in Fig. 2A) as a basis for the ROI analysis. Within each cluster, we defined a sphere (10 mm radius) centered at the maximum voxel on the group level (orange sphere in Fig. 2A and peak coordinates in Table 2).
Neural task representation and the relation to performance. A: Decoding accuracy per modality mapping and brain region at the Pre timepoint. Group-based clusters are shown in gray and corresponding group spheres (10 mm radius) in orange. FP-SC ROIs are defined by Garner and Dux (2015) as cubic ROIs. The graphs show distribution, boxplot, mean (black dot and line) and individual values. The decoding accuracy in the auditory ROI for the decoding between visual-manual and auditory-vocal tasks (modality compatible) is significantly higher than for the decoding between visual-vocal and auditory-manual tasks (modality incompatible). All pairwise t-tests are corrected with the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995). Similar effects were found at the cluster and whole-brain level (see Fig. 3). B: Correlation between multitasking improvements and decoding accuracy gains in the auditory region. Orange spheres (radius 10 mm) indicate individual peak activation within the group-based cluster. The correlation was only significant for participants who completed the modality incompatible intervention. Their multitasking performance improved from Pre to Post if they showed a reduced overlap between the incompatible-mapping tasks, indicated by higher decoding accuracy at the Post timepoint compared with Pre.
Control analysis for pattern analysis. A: Decoding accuracy per modality mapping and brain region at the Pre timepoint, decoded on the group-based clusters shown in gray in Figure 2A. The graphs show distribution, boxplot, mean (black dot and line) and individual paired values. We found descriptively higher decoding accuracy in the auditory ROI for the decoding between visual-manual and auditory-vocal tasks (modality compatible) compared with the decoding between visual-vocal and auditory-manual tasks (modality incompatible). All pairwise t-tests are corrected with the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995). B: We controlled for task difficulty to ensure that difficulty is not the primary source of information for the classifier. Here we trained the classifier with a cross-classification, again on the group-based spherical ROIs (orange spheres in Fig. 2A), to distinguish visual-manual-easy from auditory-vocal-difficult tasks and tested on visual-manual-difficult and auditory-vocal-easy tasks. If differences in difficulty were the primary source of task classification, then the crossing would lead to systematic confusion (i.e., below-chance accuracies), which is clearly not the case. We found significantly higher decoding accuracies in the auditory ROI for the modality compatible decoding compared with the modality incompatible decoding. C: We also controlled for the modality of task instruction, which was either an image or text. We applied the same cross-classification, which again showed that the primary source of the classification is not the type of instruction. Consistent with the other analyses, we again found significantly higher decoding accuracies in the auditory ROI for the modality compatible decoding compared with the modality incompatible decoding.
Our results demonstrate that the trained classifiers can robustly distinguish between the two single tasks (i.e., visual-manual vs. auditory-vocal and visual-vocal vs. auditory-manual, respectively) in both modality mappings in all task-relevant regions; decoding accuracies for all ROIs were significantly above chance level (50%), all corrected t-tests p < .001. Note that t-tests on decoding accuracies have a different interpretation than comparisons of standard brain activity: similar to a fixed-effects analysis, a significant t-test indicates that there is an effect in at least one person and does not allow the inference that the effect is present in the population (Allefeld et al., 2016).
Remarkably, a pairwise t-test, corrected for multiple comparisons, revealed a significant difference between the representation of the modality mappings only in the auditory region, t(68) = 3.64, p = .003. Here, decoding accuracies were higher for modality compatible tasks (
To rule out that the results were merely due to our ROI selection, we ran the same analysis on the whole group clusters (Fig. 3A) and with a whole-brain searchlight analysis (Fig. 4), and tested for potential influences of task difficulty (Fig. 3B) and task instruction (Fig. 3C). All analyses confirmed the difference in the auditory region, except at the cluster level, where the difference was only numerical (p = .34). Accordingly, the following sections will focus on this auditory region, which selectively differentiates between the modality compatible and modality incompatible mappings. We further investigated whether the higher neural overlap at baseline was also associated with behavioral performance. Surprisingly, we found no significant correlation between decoding accuracy and dual-task costs at baseline, r [−0.15, 0.03], p > .27. This could be due to two factors: first, the reliability of the behavioral estimates in the Pre session might be low; second, the strength of decoding alone might not be directly relevant for behavior.
Whole-brain searchlight. In addition to the ROI-based analyses, we also implemented an 11 mm radius whole-brain searchlight to ensure that we captured all relevant brain regions. The whole-brain searchlight was applied separately to each modality mapping, and the difference maps between compatible and incompatible were then tested against zero. The resulting cluster survived FWE correction at p = .05 and largely overlaps with the auditory region used as ROI.
In sum, these findings demonstrate that the neural task representations of the two single tasks in the modality incompatible mapping overlap more in auditory regions than those in the modality compatible mapping, supporting the assumed difference in sensory task representation between modality compatible and modality incompatible tasks. For the first time, we provide evidence that the theoretical overlap of stimulus and action-effect modalities is also reflected in neural single-task representations in sensory regions rather than in general multitasking-related fronto-parietal regions.
Practice-related changes of neural task representations and their relation to multitasking performance
To further substantiate the role of sensory neural overlap for multitasking performance in the two modality mappings, we examined whether the difference in overlap in the auditory region changes with practice and whether this change is related to behavioral change. As the functional organization of the brain is highly variable between individuals, we here used the individual maximum voxel within each group cluster of the localizer task to define individual spherical ROIs for these pre–post comparisons (Fig. 2B). Performance gains and changes in decoding accuracy were defined as the difference between the Pre and Post timepoints; higher values indicate a performance gain and an increase in decoding accuracy after the practice intervention, respectively.
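For illustration, a minimal R sketch of these change-score correlations, which were computed separately for each intervention group (the simulated values and variable names are assumptions):

set.seed(4)
n <- 18                                               # participants in one intervention group
decoding_pre  <- runif(n, 0.55, 0.80)                 # auditory-ROI decoding accuracy, incompatible mapping
decoding_post <- decoding_pre + rnorm(n, 0.05, 0.05)
bis_dual_pre  <- rnorm(n)                             # dual-task BIS, incompatible mapping
bis_dual_post <- bis_dual_pre + rnorm(n, 0.3, 0.5)

decoding_gain    <- decoding_post - decoding_pre      # higher = less neural overlap after practice
performance_gain <- bis_dual_post - bis_dual_pre      # higher = better dual-task performance after practice

cor.test(decoding_gain, performance_gain)             # Pearson correlation of change scores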
We did not find a significant effect of timepoint on the task representation after the practice intervention for any practice group (main effect of timepoint, F(1, 60) = 0.01, p = .928,
This result confirmed that the significant correlation is not due to differences in single-task performance or head movement. Accordingly, the degree of separation of task representations in auditory regions after practicing a modality incompatible mapping for one session can be considered an important predictor for the elimination of modality-specific dual-task interference within a given session.
Discussion
While multimodality is a typical characteristic of most everyday multitasking situations, little is known about modality-specific multitasking costs, going beyond attentional or motor limitations. Here, we investigated how modality-based crosstalk between action-effect modality and stimulus modality (Schacherer and Hazeltine, 2020) evolves on a neural level. Specifically, we examined whether modality-specific neural overlap is coded in general multitasking-related brain regions or modality-specific sensory regions. We further elucidated how it affects multitasking performance, and how practice changes those representations. In line with the modality-based crosstalk assumption (Hazeltine et al., 2006; Schacherer and Hazeltine, 2020), we found a significant difference between modality compatible and modality incompatible single-task representations in modality-specific sensory brain regions (i.e., auditory cortex) and not in multitasking-related regions in frontal and parietal cortex. In addition, practice-related improvements in modality incompatible decoding accuracy were associated with performance gains in the modality incompatible dual task. Individuals who succeeded most in reducing modality-specific dual-task costs were those with the greatest sensory separation, supporting the assumed relevance of sensory representations for multitasking performance. This effect was only present for the group who practiced the modality incompatible mapping during the intervention, suggesting the build-up of highly specific task representations to deal with potential crosstalk.
For the first time, we provide evidence from neural data for the relation between dual-task crosstalk and sensory modalities, specifically in the auditory cortex. This complements previous findings which revealed multitasking-training-related changes of representational overlap in fronto-parietal regions (Garner and Dux, 2015). While it has been discussed that sharing representations may be advantageous for rapid learning and generalization, sharing also facilitates interference and crosstalk, which is reduced by segregation but at the cost of reduced generalizability to other task contexts (Musslick and Cohen, 2021; Garner and Dux, 2023). In their study, Garner and Dux (2015) addressed the representational basis of multitasking costs per se by exclusively investigating dual-task performance in relation to single-task performance. In our study, in contrast, we directly compared dual tasks with different degrees of modality overlap, addressing specifically the basis of modality-based crosstalk in the context of modality compatibility (Hazeltine et al., 2006; Stelzel et al., 2006). While reducing representational overlap in fronto-parietal cortex may improve the general ability to process two tasks simultaneously (Garner and Dux, 2015, 2023), reducing representational overlap in modality-specific regions seems to reduce modality-specific sources of multitasking costs such as modality-based crosstalk.
An alternative account for the emergence of the modality-compatibility effect is simply the slower routing of information for modality incompatible mappings (e.g., Greenwald, 1970; Wang and Proctor, 1996): non-preferred processing routes (e.g., auditory-manual, visual-vocal) may simply be slower and thus lead to greater dual-task interference. However, studies consistently report no difference between single tasks (e.g., Stelzel et al., 2006; Göthe et al., 2016) and persistent dual-task effects when single-task performance is explicitly matched (e.g., Experiment 3 in Hazeltine et al., 2006). In our data, the modality compatible single tasks were even slightly slower than the modality incompatible single tasks while still showing the robust effect of modality compatibility in dual-task trials. Thus, on a behavioral level, the difference in dual-task costs cannot be attributed solely to differences between single tasks, favoring the modality-based crosstalk account including the role of action effects. This is also in line with Schacherer and Hazeltine (2023), who specifically tested different explanations for modality-specific dual-task costs by manipulating the action effects of an auditory-manual task that was paired with a visual-manual task. They showed that adding auditory action effects, which do not interfere or overlap with the second response to the visual task, led to decreased dual-task costs without a change in single-task performance (Experiment 2), concluding that their results are best aligned with the crosstalk assumption. The decoding results of our study, in which we did not artificially manipulate action effects but referred to the expected and learned sensory consequences of actions, cannot be explained by (non-)preferred routing either. This account does not explain why the degree of neural overlap in auditory regions is related to modality-specific dual-task costs. Further studies might shed light on the contribution of non-preferred routing between stimulus and response modalities by applying connectivity approaches to test whether stronger connections between regions involved in visual-manual and auditory-vocal tasks, compared with visual-vocal and auditory-manual tasks, are related to dual-task costs.
The roles of sensory and motor regions in representing stimuli and/or responses, and of the fronto-parietal cortex in representing task rules, have been studied extensively (see review by Woolgar et al., 2016). Our data support the importance of the fronto-parietal cortex for distinguishing between single tasks (significant above-chance decoding), but this decoding seems to be independent of the modality mappings. Previous dual-task research on the role of the fronto-parietal cortex has mostly focused on processes involved in dual-tasking, applying mainly univariate analyses (see review by Worringer et al., 2019). Integrating these findings into representational approaches will be a challenge for future studies in the field. Likewise, asymmetries in modality-specific representations need to be considered in more detail. In the present study, modality-specific effects were exclusively present in the auditory cortex, without comparable effects in visual regions. One potential explanation for the significance of auditory brain regions associated with anticipated action effects of vocal responses is provided by the forward model for self-initiated movements. According to this model, sensory cortices receive a copy of the motor command (i.e., an efference copy) while a movement and its effects (i.e., action effects) are being planned (Holst and Mittelstaedt, 1950; Wolpert, 1997; Ody et al., 2023). Several studies demonstrated strong modulation of the auditory cortex as a consequence of speech. More specifically, the anticipated auditory signal of one's own speech already modulates activity in the auditory cortex when it is actually heard (Ford et al., 2005; Heinks-Maldonado et al., 2005; Niziolek et al., 2013). In contrast, Straube and colleagues used self-generated button presses and manipulated their multisensory consequences (visual and auditory). They provided evidence that a button press leads to a general preparation to process any following stimulus, irrespective of its modality. In other words, the expected sensory outcome of a manual button press seems to be broader than the auditory action effect of a vocal response. This is also reflected in everyday experiences, where pressing a button can result in a visual effect (i.e., turning the light on and off) or in an acoustic effect (i.e., pressing the doorbell), whereas speech always results in an auditory effect. It might be that the action effects of a button press are more distributed over the cortex and thus more difficult to decode in sensory regions, whereas the specific action effect of speech can reliably be decoded in the auditory cortex.
Importantly, action effects are a matter of learning. According to ideomotor theory, action selection is based on the sensory effects of that action, suggesting a bidirectional connection between an action and its action effects (Greenwald, 1970; James, 1890; Prinz, 1997). Several studies have provided evidence that the association between an action and its action effect is not necessarily hard-wired but can be learned and thus affect task performance (Kühn et al., 2010; Schacherer and Hazeltine, 2021, 2023). For example, Kühn and colleagues established images of faces as an artificial action effect of one button press and images of houses for another button press. After this practice phase, the button press was sufficient to activate the neural representations of the previously paired type of images without actually presenting them (Kühn et al., 2010). They concluded that not only do action effects guide action selection, but an action itself also activates the corresponding perceptual representation.
Our study provides additional evidence that the connection between action, action effect, and stimulus is relevant for task performance and can be changed, even during a comparatively short practice intervention. Consequently, we assume that participants in the modality incompatible intervention group learned to transiently overwrite highly learned modality-specific associations, presumably by suppressing the interfering action-effect representation and/or by building up a new one. This led to improved performance after the practice intervention, associated with better decodability of single-task representations, and to decreased performance for the non-practiced modality compatible mapping. Future studies may address the dynamics of suppressing and/or building up a new association in more detail, and explore how stable those associations are across time.
Taken together, we provide evidence that not only fronto-parietal regions but also sensory regions hold information about task representations, including action effects, which may be subject to crosstalk in a multisensory multitasking context. These findings reveal for the first time in humans that the neural representation of tasks in a multimodal setting is malleable through multitasking practice at the individual level.
Footnotes
We thank Elisa Arnold, Friederike Glueck, Gregory Gutmann, Lea Lowak, Max Nowaczyk and Oliver Stegmann for assisting in data collection and preprocessing of the vocal data. Neuroimaging was performed at the Cognitive Center for Neuroscience Berlin and was technically supported by Christian Kainz and Till Nierhaus. This work was financially supported by the German Research Foundation, Priority Program SPP 1772 [grant numbers: STE 2226/4-2; GR 3997/4-2; HE 7464/1-2; RA 1047/4-2].
The authors declare no competing financial interests.
Correspondence should be addressed to Marie Mueckstein at mariemueckstein@gmail.com.