Abstract
hMT+/V5 is a region in the middle occipitotemporal cortex that responds preferentially to visual motion in sighted people. In cases of early visual deprivation, hMT+/V5 enhances its response to moving sounds. Whether hMT+/V5 contains information about motion directions and whether the functional enhancement observed in the blind is motion specific, or also involves sound source location, remains unsolved. Moreover, the impact of this cross-modal reorganization of hMT+/V5 on the regions typically supporting auditory motion processing, like the human planum temporale (hPT), remains equivocal. We used a combined functional and diffusion-weighted MRI approach and individual in-ear recordings to study the impact of early blindness on the brain networks supporting spatial hearing in male and female humans. Whole-brain univariate analysis revealed that the anterior portion of hMT+/V5 responded to moving sounds in sighted and blind people, while the posterior portion was selective to moving sounds only in blind participants. Multivariate decoding analysis revealed that the presence of motion direction and sound position information was higher in hMT+/V5 and lower in hPT in the blind group. While both groups showed axis-of-motion organization in hMT+/V5 and hPT, this organization was reduced in the hPT of blind people. Diffusion-weighted MRI revealed that the strength of hMT+/V5–hPT connectivity did not differ between groups, whereas the microstructure of the connections was altered by blindness. Our results suggest that the axis-of-motion organization of hMT+/V5 does not depend on visual experience, but that congenital blindness alters the response properties of occipitotemporal networks supporting spatial hearing in the sighted.
SIGNIFICANCE STATEMENT Spatial hearing helps living organisms navigate their environment. This is certainly even more true in people born blind. How does blindness affect the brain network supporting auditory motion and sound source location? Our results show that the presence of motion direction and sound position information was higher in hMT+/V5 and lower in human planum temporale in blind relative to sighted people; and that this functional reorganization is accompanied by microstructural (but not macrostructural) alterations in their connections. These findings suggest that blindness alters cross-modal responses between connected areas that share the same computational goals.
Introduction
In everyday life, moving events are often perceived simultaneously across vision and audition. The brain must therefore exchange and integrate visual and auditory signals to create a unified representation of motion. In the visual system, hMT+/V5, a region in the middle occipitotemporal cortex has long been known for its preferential tuning to visual motion (Watson et al., 1993; Tootell et al., 1995) and its axis-of-motion columnar organization that supports visual direction selectivity in human and nonhuman primates (Albright et al., 1984; Diogo et al., 2003; Zimmermann et al., 2011). In the auditory system, the human planum temporale (hPT), engages preferentially in the processing of moving sounds (Baumgart et al., 1999; Krumbholz et al., 2005). Similar to what was found in hMT+/V5 for the processing of visual directions, hPT also codes for the direction of sounds following an axis-of-motion organization (Battal et al., 2019). In addition to their similar coding organization, hMT+/V5 and hPT are part of a network involved in the processing and integration of audiovisual motion information (Gurtubay-Antolin et al., 2021). For instance, in addition to its well documented role for visual motion, hMT+/V5 also responds preferentially to auditory motion (Poirier et al., 2005) and contains information about planes of motion using a similar representational format in vision and audition (Rezk et al., 2020). Examining how audiovisual motion networks develop in people with no functional vision presents a unique opportunity to assess how changes in sensory experience shape multisensory networks.
In case of early visual deprivation, hMT+/V5 shows enhanced response to auditory motion (Poirier et al., 2006; Wolbers et al., 2011; Jiang et al., 2014; Dormal et al., 2016). Which part of the hMT+/V5 shows selectivity to auditory motion in both sighted and blind individuals and which part reorganizes in the early blind (EB) remains, however, a matter of debate. Studies in sighted people have shown that the anterior portion of hMT+/V5, called MTa, shows motion selectivity in both vision and audition (Battal et al., 2019; Rezk et al., 2020). It is therefore possible that blindness triggers a posterior extension of this multisensory motion-selective region, but this was never formally tested. Moreover, whether hMT+/V5 is selectively involved in the processing of auditory motion or whether this region engages with spatial hearing in general remains unknown.
The impact of early acquired blindness on temporal regions typically coding for sounds also remains controversial. Some studies suggested an expansion of tonotopic areas and increased response to sounds in the temporal cortex of blind people (Elbert et al., 2002; Gougoux et al., 2009), while others suggested a reduced involvement of hPT for the processing of moving sounds (Dormal et al., 2016; Jiang et al., 2016).
Recent advances in multivariate pattern analyses (MVPAs) provide the opportunity to go beyond the observation of motion selectivity in a brain region but also probe for the presence of directional information encoded in a region, and whether the region codes spatial sounds following an axis-of-motion organization (Albright et al., 1984; Battal et al., 2019; Rezk et al., 2020). No study so far, however, investigated whether the axis-of-motion organization known to be implemented in hPT and hMT+/V5 is altered in people with early blindness.
Previous studies in nonhuman primates (Palmer and Rosa, 2006; Majka et al., 2019) and recent work in humans (Gurtubay-Antolin et al., 2021) provided evidence for the putative existence of direct occipitotemporal white matter connections between visual and auditory motion-selective regions, which may support the exchange of spatial/motion information across audition and vision. The impact of visual deprivation on these connections remains unexplored.
The present study aimed to functionally and structurally investigate how early blindness affects brain networks involved in spatial hearing. Our first aim was to clarify whether and how auditory motion direction and sound source location are represented in the hMT+/V5 and hPT in both sighted and blind individuals. Second, relying on diffusion-weighted imaging, we investigated whether early blindness affects the macrostructure and microstructure of hMT+/V5–hPT white matter connections.
Materials and Methods
Participants
Sixteen EB participants and 18 sighted control subject (SC) participants were recruited for the study. Participants were matched for age and gender. Sighted participants also participated in an independent visual motion localizer task. One SC participant was excluded because of poor performance on the task within the scanner, and another SC participant was excluded because of excessive motion during the scanning session. This resulted in a total of 32 participants included in the analyses: 16 early blind participants (8 females; age range, 20–46 years; mean ± SD age, 33.7 ± 7.2 years) and 16 sighted participants (8 females; age range, 20–42 years; mean ± SD age, 31.8 ± 5.7 years). An additional 17 sighted participants (10 females; age range, 20–41 years; mean ± SD age, 28 ± 5.3 years) participated in an independent auditory motion localizer experiment. Thirteen of the 16 early blind participants (7 females; mean ± SD age, 31.2 ± 5.4 years) and 12 of the 16 sighted participants (7 females; mean ± SD age, 32.4 ± 6.2 years) underwent diffusion-weighted magnetic resonance imaging (dMRI). Additionally, three sighted participants that did not conduct any functional localizer task were included in the diffusion analyses.
In all cases, blindness was attributed to peripheral deficits with no additional neurologic problems (Table 1, characteristics of blind participants). All of the blind participants had lost sight or had visual problems since birth that evolved toward complete blindness before 4 years of age. Seven blind participants had faint light perception without pattern or color vision. Sighted participants had normal or corrected-to-normal vision. Experiments were undertaken with the understanding and written consent of each subject. All the procedures were approved by the research ethics boards of the Center for Mind/Brain Sciences (CIMeC) and the University of Trento, and in accordance with The Code of Ethics of the World Medical Association, Declaration of Helsinki (Rickham, 1964).
The functional and structural data from sighted participants have been published in two previous studies (Battal et al., 2019; Gurtubay-Antolin et al., 2021). The present study addresses a separate question: how the brain networks supporting spatial hearing change in the case of congenital blindness by acquiring new data on this population. The sighted data used in previous work here mostly serve as a control group to be compared with blind people.
Experimental design
Auditory stimuli.
To infer the direction or location of sounds in the horizontal plane of the head, the auditory system extracts interaural differences in arrival time and sound level (Middlebrooks and Green, 1991; Blauert, 1997). Up-down localization notably relies on the interaction of sound waves within the pinnae, resulting in idiosyncratic direction-dependent spectral acoustic filters (Musicant and Butler, 1984; Wightman and Kistler, 1989; Middlebrooks and Green, 1992; Hofman et al., 1998). To create an externalized ecological sensation of sound source location and motion (Møller et al., 1996), we relied on individual in-ear stereo recordings that were recorded in a semianechoic room and from 30 loudspeakers on horizontal and vertical planes, mounted on two semicircular wooden structures with a radius of 1.1 m (Fig. 1A; Battal et al., 2020). Binaural in-ear recordings allow binaural properties, such as interaural time and intensity differences, as well as participant-specific monaural filtering cues, and serve to create reliable and ecological auditory space (Pavani et al., 2002). Participants were seated in the center of the apparatus with their head on a chin rest, such that the speakers on the horizontal plane were at the participant's ear level and those on the vertical plane were aligned with the participant's mid-sagittal plane.
Auditory stimuli were prepared using a custom-designed MATLAB script (R2013b, MathWorks). During the presentation of stimuli, the audio was recorded using binaural in-ear omnidirectional microphones (“flat” frequency range, 20–20,000 Hz; model SP-TFB-2, Sound Professionals) attached to a portable Zoom H4n digital wave recorder (16 bit, stereo, 44.1 kHz sampling rate). Microphones were positioned at the opening of a participant's left (L) and right (R) auditory ear canals. Then, these recordings were replayed to the participants when they were inside the functional MRI (fMRI) scanner. By using in-ear recordings, auditory stimuli automatically convolved with each individual's own pinna and head-related transfer function to produce a salient auditory perception in external space. The recorded auditory stimuli were used in both the main auditory experiment and the auditory motion localizer. All participants wore a blindfold throughout the auditory recordings and were instructed to keep their eyes closed. Before the recordings, the sound pressure level (SPL) was measured from the subject's head position and ensured that each speaker conveys 65 arbitrary dB SPL.
Stimuli recordings.
Sound stimuli consisted of 1250 ms pink noise (50 ms rise/fall time). In the motion condition, the same pink noise was presented moving in the following four directions: leftward, rightward, upward, and downward. Moving stimuli covered 120° of the space/visual field in horizontal and vertical axes. To create the perception of smooth motion, the 1250 ms of pink noise were fragmented into 15 equal length pieces with each 83.333 ms fragment being played every two speakers, and moved one speaker at a time, from the outer left to the outer right (rightward motion), or vice versa for the leftward motion. For example, for the rightward sweep, sound was played through speakers located at −60° and −52° consecutively, followed by −44°, and so on. A similar design was used for the vertical axis. This resulted in participants perceiving moving sweeps covering an arc of 120° in 1250 ms (speed, 96°/s; fade-in/fade-out, 50 ms) containing the same sounds for all four directions. The choice of the movement speed of the motion stimuli aimed to create a listening experience relevant to conditions of everyday life. Moreover, at such velocity it has been demonstrated that human listeners are not able to make the differences between concatenated static stimuli from motion stimuli elicited by a single moving object (Poirier et al., 2017), supporting the participant's report that our stimuli were perceived as smoothly moving (no perception of successive snapshots). In the static condition, the same pink noise was presented separately at one of the following four locations: left, right, up, and down. Static sounds were presented at the second-most outer speakers (−56° and +56° in the horizontal axis, and +56° and −56° in the vertical axis) to avoid possible reverberation differences at the outermost speakers. The static sounds were fixed at one location at a time instead of presented in multiple locations (Krumbholz et al., 2005; Smith et al., 2004, 2010; Poirier et al., 2017). This strategy was purposely adopted for two main reasons. First, randomly presented static sounds can evoke a robust sensation of auditory apparent motion (Strybel and Neale, 1994; Lakatos and Shepard, 1997; for review, see Carlile and Leung, 2016). Second, presenting static sounds located in a given position and moving sounds directed toward the same position allowed us to investigate whether moving and static sounds share a common representational format using multidimensional scaling (see below), which would have been impossible if the static sounds were randomly moving in space.
Participants were instructed to listen to the stimuli, without performing any task. Stimuli recordings were conducted in a session that lasted ∼10 min, requiring the participant to remain still during this period. All participants reported strong sensation of auditory motion and were able to detect directions and locations with high accuracy during the scanner session (Fig. 1C).
Auditory experiment
Participants completed a total of 12 runs. Auditory stimuli were presented via MR-compatible closed-ear headphones (Serene Sound, Resonance Technology), and the amplitude was adjusted according to each participant's comfort level. To familiarize the participants with the task, they completed a practice outside of the scanner while lying down until they reached >80% accuracy. All participants wore a blindfold with sterile gauze on top of the eyelids during the auditory task and were instructed to keep the eyes closed during the entire duration of the experiment. Each run consisted of the eight conditions (four motion and four static), randomly presented using a block design. Each condition was presented for a 15 s block [12 repetitions of each event of 1250 ms sound, no interstimulus interval (ISI)] and followed by a 7 s gap in which the participant had to indicate the corresponding direction/location in space and 8 s of silence (total interblock interval was 15 s). The ramp applied at the beginning and at the end of each sound creates static bursts and prevented adaptation to the static sounds. During the response gap, participants heard a voice saying “left,” “right,” “up,” and “down” in pseudorandomized order. Participants were asked to press a button with their right index finger when the direction or location of the auditory block matched the auditory cue (Fig. 1B). The number of targets and the order (positions 1–4) of the correct button press were balanced across conditions. This procedure was adopted to ensure that the participants gave their response using equal motor command for each condition and to ensure that the response is produced after the end of the stimulation period. Each scan consisted of one block of each condition, resulting in a total of eight blocks per run, with each run lasting 4 min and 10 s (100 volumes). The order of the blocks was pseudorandomized within each run, and across participants.
Auditory localizer
To localize regions responding to auditory motion, an independent group of sighted participants (n = 17) undertook an auditory motion localizer scan. Individual in-ear recordings of moving and static stimuli were presented in a blocked design. Each block contained 12 repetitions of 1200 ms sounds from one of the following eight conditions: four motion directions and four static locations. Stimuli within a block were separated by 100 ms ISIs, and each block was followed by a 6 s rest period. The localizer had one run and consisted of 13 repetitions of each condition block in a pseudorandom order. The scan lasted a total of 9 min and 48 s (235 volumes). Participants were instructed to indicate via button press with their right index finger when they detected a stimulus with a shorter duration (targets = 600 ms). The number of targets in each block was varied between one and three targets, with the location in the block randomized and balanced across conditions. Participants were familiarized with the task before the fMRI session and were blindfolded throughout the scan.
Visual hMT+/V5 localizer
To identify hMT+/V5, sighted participants undertook an independent visual motion localizer scan. Visual stimuli were back-projected onto a screen (width, 42 cm; frame rate, 60 Hz; screen resolution, 1024 × 768 pixels; mean luminance, 109 cd/m2 via a liquid crystal projector (model OC EMP 7900, Epson Nagano) positioned at the back of the scanner and viewed via mirror mounted on the head coil at a distance of 134 cm. Stimuli were 16 s of random-dot patterns, consisting of a circular aperture (radius, 4°) of radial moving and static dots (moving and static conditions, respectively) with a central fixation cross (Huk et al., 2002). One hundred twenty white dots (diameter of each dot, 0.1 visual degree) were displayed on a gray background, moving 4°/s. In all conditions, each dot had a limited lifetime of 0.2 s. Limited lifetime dots were used to ensure that the global direction of motion was determined by integrating local signals over a larger summation field rather than by following a single dot (Bex et al., 2003). Stimuli were presented for 16 s followed by a 6 s rest period. Stimuli within motion blocks alternated between inward and outward motion (expanding and contracting) once per second. Because the localizer aimed to localize the global hMT+/V5 complex (e.g., including both MT and MST subregions) the static block was composed of dots maintaining their position throughout the block to prevent flicker-like motion (Smith et al., 2006). The localizer consisted of 14 alternating blocks of moving and static dots (7 each) and lasted a total of 6 min and 40 s (160 volumes). To maintain the participant's attention and to minimize eye movement during acquisition throughout the run of the localizer, participants were instructed to detect a color change (from black to red) of a central fixation cross (0.03°) by pressing the response button with the right index finger.
Imaging parameters
Functional and structural data were acquired with a 4 T Bruker MedSpec Biospin MR scanner, equipped with an eight-channel head coil. Functional images were acquired with T2*-weighted gradient echoplanar sequence with fat suppression. Acquisition parameters included a repetition time (TR) of 2500 ms, an echo time (TE) of 26 ms, a flip angle (FA) of 73°, a field of view (FOV) of 192 mm, a matrix size of 64 × 64, and a voxel size of 3 × 3 × 3 mm. A total of 39 slices was acquired in ascending feet-to-head interleaved order with no gap. The three initial scans of each acquisition run were discarded to allow for steady-state magnetization. Before each EPI run, we performed an additional scan to measure the point-spread function of the acquired sequence, including fat saturation, which served for distortion correction that is expected with high-field imaging (Zeng and Constable, 2002).
A high-resolution anatomic scan (Papinutto and Jovicich, 2008) was acquired using a T1-weighted 3D MP-RAGE sequence (176 sagittal slices; voxel size, 1 × 1 × 1 mm; FOV, 256 × 224 mm; repetition time, 2700 ms; TE, 4.18 ms; FA, 7°; inversion time, 1020 ms). Participants were blindfolded and instructed to lie still during acquisition. Foam padding was used to minimize scanner noise and head movement.
Diffusion-weighted images were acquired using an EPI sequence (TR, 7100 ms; TE, 99 ms; image matrix, 112 × 112; FOV, 100 × 100 mm2; voxel size, 2.29 mm isotropic). Ten volumes without any diffusion weighting (b0 images) and 60 diffusion-weighted volumes with a b-value of 1500 s/mm2 were acquired.
Univariate fMRI analysis
Raw functional images were preprocessed and analyzed with SPM8 (Wellcome Centre for Human Neuroimaging, UCL, London, UK; http://www.fil.ion.ucl.ac.uk/spm/software/spm/) implemented in MATLAB R2014b (MathWorks). Before the statistical analysis, our preprocessing steps included slice time correction with reference to the middle temporal slice, realignment of functional time series, coregistration of functional and anatomic data, spatial normalization to an echoplanar imaging template conforming to the Montreal Neurologic Institute space, and spatial smoothing (Gaussian kernel, 6 mm FWHM).
Auditory experiment.
To obtain blood oxygenation level-dependent activity related to auditory spatial processing, we computed single-subject statistical comparisons with fixed-effect general linear model (GLM). In the GLM, we used eight regressors from each condition (four motion directions, four sound source locations). The canonical double-gamma hemodynamic response function implemented in SPM8 was convolved with a boxcar function to model the above-mentioned regressors. Motion parameters derived from realignment of the functional volumes (three translational motion and three rotational motion parameters), button press, and four auditory response cue events were modeled as regressors of no interest. During the model estimation, the data were high-pass filtered with a cutoff at 128 s to remove the scanner drift and low-frequency fluctuations from the time series. To account for serial correlation because of noise in the fMRI signal, an autoregressive (AR 1) model was used.
At the fixed-effect individual-subject level, to obtain activity related to auditory processing in the whole brain, the contrasts tested the main effect of each condition, as follows: left motion, right motion, up motion, down motion, left static, right static, up static, and down static. Next, to identify regions responding preferentially to the auditory motion and static stimuli, we compared the response of all motion conditions to all static conditions (motion > static, and static > motion). These linear contrasts generated statistical parametric maps [SPM(t)] that were further spatially smoothed (Gaussian kernel, 8 mm FWHM) before being entered into a second-level group analysis, using a random effect model, accounting for intersubject variance.
At the group level, a series of one-sample t tests was implemented to examine the main effects of each condition (motion, static) and motion processing (motion > static) for each group. A conjunction analysis isolated brain areas jointly activated for the contrast motion > static in both groups (EB and SC). Two-sample t tests were then performed to compare these effects between groups (SC > EB, EB > SC).
Statistical inferences were performed using familywise error (FWE) correction for multiple comparisons using p < 0.05 over the entire brain volume or over small spherical volumes (radius, 15 mm) located around regions of interest (ROIs; Table 1) using a minimal cluster size threshold of 20 contiguous voxels (Worsley et al., 1996). Significant clusters were anatomically labeled using the xjView MATLAB toolbox (http://www.alivelearn.net/xjview) or structural neuroanatomy information provided in the Anatomy Toolbox (Eickhoff et al., 2007).
Independent visual and auditory motion localizer: region of interest definition.
We used independent auditory and visual motion localizers to functionally define hPT and hMT+/V5 regions. Preprocessing steps were similar to whole-brain univariate analysis (see section Univariate fMRI analysis). Single-subject statistical comparisons were made using a fixed-effect GLM for each participant with two regressors (motion, static), and motion parameters (six regressors of no interest). The canonical double-gamma hemodynamic response function implemented in SPM8 was convolved with a boxcar function for each regressor. Motion parameters derived from realignment of the functional volumes (three translational motion and three rotational motion parameters); button press was modeled as a regressor of no interest. During the model estimation, the data were high-pass filtered with cutoff of 128 s to remove the scanner drift and low-frequency fluctuations from the time series. To account for serial correlation because of noise in fMRI signal, autoregressive (AR 1) was used.
One-sample t tests were conducted to characterize the main effect of motion processing (motion > static). This linear contrast generated statistical parametric maps that were further spatially smoothed (Gaussian kernel 8 mm FWHM) and entered a second-level group analysis using a random-effects GLM. Group-level peak coordinates of bilateral hMT+/V5 and hPT were defined by contrasting the main effects of localizer scan (motion vs static), surviving a whole-brain familywise error correction (p < 0.05). Peak coordinates from the auditory and visual motion localizers were used to create a sphere of 6 mm radius (110 voxels) around the following four ROIs: left hMT+/V5, right hMT+/V5, left hPT, and right hPT. The four ROIs were defined functionally but constrained by anatomic landmarks of the regions. hPT was selected within the triangular region lying caudal to the Heschl's gyrus on the supratemporal plane, while hMT+/V5 was constrained with the ascending limb of the inferior temporal sulcus (Zeki et al., 1991; Watson et al., 1993).
To prevent the possible spurious overlap between the visual and auditory responsive regions arising from smoothing (Jiang et al., 2014), we performed the statistical inferences on the β-parameter estimates that were extracted from the unsmoothed data.
Because defining the visual motion area hMT+/V5 is obviously impossible in EB, we opted for the peak coordinates from the visual motion localizers in sighted subjects in normalized in MNI to be applied to blind people similarly. We, then, extracted data from the same spatially aligned ROIs in both groups to perform functional univariate and multivariate analyses.
Multivariate pattern analyses
To investigate the presence of auditory motion direction and sound source location information, MVPAs were conducted within the independently defined hMT+/V5 and hPT regions (see localizer sections above). All further analyses were conducted on these regions for all sighted and blind participants.
Multiclass direction and location decoding.
To investigate motion direction and static location information in areas hMT+/V5 and hPT in sighted and blind participants, four classes of classifiers were trained and tested to discriminate between the response patterns of the four auditory motion directions and four sound source locations, respectively.
Preprocessing steps were identical to the steps performed for univariate analyses, except for functional volumes that were smoothed with a Gaussian kernel of 2 mm (FWHM). MVPA were performed in CoSMoMVPA [http://www.cosmomvpa.org/ (Oosterhof et al., 2016), which implements LIBSVM software (http://www.csie.ntu.edu.tw/∼cjlin/libsvm)]. A general linear model was implemented in SPM8, where each block was defined as a regressor of interest. A t-map was calculated for each block separately. Two multiclass linear support vector machine classifiers with a linear kernel with a fixed regularization parameter of C = 1 were trained and tested for each participant separately within each group. The two multiclass classifiers were trained and tested to discriminate between the response patterns of the four auditory motion directions and locations, respectively.
For each participant, the classifier was trained using a cross-validation leave-one-run-out procedure where training was performed with n – 1 runs and testing was then applied to the remaining run. In each cross-validation fold, an ANOVA-based feature selection was applied on the training data to select a subset of voxels (n = 110) that showed the most significant variation between the categories of stimuli (in our study, between orientations). The selected features were used for training and testing. This feature selection process not only ensures a similar number of voxels within a given region across participants, but, more importantly, identifies and selects voxels that are carrying the most relevant information across categories of stimuli (Cox and Savoy, 2003; De Martino et al., 2008); therefore, minimizing the chance to include voxels that carry noises unrelated to our categories of stimuli. The t-maps in the training set were normalized (z scored) across conditions, and the estimated parameters were applied to the test set. To evaluate the performance of the classifier and its generalization across all the data, the previous step was repeated 12 times where in each fold a different run was used as the testing data and the classifier was trained on the other 11 runs. For each region per subject, a single classification accuracy was obtained by averaging the accuracies of all cross-validation folds.
Binary direction and location decoding.
To investigate the preference of “axis of motion/space” in both hMT+/V5 and hPT, binary classifiers were used to discriminate brain activity patterns for motion direction within and across axes. Four binary classifiers were used to discriminate brain activity patterns for left versus right motion, left versus right static, up versus down motion, and up versus down static (hereafter called within-axis classification). We used eight additional classifiers to discriminate across axes [left vs up, left vs down, right vs up, and right vs down motion directions; left vs up, left vs down, right vs up, and right vs down sound source locations (hereafter called across-axes classification)].
To assess whether hMT+/V5 and hPT regions demonstrate axis-of-motion characteristic tuning for auditory motion, we averaged the two within-axes motion accuracies from leftward versus rightward, and upward versus downward binary classifications. We also averaged the motion accuracies from four across-axes motion binary classification. To assess the existence of axis-of-space, we performed averaging on the sound source location classification accuracies with two within-axes and four across-axes binary classifications to obtain one accuracy value per subject per axis.
Multidimensional scaling.
To visualize the pairwise decoding accuracies of motion directions and sound source locations, the 28 decoding accuracy values were represented in the form of a dissimilarity matrix. Each column and each row of the matrix represented one motion direction or sound location. The matrix elements were the pairwise decoding accuracies. The accuracy values were used as a dissimilarity measure. We used multidimensional scaling (MDS) to project the high-dimensional dissimilarity matrix space onto two dimensions with the pairwise decodings that were obtained from hMT+/V5 and hPT in both sighted and blind individuals. The 32 neural dissimilarity matrices (1 per participant) for each of the four ROIs were used as neural input for MDS visualization. Note that MDSs are for visualization purposes only and were not used for statistical inferences.
Statistical analysis.
Statistical analyses were performed in MATLAB (for nonparametric tests) and R [for repeated-measures ANOVAs, and linear mixed-effects models (LMMs)]. Tests for normality were conducted using a Shapiro–Wilk test. All the within-group comparisons were conducted using paired t tests.
Statistical significance in the multivariate classification analyses was assessed using nonparametric tests permuting condition labels and bootstrapping (Stelzer et al., 2013). Each permutation step included shuffling of condition labels and rerunning the classification, which was repeated 100 times on the single-subject level. Next, we applied a bootstrapping procedure to obtain a group-level null distribution that is representative of each group. For each group, from each subject's null distribution one value was randomly chosen and averaged across all the subjects. This step was repeated 100,000 times, resulting in a group-level null distribution of 100,000 values. The classification accuracies across subjects were considered as significant if the p value was <0.05 after corrections for multiple comparisons using the falser discovery rate (FDR) method (Benjamini and Yekutieli, 2001). The group comparison was also tested for significance by using permutation (100,000 iterations).
Classification accuracies were entered into a LMM, as computed through the lmer function in R (afex package; Singmann et al., 2021). We performed an LMM with group (EB, SC), condition (motion, static), region (hMT+/V5, hPT), and hemisphere (left, right) as fixed effects, and subject as a random effect.
To assess axis-of-motion preference across groups and across ROIs, the pairwise decoding accuracies were submitted to LMMs. For each ROI, we performed an LMM with group (EB, SC), condition (within axis, across axis), and hemisphere (left, right) as fixed effects, and subject as a random effect.
Diffusion data analysis
Preprocessing of diffusion data.
Data preprocessing was implemented in MRtrix 3.0 (Tournier et al., 2012; www.mrtrix.org), and in FSL version 5.0.9 (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSL). Data were denoised (Veraart et al., 2016), corrected for Gibbs ringing, Eddy current distortions, and head motion (Andersson and Sotiropoulos, 2016), and for low-frequency B1 field inhomogeneities (Tustison et al., 2010). Spatial resolution was upsampled by a factor of 2 in all three dimensions (1.15 mm isotropic) using cubic b-spline interpolation, and intensity normalization across subjects was performed by first deriving scale factors from the median intensity in select voxels of white matter, gray matter, and CSF in b = 0 s/mm2 images, and then applying these across each subject image (Raffelt et al., 2012a). This step normalizes the median white matter b = 0 intensity (i.e., non-diffusion-weighted image) across participants so that the proportion of one tissue type within a voxel does not influence the diffusion-weighted signal in another. The T1-weighted structural images were nonlinearly registered to the diffusion data in ANTs (Avants et al., 2008) using an upsampled FA map (1 × 1×1 mm3) and segmented in maps for white matter, gray matter, CSF, and subcortical nuclei using the FAST algorithm in FSL (Zhang et al., 2001). This information was combined to form a five-tissue type image to be used for anatomically constrained tractography in MRtrix3 (Smith et al., 2012). These maps were used to estimate tissue-specific response functions (i.e., the signal expected for a voxel containing a single, coherently oriented fiber bundle) for gray matter, white matter, and CSF using multishell multitissue (MSMT)-constrained spherical deconvolution (CSD; Jeurissen et al., 2014). Fiber orientation distribution functions (FODFs) were then estimated using the obtained response function coefficients averaged across subjects to ensure subsequent differences in the amplitude of fODFs will only reflect differences in the diffusion-weighted signal. Note that by using MSMT-CSD in our single shell, data benefitted from the hard non-negativity constraint, which has been observed to lead to more robust outcomes (Jeurissen et al., 2014). Spatial correspondence between participants was achieved by generating a group-specific population template with an iterative registration and averaging approach using FOD images from all the participants (Raffelt et al., 2011). Each subject's FOD image was registered to the template using FOD-guided nonlinear registrations available in MRtrix (Raffelt et al., 2012b). These registration matrices were used to transform the seed and target regions from native diffusion space to template diffusion space, where tractography was conducted. We chose to conduct tractography in the template diffusion space, as FOD-derived metrics of microstructural diffusivity can only be computed in that space. Subsequently, we extracted the following quantitative measures of microstructural diffusivity for all the fixels (fiber populations within a voxel) in the brain: fiber density (FD), fiber-bundle cross-section (FC) and a combined measure of FD and cross-section (FDC; Raffelt et al., 2017). For further details on these metrics, see the FOD-derived microstructural metrics section.
Preparation of hMT+/V5 and hPT for tractography.
As for the fMRI analyses, hMT+/V5 was functionally defined using the visual motion localizers in the 13 SCs included in the analysis of the diffusion data. To functionally define hPT, we used the peak coordinate of the motion versus static contrast in the auditory experiment. To avoid the location of hPT being biased toward SC, we used the conjunction of the SC and EB participants who underwent the fMRI and dMRI acquisitions.
We computed the warping images between the standard MNI space and the native structural space of each participant by conducting a nonlinear registration in ANTs (Gurtubay-Antolin et al., 2021). Using these registration matrices, we transformed the group peak coordinates from the standard MNI space to the native structural space of each participant. We then transformed these coordinates from the native structural space to the native diffusion space using the same transformation used to register the T1-weighted structural images to the diffusion data (described in the diffusion data preprocessing section). Once in native diffusion space, the peak coordinates were moved to the closest white matter voxel (FA, >0.25; Blank et al., 2011; Benetti et al., 2018; Gurtubay-Antolin et al., 2021), and a sphere of 5 mm radius was centered there. To ensure that tracking was done from white matter voxels only, we masked the sphere with individual white matter masks. Last, ROIs were transformed from native diffusion space to template diffusion space, where tractography was conducted.
Tractography: hMT+/V5–hPT connections.
Probabilistic tractography between hMT+/V5 and hPT was conducted for both hemispheres. In addition to the hPT, a region just anterior to hMT+/V5 (called hMTa; Rezk et al., 2020) is also selectively recruited for the processing of moving sounds (Poirier et al., 2005; Saenz et al., 2008; Battal et al., 2019; Rezk et al., 2020). To avoid the notion that hMT+/V5–hPT connections could include fibers connecting hPT with hMTa (which lies near hMT+/V5 and responds to auditory motion), individually identified hMTa (as localized by auditory motion localizer task) was used as an exclusion mask.
We computed tractography in symmetric mode (i.e., seeding from one ROI and targeting the other, and conversely). We then merged the tractography results, pooling together the reconstructed streamlines. We used two tracking algorithms in MRtrix (“iFOD2” and “Null Distribution2”). The former is a conventional tracking algorithm, whereas the latter reconstructs streamlines by random tracking. The iFOD2 algorithm (second-order integration over FODs) uses a Bayesian approach to account for more than one fiber orientation within each voxel and takes as input an FOD image. Candidate streamline paths are drawn based on short curved “arcs” of a circle of fixed length (the step size), at a tangent to the current direction of tracking at the current points (Tournier et al., 2010). The Null Distribution2 algorithm reconstructs connections based on random orientation samples, identifying voxels where the diffusion data are providing more evidence of connection than that expected from random tracking (Morris et al., 2008). We used the following parameters for fiber tracking: randomly placed 5000 seeds for each voxel in the ROI, a step length of 0.6 mm, FOD amplitude cutoff of 0.05, and a maximum angle of 45° between successive steps (Tournier et al., 2010, 2012). We applied the anatomically constrained variation of this algorithm, whereby each participant's five-tissue type, segmented T1 image provided biologically realistic priors for streamline generation, reducing the likelihood of false positive results (Smith et al., 2012). The set of reconstructed connections was refined by removing streamlines whose length was 2.5 SDs longer than the mean streamline length or whose position was >2.5 SDs away from the mean position, as in previous studies (Gurtubay-Antolin et al., 2021; Yeatman et al., 2012; Takemura et al., 2016). To calculate the distance of a streamline from the core of the tract, we resampled each streamline to 100 equidistant nodes and treat the spread of coordinates at each node as a multivariate Gaussian. The core of the tract was calculated as the mean of the x, y, and z coordinates of each fiber at each node.
Statistical analysis.
Statistical significance in the presence of hMT+/V5–hPT connections was assessed comparing the number of streamlines reconstructed by random tracking (Null Distribution2 algorithm), with those generated by conventional tracking (iFOD2 algorithm; Morris et al., 2008; McFadyen et al., 2019; Gurtubay-Antolin et al., 2021). We calculated the logarithm of the number of streamlines [log(streamlines)] to increase the likelihood of obtaining a normal distribution (Müller-Axt et al., 2017; Tschentscher et al., 2019), which was tested before application of parametric statistics using the Shapiro–Wilk test in RStudio (Allaire, 2015). The log-transformed number of streamlines were compared using two-sided paired t tests. To control for unreliable connections, we calculated the group mean and SD of the log-transformed number of streamlines for each connection, and we discarded participants whose values were >2.5 SDs away from the group mean for the respective connection. Connections were only considered reliable when the number of streamlines reconstructed with the iFOD2 algorithm were higher than the ones obtained with the null distribution algorithm. Significance was thresholded at p = 0.05, Bonferroni corrected for multiple comparisons (p = 0.025, two hemispheres).
Whole-tract analysis.
Differences between SCs and EBs in hMT+/V5–hPT connections were assessed using macrostructural and microstructural measures. Macrostructural characterization included the trajectory and the connectivity index, whereas we relied on diffusivity measures derived from fiber orientation distribution for the microstructure assessment.
To assess whether hMT+/V5–hPT connections followed the same trajectory in the SC and EB groups, we computed their spatial overlap by means of the Dice similarity coefficient (DSC) (Dice, 1945). The DSC measures the spatial overlap between regions A and B, and it is defined as DSC (A, B) = 2(A ∩ B)/(A + B) where ∩ is the intersection. We calculated the DSC of hMT+/V5–hPT connections, using as region A the sum of binarized tract density images of hMT+/V5–hPT connections (thresholded at six subjects) in standard space and in the SC group. Region B was the analogous image in the EB group.
The connectivity index was determined by the log-transformed number of streamlines from the seed that reached the target divided by the log-transformed product of the generated sample streamlines in each seed/target voxel (5000) and the number of voxels in the respective seed/target mask (Müller-Axt et al., 2017; Tschentscher et al., 2019; Gurtubay-Antolin et al., 2021), as follows:
Microstructure assessment relied on the FOD-derived quantitative metrics FD, FC, and FDC (fixel-based analysis pipeline in www.mrtrix.org). These estimates characterize a microstructure isolating the contribution of crossing fibers within the same voxels of interest, providing more accurate values than tensor-derived metrics such as fractional anisotropy (Behrens et al., 2007; Tournier et al., 2007; Raffelt et al., 2017). Briefly, FD is a quantitative measure of fiber density within a voxel, given that the integral of the FOD along a particular direction is proportional to the intra-axonal volume of axons aligned in that direction. It was calculated as the integral of the FOD along a particular direction, and it is sensitive to alterations at the fixel level (Raffelt et al., 2017). FC reflects changes in the intra-axonal volume of a fiber bundle that are manifested as a difference in the number of fixels in the spatial extent that the fiber bundle occupies (cross-sectional area). FCs for each fixel were calculated in the plane perpendicular to the fixel direction and were estimated by using the nonlinear warps used to register each subject's FOD to the template. Last, multiplying FD and FC, we computed the metric FDC, which combines both sources of information (within-voxel FD and changes in FC). These estimates were computed in all the voxels that were crossed by fibers connecting hMT+/V5 and hPT and then averaged for each bundle.
All measures (i.e., connectivity index, FD, FC, FDC) were compared between SCs and EBs using two-sided paired t tests, after testing for normality using the Shapiro–Wilk test. Significance threshold was set at p = 0.006 (α = 0.05 Bonferroni corrected for 2 hemispheres and four metrics) and subjects' values were >2.5 SDs away from the group mean were considered outliers. A possible lateralization of group differences in the microstructural metrics was tested by a 2 × 2 ANOVA with Group as a between-subject factor (2 levels: SC, EB) and Hemisphere as a within-subject factor (2 levels: R, L)
Along-tract analysis.
An along-tract analysis was performed on the reconstructed tracts using the Along Tract Statistics toolbox (Colby et al., 2012) in MATLAB (R2016b). A mean tract geometry (MTG) was estimated for each tract after resampling the streamlines to 50 points, so that all streamlines were composed by the same number of points. For each tract and participant, the mean of diffusion scalar values of corresponding points (i.e., streamline points assigned to the same MTG point) was computed per each MTG point, and individual along-tract profiles were estimated per each tract. To control for the possibility that group differences could be driven by scalar outliers, we excluded tracts whose mean scalar values deviated >2.5 SDs from the mean of each subject or deviated for >10 consecutive points from the group mean. To further exclude aberrant values, we rejected participants presenting profiles in which at five consecutive positions along the tract the profile values deviated ±2.5 SDs for the respective scalar analysis (in a group-wise fashion; Novello et al., 2018).
Per each of the considered diffusion scalars, a linear mixed-effects model was then adopted with fixed effects of group, position (i.e., MTG point), and a group and position interaction to assess possible diffusion scalar changes affecting the two groups differently. A group and position interaction was considered significant for p < 0.008 (α = 0.05. Bonferroni-corrected two hemispheres and three diffusion scalars). We adopted a permutation (n = 5000) approach to control the type 1 error and adjust p-values accordingly: a null distribution of maximum T-values across all MTG points under the null hypothesis of no group differences was empirically estimated by permuting group labels and recording the maximum observed T-statistics. For each MTG point, the group analysis T-statistics were then compared against the null distribution to get an adjusted p-value (Colby et al., 2012). Results were then considered significant for an FEW-corrected p value <0.05.
Brain–Behavior correlation analysis
We investigated the link between behavioral performance and neural activity of hMT+/V5 and hPT regions by performing between-subject Pearson's correlation on behavioral performance with (1) extracted β-parameter estimates, and (2) extracted decoding accuracies in both the EB and SC groups. We also investigated the link between behavioral performance and white matter microstructure by performing between-subject Pearson's correlation analysis.
Behavioral performance was measured as the accuracy of detecting motion directions and sound source locations during the fMRI session. For every ROI (left and right hemispheres in hMT+/V5 and hPT), we created a sphere of 3 mm radius around the group-level coordinates, extracted the mean β-parameter estimates for auditory motion and static conditions, and performed between-subject correlations.
Classification accuracies obtained from multiclass classification from each ROI of each subject was correlated with the overall performance of motion direction and sound source location discrimination. Statistical results were corrected for multiple comparisons using the FDR method (Benjamini and Yekutieli, 2001).
Results
Behavioral
Behavioral performance in all of the eight conditions in both groups was >80% of correct responses, demonstrating that we were able to trigger salient and reliable auditory motion/location percepts while the subjects were inside the scanner. To determine whether there were any differences between groups or conditions in the target detection task performed during the auditory experiment, accuracy scores were entered into a 2 × 2 × 4 repeated-measures ANOVA to test the interaction among group (EB, SC; between-subject factor), condition (motion, static; within-subject factor), and orientation (left, right, up, and down; within-subject factor). Importantly, this showed no main effect of group (F(1,30) = 0.401, p = 0.5), indicating that the overall accuracy, while detecting direction of motion or location of sound source, did not differ between the blind and sighted groups. There was a significant main effect of condition (F(1,30) = 11.49, p = 0.002), which was caused by higher accuracy in the motion condition compared with the static condition. There was a significant main effect of orientation (F(1.6,48.3) = 14.24, p < 0.001), caused by greater accuracy in the horizontal orientations (left and right) compared with the vertical orientations (up and down). Post hoc two-tailed t tests (p < 0.05, Bonferroni corrected for multiple comparisons) showed that this main effect was because of a significant difference between left orientation with up (t(15) = 5.22, p < 0.001) and down (t(15) = 3.87, p = 0.001) orientations, and between right orientation with up (t(15) = 5.17, p < 0.001) and down (t(15) = 3.81, p = 0.001) orientations. No condition × orientation interaction was observed.
Univariate analyses
Whole-brain analyses
To identify brain regions that are preferentially recruited for auditory motion processing in SC and EB groups, we performed a univariate whole-brain analysis. Figure 2 shows the response to motion and static auditory stimuli in EB and SC participants. Consistent with previous studies (Pavani et al., 2002; Warren et al., 2002; Poirier et al., 2005), a preferential response to auditory moving stimuli (motion > static) was observed for SC participants in the superior temporal gyri, bilateral hPT, precentral gyri, and anterior portion of middle temporal gyrus in both hemispheres (Fig. 2A). A similar response was observed in EB participants, with a reliable extension toward the occipital cortex (Fig. 2B).
To identify regions responding more to moving than static sounds in both EB and SC participants, we ran a conjunction (AND; Nichols et al., 2005) analysis [SC (motion > static) ∩ EB (motion > static)]. This showed that both groups activated the superior temporal gyrus, bilateral hPT, and the anterior portion of middle temporal gyrus bilaterally. The right middle temporal gyrus region partially overlapped with the functionally defined hMT+/V5 identified visually (motion > static) at the group level in SC participants (Fig. 2C, white outline).
To identify which regions activated more for moving than static sound in EB versus SC participants, we performed a two-sample t test [EB (motion > static) > SC (motion > static)]. This revealed enhanced activity for EB participants in regions including the precuneus, the cuneus extending into the intraparietal sulci, and bilateral posterior middle temporal gyrus (Table 2, whole-brain univariate analyses results).
ROI univariate analysis: ROI definition
To avoid the circularity that can arise from the selection of ROIs, more particularly “double dipping” (Kriegeskorte et al., 2009)—the use of the same dataset for selection and specific analysis—we independently localized visual and auditory motion-responsive areas. Whole-brain univariate analyses for independent visual and auditory motion localizers were performed to acquire the peak coordinates of hMT+/V5 and hPT, selective to visual and auditory motion, respectively. The obtained stereotactic MNI coordinates were as follows: L hMT+/V5: [−46, −72, −2]; R hMT+/V5: [40, −76, −2]; and L hPT: [−48, −30, 8]; R hPT: [62, −36, 10].
To investigate whether auditory motion and static sound elicited responses within ROIs across groups, β-parameters extracted from hPT and hMT+/V5 regions were entered into two separate 2 × 2 × 2 repeated-measures ANOVA, group (EB, SC) as the between-subject factor, and hemisphere (left, right) and condition (motion, static) as within-subject factors.
For the hPT region, we observed a main effect of condition (F(1,30) = 148.4, p < 0.001, η2 = 0.2). The main effect of condition was caused by a greater response to motion > static stimuli. The interaction of group × condition (F(1,30) = 5.6, p = 0.03, η2 = 0.01) was significant. Post hoc two-tailed t tests (p < 0.05, Bonferroni corrected) showed that the interaction was caused by greater responses to motion over static in the SC group (t(30) = 10.29, p < 0.001) than in the EB group (t(30) = 6.94, p < 0.001). A significant interaction of hemisphere × group (F(1,30) = 4.7, p = 0.04, η2 = 0.04) was because of the right hPT in the EB group showing higher activity compared with left hPT (t(30) = 1.2, p = 1), while the SC group showing higher activity in the left hPT when compared with the right hPT (t(30) = 1.9, p = 0.4; p < 0.05, Bonferroni corrected post hoc two-tailed t tests). The interaction between hemisphere × condition (F(1,30) = 20.5, p < 0.001, η2 = 0.02) was driven by the difference between motion and static conditions that was larger in the left hPT (t(30) = 11.4, p < 0.001) compared with the right hPT (t(30) = 6.77, p <0.001).
For the hMT+/V5 region, we observed an interaction of group × condition (F(1,30) = 4.7, p = 0.04, η2 =0.02) showing (p < 0.05, Bonferroni corrected) greater responses to motion over static in the EB group (t(30) = 2.86, p = 0.04). The interaction of hemisphere × condition (F(1,30) = 5.6, p = 0.025, η2 = 0.03) was driven by increased motion selectivity in the left hemisphere (t(30) = 2.3, p = 0.057, post hoc two-tailed t tests; p < 0.05, Bonferroni corrected).
As expected from univariate analyses (Battal et al., 2019), β-parameter estimates did not show any evidence for motion direction or sound source location-specific activity.
ROI multivariate pattern analyses
Multiclass motion direction and static location decoding
To further investigate the presence of information about auditory motion direction and sound source location, we ran multiclass MVP-decoding in four ROIs identified using independent auditory and visual motion localizers (hMT+/V5 and hPT in both hemispheres). Figure 2, F and G, shows decoding accuracies for motion (leftward, rightward, upward, downward) and static (left, right, up, down) stimuli in the four regions of interest for EB and SC participants. For motion stimuli, permutation testing (FDR corrected) revealed that classification accuracies in hMT+/V5 were significantly above chance for EB participants in both hemispheres (left: mean ± SD = 32.4 ± 8, p < 0.001; right: mean ± SD = 33.1 ± 6.9, p < 0.001). In SC participants, decoding accuracy was significantly above chance in the left hMT+/V5 but not in the right hMT/V5 (left: mean ± SD = 30.5 ± 6.6, p = 0.002; right: mean ± SD = 26.4 ± 4.5, p = 0.184). In hPT, decoding accuracy was significantly above chance in both hemispheres in both groups (EB left: mean ± SD = 32 ± 6.2, p < 0.001; EB right: mean ± SD = 29.7 ± 7, p = 0.003; SC left: mean ± SD = 40.6 ± 8, p < 0.001; SC right: mean ± SD = 35.3 ± 8.9, p < 0.001). Permutation of two-sample t tests revealed that decoding accuracy was higher for the EB group compared with the SC group in the right hMT+/V5 (p = 0.02), but not in the left hMT+/V5 (p = 0.62). In contrast, decoding accuracy was greater in the SC group than in the EB group in the left hPT (p = 0.016), but not in the right hPT (p = 0.101; Fig. 2G).
For static location stimuli, decoding accuracies were significant within hMT+/V5 in the right hemisphere (mean ± SD = 29.7 ± 9.6, p = 0.003) and very close to the cutoff significance value in the left hemisphere (mean ± SD = 27.6 ± 5.8, p = 0.054) of EB participants, while decoding was not significantly greater than chance in either the left or right hMT+/V5 for SC participants (left hMT+/V5: mean ± SD = 26.3 ± 7.6, p = 0.2; right hMT+/V5: mean ± SD = 25 ± 5.4, p = 0.458). In the hPT, classification accuracy was significantly above chance in both hemispheres in SC participants (left hPT: mean ± SD = 31.3 ± 8.8, p < 0.001; right hPT: mean ± SD = 28.7 ± 6.8, p = 0.023), but only in the right hemisphere of EB (right hPT: mean ± SD = 29.4 ± 8.3, p = 0.007; left hPT: mean ± SD = 25.3 ± 7.3, p = 0.458). The decoding accuracy from the two groups was compared using a two-sample t test, and the statistical significance was assessed using permutations. The results revealed that decoding accuracy for sound source locations was higher for the SC group compared with the EB group in the left hPT (p = 0.002), but not in the right PT (p = 0.05).
Finally, we assessed differences between decoding accuracies across groups and regions using an LMM. Group, region, and hemisphere predictors were entered as fixed effects, and subject predictor was entered as a random effect. This revealed a main effect of condition (F(1,210) = 26.5, p < 0.001) because of higher accuracies for motion over static stimuli across all regions. We also observed a main effect of region (F(1,210) = 9, p = 0.003) showing overall higher decoding in hPT than in hMT+/V5. Crucially, we observed a significant interaction of group × region (F(1,30) = 22.9, p < 0.001). Post hoc two-tailed t tests (p < 0.05, Bonferroni corrected for multiple comparisons) showed that decoding accuracy in hMT+/V5 was significantly greater for the EB group over the SC group (t(78.3) = 2.6, p = 0.02), while decoding accuracy in hPT was significantly greater for the SC group over the EB group (t(78.3) = 3.5, p = 0.002). The significant interaction between hemisphere and group (F(1,210) = 6.34, p = 0.013) was because of the higher decoding accuracies in the left hemisphere in the SC group compared with the right hemisphere (t(210) = 5.5, p < 0.001), and in the EB group showing marginally higher decoding accuracy in right hemisphere compared with the SC group (t(78.3) =2.6, p = 0.05), while the SC group showed higher decoding accuracy in the left hemisphere compared with the EB group (t(78.3) = 3.5, p = 0.035). The lack of significant group × region × condition interaction indicates that differences in the decoding accuracies between groups and regions are similar for motion and sound source processing.
Axis-of-motion/location preference
To assess the presence of an axis-of-motion organization, we averaged the binary decoding accuracies of auditory motion directions for each axe type (within and across axes), for each group, and for each ROI. While higher decoding accuracies between pairs of directions (e.g., leftward vs upward) indicate that the representation of directions is more distinct, lower decoding accuracies among the directions (e.g., leftward vs rightward) suggests more similar or overlapping representation (Fig. 3A,B). The averaged within-axes and across-axes binary decoding accuracies (dissimilarity) entered a linear mixed model with group, axes, and hemisphere as fixed effects, and subject as a random effect.
In hPT region, this analysis revealed a significant effect of axes (F(1,90) = 84.8, p < 0.001) indicating that across-axes directions are more distinct compared with within-axis directions across hemispheres and across groups. We also observed a main effect of group (F(1,30) = 22, p <0.001), revealing overall higher binary decoding accuracies in the SC group. The group × axes interaction (F(1,90) = 7.4, p = 0.008) showed that while both groups show an axis-of-motion preference (SC group across vs within axes: t(90) = 8.4, p < 0.0001; EB group across vs within axes: t(90) = 4.6, p = 0.0001), the difference between across-axes and within-axes decoding accuracy is higher in the sighted group than in the blind group (t(76.9) = 5.3, p < 0.0001; p < 0.05, Bonferroni corrected for multiple comparisons).
In hMT+/V5 region, we observed a main effect of axes (F(1,90) = 25.5, p < 0.001), implying that across-axis directions have higher decoding accuracies compared with within-axis directions across hemispheres and across groups. The significant group × hemisphere interaction (F(1,90) = 6.3, p = 0.01) was driven by the SC group, showing higher decoding accuracies in the left hemisphere, while the EB group showed higher decoding accuracies in the right hemisphere (p < 0.05, Bonferroni corrected for multiple comparisons).
LMM on sound source location distances did not reveal significant results involving the groups in both hPT and hMT+/V5 (Fig. 3C,D).
Multidimensional scaling
We used the binary accuracy values to build neural dissimilarity matrices for each subject and each ROI (Fig. 3E,F). Visualization of the binary decoding accuracies of the motion direction and sound source locations further supported a separation between static and moving sounds in both hMT+/V5 and hPT regions and in both groups, and a stronger axis-of-motion preference in the SC group compared with the EB group in the hPT region.
Diffusion-weighted imaging
hMT+/V5–hPT tractography
For the left hemisphere, the number of reconstructed streamlines was significantly above chance in both groups, with the iFOD2 algorithm generating a significantly higher number of streamlines compared with the null distribution algorithm [SC: log streamlines(iFOD2) = 5.2 ± 1.0; log streamlines(null distribution) = 3.7 ± 0.8; paired t test, t(14) = 7.7, p = 2 × 10−6, d = 2.0; EB: log streamlines(iFOD2) = 5.2 ± 1.3; log streamlines(null distribution) = 3.8 ± 1.3; paired t test, t(12) = 3.9, p = 2 × 10 −3, d = 1.1]. The same results were obtained for right hMT+/V5–hPT connections [SC: log streamlines(iFOD2) = 4.8 ± 1.1; log streamlines(null distribution) = 1.4 ± 1.2; paired t test, t(14) = 8.1, p = 1 × 10−6, d = 2.1; EB: log streamlines(iFOD2) = 4.2 ± 1.1; log streamlines[null distribution] = 1.6 ± 0.9; paired t test, t(12) = 6.8, p = 2 × 10−5, d = 1.9]. hMT+/V5–hPT connections in a representative EB participant as well as group-averaged structural pathways between hMT+/V5 and hPT for the EB group can be seen in Figure 4, A and B (the single-subject data for the EB group is available at https://osf.io/7w8gu/).
The trajectory of the hMT+/V5–hPT connections in the EB group was highly similar to that of the SC group (Fig. 4C). This was addressed by the Dice similarity coefficient. For the left hemisphere, the spatial overlap of the sum of binarized individual tract-density images was 0.84. For right hMT+/V5–hPT connections, the spatial overlap between group-averaged tracts was equally high, with a DSC value of 0.80.
Whole-tract analysis
The whole-tract analysis revealed significant differences between EB and SC participants in hMT+/V5–hPT connections. We did not find differences between groups in the connectivity index [CI; L: CI(SC) = 0.4 ± 0.1; CI(EB) = 0.4 ± 0.1; paired t test, t(26) = 0.01, p = 0.9, d = 0.01; R: CI(SC) = 0.4 ± 0.1; CI(EB) = 0.3 ± 0.1; paired t test, t(26) = 1.52, p = 0.1, d = 0.57]. However, we did find differences between groups in diffusion scalar values, specifically in the right hemisphere. For left hMT+/V5–hPT connections, we did not find differences between SCs and EBs in FD [L: FD(SC) = 0.43 ± 0.04; FD(EB) = 0.41 ± 0.03; paired t test, t(26) = 1.31, p = 0.2, d = 0.49; Fig. 4F]. For right hMT+/V5–hPT connections, SCs showed higher FD values (mean ± SD, 0.42 ± 0.02) than EBs (0.38 ± 0.03; paired t test, t(24) = 4.31, p = 2 × 10−4, d = 1.67). Neither did we find differences between groups in FC for left hMT+/V5–hPT connections [L: FC(SC) = 1.06 ± 0.09; FC(EB) = 0.96 ± 0.11; paired t test, t(26) = 2.62, p = 0.01, d = 0.99]. Similar to the results obtained for FD, SC participants (mean ± SD, 1.07 ± 0.08) showed higher FC values than EB participants in right hMT+/V5–hPT connections (0.91 ± 0.06; paired t test, t(23) = 5.89, p = 5 × 10−6, d = 2.36). For the combined diffusion scalar FDC, SC participants presented higher values than EB participants for hMT+/V5–hPT connections in both hemispheres [L: FDC(SC) = 0.48 ± 0.07; FDC(EB) = 0.41 ± 0.05; paired t test, t(25) = 3.14, p = 4 × 10−3, d = 1.24; R: FDC(SC) = 0.48 ± 0.04; FDC(EB) = 0.37 ± 0.02; paired t test, t(23) = 7.48, p = 1 × 10−7, d = 3.03]. Note that when the lateralization of the group differences was tested using a 2 × 2 ANOVA (group × hemisphere), no interaction was found to be significant.
Along-tract analysis
The along-tract analysis revealed the location of the between-group differences at the sub-tract level. It confirmed hMT+/V5–hPT connections in the right hemisphere to be more affected than in the left hemisphere. Whereas no differences between groups were found on FD and FC scalars for left hMT+/V5–hPT connections, local differences in FDC were observed in the anterior parts of the tract (Fig. 5, top row). For the right hMT+/V5–hPT connections, structural effects for all diffusion scalars appear to be localized in posterior and central segment parts of the tract (Fig. 5, bottom row).
Brain–behavior correlation
To explore whether the brain activity elicited by our moving and static sounds in hMT+/V5 and PT links to the ability of the listener to discriminate the direction and location of these sounds, we conducted between-subject correlation analyses. The multiclass decoding accuracies and β-parameter estimates were extracted from the peak coordinates of independently defined hMT+/V5 and PT regions. Behavioral accuracies recorded during fMRI data acquisition for discriminating motion directions (motion condition) and sound source locations (static condition) were correlated with decoding accuracies and β-parameter estimates separately. We also correlated microstructural diffusion values (FD, FC, and FDC) with behavioral performance. No significant correlation was observed between behavioral performance and the neural activity of hMT+/V5 and PT across groups. We also did not find any significant correlation between behavioral performance and microstructural diffusion values (a full statistical report can be accessed at https://osf.io/7w8gu/).
Discussion
We adopted a multimodal imaging approach to investigate how early blindness alters the functional organization of the hMT+/V5–hPT network for spatial hearing and the structural connectivity between those regions.
Whole-brain univariate analyses revealed that while both groups showed the strongest auditory motion selectivity in the bilateral superior temporal and precentral gyri, moving sounds also evoked preferential responses in a region anterior to hMT+/V5 in both sighted and blind individuals, a region previously termed hMTa (Rezk et al., 2020; Gurtubay-Antolin et al., 2021). hMTa overlaps with the anterior portion of hMT+/V5 defined in vision (Fig. 2C), suggesting that hMT+/V5 and hMTa are joined to form a continuous map. This observation relates to the evidence that category-selective regions in vision are bound to a region just anterior to them showing amodal selectivity to the same category (Popham et al., 2021). Blind participants showed additional auditory motion selectivity in the posterior portion of hMT+/V5 suggesting that in the absence of sight, the posterior portion of hMT+/V5—which is chiefly visual in sighted people—gets invaded by sounds coming from the anterior multisensory portion of hMT+/V5.
Going beyond univariate analyses, MVPA revealed motion direction and sound source location information in both hPT and hMT+/V5 of sighted and blind. The right hMT+/V5 showed enhanced decoding of auditory directions in the blind group. Interestingly, motion direction could also be decoded in the hMT+/V5 of sighted individuals, even if to a lower extent than in blind people. These results contrast with previous studies finding no significant decoding of motion directions in hMT+/V5 of sighted people (Alink et al., 2012; Jiang et al., 2014, 2016). In those studies, however, directional selectivity was investigated exclusively in the horizontal axis, while the present study contained both horizontal and vertical auditory stimuli. This difference is crucial since we observed that activity patterns elicited in hMT+/V5 and hPT across the vertical and horizontal axes of motion differ to a much larger extent from activity patterns elicited by sounds within each axis of motion (Battal et al., 2019; Rezk et al., 2020). The dissimilarity analysis further supported an axis-of-motion organization and a separation between static and moving sounds in hPT and hMT+/V5 in both groups. In hPT, these results confirm previous observations for axis of motion (Battal et al., 2019). Interestingly, this organization was weaker in the blind group compared with the sighted group. In hMT+/V5, the observation of an axis-of-motion organization for sounds is reminiscent of the columnar organization observed in vision (Albright et al., 1984; Diogo et al., 2003) and the high-field fMRI studies suggesting a large-scale axis-of-motion organization in hMT+/V5 in humans (Zimmermann et al., 2011). This brings the resemblance between the coding of hMT+/V5 in vision and audition to an additional and finer-grained level, suggesting that the organization principle of hMT+/V5 might be applied to the representation of auditory motion directions in blind and sighted people. The axis-of-auditory-motion preference in hMT+/V5 in blind and sighted people suggests that such organization is independent of visual experience.
In sighted individuals, even if hMT+/V5 shows preferential responses to visual motion, it also contains location-selective representations of visual stimuli (Amano et al., 2009). If auditory information is being processed in hMT+/V5 of blind people using a computationally analog structure as the one observed in vision in sighted people, one may expect to find traces of sound source location in this region. In our study, we observed sound source location information in bilateral hMT+/V5 in the EB group, but not in the SC group. Our results therefore confirm and extend previous studies demonstrating that the right dorsal extrastriate occipital cortex in blind people contributes to spatial processing of sounds (Collignon et al., 2007, 2009, 2011).
Interestingly, we observed that the enhanced auditory tuning in hMT+/V5 in the blind co-occurs with a reduced decoding in hPT regions. The decreased decoding accuracies in the hPT for spatial hearing in the blind suggests that the absence of visual experience since birth not only influences the response properties of “visual” areas but also alters the functioning of the regions supporting the remaining senses. This redistribution is not limited to auditory motion direction but is also observed for sound source location. It therefore seems that early visual deprivation triggers a network-level reorganization between occipital and temporal regions typically dedicated to spatial hearing.
Large-scale connectivity between separate sensory regions that are involved in related function could be a determining factor for the expression of cross-modal plasticity (Dormal and Collignon, 2011; Hannagan et al., 2015; Mattioni et al., 2020). Enhanced nonvisual responses for moving stimuli observed in early blinds may therefore build on intrinsic connections between auditory and visual motion-processing areas, which is supported by studies showing strong multisensory interactions between visual and auditory motion processing (Kitagawa and Ichihara, 2002). We recently demonstrated that hMT+/V5 contains shared neural representations for auditory and visual motion directions in sighted people (Rezk et al., 2020). We also found evidence for the existence of direct structural connection between hMT+/V5 and hPT in humans (Gurtubay-Antolin et al., 2021). Intrinsic anatomo-functional connections between visual and auditory motion-preferential regions could therefore play a crucial role in constraining the expression of cross-modal plasticity and redistributing spatial hearing computation between hPT and hMT+/V5 in blindness. To further test this idea, we investigated the impact of visual deprivation on the structural connection between hPT and hMT+/V5 (Gurtubay-Antolin et al., 2021). Diffusion-weighted data in the EB and SC groups supported the potential existence of direct hMT+/V5–hPT connections, with the trajectory and connectivity index of these projections being highly similar between SC and EB participants. These results speak in favor of the preservation of this pathway in visually deprived individuals. Our findings are in agreement with previous research conducted in opossums (Karlen et al., 2006) and humans (Shimony et al., 2006; Novello et al., 2018) that points to the overall preservation of structural connections in visual areas even in the absence of visual input since birth. We hypothesize that the hMT+/V5–hPT connections could play an important role in the multisensory integration of auditory and visual moving signals in sighted group and in the expression of functionally specific cross-modal plasticity in the early blind group by constraining auditory moving sounds to specifically engage hMT+/V5 (Gurtubay-Antolin et al.; 2021). It is, however, also possible that auditory information could reach hMT+/V5 through indirect pathways, for instance involving the parietal cortex (Bremmer et al., 2001; Rohe and Noppeney, 2016). Whereas we did not find macrostructural differences between groups in hMT+/V5–hPT connections, whole-tract global white matter analyses showed decreased microstructural diffusion values (FD, FC, and FDC) in EBs. Although the interpretation of diffusion-derived metrics remains controversial because of the indirect nature of the diffusion MRI measurement (Jones and Cercignani, 2010), FD and FC reductions have been related to axonal loss and atrophy of fiber morphology (reflective of the accumulated axon loss; Raffelt et al., 2017). These results fit well with previous studies reporting reduced anisotropy values in visual pathways of EB individuals (Shimony et al., 2006; Ptito et al., 2008; Shu et al., 2009; Wang et al., 2013), which have been associated with deafferentation, demyelination, and neuronal degeneration (Beaulieu et al., 1996; Beaulieu, 2002). Alternatively, early visual deprivation could also lead to atypical myelination (Shimony et al., 2006). There is growing evidence that experience-dependent activity can induce myelination changes in the developing brain (Sampaio-Baptista and Johansen-Berg, 2017). The lack of visual input might have selectively affected the feedback projections connecting the hPT to hMT+/V5, which may play a role in audiovisual integration in sighted people (Ghazanfar and Schroeder, 2006). Since feedback projections appear later in the cortical development and have longer developmental times, they are more likely to be affected by sensory experience (Kral et al., 2017). The fact that this track loses its multisensory function to become mostly unimodal (auditory) in blind people may alter its microstructure (reducing FD and FC) in ways that are not yet well understood and would likely require studies with blind animals. To reveal which segments of the hMT+/V5–hPT connections drive the differences between groups, we implemented an along-tract analysis. Differences between groups emerged mostly in posterior and central parts of the right hMT+/V5–hPT projections. No statistical difference between the reorganization across hemispheres emerged when formally tested. The higher alteration in posterior regions of the tracts might be explained by the higher dependence on visual input to develop.
In conclusion, our findings suggest that blindness triggers a network-level reorganization that enhances the recruitment of (posterior) hMT+/V5 in conjunction with a release in the computational workload of temporal regions (hPT) typically dedicated to spatial hearing. While visual experience does not affect the axis-of-motion organization in hMT+/V5, this functional organization in the hPT region was weaker in the EB group. The structural connections between hMT+/V5 and hPT show similar macrostructure in both groups, despite micro- structural alterations in these connections in visually deprived people.
Footnotes
The project was funded in part by a European Research Council starting grant MADVIS (Project 337573) awarded to O.C., the Belgian Excellence of Science (EOS) program (Project 30991544) awarded to O.C., a Flagship ERA-NET grant SoundSight (FRS-FNRS PINT-MULTI R.8008.19) awarded to O.C., and by the European Union Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 701250 awarded to V.O. Computational resources have been provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Équipements de Calcul Intensif en Fédération Wallonie Bruxelles (CÉCI) funded by the Fond de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under convention 2.5020.11 and by the Walloon Region. A.G.-A. is supported by the Wallonie Bruxelles International Excellence Fellowship and the FSR Incoming PostDoc Fellowship by Université Catholique de Louvain. O.C. is a research associate, C.B. is postdoctoral researcher, and M.R. is a research fellow at the Fond National de la Recherche Scientifique de Belgique (FRS-FNRS). We thank Marco Barilari, Stefania Benetti, and Stephanie Cattoir, who have helped with the data acquisition; and Pietro Chiesa for continuing support with the auditory hardware.
The authors declare no competing financial interests.
- Correspondence should be addressed to Ceren Battal at ceren.battal{at}uclouvain.be or Olivier Collignon at olivier.collignon{at}uclouvain.be