Abstract
Previous cross-sectional functional magnetic resonance imaging studies have shown that performance monitoring functions continue to develop well into adolescence, associated with increased activation in brain regions important for cognitive control (prefrontal cortex, anterior cingulate cortex, and parietal cortex). To date, however, the development of performance monitoring has not been studied longitudinally, which leaves open the question of whether changes can be detected within individuals over time. In the present study, human boys and girls, between ages 8 and 27 years, performed a child-friendly rule-switch task in the scanner on two occasions ∼3.5 years apart. Change versus stability was examined using two methods: (1) repeated-measures analyses and (2) test-retest reliabilities of blood oxygenation level-dependent responses. Results showed that with increasing age, participants performed better on the task. The changes in neural activation associated with the processing of performance feedback were, however, more reliably correlated with changes in performance than with age. Test-retest reliability was at least fair to good for adults and adolescents, but poor to fair for the youngest age group. Substantially more variability was observed in the pattern and magnitude of activation in children compared with adults, which may be interpreted as a proxy for developmental change. Together, the results show that (1) change within individuals is variable, and more so for children than for adolescents and adults, and (2) performance is a better predictor than age of change in neural activation over time. These findings set the stage for studying developmental change in the perspective of multiple predictors, rather than solely by divisions based on age groups.
Introduction
Performance monitoring is one of the main components of successful learning. It involves the monitoring for and detection of errors and change signals, which then allows for adjustment of ongoing behavior to optimize subsequent performance (Holroyd and Coles, 2002). Previous studies indicate that performance monitoring improves steadily throughout development, but does not reach adult levels until late childhood or early adolescence (Bunge et al., 2002; Crone et al., 2008; Luna, 2009).
Evidence from functional magnetic resonance imaging (fMRI) studies shows that children, adolescents and adults recruit a similar network of brain areas during performance monitoring, including the lateral prefrontal cortex (LPFC), anterior cingulate cortex (ACC)/presupplementary motor area (preSMA), and parietal cortices. However, the pattern of activation differs between children and adults, such that there is an increase in activation in LPFC and parietal cortex following feedback indicating performance adjustment (Crone et al., 2008; van Duijvenvoorde et al., 2008; van den Bos et al., 2009; Tau and Peterson, 2010). These findings are consistent with results of other developmental neuroimaging studies, showing that children and adolescents have immature activation in the frontoparietal network when performing tasks requiring cognitive control (Bunge et al., 2002; Rubia et al., 2006; Velanova et al., 2008).
These previous studies have provided the building blocks for understanding the neural substrates involved in the development of cognitive control and performance monitoring. However, these studies were all cross-sectional, therefore only providing a proxy of development. To truly assess developmental trajectories, longitudinal studies are of paramount importance. Compared with cross-sectional research, longitudinal research has several advantages and provides additional information. First, longitudinal studies can overcome problems with differential sampling across age groups, masking of within-individual change by variability across individuals, and difficulty in identifying complex developmental trajectories (Kraemer et al., 2000). Second, longitudinal methods test for individual patterns of change rather than group differences. Third, longitudinal research has more power to detect small developmental differences in behavior and in task-related brain activation (Durston et al., 2006). Fourth, the test of within-subjects change may contribute to the question how differences in brain activity are associated with age vis-à-vis performance changes over time.
The aim of this study was to examine within-subject changes in brain activity when performing a feedback-based rule-switching task using a longitudinal design in participants aged 8–27. The advantage of this approach is that it allows for the assessment of change across a wide age range using a relatively limited time scale (3.5 years). We used two methods to test change versus stability over time. First, changes were studied using repeated-measures analyses. Second, stability was examined using an accurate assessment of test-retest reliability of functional activity. Previous studies have demonstrated that fMRI signals provide relatively reliable measures over sessions in adults (Bennett and Miller, 2010), but the reliability of fMRI signals has not yet been reported in children, or in studies with a time interval longer than one year. Thus, we tested both change (repeated measures) and stability (test-retest reliability of fMRI activation levels) over time in the same rule-switching task for all age groups. Our prediction was that a larger change in brain activation over time would be observed for the children relative to adolescents and adults (Ferrer et al., 2009). Consequently, we predicted that stability would be lower for children than for adolescents and adults.
Materials and Methods
Participants.
A 3.5-year longitudinal functional magnetic resonance imaging study was carried out, including healthy adults, adolescents and children (see Table 1 for demographic information). In the full baseline sample of Crone and colleagues (Crone et al., 2008), 20 adults (12 females) age 18–25, 20 adolescents (9 females) age 14–15, and 17 children (8 females) age 8–11 were included in the study. Thirty-two participants [10 adults (6 females), 12 adolescents (6 females), and 10 children (4 females)] completed the longitudinal study and were rescanned after an interval of ∼3.5 years. The scan interval for adolescents was significantly longer than for adults (p = 0.01), and there was no difference in scan interval length between children and adults (p = 0.22) or between children and adolescents (p = 0.36). The full baseline data have been published previously (Crone et al., 2008; Zanolie et al., 2008). Standard intelligence scores were obtained from each participant using the Raven's Progressive Matrices test (Raven et al., 1998). All estimated IQ scores were within the normal range and there were no significant differences between age groups (F(2,29) = 1.61; p = 0.22) (Table 1).
Demographics at both time points
The participants, all of whom received course credit or a fixed payment, were healthy right-handed volunteers with no history of neurological or psychiatric problems. Informed consent was obtained and the study was approved by the Internal Review Board at the Leiden University Medical Center.
Procedure and experimental design.
All participants were tested individually and were trained to lie still in a mock scanner, which simulated the environment and sounds of an actual MRI scanner. Details of the child-friendly rule-switch task have been published previously (Crone et al., 2004, 2008; Zanolie et al., 2008). In short, participants were asked to respond to a stimulus that could appear in one of four horizontally presented locations on the screen by pressing appropriate buttons (Fig. 1). Before scanning, participants were trained to perform three spatial stimulus–response rules. Each response was followed by a positive or a negative feedback signal. Feedback stimuli were displayed as a “plus” (representing positive feedback) or “minus” (representing negative feedback) sign. Next, the participants were instructed that in the real experiment they had to find the correct rule themselves, which could be any of the three spatial rules they had just learned. They were instructed (1) to find the rule by using positive and negative feedback, (2) to apply that rule which would yield positive feedback, and (3) that the rules could change unexpectedly. Therefore, they had to apply the correct rule until a rule switch occurred, which was signaled by a negative feedback sign.
Display of task sequence (top) and rule types (bottom). Subjects were told to infer one of the spatial mapping rules that were trained before scanning. The task was changed into a prosocial game by explaining to the subjects that they should help the dog find its way back home. The dog could appear in one of four locations, and the subjects were instructed to open one of the four doors (locations) by pressing the corresponding response key. Their selection was followed by a visually presented feedback sign (+ or −). The rules changed unannounced following two, three, or four consecutive correct sorts. The spatial mapping rules are displayed in the bottom part of the figure (see Procedure and experimental design for explanations).
The four possible answers were mapped to four buttons, which were assigned to the index and middle fingers of the left and right hands (Fig. 1, top). Following rule 1, stimuli that appeared in one of the four locations designated a response with the finger compatible to the location. Thus, spatially compatible responses were required in response to the location of the stimulus. Following rule 2, stimuli that appeared in any of the four locations designated a response with the opposite finger of the same hand. Following rule 3, stimuli that appeared in any of the four locations designated a response with the finger that was assigned to the location two positions from the stimulus location (Fig. 1, bottom). Participants were told that rules could switch from time to time without a warning and they were instructed to use the trial-to-trial feedback to infer the correct response rule. Rules changed in a pseudorandomized order when participants had correctly applied the previous response rule for two to four consecutive trials. On each trial, a 2.5 s stimulus display was presented that required a button-press. If a response was not made within 2.5 s, a warning was presented indicating that the response was too slow and that faster responses were required on the next trial. These trials were not included in the analysis and made up <1% of the trials. When the participant responded within the 2500 ms timeframe, the feedback display consisted of two similar houses with four doors and a fixation mark, with an additional plus or minus sign placed in the door selected by the participant during the response time. After the feedback, an intertrial interval jitter varying from 2000 to 8000 ms was presented in 25% of trials.
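The three spatial mapping rules can be written out as a small lookup. The sketch below is illustrative only: it assumes that locations and response fingers are both indexed 0 to 3 from left to right (0 and 1 on the left hand, 2 and 3 on the right hand), and the function name is hypothetical rather than taken from the task software.

```python
def correct_response(rule: int, location: int) -> int:
    """Return the finger index (0-3) that a rule assigns to a stimulus location (0-3).

    Assumed indexing: 0-1 are the left-hand fingers, 2-3 the right-hand fingers,
    ordered left to right in the same way as the four stimulus locations.
    """
    if location not in (0, 1, 2, 3):
        raise ValueError("location must be 0, 1, 2, or 3")
    if rule == 1:
        # Rule 1: spatially compatible response.
        return location
    if rule == 2:
        # Rule 2: the other finger of the same hand (0<->1, 2<->3).
        return location ^ 1
    if rule == 3:
        # Rule 3: the finger two positions away (0<->2, 1<->3).
        return (location + 2) % 4
    raise ValueError("rule must be 1, 2, or 3")


# Mapping of all four locations under each rule.
mappings = {rule: [correct_response(rule, loc) for loc in range(4)] for rule in (1, 2, 3)}
```

Under this indexing the three rules never prescribe the same finger for a given location, so a single response is consistent with at most one rule.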
Feedback scoring.
The five feedback types were determined post hoc for each individual separately. Their definitions were as follows: (1) First warning negative feedback was the first negative feedback that followed a successfully completed sequence of rule applications. This negative feedback was given unannounced once the rule had been correctly applied on two, three, or four consecutive trials (randomly determined for each rule separately); it indicated that the previously applied response rule was no longer correct, and thus indicated a rule switch. (2) Efficient negative feedback indicated that the rule chosen when searching for the appropriate rule was incorrect. After receiving efficient negative feedback participants had to apply the correct rule in the next trial. (3) Other error negative feedback trials consisted of those trials in which the participant failed to apply the correct response when the response rule had not changed. It also included those trials in which the participant perseverated in using the previously correct rule after a rule change. (4) First positive feedback indicated that the correct rule had been found following a rule switch, and (5) positive feedback indicated correct rule use.
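One way to picture the post hoc scoring is as a pass over the trial sequence that tracks whether the participant is currently searching for a new rule. The following is a simplified sketch, not the scoring procedure actually used: the input format (the rule in force and the rule each response was consistent with), the label names, and the treatment of the very first trials as a search phase are all assumptions made for illustration.

```python
from typing import List, Optional


def classify_feedback(active: List[int], applied: List[Optional[int]]) -> List[str]:
    """Label each trial with one of the five feedback types.

    active[t]  : rule in force on trial t (1, 2, or 3)
    applied[t] : rule the response was consistent with, or None if it matched no rule
    """
    labels = []
    searching = True  # assumption: the first rule also has to be discovered
    tried = set()     # rules already ruled out during the current search
    for t, (rule, resp) in enumerate(zip(active, applied)):
        switched = t > 0 and rule != active[t - 1]
        if switched:
            searching, tried = True, set()
        if switched and resp == active[t - 1]:
            # Old rule applied correctly, but the rule has just changed.
            labels.append("first_warning")
            tried.add(resp)
        elif resp == rule:
            labels.append("first_positive" if searching else "positive")
            searching = False
        elif searching and not switched and resp is not None and resp not in tried:
            # A not-yet-excluded rule was tested and turned out to be wrong.
            labels.append("efficient_negative")
            tried.add(resp)
        else:
            # Slips under a stable rule, or perseveration on an excluded rule.
            labels.append("other_error")
    return labels


example = classify_feedback(
    active=[1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    applied=[1, 1, 1, 1, 3, 2, 2, 2, 2, 1],
)
```

In the example, trial 9 (index 8) repeats the rule that first warning feedback had already excluded, so it is scored as an other error rather than as efficient negative feedback.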
Data acquisition.
Trials were presented in three scans of 8.2 min each. During scanning, 300 trials were presented and the rules were switched in pseudorandomized order. The order of trials within each scan was determined by using an optimal sequencing program designed to maximize efficiency of recovery of the blood oxygenation level-dependent (BOLD) response (Dale, 1999). Scanning was performed with a standard whole-head coil on a Philips 3.0 Tesla scanner at the Leiden University Medical Center. Functional data were acquired using T2*-weighted echoplanar imaging (EPI) during three functional runs of 232 volumes each, of which the first 2 volumes were discarded to allow for equilibration of T1 saturation effects [repetition time = 2.211 s (2.2 s at follow-up), echo time = 30 ms, ascending interleaved acquisition, 38 slices of 2.75 mm, field of view 220 mm, 80 × 80 matrix, in-plane resolution 2.75 mm]. High resolution T1-weighted anatomical images were also collected after the functional runs. Head motion was restricted using a pillow and foam inserts that surrounded the head. Visual stimuli were projected onto a screen that was viewed through a mirror.
fMRI data analysis.
All data (i.e., from those who participated at the second measurement and from those who dropped out) (Crone et al., 2008) were reanalyzed using SPM5 (Wellcome Department of Cognitive Neurology, London, UK). Images were corrected for differences in timing of slice acquisition, followed by rigid body motion correction. Structural and functional volumes were spatially normalized to T1 and EPI templates, respectively. Translational movement parameters never exceeded 1 voxel (<3 mm) in any direction for any participant or scan. There were no significant differences in movement parameters between age groups [time point 1 (TP1): F(2,29) = 2.70; p = 0.10; TP2: F(2,29) = 2.32; p = 0.12]. The normalization algorithm used a 12-parameter affine transform together with a nonlinear transformation involving cosine basis functions and resampled the volumes to 3 mm cubic voxels. Templates were based on the MNI305 stereotaxic space (Cocosco et al., 1997), an approximation of Talairach space (Talairach and Tournoux, 1988). Functional volumes were spatially smoothed with an 8 mm full-width half-maximal isotropic Gaussian kernel. Statistical analyses were performed on individual subjects' data using the general linear model in SPM5. The fMRI time series data were modeled by a series of events convolved with a canonical hemodynamic response function. The feedback stimulus of each trial was modeled as an event of interest. The trial functions were used as covariates in a general linear model, along with a basic set of cosine functions that high-pass filtered the data, and a covariate for session effects. The least-squares parameter estimates of height of the best-fitting canonical hemodynamic response function for each condition were used in pairwise contrasts. The resulting contrast images, computed on a subject-by-subject basis, were submitted to group analyses.
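The event-related model can be illustrated with a toy example: feedback onsets are convolved with a hemodynamic response function, sampled at the scan times, and the height of the response is estimated by least squares. The sketch below is not SPM5 code. It assumes a double-gamma response shape with illustrative parameters, a single condition, made-up onsets, and an intercept in place of the high-pass filter and session covariates described above.

```python
import math


def double_gamma_hrf(t: float, shape_peak: float = 6.0,
                     shape_under: float = 16.0, ratio: float = 1.0 / 6.0) -> float:
    """Double-gamma response at t seconds after an event (assumed parameter values)."""
    if t <= 0:
        return 0.0

    def gamma_pdf(a: float) -> float:
        # Gamma density with shape a and unit scale.
        return t ** (a - 1) * math.exp(-t) / math.gamma(a)

    return gamma_pdf(shape_peak) - ratio * gamma_pdf(shape_under)


def build_regressor(onsets, n_scans: int, tr: float):
    """Sum of HRFs shifted to each event onset, sampled once per scan."""
    return [sum(double_gamma_hrf(i * tr - onset) for onset in onsets)
            for i in range(n_scans)]


def fit_height(signal, regressor):
    """Least-squares estimate of response height (beta) and baseline (intercept)."""
    n = len(signal)
    mx, my = sum(regressor) / n, sum(signal) / n
    sxx = sum((x - mx) ** 2 for x in regressor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(regressor, signal))
    beta = sxy / sxx
    return beta, my - beta * mx


# Hypothetical feedback onsets (s) and a noise-free synthetic voxel time series.
x = build_regressor(onsets=[4.0, 30.0, 61.0], n_scans=40, tr=2.2)
y = [100.0 + 2.5 * v for v in x]
beta, intercept = fit_height(y, x)
```

With several feedback types, each would get its own regressor, and a contrast such as first warning > positive feedback would be the difference between the corresponding height estimates.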
Task-related responses were considered significant if they consisted of at least 10 contiguous voxels that exceeded a stringent threshold p < 0.001 [false discovery rate (FDR) corrected] (Genovese et al., 2002). Analyses were also performed with less stringent thresholds, p < 0.01 and p < 0.05, FDR and cluster corrected, but there were no differences compared with our stringent threshold.
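The FDR control described by Genovese et al. (2002) rests on the Benjamini-Hochberg step-up procedure. The sketch below shows that procedure on a short list of p values; the p values are invented for illustration, and the additional cluster-extent criterion (at least 10 contiguous voxels) is not modeled.

```python
def fdr_reject(p_values, q: float = 0.05):
    """Benjamini-Hochberg step-up: return one True/False rejection flag per p value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject


# Invented p values for ten hypothetical voxels.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
flags = fdr_reject(p_vals, q=0.05)
```

Because the cutoff adapts to the observed distribution of p values, lowering q from 0.05 to 0.001 makes the criterion stricter in the same way as the thresholds compared in the text.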
In the fMRI analyses, we focused on the contrast first warning > positive feedback. This contrast provides the cleanest form of the processing of a change cue indicating performance adjustment relative to a low level task application baseline (positive feedback). Positive feedback indicated only those trials where the correct rule was applied, thereby excluding the first positive feedback-type due to possible novelty effects after finding the correct rule. Whole-brain, voxelwise between-group repeated measures analyses were performed for activation patterns associated with first warning versus positive feedback processing with and without adding age or performance at TP1 as a covariate.
Region of interest (ROI) analyses were performed to further characterize rule sensitivity of predicted brain regions based on the first measurements. ROI analyses were performed with the MarsBaR toolbox in SPM5 (Brett et al., 2002) (http://marsbar.sourceforge.net/). ROIs that spanned several functional brain regions were subdivided by sequentially masking the functional ROI with each of several anatomical MarsBaR ROIs. The contrast used to generate functional ROIs was based on a t test for first warning versus positive feedback stimuli based on the first measurement for only those participants who participated at the second measurement. For all ROI analyses, effects were considered significant at α of 0.001 [FDR and cluster corrected (at least 10 contiguous voxels)]. Results were similar with less stringent thresholds; therefore, we only report findings with α of 0.001. Although there was a significant difference in scan interval between the three age groups (i.e., the scan interval was shorter for adults compared with adolescents), none of the performed analyses showed a significant effect of scan interval. Therefore, all analyses are reported without scan interval as a covariate. There were also no significant differences in neural activity between the participants who dropped out after the first measurement and those who participated in the second scanning session.
Reliability measurements.
Reliability of brain activation was analyzed by calculating intraclass correlation coefficients (ICCs). We calculated measures of intravoxel reliability on individual contrast values for each ROI by using the ICC toolbox provided by Caceres et al. (2009). The same ROIs were used as for the functional analyses. By analyzing only ROIs based on the first measurement we could test whether the level of group activation of the first session could predict the consistency in participant activations. Although no consensus has been achieved regarding reliability criteria for fMRI studies, previous studies have proposed different criteria. We followed the guidelines proposed by Cicchetti for quantifying reliability: poor (<0.4), fair (0.41–0.59), good (0.60–0.74) or excellent (>0.75) (Cicchetti and Sparrow, 1981; Cicchetti, 2001). These proposed criteria parallel suggested acceptance levels of the neuroimaging community of critical ICC values of 0.4 (Eaton et al., 2008) or 0.5 (Aron et al., 2006).
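To make the reliability measure concrete, the sketch below computes an intraclass correlation from a table with one row per unit (for example, a voxel within an ROI) and one column per session, and maps the result onto the Cicchetti categories. The ICC variant is an assumption: the text does not state which form the toolbox computes, so the consistency form ICC(3,1) is used here for illustration. The handling of values that fall exactly on a category boundary is likewise an assumption.

```python
def icc_3_1(table):
    """ICC(3,1) from a two-way layout: rows are units, columns are sessions."""
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def cicchetti_label(icc: float) -> str:
    """Reliability category following the bands quoted from Cicchetti."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Toy data: session 2 equals session 1 plus a constant offset.
stable = icc_3_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
# Toy data: the ordering of units reverses between sessions.
reversed_order = icc_3_1([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
```

A constant shift between sessions leaves this consistency-type ICC at 1, which is why a general increase in activation at TP2 does not by itself lower reliability under this variant.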
One problem with this method is that ICC measures can be biased due to between-subject variability. Although the reported median ICC is thought to be more robust to between-subjects variance, another form of assessing reliability is to solely examine the within-subject SD (Zandbelt et al., 2008). The reliability metric reported is the SD of the change score (σw-corrected).
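The within-subject metric can be sketched as follows. The Results describe correcting σw for mean activation per group across sessions and report the outcome as a percentage, so the sketch divides the SD of the change scores by the absolute mean activation and multiplies by 100. That exact form of the correction is an assumption, as are the function name and the example values.

```python
import statistics


def sigma_w_corrected(tp1, tp2) -> float:
    """SD of the change score (TP2 - TP1) as a percentage of mean activation.

    tp1, tp2: contrast values for the same participants at each time point.
    The normalization by mean activation across both sessions is an assumed
    reading of the correction described in the text.
    """
    change = [b - a for a, b in zip(tp1, tp2)]
    sd_change = statistics.stdev(change)
    mean_activation = statistics.fmean(list(tp1) + list(tp2))
    return 100.0 * sd_change / abs(mean_activation)


# Invented contrast values for three participants in one ROI.
value = sigma_w_corrected([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Unlike the ICC, this quantity does not depend on how much participants differ from one another, only on how much each participant changes between sessions relative to the overall activation level.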
Results
All results reported below include only those participants who participated on both measurements.
Behavior
Performance differences between age groups were examined by comparing the number of feedback observations over the course of the experiment. An age group (8–11 years, 14–15 years, 18–24 years) by feedback type (first warning, efficient, other, first positive, and positive feedback) interaction showed that age groups differed in the number of feedback observations (age by feedback interaction, F(8,116) = 23.70, p < 0.001). Comparisons for each feedback type separately showed that adolescents did not differ from adults in performance. In contrast, children had fewer first warning (indicating rule switches), efficient and positive feedback observations, but more other error feedback compared with adolescents and adults (all p values <0.001). An age group (8–11 years, 14–15 years, 18–24 years) by time (TP1, TP2) by feedback type (first warning, efficient, error, first positive, and positive feedback) interaction indicated that over time children improved more compared with adolescents and adults (F(8,116) = 3.10, p < 0.05).
To examine the within-subject changes in performance, difference scores (TP2 − TP1) were calculated for all feedback types for each participant separately (Fig. 2). One-sample t tests showed that the absolute difference scores for each feedback type were different from zero (all p values <0.01), indicating that participants' performance changed from TP1 to TP2. Regression analysis showed that age at TP1 was a predictor of the difference in number of first warning (β = −0.46, p < 0.01, observed power = 0.88), other error (β = 0.37, p < 0.05, observed power = 0.7), and positive feedback observations (β = −0.41, p < 0.05, observed power = 0.81), indicating that the within-person improvement was larger for younger children than for older children and adults.
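The steps in this analysis (difference scores, a one-sample t statistic on their absolute values, and a regression of the difference on age at TP1) can be sketched with the Python standard library. The data below are invented, and the function names are hypothetical. With a single predictor the standardized β equals the Pearson correlation, which is what the sketch computes; p values and observed power are not computed.

```python
import math
import statistics


def one_sample_t(values) -> float:
    """t statistic for the null hypothesis that the mean of values is zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))


def standardized_beta(x, y) -> float:
    """Standardized slope of y on a single predictor x (equals Pearson r)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented example: four participants, number of first warning observations.
age_tp1 = [8, 10, 12, 14]
fw_tp1 = [10, 12, 14, 16]
fw_tp2 = [16, 16, 16, 16]

diff = [b - a for a, b in zip(fw_tp1, fw_tp2)]   # TP2 - TP1 per participant
t_abs = one_sample_t([abs(d) for d in diff])     # test on absolute differences
beta = standardized_beta(age_tp1, diff)          # age at TP1 as predictor
```

A negative β in this setup means that younger participants gained more observations between sessions than older participants, which is the pattern reported for first warning and positive feedback.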
Number of feedback observations as a function of age for all feedback types. Each line represents an individual participant at time point 1 and time point 2.
Whole-brain analyses
Whole-brain activation patterns were examined cross-sectionally at both time points using one-sample t tests (FDR corrected, p < 0.001, 10 contiguous voxels). The results showed similar whole-brain activation patterns at TP1 and TP2 (Fig. 3), including activation in preSMA (middle) frontal regions and bilateral parietal cortices. These findings indicate that participants of all age groups recruited largely overlapping brain regions at baseline as well as at the follow-up session. An analysis with performance as a predictor at TP1 indicated that the same areas (LPFC, preSMA/ACC, and bilateral parietal cortex) were correlated with task performance at TP1, p < 0.001, FDR corrected with at least 10 contiguous voxels (Fig. 4).
Cross-sectional feedback-locked whole-brain contrasts showing effects of first warning > positive feedback (FDR corrected, p < 0.001; 10 contiguous voxels). Group-activation patterns were similar across the two sessions except for slightly stronger overall activation on TP2 compared with TP1.
Feedback-locked whole-brain contrasts showing effects of first warning > positive feedback (yellow) and overlap with task performance (orange). A, Overlap in bilateral parietal cortices (y = −37). B, Overlap in LPFC and preSMA/ACC (y = 13). Both activation patterns are thresholded at p < 0.001, FDR correction, with at least 10 contiguous voxels.
Longitudinal analyses
First, we tested change in neural activation by performing a repeated-measures ANOVA for the contrast first warning > positive feedback, directly comparing TP1 and TP2, across all participants. This analysis did not show significant change over time. Second, we added age as a linear and log-linear covariate to the analyses, but again no significant changes were found. We reran the analyses with only the children and the adolescents (whole group and per subsample), since we expected the adults to show the least change and therefore they could bias the sample. Again, no significant changes in brain activity were found.
ROI analyses
ROI selection was based on the first warning > positive feedback contrast at baseline and included a wide range of areas, including preSMA/ACC, inferior and superior parietal cortices, bilateral superior/middle frontal gyrus, bilateral inferior frontal gyrus/inferior operculum BA44, and bilateral insula.
First, to examine within-subject changes in neural activation over time we calculated difference scores for all ROIs for each participant separately. One-sample t tests showed that the absolute difference scores for all ROIs were different from zero (all p values <0.001), indicating that participants' neural activation associated with first warning feedback processing changed from TP1 to TP2. However, the direction of activation change differed across participants. Regression analyses showed that age at TP1 was not a significant predictor of the within-subject change in feedback-related activation.
Second, we investigated the association between difference scores in performance (i.e., number of rule shifts) and difference scores of activation over time in the selected ROIs. Table 2 shows that change in the number of rule shifts was significantly positively associated with change in neural activation related to first warning negative feedback, even when age at TP1 was entered as a covariate to the analysis. These findings are also illustrated in Figure 5, and demonstrate that performance is a better predictor of neural changes over time than age. Notably, change in brain activity in bilateral middle frontal regions and left parietal cortex, brain areas previously associated with rule-switching, showed the highest correlations with performance change.
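Testing the association between performance change and activation change with age at TP1 as a covariate corresponds, for a single covariate, to a first-order partial correlation. The sketch below uses the standard formula for that quantity; the data are invented and the function names are hypothetical, and it is not a reconstruction of the exact model behind Table 2.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z) -> float:
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented values: change in rule shifts, change in ROI activation, centered age.
d_performance = [1, 2, 3, 4, 5]
d_activation = [2, 1, 3, 5, 4]
age_centered = [1, -1, 0, -1, 1]

r_raw = pearson_r(d_performance, d_activation)
r_partial = partial_r(d_performance, d_activation, age_centered)
```

In this toy example age is uncorrelated with both change scores, so the partial correlation equals the raw correlation; when age shares variance with either variable, the two values diverge.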
Association between performance change over time and change in neural activation associated with first warning relative to positive feedback processing
Correlations between performance change over time (Diff FW) and changes in neural activation associated with first warning relative to positive feedback processing over time. A selection of brain regions with the strongest association is shown clockwise, starting top left: left superior/middle frontal gyrus (−25, −1, 56), left superior parietal cortex (−22, −70, 48), preSMA/ACC (3, 23, 42), and left inferior parietal cortex (−37, −48, 45). L, Left; FG, frontal gyrus.
Testing for stability: Reliability measurements
The reliabilities of the first warning > positive feedback contrast for all ROIs were consistently in the fair to good range for the whole group (Fig. 6A; Table 3). Since we expected differences in reliability between the different age groups, intravoxel reliabilities were also calculated for each group separately (Fig. 6B–D; Table 3) (plots for each ROI per participant of the voxel values for TP1 against TP2 are available upon request). Adults and adolescents both showed at least fair reliabilities and good reliabilities for bilateral precuneus, left inferior and superior parietal cortices and right angular gyrus. In contrast, children demonstrated poor ICCs and higher SE bands in all regions compared with the older groups.
Intra-voxel reliability (ICC) measures based on ROIs at time point 1 (FDR corrected, p < 0.001, 10 contiguous voxels). ICCs were computed for each participant and population estimate was based on bootstrap methods. A displays ICC values with SE bands for the whole sample. In B–D, the bars indicate ICC values for each age group: adults (B), adolescents (C), and children (D).
Reliability measurements of ROIs for first warning > positive feedback contrast
Since children performed less well on the task, it could be argued that lower reliability estimates in the ROIs are due to different (and lower) activation patterns. Therefore, we also analyzed ICCs for the contrast negative > positive feedback, in which the total observations of negative (first warning, efficient and other error feedback types) and positive (positive and first positive feedback types) feedback are similar across all age groups. For these analyses we redefined the ROIs specific for this contrast. Again, ROIs were based on TP1, FDR corrected, p < 0.001 with at least 10 contiguous voxels. There was considerable overlap between the ROIs from both contrasts (these ICC values are available upon request). Compared with the ICC values for first warning > positive feedback, values tended to be higher for all groups in the negative > positive feedback contrast. More specifically, the two youngest age groups showed higher reliability values up to one- to two-tenths for all ROIs averaged.
Finally, within-subject variability was also calculated in an attempt to protect against bias from between-subjects variance (Table 3). As with the ICC measurements, σw values are given for the whole sample and for the age groups separately. Since there were large differences in height of activation between the age groups, we corrected the individual σw values for mean activation per group across sessions, and values are reported as σw-corrected. Although σw-corrected values vary widely across brain regions, adults show the least within-subject variability overall, and children the largest. To again equate the number of trials across age groups, the σw-corrected values were also calculated for the negative > positive feedback contrast. As with the ICC values, less within-subject variability was observed in the negative > positive feedback contrast than in the first warning > positive feedback contrast (these σw-corrected values are available upon request).
Discussion
The aim of this three-year longitudinal study was to investigate developmental and within-subject changes in task performance and feedback-related neural activation patterns. We addressed this question using two types of analyses: repeated-measures ANOVAs and test-retest reliability of fMRI activation levels over time using a rule-switching task. Two principal findings emerged from this study. First, while on the behavioral level participants performed more rule switches with increasing age, change in performance (i.e., number of rule shifts) was a better predictor of change in activity over time than age. Second, test-retest reliability was at least fair to good for adults and adolescents, and poor for the youngest age group. Substantially more variability was observed in the pattern and magnitude of activation in children compared with adults, which may be interpreted as a proxy for developmental change.
Change in performance and feedback-related activity over time
Behaviorally, task performance improved as indexed by an increase in rule shifts with increasing age. This is in line with previous research showing that the ability to switch between rules continues to develop through childhood and adolescence (Crone et al., 2006; Somsen, 2007; Kalkut et al., 2009; Anokhin et al., 2010). Moreover, the increase in rule switching over time was greatest for the youngest participants, indicating that the ability to switch between rules shows a large developmental increase during childhood and adolescence, and continues to develop throughout adolescence. These findings are in accordance with those of Kalkut et al. (2009) indicating that set-shifting abilities develop from late childhood to early adulthood. Other behavioral longitudinal studies have reported change in early childhood and adolescence in a variety of cognitive domains (Thomas et al., 1999; Ferrer et al., 2007; Kail and Ferrer, 2007; De Brauwer and Fias, 2009; Luna, 2009).
At both time points participants recruited a network of brain regions, including lateral PFC, preSMA/ACC, inferior and superior parietal cortex, and bilateral insula when processing first warning negative feedback (Crone et al., 2008; van Duijvenvoorde et al., 2008; van den Bos et al., 2009; Tau and Peterson, 2010). Here, we tested the predictive value of age and performance on within-subject changes in feedback-related neural activation longitudinally. In contrast to the behavioral findings, the fMRI data indicated that age was not a good predictor of change in brain activity between the two sessions. However, we found significant correlations between performance and activation change over time in the feedback processing network, including bilateral middle frontal gyrus, left inferior and superior parietal cortices and preSMA/ACC. These findings suggest that performance is a better predictor of activation change over time than age. This is in line with previous research indicating that performance can explain age-related differences in neural activation (Bunge et al., 2002; Booth et al., 2004). The next question that needs to be answered is: Does increased brain activity lead to increased performance or vice versa? This question also raises the issue of neural efficiency: less is more, more is less or more is more. Although there are no satisfactory answers yet to either question (Poldrack, 2010), our findings build upon recent developmental studies in which increased activation patterns were associated with increased performance (Klingberg et al., 2002; Olesen et al., 2003; Vannest et al., 2010).
The current findings further imply that future developmental neuroimaging studies should take task performance into account when examining age-related differences in neural activation. To account for performance differences between age groups, one option would be to match age groups based on performance scores (Schlaggar et al., 2002). An alternative would be to use a parametric manipulation of task difficulty. This kind of manipulation allows for post hoc comparisons between age groups at different levels of task difficulty, while controlling for task performance (Durston and Casey, 2006). The relationship between performance and neural activation is complicated and might differ between cognitive domains (Booth et al., 2004). The findings of this study show that age in years may not be the best predictor of developmental change; rather, future studies should examine the possibility of characterizing individuals according to the way they perform complex tasks and learn information (Schmittmann et al., 2006).
Test-retest reliability
Change versus stability was further tested using test-retest reliability in children, adolescents and adults. Two methods were used to assess reliability in this study. First, ICC values indicated that there were differences in reliability between the age groups. Adults and adolescents showed good reliabilities for the inferior and superior parietal cortices, bilateral precuneus and right angular gyrus. Fair to good reliabilities were found for the other areas. However, the youngest age group showed poor test-retest reliability for all ROIs. The differences in reliability could not be explained by differences in the number of observations (see also Genovese et al., 1997; Friedman and Glover, 2006).
Second, we used within-subject variation in fMRI signal changes across measurements to protect against between-subjects variance. Considerable within-subject variation in fMRI signal changes was found in the brain areas associated with the first warning > positive feedback contrast. The pattern was similar to that of the ICC measurements, showing relatively low within-subject variation in adults (44% mean for all areas) compared with adolescents (49%) and children (99%). When within-subjects variance for the negative > positive feedback contrast was calculated, the variance was reduced, similar to the ICC findings.
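As an illustration only, the two reliability measures discussed above can be sketched in a few lines of Python. The text here does not specify which ICC variant or which formula for within-subject variation was used, so both choices below are assumptions: a two-way, consistency-type, single-measure ICC (often written ICC(3,1)) and a within-subject coefficient of variation averaged over participants. The function names and the example values are hypothetical and are not taken from the study data.

```python
from statistics import mean, stdev


def icc_3_1(scores):
    """ICC(3,1): two-way, consistency, single measure (assumed variant).

    scores: list of per-subject lists, one signal-change value per session.
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # number of sessions (2 in a test-retest design)
    grand = sum(sum(row) for row in scores) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    sess_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subject, session, and residual.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_sess = n * sum((m - grand) ** 2 for m in sess_means)
    ss_err = ss_total - ss_subj - ss_sess

    ms_subj = ss_subj / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err)


def within_subject_cv(scores):
    """Mean within-subject coefficient of variation, in percent (assumed formula).

    For each subject: SD across sessions divided by the absolute session mean.
    """
    cvs = []
    for row in scores:
        m = mean(row)
        if m == 0:
            raise ValueError("CV is undefined when a subject's mean is zero")
        cvs.append(stdev(row) / abs(m) * 100)
    return mean(cvs)


# Hypothetical signal-change values: each subject shifts by the same amount.
example = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
print(icc_3_1(example), within_subject_cv(example))
```

In this hypothetical example every participant changes by the same amount between sessions, so the consistency-type ICC is 1 even though the within-subject variation is nonzero. This is one reason the two measures can be informative side by side: the ICC reflects whether the rank ordering of individuals is preserved, whereas the within-subject measure reflects how much each individual's signal changes.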
Both measurements indicate that even after a 3.5-year scan interval, reliability is fair to good for adults and adolescents, but this is accompanied by considerable within-subject variation. Although the youngest group showed low ICC values and large within-subject variation, this does not necessarily imply low reliability. In our opinion, these findings can also be interpreted in terms of ongoing maturational processes. Previous research has shown that the human brain continues to mature through early adulthood (Gogtay et al., 2004; Giedd, 2008) and this maturation is hypothesized to influence both performance and functional activation (Blakemore, 2008; Casey et al., 2008). Therefore, if there are ongoing functional and structural changes in the youngest age ranges, then it is reasonable to expect that reliability measurements will show weak results. This suggests that the traditional test-retest reliability analyses could serve as a proxy for development. Hence, recruitment of brain areas can be quantified in similar or different activational patterns within brain regions. Different usage will lead to low reliability measurements, possibly indicative of different strategy use and/or ongoing maturational processes. Additional support for this conclusion comes from the individual intravoxel reliability data, in which large differences were shown between activity patterns in the ROIs between both time points, especially for the younger age groups.
Although several studies have investigated the reliability and reproducibility of fMRI over time (for review, see Bennett and Miller, 2010), it has to be noted that the scan intervals that have been studied vary extensively (on the order of a few days, weeks, or one year) and that these studies focused mainly on young and middle-aged (healthy) adults. Therefore, it is difficult to compare our reliability findings with previous studies.
Limitations and future directions
Reliability can be affected by technical, physiological, and psychological factors. We controlled for this in part by using exactly the same paradigm, scanner, and scanning protocol (including mock-scanner training) on both measurements. In addition, it is unlikely that our study suffered from learning effects, because all participants were trained to learn the three rules before scanning. Moreover, test-retest reliability of fMRI BOLD also depends on several psychophysiological effects such as changes in arousal, attention, fatigue, task acquaintance, and heart rate, and these might also contribute to increased variability and therefore lower reproducibility. Despite these difficulties, fMRI results were satisfactorily reliable.
So far, the number of longitudinal functional neuroimaging studies pales in comparison with the vast number of cross-sectional studies. Only two studies examined within-subject changes in a cognitive control task over time, in 9-year-old children (Durston et al., 2006) and 15-year-old adolescents (Finn et al., 2010). The current study is the first to study changes in a much wider age range and with a larger sample. Together with the findings from this study, both studies showed that the longitudinal findings provide additional information and differences in activation patterns compared with the cross-sectional findings. Age and performance effects can be assessed in either a cross-sectional or longitudinal fashion. Cross-sectional imaging studies provide insights into age and/or performance differences in brain function; however, longitudinal studies are required to provide a true measure of the functional change over time.
In summary, the present 3.5-year longitudinal fMRI study in healthy children, adolescents, and young adults provided important evidence of behavioral and brain activity-related change over time and of test-retest reliability of fMRI in young age groups. The most notable finding was that performance on a feedback-based rule-switching task is a better predictor than age of changes over time in feedback-related brain activation. A next step in developmental neuroimaging work could be to characterize participants based on learning types, as these may be stronger predictors of neural change than age alone.
Footnotes
This research was supported by a VIDI grant (no. 91786368) from the Netherlands Organization for Scientific Research (NWO) to E.A.C. We thank Dr. Silvia Bunge for reading and commenting on the manuscript. We are thankful to Rosa Meuwese and Sandy Overgaauw for their help with data acquisition.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. P. Cédric M. P. Koolschijn, Institute of Psychology; Brain and Development Laboratory, Leiden University, 3B33.A, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands. koolschijnpcmp{at}fsw.leidenuniv.nl