Skip to main content

Main menu

  • HOME
  • CONTENT
    • Early Release
    • Featured
    • Current Issue
    • Issue Archive
    • Collections
    • Podcast
  • ALERTS
  • FOR AUTHORS
    • Information for Authors
    • Fees
    • Journal Clubs
    • eLetters
    • Submit
    • Special Collections
  • EDITORIAL BOARD
    • Editorial Board
    • ECR Advisory Board
    • Journal Staff
  • ABOUT
    • Overview
    • Advertise
    • For the Media
    • Rights and Permissions
    • Privacy Policy
    • Feedback
    • Accessibility
  • SUBSCRIBE

User menu

  • Log out
  • Log in
  • My Cart

Search

  • Advanced search
Journal of Neuroscience
  • Log out
  • Log in
  • My Cart
Journal of Neuroscience

Advanced Search

Submit a Manuscript
  • HOME
  • CONTENT
    • Early Release
    • Featured
    • Current Issue
    • Issue Archive
    • Collections
    • Podcast
  • ALERTS
  • FOR AUTHORS
    • Information for Authors
    • Fees
    • Journal Clubs
    • eLetters
    • Submit
    • Special Collections
  • EDITORIAL BOARD
    • Editorial Board
    • ECR Advisory Board
    • Journal Staff
  • ABOUT
    • Overview
    • Advertise
    • For the Media
    • Rights and Permissions
    • Privacy Policy
    • Feedback
    • Accessibility
  • SUBSCRIBE
PreviousNext
Research Articles, Behavioral/Cognitive

Learning Predictive Statistics: Strategies and Brain Mechanisms

Rui Wang, Yuan Shen, Peter Tino, Andrew E. Welchman and Zoe Kourtzi
Journal of Neuroscience 30 August 2017, 37 (35) 8412-8427; https://doi.org/10.1523/JNEUROSCI.0144-17.2017
Rui Wang
1Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China,
2Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom,
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Rui Wang
Yuan Shen
3Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China, and
4School of Computer Science, University of Birmingham, Birmingham B15 2TT, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Peter Tino
4School of Computer Science, University of Birmingham, Birmingham B15 2TT, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Andrew E. Welchman
2Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom,
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Andrew E. Welchman
Zoe Kourtzi
2Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom,
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Zoe Kourtzi
  • Article
  • Figures & Data
  • Info & Metrics
  • eLetters
  • PDF
Loading

Abstract

When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory–motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions.

SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to changes in the environment's statistics. We provide evidence for an alternate route for learning complex temporal statistics: extracting the most probable outcome in a given context is implemented by interactions between executive and motor corticostriatal mechanisms compared with visual corticostriatal circuits (including hippocampal cortex) that support learning of the exact temporal statistics.

  • fMRI
  • learning
  • prediction
  • vision

Introduction

Making predictions about future events challenges us to extract structure from streams of sensory signals that initially appear incomprehensible. Typically, event structures in the natural environment contain regularities of variable complexity from simple repetitive patterns to more complex probabilistic combinations. For example, when learning a new piece of music or a new language, we extract simple repetitive patterns (e.g., tones, syllables) and more complex contingencies (e.g., melodies or phoneme pairs) that determine the probability with which events occur. Learning to extract these statistics allows us to interpret incoming signals rapidly and predict upcoming events. Despite the fundamental importance of this type of statistical learning for sensory interpretation and prediction, we know surprisingly little about its neural basis.

Previous work on statistical learning has focused on simple repetitive patterns or associative pairings. Behavioral studies provide evidence that mere exposure (i.e., without explicit feedback) to co-occurring stimuli can drive learning of contingencies (for reviews, see Perruchet and Pacton, 2006; Aslin and Newport, 2012). For example, observers become familiar with structured patterns after exposure to items (e.g., shapes, tones, or syllables) that co-occur spatially or appear in a temporal sequence (Saffran et al., 1999; Chun, 2000; Fiser and Aslin, 2002; Turk-Browne et al., 2005). Here, we investigate the functional brain mechanisms that mediate our ability to adapt to changes in the environment's statistics and learn behaviorally relevant structures for making predictions.

We combine behavioral measures with multisession fMRI (before and after training) to examine the neural mechanisms that mediate learning of temporal sequences that change in their statistics from repetitive patterns to more complex probabilistic contingencies. To do so unencumbered by past experience, we tested participants with sequences of unfamiliar symbols, in which the complexity of the sequence structure changed unbeknownst to the participants (see Fig. 1). We increased sequence complexity by manipulating the memory order (i.e., context length) of the Markov model used to generate the sequences. In particular, we presented participants first with sequences that were determined by frequency statistics (i.e., occurrence probability per symbol) and then by more complex, context-based statistics (i.e., the probability of a given symbol appearing depends on the preceding symbol). Participants performed a prediction task in which they indicated which symbol they expected to appear after exposure to a sequence of variable length. Following previous statistical learning paradigms, participants were exposed to the sequences without trial-by-trial feedback.

Our behavioral results show that individuals adapt to the environment's statistics; that is, they are able to extract predictive structures of different complexity. Further, we show that learning of predictive structures relates to individual decision strategy; that is, individuals differed in their decision strategies, favoring either probability maximization (i.e., extracting the most probable outcome in a given context) or matching the exact sequence statistics. We used this variability in decision strategy to investigate fMRI activity. We find that distinct corticostriatal mechanisms mediate the two strategies: matching engages occipitotemporal regions (including the hippocampus) and ventral caudate, whereas maximizing engages dorsolateral prefrontal, cingulate, sensory–motor regions, and basal ganglia (dorsal caudate, putamen). This provides evidence for differentiated corticostriatal mechanisms that support learning of behaviorally relevant statistics for making predictions.

Materials and Methods

Observers

Thirty-four participants (mean age = 21.8 years, male and female) participated in the experiments (main experiment: n = 23; control experiment: n = 11). The data from two participants were excluded from further imaging analysis due to excessive head movement (>3 mm). All observers were naive to the aim of the study, had normal or corrected-to-normal vision, and gave written informed consent. This study was approved by the University of Birmingham Ethics Committee.

Stimuli

Stimuli comprised four symbols chosen from the Ndjuká syllabary (Turk-Browne et al., 2009; see Fig. 1a). These symbols were highly discriminable from each other and were unfamiliar to the observers. Each symbol subtended 8.5° of visual angle and was presented in black on a midgray background. Experiments were controlled using MATLAB and the Psychophysics toolbox 3 (Brainard, 1997; Pelli, 1997). For the behavioral training sessions, stimuli were presented on a 21-inch CRT monitor (ViewSonic P225f 1280 ×1024 pixel, 85 Hz frame rate) at a distance of 45 cm. For the pretraining and posttraining fMRI scans, stimuli were presented using a projector and a mirror setup (1280 × 1024 pixel, 60 Hz frame rate) at a viewing distance of 67.5 cm. The physical size of the stimuli was adjusted so that angular size was constant during behavioral and scanning sessions.

Sequence design

To generate probabilistic sequences of different complexity, we used a temporal Markov model and manipulated the memory order of the sequence, which we refer to as the context length.

The Markov model consists of a series of symbols in which the symbol at time i is determined probabilistically by the previous k symbols. We refer to the symbol presented at time i, s(i), as the target and to the preceding k-tuple of symbols (s(i − 1), s(i − 2), … s(i − k)) as the context. The value of k is the order or level of the following model: Embedded Image The simplest k = 0th order model is a random memory-less source. This generates, at each time point i, a symbol according to symbol probability P(s) without taking account of the previously generated symbols.

The order k = 1 Markov model generates symbol s(i) at each time i conditional on the previously generated symbol s(i − 1). This introduces a memory in the sequence; that is, the probability of a particular symbol at time i strongly depends on the preceding symbol s(i − 1). Unconditional symbol probabilities P(s(i)) for the case k = 0 are replaced with conditional ones, P(s(i) | s(i − 1)).

At each time point, the symbol that follows a given context is determined probabilistically, making the Markov sequences stochastic. The underlying Markov model can be represented through the associated context-conditional target probabilities. We used four symbols that we refer to as stimuli A, B, C, and D. The correspondence between stimuli and symbols was counterbalanced across participants.

For Level 0, the Markov model was based on the probability of symbol occurrence: one symbol had a high probability of occurrence, one low probability, whereas the remaining two symbols appeared rarely (see Fig. 1b). For example, the probabilities of occurrence for the four symbols A–D were 0.18, 0.72, 0.05, and 0.05, respectively. Presentation of a given symbol was independent of the stimuli that preceded it.

For Level 1, the target depended on the immediately preceding stimulus (see Fig. 1b). Given a context (the last seen symbol) only one of two targets could follow; one had a high probability of being presented and the other a low probability (e.g., 80% vs 20%). For example, when symbol A was presented, only symbols B or C were allowed to follow, and symbol B had a higher probability of occurrence than symbol C.

Task design

We tested learning of temporal structures that differed in their complexity; that is, sequences determined by simple frequency statistics (Level 0) and more complex sequences defined by context-based statistics (Level 1). To define the complexity of our sequences, we quantified the average past–future mutual information in the sequences generated by stochastic sources (Grassberger, 1986), providing a statistic that has been applied in a number of probabilistic contexts (Shaw, 1984; Li, 1991). For Markov models of order 0 or 1, complexity is expressed as the difference between the entropy of the marginal symbol distribution and the entropy rate of the Markov chain (Li, 1991). This measure quantifies the average reduction in uncertainty of the next symbol in a sequence when the memory of the generating source is taken into account. For 0-order Markov models, the complexity is 0 because the source itself is memory-less. For Markov models of order 1, conditioning on the last symbol will reduce the uncertainty. For example, for the first-order Markov model that we used, the marginal symbol probabilities are equal, resulting in entropy close to the maximum value of two bits. However, conditional on the last symbol, only two symbols are allowed with unequal probabilities, resulting in lower entropy rate and therefore higher complexity (1.28).

To investigate whether participants adapt to changes in the temporal structure, we ensured that the sequences across levels were matched for properties (i.e., number or identity of symbols) other than complexity. Further, we designed the stochastic sources from which the sequences were generated so that the context-conditional uncertainty remained highly similar across levels. In particular, for the zero-order source, only two symbols were likely to occur most of the time; the remaining two symbols had very low probability (0.05). This was introduced to ensure that there was no difference in the number of symbols presented across levels. Of the two dominant symbols, one was more probable (probability 0.72) than the other (probability 0.18). This structure is preserved in Markov chain of order 1, in which, conditional on the previous symbols, only two symbols were allowed to follow, one with higher probability (0.80) than the other (0.20). This ensures that the structure of the generated sequences across levels differed predominantly in memory order (i.e., context length) rather than context-conditional probability.

Procedure

Observers were initially familiarized with the task through a brief practice session (8 min) with random sequences (i.e., all four symbols were presented with equal probability 25% in a random order). After this, observers participated in multiple behavioral training and fMRI scanning sessions that were conducted on different days (see Fig. 1c). Participants were trained with structured sequences and tested with both structured and random sequences to ensure that training was specific to the trained sequences.

In the first scanning session, participants were presented with zero- and first-order sequences and random sequences. Observers were then trained with zero-order sequences and subsequently with first-order sequences. For each level, observers completed a minimum of three and a maximum of five training sessions (840–1400 trials). Training at each level ended when participants reached plateau performance (i.e., performance did not change significantly for two sessions). A posttraining scanning session followed training per level (i.e., on the following day after completion of training), during which observers were presented with structured sequences determined by the statistics of the trained level and random sequences. The mean time interval (±SE) between the pretraining session and the final test session was 23.5 ± 0.5 d.

Psychophysical training

Each training session comprised five blocks of structured sequences (56 trials per block) and lasted 1 h. To ensure that sequences in each block were representative of the Markov model order per level, we generated 10,000 Markov sequences per level comprising 672 stimuli per sequence. We then estimated the Kullback–Leibler (KL) divergence between each example sequence and the generating source. In particular, for Level 0 sequences, this was defined as follows: Embedded Image and for Level 1 sequences, this was defined as follows: Embedded Image where P( ) refers to probabilities or conditional probabilities derived from the presented sequences and Q( ) refers to those specified by the source. We selected 50 sequences with the lowest KL divergence (i.e., these sequences matched closely the Markov model per level). The sequences presented to the participants during the experiments were selected randomly from this sequence set.

For each trial, a sequence of 8–14 stimuli appeared in the center of the screen one at a time in a continuous stream, each for 300 ms followed by a central white fixation dot (interstimulus interval, ISI) for 500 ms (see Fig. 1a). This variable trial length ensured that observers maintained attention during the whole trial. Each block comprised equal number of trials with the same number of stimuli. The end of each trial was indicated by a red dot cue that was presented for 500 ms. After this, all 4 symbols were shown in a 2 × 2 grid. The positions of test stimuli were randomized from trial to trial. Observers were asked to indicate which symbol they expected to appear after the preceding sequence by pressing a key corresponding to the location of the predicted symbol. Observers learned a stimulus-key mapping during the familiarization phase: keys 8, 9, 5, and 6 in the number pad corresponded to the four positions of the test stimuli: upper left, upper right, lower left, and lower right, respectively. After the observer's response, a white circle appeared on the selected stimulus for 300 ms to indicate the observer's choice, followed by a fixation dot for 150 ms (intertrial interval) before the start of the next trial. If no response was made within 2 s, a null response was recorded and the next trial started. Participants were given feedback (i.e., score in the form of PI, see “Data analysis” section) at the end of each block, rather than per-trial error feedback, which motivated them to continue with training.

Scanning sessions

The pretraining scanning session (Pre) included six runs (i.e., three runs per level), the order of which was randomized across participants. Scanning sessions after training per level (denoted as Post0, Post1) included nine runs of structured sequences determined by the same statistics as the corresponding trained level and random sequences. Each run comprised five blocks of structured and five blocks of random sequences presented in a random counterbalanced order (two trials per block; a total of 10 structured and 10 random trials per run), with an additional two 16 s fixation blocks, one at the beginning and one at the end of each run. Each trial comprised a sequence of 10 stimuli that were presented for 250 ms each, separated by a blank interval during which a white fixation dot was presented for 250 ms. After the sequence, a fixation screen (central red dot) appeared for 4 s before the test display (comprising 4 test stimuli) appeared for 1.5 s. Observers were asked to indicate which symbol they expected to appear after the preceding sequence by pressing a key corresponding to the location of the predicted symbol. A white fixation was then presented for 5.5 s before the start of the next trial. In contrast to the training sessions, no feedback was given during scanning.

fMRI data acquisition

The experiments were conducted at the Birmingham University Imaging Centre using a 3 T Philips Achieva MRI scanner. T2*-weighted functional and T1-weighted anatomical (1 × 1 × 1 mm resolution, slices = 175) data were collected with a 32-channel SENSE head coil. Echoplanar imaging data (gradient echo pulse sequences) were acquired from 32 slices (whole-brain coverage; TR = 2000 ms; TE = 35 ms; 2.5 × 2.5 × 4 mm resolution).

Behavioral data analysis

Performance index (PI).

We assessed participant responses in a probabilistic manner. For each context, we computed the absolute Euclidean distance between the distribution of participant responses and the distribution of presented targets estimated across 56 trials per block as follows: Embedded Image where the sum is over targets from the symbol set A–D. We estimate AbDist per context for each block. We quantified the minimum overlap between these two distributions by computing a Performance Index (PI) per context as follows: Embedded Image Note that PI(context) = 1 − AbDist(context)/2. The overall PI is then computed as the average of the performance indices across contexts, PI(context), weighted by the corresponding stationary context probabilities as follows: Embedded Image To compare across different levels, we defined a normalized PI measure that quantifies participant performance relative to random guessing. We computed a random guess baseline, PIrand, which reflects participant responses to targets with equal probability of 25% for each target per trial for Level 0, (PIrand = 0.53) and equal probability for each target for a given context for Level 1 (PIrand = 0.45). To correct for differences in random guess baselines across levels, we subtracted the random guess baseline from the PI (PInormalized = PI − PIrand).

Strategy choice and strategy index.

To quantify each observer's strategy, we compared individual participant response distributions (response-based model) to two baseline models: (1) probability matching, in which probability distributions are derived from the Markov models that generated the presented sequences (model matching), and (2) a probability maximization model, in which only the single most likely outcome is allowed for each context (model maximization). We used KL divergence to compare the response distribution to each of these two models. KL is defined as follows: Embedded Image for the Level 0 model and Embedded Image for the Level 1 model, where R( ) and M( ) denote the probability distribution or conditional probability distribution derived from the human responses and the models (i.e., probability matching or maximization), respectively, across all the conditions.

We quantified the difference between the KL divergence from model matching to the response-based model and the KL divergence from model maximization to the response-based model. We refer to this quantity as strategy choice indicated by ΔKL (model maximization, model matching). Negative strategy choice values indicate a strategy closer to matching, whereas positive values indicate a strategy closer to maximization. We computed strategy choice per training block, resulting in a strategy curve across training for each individual participant. We then derived an individual strategy index by calculating the integral of each participant's strategy curve and subtracting it from the integral of the exact matching curve, as defined by model matching across training. We defined the integral curve difference between individual strategy and exact matching as the individual strategy index. We used this index to investigate the relationship of individual strategy and fMRI signals.

fMRI data analysis

Data preprocessing.

MRI data were processed using Brain Voyager QX (Brain Innovation). T1-weighted anatomical data were used for coregistration, 3D cortex reconstruction, inflation, and flattening. Preprocessing of the functional data involved slice scan time correction, head motion correction, temporal high-pass filtering (three cycles), and removal of linear trends. Spatial smoothing (Gaussian filter; 5 mm FWHM kernel) was performed for group random-effect analysis. The functional images were aligned to anatomical data and the complete data were transformed into Talairach space. For each observer, the functional imaging data between sessions were coaligned by registering all volumes of each observer to the first functional volume acquired during the first session.

Whole-brain general linear model (GLM).

BOLD responses for each trial comprising structured or random sequences were modeled separately for each session using a GLM. To search for brain regions that showed learning-dependent changes across sessions, we constructed a multiple regression design matrix that included the two stimulus conditions (structured vs random sequences) for each of the scanning sessions (Pre, Post0, Post1) as regressors. Each regressor was time locked to trial onset and included a range of volumes (see Figs. 3 and 4: five volumes and Fig. 5b: three volumes). To remove residual motion artifacts, the six zero-centered head movement parameters were also included as regressors. Serial correlations were corrected using a second-order autoregressive model AR(2). The resulting parameter estimates (β value) were used in a voxelwise mixed-design ANOVA with sequence (structured vs random) and scanning session (Pre, Post0, Post1). Statistical maps were cluster threshold corrected (p < 0.005) using Monte Carlo simulations (5000 iterations; Forman et al., 1995; Goebel et al., 2006) for multiple-comparison correction that confirmed a familywise error threshold of p = 0.05. Note that our results also hold for a more conservative threshold (p < 0.001), as recommended by recent studies (Woo et al., 2014; Eklund et al., 2016), but small-volume correction is required for small structures (i.e., putamen) at this threshold.

Covariance analysis.

To examine the relationship between brain activation and observers' performance, we conducted a voxelwise covariance analysis. In particular, we used individual strategy index as covariate in a GLM model of fMRI responses. That is, for each voxel, we correlated fMRI signal difference between structured and random sequences before versus after training with the strategy index. We calculated a Pearson correlation coefficient (R) for each voxel across the whole brain and identified voxel clusters showing significant correlations (p < 0.05, cluster threshold corrected). Positive correlations indicate increased activations after training that relate to maximization, whereas negative correlations indicate increased activations after training that relate to matching because a negative strategy index indicates matching.

Results

Behavioral results

Previous studies have compared learning of different spatiotemporal contingencies in separate experiments across different participant groups (Fiser and Aslin, 2002, 2005). Here, to investigate whether individuals extract changes in structure, we presented the same participants with sequences that changed in complexity unbeknownst to them (Fig. 1a). We parameterized sequence complexity based on the memory order of the Markov models used to generate the sequences (see Materials and Methods); that is, the degree to which the presentation of a symbol depended on the history of previously presented symbols (Fig. 1b). We first presented participants with simple zero-order sequences (Level 0), followed by more complex first-order sequences (Level 1; Fig. 1c) because previous work has shown that temporal dependencies are more difficult to learn as their length increases (van den Bos and Poletiek, 2008) and training with simple dependencies may facilitate learning of more complex contingencies (Antoniou et al., 2016). Zero-order sequences (Level 0) were context-less; that is, the presentation of each symbol depended only on the probability of occurrence of each symbol. For first-order sequences (Level 1), the presentation of a particular symbol was conditionally dependent on the previously presented symbol (i.e., context length of one).

Figure 1.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 1.

Trial and sequence design. a, Trial design. 8–14 stimuli were presented sequentially, followed by a cue and the test display. b, Sequence design. Markov models comprised two levels of complexity. For the zero-order model (Level 0), different states (A–D) are assigned to four symbols with different frequencies. For the first-order model (Level 1), a diagram indicates states (circles) and transitional probabilities (black arrow: high probability, e.g., 80%; gray arrow: low probability, e.g., 20%). Transitional probabilities are shown in a four-by-four conditional probability matrix, with rows indicating temporal context and columns indicating the corresponding target. c, Experimental protocol. Observers underwent multiple days of behavioral training first with zero-order sequences and then with first-order sequences. For each level, observers completed three to five training sessions (an average of four sessions is shown for illustration purposes). Three fMRI scanning sessions were conducted before (Pre) and immediately after training per level (Post0, Post1).

Because the sequences we used were probabilistic, we developed a probabilistic measure to assess participants' performance in the prediction task. Specifically, we computed a Performance Index (PI) that indicates how closely the distribution of participant responses matched the probability distribution of the presented symbols. This is preferable to a simple measure of accuracy because the probabilistic nature of the sequences means that the “correct” upcoming symbol is not uniquely specified; therefore, designating a particular choice as correct or incorrect is often arbitrary.

Comparing normalized performance (i.e., after subtracting performance based on random guessing) before and after training per level (Fig. 2) showed that observers improved substantially and learned the probabilistic structures (i.e., mean improvement >20% for both levels). A repeated-measures ANOVA with session (Pre, Post) and level (Level 0, Level 1) showed a significant effect of session (F(1,20) = 82.0, p < 0.001), but no significant effect of level (F(1,20) < 1, p = 0.358) and no significant interaction (F(1,20) < 1 p = 0.664), indicating that observers improved similarly at both levels through training. Interestingly, performance during the pretraining test session was higher than random guessing (F(1,20) = 42.8, p < 0.001), suggesting that fast learning of structured sequences is consistent with the learning time course reported in previous perceptual learning studies (Karni and Sagi, 1993). However, improvement continued during training across blocks; that is, mean performance for the last two training blocks was significantly higher than the mean performance for the first two training blocks (F(1,20) = 12.8, p = 0.002).

Figure 2.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 2.

Behavioral performance. a, Mean PI across participants for test (open symbols) and training (solid symbols) blocks for Level 0 and Level 1. Data are fitted (least-squares nonlinear fit) across training blocks. Random guess baseline is indicated by dotted lines. b, Normalized PI during scanning. Data are shown before (gray bars) and after (black bars) training for each level. Error bars indicate SEM.

We then tested whether this learning-dependent improvement was specific to the trained structured sequences. First, we compared performance accuracy (i.e., proportion of correctly predicted trials based on the predefined sequences) for structured and random sequences. A repeated-measures ANOVA showed a significant interaction of Session (Pre, Post) and sequence type (structured vs random) for Level 0 (F(1,20) = 24.1, p < 0.001) and Level 1 (F(1,20) = 54.5, p < 0.001), suggesting that learning improvement was specific to structured sequences. Second, we conducted a no-training control experiment, during which participants (n = 11) were tested in two separate behavioral sessions but did not participate in any training sessions. The two test sessions were spaced apart by a period (27.9 ± 1.9 days on average), comparable to the main experiment (23.5 days on average). Our results showed that there were no significant differences in performance between the two test sessions. In particular, a repeated-measures ANOVA with session (Session 1, Session 2) and Level (Level 0, Level 1) did not show any significant effect of session (F(1,10) < 1, p = 0.736) or level (F(1,10) = 1.84, p = 0.205) and no significant interaction (F(1,10) = 1.16, p = 0.308). These results suggest that the improvement that we observed in the main experiment was specific to training, rather than simply being due to repeating the test session twice (before and after training). Comparing PI between experiments (main vs no-training control experiment) showed a significant interaction between experiment and session (Level 0: F(1,30) = 15.1, p = 0.001, Level 1: F(1,30) = 7.95, p = 0.008), consistent with training-induced behavioral improvement.

fMRI analysis: learning-dependent activation changes

To investigate the brain mechanisms that mediate our ability to adapt to changes in temporal statistics, we performed fMRI on participants before and after training on each level with structured and random sequences. To assess learning-dependent changes in fMRI signals, we conducted a whole-brain voxelwise GLM analysis (RFX group analysis). In particular, we tested for brain regions that showed a significant interaction (p < 0.005, cluster threshold corrected) between sequence (structured vs random) and scanning session (Pre, Post0, Post1). This analysis revealed a network of dorsal frontal, cingulate, posterior parietal, occipital, and temporal regions, as well as subcortical (basal ganglia) and cerebellar regions (Fig. 3a, Table 1).

Figure 3.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 3.

fMRI results. a, GLM maps for the two-way interaction between scanning session (Pre, Post0, Post1) and sequence (structured vs random), at p < 0.005 (cluster threshold corrected). Only the first five volumes were included in the analysis that correspond to the presentation of sequence, the participants' prediction, and the test display presentation to avoid confounding the results by the participants' response. Similar results were observed at a more conservative threshold (p < 0.001), but small volume correction was necessary for small structures (i.e., putamen) at this threshold. b, PSC index (percentage signal change for structured sequences compared with random sequences) before and after training for Level 0 and Level 1. Data are shown for ROIs that showed a significant interaction between session (pretraining vs posttraining) and sequence (structured vs random). Error bars indicate SEM. Note that different number of runs were scanned before and after training (i.e., pretraining scan comprised three runs per level, whereas posttraining scans comprised nine runs per level). To compare equal amounts of data before and after training, we selected three of the nine runs from each posttraining scan; that is, we divided each session into two time periods and selected randomly one run per time period to match the order in which data were collected during the pretraining scan. Whole-brain voxelwise GLM analysis showed significant interactions for sequence (structured vs random) and scanning session (Pre, Post0, Post1) in the frontal, parietal, and subcortical regions, which is consistent with our main result.

View this table:
  • View inline
  • View popup
Table 1.

Brain regions showing significant interaction between scanning session (Pre, Post0, Post1) and sequence (structured vs random), p < 0.005, cluster corrected

We next investigated whether functional signals in these regions change from learning frequency (Level 0) to learning context-based statistics (Level 1) over time. In particular, we compared fMRI responses for structured and random sequences before and after training for each level (Level 0 vs Level 1) separately. For each participant and brain region identified by the GLM analysis, we calculated normalized fMRI responses [i.e., percentage signal change (PSC) index]; that is, we subtracted mean fMRI responses to random sequences from mean fMRI responses to structured sequences and divided by the average fMRI responses to random sequences. Note that this PSC analysis is complementary to the GLM analysis used to define regions of interest (ROI); it was conducted separately for each level, whereas the GLM tested for differences across sessions (i.e., Pre, Post0, Post1) rather than levels.

Comparing normalized fMRI responses before and after training for Level 0 (Fig. 3b) showed that bilateral dorsal frontal regions (medial: SFG: superior frontal gyrus; MeFG: medial frontal gyrus, lateral: MFG: middle frontal gyrus, PrG: precentral gyrus, and IFG: inferior frontal gyrus) and right posterior parietal regions (IPL: inferior parietal lobule, AnG; angular gyrus, and SMG: supramarginal gyrus) were involved in learning frequency-based statistics. These regions showed increased fMRI responses to structured sequences during the pretraining scanning session in contrast to decreased responses after training (i.e., posttraining scanning session). In particular, a repeated-measures ANOVA with session (Pre, Post) and ROI showed a significant main effect of session in the frontal (F(1,20) = 7.59, p = 0.012) and posterior parietal (F(1,20) = 6.58, p = 0.018) regions.

In contrast, learning context-based statistics (Level 1) engaged dorsal medial frontal (SFG and MeFG), limbic (CG: cingulate gyrus, ACC: anterior cingulate cortex), and subcortical (Pu: Putamen) areas (Fig. 3b). Similar to the fMRI activation patterns for Level 0, dorsal frontal regions showed enhanced responses to structured compared with random sequences for the pretraining scan that decreased after training. This was supported by a repeated-measures ANOVA that showed a significant session effect (frontal: F(1,20) = 6.36, p = 0.020; limbic: F(1,20) = 5.36, p = 0.031). In contrast, we observed the opposite pattern of results in putamen (paired t test, t(20) = −3.31, p = 0.003); that is, enhanced activations for structured sequences after training. Activation patterns differed significantly between putamen and frontal limbic regions (i.e., significant interactions of region and session: frontal vs putamen, F(1,20) = 16.22, p < 0.001; limbic vs putamen, F(1,20) = 16.34, p < 0.001). In a complementary analysis to the GLM analysis, comparing activations across levels showed significant differences in prefrontal regions (interaction of session and level, F(1,20) = 4.83, p = 0.040), right posterior parietal regions (main effect of level, F(1,20) = 7.41, p = 0.013) and putamen (main effect of level, F(1,20) = 4.56, p = 0.045). Consistent with the GLM analysis, these results support differential involvement of frontoparietal and striatal regions in learning frequency compared with context-based statistics.

Interestingly, the GLM analysis showed activation changes across sessions in the visual cortex (IOG: inferior occipital gyrus, MOG: middle occipital gyrus, LiG: lingual gyrus). Comparing fMRI responses in these regions across sessions did not show any significant differences for either of the two levels (Level 0: F(1,20) < 1, p = 0.429; Level 1: F(1,20) < 1, p = 0.531), suggesting that fMRI responses for structured sequences did not change significantly with training in the visual cortex. For learning frequency statistics (Level 0) visual cortex showed stronger activations for random than structured sequences (i.e., negative PSC index values) both before (main effect of sequence, F(1,20) = 6.04, p = 0.023) and after (F(1,20) = 32.7, p < 0.001) training, suggesting decreased activation due to repetition (i.e., repetition suppression) of symbols that appeared more frequently in structured than random sequences (Summerfield and Egner, 2009). This effect was not observed for first-order sequences (Level 1; before training, F(1,20) < 1, p = 0.981; after training, F(1,20) = 1.87, p = 0.187), consistent with higher repetition of single symbols in zero-order than first-order sequences.

Next, we asked whether the differences we observed in the activation patterns between levels were due to differences in sequence predictability. To measure sequence predictability, we computed the entropy rate of the probability distribution of all possible sequences. For Level 0, the entropy rate is defined as the entropy of the stationary distribution of symbols in the sequence. For Level 1, the entropy rate is a weighted sum of the entropies of all context-conditional distributions where the weights are given by the stationary distribution of contexts. We calculated the entropy rate for each sequence; we then conducted the whole brain voxelwise GLM analysis using entropy rate as regressor. This analysis showed significant interactions (p < 0.001, cluster threshold corrected) between sequence (structured vs random) and scanning session (Pre, Post0, Post1) in similar regions as the main analysis (Fig. 4a), making it unlikely that our results were confounded by differences in sequence predictability between levels.

Figure 4.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 4.

fMRI results controlled for differences in sequence entropy across levels. a, GLM maps (p < 0.001, cluster threshold corrected) for 2-way interaction between scanning session (Pre, Post0, Post1) and sequence (structured vs random) including entropy rate as a regressor. b, PSC index before and after training for Level 0 and Level 1. Error bars indicate SEM. Data are shown for ROIs that showed a significant interaction between session (pre vs posttraining) and sequence (structured vs random).

Comparing normalized fMRI responses before and after training (Fig. 4b) for Level 0, we observed increased fMRI responses to structured sequences before than after training (F(1,20) = 5.18, p = 0.034) in bilateral frontal regions (SFG; PrG, and IFG). In contrast, learning context-based statistics (Level 1) engaged dorsal frontal (SFG), limbic (ACC) and subcortical (putamen) areas. Dorsal frontal and limbic regions showed enhanced responses to structured compared with random sequences for the pretraining scan that decreased after training (F(1,20) = 5.76, p = 0.026). In contrast, putamen showed enhanced activations for structured sequences after training (paired t test, t(20) = −2.78, p = 0.012). Activation patterns differed significantly between putamen and frontal limbic regions (i.e., significant interactions of region and session: F(1,20) = 13.9, p < 0.001), in support of differential involvement of frontal and striatal regions in learning temporal statistics.

Our results so far suggest that dorsal corticostriatal mechanisms mediate learning of behaviorally relevant statistics. In particular, frontoparietal and cingulate regions showed higher fMRI responses for structured than random sequences during the pretraining scan. This is consistent with the role of dorsal prefrontal cortex in decision making (Heekeren et al., 2008; Rushworth and Behrens, 2008) and predictive coding (Monchi et al., 2001; Bar, 2009); that is, processes that are involved in both learning of frequency and context-based statistics. Further, our results show that cingulate cortex is involved in learning more complex context-based statistics that may relate to its involvement in learning under increased uncertainty (Kahnt et al., 2011; Nastase et al., 2014). Higher fMRI responses for structured sequences in these regions at the beginning of training may reflect processing of novel structures (i.e., temporal regularities in the form of single- or paired-item repetition). Significantly higher performance for structured sequences than random guessing during the first scanning session suggests that participants extract these statistics early in the training. Interestingly, fMRI responses for structured sequences decreased as these sequences became familiar with training. This decreased signals can be understood in the context of repetition suppression previously observed for predictable events (Raichle et al., 1994; den Ouden et al., 2009; Summerfield and Egner, 2009; Alink et al., 2010; Kok et al., 2012). In contrast, dorsal striatal regions (i.e., putamen), which have been implicated in learning probabilistic associations (Rauch et al., 1997; Poldrack and Packard, 2003), showed higher fMRI responses for structured compared with random sequences after training with first-order sequences, suggesting that representations of context–target contingencies were acquired through training.

Control analyses

We conducted a number of additional analyses and experiments to help rule out alternative explanations of our results.

First, we investigated whether the differences that we observed in fMRI responses between structured and random sequences were due to the participants attending more to the structured sequences either as the novel stimulus before training or the familiar stimulus after training. Comparing response times to structured and random sequences in the pretraining and posttraining session (3-way mixed-design ANOVA: session × sequence × level) showed decreased response times after training (main effect of session: F(1,20) = 8.63, p = 0.008), but no significant differences between structured and random sequences (main effect of sequence, F(1,20) = 0.152, p = 0.700), suggesting that participants engaged with the task when both structured and random sequences were presented. Importantly, there was no significant interaction among session, sequence, and level (F(1,20) = 1.72, p = 0.205), suggesting that differences in activation patterns across levels could not be simply due to differences in attention or task difficulty. Further, analysis of eye movement data collected during scanning did not show any significant differences between structured and random sequences for Level 0 or Level 1. There were no significant interactions observed (p > 0.10), suggesting that it is unlikely that our findings were confounded significantly by eye movements.

Second, we tested whether the learning-dependent fMRI changes that we observed could be confounded by differences in the number of training sessions across participants. Training duration varied from three to five sessions per level across participants, with most participants completing four training sessions (Level 0, n = 12; Level 1, n = 17) before reaching plateau performance. An ANCOVA analysis on the behavioral data using the number of training sessions as covariate did not show any significant interactions between session and number of training sessions (Level 0: F(1,19) = 0.479, p = 0.497; Level 1: F(1,19) = 0.089, p = 0.768). Similar analysis on the fMRI data did not show any significant interaction between session and number of training sessions (Level 0: frontal, F(1,19) = 0.001, p = 0.874, parietal, F(1,19) = 0.447, p = 0.512; Level 1: frontal, F(1,19) = 0.473, p = 0.500, limbic, F(1,19) = 0.705, p = 0.412, subcortical regions, F(1,19) = 3.53, p = 0.076). Together, these analyses suggest that it is unlikely that our fMRI results were confounded by differences in training duration across participants.

Third, we investigated whether the activation patterns that we observed relate to learning-dependent changes in the representation of the trained sequences or simply the participants' responses. In our design, the ISI jitter in each trial is too short to isolate the fMRI signal per stimulus in the sequence. However, the design of the paradigm allows us to analyze our fMRI data related to sequence presentation separately from participant prediction. First, we compared PSC for the first two volumes related to the presented sequences and the fourth and fifth volume related to the participants' prediction (i.e., the third volume was not included in this analysis because the sequences lasted 2.5 volumes). This analysis (Fig. 5a) showed that activation patterns for fMRI signals related to the sequence presentation and the participants' prediction were similar to those observed in our main analysis (Figs. 3b, 4b). In particular, we observed a significant effect of session (i.e., pre vs posttraining; Level 0: frontal: F(1,20) = 4.97, p = 0.037; Level 1: frontal limbic: F(1,20) = 5.95, p = 0.024, putamen: F(1,20) = 7.29, p = 0.014), but no significant effect of processing stage (i.e., sequence vs prediction; Level 0: frontal: F(1,20) = 0.004, p = 0.951: Level 1: frontal limbic: F(1,20) = 0.399, p = 0.535: putamen: F(1,20) = 3.29, p = 0.085). There was no significant interaction of session and processing stage (Level 0: frontal: F(1,20) = 0.003, p = 0.954; Level 1: frontal limbic: F(1,20) = 0.496, p = 0.490; putamen: F(1,20) = 1.68, p = 0.209). Second, a whole-brain voxelwise GLM analysis using only the volumes that corresponded to the sequence presentation showed significant interactions (p < 0.001, cluster threshold corrected) between sequence (structured vs random) and scanning session (Pre, Post0, Post1) in similar regions as the main analysis (Fig. 5b). Together, these analyses of fMRI signals related to the sequence presentation showed similar activation patterns as the main analysis (Fig. 3a), which included fMRI signals from both the sequence presentation and the participant prediction. Therefore, the learning-dependent changes that we observed in the main analysis relate to the sequence structure and could not be simply driven by the participants' prediction or response because fMRI signals related to the sequence presentation were recorded before the participants responded to the test stimulus.

Figure 5.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 5.

fMRI results for sequence presentation and participants' prediction. a, PSC index for sequence presentation (volumes 1–2) and participant prediction (volumes 4–5) before and after training for Level 0 and Level 1. Data are shown for the representative ROIs from Figure 4b. Error bars indicate SEM. b, GLM maps for the 2-way interaction between scanning session and sequence at p < 0.005 (cluster threshold corrected) using only the volumes that correspond to sequence presentation.

Response strategies: matching versus maximization

Previous work (Shanks et al., 2002; Rieskamp and Otto, 2006; Eckstein et al., 2013; Acerbi et al., 2014; Fulvio et al., 2014; Murray et al., 2015) on probabilistic learning and decision making has proposed that individuals use two possible response strategies when making a choice: matching and maximization. Observers have been shown to either match their choices stochastically according to the underlying input statistics or to maximize their reward by selecting the most probable positively rewarded outcomes. In the context of our task, because the Markov models that generated stimulus sequences were stochastic, participants needed to learn the probabilities of different outcomes to succeed in the prediction task. It is possible that participants used probability maximization in which they always select the most probable outcome in a particular context. Alternatively, participants might learn the relative probabilities of each symbol [e.g., p(A) = 0.18, p(B) = 0.72, p(C) = 0.05, p(D) = 0.05)] and respond so as to reproduce this distribution, a strategy referred to as probability matching.

To quantify the participants' strategies, we computed a strategy index that indicates participant's preference (on a continuous scale) for responding using probability matching versus maximization. Figure 6 illustrates individual strategy at the beginning (first two blocks) and end (last two blocks) of training. Comparing individual strategy across levels showed significantly higher values after training for Level 1 compared with Level 0 (F(1,20) = 26.2, p < 0.001). This shift in individual strategy was evident mainly after training (F(1,20) = 35.8, p < 0.001); that is, participants shifted more toward maximization when learning context-based rather than frequency statistics. Note that this relationship was not confounded by differences in performance because there were no significant correlations (Level 0: r = 0.31, p = 0.17; Level 1: r = 0.22, p = 0.34) of PI at the end of training (mean PI for the last two blocks of training) and strategy index. Interestingly, despite greater maximization for more complex structures than frequency statistics, we note that participants did not achieve optimal maximization performance. Maximization is typically observed under supervised or reinforcement learning paradigms (Shanks et al., 2002), so it is perhaps not surprising that our participants did not achieve exact maximization because trial-by-trial feedback was not provided.

Figure 6.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 6.

Strategy choice. Strategy choice is shown at the beginning (first two runs) and end (last two runs) of training for Level 0 (squares) and Level 1 (circles). Open symbols indicate individual participant data; closed symbols indicate mean date per level. Strategy choice was measured by comparing participant responses to two possible strategies: matching (i.e., predicting the presented target distribution) versus maximization (i.e., predicting the high probability targets per context). Negative values indicate a strategy closer to matching, whereas positive values indicate a strategy closer to maximization. Error bars indicate SEM.

fMRI covariance analysis with strategy

To investigate the relationship between brain activations and individual strategy, we conducted a voxelwise GLM covariance analyses. In particular, we correlated learning-dependent changes in fMRI signal (Posttraining PSC minus Pretraining PSC) for structured (compared with random) sequences with individual strategy. We calculated a Pearson correlation coefficient (R) for each voxel across the whole brain and identified voxel clusters showing significant correlations (p < 0.05) for learning frequency (Level 0) and context-based statistics (Level 1), respectively. Positive correlations indicate increased activations after training that relate to maximization, whereas negative correlations indicate increased activations after training that relate to matching because negative strategy values indicate strategy toward matching.

First, we observed negative correlations between learning-dependent fMRI changes and strategy index in occipitotemporal (including hippocampal regions), basal ganglia (ventral caudate), and thalamic regions (Fig. 7). These correlations indicate that increased activations for structured sequences after training in these regions relate to matching. Further, these correlations were observed for both levels, suggesting that learning frequency or context-based statistics by matching involves regions in visual corticostriatal circuits that have been implicated previously in the implicit learning of temporal sequences (Hindy et al., 2016; Rosenthal et al., 2016) and novel categories (Ashby and Maddox, 2005; Seger, 2013). In particular, previous work has implicated the striatum and the medial temporal lobe (i.e., hippocampus; Rauch et al., 1997; Poldrack and Packard, 2003; Schendan et al., 2003; Cools et al., 2004; Gheysen et al., 2011; Rose et al., 2011; Schapiro et al., 2012; Hsieh et al., 2014) in learning probabilistic associations. Further, medial temporal cortex has been implicated in explicit rule-based categorization, whereas caudate in categorization based on information integration (Nomura et al., 2007).

Figure 7.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 7.

Brain activations correlating with matching. Covariance analysis shows significant (p < 0.05, cluster threshold corrected) negative correlations (R correlation coefficient) between individual strategy index and learning-dependent fMRI change (i.e., after vs before training) for Level 0 (a) and Level 1 (b). Whole-brain maps and plots show negative correlations between strategy index and PSC index change (posttraining vs pretraining) for representative ROIs, as derived from the covariance analysis (note that these correlation plots are only presented for demonstration purposes; no additional statistical analysis was performed in these ROIs after the covariance analysis to avoid circularity). Cd, Caudate: b, body; t, tail; Th, thalamus; PHG, parahippocampal gyrus; Hipp, hippocampus.

In contrast, we observed positive correlations between learning-dependent fMRI changes and strategy index, indicating that increased activations for structured sequences after training relate to maximizing (Fig. 8). In particular, for Level 0, we observed positive correlations in dorsolateral prefrontal areas (MFG/IFG), the dorsal caudate and the cingulate (including anterior cingulate) cortex. For Level 1, we observed positive correlations in dorsolateral prefrontal (MFG/IFG) and posterior parietal regions, as well as cingulate and temporal cortex. Interestingly, we also observed positive correlations for sensory–motor cortex (precentral and postcentral gyrus) and basal ganglia (putamen). Our results are consistent with the role of prefrontal and cingulate cortex in decision making, monitoring performance, correcting errors, and switching between associations and strategies. Previous work on humans and animals emphasizes the role of the caudate in switching between strategies (Monchi et al., 2001; Cools et al., 2004; Seger and Cincotta, 2006) and learning after a rule reversal (Cools et al., 2002; Pasupathy and Miller, 2005). This tonic and fast learning in the caudate is thought to train slower learning mechanisms in the frontal cortex that may facilitate generalization and abstraction of learned associations. Finally, the putamen, which is known to be involved in skilled and habitual performance (Daw et al., 2005; Balleine and O'Doherty, 2010), may facilitate learning by maximizing. That is, once participants have extracted the most probable outcome for a given context, they may then select it habitually as the predicted outcome.

Figure 8.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 8.

Brain activations correlating with maximization. Covariance analysis shows significant (p < 0.05, cluster threshold corrected) positive correlations (R correlation coefficient) between individual strategy index and learning-dependent fMRI change (i.e., posttraining vs pretraining) for Level 0 (a) and Level 1 (b). Whole-brain maps and plots show positive correlations between strategy index and PSC index change (posttraining vs pretraining) for representative ROIs, as derived from the covariance analysis (note that these correlation plots are only presented for demonstration purposes; no additional statistical analysis was performed in these ROIs after the covariance analysis to avoid circularity).

Discussion

Here, we investigated the brain mechanisms that medicate our ability to adapt to changes in the environment's statistics and make predictions. To test how individuals extract structure changes, we manipulated the complexity of temporal sequences during training from simple frequency to context-based statistics. Our results provide evidence for dissociated corticostriatal mechanisms that mediate our ability to extract behaviorally relevant statistics. We found that frontoparietal activity decreases for frequency-based learning, whereas context-based learning is associated with decreased frontocingulate activity and increased striatal activity. Decreased fMRI signals in frontoparietal circuits can be understood in the context of predictive coding as repetition suppression for predictable events (Raichle et al., 1994; den Ouden et al., 2009; Summerfield and Egner, 2009; Alink et al., 2010; Kok et al., 2012). In contrast, increased fMRI signals in putamen, which are implicated in learning probabilistic associations (Rauch et al., 1997; Poldrack and Packard, 2003), suggest representations of predictive structures acquired through training.

Importantly, our approach allows us to track participants' predictions and their decision strategies during training. We demonstrated that learning predictive structures relates to decision strategies; that is, learning complex structures relates to extracting the most probable target per context (i.e., maximizing) than matching the exact sequence statistics. Importantly, these decision strategies engage distinct corticostriatal circuits: performance based on probability matching engages occipitotemporal and basal ganglia (ventral caudate) regions, whereas performance based on maximizing engages dorsolateral prefrontal, cingulate, sensory–motor regions, and basal ganglia (dorsal caudate, putamen). Recent work has focused on the role of the hippocampus in learning temporal sequences (Hsieh et al., 2014; Rosenthal et al., 2016) and predictive associations (Hindy et al., 2016). Our findings suggest an alternate route to learning via maximizing that is implemented by interactions between executive and motor corticostriatal mechanisms rather than visual corticostriatal circuits (including hippocampal cortex) that support learning by matching.

Previous studies have implicated these corticostriatal circuits in reinforcement learning (for reviews, see Robbins, 2007; Balleine and O'Doherty, 2010). We show here that learning predictive statistics may proceed without explicit trial-by-trial feedback and involve interactions between corticostriatal circuits similar to those known to support reward-based learning (Alexander et al., 1986; Lawrence et al., 1998). In particular, we show that dorsal frontoparietal regions are involved in extracting novel regularities, monitoring and adjusting strategy throughout training. In contrast, striatal regions represent context-based statistics learned through bootstrap training (i.e., multiple sessions of exposure to structured sequences) that may optimize the selection of the most probable outcome in a given context. Previous work investigating learning of sequential contingencies in the context of the serial reaction time task suggests that striatal versus hippocampal circuits relate to distinct error-driven learning processes and operate at different learning rates (Bornstein and Daw, 2012). In particular, fast learning was shown to engage striatal regions (i.e., putamen), whereas slow learning engages the hippocampus. Although our paradigm does not dissociate learning time course from structure complexity, it is possible that learning of temporal structures proceeds from corticostriatal to hippocampal circuits.

Further, we considered whether the learning that we observed occurred in an incidental manner or involved explicit knowledge of the underlying sequence structure. Previous studies have suggested that learning of regularities may occur implicitly in a range of tasks: visuomotor sequence learning (Nissen and Bullemer, 1987; Seger, 1994; Schwarb and Schumacher, 2012), artificial grammar learning (Reber, 1967), probabilistic category learning (Knowlton et al., 1994), and contextual cue learning (Chun and Jiang, 1998). This work has focused on implicit measures of sequence learning, such as familiarity judgments or reaction times. In contrast, our paradigm allows us to test directly whether exposure to temporal sequences facilitates the observers' ability to predict the identity of the next stimulus in a sequence explicitly. Although, our experimental design makes it unlikely that the participants memorized specific stimulus positions or the full sequences, debriefing the participants suggests that most extracted some high-probability symbols or context–target combinations. Therefore, it is possible that prolonged exposure to probabilistic structures (i.e., multiple sessions in contrast to single exposure sessions typically used in statistical learning studies) in combination with prediction judgments (Dale et al., 2012) may evoke some explicit knowledge of temporal structures, in contrast to implicit measures of anticipation typically used in statistical learning studies.

Finally, previous work has implicated additional brain regions related to learning modality-specific regularities (Nastase et al., 2014); that is, visual cortex is implicated in learning visual statistical regularities (Aizenstein et al., 2004; Turk-Browne et al., 2010; Meyer and Olson, 2011), whereas inferior frontal and temporal regions in learning temporal regularities related to music and language (Bahlmann et al., 2009; Leaver et al., 2009; Karuza et al., 2013; Koelsch et al., 2013). Our results provide evidence for corticostriatal mechanisms that mediate learning of predictive statistics. We speculate that these mechanisms may mediate domain-general learning of complex structures that can be specialized to support higher cognitive functions such as learning music or language.

Footnotes

  • This work was supported by the Engineering and Physical Sciences Research Council (Grant EP/L000296/1 to P.T.), ZK from the Biotechnology and Biological Sciences Research Council (Grant H012508 to Z.K.), the Leverhulme Trust (Grant RF-2011-378 to Z.K.) and the European Community's Seventh Framework Programme (Grant FP7/2007–2013 under agreement PITN-GA 2011-290011 to Z.K.), and the Wellcome Trust (Grant 095183/Z/10/Z to A.E.W.). We thank Caroline di Bernardi Luft for help with data collection and Matthew Dexter for help with software development.

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Zoe Kourtzi, Department of Psychology, University of Cambridge, Downing Site, Cambridge CB2 3EB, UK. zk240{at}cam.ac.uk

This is an open-access article distributed under the terms of the Creative Commons Attribution License Creative Commons Attribution 4.0 International, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

References

  1. ↵
    1. Acerbi L,
    2. Vijayakumar S,
    3. Wolpert DM
    (2014) On the origins of suboptimality in human probabilistic inference. PLoS Comput Biol 10:e1003661. doi:10.1371/journal.pcbi.1003661 pmid:24945142
    OpenUrlCrossRefPubMed
  2. ↵
    1. Aizenstein HJ,
    2. Stenger VA,
    3. Cochran J,
    4. Clark K,
    5. Johnson M,
    6. Nebes RD,
    7. Carter CS
    (2004) Regional brain activation during concurrent implicit and explicit sequence learning. Cereb Cortex 14:199–208. doi:10.1093/cercor/bhg119 pmid:14704217
    OpenUrlCrossRefPubMed
  3. ↵
    1. Alexander GE,
    2. DeLong MR,
    3. Strick PL
    (1986) Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annual review of neuroscience 9:357–381. doi:10.1146/annurev.ne.09.030186.002041 pmid:3085570
    OpenUrlCrossRefPubMed
  4. ↵
    1. Alink A,
    2. Schwiedrzik CM,
    3. Kohler A,
    4. Singer W,
    5. Muckli L
    (2010) Stimulus predictability reduces responses in primary visual cortex. J Neurosci 30:2960–2966. doi:10.1523/JNEUROSCI.3730-10.2010 pmid:20181593
    OpenUrlAbstract/FREE Full Text
  5. ↵
    1. Antoniou M,
    2. Ettlinger M,
    3. Wong PC
    (2016) Complexity, training paradigm design, and the contribution of memory subsystems to grammar learning. PLoS One 11:e0158812. doi:10.1371/journal.pone.0158812 pmid:27391085
    OpenUrlCrossRefPubMed
  6. ↵
    1. Ashby FG,
    2. Maddox WT
    (2005) Human category learning. Annu Rev Psychol 56:149–178. doi:10.1146/annurev.psych.56.091103.070217 pmid:15709932
    OpenUrlCrossRefPubMed
  7. ↵
    1. Aslin RN,
    2. Newport EL
    (2012) Statistical learning from acquiring specific items to forming general rules. Curr Dir Psychol Sci 21:170–176. doi:10.1177/0963721412436806 pmid:24000273
    OpenUrlCrossRefPubMed
  8. ↵
    1. Bahlmann J,
    2. Schubotz RI,
    3. Mueller JL,
    4. Koester D,
    5. Friederici AD
    (2009) Neural circuits of hierarchical visuo-spatial sequence processing. Brain Res 1298:161–170. doi:10.1016/j.brainres.2009.08.017 pmid:19686704
    OpenUrlCrossRefPubMed
  9. ↵
    1. Balleine BW,
    2. O'Doherty JP
    (2010) Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35:48–69. doi:10.1038/npp.2009.131 pmid:19776734
    OpenUrlCrossRefPubMed
  10. ↵
    1. Bar M
    (2009) The proactive brain: memory for predictions. Philos Trans R Soc Lond B Biol Sci 364:1235–1243. doi:10.1098/rstb.2008.0310 pmid:19528004
    OpenUrlAbstract/FREE Full Text
  11. ↵
    1. Bornstein AM,
    2. Daw ND
    (2012) Dissociating hippocampal and striatal contributions to sequential prediction learning. Eur J Neurosci 35:1011–1023. doi:10.1111/j.1460-9568.2011.07920.x pmid:22487032
    OpenUrlCrossRefPubMed
  12. ↵
    1. Brainard DH
    (1997) The psychophysics toolbox. Spat Vis 10:433–436. doi:10.1163/156856897X00357 pmid:9176952
    OpenUrlCrossRefPubMed
  13. ↵
    1. Chun MM
    (2000) Contextual cueing of visual attention. Trends Cogn Sci 4:170–178. doi:10.1016/S1364-6613(00)01476-5 pmid:10782102
    OpenUrlCrossRefPubMed
  14. ↵
    1. Chun MM,
    2. Jiang Y
    (1998) Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cogn Psychol 36:28–71. doi:10.1006/cogp.1998.0681 pmid:9679076
    OpenUrlCrossRefPubMed
  15. ↵
    1. Cools R,
    2. Clark L,
    3. Owen AM,
    4. Robbins TW
    (2002) Defining the neural mechanisms of probabilistic reversal learning using event-related functional magnetic resonance imaging. J Neurosci 22:4563–4567. pmid:12040063
    OpenUrlAbstract/FREE Full Text
  16. ↵
    1. Cools R,
    2. Clark L,
    3. Robbins TW
    (2004) Differential responses in human striatum and prefrontal cortex to changes in object and rule relevance. J Neurosci 24:1129–1135. doi:10.1523/JNEUROSCI.4312-03.2004 pmid:14762131
    OpenUrlAbstract/FREE Full Text
  17. ↵
    1. Dale R,
    2. Duran ND,
    3. Morehead JR
    (2012) Prediction during statistical learning, and implications for the implicit/explicit divide. Adv Cogn Psychol 8:196–209. doi:10.5709/acp-0115-z pmid:22723817
    OpenUrlCrossRefPubMed
  18. ↵
    1. Daw ND,
    2. Niv Y,
    3. Dayan P
    (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat neuroscience 8:1704–1711. doi:10.1038/nn1560 pmid:16286932
    OpenUrlCrossRefPubMed
  19. ↵
    1. den Ouden HE,
    2. Friston KJ,
    3. Daw ND,
    4. McIntosh AR,
    5. Stephan KE
    (2009) A dual role for prediction error in associative learning. Cereb Cortex 19:1175–1185. doi:10.1093/cercor/bhn161 pmid:18820290
    OpenUrlCrossRefPubMed
  20. ↵
    1. Eckstein MP,
    2. Mack SC,
    3. Liston DB,
    4. Bogush L,
    5. Menzel R,
    6. Krauzlis RJ
    (2013) Rethinking human visual attention: Spatial cueing effects and optimality of decisions by honeybees, monkeys and humans. Vision Res 85:5–19. doi:10.1016/j.visres.2012.12.011 pmid:23298793
    OpenUrlCrossRefPubMed
  21. ↵
    1. Eklund A,
    2. Nichols TE,
    3. Knutsson H
    (2016) Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proc Natl Acad Sci U S A 113:7900–7905. doi:10.1073/pnas.1602413113 pmid:27357684
    OpenUrlAbstract/FREE Full Text
  22. ↵
    1. Fiser J,
    2. Aslin RN
    (2002) Statistical learning of higher-order temporal structure from visual shape sequences. J Exp Psychol Learn Mem Cogn 28:458–467. doi:10.1037/0278-7393.28.3.458 pmid:12018498
    OpenUrlCrossRefPubMed
  23. ↵
    1. Fiser J,
    2. Aslin RN
    (2005) Encoding multielement scenes: statistical learning of visual feature hierarchies. J Exp Psychol Gen 134:521–537. doi:10.1037/0096-3445.134.4.521 pmid:16316289
    OpenUrlCrossRefPubMed
  24. ↵
    1. Forman SD,
    2. Cohen JD,
    3. Fitzgerald M,
    4. Eddy WF,
    5. Mintun MA,
    6. Noll DC
    (1995) Improved assessment of significant activation in functional magnetic resonance imaging (fMRI): use of a cluster-size threshold. Magn Reson Med 33:636–647. doi:10.1002/mrm.1910330508 pmid:7596267
    OpenUrlCrossRefPubMed
  25. ↵
    1. Fulvio JM,
    2. Green CS,
    3. Schrater PR
    (2014) Task-specific response strategy selection on the basis of recent training experience. PLoS Comput Biol 10:e1003425. doi:10.1371/journal.pcbi.1003425 pmid:24391490
    OpenUrlCrossRefPubMed
  26. ↵
    1. Gheysen F,
    2. Van Opstal F,
    3. Roggeman C,
    4. Van Waelvelde H,
    5. Fias W
    (2011) The neural basis of implicit perceptual sequence learning. Front Hum Neurosci 5:137. doi:10.3389/fnhum.2011.00137 pmid:22087090
    OpenUrlCrossRefPubMed
  27. ↵
    1. Goebel R,
    2. Esposito F,
    3. Formisano E
    (2006) Analysis of functional image analysis contest (FIAC) data with brainvoyager QX from single-subject to cortically aligned group general linear model analysis and self-organizing group independent component analysis. Hum Brain Mapp 27:392–401. doi:10.1002/hbm.20249 pmid:16596654
    OpenUrlCrossRefPubMed
  28. ↵
    1. Grassberger P
    (1986) Toward a quantitative theory of self-generated complexity. International Journal of Theoretical Physics 25:907–938. doi:10.1007/BF00668821
    OpenUrlCrossRef
  29. ↵
    1. Heekeren HR,
    2. Marrett S,
    3. Ungerleider LG
    (2008) The neural systems that mediate human perceptual decision making. Nat Rev Neurosci 9:467–479. doi:10.1038/nrn2374 pmid:18464792
    OpenUrlCrossRefPubMed
  30. ↵
    1. Hindy NC,
    2. Ng FY,
    3. Turk-Browne NB
    (2016) Linking pattern completion in the hippocampus to predictive coding in visual cortex. Nat Neurosci 19:665–667. doi:10.1038/nn.4284 pmid:27065363
    OpenUrlCrossRefPubMed
  31. ↵
    1. Hsieh LT,
    2. Gruber MJ,
    3. Jenkins LJ,
    4. Ranganath C
    (2014) Hippocampal activity patterns carry information about objects in temporal context. Neuron 81:1165–1178. doi:10.1016/j.neuron.2014.01.015 pmid:24607234
    OpenUrlCrossRefPubMed
  32. ↵
    1. Kahnt T,
    2. Grueschow M,
    3. Speck O,
    4. Haynes JD
    (2011) Perceptual learning and decision-making in human medial frontal cortex. Neuron 70:549–559. doi:10.1016/j.neuron.2011.02.054 pmid:21555079
    OpenUrlCrossRefPubMed
  33. ↵
    1. Karni A,
    2. Sagi D
    (1993) The time course of learning a visual skill. Nature 365:250–252. doi:10.1038/365250a0 pmid:8371779
    OpenUrlCrossRefPubMed
  34. ↵
    1. Karuza EA,
    2. Newport EL,
    3. Aslin RN,
    4. Starling SJ,
    5. Tivarus ME,
    6. Bavelier D
    (2013) The neural correlates of statistical learning in a word segmentation task: an fMRI study. Brain Lang 127:46–54. doi:10.1016/j.bandl.2012.11.007 pmid:23312790
    OpenUrlCrossRefPubMed
  35. ↵
    1. Knowlton BJ,
    2. Squire LR,
    3. Gluck MA
    (1994) Probabilistic classification learning in amnesia. Learn Mem 1:106–120. pmid:10467589
    OpenUrlAbstract/FREE Full Text
  36. ↵
    1. Koelsch S,
    2. Rohrmeier M,
    3. Torrecuso R,
    4. Jentschke S
    (2013) Processing of hierarchical syntactic structure in music. Proc Natl Acad Sci U S A 110:15443–15448. doi:10.1073/pnas.1300272110 pmid:24003165
    OpenUrlAbstract/FREE Full Text
  37. ↵
    1. Kok P,
    2. Jehee JF,
    3. de Lange FP
    (2012) Less is more: expectation sharpens representations in the primary visual cortex. Neuron 75:265–270. doi:10.1016/j.neuron.2012.04.034 pmid:22841311
    OpenUrlCrossRefPubMed
  38. ↵
    1. Lawrence AD,
    2. Sahakian BJ,
    3. Robbins TW
    (1998) Cognitive functions and corticostriatal circuits: insights from Huntington's disease. Trends Cogn Sci 2:379–388. doi:10.1016/S1364-6613(98)01231-5 pmid:21227253
    OpenUrlCrossRefPubMed
  39. ↵
    1. Leaver AM,
    2. Van Lare J,
    3. Zielinski B,
    4. Halpern AR,
    5. Rauschecker JP
    (2009) Brain activation during anticipation of sound sequences. J Neurosci 29:2477–2485. doi:10.1523/JNEUROSCI.4921-08.2009 pmid:19244522
    OpenUrlAbstract/FREE Full Text
  40. ↵
    1. Li W
    (1991) On the relationship between complexity and entropy for Markov chains and regular languages. Complex Systems 5:381–399.
    OpenUrl
  41. ↵
    1. Meyer T,
    2. Olson CR
    (2011) Statistical learning of visual transitions in monkey inferotemporal cortex. Proc Natl Acad Sci U S A 108:19401–19406. doi:10.1073/pnas.1112895108 pmid:22084090
    OpenUrlAbstract/FREE Full Text
  42. ↵
    1. Monchi O,
    2. Petrides M,
    3. Petre V,
    4. Worsley K,
    5. Dagher A
    (2001) Wisconsin Card Sorting revisited: distinct neural circuits participating in different stages of the task identified by event-related functional magnetic resonance imaging. J Neurosci 21:7733–7741. pmid:11567063
    OpenUrlAbstract/FREE Full Text
  43. ↵
    1. Murray RF,
    2. Patel K,
    3. Yee A
    (2015) Posterior probability matching and human perceptual decision making. PLoS Comput Biol 11:e1004342. doi:10.1371/journal.pcbi.1004342 pmid:26079134
    OpenUrlCrossRefPubMed
  44. ↵
    1. Nastase S,
    2. Iacovella V,
    3. Hasson U
    (2014) Uncertainty in visual and auditory series is coded by modality-general and modality-specific neural systems. Hum Brain Mapp 35:1111–1128. doi:10.1002/hbm.22238 pmid:23408389
    OpenUrlCrossRefPubMed
  45. ↵
    1. Nissen MJ,
    2. Bullemer P
    (1987) Attentional requirements of learning: evidence from performance measures. Cognitive Psychology 19:1–32. doi:10.1016/0010-0285(87)90002-8
    OpenUrlCrossRef
  46. ↵
    1. Nomura EM,
    2. Maddox WT,
    3. Filoteo JV,
    4. Ing AD,
    5. Gitelman DR,
    6. Parrish TB,
    7. Mesulam MM,
    8. Reber PJ
    (2007) Neural correlates of rule-based and information-integration visual category learning. Cereb Cortex 17:37–43. pmid:16436685
    OpenUrlCrossRefPubMed
  47. ↵
    1. Pasupathy A,
    2. Miller EK
    (2005) Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433:873–876. doi:10.1038/nature03287 pmid:15729344
    OpenUrlCrossRefPubMed
  48. ↵
    1. Pelli DG
    (1997) The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spat Vis 10:437–442. doi:10.1163/156856897X00366 pmid:9176953
    OpenUrlCrossRefPubMed
  49. ↵
    1. Perruchet P,
    2. Pacton S
    (2006) Implicit learning and statistical learning: one phenomenon, two approaches. Trends Cogn Sci 10:233–238. doi:10.1016/j.tics.2006.03.006 pmid:16616590
    OpenUrlCrossRefPubMed
  50. ↵
    1. Poldrack RA,
    2. Packard MG
    (2003) Competition among multiple memory systems: converging evidence from animal and human brain studies. Neuropsychologia 41:245–251. doi:10.1016/S0028-3932(02)00157-4 pmid:12457750
    OpenUrlCrossRefPubMed
  51. ↵
    1. Raichle ME,
    2. Fiez JA,
    3. Videen TO,
    4. MacLeod AM,
    5. Pardo JV,
    6. Fox PT,
    7. Petersen SE
    (1994) Practice-related changes in human brain functional anatomy during nonmotor learning. Cereb Cortex 4:8–26. doi:10.1093/cercor/4.1.8 pmid:8180494
    OpenUrlCrossRefPubMed
  52. ↵
    1. Rauch SL,
    2. Whalen PJ,
    3. Savage CR,
    4. Curran T,
    5. Kendrick A,
    6. Brown HD,
    7. Bush G,
    8. Breiter HC,
    9. Rosen BR
    (1997) Striatal recruitment during an implicit sequence learning task as measured by functional magnetic resonance imaging. Hum Brain Mapp 5:124–132. pmid:10096417
    OpenUrlCrossRefPubMed
  53. ↵
    1. Reber AS
    (1967) Implicit learning of artificial grammars. Journal of Verbal Learning and Verbal Behavior 6:855–863. doi:10.1016/S0022-5371(67)80149-X
    OpenUrlCrossRef
  54. ↵
    1. Rieskamp J,
    2. Otto PE
    (2006) SSL: a theory of how people learn to select strategies. J Exp Psychol Gen 135:207–236. doi:10.1037/0096-3445.135.2.207 pmid:16719651
    OpenUrlCrossRefPubMed
  55. ↵
    1. Robbins TW
    (2007) Shifting and stopping: fronto-striatal substrates, neurochemical modulation and clinical implications. Philos Trans R Soc Lond B Biol Sci 362:917–932. doi:10.1098/rstb.2007.2097 pmid:17412678
    OpenUrlAbstract/FREE Full Text
  56. ↵
    1. Rose M,
    2. Haider H,
    3. Salari N,
    4. Büchel C
    (2011) Functional dissociation of hippocampal mechanism during implicit learning based on the domain of associations. J Neurosci 31:13739–13745. doi:10.1523/JNEUROSCI.3020-11.2011 pmid:21957237
    OpenUrlAbstract/FREE Full Text
  57. ↵
    1. Rosenthal CR,
    2. Andrews SK,
    3. Antoniades CA,
    4. Kennard C,
    5. Soto D
    (2016) Learning and recognition of a non-conscious sequence of events in human primary visual cortex. Curr Biol 26:834–841. doi:10.1016/j.cub.2016.01.040 pmid:26948883
    OpenUrlCrossRefPubMed
  58. ↵
    1. Rushworth MF,
    2. Behrens TE
    (2008) Choice, uncertainty and value in prefrontal and cingulate cortex. Nat Neurosci 11:389–397. doi:10.1038/nn2066 pmid:18368045
    OpenUrlCrossRefPubMed
  59. ↵
    1. Saffran JR,
    2. Johnson EK,
    3. Aslin RN,
    4. Newport EL
    (1999) Statistical learning of tone sequences by human infants and adults. Cognition 70:27–52. doi:10.1016/S0010-0277(98)00075-4 pmid:10193055
    OpenUrlCrossRefPubMed
  60. ↵
    1. Schapiro AC,
    2. Kustner LV,
    3. Turk-Browne NB
    (2012) Shaping of object representations in the human medial temporal lobe based on temporal regularities. Curr Biol 22:1622–1627. doi:10.1016/j.cub.2012.06.056 pmid:22885059
    OpenUrlCrossRefPubMed
  61. ↵
    1. Schendan HE,
    2. Searl MM,
    3. Melrose RJ,
    4. Stern CE
    (2003) An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron 37:1013–1025. doi:10.1016/S0896-6273(03)00123-5 pmid:12670429
    OpenUrlCrossRefPubMed
  62. ↵
    1. Schwarb H,
    2. Schumacher EH
    (2012) Generalized lessons about sequence learning from the study of the serial reaction time task. Adv Cogn Psychol 8:165–178. doi:10.5709/acp-0113-1 pmid:22723815
    OpenUrlCrossRefPubMed
  63. ↵
    1. Seger CA
    (1994) Implicit learning. Psychol Bull 115:163–196. doi:10.1037/0033-2909.115.2.163 pmid:8165269
    OpenUrlCrossRefPubMed
  64. ↵
    1. Seger CA
    (2013) The visual corticostriatal loop through the tail of the caudate: circuitry and function. Front Syst Neurosci 7.
  65. ↵
    1. Seger CA,
    2. Cincotta CM
    (2006) Dynamics of frontal, striatal, and hippocampal systems during rule learning. Cereb Cortex 16:1546–1555. pmid:16373455
    OpenUrlCrossRefPubMed
  66. ↵
    1. Shanks DR,
    2. Tunney RJ,
    3. McCarthy JD
    (2002) A re-examination of probability matching and rational choice. Journal of Behavioral Decision Making 15:233–250. doi:10.1002/bdm.413
    OpenUrlCrossRef
  67. ↵
    1. Shaw R
    (1984) The dripping faucet as a model chaotic system. Santa Cruz, CA: Aerial.
  68. ↵
    1. Summerfield C,
    2. Egner T
    (2009) Expectation (and attention) in visual cognition. Trends Cogn Sci 13:403–409. doi:10.1016/j.tics.2009.06.003 pmid:19716752
    OpenUrlCrossRefPubMed
  69. ↵
    1. Turk-Browne NB,
    2. Jungé J,
    3. Scholl BJ
    (2005) The automaticity of visual statistical learning. J Exp Psychol Gen 134:552–564. doi:10.1037/0096-3445.134.4.552 pmid:16316291
    OpenUrlCrossRefPubMed
  70. ↵
    1. Turk-Browne NB,
    2. Scholl BJ,
    3. Chun MM,
    4. Johnson MK
    (2009) Neural evidence of statistical learning: efficient detection of visual regularities without awareness. J Cogn Neurosci 21:1934–1945. doi:10.1162/jocn.2009.21131 pmid:18823241
    OpenUrlCrossRefPubMed
  71. ↵
    1. Turk-Browne NB,
    2. Scholl BJ,
    3. Johnson MK,
    4. Chun MM
    (2010) Implicit perceptual anticipation triggered by statistical learning. J Neurosci 30:11177–11187. doi:10.1523/JNEUROSCI.0858-10.2010 pmid:20720125
    OpenUrlAbstract/FREE Full Text
  72. ↵
    1. van den Bos E,
    2. Poletiek FH
    (2008) Effects of grammar complexity on artificial grammar learning. Mem Cognit 36:1122–1131. doi:10.3758/MC.36.6.1122 pmid:18927030
    OpenUrlCrossRefPubMed
  73. ↵
    1. Woo CW,
    2. Krishnan A,
    3. Wager TD
    (2014) Cluster-extent based thresholding in fMRI analyses: pitfalls and recommendations. Neuroimage 91:412–419. doi:10.1016/j.neuroimage.2013.12.058 pmid:24412399
    OpenUrlCrossRefPubMed
Back to top

In this issue

The Journal of Neuroscience: 37 (35)
Journal of Neuroscience
Vol. 37, Issue 35
30 Aug 2017
  • Table of Contents
  • Table of Contents (PDF)
  • About the Cover
  • Index by author
  • Advertising (PDF)
  • Ed Board (PDF)
Email

Thank you for sharing this Journal of Neuroscience article.

NOTE: We request your email address only to inform the recipient that it was you who recommended this article, and that it is not junk mail. We do not retain these email addresses.

Enter multiple addresses on separate lines or separate them with commas.
Learning Predictive Statistics: Strategies and Brain Mechanisms
(Your Name) has forwarded a page to you from Journal of Neuroscience
(Your Name) thought you would be interested in this article in Journal of Neuroscience.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Print
View Full Page PDF
Citation Tools
Learning Predictive Statistics: Strategies and Brain Mechanisms
Rui Wang, Yuan Shen, Peter Tino, Andrew E. Welchman, Zoe Kourtzi
Journal of Neuroscience 30 August 2017, 37 (35) 8412-8427; DOI: 10.1523/JNEUROSCI.0144-17.2017

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
Respond to this article
Request Permissions
Share
Learning Predictive Statistics: Strategies and Brain Mechanisms
Rui Wang, Yuan Shen, Peter Tino, Andrew E. Welchman, Zoe Kourtzi
Journal of Neuroscience 30 August 2017, 37 (35) 8412-8427; DOI: 10.1523/JNEUROSCI.0144-17.2017
Twitter logo Facebook logo Mendeley logo
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Jump to section

  • Article
    • Abstract
    • Introduction
    • Materials and Methods
    • Results
    • Discussion
    • Footnotes
    • References
  • Figures & Data
  • Info & Metrics
  • eLetters
  • PDF

Keywords

  • fMRI
  • learning
  • prediction
  • vision

Responses to this article

Respond to this article

Jump to comment:

No eLetters have been published for this article.

Related Articles

Cited By...

More in this TOC Section

Research Articles

  • Change of spiny neuron structure in the basal ganglia song circuit and its regulation by miR-9 during song development
  • Increased neuronal expression of the early endosomal adaptor APPL1 replicates Alzheimer’s Disease-related endosomal and synaptic dysfunction with cholinergic neurodegeneration.
  • Presynaptic mu opioid receptors suppress the functional connectivity of ventral tegmental area dopaminergic neurons with aversion-related brain regions
Show more Research Articles

Behavioral/Cognitive

  • Change of spiny neuron structure in the basal ganglia song circuit and its regulation by miR-9 during song development
  • Increased neuronal expression of the early endosomal adaptor APPL1 replicates Alzheimer’s Disease-related endosomal and synaptic dysfunction with cholinergic neurodegeneration.
  • Presynaptic mu opioid receptors suppress the functional connectivity of ventral tegmental area dopaminergic neurons with aversion-related brain regions
Show more Behavioral/Cognitive
  • Home
  • Alerts
  • Follow SFN on BlueSky
  • Visit Society for Neuroscience on Facebook
  • Follow Society for Neuroscience on Twitter
  • Follow Society for Neuroscience on LinkedIn
  • Visit Society for Neuroscience on Youtube
  • Follow our RSS feeds

Content

  • Early Release
  • Current Issue
  • Issue Archive
  • Collections

Information

  • For Authors
  • For Advertisers
  • For the Media
  • For Subscribers

About

  • About the Journal
  • Editorial Board
  • Privacy Notice
  • Contact
  • Accessibility
(JNeurosci logo)
(SfN logo)

Copyright © 2025 by the Society for Neuroscience.
JNeurosci Online ISSN: 1529-2401

The ideas and opinions expressed in JNeurosci do not necessarily reflect those of SfN or the JNeurosci Editorial Board. Publication of an advertisement or other product mention in JNeurosci should not be construed as an endorsement of the manufacturer’s claims. SfN does not assume any responsibility for any injury and/or damage to persons or property arising from or related to any use of any material contained in JNeurosci.