Research Articles, Behavioral/Cognitive

How the Level of Reward Awareness Changes the Computational and Electrophysiological Signatures of Reinforcement Learning

Camile M.C. Correa, Samuel Noorman, Jun Jiang, Stefano Palminteri, Michael X. Cohen, Maël Lebreton and Simon van Gaal
Journal of Neuroscience 28 November 2018, 38 (48) 10338-10348; DOI: https://doi.org/10.1523/JNEUROSCI.0457-18.2018
Camile M.C. Correa
1Department of Psychology, University of Amsterdam, 1018 WT, Amsterdam, The Netherlands,
Samuel Noorman
1Department of Psychology, University of Amsterdam, 1018 WT, Amsterdam, The Netherlands,
Jun Jiang
3Department of Basic Psychology, School of Psychology, Third Military Medical University, Chongqing, People's Republic of China,
Stefano Palminteri
4Département d'Études Cognitives, École Normale Supérieure, 75005 Paris, France,
5Laboratoire de Neurosciences Cognitives, Institut National de la Santé et de la Recherche Médicale, 75005 Paris, France,
6Université de Recherche Paris Sciences et Lettres, 75006, Paris, France,
Michael X. Cohen
7Radboud University Medical Center, 6525 GA, Nijmegen, The Netherlands,
Maël Lebreton
2Amsterdam Brain and Cognition (ABC), University of Amsterdam, 1001 NK, Amsterdam, The Netherlands,
8Center for Research in Experimental Economics and Political Decision Making, Amsterdam School of Economics, University of Amsterdam, 1001 NJ Amsterdam, The Netherlands, and
Simon van Gaal
1Department of Psychology, University of Amsterdam, 1018 WT, Amsterdam, The Netherlands,
2Amsterdam Brain and Cognition (ABC), University of Amsterdam, 1001 NK, Amsterdam, The Netherlands,
9Donders Institute for Brain, Cognition and Behavior, Radboud University Nijmegen, 6500 HE, Amsterdam, The Netherlands
Figures

  • Figure 1.

    Experimental setup and behavior. a, Two response options (white boxes on the left/right of fixation) were shown on the screen until a response was given. A correct response was rewarded with a 70% probability (50 cent coin) and not rewarded with a 30% probability (1 cent coin). Reward visibility was manipulated by masking. Unmasked (long coin presentation, short backward mask presentation) and masked (short coin presentation, long backward mask presentation) reward trials were mixed within blocks and randomly chosen across trials (each with a 50% probability). Which response option was most rewarded changed every 75–125 trials. b, The percentage of switches, at the group level (in black) and for individual subjects (in gray) after specific trials. M: masked; UM: unmasked; +: reward; −: no-reward; error bars represent ± s.e.m.
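    The trial structure described in this caption can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' task code: the function names `simulate_schedule` and `draw_outcome` are hypothetical, block lengths are assumed to be drawn uniformly from 75 to 125 trials, and the reward probability after an incorrect response is not given in the caption, so it is left as a required argument (`p_incorrect`) rather than guessed.

```python
import random


def simulate_schedule(n_trials, seed=0):
    """Build a trial list: which side is currently best, and whether the reward is masked."""
    rng = random.Random(seed)
    best_side = rng.choice(["left", "right"])  # currently most-rewarded option
    next_reversal = rng.randint(75, 125)       # assumption: uniform block length, inclusive
    trials = []
    for t in range(n_trials):
        if t == next_reversal:
            # Reversal: the other response option becomes the most rewarded one.
            best_side = "right" if best_side == "left" else "left"
            next_reversal = t + rng.randint(75, 125)
        masked = rng.random() < 0.5            # masked and unmasked trials mixed, 50% each
        trials.append({"trial": t, "best_side": best_side, "masked": masked})
    return trials


def draw_outcome(choice, trial, rng, p_incorrect, p_correct=0.7):
    """Return True for a reward (50 cent coin) or False for no reward (1 cent coin).

    p_incorrect is NOT stated in the caption and must be supplied by the caller.
    """
    p = p_correct if choice == trial["best_side"] else p_incorrect
    return rng.random() < p
```

    A schedule for one session could then be generated with `simulate_schedule(1000)` and outcomes drawn trial by trial with `draw_outcome`.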

  • Figure 2.

    Modeling approach. a, The computational architecture used to build the model space. b, Model space. Eighteen models were built by systematically combining the different options available for the different computational modules. c, Model identifiability analysis. Data from 32 synthetic participants were simulated with each of our 18 models. Bayesian model selection was used to identify the most probable model generating the data, using model exceedance probability. This procedure was repeated 50 times. Overall, all 18 models were correctly identified more than 90% of the time (>45 out of 50 simulations, see top confusion matrix), with an average exceedance probability > 90% (bottom confusion matrix). d, Parameter recovery analysis – general. Overall, data from 1600 synthetic participants (50 simulations × 32 individuals) were simulated with the full model (model 18). The 6 estimated parameters per participant were then regressed against the true parameters used for simulating the data. Results show very good identifiability, with regression intercepts (β0s) close to 0 and regression slopes (β1s) close to 1 and highly significant (all p-values lower than MATLAB's numerical precision, i.e., reported as 0). Each dot represents a synthetic individual. The black dotted lines represent the identity line, the red continuous lines the best linear fits, and the shaded grey areas the 95% confidence interval around the best linear fit. The grey densities represent the probability distributions used to sample the parameters. e, Parameter recovery analysis – individual simulations. The confusion matrices represent summary statistics of the correlations between parameters, estimated over 32-subject simulations and averaged over the 50 simulations. Diagonal: correlations between simulated and estimated parameters. Off-diagonal: cross-correlations between estimated parameters. Top: Pearson correlation (R). Bottom: explained variance (R²).
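    The regression check in panel d (estimated parameters regressed against the true, simulated ones) can be illustrated with a short sketch. It is not the authors' pipeline: the helper names `ols_fit` and `recovery_check` are hypothetical, the sampling range and noise level are arbitrary, and, importantly, the model-fitting step is replaced by a stand-in that simply adds Gaussian noise to the true value. Only the final regression step mirrors the caption.

```python
import random


def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def recovery_check(n_participants=1600, noise_sd=0.05, seed=0):
    """Regress 'estimated' on true parameters for synthetic participants."""
    rng = random.Random(seed)
    # Assumption: true parameters sampled uniformly from [0, 1].
    true_params = [rng.uniform(0.0, 1.0) for _ in range(n_participants)]
    # Stand-in for simulating data and refitting the model (NOT a real fit).
    estimated = [p + rng.gauss(0.0, noise_sd) for p in true_params]
    return ols_fit(true_params, estimated)


b0, b1 = recovery_check()
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

    Under this stand-in, good recovery shows up as an intercept near 0 and a slope near 1, which is the pattern the caption reports for the real fits.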

  • Figure 3.

    a, Time course of the learning task for three representative participants (participant numbers 10, 20 and 30). The x-axis represents blocks of trials during the experiment and the y-axis represents the local fraction of left-hand responses selected by the participant. Thick black and gray lines represent the reward probability in the different blocks (75–125 trials). Gray dotted lines represent the local fraction of left-hand responses. The thick green line represents the local probability of left-hand responses predicted by the computational model. Both behavioral choices and model predictions are averaged over 12-trial bins and aligned on block transitions. b, Model parameters for masked and unmasked conditions. Left: value weight. Middle: learning rate. Right: perseveration weight. M: masked reward, UM: unmasked reward. Histograms and error bars represent mean ± s.e.m. Connected dots represent individual parameters. c, Model comparison. Results of a Bayesian model comparison analysis on our participants' data. White histograms indicate the exceedance probability of each model, and grey dots their expected frequencies. d, Relative BIC. Bayesian Information Criterion (BIC) of each model, compared to the BIC of the best-fitting model (model 18). BICs are computed at the individual level (random effects). Histograms and error bars represent mean ± s.e.m.

  • Figure 4.

    ERP results. ERPs for no-reward (red lines) and reward (green lines) for unmasked (a) and masked (b) conditions. Time = 0 ms is reward presentation. The lower dotted black lines indicate significant time-windows, FDR corrected across the entire ERP time-window (p < 0.05). Topographical distribution maps of the reward valence effect (no-reward minus reward, − vs +) were taken from the three broad time-windows (100–300 ms, 300–500 ms and 500–800 ms; scaling maps unmasked reward from left to right: [−2:2], [−5:5], [−2:2]; scaling maps masked reward: [−2:2]). Error bars represent ± s.e.m.

  • Figure 5.

    Model-based EEG analysis. a, The time courses of regression weights of the signed PE regressed on the reward-locked EEG signal derived from a central ROI. Effects are plotted separately for unmasked (green) and masked (black) reward outcomes. Shaded areas indicate the s.e.m. Topographical maps show the regression weights during the relevant time windows. Both unmasked and masked reward showed early and mid-latency EEG-PE covariations which are shown in b. Note that the polarities of these components are reversed compared to the ERP results, which is in accordance with our expectations, because these ERP modulations are all associated with negative PE values, leading to a reversal of the polarities (maps: 100–300 ms and 300–500 ms; scaling: early masked = [−0.5:0.5], mid-latency masked = [−0.5:0.5], early unmasked = [−1:1], mid-latency unmasked = [−3:3]). Bar plots of the signed PE effect for the three time-windows of interest. b, The time courses of regression weights of the unsigned PE, or the level of surprise, regressed on the reward-locked EEG signal derived from a central ROI. Both unmasked and masked rewards showed late EEG-surprise covariations (maps: 300–800 ms; scaling: masked = [−0.5:0.5], unmasked = [−2:2]). Bar plots of the surprise effect. c, The time courses of regression weights of switch/stay behavior regressed on the reward-locked EEG signal derived from a central ROI. Both unmasked and masked reward showed late EEG-switch/stay behavior covariations (maps: 300–800 ms; scaling: masked = [−3:3], unmasked = [−3:3]). Bar plots of the switch/stay behavior effect. Error bars represent ± s.e.m. M: masked reward, UM: unmasked reward.
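    To make the two regressors in panels a and b concrete, the sketch below derives a signed prediction error (PE) and an unsigned PE (surprise) for each trial from a plain delta-rule learner. This is an assumption for illustration only: the captions do not specify the update equations of the winning model (which also includes value and perseveration weights and condition-specific parameters), and the function name `prediction_errors`, the 0/1 reward coding, and the initial value of 0.5 are hypothetical.

```python
def prediction_errors(choices, rewards, learning_rate, initial_value=0.5):
    """Return per-trial signed and unsigned prediction errors.

    choices: sequence of option labels (e.g. "left" / "right")
    rewards: sequence of outcomes coded 1 (reward) or 0 (no reward)
    """
    values = {}                # current value estimate for each option
    signed, unsigned = [], []
    for choice, reward in zip(choices, rewards):
        q = values.get(choice, initial_value)
        pe = reward - q        # signed PE: outcome minus expectation
        signed.append(pe)
        unsigned.append(abs(pe))   # unsigned PE: magnitude of surprise
        values[choice] = q + learning_rate * pe   # delta-rule update
    return signed, unsigned


signed_pe, surprise = prediction_errors(
    ["left", "left", "right"], [1, 0, 1], learning_rate=0.5
)
```

    In a model-based EEG analysis of the kind the caption describes, such trial-wise series would serve as predictors of the reward-locked EEG amplitude at each time point.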

Keywords

  • consciousness
  • decision-making
  • prediction error
  • reinforcement learning

Copyright © 2023 by the Society for Neuroscience.
JNeurosci Online ISSN: 1529-2401
