Journal of Neuroscience
Articles, Systems/Circuits

Goal-Directed Decision Making with Spiking Neurons

Johannes Friedrich and Máté Lengyel
Journal of Neuroscience 3 February 2016, 36 (5) 1529-1546; https://doi.org/10.1523/JNEUROSCI.2854-15.2016
Johannes Friedrich (1)
Máté Lengyel (1,2)

(1) Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
(2) Department of Cognitive Science, Central European University, Budapest 1051, Hungary

Figures & Data

Figures

Figure 1.

Reinforcement learning benchmark tasks. A, Maze task (see Materials and Methods for details). B, Pendulum swing-up task (see Materials and Methods for details). C, Convergence of the dynamics toward an optimal policy representation with weights set according to the true environment. Values were computed based on spike counts up to the time indicated on the horizontal axis. Performance shows discounted average (±SEM) cumulative reward obtained by the policy based on these values, normalized such that random action selection corresponds to 0 and the optimal policy corresponds to 1. D, Learning the environmental model through synaptic plasticity. In each trial, first several randomly chosen state–action pairs were experienced and weights in the network were updated accordingly, then the dynamics of the network evolved for 1 s and its performance was measured as in C. E, Distributed representation of the continuous state space in the Pendulum task. Ellipses show 3 SD covariances of the Gaussian basis functions of individual neurons (for better visualization, only every second basis is shown along each axis). F, Activity of four representative neurons during planning. Color identifies the neurons' state-space basis functions as in E, and line style shows two different initial conditions (see inset for magnification). G, Values of the preferred states of the neurons shown in F as represented by the network over the course of its dynamics. Although both initial state values (inset) and steady-state values coincide in the two examples shown (solid vs dashed lines), the interim dynamics differ because of different neural initial conditions (F, inset). H, Policy (colored areas) and state space trajectory (gray scale circles, temporally ordered from white to black) for pendulum swing-up with preset weights. I, Values actually realized by the network. J, True optimal values for the Pendulum task.
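The performance normalization described in panel C can be sketched as follows. This is an illustrative reconstruction, not the authors' code, and the numeric returns below are made-up placeholders.

```python
def normalized_performance(ret_policy, ret_random, ret_optimal):
    """Rescale a policy's discounted cumulative reward so that
    random action selection maps to 0 and the optimal policy to 1."""
    return (ret_policy - ret_random) / (ret_optimal - ret_random)

# Hypothetical returns: a policy halfway between random and optimal scores 0.5.
print(normalized_performance(5.0, 2.0, 8.0))  # -> 0.5
```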

Figure 2.

Two-step example task. A, The rat moving through the maze can choose the left (L) or right (R) arm at four decision points (states 0, 1, 2, and 3). Turning right in the first step (state 0) leads to a place where one of two doors opens randomly, indicated by the coin flip. The sizes of the cheeses indicate reward magnitudes (see also B). B, The decision graph corresponding to the task in A is a tree. Numerical values indicate rewards (r) and transition probabilities (p) for nondeterministic actions. C, The corresponding neural network: action nodes in B are identified with neurons (colors). Lines indicate synaptic connections, with thickness and size scaled according to their strength. A constant external input (black) signals immediate reward. Synaptic efficacies are proportional to the transition probabilities or the (expected) reward. D, Voltage traces for two neurons in C. E, Spike trains of all neurons. The color code is the same as in C. F, Activity for rate neurons with random initial values. The color code is the same as in C. The line style indicates neurons coding for optimal (solid) and suboptimal (dashed) actions. G, The approximate values Ṽ, represented by the sum of the rates in F, converge to the optimal values (black dashed lines). Values of states 0–3 are shown from bottom to top. The color code is the same as in B.
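The convergence to optimal values shown in panel G corresponds to solving the Bellman equation V(s) = max_a [r(s,a) + Σ_s' p(s'|s,a) V(s')] on the decision tree. A minimal sketch on a hypothetical tree with the same shape as B, including one stochastic "coin-flip" transition; the rewards and probabilities below are illustrative, not read from the figure.

```python
# transitions[state][action] = (immediate reward, [(prob, next state), ...]);
# "T" is an absorbing terminal state with value 0.
transitions = {
    0: {"L": (0.0, [(1.0, 1)]),
        "R": (0.0, [(0.5, 2), (0.5, 3)])},  # coin flip after turning right
    1: {"L": (1.0, [(1.0, "T")]), "R": (2.0, [(1.0, "T")])},
    2: {"L": (0.0, [(1.0, "T")]), "R": (4.0, [(1.0, "T")])},
    3: {"L": (3.0, [(1.0, "T")]), "R": (0.0, [(1.0, "T")])},
}

V = {s: 0.0 for s in transitions}
V["T"] = 0.0
for _ in range(10):  # a few synchronous sweeps suffice on a tree this shallow
    for s, actions in transitions.items():
        V[s] = max(r + sum(p * V[ns] for p, ns in succ)
                   for r, succ in actions.values())

print(V[0])  # -> 3.5: turning right (0.5*4 + 0.5*3) beats turning left (2)
```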

Figure 3.

    Time course of neural activity in a binary choice task. A, The task (top) consisting of a single state (s0) and two actions (A and B) associated with different values (which, in this case, were also their immediate rewards, rA and rB) and the corresponding neural network (bottom). B, Average population activity for offer value cells (dashed line) from the study by Padoa-Schioppa (2013) and simulation results (solid line). Trials were divided into three groups depending on the offer value (colors). C, Average population activity (dashed line) from the study by Roesch and Olson (2003) and model results (solid line). Trials were divided depending on whether the cell encoded the optimal action (blue) or not (purple) and on whether the reward was large (thick) or small (thin). The activity of the reward input used in the simulations is shown as a black curve in B and C with the corresponding y-axis plotted on the right side.

Figure 4.

    Value dependence of neural firing rates in a binary choice task in experiments (open green circles; adapted from Padoa-Schioppa and Assad, 2006) and simulations (filled blue circles). A, Neuron encoding offer value of option B. One unit of juice A was worth 2.2 units of juice B. B, Neuron encoding chosen value, 1A = 2.5B. Error bars show SEM and were often smaller than the symbols.

Figure 5.

    Psychometric and chronometric curves in a binary decision-making task. A, B, Choice probabilities in experiments (open green squares; Padoa-Schioppa and Assad, 2006) and simulations (filled blue squares) for two different relative values of the two juices: 1A = 2.2B (A) and 1A = 2.5B (B). C, Difference between the cumulative spike counts of populations representing the two potential choices in the model. Accumulation starts with sensory delay (dashed line; compare input onset in Fig. 3B). When a threshold (red line) is reached, a decision is made. Colors indicate different value ratios as in D. D, Decision time distributions in the model. Right, Dependence of raw decision times on the value ratio (colored Tukey's boxplots) and their overall distribution across all value ratios (gray histogram). Left, Normalizing function (solid blue line), together with a logarithmic fit (dashed black line), which transforms the raw decision time distribution into a standard normal distribution (gray histogram). E, Normalized reaction times (±SEM) as a function of value ratio in experiments (open green squares; Padoa-Schioppa and Assad, 2006) and simulations (filled blue squares). Lines show least squares fits (dotted green, experiments; solid blue, simulations); the inset shows distribution of residuals after fitting (green bars, experiments; blue bars, simulations).
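The decision rule in panel C, accumulating the spike-count difference between the two choice populations until it reaches a threshold, can be sketched as a Poisson race. The rates, threshold, and time step below are illustrative placeholders, not the fitted values from the simulations.

```python
import random

def decide(rate_a, rate_b, threshold=10, dt=0.001, max_t=5.0):
    """Poisson race: return (choice, decision time) once the difference in
    cumulative spike counts of the two populations reaches the threshold."""
    diff, t = 0, 0.0
    while abs(diff) < threshold and t < max_t:
        diff += random.random() < rate_a * dt  # spike from population A
        diff -= random.random() < rate_b * dt  # spike from population B
        t += dt
    return ("A" if diff > 0 else "B"), t

# A larger firing-rate (value) difference yields faster decisions on average.
choice, t = decide(rate_a=60.0, rate_b=20.0)
```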

Figure 6.

    Sequential decision making. A, An example neuron in pre-SMA showing activity modulated by the NRMs (colored lines; Sohn and Lee, 2007): amplitude decreases and delay increases with NRM. The inset shows task structure: colored circles indicate states (numbers show NRMs), arrows show state transitions (colored lines, correct action; black lines, incorrect action), and the gray square represents terminal state with reward (modeled as r = 1). B, Activity time courses of an example model neuron as a function of NRMs. The color code is the same as in A. The black line shows activity of the reward input chosen to fit experimental data. C, Activity time courses of an example model neuron as a function of the number of available actions (1 correct, others incorrect) in the state with NRM = 3. D, Experimental (open green squares; Sohn and Lee, 2007) and simulated (filled blue squares) reaction times increased approximately linearly with NRMs. Error bars (SEM) are all smaller than the symbols. E, F, Predictions of planning-as-inference for neural time courses as a function of NRMs (E) and number of available actions (F). The color code is the same as in B and C.

Figure 7.

    Predictions for a novel sequential decision-making task. A, Task structure with rewards in two distinct steps; symbols are as in Figure 6A (inset). B, Simulation results for the suggested task with added intermediate reward. The color code and activity of the reward input are as in Figure 6B. C, Reaction times (blue squares) and peak firing rates (purple circles) from the simulations in B vary nonmonotonically with NRM. Error bars (SEM) are often smaller than the symbols.

Figure 8.

    Reinforcer devaluation. A, The rat moving through the maze can turn left or right at three decision points (states 0, 1, and 2; colored numbers). The numbers above the terminal positions indicate the corresponding rewards. Devaluation decreases the reward associated with cheese (top left) from a baseline level of 4 to a devalued level of 2. [Adapted from Niv et al. (2006).] B, Simulated firing rates with baseline reward values. Colors indicate the state–action pair encoded by each cell, following the color scheme in A. The activity of the reward input (black) is as in Figure 6B. C, D, Choice probabilities (C) and reaction times (D; ±SEM) in each state. E, Activity profile for a spreading-activation model (darker means increasing activity). The path of an agent following the activity gradient (green) yields only a reward of 3 instead of the optimal 4. F–H, Same as in B–D following devaluation in our model. Note the change in the choice at the initial decision point (state 0).
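The devaluation effect in F–H follows from model-based re-planning: when the cheese reward drops from 4 to 2, recomputing values flips the best first choice. A toy sketch with a hypothetical two-level maze loosely shaped like panel A; the non-cheese rewards below are illustrative.

```python
def best_first_action(cheese_reward):
    """Best initial turn when each arm's value is the max over its outcomes."""
    v_left = max(cheese_reward, 1)   # left arm leads to cheese or a reward of 1
    v_right = max(3, 2)              # right arm leads to rewards of 3 or 2
    return "L" if v_left > v_right else "R"

print(best_first_action(4))  # baseline: cheese (4) makes the left arm best -> L
print(best_first_action(2))  # after devaluation to 2, the right arm (3) wins -> R
```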


Keywords

  • computational modeling
  • decision making
  • neuroeconomics
  • planning
  • reinforcement learning
  • spiking neurons


Copyright © 2025 by the Society for Neuroscience.
JNeurosci Online ISSN: 1529-2401
