The Journal of Neuroscience

The Journal of Neuroscience, October 28, 2009, 29(43):13524-13531; doi:10.1523/JNEUROSCI.2469-09.2009

Behavioral/Systems/Cognitive
Human Reinforcement Learning Subdivides Structured Action Spaces by Learning Effector-Specific Values

Samuel J. Gershman (1), Bijan Pesaran (1), and Nathaniel D. Daw (1, 2)

(1) Center for Neural Science, (2) Department of Psychology, New York University, New York, New York 10003

Correspondence should be addressed to Samuel J. Gershman at his present address: Department of Psychology, Princeton University, Princeton, NJ 08540. Email: sjgershm{at}princeton.edu

Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, the high dimensionality of the resulting action space presents an equally formidable reinforcement learning problem: discovering which actions are most valuable. An unresolved question is how neural systems for reinforcement learning, such as the prediction error signals for action valuation associated with dopamine and the striatum, can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows learned action valuations to be decomposed into effector-specific components when appropriate to a task, and we test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector than by a traditional model that treated each bimanual action as unitary, with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and in anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
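The contrast between a unitary and a decomposed learner can be illustrated with two delta-rule (Q-learning-style) value learners. This is a minimal sketch under stated assumptions, not the authors' fitted model: the number of options per hand (`N_OPTIONS`), the learning rate (`ALPHA`), the softmax inverse temperature (`BETA`), the additive combination of effector values, and all class and function names are hypothetical.

```python
import math

N_OPTIONS = 2   # hypothetical number of choices available to each hand
ALPHA = 0.2     # assumed learning rate
BETA = 3.0      # assumed softmax inverse temperature


def softmax(values, beta=BETA):
    """Softmax probabilities over a list of values (max-subtracted for stability)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class UnitaryLearner:
    """Treats each bimanual action (left, right) as one unit with a single value."""

    def __init__(self, n=N_OPTIONS, alpha=ALPHA):
        self.q = [[0.0] * n for _ in range(n)]  # n * n joint values
        self.alpha = alpha

    def joint_values(self):
        return [row[:] for row in self.q]

    def update(self, left, right, r_left, r_right):
        # Both hands' rewards are pooled into one outcome and one prediction error.
        delta = (r_left + r_right) - self.q[left][right]
        self.q[left][right] += self.alpha * delta
        return delta


class DecomposedLearner:
    """Keeps one value per option per hand; each hand learns from its own reward."""

    def __init__(self, n=N_OPTIONS, alpha=ALPHA):
        self.q_left = [0.0] * n   # n left-hand values
        self.q_right = [0.0] * n  # n right-hand values
        self.alpha = alpha

    def joint_values(self):
        # Assumption: a bimanual action's value is the sum of its two effector values.
        return [[ql + qr for qr in self.q_right] for ql in self.q_left]

    def update(self, left, right, r_left, r_right):
        # Separate, effector-specific prediction errors.
        d_left = r_left - self.q_left[left]
        d_right = r_right - self.q_right[right]
        self.q_left[left] += self.alpha * d_left
        self.q_right[right] += self.alpha * d_right
        return d_left, d_right


def joint_choice_probs(joint_q, beta=BETA):
    """Softmax over all (left, right) pairs of a square value table, as an n x n list."""
    n = len(joint_q)
    flat = softmax([v for row in joint_q for v in row], beta)
    return [flat[i * n:(i + 1) * n] for i in range(n)]


# Usage: one trial in which both learners chose left option 0 and right option 0,
# with reward delivered only for the left hand.
unitary, decomposed = UnitaryLearner(), DecomposedLearner()
for learner in (unitary, decomposed):
    learner.update(0, 0, r_left=1.0, r_right=0.0)
print(unitary.joint_values())     # [[0.2, 0.0], [0.0, 0.0]]
print(decomposed.joint_values())  # [[0.2, 0.2], [0.0, 0.0]]
```

With n options per hand, the unitary learner tracks n² joint values, and a reward updates only the one pair that was chosen; the decomposed learner tracks 2n values, so a left-hand reward also raises every pair that shares that left-hand choice. Because the decomposed joint value is additive in this sketch, its softmax over pairs factorizes into independent per-hand choice probabilities.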


Received May 27, 2009; revised Sept. 3, 2009; accepted Sept. 21, 2009.

Copyright 2009 by Society for Neuroscience ONLINE ISSN: 1529-2401