Abstract
Recent advances in microscopy, genetics, physiology, and data processing have expanded the scope and accelerated the pace of discovery in visual neuroscience. However, the pace of discovery and the ever-increasing number of published articles can present a serious issue for trainees and senior scientists alike: with each passing year the fog of progress thickens, making it easy to lose sight of important earlier advances. As part of this special issue of the Journal of Neuroscience commemorating the 50th anniversary of SfN, we provide a variation on Stephen Kuffler's Oldies but Goodies classic reading list, with the hope that by looking back at highlights in the field of visual neuroscience we can better define remaining gaps in our knowledge and thus guide future work. We also hope that this article can serve as a resource that will help those new to the field find their bearings.
Introduction
“From a series of well-chosen articles one gains more than perspective: one sees vividly how advances come about…Decisive experiments not only create new knowledge but they also significantly advance a field by creating new and higher standards of acceptable evidence. This in turn forces workers into clearer thinking and experimenting…One measure of the success of the studies…is that they enable us to define more precisely areas of ignorance, and this should help the search for new experimental solutions.”
Stephen Kuffler, Oldies but Goodies
As researchers, which problems should we study? In the total absence of knowledge, asking nearly any question can lead to fruitful answers. However, in an age of information saturation it becomes increasingly difficult to gain enough knowledge on a given topic to be able to step back and identify key questions that remain unresolved. Around 200 years ago, Thomas Young—the polymath often described as “the last man who knew everything”—could likely remember every fact and experiment related to a subject and apply this knowledge to formulate new theories and tests. Attempting such a feat today is much more daunting. Nonetheless, by carefully taking note of transformative work in a given field we can more clearly identify remaining unknowns. Here we endeavor toward such a goal by compiling a community consensus list of 25 of the most important articles in the field of visual neuroscience, with a focus on the neurons and circuits underlying visual perception (Fig. 1; see Materials and Methods). Below, the 25 articles are grouped into thematic sections. For each section, we indicate the specific article(s) from the top 25 list being described, provide historical context, highlight the discoveries, consider the impact on the field, and outline remaining questions.
Figure 1. The early mammalian visual system, from retina to cortex. For brevity, in the legend we point out only the features of the figure most relevant to the current article. A drawing by Ramon y Cajal from a rodent, showing retinal circuitry (A), retinal projections (B) to lateral geniculate nucleus (D) and superior colliculus (E). Thalamic neurons then project (K) to visual cortex (G), which in turn projects to higher cortical areas (N). From Cajal, circa 1902. Courtesy of the Cajal Institute, Cajal Legacy, Spanish National Research Council (CSIC), Madrid, Spain.
Figure 1-1
Visual threshold and single-photon responses
Hecht et al. (1942) Energy, quanta and vision
Baylor et al. (1979) Responses of retinal rods to single photons
In the mid-1870s, Franz Boll identified a substance in the frog retina that bleached when exposed to light; he termed it “visual purple.” What he had discovered was rhodopsin (reviewed in Hubbard, 1976). By the middle of the 20th century, it was well established that rods underlie night vision and cones underlie daytime vision, but how each cell was able to convert photons into electrochemical signals was unclear. Selig Hecht and colleagues addressed the question of photoreceptor sensitivity in a clever and groundbreaking study (Hecht et al., 1942). They directed flashes of light of varying luminance into the eye of a subject who was sitting in complete darkness and measured detection performance. By carefully accounting for the stimulus intensity, and reflective and absorptive losses, they made an incredible discovery: a visual stimulus can be detected when it is composed of as few as five photons. Remarkably, since their visual stimulus encompassed an area of ∼500 rods, it was highly unlikely that any single rod had absorbed more than one photon. However, what did the amplitude and kinetics of the photon-evoked signal look like, and how likely was it for a single photon response to occur? The answers would have to wait until the development of suction electrode recording, which Denis Baylor and colleagues applied to record rod photocurrents (Baylor et al., 1979). By recording from rod outer segments in toad retina with these electrodes while flashing a very dim spot of light, they measured responses exhibiting binary amplitudes, which they defined as successes and failures, with the successes representing the single photon responses predicted by Hecht et al. (1942). The effort to identify the molecular machinery underlying this exquisite light sensitivity would be undertaken by many laboratories, including those of George Wald and Lubert Stryer (Wald, 1968; Stryer, 1987; Dowling, 1997).
More recently, it has become clear that dysfunction of the phototransduction machinery underlies many retinal dystrophies (Ferrari et al., 2011). Correcting these mutations with gene therapy and genome-editing tools offers an exciting avenue to interrupt or reverse many blinding disorders (Hohman, 2017; Russell et al., 2017).
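The photon-counting argument in this section can be made concrete with a short calculation. The sketch below is illustrative only and is not the analysis of Hecht et al.: it assumes that each absorbed photon lands on a rod chosen uniformly at random, and that the number of absorptions per flash follows a Poisson distribution with a fixed detection threshold. The function names and the threshold value are hypothetical.

```python
import math


def prob_all_on_different_rods(photons: int, rods: int) -> float:
    """Probability that every absorbed photon lands on a different rod.

    Assumes each photon picks a rod uniformly at random (an idealization),
    so this is the classic birthday-problem product.
    """
    p = 1.0
    for i in range(photons):
        p *= (rods - i) / rods
    return p


def prob_seeing(mean_absorbed: float, threshold: int) -> float:
    """Poisson probability that at least `threshold` photons are absorbed
    when a flash produces `mean_absorbed` absorptions on average."""
    below = sum(
        math.exp(-mean_absorbed) * mean_absorbed ** k / math.factorial(k)
        for k in range(threshold)
    )
    return 1.0 - below


if __name__ == "__main__":
    p = prob_all_on_different_rods(photons=5, rods=500)
    print(f"P(5 photons hit 5 different rods) = {p:.3f}")
    # Frequency-of-seeing style curve for an assumed threshold of 5 photons.
    for mean in (2, 5, 10):
        print(f"mean absorbed = {mean:>2}: P(seen) = {prob_seeing(mean, 5):.3f}")
```

Under these assumptions, five photons spread over 500 rods fall on five different rods about 98% of the time, which is why detection at threshold implies that individual rods respond to single photons.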
Center-surround receptive fields
Kuffler (1953) Discharge patterns and functional organization of mammalian retina
Barlow (1953) Summation and inhibition in the frog's retina
Adrian and Matthews (1927) recorded the first light responses from the optic nerve in the mid-1920s, and Hartline (1938) went on to describe receptive fields, as well as ON, OFF, and ON-OFF responses. Hartline (1938) found that many ganglion cells had receptive fields large enough to overlap with those of their closely spaced neighbors, an arrangement that could blur the image that is focused onto the retina. New techniques permitted both Stephen Kuffler and Horace Barlow to address this issue with significantly higher precision than in earlier studies. Kuffler and S.A. Talbot developed an ophthalmoscope that focused spots of light of various sizes onto different portions of the retinal surface in anesthetized cats (Talbot and Kuffler, 1952). Kuffler discovered that retinal ganglion cells had a central region that excited the neuron and a surrounding region that antagonized the center (Kuffler, 1953). Similar experiments were performed by Barlow (working in Adrian's laboratory), who built a custom stimulator that enabled independent control of the light intensity of a spot and its background. He found that background illumination antagonized center responses (Barlow, 1953). Both studies showed that center and surround had opposite stimulus selectivity: ON-center cells could be inhibited by a large bright spot; the opposite was true of OFF-center neurons. As such, a high-contrast edge that encompasses the receptive field center of a ganglion cell also encompasses the surround of its neighbors, thereby sharpening the population response to incoming images. These findings would inspire and inform many of the articles included in our list, and center-surround receptive fields remain a textbook example of how the retina encodes visual scenes.
Linear and nonlinear responses
Enroth-Cugell and Robson (1966) The contrast sensitivity of retinal ganglion cells of the cat
By the mid-1960s, the response of a ganglion cell to a spot of light placed at an arbitrary position over the retina could be predicted with reasonable accuracy. Robert Rodieck had modeled receptive fields as the sum of two Gaussian curves: a positive, narrow, large-amplitude curve for the center, and a negative, wide, low-amplitude curve for the surround (Rodieck, 1965; Rodieck and Stone, 1965). A spot of light placed at a given location over the receptive field thus generated a response based on the sum of these two curves. Could the same model predict responses when different visual stimuli simultaneously activated different portions of the receptive field? Working with television sets at RCA, Otto Schade generated sinusoids of varying spatial frequency and contrast, and proceeded to use these stimuli to test human visual perception (Schade, 1956). Could his psychophysical results be explained by retinal ganglion cell receptive field structures? Enroth-Cugell and Robson decided to test whether ganglion cell responses to Schade's stimuli could be predicted by Rodieck's receptive field model (Enroth-Cugell and Robson, 1966). For instance, would a sinusoid that half-darkens and half-illuminates a ganglion cell receptive field result in no response? Surprisingly, Enroth-Cugell and Robson found that only a subset of ganglion cells (termed X cells) followed this prediction. In contrast, many neurons (termed Y cells) behaved unexpectedly: when adapted to a constant gray background and then stimulated with a sinusoidal grating that went from black to white over the receptive field, Y cells responded vigorously to both the appearance and disappearance of the stimulus. Next, they examined responses when sinusoids of varying spatial frequencies and contrasts were passed over receptive fields of X and Y cells. Enroth-Cugell and Robson concluded that Y cells do not respond to the linear sum of luminance signals across their receptive field. 
Instead, these neurons sum nonlinearly from receptive field subunits, so that, for instance, an ON input to one portion of the receptive field and an OFF input to another portion do not simply cancel each other out. Future work would show that these different cell types appear to form separate but parallel visual streams in the brain (Shapley and Hochstein, 1975; Sherman et al., 1976), though how such X- and Y-cell definitions relate to the increasing number of functionally defined retinal ganglion cell types (Baden et al., 2016) remains unclear. Meanwhile, the field has moved increasingly into studying responses generated by naturalistic scenes (Felsen and Dan, 2005). While there is debate over the ability of such stimuli to reveal coding principles (Rust and Movshon, 2005), there is increasing interest in developing algorithms that endow machines with the impressive abilities our visual system possesses for interpreting natural scenes (Cox and Dean, 2014; Hassabis et al., 2017).
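The difference between linear and nonlinear spatial summation described in this section can be illustrated with a toy one-dimensional model. The sketch below is a simplification under stated assumptions, not the analysis of Enroth-Cugell and Robson: the Gaussian widths and gains are arbitrary, the "Y-like" cell is modeled as a Gaussian-weighted pool of point-like half-wave rectifying subunits, and all function names are hypothetical.

```python
import math

STEP = 0.1
POSITIONS = [i * STEP for i in range(-100, 101)]  # symmetric about the RF center


def dog_weight(x, sigma_c=1.0, sigma_s=3.0, k_c=1.0, k_s=0.2):
    """Difference of Gaussians: narrow strong center minus wide weak surround."""
    return k_c * math.exp(-(x / sigma_c) ** 2) - k_s * math.exp(-(x / sigma_s) ** 2)


def grating(x, spatial_freq, phase):
    """Sinusoidal contrast profile in the range [-1, 1]."""
    return math.sin(2 * math.pi * spatial_freq * x + phase)


def linear_response(spatial_freq, phase, sign=1):
    """X-like cell: weighted linear sum of contrast over the receptive field."""
    return STEP * sum(
        dog_weight(x) * sign * grating(x, spatial_freq, phase) for x in POSITIONS
    )


def subunit_response(spatial_freq, phase, sign=1, pool_sigma=3.0):
    """Y-like cell: each local signal is half-wave rectified before pooling."""
    return STEP * sum(
        math.exp(-(x / pool_sigma) ** 2)
        * max(0.0, sign * grating(x, spatial_freq, phase))
        for x in POSITIONS
    )


if __name__ == "__main__":
    freq, null_phase = 0.1, 0.0  # odd-symmetric grating: half dark, half bright
    for sign, label in ((1, "grating on"), (-1, "contrast reversed")):
        print(label,
              f"linear={linear_response(freq, null_phase, sign):+.3f}",
              f"subunit={subunit_response(freq, null_phase, sign):+.3f}")
```

With the grating placed so that it half-darkens and half-illuminates the receptive field, the linear model gives essentially no response to either contrast polarity, whereas the rectifying-subunit model responds to both, mirroring the qualitative X/Y distinction.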
Feature detectors
Lettvin et al. (1959) What the frog's eye tells the frog's brain
Ölveczky et al. (2003) Segregation of object and background motion in the retina
To Jerome Lettvin, the predictions of the center-surround model felt incomplete (Lettvin, 1999): the properties of a visual stimulus included not only its luminance and size, but also features such as its shape, curvature, contrast, and motion. Could retinal ganglion cells encode such additional features? Lettvin and colleagues hypothesized that one reason such diverse ganglion cell responses had not been observed might relate to biased sampling—they noted that most recordings came from large ganglion cells bearing myelinated axons, whereas unmyelinated ones were more numerous and harder to target (Lettvin et al., 1959, 1960; Lettvin, 1999). They thus developed methods to record from these unmyelinated fibers while presenting frogs with images of spots, flies, geometric objects, and looming figures. Their hunch would turn out correct—they identified at least five different kinds of ganglion cell responses, each tuned to a specific aspect of the visual scene (Lettvin et al., 1959). Interestingly, while their article would be highly cited, this line of inquiry (retinal feature detectors) would soon fade from mainstream research for several decades. Slowly, increasing anatomical evidence would begin to reveal the staggering diversity of retinal cell types (Masland, 2001, 2012), which was at odds with the view of the retina as a system that predominantly adjusts the brightness and sharpness of incoming images through linear and nonlinear center-surround filters. The search for function in this growing forest of cell types reawakened research into feature detectors. One captivating example, from Ölveczky and colleagues, describes a ganglion cell that displays the remarkable ability to distinguish object from background motion (Ölveczky et al., 2003). They presented the retina with two gratings superimposed on one another: the object grating was a small circular patch; the background grating filled the remainder of the stimulus area. 
By moving object and background gratings with varying degrees of coherence, they discovered ganglion cells that respond selectively to object motion that differs from background motion. The intervening time since this study has seen the catalog of retinal ganglion cell types expand to ∼45, each bearing a unique trigger feature, molecular identity, morphology, and, often, central projection. How the brain uses these lower-level retinal features is still not well understood.
Orientation selectivity
Hubel and Wiesel (1959) Receptive fields of single neurons in the cat's striate cortex
Hubel and Wiesel (1962) Receptive fields, binocular interaction and functional architecture in the cat's visual cortex
Visual cortex was localized in the 19th century, based on clinical and experimental lesion studies (Gross, 1998; Finger, 2001). By the middle of the 20th century, it remained unclear how the cortex processed visual stimuli at the level of single neurons. The story of visual feature detectors in the cortex begins in September 1959, when David Hubel (Hubel, 1959) published the first example of a putative direction-selective neuron in visual cortex (in a non-head-fixed cat, with the cell responding to the back and forth movement of his arm, no less!). However, it would be two subsequent studies with Torsten Wiesel in head-fixed anesthetized cats that would change the course of visual neuroscience research (Hubel and Wiesel, 1959, 1962). Here they showed the presence of orientation- and direction-selective responses in primary visual cortex, described simple and complex cells, characterized orientation columns, and proposed a model for orientation selectivity. These “line detectors” provided early clues about the brain's strategy for encoding the visual world. Poetically, there is even a legendary anecdote about their seminal discovery: “Suddenly, just as we inserted one of our glass slides into the ophthalmoscope, the cell seemed to come to life and began to fire impulses like a machine gun. It took a while to discover that the firing had nothing to do with the small opaque spot – the cell was responding to the fine moving shadow cast by the edge of the glass slide” (Hubel and Wiesel, 2005). Follow-up work would reveal that orientation columns are arranged in pinwheel patterns, turning Hubel and Wiesel's original finding into modern art (Blasdel and Salama, 1986; Bonhoeffer and Grinvald, 1991). 
Furthermore, their model for orientation selectivity in V1 (that a cortical neuron receives inputs from several center-surround lateral geniculate nucleus (LGN) neurons whose receptive fields are offset along a particular axis) would be validated as at least one of the ways that cortex generates orientation selectivity (Chapman et al., 1991; Reid and Alonso, 1995; Ferster et al., 1996). In the coming years, research will need to reconcile the view of V1 as primarily a collection of orientation filters with recent findings indicating that V1 activity can also reflect a variety of other factors, such as learning (Khan et al., 2018), subjective spatial position (Saleem et al., 2018), locomotion (Niell and Stryker, 2010), reward timing (Shuler and Bear, 2006), and prediction (Keller et al., 2012; Gavornik and Bear, 2014), though it should be noted that to date most of these additional findings have only been described in rodents and have yet to be replicated in primate models.
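The feedforward model of orientation selectivity can be sketched numerically. The toy example below is an illustration under assumptions of our own choosing rather than a reproduction of any published simulation: three ON-center difference-of-Gaussians "LGN" fields are stacked along the vertical axis, their rectified outputs are summed, and a firing threshold is applied. The grid, Gaussian widths, surround gain, threshold, and function names are all hypothetical.

```python
import math

STEP = 0.5
GRID = [i * STEP for i in range(-16, 17)]  # visual space from -8 to 8
# Receptive field centers aligned along the vertical axis (assumed layout).
LGN_CENTERS = [(0.0, -2.0), (0.0, 0.0), (0.0, 2.0)]


def gaussian_2d(dx, dy, sigma):
    """Unit-volume two-dimensional Gaussian."""
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (
        2 * math.pi * sigma * sigma
    )


def dog(dx, dy, sigma_c=1.0, sigma_s=2.0, surround_gain=0.9):
    """ON-center, OFF-surround receptive field weight."""
    return gaussian_2d(dx, dy, sigma_c) - surround_gain * gaussian_2d(dx, dy, sigma_s)


def vertical_bar(x, y):
    return 1.0 if abs(x) <= 0.5 else 0.0


def horizontal_bar(x, y):
    return 1.0 if abs(y) <= 0.5 else 0.0


def lgn_response(center, stimulus):
    """Rectified linear response of one center-surround cell."""
    cx, cy = center
    drive = STEP * STEP * sum(
        dog(x - cx, y - cy) * stimulus(x, y) for x in GRID for y in GRID
    )
    return max(0.0, drive)


def simple_cell_response(stimulus, threshold=0.5):
    """Sum the aligned LGN inputs and apply a firing threshold."""
    total = sum(lgn_response(c, stimulus) for c in LGN_CENTERS)
    return max(0.0, total - threshold)


if __name__ == "__main__":
    print("vertical bar  :", round(simple_cell_response(vertical_bar), 3))
    print("horizontal bar:", round(simple_cell_response(horizontal_bar), 3))
```

In this sketch a bar aligned with the row of LGN centers drives all three inputs, while an orthogonal bar drives only the middle input and falls on the surrounds of the others, so only the aligned bar exceeds threshold.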
Direction selectivity in the retina
Barlow and Levick (1965) The mechanisms of directionally selective units in the rabbit's retina
Following close on the heels of Hubel and Wiesel's work in cortex, directionally selective responses were soon described anew in retinal ganglion cells of both rabbit (Barlow and Hill, 1963; Barlow et al., 1964) and pigeon (Maturana and Frenk, 1963). A subsequent article from Barlow and Levick featured refined experiments, analyses, and modeling (Barlow and Levick, 1965). It described with impressive clarity that direction selectivity (DS) is computed over small subunits of the receptive field, and outlined how DS mechanisms could be fooled into signaling motion via presentation of temporally offset paired static stimuli. They proposed that direction selectivity arose due to inhibition during null direction, but not preferred direction, motion. It would take many years of research to validate this model and identify the circuitry underlying this computation: the identification of cholinergic cells in the retina (Masland and Mills, 1979); the characterization of cholinergic cells as starburst amacrine cells (Famiglietti, 1983); the finding that starburst amacrine cells intrinsically generate directional signals (Euler et al., 2002); and the discovery that starburst amacrine cells asymmetrically connect to DS ganglion cells and provide null direction inhibition (Fried et al., 2002; Briggman et al., 2011). DS ganglion cells then would be split into different flavors (Oyster and Barlow, 1967; Kim et al., 2008). While ON DS cells control visual field stabilization reflexes (Oyster et al., 1972), a role for ON-OFF DS cells in behavior remains unclear. Further, while most work in this field has been performed in rabbit or mouse, to what extent DS ganglion cells are present in nonhuman primate or human retina is unclear.
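The logic of null-direction inhibition acting on small subunits can be captured in a few lines. The sketch below is a deliberately minimal, discrete-time caricature and not the model as specified by Barlow and Levick: each subunit is excited by one receptor and vetoed by a delayed signal from its neighbor on one side. The one-step delay, the binary signals, and the function names are assumptions made for illustration.

```python
def moving_spot(n_receptors, direction):
    """One frame per time step; a single spot steps across the receptor row."""
    order = list(range(n_receptors))
    if direction == "null":
        order.reverse()
    return [[1 if i == pos else 0 for i in range(n_receptors)] for pos in order]


def two_flashes(n_receptors, first, second):
    """Two static flashes at different positions on successive time steps."""
    return [
        [1 if i == first else 0 for i in range(n_receptors)],
        [1 if i == second else 0 for i in range(n_receptors)],
    ]


def ganglion_spikes(frames, delay=1):
    """Count subunit outputs reaching the ganglion cell.

    Subunit i is excited by receptor i and vetoed by the signal that
    receptor i + 1 carried `delay` steps earlier (assumed wiring).
    """
    n = len(frames[0])
    spikes = 0
    for t, frame in enumerate(frames):
        for i in range(n - 1):
            excited = frame[i] == 1
            vetoed = t - delay >= 0 and frames[t - delay][i + 1] == 1
            if excited and not vetoed:
                spikes += 1
    return spikes


if __name__ == "__main__":
    print("preferred motion:", ganglion_spikes(moving_spot(5, "preferred")))
    print("null motion     :", ganglion_spikes(moving_spot(5, "null")))
    # Static flashes in preferred order (1 then 2) versus null order (2 then 1).
    print("flashes 1 -> 2  :", ganglion_spikes(two_flashes(5, 1, 2)))
    print("flashes 2 -> 1  :", ganglion_spikes(two_flashes(5, 2, 1)))
```

Because the veto is computed locally between neighboring receptors, the same asymmetry appears for continuous motion and for two static flashes presented in sequence, which is the sense in which such a mechanism can be "fooled" into signaling motion.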
Diverse retinal cell types, organization, and responses
Werblin and Dowling (1969) Organization of retina of the mudpuppy, Necturus maculosus. II. Intracellular recordings
Slaughter and Miller (1981) 2-Amino-4-phosphonobutyric acid: a new pharmacological tool for retina research
Wässle et al. (1981) Dendritic territories of cat retinal ganglion cells
Berson et al. (2002) Phototransduction by retinal ganglion cells that set the circadian clock
Following a lecture by John Dowling at Johns Hopkins University in the 1960s, a graduate student who had trained as an electrical engineer came to Dowling's office to ask about building a theoretical model of the retina. Dowling told the student, Frank Werblin, that too little was known about electrical responses of retinal neurons to generate such a model (Dowling, 2018). Instead, they would go on to systematically characterize the light responses of all five major retinal neuron classes (photoreceptors, horizontal cells, bipolar cells, amacrine cells, ganglion cells), find evidence for center-surround responses in bipolar cells, outline a diversity of amacrine cell responses, and provide the first evidence that ON and OFF signals were generated in bipolar cells (Werblin and Dowling, 1969). How parallel ON and OFF visual channels arise would remain a mystery until the discovery of a pharmacological agent (known as L-AP4 or APB) that selectively blocks the ON retinal system (Slaughter and Miller, 1981). These discoveries would enable several additional advances: the discovery that mGluR6 glutamate receptors mediate ON bipolar cell responses (Masu et al., 1995); the deconstruction of complex ganglion cell responses into various combinations of ON versus OFF inputs (Roska and Werblin, 2001); and the search to understand why parallel ON and OFF visual channels exist (Schiller et al., 1986). Increased knowledge of the “vertical” signaling pathway in the retina (photoreceptor→bipolar cell→ganglion cell) revealed our ignorance about how these vertical elements repeat laterally across the retina to cover the entire visual field. Advances in dye filling and tissue imaging revealed a striking repetition of circuit motifs across the retina: for a given ganglion cell type, this involves a tiling interaction between their dendritic arbors and regular spacing of their somata into a mosaic (Wässle et al., 1981).
Finally, as the diversity, connectivity, and parallel processing capabilities of the retina were coming into focus, new findings threw a wrench into our understanding of how light responses were generated. This story begins in the 1920s, when Clyde Keeler characterized the maintenance of light-evoked pupillary reflexes in otherwise blind mice (Keeler, 1927), but it would take several decades until retrograde tracing experiments from the suprachiasmatic nucleus identified the third retinal photoreceptor: intrinsically photosensitive retinal ganglion cells (Berson et al., 2002), which express melanopsin (Hattar et al., 2002). While these cells are important for many reflexive and “subconscious” forms of vision, signals from these ganglion cells also appear to make their way to visual cortex (Dacey et al., 2005). More recently, advances in electrophysiology, functional imaging, and genetics have allowed researchers to begin closing in on obtaining a detailed characterization of the light responses of every single retinal cell type to a diverse set of visual stimuli (Baden et al., 2016), potentially putting a comprehensive theoretical model of retinal function within grasp.
The link between eye and brain
Reid and Alonso (1995) Specific monosynaptic connections from thalamus to visual cortex
While some early studies suggested that the LGN's role was primarily to relay retinal signals to visual cortex (Hubel and Wiesel, 1962), subsequent work has shown that it is far more than a simple relay (Sherman and Koch, 1986; Sherman and Guillery, 1996, 2014; Sherman, 2006). First, individual LGN relay cells receive convergent inputs from several retinal ganglion cells (Cleland et al., 1971; Chen and Regehr, 2000). Second, relay cells can switch between tonic and bursting modes (Steriade and Llinás, 1988; Guido and Weyand, 1995) and are highly sensitive to the precise timing of their retinal inputs, making them ideally suited to filter the visual information sent to cortex (Usrey et al., 1998). Third, relay cells do not simply receive inputs from the retina and send outputs to V1, but also receive an enormous amount of feedback from diverse areas (Erişir et al., 1997). Fourth, axons from multiple relay cells can converge onto individual postsynaptic neurons in V1 with spectacular specificity, generating novel feature-selective responses in the postsynaptic target. An exquisite example of this last property comes from Reid and Alonso (1995). Making heroic paired recordings in thalamus and cortex, and using the relative spike timing between pairs of neurons in these two areas to identify putatively monosynaptically connected pairs, they were able to show that the elongated, orientation-selective receptive fields of V1 neurons can arise from the convergence of specifically aligned thalamic neurons with center-surround receptive fields, thus providing experimental validation of the model of Hubel and Wiesel (1962). More recent work has shown that similar precision in the convergence of specific thalamic inputs onto individual V1 neurons appears to underlie some forms of direction selectivity found in mouse visual cortex (Lien and Scanziani, 2018).
Still, it remains unclear whether novel visual feature detectors arise de novo in the LGN itself as a result of convergent inputs from different retinal cell types. Furthermore, little is known about the way the LGN uses its massive feedback to filter visual signals being sent to cortex, though some studies suggest that the effect of such filtering may be powerful (Wimmer et al., 2015).
Wiring the visual system
Wiesel and Hubel (1963) Single-cell responses in striate cortex of kittens deprived of vision in one eye
Meister et al. (1991) Synchronous bursts of action potentials in ganglion cells of the developing mammalian retina
Early anecdotal evidence related to vision restoration following long-term vision loss suggested that neuronal activity might play an important role in visual system development (Hebb, 1949). Indeed, subsequent monocular deprivation studies in developing animals supported this idea (Riesen et al., 1953). The cellular basis for this phenomenon remained a mystery, but would be clarified by a study into the effect of early monocular deprivation on V1 responses by Wiesel and Hubel (1963). Here, they showed that monocular deprivation following birth caused cortical neurons to become unresponsive to inputs from the deprived eye. This deficit did not occur when deprivation was performed later in life, indicative of an early critical period for visual development, which later studies would leverage to develop clinical interventions for children with strabismus (Hensch and Quinlan, 2018). Next, interesting work in frogs, in which a third eye was ectopically implanted during development, revealed that the ectopic eye exhibited a segregated projection area, thus showing the robustness of eye-specific segregation in the brain (Constantine-Paton and Law, 1978). Later, studies in which silencing retinal activity with tetrodotoxin before eye opening altered the eye-specific segregation of retino-thalamic projections suggested that spontaneous activity before eye opening was also important for visual circuit development (Shatz and Stryker, 1988; Sretavan et al., 1988). Correlated activity was soon described between pairs of newborn retinal ganglion cells (Maffei and Galli-Resta, 1990), but how such seemingly random patterns of activity could affect the arrangement of eye-specific axonal projection patterns in LGN and V1 seen upon eye opening was unclear. It would take a new way of recording from neurons, a multielectrode array, to reveal the beauty of the signals in the developing retina.
Working out of Stanford University, Markus Meister and colleagues would discover wave-like patterns of activity propagating across the developing retina (Meister et al., 1991). This finding provided a potential way for the brain to identify axons from adjacent ganglion cells from the same eye using correlated firing patterns (i.e., spatial location would be mapped via a temporal sequence of action potentials; McLaughlin et al., 2003; Ackman et al., 2012). Subsequent studies would complicate this story, showing that ocular dominance columns are present earlier than had previously been appreciated, and can form to some extent despite early binocular enucleation (Crowley and Katz, 1999, 2000). Thus, it appears that nature and nurture interact in complex ways to refine visual system development. Still, many of the molecular factors that help to establish this early wiring remain at large.
Coding in the visual system
Barlow (1961) Possible principles underlying the transformations of sensory messages
Barlow (1972) Single units and sensation: a neuron doctrine for perceptual psychology?
Olshausen and Field (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images
How does the brain generate sight? For the visual system, these types of questions were being asked with increasing frequency by the early 1960s due to a growing consensus that the retinal message encodes behaviorally relevant “trigger” features (Kuffler, 1953; Barlow, 1953; Lettvin et al., 1959). However, in the real world, visual inputs constantly bombard the whole retinal surface with time-varying combinations of both trigger and nontrigger features, resulting in a retinal output that presents a difficult problem for the brain to interpret. How does the brain sort out these diverse and numerous retinal signals in terms of behavioral importance? In 1961, Horace Barlow, borrowing ideas from information theory, provided an early and influential framework addressing this idea (Barlow, 1961; see also Attneave, 1954). Barlow envisioned the visual system as an accountant, apportioning as few spikes across as few fibers (optic nerve fibers, for example) as possible to encode a stimulus. From this premise, he elaborated a scenario in which the most common firing pattern of a pair of input neurons, A and B, should silence a pair of output neurons, X and Y, whereas the least common input firing pattern should drive both output cells. Thus, the firing pattern of X and Y is a rank ordering of the rarity of a stimulus, theoretically simplifying the brain's search through incoming stimuli to find the most relevant (i.e., least redundant) for behavior. Such transformations, which he termed passwords, would predict that the visual system should have evolved filters for extracting informative statistics from the natural world. This would influence theories on efficient coding and stimulate studies of natural statistics of visual scenes and how these are coded by the visual system (Atick and Redlich, 1992). 
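The two-neuron recoding scenario can be written out explicitly. The sketch below is an illustration of the rank-ordering idea only: the input pattern probabilities are invented for the example, and the function names are hypothetical. It assigns the output word with the fewest spikes to the most common input pattern and the word with the most spikes to the rarest.

```python
from itertools import product


def build_recoding(pattern_probs):
    """Map input patterns (A, B) to output patterns (X, Y).

    The most frequent input gets the output with the fewest spikes;
    the rarest input gets the output with the most spikes.
    """
    inputs_by_frequency = sorted(pattern_probs, key=pattern_probs.get, reverse=True)
    outputs_by_spike_count = sorted(product((0, 1), repeat=2), key=sum)
    return dict(zip(inputs_by_frequency, outputs_by_spike_count))


def mean_spikes(pattern_probs, recoding=None):
    """Expected number of spikes per stimulus, before or after recoding."""
    total = 0.0
    for pattern, prob in pattern_probs.items():
        word = recoding[pattern] if recoding else pattern
        total += prob * sum(word)
    return total


if __name__ == "__main__":
    # Assumed, made-up statistics: A and B usually fire together.
    probs = {(1, 1): 0.60, (1, 0): 0.20, (0, 1): 0.15, (0, 0): 0.05}
    code = build_recoding(probs)
    for pattern, word in code.items():
        print(f"input {pattern} (p={probs[pattern]:.2f}) -> output {word}")
    print("mean spikes before:", round(mean_spikes(probs), 2))
    print("mean spikes after :", round(mean_spikes(probs, code), 2))
```

With these made-up probabilities the recoded output uses far fewer spikes on average, and the number of spikes emitted by X and Y becomes a direct readout of how unusual the input was.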
Next, incorporating insights from single-unit recordings from inferior temporal (IT) cortex (Gross et al., 1969), in his 1972 article, Barlow (1972) built upon his earlier efficient coding tome to frame the brain as a hierarchical reducer of information, in which sparseness increases at each level in the hierarchy, until we arrive at “grandmother cells,” (hypothetical neurons at the top of the hierarchy that represent specific concepts or objects) a term he took from Lettvin (Gross, 2002). Following up on these ideas, Olshausen and Field (1996) were able to develop a learning algorithm that, when trained on natural images, evolved filters that bore a striking resemblance to the orientation-selective V1 receptive fields first described by Hubel and Wiesel. Importantly, Olshausen and Field discovered that sparseness was central to the ability of their filters to decorrelate features of natural images, thus providing higher visual areas with a more efficient (and less redundant) signal. Though influential, work over the last decades merging modeling and neural recordings from different areas of the visual system have highlighted some of the limitations of efficient coding and redundancy reduction as theories that fully explain the neuronal code (Barlow, 2001; Rust and DiCarlo, 2012). Additionally, with the snowballing ability to record from larger and larger numbers of neurons, the field has been able to focus increasingly on population coding, which adds another layer of complexity to deciphering the neuronal code (Keemink and Machens, 2019). How such population codes relate to the increasingly narrow definitions that are arising for distinct cell types and cell type-specific circuits is unclear. Finally, how any neuronal code is read out by downstream “decoders” and converted into perception remains to be elucidated.
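The trade-off at the heart of sparse coding, reconstructing an image well while keeping most coefficients at zero, can be shown on a tiny example. The sketch below covers only the inference of coefficients for a fixed set of basis functions, not the learning of those functions from natural images, and it is not the algorithm of Olshausen and Field: it substitutes an L1 penalty and a standard iterative soft-thresholding scheme as a convenient stand-in. The three-atom dictionary, the penalty weight, and the function names are assumptions.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; small values become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def infer_sparse_code(image, atoms, lam=0.1, n_iter=500):
    """Minimize 0.5 * ||image - sum_i a_i * atom_i||^2 + lam * sum_i |a_i|
    by iterative soft thresholding (an assumed solver choice)."""
    # Conservative step size: inverse of the squared Frobenius norm.
    step = 1.0 / sum(w * w for atom in atoms for w in atom)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        recon = [
            sum(c * atom[p] for c, atom in zip(coeffs, atoms))
            for p in range(len(image))
        ]
        residual = [i - r for i, r in zip(image, recon)]
        coeffs = [
            soft_threshold(
                c + step * sum(a * r for a, r in zip(atom, residual)), step * lam
            )
            for c, atom in zip(coeffs, atoms)
        ]
    return coeffs


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    # Overcomplete toy dictionary: two axis-aligned atoms and one oblique atom.
    atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]
    image = [1.0, 1.0]  # a two-"pixel" image lying along the oblique atom
    print([round(c, 3) for c in infer_sparse_code(image, atoms)])
```

The toy image could be described with two active axis-aligned coefficients or one active oblique coefficient; the sparseness penalty selects the single oblique atom and leaves the other two coefficients at zero.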
Two ways to see
Ungerleider and Mishkin (1982) Two cortical visual systems
In the 1950s and 1960s, different visual deficits began to be regularly described for patients with temporal versus parietal cortex lesions (Newcombe and Russell, 1969): temporal lesions often resulted in impaired visual recognition; parietal lesions tended to produce visual spatial impairments. This led to theories of dichotomous visual pathways for processing stimulus location and identity (Schneider, 1967, 1969; Trevarthen, 1968), which were then experimentally validated in a nonhuman primate lesion study (Pohl, 1973). However, it remained unclear how different visual information was relayed to temporal versus parietal cortex (for instance, Schneider postulated that spatial visual information was processed via the retinal–tectal pathway). To address this issue, Ungerleider and Mishkin (1982) incorporated recent findings with their own tracing and lesion experiments. Importantly, they fully identified all extrastriate (or “prestriate”) cortical areas, carefully lesioned them, and looked for deficits. They found that V1 projected to a much larger extrastriate area than had previously been appreciated, and fully lesioning it resulted in deficits in spatial visual perception. This allowed them to formalize the theory that two different cortical visual streams leave V1: a ventral pathway primarily concerned with recognition and a dorsal pathway primarily concerned with spatial location (these would become known as the “what” and “where” pathways). This theory has been modified by others who suggest that these two pathways may be better defined as being important for perception (ventral) versus action (dorsal; Goodale and Milner, 1992). The picture of two segregated streams has become more complicated since its first description, with the discovery of lateral connectivity between the two pathways (Felleman and Van Essen, 1991) and the presence of ventral-like information in dorsal areas, and vice versa (Sereno and Maunsell, 1998; Freud et al., 2016).
This theory nonetheless remains a cornerstone of how we understand the visual system, though much remains to be clarified regarding how these two pathways interact, and how feedback circuits modify these feedforward pathways.
Face cells
Kanwisher et al. (1997) The fusiform face area: a module in human extrastriate cortex specialized for face perception
Faces are among the earliest, most intimate and important stimuli encountered by the human visual system. How does the visual system build a detector that can not only find a familiar face in a crowd, but find many familiar faces in a crowd? The search for face cells arose in part out of interest in Lettvin's grandmother cells (Barlow, 1972; Gross, 2002) and Konorski's “gnostic cells” (Konorski, 1967; Gross, 2005), both variations on the idea that there might be high-order visual neurons that encode specific visual objects. Charles Gross, who once worked down the hall from Lettvin and had visited Konorski's laboratory, was trying to understand what stimuli activate IT cortex. Gross et al. (1969) would first show the existence of visual “hand cells,” providing a compelling update to the hierarchical buildup of complex feature detectors first posited by Hubel and Wiesel. In a follow-up article, Gross et al. (1972) would make the first mention of possible “face cells,” though it would take several more years until they provided more thorough evidence (Bruce et al., 1981; Desimone et al., 1984). Building on this work in nonhuman primates, face selectivity was subsequently shown in human cortex (Haxby et al., 1991; Ojemann et al., 1992). A major milestone came when Nancy Kanwisher and colleagues applied functional magnetic resonance imaging (fMRI) to interrogate neuronal responses to faces in a more systematic manner (Kanwisher et al., 1997). By comparing responses to faces and scrambled faces, they ruled out luminance selectivity; by comparing hands and faces, they established a specificity for faces rather than a general selectivity for body parts; by using hands, faces, and objects in a matching task, they forced attentional recruitment and showed that only faces evoked responses in a small area of IT, termed the fusiform face area.
This work opened the door for more precise single-unit recordings, which used fMRI to predefine regions to subsequently study at single-cell resolution (Tsao et al., 2006). More recently, remarkable artificial intelligence-assisted image evolution experiments have corroborated that face cells indeed like to see faces (Ponce et al., 2019). Such progress raises new questions—for instance, how does the brain learn to recognize a face (note that face domains are not present in animals raised without seeing faces; Arcaro et al., 2017), and how is the brain able to distinguish familiar from unfamiliar faces?
From brain to perception
Newsome et al. (1989) Neuronal correlates of a perceptual decision
Salzman et al. (1990) Cortical microstimulation influences perceptual judgements of motion direction
The increasingly complex nature of feature detectors being described throughout the visual system raised the issue of how and when these signals merge with “internal” needs and wants to direct behavior. A breakthrough came when Newsome and colleagues made single-unit recordings during a visual perception task (Newsome et al., 1989). They focused their recordings on direction-selective neurons in area MT, which possess receptive fields that cover wide swaths of the visual field (Dubner and Zeki, 1971; Zeki, 1974; Maunsell and Van Essen, 1983). Importantly, lesion studies had shown area MT to be crucial for the perception of moving visual stimuli (Newsome and Paré, 1988). Newsome and colleagues related neuronal activity to the behavioral performance of macaques, which were trained to judge the collective motion of a cloud of dots that drifted either to the left or right, with varying degrees of coherence. The results from these experiments suggested that individual neurons judge motion direction just as well as, or even better than, the animal does, and predicted that perceptual judgment of motion direction might only require a handful of cells (Britten et al., 1992). In a follow-up study, Salzman et al. (1990) would show that local electrical stimulation of physiologically characterized MT neurons within a single “direction column” is enough to influence the perception of motion in the direction associated with the stimulated column. Work over the next decades that measured the responses of individual MT neurons more specifically on the timescale of perceptual decisions, and took noise correlations into account, found that animals tend to perform significantly better than individual MT neurons (Cook and Maunsell, 2002; Cohen and Newsome, 2009).
Recent technical developments that enable one to read–write neuronal activity in vivo, at single-cell resolution (Packer et al., 2015; Carrillo-Reid et al., 2019; Marshel et al., 2019), provide an opportunity to further define the link between neuronal activity and perception. What remains is to determine, on single trials, exactly how many cells and of which type and in which brain region are responsible for specific forms of perception, and to what extent such computations are distributed across different levels of the visual hierarchy.
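To make the comparison between neuronal and behavioral sensitivity concrete, the following is a minimal sketch of one common way such a "neurometric" curve can be computed: an ROC-style analysis that asks how often a spike count on a preferred-direction trial exceeds a spike count on a null-direction trial. The spike-count model, the baseline rate, the gain, and the coherence values are all hypothetical choices made for illustration; they are not taken from the studies cited above.

```python
import random


def roc_area(pref_counts, null_counts):
    """Probability that a randomly drawn preferred-direction count exceeds
    a randomly drawn null-direction count (ties count as one half)."""
    wins = 0.0
    for p in pref_counts:
        for n in null_counts:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pref_counts) * len(null_counts))


def simulate_counts(rng, mean, n_trials):
    """Hypothetical spike-count model: Gaussian with variance equal to the
    mean, rounded to whole spikes and clipped at zero."""
    return [max(0, round(rng.gauss(mean, mean ** 0.5))) for _ in range(n_trials)]


rng = random.Random(0)
baseline, gain = 20.0, 0.3  # assumed values, chosen only for illustration

neurometric = {}
for coherence in (0.0, 3.2, 12.8, 51.2):  # percent coherent dots (assumed)
    pref = simulate_counts(rng, baseline + gain * coherence, 200)
    null = simulate_counts(rng, max(1.0, baseline - gain * coherence), 200)
    neurometric[coherence] = roc_area(pref, null)

for coherence, p_correct in neurometric.items():
    print(f"coherence {coherence:5.1f}%  predicted proportion correct {p_correct:.2f}")
```

Plotting these values against the animal's proportion of correct choices at each coherence would give the kind of neuron-versus-behavior comparison described above. Because this sketch treats a single simulated neuron in isolation, it ignores noise correlations between neurons, which the later work cited above found to matter.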
The whole visual system
Felleman and Van Essen (1991) Distributed hierarchical processing in the primate cerebral cortex
Positing distinct roles for different parts of the brain has a long history. Some Greek and Roman thinkers proposed different roles for cerebrum and cerebellum based on differences in how soft they were to touch; in the Middle Ages, a popular idea ascribed unique brain functions to the ventricles (Gross, 1998; Finger, 2001). Our modern understanding of the localization of function stems from work in human patients exhibiting specific neurological problems (Broca, 1861) and from precise electrical stimulation and lesion studies in both animal models and humans (Penfield and Rasmussen, 1950; Fritsch and Hitzig, 2009). This led to encyclopedic subdivisions of cortex: Ferrier (1886) defined over a dozen distinct functional parts of cortex; Brodmann (1909) divided cerebral cortex into over 50 areas. What Felleman and Van Essen (1991) provided was far more than a simple update to cortical subdivisions, although they did that too. Instead, they provided a hierarchical connectivity map that showed all the known paths taken by information as it passes from eye to brain. This was back-breaking work: they performed a meta-analysis of all previous work on primate visual cortical connectivity, anatomy, and physiology, building upon their earlier work (Van Essen and Maunsell, 1983) as well as work from others (Allman and Kaas, 1976; Zeki, 1978). To generate a functional hierarchy of connected visual areas, they leveraged recently uncovered long-range cortical connectivity principles that had started to coalesce: feedforward projections tend to project to layer 4, whereas feedback projections tend to avoid layer 4 (Rockland and Pandya, 1979). Their opus outlines 32 visual cortical areas organized across 10 hierarchical levels, with each level being highly interconnected. They attempted to generalize their conclusions from vision to other senses and to other species, and set the tone for more recent efforts, such as the Allen Mouse Brain Connectivity Atlas (Oh et al., 2014).
However, as we learn more about the diversity of cell types and connectivity profiles even within a single region of the visual system (Zeng et al., 2012; Jiang et al., 2015; Tasic et al., 2016, 2018; Gouwens et al., 2019), we will need to reassess and significantly update our whole-brain functional connectivity atlases and contemplate how this affects our understanding of hierarchies of visual processing.
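The laminar rule described above (feedforward projections target layer 4, feedback projections avoid it) lends itself to a simple illustration of how pairwise connection labels can be turned into hierarchical levels. The sketch below is a toy under stated assumptions, not the procedure used by Felleman and Van Essen: the list of feedforward connections is a small hand-picked example, and the rule of placing each area one level above its highest feedforward input is an assumption made here for clarity.

```python
def hierarchy_levels(feedforward):
    """Assign each area a level one above its highest feedforward input.

    `feedforward` is a list of (source, target) pairs, each meaning that the
    source area sends a feedforward-type projection to the target area.
    """
    areas = {area for pair in feedforward for area in pair}
    inputs = {area: set() for area in areas}
    for source, target in feedforward:
        inputs[target].add(source)

    levels = {}

    def level(area, path=()):
        if area in path:
            # A loop of feedforward labels cannot be placed in a strict hierarchy.
            raise ValueError(f"feedforward cycle involving {area}")
        if area not in levels:
            levels[area] = 1 + max(
                (level(src, path + (area,)) for src in inputs[area]), default=0
            )
        return levels[area]

    for area in areas:
        level(area)
    return levels


# Toy, simplified connection list used only for illustration (assumed).
toy_feedforward = [
    ("V1", "V2"),
    ("V1", "MT"),
    ("V2", "V4"),
    ("V2", "MT"),
    ("V4", "IT"),
]

toy_levels = hierarchy_levels(toy_feedforward)
for area, lvl in sorted(toy_levels.items(), key=lambda item: (item[1], item[0])):
    print(f"level {lvl}: {area}")
```

In this toy example MT lands above V2 because it receives feedforward input from both V1 and V2. Real anatomical data contain ambiguous and conflicting laminar patterns, so any such assignment is at best one of several consistent solutions.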
Conclusion
David Hubel famously described his habit of “reading as little as possible in…neurophysiology,” instead relying on colleagues to keep him informed of important findings (Hubel and Wiesel, 2005). We hope that the 25 articles highlighted above, and the references embedded therein, can serve a similar role as Hubel's helpful colleagues. It is inevitable, though, in putting together a classic reading list that many foundational studies will be left out. We regret that the list did not include work related to many topics, including but not restricted to, functional connectivity (Gilbert and Wiesel, 1989), color processing (Livingstone and Hubel, 1984), binocular or stereo vision (Barlow et al., 1967), attention (Cohen and Maunsell, 2009), eye movements (Wurtz and Goldberg, 1972; Schiller and Stryker, 1972), predictive coding (Rao and Ballard, 1999), circuit development (Rakic, 1974), molecular cues underlying development (Nakamoto et al., 1996), single-cell sequencing (Peng et al., 2019), and recent advances in computational modeling (Yamins et al., 2014). However, regarding working in a field that already possessed a “staggering mass of literature,” in the 1920s Selig Hecht wrote that it was “with much trepidation” that he would write any scientific article about the visual system at all, going so far as to say that his hope was “not to add to the existing material, but rather to subtract from it” (Hecht, 1924). In writing this review, we have tried to embody this spirit.
Materials and Methods
We began this process by contacting, via e-mail, roughly 50 leading visual neuroscientists from around the world. We asked each to provide a top 25 list (though we indicated that this number was flexible) of their favorite articles in the field of visual neuroscience, or of those they felt were most influential. Our only, albeit loose, constraint was that we sought to tell the story of “conscious” visual perception, but we gave contributors flexibility to detour as they saw fit. In choosing scientists to contact, we tried to select established researchers who focus on various regions of the visual system, who use different model systems, and who use different experimental approaches. In total, we received lists from 23 neuroscientists and used their lists as “votes,” from which we compiled a community consensus list of the top 25 articles. It is important to note that while we tried to limit our bias on the outcome of the final list (e.g., we contacted a diverse set of visual neuroscience researchers and did not vote ourselves), there is likely a small bias toward our specific subfields of research, as researchers whom we personally know were more likely to provide lists when contacted. We regret that several important topics in visual neuroscience do not feature in this list, and we have tried to correct this by briefly outlining some of these important topics in the concluding paragraph of the text.
We thought it would be informative to provide a brief meta-analysis of the lists that we received from our contributors. From 23 contributors, over 250 unique articles were selected, spanning in time from 1924 to 2019. Approximately 200 articles were selected by only a single contributor, indicating the diversity of contributor opinions. Within each section of this article, whenever possible we attempted to refer to articles that were selected by our contributors but that did not make the top 25 list. The most highly voted article was Hubel and Wiesel (1962), with 10 votes. A total of 25 articles received three votes each or more, and these are what we used to make the final list. Six of these finalists also appear in Kuffler's Oldies but Goodies collection. The preface and table of contents of Stephen Kuffler's Oldies but Goodies classic reading list are included as Extended Data Fig. 1-1.
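The tallying procedure described above can be sketched in a few lines. This is only an illustration of the counting logic: the ballots below are made-up placeholders rather than the contributors' actual lists, and the function name `consensus_list` is hypothetical.

```python
from collections import Counter


def consensus_list(ballots, min_votes=3):
    """Count one vote per contributor per article and keep the articles that
    reach `min_votes`, sorted by votes (descending) and then by name."""
    votes = Counter(article for ballot in ballots for article in set(ballot))
    kept = [(article, n) for article, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Placeholder ballots (assumed), one list of article labels per contributor.
example_ballots = [
    ["Article A", "Article B", "Article C"],
    ["Article A", "Article B"],
    ["Article A", "Article D"],
    ["Article A", "Article B", "Article D"],
]

for article, n in consensus_list(example_ballots):
    print(f"{article}: {n} votes")
```

Using `set(ballot)` ensures that an article listed twice by the same contributor is counted once; the threshold of three votes mirrors the cutoff stated above.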
Footnotes
This research was supported by funding from both the Canada Research Chairs program and the Alfred P. Sloan Foundation to S.T. and A.K. We thank all of the following contributors who provided us with lists: Alessandra Angelucci, Gautam Awatramani, Vijay Balasubramanian, Marlene Behrmann, David Berson, Tobias Bonhoeffer, Richard Born, John Dowling, Marla Feller, William Guido, Nancy Kanwisher, John Maunsell, Markus Meister, Anthony Movshon, Fred Rieke, Massimo Scanziani, Joshua Sanes, Michael Shadlen, Carla Shatz, Murray Sherman, Joshua Trachtenberg, Doris Tsao, and Leslie Ungerleider. We also thank Arjun Bharioke, Richard Born, Erik Cook, John Dowling, Markus Meister, Christopher Pack, Edward Ruthazer, and Leslie Ungerleider for discussions and comments on the manuscript. We thank Keila Garcia for providing a translation of the legend in Figure 1.
The authors declare no competing financial interests.
Correspondence should be addressed to Stuart Trenholm at stuart.trenholm@mcgill.ca or Arjun Krishnaswamy at arjun.krishnaswamy@mcgill.ca.