Abstract
Real-world actions require one to simultaneously perceive, think, and act on the surrounding world, which demands the integration of (bottom-up) sensory information and (top-down) cognitive and motor signals. Studying these processes involves the intellectual challenge of cutting across traditional neuroscience silos, and the technical challenge of recording data in uncontrolled natural environments. However, recent advances in techniques, such as neuroimaging, virtual reality, and motion tracking, allow one to address these issues in naturalistic environments for both healthy participants and clinical populations. In this review, we survey six topics in which naturalistic approaches have advanced both our fundamental understanding of brain function and our understanding of how neurologic deficits influence goal-directed, coordinated action in naturalistic environments. The first part conveys fundamental neuroscience mechanisms related to visuospatial coding for action, adaptive eye-hand coordination, and visuomotor integration for manual interception. The second part discusses applications of such knowledge to neurologic deficits, specifically, steering in the presence of cortical blindness, the impact of stroke on visual-proprioceptive integration, and the impact of visual search and working memory deficits on movement sequences. This translational approach—extending knowledge from lab to rehab—provides new insights into the complex interplay between perceptual, motor, and cognitive control in naturalistic tasks that are relevant for both basic and clinical research.
Introduction
Real-world sensorimotor behavior—from grasping a toothbrush after breakfast to washing the dishes after dinner—seems so simple that it is easily taken for granted, so long as one is healthy. But for a neuroscientist, such “eye-hand coordination” behavior requires the seamless integration of multiple brain processes. The visual aspects alone involve object recognition and localization within complex scenes, and then the integration of this information with proprioception and other senses for use in action. The action systems include coordinated control of the eyes, head, and hand in these examples. These mechanisms are guided by top-down cognitive processes, including directed attention, decision-making, and planning. Daily goal-directed behaviors that we take for granted can be severely impacted by neurologic deficits, posing major challenges for various patient populations.
Scientists face several challenges in understanding how the brain coordinates real-world behavior, and in understanding and treating deficits in real-world sensorimotor behavior. One challenge is that the neural mechanisms for sensory systems, perception, cognition, sensorimotor transformations, and motor coordination have traditionally been studied in isolation, whereas one must understand how they interact to study real-world, coordinated action (Ballard et al., 1992; Johansson et al., 2001; Crawford et al., 2004; Land, 2006). A second challenge is that natural behavior, by its nature, is relatively uncontrolled compared with typical psychophysics tasks, hard to replicate in the laboratory, and hard to measure, especially in conjunction with brain activity. This challenge has been mitigated by more sophisticated visual presentation technologies, body motion tracking, and wearable devices for behavioral and neuroimaging studies (Fig. 1A).
Measuring coordinated action with different experimental tools. A, Top, The neural control of eye-hand and body coordination can be probed using MRI-compatible tablets and eye-trackers in the scanner or by developing portable setups (e.g., portable EEG systems). Middle, Using a robotic manipulandum allows experimental control of the visual and movement space (e.g., mechanical perturbations), and a head-fixed setup enables well-calibrated, high-precision eye tracking while investigating real object manipulation. Bottom, Studies using virtual reality setups or head-mounted eye-tracking glasses allow the study of eye-hand coordination in naturalistic environments. B, Schematic of how different experimental tools vary along two axes: the experimenter's ability to control the visual and physical environment (x axis) and the degree to which the observed behavior translates to the real world (y axis). Green boxes represent behavioral methods. Gray boxes represent neuroimaging techniques.
One approach to overcome the challenge of maintaining experimental control in a realistic environment is to study action tasks in virtual reality (Fig. 1B), which allows the manipulation of object appearance and physical properties within a naturalistic scene (Diaz et al., 2013; Rolin et al., 2019; Cesanek et al., 2021). Despite these advantages, the ecological validity of the virtual presentation of objects for action has been questioned (Harris et al., 2019). Another approach is to use portable eye and head tracking in combination with algorithmic methods that classify eye and/or body movements (Nath et al., 2019; Kothari et al., 2020; Matthis and Cherian, 2022). These recent technological advances in hardware and software allow us to move research on visuomotor control from the laboratory to real-world applications, with the ultimate goal of translating fundamental findings to natural behavior in healthy and brain-damaged individuals.
The current review, based on the 2023 Society for Neuroscience Mini-Symposium Perceptual–Cognitive Integration for Coordinated Action in Naturalistic Environments, illustrates such progress through six topics that investigate the mechanisms for goal-directed, coordinated action using technologies that span the naturalistic approaches shown in Figure 1. The first three topics focus on fundamental research questions (visuospatial coding of action-relevant object location, adaptive eye-hand coordination, visuomotor integration for manual interception), whereas the final three highlight applications to the understanding of neurologic deficits (steering in the presence of cortical blindness (CB), visual-proprioceptive integration in stroke survivors, impact of working memory deficits on visual and manual search). Together, these topics provide a coherent narrative that illustrates how studying naturalistic behaviors improves the potential for the translation of neuroscience to real-world clinical situations.
Goal-directed action in healthy individuals
An understanding of the brain, and its deficits, begins with the study of normal function. This first section addresses several such topics, starting with visuospatial coding of action goals, fundamental behavioral aspects of eye-hand coordination, and then considers specific visuomotor mechanisms for interception. In each case, we consider how laboratory experiments can be translated to real-world conditions.
Spatial coding for goal-directed action in naturalistic environments
A first step in understanding the neural mechanisms for goal-directed reach involves the mechanisms whereby the visual system encodes spatial goals. This can be accomplished using one of two main categories of reference frame. The first type (egocentric reference frames) involves determining the position of an object with respect to the self (i.e., eye position, hand position, etc.) (Blohm et al., 2009; Crawford et al., 2011). Typically, this has been associated with dorsal regions of cortex (Fig. 2A), particularly within medial posterior parietal cortex and frontal regions (Zaehle et al., 2007; Chen et al., 2014). These regions are able to use current or remembered visual information to program appropriate motor plans online (Buneo and Andersen, 2006; Piserchia et al., 2017).
Schematic overview of the major brain regions and pathways involved in goal-directed eye-hand coordination. Pathways for vision, object and motion perception, eye movements, and top-down cognitive strategies are integrated with cortical, subcortical, and cerebellar networks for sensorimotor transformations to produce coordinated action. A, Left brain lateral view. B, Right brain medial view. LOC, Lateral occipital cortex; MT, middle temporal area; PPC, posterior parietal cortex; SPL 7, superior parietal lobule area 7; SPL 5, superior parietal lobule area 5; IPL, inferior parietal lobule; S1, somatosensory cortex; M1, primary motor cortex; PMC, premotor cortex; SMA, supplementary motor area; PMd, dorsal premotor cortex; FEF, frontal eye fields; PMv, ventral premotor cortex; dlPFC, dorsolateral PFC; CBM, cerebellum; SC, superior colliculus; LGN, lateral geniculate nucleus; PPA, parahippocampal place area; LG, lingual gyrus; Cn, cuneus. Created with Biorender.com.
However, in real-world circumstances, egocentric mechanisms are often augmented or replaced by the use of allocentric reference frames (i.e., the coding of object position relative to other surrounding visual landmarks). The ventral visual stream (Fig. 2A) has been associated with the use of allocentric reference frames for object location (Adam et al., 2016). Humans can be instructed to rely on one or the other cue through “top-down” instructions (Chen et al., 2011; Chen and Crawford, 2020), but normally this weighting occurs automatically, through “bottom-up” processing of sensory inputs (Byrne and Crawford, 2010). In this case, egocentric and allocentric information is weighted differently, depending on context (Neely et al., 2008; Fiehler and Karimpur, 2023).
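To make the idea of context-dependent weighting concrete, the target location ultimately used for action can be thought of as a weighted combination of an egocentric estimate and a landmark-based (allocentric) estimate. The following is a minimal sketch of such a combination, assuming a single context-dependent weight; the function and variable names are illustrative and do not correspond to a model from the cited studies.

```python
import numpy as np

def combine_target_estimates(ego_estimate, allo_estimate, w_allo):
    """Weighted combination of egocentric and allocentric target estimates.

    ego_estimate : target position coded relative to the observer (e.g., gaze).
    allo_estimate: target position reconstructed from remembered offsets to
                   surrounding landmarks, in the same coordinates.
    w_allo       : context-dependent weight (0 = ignore landmarks,
                   1 = rely on landmarks only), e.g., larger when landmarks
                   are stable and task-relevant.
    """
    ego_estimate = np.asarray(ego_estimate, dtype=float)
    allo_estimate = np.asarray(allo_estimate, dtype=float)
    return (1.0 - w_allo) * ego_estimate + w_allo * allo_estimate

# Example: a landmark shift pulls the reach goal toward the allocentric
# estimate in proportion to the weight given to the landmarks.
ego = [10.0, 0.0]    # degrees, relative to gaze (hypothetical values)
allo = [12.0, 0.5]   # position implied by the shifted landmarks
print(combine_target_estimates(ego, allo, w_allo=0.3))  # -> approximately [10.6, 0.15]
```

In such a scheme, the landmark-shift paradigms described above estimate the allocentric weight from how far reach endpoints are dragged along with the displaced landmarks.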
To understand how those findings translate to real-world environments, recent investigations have opted for testing within more naturalistic environments, such as 2D complex scenes or 3D virtual environments (Fiehler and Karimpur, 2023). These studies have shown that cognitive factors influence the weighting of egocentric and allocentric visual information, including task relevance (Klinghammer et al., 2015) and prior knowledge (Lu et al., 2018). Another factor in the coding of object location that has garnered recent attention is object semantics. For example, if a scene contains objects from two different semantic categories (e.g., food vs utensils in a kitchen), a target object's perceived location will be influenced more by surrounding objects of the same category, compared with unrelated objects (Karimpur et al., 2019). However, many questions remain about how scene semantics affect the spatial coding of objects, including whether these effects arise from individual object properties, from interactions among objects, and/or whether they are specific to particular scenes (Võ, 2021).
Given the richness of information present within scenes, one way to systematically understand natural behaviors is to categorize scenes according to a particular “grammar” defined by specific building blocks in a hierarchical structure (Fig. 3A) (Võ, 2021). In this hierarchy, the lowest level is the (local) object level (see above) (Karimpur et al., 2019), whereby scene context information can be extracted from the nature or category of the objects (small, moveable objects) found therein (e.g., kitchen-related objects are cues to the likelihood that one is in a kitchen). At present, this is the only level where a causal relationship between semantic information and spatial coding of objects has been established (Karimpur et al., 2019). Intuitively, when one thinks about designing a given scene, the large global objects (i.e., large, immovable objects with strong, statistical local object associations) are the first ones whose positions are defined (Draschkow and Võ, 2017) and, only later, are the local objects placed. Accordingly, the next level of the scene grammar (semantics) hierarchy is the global object level (Võ, 2021). Given the relationship between these two levels in the scene grammar hierarchy of location perception, it remains to be determined how the global object level may also influence local object spatial coding in naturalistic environments. Current efforts are investigating these relationships, as well as how gaze can guide spatial coding within different scenes (Boettcher et al., 2018; Helbing et al., 2022). Understanding how scene semantic information and spatial coding mechanisms interact will play an important role in understanding how surrounding stimuli influence object recognition, localization, and action.
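As a rough illustration, the scene grammar hierarchy sketched above (and in Fig. 3A) can be thought of as a nested structure in which phrases contain global (anchor) objects, which in turn predict associated local objects. The toy representation below is only a schematic of this idea, not a formalism used in the cited work; the example objects are hypothetical.

```python
# A kitchen scene expressed as a scene-grammar-like hierarchy:
# phrase -> global (anchor) object -> associated local objects.
kitchen_scene = {
    "washing-up phrase": {
        "sink": ["sponge", "dish soap", "drying rack"],
    },
    "cooking phrase": {
        "stove": ["pan", "spatula", "salt shaker"],
    },
}

def predicted_local_objects(scene, phrase, anchor):
    """Return the local objects predicted by a given anchor within a phrase."""
    return scene.get(phrase, {}).get(anchor, [])

print(predicted_local_objects(kitchen_scene, "cooking phrase", "stove"))
# -> ['pan', 'spatula', 'salt shaker']
```

Framing scenes this way makes explicit which level (local objects, global anchors, or whole phrases) is assumed to drive the semantic effects on spatial coding described above.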
Visual signals for action. A, Visual scenes follow a “grammar” defined by building blocks in a hierarchical structure, consisting of phrases (top), global objects (middle), and associated local objects (bottom) (Võ, 2021). B, When reaching to stationary objects, the target object is commonly fixated throughout the reach, which allows the integration of foveal vision of the object and peripheral vision of the hand. C, When intercepting moving objects, the eyes either track the moving object with SPEMs or fixate on the expected interception location to guide the hand toward the object.
Toward adaptive eye-hand coordination in real-world actions
Once objects are localized, the brain can initiate the equally important process of acting on them. Real-world actions, such as cooking, involve the coordination of eye movements and goal-directed actions (Land, 2006). Past research has shown that eye and hand movements are functionally coordinated in a broad range of manipulation and interception tasks (de Brouwer et al., 2021; Fooken et al., 2021). When reaching for and manipulating objects, eye movements support hand movements by fixating critical landmarks as the hand approaches, and then shifting to the next landmark (Ballard et al., 1992; Land et al., 1999; Johansson et al., 2001).
In controlled laboratory environments, the eyes often remain anchored to the reach goal throughout the movement (Neggers and Bekkering, 2000, 2001). This has been associated with the inhibition of neuronal firing in the parietal saccade region by neurons in the parietal reach region (Fig. 2) (Hagan and Pesaran, 2022). In this case, where gaze (foveal vision) remains fixed on the reach goal, the hand is initially viewed in peripheral vision and then transitions across the retina toward the fovea (Fig. 3B). This allows one to compare foveal vision of the target with peripheral vision of the hand to directly compute a reach vector in visual coordinates, whereas comparison of target and hand location in somatosensory coordinates requires extraretinal signals for eye and hand position (Sober and Sabes, 2005; Buneo and Andersen, 2006; Beurze et al., 2007; Khan et al., 2007). Further, peripheral vision of the hand can be used to rapidly (∼150 ms) correct for reach errors (Paillard, 1996; Sarlegna et al., 2003; Saunders and Knill, 2003, 2004; Smeets and Brenner, 2003; Dimitriou et al., 2013; de Brouwer et al., 2018). Fixating the goal at reach completion allows the use of central (i.e., parafoveal) vision, which engages slow visual feedback loops to guide fine control of object contact (Paillard, 1996; Johansson et al., 2001). In addition, central vision is used to monitor and confirm completion of task subgoals (Safstrom et al., 2014). Conversely, deviations of gaze from the goal can degrade reach accuracy and precision, especially in memory-guided reaching (Henriques et al., 1998), and can either improve or degrade performance in patients with visual field deficits, depending on the direction of the goal relative to the deficit (Khan et al., 2005a, 2007).
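The computational contrast drawn above can be illustrated with a small sketch: when gaze is on the target, a reach vector can be formed directly by comparing the retinal locations of target and hand, whereas the equivalent comparison in body coordinates requires an extraretinal eye-position signal and a felt hand position. This is a deliberately simplified 2D illustration under a small-angle approximation (vector addition in place of full 3D reference frame transformations), not a model from the cited studies.

```python
import numpy as np

def reach_vector_retinal(target_retinal, hand_retinal):
    """Reach vector in visual (retinal) coordinates: no extraretinal
    signals are needed when both target and hand are seen."""
    return np.asarray(target_retinal) - np.asarray(hand_retinal)

def reach_vector_body(target_retinal, hand_proprioceptive, eye_position):
    """Reach vector in body coordinates: the retinal target is first
    combined with an extraretinal eye-position signal (small-angle
    approximation: simple vector addition) and then compared with the
    felt position of the hand."""
    target_body = np.asarray(target_retinal) + np.asarray(eye_position)
    return target_body - np.asarray(hand_proprioceptive)

# Hypothetical example (units: degrees of visual angle / matching body-frame units)
target_ret = [0.0, 0.0]     # foveated target
hand_ret = [-8.0, -3.0]     # hand seen in the lower-left periphery
print(reach_vector_retinal(target_ret, hand_ret))          # -> [8. 3.]

eye_pos = [10.0, 5.0]       # gaze direction relative to the body
hand_prop = [2.0, 2.0]      # felt hand position in body coordinates
print(reach_vector_body(target_ret, hand_prop, eye_pos))   # -> [8. 3.]
```

The two routes agree here only because the hypothetical signals are mutually consistent and noise-free; with noisy or biased eye- and hand-position signals the computations diverge, which is one reason gaze position matters for reach accuracy.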
In naturalistic tasks, there is often a trade-off between fixating the eyes on the immediate movement goal versus deviating gaze toward other action-relevant information. Daily activities, such as locomotion (Jovancevic-Misic and Hayhoe, 2009; Matthis et al., 2018; Domínguez-Zamora and Marigold, 2021), ball catching (Cesqui et al., 2015; López-Moliner and Brenner, 2016), or navigation (Zhu et al., 2022), require the integration of low-level perceptual information and high-level cognitive goals (Tatler et al., 2011). The ongoing decision-making about which fixation strategy to prioritize, and when and where to move the eyes, depends on the visuomotor demands of the task (Sims et al., 2011), the visual context of the action space (Delle Monache et al., 2019; Goettker et al., 2021), and the availability of multisensory signals (Wessels et al., 2022; Kreyenmeier et al., 2023). Because action demands and perceptual context constantly change in naturalistic tasks, a major challenge for future research is to quantitatively describe ongoing visuomotor control in relation to changing environments.
Although humans perceive and act on the environment in parallel, the two mechanisms have mostly been studied in isolation. Critically, the dynamic interaction between perception and action depends on when and where relevant visual information is needed or becomes available. Whereas many perceptual events, such as the color change of a traffic light, are determined externally, the timing of motor events, such as contacting a target object, can to some extent be controlled by the actor. Past research has shown that people learn to predict temporal regularities of the environment (Nobre and Van Ede, 2023), and this knowledge can be exploited to intelligently aim eye movements while monitoring competing locations of interest (Hoppe and Rothkopf, 2016). An open research question is whether and how people use their perceptual expectation of external events to continuously adapt the timing of their ongoing movements and eye-hand coordination. A promising approach to understand the dynamic interaction between perceiving, thinking, and acting is to design tasks that require multitasking (e.g., when participants perform a perceptual task in parallel with an action task).
Visuomotor integration for manual interception
A major challenge for real-world eye-hand coordination is that action goals are not always stable. A case in point is manual interception behavior. Manual interception is a goal-directed action in which the hand attempts to catch, hit, or otherwise interact with a moving object. Studying interceptive actions offers two major advantages for understanding perceptual-motor integration in naturalistic environments: First, manual interceptions are ubiquitous in everyday tasks (e.g., quickly reacting to catch a falling object; Fig. 3C) and are a hallmark of skilled motor performance (e.g., in many professional sports). Second, interception tasks allow for simple, systematic manipulation of separable parameters related to both bottom-up sensory properties (e.g., target speed, occlusion) and top-down cognitive-motor strategies (e.g., deciding on interception location, accuracy demands) (Zago et al., 2009). Interceptions rely on combining sensorimotor predictions with online sensory information about the object and the hand, enabling continuous adjustments to intercept objects under varied spatial and temporal constraints (Brenner and Smeets, 2018). Precise timing is facilitated by integrating both object motion kinematics and temporal cues (Chang and Jazayeri, 2018), allowing for enhanced temporal predictions beyond what would be expected from perceptual judgments alone (de la Malla and López-Moliner, 2015; Schroeger et al., 2022).
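A minimal sketch can make the prediction component concrete: given an estimate of the target's current position and velocity, a first-order (constant-velocity) extrapolation yields a predicted interception point and the hand velocity needed to arrive there in time. This toy example assumes constant target velocity and a preselected interception time; it is illustrative only and not the model used in the cited studies.

```python
import numpy as np

def predict_interception(target_pos, target_vel, time_to_contact):
    """Predict where a constant-velocity target will be after time_to_contact
    seconds; the hand must reach this point within the same time window."""
    return np.asarray(target_pos) + np.asarray(target_vel) * time_to_contact

def required_hand_velocity(hand_pos, interception_point, time_to_contact):
    """Average hand velocity needed to arrive at the interception point in time."""
    return (np.asarray(interception_point) - np.asarray(hand_pos)) / time_to_contact

# Hypothetical example: a ball 0.5 m to the left, moving rightward at 1 m/s.
ball_pos, ball_vel = [-0.5, 0.3], [1.0, 0.0]   # meters, meters/second
t_contact = 0.4                                 # planned interception time (s)
goal = predict_interception(ball_pos, ball_vel, t_contact)
print(goal)                                                  # -> [-0.1  0.3]
print(required_hand_velocity([0.0, 0.0], goal, t_contact))   # -> [-0.25  0.75]
```

In real interception, such predictions are continuously updated from new visual estimates of target position and velocity, which is what allows ongoing adjustments under changing spatial and temporal constraints (Brenner and Smeets, 2018).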
Eye movements play an integral role in interception performance (for review, see Fooken et al., 2021). As is the case with stationary objects (see above), deviations of gaze from the tracked object systematically degrade performance (Dessing et al., 2011). To maintain gaze on a moving object, humans combine saccades with smooth pursuit eye movements (SPEM) to enhance spatiotemporal precision for interception (van Donkelaar and Lee, 1994; Fooken et al., 2016), but engaging the SPEM system is less useful when the object motion trajectory is predictable or greater accuracy is required (de la Malla et al., 2019). When interception depends on accurately perceiving a moving object's shape (e.g., circle or ellipse), initial saccades are faster and gaze lags farther behind the object, compared with situations where object shape is irrelevant (Barany et al., 2020a), suggesting that eye movement strategies are adapted to task demands.
Compared with studies of reaching to static targets (Battaglia-Mayer and Caminiti, 2019), relatively few studies have investigated the neural basis of manual interception. Neurophysiological and neuroimaging studies of target interception have revealed that dorsal visual regions (Fig. 2A), including the visual middle temporal area (Bosco et al., 2008; Dessing et al., 2013) and the superior parietal lobule area 7 in posterior parietal cortex (Merchant et al., 2004; de Azevedo Neto and Júnior, 2018; Li et al., 2022), are involved in dynamically transforming visual motion information into motor plans. This sensorimotor information is reflected in primary motor cortex outputs (Merchant et al., 2004; Marinovic et al., 2011) and conveyed to brainstem areas, such as the superior colliculus, to trigger rapid interception (Contemori et al., 2021). Connections between cortex and the cerebellum (Spampinato et al., 2020) also contribute to accurate sensory prediction for timing of interception (Fig. 2) (Diedrichsen et al., 2007; Therrien and Bastian, 2019).
Neuroimaging studies of human manual interception are lacking, but advances in neuroimaging techniques for studying visuomotor interactions with static targets (for review, see Gallivan and Culham, 2015) can guide future approaches. For example, hand movement kinematics have been recorded during fMRI data acquisition using real-time video-based motion tracking (Barany et al., 2014), MRI-compatible tablets (Karimpoor et al., 2015), and real 3D action setups (Marneweck et al., 2018; Knights et al., 2022; Velji-Ibrahim et al., 2022) to reveal distinct activation patterns associated with naturalistic reaching and grasping across the cortical sensorimotor network (Fig. 2). Eye movements during scanning can now be directly reconstructed from the MR signal (Frey et al., 2021; Kirchner et al., 2022) when using MRI-compatible eye-trackers is not feasible. Applying more sensitive multivariate data analysis techniques, such as representational similarity analysis, to complex action tasks has uncovered task-dependent neural representations of low-level muscle information and high-level kinematic and task demand information in primary motor cortex (Barany et al., 2020b; Kolasinski et al., 2020). Improvements in source localization in EEG and sophisticated techniques for studying whole-brain functional “connectivity,” such as graph theory analysis, can likewise contribute to studying the temporal evolution of sensory and motor information across frontal and parietal regions in less constrained environments (Ghaderi et al., 2023). Finally, emerging mobile neuroimaging systems (Stangl et al., 2023) are a promising tool for understanding neural dynamics in naturalistic settings, such as the table tennis task illustrated in Figure 1A. This experiment showed fluctuations in parietal-occipital and superior parietal cortices related to the object kinematics and predictability of the upcoming interception (Studnicki and Ferris, 2023). A multimodal combination of these new methods may help close the gap in our understanding of how the human brain continuously plans movements when action goals are not stable.
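As one concrete example of the multivariate approaches mentioned above, representational similarity analysis compares the dissimilarity structure of activity patterns across task conditions with that predicted by candidate models. The sketch below computes a representational dissimilarity matrix (RDM) from condition-by-voxel patterns using correlation distance and compares it with a hypothetical model RDM; the data are random placeholders and the model is invented for illustration, not taken from the cited studies.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Placeholder data: activity patterns for 4 conditions across 50 voxels
# (e.g., interceptions of slow/fast targets moving in two directions).
patterns = rng.normal(size=(4, 50))

# Neural RDM: pairwise correlation distance (1 - Pearson r) between conditions.
neural_rdm = squareform(pdist(patterns, metric="correlation"))

# Hypothetical model RDM predicting that target speed (conditions 0-1 vs 2-3)
# drives the dissimilarity structure.
model_rdm = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [1, 1, 0, 0],
                      [1, 1, 0, 0]], dtype=float)

# Compare the RDMs on their upper-triangle entries with a rank correlation,
# as is common in representational similarity analysis.
iu = np.triu_indices(4, k=1)
rho, _ = spearmanr(neural_rdm[iu], model_rdm[iu])
print(f"model-neural RDM correlation: {rho:.2f}")
```

Applied to interception data, RDMs built from competing models (e.g., target kinematics vs required hand kinematics) could help identify which variables are represented in a given region.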
Deficits in goal-directed action produced by neural damage
In this second half of the review, we explore ways in which the study of naturalistic behaviors following brain damage (primarily stroke) both sheds light on brain function and suggests possible means of diagnosis and rehabilitation. These topics range from the influence of central visual processing on real-world behaviors, such as driving, to the breakdown of integration between vision and the internal sense of body position (proprioception), to the impact of visual search and memory capacity on movement control after brain damage. Each example highlights how translational neuroscience helps one to understand the real-world impact of cortical damage.
CB and the use of optic flow in steering and navigation
Each year, up to half a million individuals in the United States suffer from stroke-related damage to the primary visual cortex (V1). When this damage is unilateral, it leads to CB in a quarter to a half of the contralateral visual field (Gilhotra et al., 2002; Pollock et al., 2011). CB negatively impacts autonomy and overall quality of life in many ways, including the ability to safely or legally drive (Papageorgiou et al., 2007; Gall et al., 2010; Pollock et al., 2011). Although individuals with CB are legally prohibited from driving in at least 22 U.S. states, those CB-affected drivers who choose to exercise their legal right to remain on the road experience significantly more motor vehicle accidents than visually intact controls (McGwin et al., 2016) and demonstrate more variable steering behavior (Bowers, 2016).
One possible explanation for these behavioral deficits is that the processing of global motion patterns commonly used to guide steering (optic flow) (Gibson, 1950; Warren et al., 2001) is disrupted in the presence of CB (Fig. 4) (Issen, 2013). In healthy individuals, partially obscuring or omitting portions of the visual field does not significantly degrade the perception of heading (Warren and Kurtz, 1992), so driving deficits with CB are not likely because of a simple omission of visual information. An alternative explanation is that the blind field acts as a generator of neural noise (Cavanaugh et al., 2015). It is possible that this noise is spatially integrated with unaffected regions before the downstream representation of global motion in the middle temporal area (Adelson and Movshon, 1982) (Fig. 2A) and area MSTd (Schmitt et al., 2020). MSTd receives input from the middle temporal area, but has larger receptive fields, which are thought to support the decoding of whole-field patterns of global motion that arise during translation (van Essen and Gallant, 1994). Emerging research on motion processing for navigation in these areas promises further insights into understanding the impact of CB. Other research has used artificial neural networks to estimate and/or control heading on the basis of simulated input (Layton, 2021; Mineault et al., 2021). Finally, the use of mobile eye tracking in freely moving humans has provided some of our first empirical measurements of the statistics of retinal optic flow that might shape selectivity (Dowiasch et al., 2020; Muller et al., 2023). Together, these approaches provide valuable guidance for physiological investigations into how the brain responds to natural motion stimulation during translation, and how this is affected by cortical damage.
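For pure observer translation, heading corresponds to the focus of expansion, the image point from which flow vectors radiate. The sketch below generates a noiseless radial flow field and recovers the focus of expansion by least squares; it is a toy illustration assuming pure translation with no eye rotation, and is not one of the models cited above.

```python
import numpy as np

def focus_of_expansion(points, flow):
    """Least-squares estimate of the focus of expansion (FOE).
    For purely translational flow, each flow vector is parallel to the line
    from the FOE to its image point, giving one linear constraint per vector:
        v_y * FOE_x - v_x * FOE_y = v_y * x - v_x * y
    """
    A = np.column_stack([flow[:, 1], -flow[:, 0]])
    b = flow[:, 1] * points[:, 0] - flow[:, 0] * points[:, 1]
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

# Simulate a purely radial (translational) flow field with the FOE at (2, -1).
rng = np.random.default_rng(1)
true_foe = np.array([2.0, -1.0])
pts = rng.uniform(-10, 10, size=(200, 2))
flow = 0.1 * (pts - true_foe)          # expansion away from the FOE

print(focus_of_expansion(pts, flow))   # -> approximately [ 2. -1.]
```

A blind field can be simulated in such a toy model by removing or corrupting the flow vectors falling within it, which is one way to ask whether heading errors behave more like missing information or like added noise.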
The role of optic flow in visually guided steering. A, Drivers are immersed in a simulated environment seen through a head-mounted display with integrated eye tracking. B, Example view inside the virtual reality as participants attempt to keep their head centered within a parameterized and procedurally generated roadway. This image is superimposed with a computational estimate of optic flow, indicated here as white arrows (Matthis et al., 2018). C, Assessing the effect of cortical blindness on the visual perception of heading will require the development of computational models of visually guided steering that account for the blind field. For illustrative purposes, we have superimposed the results from a Humphrey visual field test on the participant's view at a hypothetical gaze location with approximate scaling (reprinted from Cavanaugh et al., 2015 with permission).
When considering the role of optic flow experienced during navigation, it is also important to consider the role of eye movements and, in the context of CB, the compensatory gaze behaviors that appear spontaneously following the onset of CB (Elgin et al., 2010; Bowers et al., 2014; Bowers, 2016). As described above, gaze position has a strong influence on goal-directed behaviors in both healthy and brain-damaged individuals with visual field-specific deficits (Khan et al., 2005a, 2007). Likewise, altered gaze behaviors have a strong effect on steering, even in visually intact participants (Wilkie and Wann, 2003; Robertshaw and Wilkie, 2008). The effect of modified eye movements extends beyond influencing what portion of the scene is foveated: (1) saccade-related inputs to MST cause transient distortions in perceived heading direction (Bremmer et al., 2017); and (2) because the rotational component of eye motion interacts with the optic flow produced by translation (Cutting et al., 1992), eye movements have the potential to play an active role in structuring the pattern of retinal optic flow in a way that is optimized for the visual guidance of steering (Matthis et al., 2022). Recent modeling work has demonstrated that instantaneous extraretinal information about the direction of gaze relative to heading is sufficient for the reproduction of human-like steering behaviors when navigating a slalom of waypoints (Tuhkanen et al., 2023). Others suggest that navigation may also leverage path planning and internal models (Alefantis et al., 2022; but see Zhao and Warren, 2015). In the context of CB, altered gaze behaviors may indicate a shift in reliance from noisy optic flow signals toward increased weighting of alternative forms of visual information (Warren et al., 2001), such as strong allocentric cues from the road edges (Land and Horwood, 1995).
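The interaction in point (2) can be made explicit with the standard pinhole-camera flow equations (unit focal length), in which the image velocity of a point at depth $Z$ splits into a depth-dependent translational term and a depth-independent rotational term. This formulation is included here for illustration and is not drawn from the cited studies.

```latex
\begin{aligned}
\dot{x} &= \underbrace{\frac{x\,T_z - T_x}{Z}}_{\text{translation}}
        \;+\; \underbrace{x y\,\Omega_x - (1 + x^2)\,\Omega_y + y\,\Omega_z}_{\text{rotation (depth-independent)}},\\
\dot{y} &= \underbrace{\frac{y\,T_z - T_y}{Z}}_{\text{translation}}
        \;+\; \underbrace{(1 + y^2)\,\Omega_x - x y\,\Omega_y - x\,\Omega_z}_{\text{rotation (depth-independent)}},
\end{aligned}
```

where $(x, y)$ is image position, $(T_x, T_y, T_z)$ the observer's translation, and $(\Omega_x, \Omega_y, \Omega_z)$ the eye/head rotation (exact signs depend on the chosen coordinate convention). Because the rotational terms add to the translational terms at every image location, a pursuit or fixational rotation reshapes the entire retinal flow field rather than simply shifting it, which is why gaze strategy can actively structure the flow available for steering.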
These considerations emphasize the critical need to study the impact of CB in naturalistic contexts that facilitate compensatory gaze behaviors, and to translate fundamental neuroscience (see “Goal-directed action in healthy individuals”) into these contexts. Naturalistic studies will provide new insight into the mechanisms of visual function in the absence of the primary visual cortex and will refine current theories concerning the mechanisms underlying CB's disruptive influence on visual processing. Continued investigation of CB in naturalistic contexts also promises to provide new methods for assessment (Kartha et al., 2022) and rehabilitation (Kasten and Sabel, 1995) for applications to real-world settings beyond the laboratory environment (Fig. 4).
Impact of stroke on multisensory integration of vision and proprioception
In stroke, the integrity of neural processing is impacted by injury to the CNS. Quite often, this affects sensorimotor function of the upper limb (Lawrence et al., 2001). To date, problems with action-based behavior, including movement quality, coordination, and stability, are typically attributed to impairments in motor execution. However, the contributions of sensory information to the disordered output of movement are also critically important (Jones and Shinton, 2006; Carey et al., 2018; Rand, 2018). For example, as noted above, visual feedback can be used to calculate the direction of reach in eye coordinates, but the proprioceptive sense of eye and hand position is needed for interpreting visual input and calculating desired reach direction in body coordinates (Sober and Sabes, 2005; Buneo and Andersen, 2006; Khan et al., 2007).
In the past decade, several studies have highlighted the contributions of proprioceptive impairments of the upper limb toward functional impairments after stroke (Leibowitz et al., 2008; Dukelow et al., 2010; Semrau et al., 2013, 2015; Simo et al., 2014; Young et al., 2022). As one might expect, individuals who lack proprioception (because of degradation of sensory afferents) show substantial improvements in movement execution in the presence of visual feedback of the hand (Ghez et al., 1995; Sarlegna and Sainburg, 2009). However, some (∼20%) individuals with proprioceptive impairments have difficulty executing reaches from a proprioceptive reference, even with full vision of the limb (Fig. 5A, left, middle) (Semrau et al., 2018; Herter et al., 2019). Notably, these deficits could not be explained by visual field impairments or visual attention deficits (Patel et al., 2000; Meyer et al., 2016a). Thus, questions remain about the limitations of visual feedback for stroke rehabilitation, how brain injury impacts sensory integration, and how this impacts goal-directed motor behavior.
Movement impairment in stroke survivors. A, Kinesthetic Matching Task in which a robotic manipulandum (Kinarm Exoskeleton Lab) passively moved the more affected arm and stroke survivors mirror-matched with the less affected arm (left). Middle, Performance from a stroke survivor who performed well with vision of the limb (top, left and right) and without vision of the limb (top, left); other participants performed worse than control participants with and without the use of vision (bottom, left and right), although vision significantly improved performance for some (right, top and bottom). Distribution of participant performance demonstrates that only 12% of stroke survivors (total N = 261) used vision to effectively correct proprioceptively referenced movements, suggesting that vision often fails to effectively compensate for multisensory impairments (right) (Semrau et al., 2018). B, Left, Saccades made by a healthy control (top) and a stroke survivor (bottom) during hand movement (green) and hand dwell on target (yellow) in the Trails Making task. Excessive saccades in stroke survivors are associated with deficits in working memory and top-down visual search (middle). When stroke survivors make saccades during reaching, those reaching movements tend to be slower compared with when they do not make saccades during reaching (right panels).
Previous studies have produced conflicting results concerning the degree to which patients are able to compensate for proprioceptive impairments of the hand and limb using visual feedback (Darling et al., 2008; Scalha et al., 2011; Semrau et al., 2018; Herter et al., 2019). Bernard-Espina et al. (2021) have suggested that vision and proprioception are not encoded only in their native coordinate frames (i.e., retinal coordinates, joint space, etc.); integration of these signals also occurs within higher-order cortical areas that are equally susceptible to damage after stroke (Buneo and Andersen, 2006; Khan et al., 2007). This might explain why vision does not improve performance in proprioceptive-guided tasks for some stroke survivors (Fig. 5A, right) (Semrau et al., 2018; Herter et al., 2019).
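One way to see why intact visual input may nonetheless fail to rescue performance is a reliability-weighted (maximum-likelihood-style) integration sketch: if damage also adds noise at the stage where the visual estimate is remapped into a common frame with proprioception, the integrated estimate degrades even when vision itself is precise. This is a toy illustration of the idea raised by Bernard-Espina et al. (2021), not their model; all numbers and the var_transform parameter are hypothetical.

```python
import numpy as np

def integrate(mu_vis, var_vis, mu_prop, var_prop, var_transform=0.0):
    """Reliability-weighted fusion of visual and proprioceptive estimates of
    hand position (1D for simplicity). var_transform models extra noise added
    when remapping the visual estimate into the body frame (a hypothetical
    'integration-stage' impairment)."""
    var_vis_eff = var_vis + var_transform            # vision after remapping
    w_vis = (1.0 / var_vis_eff) / (1.0 / var_vis_eff + 1.0 / var_prop)
    mu = w_vis * mu_vis + (1.0 - w_vis) * mu_prop
    var = 1.0 / (1.0 / var_vis_eff + 1.0 / var_prop)
    return mu, var

# Intact integration: precise vision dominates a noisy proprioceptive sense.
print(integrate(mu_vis=0.0, var_vis=1.0, mu_prop=3.0, var_prop=9.0))
# -> approximately (0.3, 0.9): close to the visual estimate, more precise than either cue

# Impaired integration stage: the same visual input no longer helps much.
print(integrate(mu_vis=0.0, var_vis=1.0, mu_prop=3.0, var_prop=9.0,
                var_transform=25.0))
# -> approximately (2.23, 6.69): the estimate stays near the biased proprioceptive value
```

In this framing, providing vision of the limb improves performance only to the extent that the remapping between reference frames is itself spared, consistent with the observation that vision helps some, but not all, stroke survivors with proprioceptive impairments.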
Further, recent neuroimaging work has shown that damage to brain areas not normally associated with proprioception (e.g., the insula, superior temporal gyrus, and subcortical areas) can result in apparent proprioception deficits (Findlater et al., 2016; Kenzie et al., 2016; Meyer et al., 2016b; Semrau et al., 2019; Chilvers et al., 2021). This suggests that the “proprioception” network is far more expansive than previously assumed, including considerable links to visual perception (Fig. 2) (Desimone et al., 1990; Sterzer and Kleinschmidt, 2010).
In summary, impairments because of stroke and other neurologic injuries are typically investigated using unimodal approaches, but recent work has highlighted that the sensorimotor issues observed after stroke are not solely rooted in difficulty with motor execution. To fully understand the underlying mechanisms of complex multisensory deficits, one must consider the integration of sensory information (especially vision and proprioception) for coordination, error correction, and action monitoring. To effectively understand how these systems contribute to movement control after neural injury, we must therefore adopt a multisensory framework.
Visual search and working memory deficits: influence on movement sequences
While many studies focus on single actions, real-world behavior is often composed of action sequences, involving both eye movements (see “Goal-directed action in healthy individuals”) and manual actions. For example, when a driver turns a car at an intersection, they must also look for oncoming traffic and pedestrians and plan their future movements accordingly. Studies on action sequences often focus on the phenomenon of chunking, a process that causes individual elements of a movement sequence to be “fused” together to become faster, smoother, and cognitively less demanding with practice (Acuna et al., 2014; Ramkumar et al., 2016). However, the chunking literature has largely ignored the role of eye movements in action sequencing. The few studies that have considered how humans both search and reach toward a sequence of multiple visual targets suggest that these behaviors are statistically optimized to minimize the sense of effort (Gepshtein et al., 2007; Diamond et al., 2017; Moskowitz et al., 2023a,b).
The more extensive visual search literature has provided insights into the mechanisms of how bottom-up (stimulus-driven) and top-down (goal-dependent) attention drive eye movements (Treisman, 1986; Eckstein, 2011; Wolfe et al., 2011) and action selection (Buschman and Miller, 2007; Siegel et al., 2015). While bottom-up attention is primarily guided by stimulus salience, top-down processes involve knowledge-driven predictions that enable observers to direct their gaze toward task-relevant regions of visual space (Henderson, 2017). Visual search studies have also shown that healthy humans exhibit near optimal visual search behavior (Najemnik and Geisler, 2005), even in the presence of distractors (Ma et al., 2011).
A critical component of visual search is working memory: the ability to temporarily hold and manipulate information during cognitive tasks (Baddeley, 2003; Miller et al., 2018). Visual search slows when the spatial working memory buffer is loaded (Oh and Kim, 2004; Woodman and Luck, 2004). Importantly, spatial working memory also provides efficient storage of spatial locations to allow fast and smooth execution of reaching movements. Not coincidentally, the cortical systems for saccades and working memory overlap extensively (Constantinidis and Klingberg, 2016), likely because attention often drives both and because visual memory has to be updated during eye movements (Henriques et al., 1998; Dash et al., 2015).
Right hemisphere damage often produces spatial neglect, which may contribute to visual search deficits in stroke survivors (Ten Brink et al., 2016). But visual search is also compromised after left hemisphere stroke in individuals who do not show spatial neglect (Mapstone et al., 2003; Hildebrandt et al., 2005). Further, the cortical regions associated with disorganized visual search overlap with the spatial working memory system (Ten Brink et al., 2016). This suggests that trans-saccadic spatial integration may be disrupted in stroke survivors (Khan et al., 2005a, b).
Recently, an augmented-reality version of the Trails-Making test (Reitan, 1958) was used to address how visual search and spatial working memory deficits in stroke survivors affect limb motor performance. In this task, the optimal search area lies in the vicinity of the hand, and stroke was associated with impairments of visual search characterized by deficits in spatial working memory and top-down topographic planning of visual search (Singh et al., 2017). Healthy controls either restricted the search space around the hand or used working memory to search within a larger space. In contrast, stroke survivors did not use working memory and used many more saccades to search randomly within a much larger workspace, resulting in suboptimal search performance (Fig. 5B, left, middle). Stroke survivors made more saccades as the task became more challenging (Singh et al., 2023). Further, an increased number of saccades was strongly associated with slower reaching speed (Fig. 5B, right), decreased reaching smoothness, and greater difficulty performing functional tasks in stroke survivors (Singh et al., 2018).
Together, these studies suggest that healthy individuals optimally coordinate eye and limb movements to search for objects, store object locations in working memory, and successfully interact with those objects. In contrast, stroke survivors exhibit deficits in top-down spatial organization and use of spatial working memory for visual search. Deficits in these functions, which are mediated by the dorsolateral PFC (Fig. 2), likely contribute to the sluggishness and intermittency of reaching movements that slow down even more with an increase in cognitive load. It has been proposed that enhanced cognitive load and neural injuries may reduce top-down inhibition of the ocular motor system, triggering saccades toward salient but irrelevant stimuli, even at the cost of task performance (Singh et al., 2023).
Conclusions
This review highlights six research topics that illustrate how naturalistic laboratory studies can be translated toward real-world, goal-directed action in healthy individuals (“Goal-directed action in healthy individuals”) and individuals with neurologic injuries (“Deficits in goal-directed action produced by neural damage”). In “Goal-directed action in healthy individuals,” we described how ego/allocentric spatial coding mechanisms are influenced by scene context in naturalistic environments, how adaptive eye movement strategies optimize visual information for action, and how manual interception tasks provide a useful framework to investigate eye-hand coordination in naturalistic environments. “Deficits in goal-directed action produced by neural damage” extends similar concepts to understand the behavioral consequences of CB and compensatory gaze behaviors for driving, why limb-based deficits may be the result of and/or aggravated by multisensory impairments in proprioceptive and visual systems, and how stroke symptoms are exacerbated by increased cognitive load in visuomotor tasks. Recurrent themes in these studies include the importance of considering interaction between systems (multisensory, sensorimotor, cognitive-motor, and multiple effector control) and interactions between bottom-up sensory and top-down cognitive/motor signals.
Although none of these studies occurred in completely natural environments, each study attempts to simulate naturalistic behavior through the use of new technologies (stimulus presentation, motion tracking, neuroimaging) and by combining approaches that were previously studied in isolation. This has important practical value because traditional diagnostics often rely on simplistic sensory or motor tests that may not translate well to complex real-world situations. This highlights the need for research and training programs that translate such knowledge for long-term rehabilitation and occupational therapy.
In conclusion, there is considerable agreement that sensory, cognitive, and sensorimotor systems continuously interact to produce goal-directed movements, and that these fragile interactions are readily disrupted in neurologic disorders, such as stroke. Understanding these interactions still presents considerable methodological and conceptual challenges, but the advances described here show that laboratory neuroscience can be translated for clinical populations dealing with real-world, complex problems. Together, these findings provide a nascent and compelling computational and experimental framework for future research.
Footnotes
J.F. was supported by a Deutsche Forschungsgemeinschaft Research Fellowships Grant FO 1347/1-1. B.R.B. was supported by Deutsche Forschungsgemeinschaft Grant FI 1567/6-1 “The Active Observer” and “The Adaptive Mind,” funded by the Excellence Program of the Hessian Ministry for Higher Education, Research, Science and the Arts. D.A.B. was supported by the University of Georgia Mary Frances Early College of Education and University of Georgia Office of Research. G.D. was supported by National Institutes of Health Award R15EY031090 and Research to Prevent Blindness/Lions Clubs International Low Vision Research Award. J.A.S. was supported by National Science Foundation Award 1934650. T.S. was supported by the Penn State College of Health and Human Development. This work was supported in part by University of South Carolina ASPIRE grants. J.D.C. was supported by a York Research Chair. We thank Justin McCurdy and Helene Mehl for assistance with figure creation.
G.D. works as a consultant for Luminopia. All other authors declare no competing financial interests.
Correspondence should be addressed to Jolande Fooken at jolande.fooken@queensu.ca