Abstract
Associative learning involves complex interactions of multiple cognitive factors. While adult subjects can articulate these factors verbally, for model animals such as macaques, we rely on behavioral outputs. In our study, we used pupillary responses as an alternative measure to capture these underlying cognitive changes. We recorded the dynamic changes in the pupils of three male macaques when they learned the associations between visual stimuli and reward sizes under the classical Pavlovian experimental paradigm. We found that during the long-term learning process, the gradual changes in the pupillary response reflect the changes in the cognitive state of the animals. The pupillary response can be explained by a linear combination of components corresponding to multiple cognitive factors. These components reflect the impact of visual stimuli on the pupils, the prediction of reward values associated with the visual stimuli, and the macaques' understanding of the current experimental reward rules. The changing patterns of these factors during interday and intraday learning clearly demonstrate the enhancement of current reward-stimulus association and the weakening of previous reward-stimulus association. Our study shows that the dynamic response of pupils can serve as an objective indicator to characterize the psychological changes of animals, understand their learning process, and provide important tools for exploring animal behavior during the learning process.
- interday and intraday
- macaque monkey
- multiple cognitive factors
- pupillary responses
- reward-based associative learning
Significance Statement
This study aimed to understand animal learning by observing their behavior. By recording macaque monkeys' pupillary responses during a reward-based learning process lasting 2–3 months, researchers found that the dynamic changes in pupillary responses are influenced by both external and internal factors and can be explained by a linear combination of components corresponding to these factors. For the first time, this study validates the effectiveness of pupillary measurements for capturing cognitive factors at time scales ranging from short term (hours) to long term (months). This study provides valuable insights for a deeper understanding of animal learning processes and has significant implications for the application of pupillary responses in medicine, education, and scientific research.
Introduction
Associative learning is prevalent in everyday life and occurs in various species, such as rodents (Parkinson et al., 2000; Tachibana and Yamada, 2022), nonhuman primates (Brasted et al., 2003; Paton et al., 2006), and humans (Bray et al., 2008; Prévost et al., 2011). It is achieved through the continual association of environmental conditions, enhancing the adaptability and survival capacity of organisms (McSweeney and Murphy, 2014; Christoforou, 2017). Associative learning is a process that requires the involvement of multiple cognitive functions and undergoes changes over extended periods (Le Pelley et al., 2016; Wassum, 2022). While adults can describe the process of their associative learning through language, understanding the process in animals and human infants, who are nonverbal, is particularly challenging.
Previous researchers have attempted to infer information about the timing and strength of various cognitive processes by observing the behavior of animals or human infants. Some studies assessed animal decision-making and learning abilities by recording response times or accuracy rates (Brasted et al., 2003; Wirth et al., 2003) or infants' level of interest by measuring the duration of their gaze towards an object (Richards and Gibson, 1997; Wass et al., 2018). However, covert mental processes involved in associative learning, such as mental computation, memory, recall, and prediction, cannot be captured and characterized solely through behavioral measures. Therefore, a more flexible, direct, and accurate approach is needed to track the dynamic changes in multiple cognitive factors during the associative learning process.
Measuring changes in pupil size is one possible method. The dynamic changes in pupil size are influenced by not only external physical stimuli, such as the brightness, size, or eccentricity of the stimulus (Crawford, 1936; Clarke et al., 2003; Gao et al., 2020; Hu et al., 2020), but also internal cognitive functions (Mathôt, 2018; Joshi and Gold, 2020; Strauch et al., 2022), such as arousal (McGinley et al., 2015; Wang et al., 2018), emotion (Bayer et al., 2011; Korn and Bach, 2016; Korn et al., 2017), attention (Hoeks and Levelt, 1993; Wierda et al., 2012; Binda et al., 2014; Willems et al., 2015a; Denison et al., 2020), value assessment (O’Doherty et al., 2003; Bray et al., 2008; Pauli et al., 2015; Cash-Padgett et al., 2018; Xie et al., 2018), and decision-making (De Gee et al., 2014; Strauch et al., 2018; Van Slooten et al., 2018). Previous studies have shown that different cognitive factors may contribute to the dynamic response of pupils. However, most studies only explored the relationship between pupil size and a single cognitive function in a specific state. The dynamics of and changes in pupil response caused by different factors in associative learning remain unclear.
In this study, we aimed to determine the cognitive factors that control the dynamic changes in the pupils of macaques during reward-based associative learning under the classical Pavlovian experimental paradigm (Pavlov, 1927), in which the pupil can be a reliable indicator of reward value (Finke et al., 2021). We created an association between a certain stimulus orientation and high-level rewards and then broke this association and switched to an association between another stimulus orientation and high-level rewards. Animals were trained to learn these associations (for four stimulus orientations) sequentially in the long term (2–3 months). By recording the pupil sizes during associative learning, we not only described the dynamic changes in the pupils during the long-term learning process but also established a linear model based on these features, thus identifying the various cognitive factors that affect pupil changes and further characterizing the dynamic changes in these factors throughout the entire learning process. We found that the changes in pupils during reward-based associative learning were influenced by multiple factors, including the visual stimuli, the prediction of the reward level for visual stimuli, the judgment of reward rules in the current testing block, and the changes in animal arousal level.
Materials and Methods
Preparation of awake monkeys
All procedures were conducted in compliance with the National Institutes of Health Guide for the care and use of laboratory animals and were approved by the Institutional Animal Care and Use Committee of Beijing Normal University. Three male adult rhesus monkeys (Macaca mulatta, DG2 and DQ5; Macaca fascicularis, DP1; 7–10 years old; 6–12 kg) were used. Under general anesthesia induced with ketamine (10 mg/kg) and maintained with isoflurane (1.5–2.0%), a titanium post was attached to the skull with bone screws for immobilizing the animal's head during behavioral training (Wu et al., 2022; Yang et al., 2022). The experiment began after the monkeys fully recovered from the surgery.
Experimental design
Reward-based associative learning task
Reward-based associative learning lasted for approximately 2–3 months, during which animals sequentially associated 0°, 90°, 135°, and 45° with large rewards. Learning the association between each stimulus orientation and large reward lasted for approximately 20 d, which was termed one learning session. Each learning session (corresponding to the association between a large reward and a stimulus orientation at 0°, 90°, 135°, and 45°) included a fixation phase of approximately 4 d and a learning phase of approximately 15 d (Fig. 1F). In each day of the learning phase, animals first experienced two to three same reward blocks (SRBs) (defined as pretest) and then three to five different reward blocks (DRBs) (defined as posttest). In the fixation phase, animals only experienced five to eight SRBs within a day; the first two to three SRBs were also classified as pretest and the last three to five SRBs were classified as posttest for subsequent data classification and comparative analysis (Fig. 1D lower panel).
Reward rules
In each block, there were 150 trials that presented each of the five visual stimuli (stimuli with orientation at 0°, 90°, 135°, 45°, and blank) 30 times, and the order of visual stimulus presentation was completely random. In SRBs, the 150 trials in the block were all accompanied by 1 drop of water (approximately 0.2 ml) as a reward, which was defined as a low reward (LR) (Fig. 1D upper panel). In DRBs, 4 drops of water (approximately 0.8 ml) were given when a specific orientation (e.g., 0°) was presented in a trial (30 trials in the block), which was defined as a high reward (HR). One drop of water (LR) was given (a total of 120 trials in the block) when the other three orientations (e.g., 45°, 90°, and 135°) or the blank were presented in a trial (Fig. 1D middle panel).
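The block composition described above can be sketched in a few lines. This is a minimal illustration, not the authors' task code: the function name `make_block`, the labels `"SRB"`/`"DRB"`, and the `seed` argument are hypothetical, and rewards are expressed in drops of water as in the text.

```python
import random

ORIENTATIONS = [0, 45, 90, 135]       # grating orientations in degrees
STIMULI = ORIENTATIONS + ["blank"]    # five visual stimuli in total
REPEATS_PER_STIMULUS = 30             # 5 stimuli x 30 repeats = 150 trials
LOW_REWARD_DROPS = 1                  # LR, approximately 0.2 ml
HIGH_REWARD_DROPS = 4                 # HR, approximately 0.8 ml


def make_block(block_type, hrs=None, seed=None):
    """Return a shuffled list of (stimulus, reward_drops) for one block.

    block_type: "SRB" (same reward block) or "DRB" (different reward block).
    hrs: orientation paired with the high reward; required for a DRB.
    """
    if block_type not in ("SRB", "DRB"):
        raise ValueError("block_type must be 'SRB' or 'DRB'")
    if block_type == "DRB" and hrs not in ORIENTATIONS:
        raise ValueError("a DRB needs one of the four orientations as hrs")
    rng = random.Random(seed)
    trials = []
    for stim in STIMULI:
        is_high = block_type == "DRB" and stim == hrs
        drops = HIGH_REWARD_DROPS if is_high else LOW_REWARD_DROPS
        trials.extend([(stim, drops)] * REPEATS_PER_STIMULUS)
    rng.shuffle(trials)  # presentation order is completely random
    return trials
```

For example, `make_block("DRB", hrs=0)` yields 30 high-reward trials (the 0° grating) and 120 low-reward trials, matching the counts given for a DRB.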
High reward-associated stimuli (HRS) and low reward-associated stimuli (LRS) in each learning session
In each learning session (including all pretests and posttests in the fixation phase and learning phase), we defined the visual stimulus associated with HRs in the DRBs in a learning session as the high reward-associated stimulus (HRS). Similarly, the visual stimuli associated with LRs in the DRBs were defined as the low reward-associated stimuli (LRS) in this learning session. It is crucial to note that the designation of HRS and LRS in each learning session was based on their relation to reward value in the DRBs, even though the HRS is accompanied by a LR value in the pretest or posttest of the fixation phase and the pretest of the learning phase in the same learning session. For instance, if 0° is associated with a HR in the DRBs in a learning session, then the HRS is 0°, and the LRS refers to stimuli at 45°, 90°, and 135° for the entire learning session, including the pretest and posttest of the fixation phase and learning phase in this learning session.
This designation of HRS or LRS for visual stimuli facilitates the comparison of pupillary responses evoked by the same stimuli between the fixation phase and learning phase across multiple learning sessions. To distinguish between the fixation phase and the learning phase, we used darker (fixation phase) and brighter (learning phase) colors as indicators (Figs. 1G, 3G,H).
Based on the definitions of the HRS and LRS in each learning session, the pupillary response to the HRS in the learning phase was labeled the HRS in the learning phase, and the pupillary response to the LRS in the learning phase was labeled the LRS in the learning phase. In the fixation phase preceding the learning phase of the learning session, the pupillary response to the HRS was labeled the HRS in fixation phase, and the response to the LRS was labeled the LRS in fixation phase (Figs. 1G, 3G,H).
During the posttest of the learning phase in each learning session, the pupillary responses evoked by visual stimuli associated with HRs (e.g., 0°) were labeled HRS in posttest, and pupillary responses induced by visual stimuli associated with LRs (referring to 45°, 90°, and 135° in this context) were labeled LRS in posttest. In the pretest of the current learning phase, although no visual stimulus was associated with HRs, the pupillary response to the 0° stimulus was labeled the HRS in pretest, while responses to the other three orientated stimuli were defined as the LRS in pretest. To facilitate comparisons, we used solid (posttest) and dashed (pretest) lines in the figure (Fig. 3E). The HRS in pretest, HRS in posttest, LRS in pretest and LRS in posttest during the fixation phase were defined in the same way as those in the learning phase, although both the HRS and LRS were accompanied by a LR in the fixation phase (Fig. 3F).
Trial process
In the experiment, a trial began when a monkey began fixating on a 0.2° fixation point (FP) presented on a CRT screen. In each trial, the FP was displayed in the center of the screen. The animal’s eye positions were sampled at 120 Hz using an infrared tracking system (ISCAN). Within 300 ms after FP presentation, the animal was required to fixate within an invisible circular window (1° in radius) around the FP. After the animal stably maintained fixation for 1,000 ms, the stimulus was displayed for 500 ms, followed by a blank period of 1,500 ms. The FP then disappeared, and the animal received a certain amount of water (associated with the stimulus presented in the trial) as a reward (Fig. 1E). A trial was aborted without any reward if the animal's fixation moved outside the fixation window.
Visual stimulation
Visual stimuli were generated with a stimulus generator (ViSaGe, Cambridge Research Systems) under the control of a PC running a custom C++ program developed in our laboratory (Wang et al., 2020). Visual stimuli were presented on a 22-inch CRT monitor (Dell, P1230, 1,200 × 900 pixels, mean luminance 45.8 cd/m², 100 Hz refresh rate; the display color lookup table was calibrated for linearization). The viewing distance for the monkeys was 114 cm. In the experiment, we used four gratings with different orientations (0°, 45°, 90°, 135°) and a blank for a total of five visual stimuli (Fig. 1C). Each of the four oriented stimuli was presented with six different spatial phases: 0°, 60°, 120°, 180°, 240°, and 300°. Each spatial phase corresponds to the relative position of the black and white bars in the stimulus patch. The reward values are related only to stimulus orientations and are unrelated to the spatial phases for each stimulus orientation. By varying the spatial phases in the experiment, we aimed to prevent adaptation caused by repeatedly using the same visual stimuli. Each spatial phase was repeated five times in one block. The shape of the stimuli was a circular grating with a size of 4°. The spatial frequency of the gratings was 1 cycle/degree for DP1 or 2 cycles/degree for DG2 and DQ5. The eccentricity of the stimuli was 7.6° for DG2, 2.5° for DP1, and 2.9° for DQ5. The stimulus positions were as follows (in degrees): x = −6, y = −4.6 for DG2; x = −1.7, y = −1.8 for DP1; and x = −2.9, y = −0.3 for DQ5. The relationship between the stimulus position and FP position is shown in Figure 1B. Before the study of pupillary responses, we conducted several studies on sensory encoding via chronically implanted array electrodes (Utah arrays) in the V1 of all three animals. Because the electrode positions in V1 differed among the three animals, the visual stimuli had been presented at different locations in the visual field to cover the receptive fields of the recorded neurons. When we started the pupil study, we chose stimulus positions that the animals were accustomed to.
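As a quick consistency check, the reported eccentricities can be recomputed from the listed x and y positions. The sketch below assumes that the positions are measured from the FP at the origin (an assumption, although it is consistent with the reported values); the dictionary and function names are illustrative only.

```python
import math

# Stimulus centres (x, y) in degrees of visual angle, as listed in the text.
positions = {"DG2": (-6.0, -4.6), "DP1": (-1.7, -1.8), "DQ5": (-2.9, -0.3)}
# Eccentricities reported in the text, in degrees.
reported = {"DG2": 7.6, "DP1": 2.5, "DQ5": 2.9}


def eccentricity(x, y):
    """Distance of the stimulus centre from the fixation point (assumed at 0, 0)."""
    return math.hypot(x, y)


for animal, (x, y) in positions.items():
    ecc = eccentricity(x, y)
    print(animal, round(ecc, 1), "reported:", reported[animal])
```

Rounded to one decimal place, the computed distances agree with the reported eccentricities for all three animals.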
Data analysis
Pupil measurement
In a dark room, a monitor with uniform brightness was placed 114 cm in front of the monkey. The infrared eye tracker (ISCAN) was placed 0.5 m away from the animal's eyes next to the monitor. We placed the monitor and eye tracker in fixed positions every day, keeping the distance and angle from the monkey unchanged. After the experiment began, the eye tracker recorded the monkey's eye position and pupil size. The ISCAN output pupil size as a voltage, which we converted into pupil size in millimeters as follows. We printed six black discs with diameters of 3−8 mm on paper as artificial pupils. After the end of the experiment each day, we placed these artificial pupils at the monkey's eye position and measured their sizes with the eye tracker, keeping its parameter settings and placement unchanged. We found a linear relationship between the diameter of the artificial pupil and the pupil size output from the eye tracker, as shown in Equation (1), which is consistent with previous studies (Hayes and Petrov, 2016), where X is the diameter of the artificial pupil and Y is the pupil value output from the eye tracker. We linearly fit the daily pupil values output by the eye tracker with the diameter of the artificial pupil to obtain k and b and applied them to the pupil data recorded in the experiment on that day. The pupil values output by the eye tracker in voltage were converted into pupil sizes in millimeters:
$$Y = kX + b \qquad (1)$$
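A minimal sketch of this daily calibration is given below: fit the line Y = kX + b to the artificial-pupil measurements, then invert it to map tracker output to millimeters. The function names are hypothetical, the disc diameters are assumed to be spaced at 1 mm steps (the text gives only the 3−8 mm range), and the tracker readings are made-up numbers for illustration, not recorded data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = k * x + b; returns (k, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


def tracker_to_mm(values, k, b):
    """Invert Y = k * X + b to recover pupil diameter X in millimeters."""
    return [(v - b) / k for v in values]


# Assumed disc diameters (mm) and hypothetical tracker readings for one day.
diameters_mm = [3, 4, 5, 6, 7, 8]
tracker_output = [0.5 * d + 0.2 for d in diameters_mm]

k, b = fit_line(diameters_mm, tracker_output)
pupil_mm = tracker_to_mm([2.7, 3.2], k, b)  # same-day pupil samples
```

Because k and b are refit every day, the conversion absorbs small day-to-day changes in tracker gain and offset.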
Baseline correction for pupil size
All the data analyses were performed using custom programming in MATLAB (MathWorks). Before the data analysis, we resampled the data recorded by the eye tracker from 120 to 500 Hz. When analyzing changes in the baseline characteristics of the pupils, we did not perform baseline correction; instead, we took the average pupil size during the baseline period (−1,000 to −800 ms relative to stimulus onset) as the pupil baseline size. For the analysis of pupil data after stimulus presentation, we performed baseline correction to eliminate baseline fluctuations between trials and to facilitate the comparison of pupil changes caused by different cognitive factors under different conditions (Mathôt et al., 2018). The average pupil size in the baseline period was subtracted from the pupil data for a single trial, as shown in Equation (2), where
PS (pupil size) before reward
We calculated the average pupil response (after the baseline correction) before the reward was given, represented by PS before reward (Fig. 3A,C,G). The time window for PS before reward was chosen between 1.5 and 1.9 s (after stimulus onset) for DG2 and DQ5 and between 0.75 and 1.25 s (after stimulus onset) for DP1.
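The two steps involved here, subtractive baseline correction followed by averaging in a pre-reward window, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' MATLAB code: the function names are hypothetical, time is in seconds relative to stimulus onset, and window edges are treated as inclusive.

```python
def window_mean(trace, times, t_start, t_end):
    """Mean of the samples whose time lies in [t_start, t_end] (seconds)."""
    values = [v for v, t in zip(trace, times) if t_start <= t <= t_end]
    if not values:
        raise ValueError("no samples fall inside the window")
    return sum(values) / len(values)


def baseline_correct(trace, times, base_start=-1.0, base_end=-0.8):
    """Subtract the mean pupil size in the baseline period from one trial."""
    baseline = window_mean(trace, times, base_start, base_end)
    return [v - baseline for v in trace]


def ps_before_reward(trace, times, window=(1.5, 1.9)):
    """Baseline-corrected mean pupil size in the pre-reward window."""
    corrected = baseline_correct(trace, times)
    return window_mean(corrected, times, window[0], window[1])


# Synthetic 500 Hz trial from -1 s to 2 s: 4.0 mm, dilating to 4.5 mm at 1 s.
times = [(i - 500) / 500 for i in range(1501)]
trace = [4.0 if t < 1.0 else 4.5 for t in times]
print(ps_before_reward(trace, times))
```

The default window matches the one given for DG2 and DQ5; for DP1 the call would be `ps_before_reward(trace, times, window=(0.75, 1.25))`.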
Normalized temporal fluctuations
We calculated the temporal fluctuations to describe the temporal changes in pupillary responses within a trial (Fig. 3D,H). Within each training block, the pupil response for each trial was divided by the standard deviation of the pupil responses in this block. This process standardized the data across different blocks. Following normalization, the temporal fluctuations in pupillary responses were computed for each visual stimulus in each block (an average of 30 trials). The time window used for calculating temporal fluctuations was from –0.9 to 1.9 s relative to stimulus onset. The normalized fluctuations for different learning stages were then obtained by averaging the fluctuations across multiple blocks in the corresponding learning stages.
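One possible reading of this procedure is sketched below. Two details are assumptions, because the text does not spell them out: the block standard deviation is pooled over all samples of all trials in the block, and the temporal fluctuation is taken as the standard deviation over time of the trial-averaged, normalized trace. The function name and data layout are hypothetical.

```python
from statistics import pstdev


def normalized_fluctuation(block_trials, times, stimulus, window=(-0.9, 1.9)):
    """Temporal fluctuation of the mean normalized trace for one stimulus.

    block_trials: list of (stimulus_label, trace) pairs for a single block.
    times: sample times in seconds relative to stimulus onset.
    """
    # Assumption: block SD pooled over every sample of every trial.
    pooled = [v for _, trace in block_trials for v in trace]
    block_sd = pstdev(pooled)
    if block_sd == 0:
        raise ValueError("block has zero variance; cannot normalize")
    selected = [trace for label, trace in block_trials if label == stimulus]
    if not selected:
        raise ValueError("no trials for this stimulus in the block")
    n = len(selected)
    mean_trace = [
        sum(trace[i] for trace in selected) / n / block_sd
        for i in range(len(times))
    ]
    in_window = [v for v, t in zip(mean_trace, times) if window[0] <= t <= window[1]]
    # Assumption: "temporal fluctuation" is the SD of the mean trace over time.
    return pstdev(in_window)
```

Dividing by the block standard deviation makes the measure insensitive to the overall scale of the recording in a block, which is what allows fluctuations to be averaged across blocks and learning stages.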
Different learning phases
Based on the value of PS before reward for each block, we proceeded to define the first day after learning as the day when the pupillary response induced by the HRS and LRS exhibited a significant difference across all DRBs. In each learning session (corresponding to a specific stimulus associated with a HR value), starting from the first day of training (defined as the first day with a DRB) to the first day after learning, all trials involved in this period were evenly divided into three parts, defined as learning phases 1–3. Learning phase 4 was defined as the experimental period on the second and third days after learning. Learning phase 5 was defined as the period from the fourth day after learning to the final day of the learning session (Fig. 3A). The learning phases for all learning sessions are defined with the same procedure. When comparing the changes in pupillary responses across days, we utilized the datasets from learning phases 1–4. When comparing the changes from pretest to posttest within each day, we utilized the datasets from learning phases 4–5.
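The phase definitions above amount to a simple labeling rule, sketched here. The function name and inputs are hypothetical: `trials_per_day` holds the number of trials on each day of the learning phase (day 0 is the first day with a DRB), and `first_day_after_learning` is the 0-based index of the day on which the HRS and LRS responses first differed significantly across all DRBs. Whether pretest trials are counted is not specified in the text and is left to the caller.

```python
def assign_learning_phases(trials_per_day, first_day_after_learning):
    """Return one learning-phase label (1 to 5) per trial, in order."""
    if not 0 <= first_day_after_learning < len(trials_per_day):
        raise ValueError("first_day_after_learning is out of range")
    # Phases 1-3: all trials up to and including the first day after
    # learning, split evenly into three consecutive parts.
    early = sum(trials_per_day[: first_day_after_learning + 1])
    labels = [1 + (3 * i) // early for i in range(early)]
    # Phase 4: second and third days after learning; phase 5: the rest.
    later_days = trials_per_day[first_day_after_learning + 1 :]
    for day_after, n_trials in enumerate(later_days, start=2):
        phase = 4 if day_after <= 3 else 5
        labels.extend([phase] * n_trials)
    return labels


labels = assign_learning_phases([300] * 6, first_day_after_learning=2)
counts = {phase: labels.count(phase) for phase in range(1, 6)}
```

With six days of 300 trials and learning detected on the third day, phases 1 to 3 receive 300 trials each, phase 4 receives 600, and phase 5 receives 300.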
Model fitting and evaluation
The pupillary response in reward-associated learning is influenced by both external physical factors and internal psychological factors. To separate these factors within the pupillary response, we established a linear model composed of five additive components to predict the measured pupillary response (Eq. 3). The model components were (1) the pupillary response while the monkey fixated on the central FP of a blank screen (fixation drift evoked, FDE); (2) the pupillary response induced by the presentation of an external physical stimulus, such as the appearance of circular gratings on the screen (stimulus evoked, SE); (3) the pupillary response triggered by changes in the reward state (reward state evoked, RSE); (4) the pupillary response reflecting the animal's estimation and prediction of reward levels associated with different stimuli (reward expectation evoked, REE); and (5) the baseline (BL) pupil size, i.e., the average pupil size during the baseline period (−1,000 to −800 ms). The coefficients K preceding each model component,
To accurately assess the dynamic curves of the four components in the model excluding the baseline, we used the recorded pupil data from all trials as the dependent variable (PS) after baseline correction and incorporated them into the model. The contribution levels of the four influencing factors (FDE, SE, RSE, and REE) in a single trial were set to two states, 0 and 1. 0 indicates that the factor did not affect the pupil response in the current trial, while 1 indicates that it did have an effect. These four influencing factors are combined into a state matrix W across all trials. Therefore, Equation (3) can be transformed into Equation (4). Through model regression, we can obtain the dynamic curve matrix MC for the four influencing factors (Eq. 5).
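The regression step can be illustrated with a small least-squares sketch in which the baseline-corrected pupil traces (trials × time) are modeled as the state matrix W (trials × 4) times the curve matrix MC (4 × time). Everything specific below is an assumption made for illustration: the text does not state which factors are set to 1 for which trial types, so the example rows of W are hypothetical, and the solver is a plain normal-equations approach, not the authors' MATLAB routine.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is n x n; b is n x m (several right-hand sides). Returns x as n x m.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("state matrix W is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def estimate_component_curves(w, ps):
    """Least-squares estimate of MC (components x time) with ps ~ w @ MC."""
    n_comp, n_time = len(w[0]), len(ps[0])
    wtw = [[sum(row[i] * row[j] for row in w) for j in range(n_comp)]
           for i in range(n_comp)]
    wtp = [[sum(row[i] * trace[t] for row, trace in zip(w, ps))
            for t in range(n_time)] for i in range(n_comp)]
    return solve_linear(wtw, wtp)


# Hypothetical 0/1 state rows [FDE, SE, RSE, REE] for five trial types.
w_example = [
    [1, 0, 0, 0],  # blank, same reward block
    [1, 1, 0, 0],  # grating, same reward block
    [1, 0, 1, 0],  # blank, different reward block
    [1, 1, 1, 0],  # LRS grating, different reward block
    [1, 1, 1, 1],  # HRS grating, different reward block
]
```

The estimate is unique only when the columns of W are linearly independent, which is why the sketch raises an error for a rank-deficient state matrix.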
After obtaining the estimated dynamic curve matrix MC for different components of the model, we substituted each row of the MC into FDE, SE, RSE, and REE in Equation (3), allowing the contribution levels K of the four factors to vary freely (range: 0 ≤ K ≤ 30). This enabled us to predict the pupil response for all conditions in all blocks. Through model regression, we determined the contribution levels K of the different influencing factors in different blocks. By analyzing the K values of different factors in pupil responses across days or within a single day, we deciphered the timing and patterns of different factors' involvement, helping us gain a better understanding of the animal's reward-associated learning process.
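The second stage is a box-constrained least-squares problem: given fixed component curves, find gains K within [0, 30] that best reproduce a measured response. The sketch below uses cyclic coordinate descent with clipping, a simple technique swapped in here for illustration in place of the MATLAB `fmincon` optimizer used in the study. The function name, the number of sweeps, and the example curves are all hypothetical.

```python
def fit_gains(curves, target, lower=0.0, upper=30.0, n_sweeps=200):
    """Minimise sum_t (target[t] - sum_i K[i] * curves[i][t])**2
    subject to lower <= K[i] <= upper, by cyclic coordinate descent."""
    n = len(curves)
    gram = [[sum(a * b for a, b in zip(curves[i], curves[j])) for j in range(n)]
            for i in range(n)]
    proj = [sum(a * b for a, b in zip(curves[i], target)) for i in range(n)]
    k = [0.0] * n
    for _ in range(n_sweeps):
        for i in range(n):
            if gram[i][i] == 0.0:
                continue  # an all-zero curve cannot contribute
            rest = sum(gram[i][j] * k[j] for j in range(n) if j != i)
            # Exact one-dimensional minimiser, clipped to the allowed range.
            k[i] = min(upper, max(lower, (proj[i] - rest) / gram[i][i]))
    return k


# Made-up component curves (3 components x 4 time points) and a target
# response built from gains 2.0, 0.5, and 0.0.
curves = [[1, 1, 1, 1], [0, 1, 2, 1], [0, 0, 1, 2]]
target = [2.0, 2.5, 3.0, 2.5]
gains = fit_gains(curves, target)
```

Each coordinate update is the exact minimizer along one gain with the others held fixed, so the bounds are never violated and the squared error never increases from one update to the next.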
We optimized the model parameters to minimize the mean square error E under constraints P by the MATLAB function fmincon as follows, where
Statistical analysis
All data analyses were performed using custom programming in MATLAB (MathWorks). For analyses that merged multiple blocks, paired t tests were used unless otherwise noted. When presenting each learning session, to compare the differences between the pupil responses evoked by the five visual stimuli, we performed a one-way ANOVA and used Bonferroni's method to adjust for multiple comparisons (Figs. 2, 3A,B). To characterize the changes in different components of the model across days and to compare the differences between the three reward conditions [the current HRS (stimulus associated with HRs), the previous HRS, and the LRS (stimulus associated with LRs)] during cross-day learning processes, we also used one-way ANOVA and adjusted for multiple comparisons using Bonferroni's method (Figs. 7B, 8B, 9B, 10B, 11B). Significant time points or blocks that passed the tests are marked with solid circles or asterisks above them. To test the learning-related changes in the components, the Pearson correlation coefficient was used to compute the correlation between component gains and blocks (aligned with the first block for training in a learning session) (Fig. 6). All error bars represent the median ± SEM.
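For the last of these tests, the Pearson correlation between a component's gain and the block index can be computed as in the sketch below. The function name and the example gains are hypothetical, and the sketch returns only the coefficient r; a significance test would additionally require the t distribution.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Block indices aligned to the first training block, with made-up gains.
blocks = [1, 2, 3, 4]
gains_by_block = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(blocks, gains_by_block)
```

A positive r indicates that the component's gain tends to grow over training blocks, and a negative r indicates that it tends to shrink.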
Results
To investigate the dynamic changes in pupillary response during reward-associated learning, we designed a reward-based associative learning task (Fig. 1, see details in Materials and Methods) with three animals (DG2, DQ5, and DP1) participating in the experiment. During the experiment, animals needed to maintain a stable fixation (within 1° of the FP), and their pupil size was detected and measured by an infrared eye tracker (Fig. 1A). The animal initiated a trial by fixating on the specified FP (0.2°). One second after the trial was initiated, the visual stimulus appeared in the animal's lower left or left visual field (Fig. 1B shows the stimulus location for each animal) for 500 ms and then disappeared. After 1.5 s, the animal received a certain amount of reward (water) based on the presented visual stimulus (Fig. 1E). The visual stimulus in each trial was one of five visual stimuli: four gratings with different orientations (0°, 45°, 90°, and 135°) and a blank (Fig. 1C). Within an experimental block, each visual stimulus was presented 30 times, and the presentation order of the visual stimuli was completely random. The reward rules were divided into two types, same reward and different reward (Fig. 1D), and within each block, they were fixed. In the SRB, after the appearance of any one of the five visual stimuli, animals received a small amount of water (Fig. 1D upper panel). In the DRB, when a predetermined grating orientation (HRS, see Materials and Methods for detailed definitions) appeared, the animal received a larger amount of water; when the other three gratings (LRS, see Materials and Methods for detailed definitions) or the blank appeared, the animal received a smaller amount of water (Fig. 1D middle panel). During the 2–3 months of the learning process, each animal underwent a total of four associative learning sessions, in which HRS refers successively to 0°, 90°, 135°, and 45°.
Each learning session was divided into two phases: the fixation phase and the learning phase (Fig. 1F). The daily experiment consisted of five to eight blocks. In the fixation phase, all blocks throughout the day were SRBs (LR for both HRS and LRS). In the learning phase, animals first experienced two to three SRBs (pretest, LR for both HRS and LRS) within a day, followed by three to five DRBs (posttest, LR for LRS and HR for HRS). The fixation phase usually lasted for 3–4 d before the learning phase began. The pupillary response induced by the HRS and LRS generally differed on the third to fourth day in the learning phase. To determine and enhance the stability of this pupil difference, the learning phase continued for an additional 10–12 d (Fig. 1D lower panel).
Changes in pupil size indicate monkeys' understanding of the association between HRs and visual stimuli
By combining the pupillary responses to HRS and LRS from all four associative learning sessions, we observed that during the fixation phase, both the HRS and LRS induced pupillary contraction followed by dilation, with no significant difference between the two stimulus conditions (first, third, and fifth columns; Fig. 1G). In the learning phase, stimulus presentation continued to elicit pupillary contraction followed by dilation; however, there was a significant disparity in the magnitude of contraction and dilation between the HRS and LRS conditions. Notably, pupillary contraction induced by LRS was greater than that induced by HRS, while pupillary dilation induced by HRS was greater after stimulus presentation (during the blank screen period) preceding trial completion (second, fourth, and sixth columns; Fig. 1G; number of trials in the fixation phase and learning phase for each animal labeled in each subgraph; all marked positions, p < 0.01; paired t test). The patterns of pupillary response changes were similar when animals were learning about the association between HRs and the visual stimulus in each individual orientation (even numbered columns in Fig. 2; number of trials in the different phases for each animal labeled in each subgraph; all marked positions p < 0.01, one-way ANOVA, Bonferroni’s correction). These findings suggest that the changes in pupillary size serve as an indication of the animals' comprehension of the association between HRs and visual stimuli.
Pupillary responses gradually change across days of learning
The changes in pupil size observed from the fixation phase to the learning phase were not immediate but rather evolved gradually over time. We examined the long-term changes in pupillary response by averaging the pupillary responses across different blocks. Each learning session (for an HRS) was divided into six stages: the fixation phase, and learning phases 1–5 (Materials and Methods).
We found that pupillary responses gradually changed over time and eventually distinguished the grating stimulus associated with HRs (HRS) from the other visual stimuli associated with LRs. Several important features are noted in the daily change in the time course of pupillary responses. The pupillary changes that occurred during early associative learning of the grating at 0° and HRs are shown in Figure 3A and B (from the fixation phase to learning phase 4). In the fixation phase, the pupillary response of the animals was induced by external stimuli, and there was little difference in the pupillary response induced by different orientations during this period (first column in Fig. 3B). When animals entered learning phases 1 and 2, the degree of pupil dilation induced by visual stimuli started to increase relative to that in the fixation phase, including in the blank condition (second and third columns in Fig. 3B). This finding suggested that the monkeys had started to detect the change in reward rule but did not yet understand the exact association between HRs and the stimulus at 0° (HRS for this learning session). Afterward, when the animals entered learning phase 3, the degree of pupil dilation before the water supply (approximately 1–2 s after stimulus onset) was further enhanced until the response to the HRS could be faintly distinguished from that to other visual stimuli (45°, 90°, 135°, and blank; LRS for this learning session) (fourth column in Fig. 3B). Finally, in learning phase 4, the dilation of the pupils induced by the HRS was further enhanced, while the dilation induced by the LRS was weakened, resulting in a significant difference in the pupillary response before the water supply between the HRS and LRS conditions (fifth column in Fig. 3B; number of trials in different stages labeled in each subgraph; all marked positions p < 0.01; one-way ANOVA, Bonferroni’s correction).
The gradual changes in pupillary responses across days suggest changes in cognitive factors during associative learning, which is consistent with the current belief that the pupil can serve as an indicator of the internal state of the brain (Johnston et al., 2022). The average pupillary size before water delivery, denoted as PS before reward (Materials and Methods) in Figures 3A and C, can be employed as an indicator of the learning progress of the animals. The calculation of PS before reward was based on a time window determined by the period of maximum difference in pupillary response evoked by HRS or LRS, which was different for the three animals (time windows: from 1.5 to 1.9 s for DG2 and DQ5 and from 0.75 to 1.25 s for DP1, gray shaded area in Fig. 3B). We also quantified the pupillary responses by calculating the temporal fluctuations in pupillary responses during the entire trial period, which were the same duration (from −0.9 to 1.9 s) for the three animals. To assess changes in the temporal fluctuations in pupillary responses resulting from learning, we computed the normalized fluctuations in the pupillary response within each block, denoted as normalized fluctuations in PS (see Materials and Methods) in Figure 3D and H.
We combined the data from the four learning sessions, and Figure 3C and D illustrate the values of these two characteristics at different stages of learning. As learning progressed from the fixation phase to learning phase 4, the average pupillary size before water delivery induced by HRS (stimuli associated with HRs) gradually increased. On the other hand, the average pupillary size induced by LRS (stimuli associated with LRs) initially increased and then decreased. Consequently, during the fixation phase, there was little difference between the HRS and LRS conditions (Fig. 3C, left panel; three animals, mean ± SEM; PS before reward, fixation; HRS, −0.0015 ± 0.0002; LRS, −0.0012 ± 0.0002; N = 182; p = 0.07; paired t test); however, as the learning phase progressed, the differences gradually became more pronounced (Fig. 3C, left panel; three animals, mean ± SEM; PS before reward, LP4; HRS, 0.0075 ± 0.0006; LRS, −0.0001 ± 0.0003; N = 94; p = 1.17 × 10^−27). The temporal fluctuations in pupillary size exhibited similar changes (Fig. 3D, left panel; three animals, mean ± SEM; normalized fluctuation of PS; fixation, HRS, 1.53 ± 0.05; LRS, 1.66 ± 0.04; N = 182; p = 0.06; LP4, HRS, 3.53 ± 0.22; LRS, 1.26 ± 0.04; N = 94; p = 2.32 × 10^−15). These findings indicate that the monkeys eventually learned the association between HRs and the HRS. Although the fluctuation in DQ5 during learning phase 4 did not significantly differ between the HRS and LRS conditions, the significant differences observed in learning phases 1–3 still support the conclusion that the difference in fluctuations in the pupillary response between the HRS and LRS conditions increases during the learning process (Fig. 3D, right panel; all marked positions in Fig. 3C,D p < 0.05, paired t test).
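The HRS versus LRS comparisons above are paired: both conditions are measured in the same blocks. A minimal sketch of the paired t statistic follows, using only the Python standard library. The input values are hypothetical, and the conversion of the statistic to a p value (which requires the t distribution, for example from a statistics library) is deliberately left out.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)                       # sample SD of the paired differences
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical per-block PS before reward values (not data from the study).
hrs = [0.008, 0.007, 0.009, 0.006]
lrs = [0.000, 0.001, -0.001, 0.000]
t_stat, dof = paired_t(hrs, lrs)
```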
Change in pupil size within training days
Previous results (Fig. 3A–D) revealed the long-term (across days) variation patterns in the pupillary response (Kaskan et al., 2022) induced by HRS and LRS. We found that similar changes also occurred across trials or blocks within the same day during the learning phase from the pretest (SRBs) to the posttest (DRBs). Employing a similar analysis approach and combining the data from the four learning sessions, we averaged the pupillary data from the postlearning phase (several days after the animals learned the association, LP4–LP5) and the fixation phase. The results revealed that after the animals learned the association between HRs and the stimulus, a change in reward state (from pretest to posttest) led to different pupillary dynamics in response to the same visual stimuli. In the learning phase, there was a significant change in the pupillary response induced by HRS (the stimulus associated with HRs) from pretest to posttest. Specifically, pupillary dilation in response to HRS significantly increased, while the changes in response to LRS (stimuli associated with LRs) were relatively small (Fig. 3E). On the other hand, in the fixation phase, there was almost no change from pretest to posttest (Fig. 3F; see definitions for pretest and posttest of fixation phase in Materials and Methods). These findings further suggest that pupillary dynamics can reflect changes in reward state and the content associated with HRs.
The above results were confirmed by the data shown in Figure 3G and H, which depict the same pupillary characteristics as Figure 3C and D. Regarding the average pupillary size before water delivery, in the learning phase, the pupillary size induced by HRS significantly increased from pretest to posttest (Fig. 3G, left panel; three animals, mean ± SEM, from pretest to posttest; PS before reward, −0.0005 ± 0.0001 to 0.0082 ± 0.0001; N = 309, p = 0), while the LRS-induced pupillary size remained unchanged (Fig. 3G, left panel; three animals, mean ± SEM, from pretest to posttest; PS before reward, −0.0014 ± −0.0001 to −0.0014 ± 0.0001, N = 309, p = 0.69) and stayed either contracted or dilated depending on the animal (Fig. 3G, right panel). In the fixation phase, the direction of change for the HRS and LRS was similar, remaining either unchanged or elevated across different animals (Fig. 3G, left panel; three animals, mean ± SEM, from pretest to posttest, PS before reward, HRS, −0.0017 ± 0.0002 to −0.0017 ± 0.0002, N = 133, p = 0.92; LRS, −0.0017 ± 0.0001 to −0.0014 ± 0.00009, N = 133, p = 0.006). Even if there were changes, the magnitude of change was much smaller than that observed in the learning phase (Fig. 3G). Similarly, for the temporal fluctuations in pupillary dynamics within each trial, HRS conditions increased more from pretest to posttest in the learning phase and less in the fixation phase (Fig. 3H, left panel; three animals, mean ± SEM, from pretest to posttest; normalized fluctuation of PS, learning phase; HRS, 1.89 ± 0.04 to 3.98 ± 0.14; N = 309, p = 3.12 × 10^−38; fixation phase, HRS, 1.34 ± 0.03 to 1.52 ± 0.07, N = 133, p = 0.012). Correspondingly, the LRS conditions exhibited a slight decrease in fluctuation during the learning phase, while the LRS conditions in the fixation phase remained unchanged (Fig. 3H, left panel; three animals, mean ± SEM, from pretest to posttest; normalized fluctuation of PS, learning phase; LRS, 1.17 ± 0.01 to 1.04 ± 0.02, N = 309, p = 1.54 × 10^−8; fixation phase, LRS, 1.26 ± 0.02 to 1.28 ± 0.02, N = 133, p = 0.38). Although the HRS for the DQ5 animal showed a slight decrease from the pretest to posttest in the learning phase, it still elicited a slightly stronger response than did the LRS in the posttest (Fig. 3H, right panel; all marked positions in Fig. 3G,H, p < 0.05, paired t test).
In summary, we found a few important dynamic features across training trials, blocks and days. (1) Before the monkeys learned the association between the HRS and HRs, pupillary responses were mainly governed by a stimulus-driven pupillary response (for 0°, 45°, 90°, and 135°) and a fixation drift response (for the blank stimulus) (first column in Fig. 3B, F; odd-numbered column in Fig. 2). (2) A new dynamic component then appeared that controlled the pupillary responses induced by all visual stimuli, regardless of whether they were HRS or LRS (second and third columns in Fig. 3B; pretest in Fig. 3E). (3) After a few days of training, another new component emerged that controlled the pupillary responses induced specifically by the HRS (fourth and fifth columns in Fig. 3B; posttest in Fig. 3E; even-numbered column in Fig. 2). These results provide a basis for our subsequent model establishment.
Linear model with multiple components explaining pupillary responses
Based on the observations of the changes in pupillary response and the parameter changes involved in the experimental design, we established a linear model to explain pupil responses in associative learning. We assumed that five factors affect pupil changes: (1) pupillary response during natural fixation with a blank screen (FDE); (2) pupillary response purely induced by presentation of the visual stimulus (SE); (3) change in pupillary response caused by a change in the reward state (RSE), which depended on whether the current block was a DRB or an SRB; (4) the prediction or expectation for the reward level of different stimuli (reward expectation evoked, REE), which depended only on whether the current trial would be followed by a large reward; and (5) baseline pupil responses (the average pupil size immediately after fixation and before stimulus presentation, −1,000 to −800 ms) (BL), representing the arousal state of the animal, which is related to the changes in reward state within a day. These components together control the dynamic changes in pupillary responses during associative learning. The coefficient K preceding each model component, namely,
Initially, using the measured pupil data after baseline correction and a predefined state matrix, we estimated the dynamic response curves of the first four components (FDE, SE, RSE and REE) in our model (see details in Materials and Methods, Eqs. 4, 5). The time courses of these four components are displayed in Figure 4, showcasing significant similarities among the four components across the three animals, particularly the temporal dynamics for the first three components (FDE, SE, and RSE). For instance, FDE (pupillary response evoked by fixation drift) initially exhibited a slight dilation for approximately 0.2 s after gaze fixation and then maintained a contraction trend (second column in Fig. 4). RSE (pupillary response evoked by the different reward states) exhibited the opposite pattern, with an initial contraction for approximately 1.3 s after gaze fixation, followed by gradual dilation after stimulus presentation (sixth column in Fig. 4). SE (pupillary response evoked by visual stimulus), which was influenced by external physical stimuli, also exhibited a similar overall pattern among the three animals, characterized by initial contraction followed by dilation returning to baseline (fourth column in Fig. 4). This consistency among animals indicated that the impact of the state on pupillary responses was similar across different individuals. Additionally, we observed that FDE and RSE underwent changes prior to stimulus presentation (second and sixth columns in Fig. 4), while SE and REE (pupillary response evoked by the expectation for different reward levels) exhibited dynamic alterations following stimulus presentation, aligning well with the assumptions for our linear model (fourth and eighth columns in Fig. 4).
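One way to picture the estimation of component time courses from a predefined state matrix is as a least-squares problem: each condition-averaged response is modeled as the sum of the components assumed to be active in that condition. The sketch below is only an illustration under stated assumptions. The state matrix is inferred from the component definitions in the text and is not the matrix of Eqs. 4, 5, and the component shapes and noise level are synthetic.

```python
import numpy as np

# Rows: conditions; columns: components (FDE, SE, RSE, REE).
# A 1 means the component is ASSUMED to contribute in that condition.
state = np.array([
    [1, 0, 0, 0],  # blank, SRB
    [1, 1, 0, 0],  # grating, SRB
    [1, 0, 1, 0],  # blank, DRB
    [1, 1, 1, 0],  # LRS grating, DRB
    [1, 1, 1, 1],  # HRS grating, DRB
], dtype=float)

# Synthetic component time courses (illustrative shapes, not measured data).
t = np.linspace(-0.9, 1.9, 57)
true_components = np.vstack([
    -0.002 * (t + 0.9),                          # FDE: slow drift
    -0.004 * np.exp(-((t - 0.4) / 0.2) ** 2),    # SE: transient constriction
    0.002 * np.clip(t, 0, None),                 # RSE: dilation after onset
    0.005 * np.clip(t - 0.5, 0, None),           # REE: late dilation
])

# Simulated condition-averaged responses with a little measurement noise.
rng = np.random.default_rng(0)
responses = state @ true_components + rng.normal(0, 1e-5, (5, t.size))

# Solve state @ components = responses for the component time courses.
estimated, *_ = np.linalg.lstsq(state, responses, rcond=None)
max_error = np.abs(estimated - true_components).max()
```

With five conditions and four components the system is overdetermined, so the least-squares solution averages over redundant conditions rather than relying on a single subtraction.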
Once the estimated dynamic responses for the four components were obtained (even-numbered columns in Fig. 4), we employed the linear model to fit the measured pupil responses. Detailed information is presented in Figure 5, which encompasses the actual measurements of pupillary responses for all blocks (Fig. 5A), the model-predicted pupillary responses (Fig. 5B), and the subtle discrepancies between the two (Fig. 5C). To assess the disparity between the true pupil response and the fitted pupil response for each block, we calculated the goodness of fit (GOF) of each block (see details in Materials and Methods, Eq. 8). The results demonstrated that the five-component linear model effectively explained the pupil response for all blocks (Fig. 5D; mean ± SEM; DG2, GOF 0.91 ± 0, N = 634; DP1, GOF 0.86 ± 0.01, N = 374; DQ5, GOF 0.79 ± 0.01, N = 515).
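The fitting step can be sketched as an ordinary least-squares regression of a block's baseline-corrected response onto the fixed component time courses, which yields one gain per component. In the sketch below, the component shapes, the gains, and the noise are synthetic, and the GOF is taken to be the coefficient of determination; that definition is an assumption for illustration and may differ from Eq. 8.

```python
import numpy as np

def fit_gains(components, response):
    """Least-squares gains K such that response ~= K @ components."""
    gains, *_ = np.linalg.lstsq(components.T, response, rcond=None)
    return gains

def goodness_of_fit(response, fitted):
    """1 - SS_res / SS_tot (assumed GOF definition, for illustration only)."""
    ss_res = np.sum((response - fitted) ** 2)
    ss_tot = np.sum((response - response.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic component time courses: FDE, SE, RSE, REE (not measured data).
t = np.linspace(-0.9, 1.9, 57)
components = np.vstack([
    -0.002 * (t + 0.9),
    -0.004 * np.exp(-((t - 0.4) / 0.2) ** 2),
    0.002 * np.clip(t, 0, None),
    0.005 * np.clip(t - 0.5, 0, None),
])

# Simulate one block with hypothetical gains plus noise, then refit it.
true_gains = np.array([1.0, 0.8, 1.5, 2.0])
rng = np.random.default_rng(1)
response = true_gains @ components + rng.normal(0, 1e-4, t.size)

gains = fit_gains(components, response)
gof = goodness_of_fit(response, gains @ components)
```

Only four components enter the regression here because the response is assumed to be baseline corrected; the BL term would be read directly from the pre-stimulus pupil size.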
We estimated the contribution of each component in different blocks (odd-numbered columns in Fig. 4; refer to Materials and Methods, Eqs. 3, 6, 7). Analyzing the changes in the gains of different components across days in the model, we did not find any learning-related changes in the components of FDE (Fig. 4, first column), SE (Fig. 4, third column) or BL (Fig. 4, ninth column). Across the four learning sessions, these components did not exhibit variations between the fixation phase and the learning phase or between the HRS and LRS conditions in the learning phase. However, there were strong learning-related changes in the components of RSE (Fig. 4, fifth column) and REE (Fig. 4, seventh column). The variations in gains for RSE and REE were temporally aligned with the fixation phase and learning phase, and the gain in REE was specifically tied to the HRS condition during the learning phase. The statistical analysis confirmed that the learning-related changes were significant only for the components REE and RSE (Fig. 6). There was a positive correlation between the gain in REE for HRSs and blocks, while there was a negative correlation between the REE for LRSs and blocks during the learning phase (Fig. 6D,F; Pearson correlation coefficients, HRS, r = 0.47, p = 4.1 × 10^−10; LRS, r = −0.27, p = 7.3 × 10^−4; N = 156). The RSE component showed a positive correlation with blocks during the learning phase, regardless of the HRS or LRS condition (Fig. 6C,F; Pearson correlation coefficients, HRS, r = 0.21, p = 9.0 × 10^−3; LRS, r = 0.38, p = 7.1 × 10^−7; N = 156). There was no learning-related change in the FDE, SE or BL components during the learning phase (Fig. 6A,B,E,F).
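The correlation between a component gain and the block index can be illustrated with a small Pearson correlation routine. The gain values below are hypothetical and chosen only to show a rising gain (as reported for the HRS) and a falling gain (as reported for the LRS); they are not data from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Hypothetical REE gains across eight learning-phase blocks.
blocks = list(range(1, 9))
hrs_gain = [0.1, 0.3, 0.2, 0.6, 0.7, 0.9, 1.1, 1.0]     # rising with learning
lrs_gain = [0.4, 0.3, 0.35, 0.2, 0.25, 0.1, 0.05, 0.0]  # falling with learning

r_hrs = pearson_r(blocks, hrs_gain)
r_lrs = pearson_r(blocks, lrs_gain)
```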
Changes in the gains of different model components
As shown in Figures 4 and 6, the gain in the REE (evoked by reward expectation) component can capture the learning process across days. Next, we analyzed the variations in gains in REEs during interday and intraday changes (Fig. 7). Regarding interday changes, the gain in REE during the fixation phase (represented by light gray shading) approached zero, while during the learning phase (represented by light red shading), it competed among different visual stimuli conditions until the REE gain associated with visual stimuli associated with HRs (HRS) surpassed that of visual stimuli associated with LRs (LRS). This significant difference persisted until the end of the learning phase (Fig. 7A). Additionally, from the second to fourth learning sessions, we observed that during the early stages of the learning phase, the gain in REE for the previous HRS was greater with a stronger competitive advantage. However, this competitive advantage gradually decreased as learning progressed and was replaced by an increasing gain in REE for the current HRS. After approximately 8–10 blocks, the gain in REE for the current HRS established absolute dominance (Fig. 7A, left panel in Fig. 7B). Upon transitioning from the learning phase to the next fixation phase, the gain in the REE component elicited by the current HRS decreased to near-zero levels in one or two blocks, similar to the LRS conditions and the previous HRS condition (Fig. 7A, right panel in Fig. 7B; N = 3 learning sessions; all marked positions p < 0.05 in Fig. 7B, one-way ANOVA, Bonferroni’s correction). The dynamic change in REE gains during the entire association learning process suggested that the REE component may represent the animal's ability to predict and evaluate upcoming rewards following the presentation of different visual stimuli.
This viewpoint is further supported by the analysis of the intraday changes in REE gain (evoked by reward expectation). When comparing results from different animals, we found that the gain in the REE component remained stable at near-zero levels for LRS conditions but increased from near-zero to a higher level for HRS conditions during the transition from pretest (light gray shaded area to the left of the vertical dashed line) to posttest (light red shaded area to the right of the vertical dashed line) in the learning phase (Fig. 7C, right panel in Fig. 7D). Conversely, when the reward status remained unchanged during the fixation phase from the pretest (light gray shaded area to the left of the vertical dashed line) to the posttest (light gray shaded area to the right of the vertical dashed line), both the HRS and LRS conditions exhibited fluctuations toward zero (Fig. 7C, left panel in Fig. 7D). Moreover, even before the transition of reward status (pretest), we observed an increasing trend in the gain in the REE component for HRS conditions (Fig. 7C, right panel in Fig. 7D; number of days in different phases of each animal labeled in each subgraph; all marked positions p < 0.05 in Fig. 7D paired t test), which suggests the animals' anticipation of high-level rewards. These features of the REE component align with our model hypothesis and represent the animal's ability to predict and evaluate future rewards following the presentation of different visual stimuli.
We conducted a similar analysis of the variations in the gain of RSE (evoked by reward states) across days and within a day. The results revealed that the gain in RSE was correlated solely with the reward state and was independent of the stimulus content (Fig. 8). First, regarding the interday changes, we observed that the gain in RSE fluctuated around zero during the fixation phase (similar to that in REE). Upon entering the learning phase, the gain rapidly increased in the current HRS condition, LRS conditions, and previous HRS condition, but these conditions did not significantly differ (Fig. 8A, left panel in Fig. 8B). During the transition from the learning phase to the next fixation phase, the RSE gains for all three conditions decreased to near-zero rapidly (Fig. 8A, right panel in Fig. 8B; N = 3 learning sessions; all marked positions p < 0.05 in Fig. 8B; one-way ANOVA, Bonferroni’s correction). The intraday changes in RSE also exhibited consistent characteristics. When the reward state changed (from pretest to posttest) during the learning phase, both the HRS and LRS conditions increased from zero to a higher level (Fig. 8C, right panel in Fig. 8D). On the other hand, when the reward states remained unchanged (from pretest to posttest) during the fixation phase, both the HRS and LRS conditions consistently remained near zero (Fig. 8C, left panel in Fig. 8D; number of days in different phases of each animal labeled in each subgraph; all marked positions p < 0.05 in Fig. 8D, paired t test). Although there may have been significant differences between the HRS and LRS conditions in some blocks during the posttest of the learning phase, these differences did not affect the conclusion that RSE was highly correlated with reward states.
In addition to RSE (evoked by reward states) and REE (evoked by reward expectation), we also analyzed the variation characteristics of the gains in the other three components, FDE (evoked by fixation drift), SE (evoked by stimulus), and BL (baseline). These analyses are presented in Figures 9–11, respectively. FDE, SE, and BL did not exhibit any consistent relationship with reward-associated stimuli or reward rules. In the across-day transition from the fixation phase to the learning phase, the gain in FDE remained unchanged, showing no significant differences between the current HRS, previous HRS, and current LRS conditions (Fig. 9A, left panel in Fig. 9B). The same pattern persisted during the transition from the learning phase to the fixation phase (Fig. 9A, right panel in Fig. 9B). In the within-day transition from pretest to posttest, FDE also maintained a relatively stable value, regardless of whether it was during the fixation phase or the learning phase (Fig. 9C,D).
The SE component exhibited a pattern similar to that of FDE, with no abrupt changes due to reward-related content or states (Fig. 10; all marked positions p < 0.05 in Figs. 9B, 10B; one-way ANOVA, Bonferroni’s correction; all marked positions p < 0.05 in Figs. 9D, 10D, paired t test).
Although BL did not exhibit learning-related interday changes in the long term (Figs. 11A,B; N = 3 learning sessions; all marked positions p < 0.05 in Fig. 11B; one-way ANOVA, Bonferroni’s correction), it gradually increased within each training day (for both the fixation phase and the learning phase) from pretest to posttest (Figs. 11C,D; number of days in different phases of each animal labeled in each subgraph; all marked positions p < 0.05 in Fig. 11D; paired t test). Therefore, BL may reflect arousal (Loewenfeld and Lowenstein, 1993; Aston-Jones and Cohen, 2005; Wang et al., 2018) or the satisfaction level of the animals (Skaramagkas et al., 2023), and it is correlated with their overall water intake throughout the day. The greater the water intake is, the greater the animal's satisfaction, resulting in higher baseline values of pupil size.
Discussion
By recording a large amount of data on pupillary dynamics during the associative learning of visual stimuli and reward size (under the Pavlovian experimental paradigm) in macaques, we successfully separated five components from pupillary responses. We found that these components corresponded to various cognitive factors in the learning process. We further described in detail the changes in these cognitive factors across and within days during the entire associative learning process. This study provides objective parameters/indicators for various cognitive factors in the learning process, which will enhance the impact of pupil measurement in medical, educational, and animal studies.
Comparison to previous studies
Many previous studies have focused exclusively on analyzing specific cognitive functions within specific cognitive states, such as extracting the intensity of internal signals (e.g., attention and decision-making) from the pupil once the subject becomes familiar with an operant task (Hoeks and Levelt, 1993; Wierda et al., 2012; De Gee et al., 2014; Kang and Wheatley, 2015; Denison et al., 2020) or separately dissecting visual, auditory, or emotional aspects from the pupil in independent experiments (Korn and Bach, 2016; Korn et al., 2017). The models in these studies typically simulate only the dilation of the pupil, representing the activation of the sympathetic pathway driven by internal signals (Mathôt, 2018). Our research concurrently scrutinized pupillary activities associated with a variety of cognitive functions. This deepens our understanding of how pupillary responses are jointly controlled by multiple cognitive processes.
Previously, a few studies have been able to track pupil changes over several hours during the learning process (Koenig et al., 2018; Van Slooten et al., 2018). They found that pupil fluctuations are related to behavioral choices and value predictions or decode past prediction errors from the pupil responses of human subjects. A recent study in primates tracked pupillary responses over several days (Kaskan et al., 2022). This study revealed that animal pupils exhibit learning effects. As learning progresses, the pupil dilation elicited by stimuli associated with HR probabilities gradually intensifies, and the disparity between responses to high and low reward probability stimuli increasingly widens. Our results are highly consistent with these findings (Figs. 1–3). Our study has further shown that the pupillary responses can be used to track the continuous learning for months and demonstrated a way to separate different cognitive factors during this long-term learning. We not only modified the reward rules repeatedly but also altered the animals' learning content multiple times. By utilizing the simple learning paradigm, we demonstrated the credibility of using pupillary responses in investigating cognition within intricate learning environments.
Additionally, several studies have focused on the influence of reward feedback on the pupillary response during the learning process (Van Slooten et al., 2018; Rothenhoefer et al., 2021). They found that reward delivery causes pupil dilation before learning about the association between cues and rewards; however, after the association is established, rewards no longer cause pupil dilation, indicating that prediction error during the learning process is a significant factor in pupil dilation (Rothenhoefer et al., 2021). In contrast, our present study, which focused on the relationship between pupillary response and learning before reward delivery, showed that the main factors related to changes in pupils before and after learning were the prediction of reward levels and changes in reward rules (Figs. 6–8). The difference between our study and previous studies (Van Slooten et al., 2018; Rothenhoefer et al., 2021) is due to the use of different time periods relative to reward delivery for characterizing pupillary responses. It will be interesting to compare pupillary responses before and after reward delivery in future research.
Neural mechanisms for pupillary responses
Pupillary responses during reward-based associative learning may involve distinct pupillary control circuits. In previous studies, attempts have been made to categorize the factors influencing pupil dilation into three levels: low (e.g., light reflex, near response), middle (alerting and orienting), and high (executive functioning) (Strauch et al., 2022). Each type of factor corresponds to neural networks operating at different hierarchical levels. At the low level, the sympathetic and parasympathetic nervous systems directly control the contraction and dilation of the pupil by regulating the pupillary muscles (Loewenfeld and Lowenstein, 1993; Szabadi, 2012). At the middle level, the subcortical network comprising the locus coeruleus (LC), superior colliculus (SC), and basal forebrain modulates pupillary responses through the release of noradrenaline (NE) from the LC's noradrenergic neurons and acetylcholine (ACh) from cholinergic neurons in the basal forebrain, impacting the dilation and contraction of the pupil (Aston-Jones and Cohen, 2005; Wang and Munoz, 2015; Joshi et al., 2016; Mathôt, 2018; Joshi and Gold, 2020). At the high level, the complex cognitive systems that oversee decision-making, such as the prefrontal cortex (PFC) and frontal eye fields (FEFs), indirectly impact the pupillary response through subcortical pathways (Huerta et al., 1987; Kim and Lee, 2003; Aston-Jones and Cohen, 2005).
The components we identified in our model may correspond to neural activities at different levels. The SE component, which reflects the impact of physical stimuli on pupillary responses, likely corresponds to the low-level neural activity analogous to the pupillary light reflex (Gamlin et al., 2007; Joshi and Gold, 2020). The RSE component, which reflects the influence of the state of the reward rule on pupillary responses, and the BL component, which reflects the influence of satisfaction (Skaramagkas et al., 2023) or arousal (Loewenfeld and Lowenstein, 1993; Aston-Jones and Cohen, 2005; Wang et al., 2018), may involve subcortical nuclei such as the LC and SC, representing middle-level neural activity. The REE component, associated with animals' evaluation and prediction of the associative value of the current stimulus, presumably involves neural activity in decision-related brain regions, consistent with high-level neural activity. These speculations are plausible, as dopaminergic neurons, which play a crucial role in reward-based associative learning (Bouret et al., 2012; Varazzani et al., 2015), not only densely project to subcortical nuclei such as the LC and SC (Aston-Jones and Cohen, 2005; Sara, 2009) but also engage in reciprocal communication with higher cognitive brain regions, including the PFC (Karreman and Moghaddam, 1996; Seamans and Yang, 2004; Sara, 2009). This neural connectivity provides important anatomical support for pupillary responses to different cognitive functions during reward-based associative learning.
Individual differences
In this study, we observed individual differences in pupillary responses among the three monkeys. These individual differences primarily manifested in the magnitude of pupil size changes. Specifically, we found that during reward-associated learning, the magnitude of pupil dilation before the water reward relative to baseline was approximately 5–10% for DG2 and DP1, while DQ5 exhibited a smaller dilation of approximately 2% (Figs. 1, 2). Consequently, this led to higher noise in the pupil data of DQ5 and resulted in a poorer fit of the model compared to the other two animals (Fig. 5).
We also found differences among individual animals in the components influencing pupillary responses (Fig. 4). While the temporal influence of each component on the pupillary response was remarkably consistent across animals, certain model components varied. For instance, the SE (evoked by stimulus) component demonstrated only one instance of contraction in DG2 and DP1, whereas it contracted twice in DQ5 (fourth column in Fig. 4). This discrepancy may be attributed to differences in stimulus location (Clarke et al., 2003; Hu et al., 2020). Considering the schematic diagram, we observed that the boundaries of the stimulus and gaze position overlapped in DQ5, potentially leading to two contractions of the pupil (Fig. 1B). In addition, in DP1 and DQ5, the REE (evoked by reward expectation) component exhibited two dilations, while it only dilated once in DG2 (eighth column in Fig. 4). This might reflect temporal discrepancies in the animal's cognitive decision-making processes. Animals need to make two decisions to predict the forthcoming reward associated with the current stimulus: first, to determine if the current stimulus is the expected HRS, and second, to evaluate the anticipated reward based on the current stimulus. The interval between the two decision-making processes determines whether two pupillary dilations occur in the REE component. If the interval is short, the superposition of the two pupillary responses manifests as a single dilation. Conversely, if the interval is longer, two dilations can be observed.
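The superposition argument can be illustrated with a toy calculation in which each decision evokes an identical Gaussian-shaped dilation. The Gaussian shape, its width, the onset time, and the two intervals are all assumptions chosen for illustration, not fitted to the REE component.

```python
from math import exp

def summed_response(t, interval, width=0.2):
    """Sum of two identical Gaussian-shaped dilations `interval` seconds apart."""
    first = exp(-((t - 0.5) / width) ** 2 / 2)
    second = exp(-((t - 0.5 - interval) / width) ** 2 / 2)
    return first + second

def count_peaks(interval, step=0.01, duration=3.0):
    """Count strict local maxima of the summed response on a regular time grid."""
    n = int(round(duration / step))
    trace = [summed_response(i * step, interval) for i in range(n + 1)]
    return sum(
        1 for i in range(1, n)
        if trace[i - 1] < trace[i] > trace[i + 1]
    )

short_interval_peaks = count_peaks(0.2)  # dilations merge into one
long_interval_peaks = count_peaks(1.0)   # dilations remain separate
```

For two equal Gaussians, the sum has a single peak whenever the separation is at most twice the width parameter, which is why the 0.2 s interval merges and the 1.0 s interval does not in this toy setting.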
Although statistical analysis often combines data from multiple subjects, the significant interindividual disparities observed in our results highlight the importance of considering individual psychological decision-making processes when utilizing pupillary measurements to represent cognitive processes, which is consistent with studies involving human subjects (Willems et al., 2015b; Denison et al., 2020).
Limitations of the study
Although our study showed that the pupillary response changed during the learning process, the pupillary response does not directly participate in learning itself. Instead, changes in pupillary responses indirectly reflect changes in internal states related to learning processes, such as changes in attention (Smallwood et al., 2011; Wierda et al., 2012) or emotional states (Skaramagkas et al., 2023). In addition, the experimental paradigm in our study does not force animals to actively provide behavioral responses, which is different from other studies requiring animals to perform cognitive tasks (e.g., to actively saccade to the target object) (Rothenhoefer et al., 2021; Kaskan et al., 2022). Such differences in tasks might affect the number or degree of internal states involved in changes in pupillary responses. Furthermore, the pupillary changes measured in our study, in which eye movements were restricted, may differ from those observed in a natural state with unrestricted eye movements. This may also lead to different pupillary responses related to the cognitive process. Therefore, studies on changes in pupillary response during natural learning or task states would be interesting in the future.
Moreover, our study documented only the pupillary response during the process of reward-based associative learning. To uncover the neural mechanisms underlying the control of the pupillary response in this process, a multidisciplinary approach combining techniques such as chronic electrophysiological recordings, two-photon imaging, functional magnetic resonance imaging (fMRI), and electroencephalography (EEG) can be employed. By using these techniques, we can further identify the relevant brain regions, utilize optogenetics and fluorescence labeling to track the activity of different neurotransmitters (such as dopamine, norepinephrine, and acetylcholine), and explore the connectivity between these pertinent brain regions. These integrative approaches from various disciplines can facilitate a more profound exploration of the relationship between pupillary response and cognitive functions such as decision-making, attention, and emotion.
Footnotes
This work was supported by STI2030-Major Projects (2022ZD0204600), National Natural Science Foundation of China [Grants 32171033 (D.X.) and 32100831 (T.W.)] and the Fundamental Research Funds for the Central Universities.
The authors declare no competing financial interests.
Correspondence should be addressed to Dajun Xing at dajun_xing@bnu.edu.cn.