Human orbitofrontal cortex is necessary for behavior based on inferred, not experienced outcomes

SUMMARY Decisions are typically guided by what we have experienced in the past. However, when direct experience is unavailable, animals and humans can imagine or infer the future to make choices. Outcome expectations that are based on direct experience and inference may compete for guiding behavior [1, 2], and they may recruit distinct but overlapping brain circuits [3–5]. In rodents, the orbitofrontal cortex (OFC) contains neural signatures of inferred outcomes and is necessary for behavior that requires inference, but it is not necessary when responding can be based on direct experience [6–10]. In humans, OFC activity is also correlated with inferred outcome expectations [11, 12], but it is unclear whether the human OFC is selectively required for inference-based behavior. To test this, here we used non-invasive targeted continuous theta burst stimulation (cTBS) [13] to inactivate the human OFC in a sensory preconditioning task designed to isolate inference-based behavior from responding that can be based on direct experience [6, 12, 14]. We show that OFC-targeted cTBS disrupts reward-related behavior only in conditions in which outcome expectations have to be mentally inferred, but that it does not impair behavior that can be based on stimulus-outcome associations that were directly experienced. These findings suggest that OFC is necessary for decision making when outcomes have to be mentally simulated, providing converging cross-species evidence for a critical role of OFC in model-based but not model-free behavior.

[...] between the second cue in each pair and an outcome (B→odor reward, D→odorless air). Continuous theta burst stimulation (cTBS) was administered at 80% and 5% resting motor threshold in the STIM and SHAM group, respectively. During the probe test, participants were asked to make outcome predictions to all cues, but no outcomes were provided. (b) Participants rated the pleasantness (left) and intensity (right) of food odors and odorless air, which did not differ between groups (p's>0.14). (c) The percentage of trials in which an odor reward was expected after cue B increased across time during conditioning, and there were no group differences. Error bars depict SEM (n=23 SHAM, n=24 STIM).

Critically, after conditioning and immediately prior to the probe test, we applied 40 seconds of cTBS to a site in lateral prefrontal cortex (LPFC) that was individually selected to have maximal resting-state fMRI connectivity with the OFC, following previously established procedures [13]. Stimulation was administered in the STIM group at a high intensity that we have previously shown [...] that was not expected to produce any impact on neural function [13].

We hypothesized that OFC-targeted stimulation would selectively disrupt reward expectations based on inference, but not direct experience. In line with this, we found significantly reduced responses to cue A in the STIM relative to the SHAM group (t(45)=2.36, p=0.023, Fig. 2a [...]

[...] Error bars depict SEM (n=23 SHAM, n=24 STIM) and * depicts p<0.05.

[...] reported above reflect a failure of memory rather than inference. To rule out this potential explanation, we measured recognition memory for cue-cue associations after the probe test, and compared it between groups. Recognition memory was significantly above chance in both groups (SHAM: t(21)=5.01, p=0.00006; STIM: t(20)=2.70, p=0.013), and there was no group difference (t(41)=1.34, p=0.188). This indicates that OFC was not necessary for remembering the cue-cue associations required for outcome inference.

Our results show that OFC in humans is necessary for behavior when outcome expectations need to be mentally simulated, but that it is not required for similar behavior when expectations can be based on direct experience. This closely parallels previous findings from rats [6], providing converging cross-species evidence for a critical role of OFC in model-based but not model-free behavior. By extension, this finding implies that the value correlates previously observed in OFC across different species [15–19] may only be required for choice when these values have to be mentally inferred, but not when they can be based on direct experience. This suggests that the contribution of OFC to decision making is much more specific than previously thought, and that choices based on direct experience may rely on computations in areas other than OFC, such as the striatum [20].

Our results also demonstrate a contribution of human OFC to model-based behavior that occurs, at least in part, at the time of decision making. This effect rules out the possibility that inference-based behavior depends exclusively on memory integration or other replay-like mechanisms that occur during learning [21]. While such so-called rehearsal or mediated learning may also contribute to adapting behavior when direct experience is not available, perhaps via recruitment

ACKNOWLEDGMENTS
The authors thank Rachel Reynolds, Devyn E. Smith, and Kelly Vogel for help with data collection, and Molly Hermiller for technical support related to TMS. This work was supported by grants from [...]

Visual cues consisted of 14 abstract symbols, 12 of which were randomly grouped into six pairs for each participant: two served as A1-B1 pairs, two as A2-B2 pairs, and two as C-D pairs. The two remaining symbols were used to form two catch-trial pairs (E-E) in which the same symbol was presented twice in a row (i.e., E1-E1, E2-E2). The two symbols constituting a pair were presented in different colors (e.g., first symbol blue, second symbol green; counterbalanced across participants).

Eight food odors (four sweet: strawberry, caramel, gingerbread, and yellow cake; four savory: potato chip, pot roast, garlic, and pizza) were provided by Kerry (Melrose Park, IL) and [...] using a custom-built and computer-controlled olfactometer [12, 13]. The olfactometer was equipped with two independent mass flow controllers (Alicat, Tucson, AZ), which allow dilution of any given odorant with odorless air. Odorless air was delivered constantly during the experiment and odorized air was mixed into the airstream at specific time points. The overall flow rate was kept constant at 3.2 L/min throughout the task, such that odor delivery did not involve a change in overall airflow or somatosensory stimulation.

[...] In each trial, participants were presented with one of eight food odors in a randomized order and each odor was delivered for 2 seconds. Participants were instructed to make a medium sniff and then rate the pleasantness of the delivered odor on a scale from "Most disliked sensation" to "Most liked sensation". We then selected one sweet and one savory odor that were both rated as pleasant (i.e., pleasantness above neutral) and as closely matched as possible. The two selected odors were then used as rewards for that individual participant in the main task session. If no such two odors were found, participants were excluded from further participation in the study. Next, participants rated the intensity and pleasantness of the two odors as well as odorless air. The scale of the intensity rating was from "Undetectable" to "Strongest sensation imaginable".

[...] p=7.38 × 10⁻¹¹; STIM: t(23)=12.97, p=4.59 × 10⁻¹²), but did not differ between groups (2-way ANOVA, main effect of group, F(1,45)=2.29, p=0.137; odor by group interaction, F(1,45)=0.37, p=0.544). In addition, rated intensity was significantly higher for food odors vs. odorless air (SHAM, t(22)=11.62, p=7.38 × 10⁻¹¹; STIM, t(23)=12.97, p=4.59 × 10⁻¹²), but did not differ between groups (2-way ANOVA, main effect of group, F(1,45)=0.11, p=0.744; odor by group interaction, F(1,45)=0.11, p=0.747).

MRI and TMS Motor Threshold Session
We first acquired a T1-weighted structural MRI scan for the purpose of TMS neuronavigation and an 8.5-minute resting-state fMRI (rs-fMRI) scan for individually defining OFC-targeted stimulation coordinates. Immediately after the scan, we measured resting motor threshold (RMT) by delivering single pulses to left motor cortex. RMT was defined as the minimum percentage of stimulator output necessary to evoke 5 visible thumb movements in 10 stimulations.
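As a rough illustration of the RMT criterion and of how the group intensities follow from it, the sketch below takes a record of tested stimulator outputs and returns the lowest output that evoked at least 5 visible movements in 10 pulses. It then scales that value by the 80% (STIM) and 5% (SHAM) factors reported for this study. The data structure, the function names, and the example session are hypothetical; the actual thresholding procedure is likely adaptive and is not described in this level of detail.

```python
GROUP_PERCENT_OF_RMT = {"STIM": 80, "SHAM": 5}  # percentages reported for this study


def resting_motor_threshold(twitches_by_output, min_twitches=5, n_pulses=10):
    """Lowest stimulator output (%) with >= 5 visible thumb movements in 10 pulses.

    twitches_by_output maps stimulator output (%) -> list of 10 booleans
    (hypothetical record of which pulses evoked a visible movement).
    """
    qualifying = [
        output for output, twitches in twitches_by_output.items()
        if len(twitches) == n_pulses and sum(twitches) >= min_twitches
    ]
    if not qualifying:
        raise ValueError("no tested output reached the response criterion")
    return min(qualifying)


def ctbs_output(rmt, group):
    """cTBS intensity in % of stimulator output for the given group."""
    return rmt * GROUP_PERCENT_OF_RMT[group] / 100


# Made-up session: 4/10 responses at 48%, 5/10 at 50%, 8/10 at 52%.
session = {
    48: [True] * 4 + [False] * 6,
    50: [True] * 5 + [False] * 5,
    52: [True] * 8 + [False] * 2,
}
rmt = resting_motor_threshold(session)
print(rmt, ctbs_output(rmt, "STIM"), ctbs_output(rmt, "SHAM"))
```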

Main Task Session
The main task session consisted of preconditioning, conditioning, TMS, probe test, and cue-cue pair recognition test. In four preconditioning runs, participants were instructed to learn the associations between the two cues in each pair (A-B [A1-B1, A2-B2], C-D [C1-D1, C2-D2], E-E). The cues in a pair were presented one after another for 3 s each, separated by a delay of 300 ms. A fixation cross appeared between trials for a variable duration between 3 and 11 s. To ensure attention to the cue pairs, participants were instructed to memorize the cue pairs, press a button if the second cue was different from the first, and withhold a response if the two cues were identical. To facilitate learning, in the first two runs of preconditioning, each cue pair was repeated three times in a row. In the remaining preconditioning runs, the order of cue pairs was randomized.
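The cue assignment and preconditioning schedule can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' task code: the symbol names, the fixed random seed, and the number of repetitions per pair in the randomized runs (`reps_per_pair=3`) are hypothetical, whereas the 14 symbols, the pair types, the four runs, and the blocked repetition in the first two runs come from the text.

```python
import random

PAIR_TYPES = ["A1-B1", "A1-B1", "A2-B2", "A2-B2", "C-D", "C-D"]


def assign_cue_pairs(symbols, rng):
    """Split 14 symbols into six two-symbol pairs and two catch (E-E) pairs."""
    if len(symbols) != 14:
        raise ValueError("expected 14 symbols")
    shuffled = list(symbols)
    rng.shuffle(shuffled)
    pairs = [(PAIR_TYPES[i], shuffled[2 * i], shuffled[2 * i + 1]) for i in range(6)]
    catch = [("E-E", s, s) for s in shuffled[12:]]
    return pairs + catch


def preconditioning_run(pairs, rng, blocked, reps_per_pair=3):
    """Trial order for one run.

    blocked=True repeats each pair reps_per_pair times in a row (first two
    runs); blocked=False shuffles all trials. reps_per_pair=3 for the
    randomized runs is an assumption, not a value stated in the text.
    """
    order = list(pairs)
    rng.shuffle(order)
    trials = [pair for pair in order for _ in range(reps_per_pair)]
    if not blocked:
        rng.shuffle(trials)
    return trials


def should_press(pair):
    """Press when the second cue differs from the first; withhold on catch pairs."""
    _, first, second = pair
    return first != second


rng = random.Random(0)  # fixed seed so the sketch is reproducible
cue_pairs = assign_cue_pairs([f"symbol_{i:02d}" for i in range(14)], rng)
runs = [preconditioning_run(cue_pairs, rng, blocked=(run < 2)) for run in range(4)]
```

Passing an explicit `random.Random` instance keeps the assignment reproducible per participant, which is convenient when the same pairs must be reused in the later probe and recognition tests.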

[...] (cues B [B1, B2] and D [D1, D2]) was presented individually for 3,000 ms. Participants were instructed to indicate by button press which outcome (e.g., strawberry [SB], garlic [GA], or no odor [NO]) they expected following the cue. If they expected strawberry, they were asked to select "SB"; if they expected garlic, they were asked to select "GA"; if they expected no odor, they were asked to select "NO". Participants made their prediction by pressing a button with the index, middle, or ring finger of their right hand, corresponding to the positions of "SB", "GA", and "NO" on the screen. The positions of the abbreviated names were randomized across trials to dissociate specific motor responses from outcome predictions. Irrespective of their selection, the outcome was always presented for 2,000 ms immediately after the cue. However, "too slow" was displayed if participants failed to respond within 3,000 ms. Each cue-outcome association was repeated four times in each run in pseudorandomized order, resulting in 12 repetitions total.

After the conditioning phase, participants received cTBS over the individually selected LPFC coordinate (see details below). The probe test followed immediately after the stimulation. In the probe test, cues A (A1, A2), B (B1, B2), C (C1, C2), and D (D1, D2) were presented individually in extinction conditions (odorless air was delivered throughout). Each cue was presented four times in pseudorandomized order. Participants were instructed to predict the outcome after each cue, as they did during the conditioning phase. They were further instructed to use the cue-cue associations to infer the outcomes associated with the preconditioned cues [12]. The cue duration and the interval between trials were exactly the same as during the conditioning phase.

Following the probe test, participants were tested for their memory of the cue-cue associations in a recognition task. Participants were presented with the original cue pairs as well as recombined pairs consisting of cues belonging to different pairs. Pairs were presented sequentially as during preconditioning, and participants were asked to indicate whether a pair was old or recombined after the second cue was presented using a button press.

[...] coil (MagVenture A/S, Farum, Denmark). We used a cTBS protocol involving a 40-second train of 3-pulse 50 Hz bursts delivered every 200 ms (5 Hz), totaling 600 pulses [27]. Stimulation was delivered at an intensity of 80% MT in the STIM group and 5% MT in the SHAM group. The target coordinate was defined as a location in the lateral prefrontal cortex (LPFC) that showed maximal functional connectivity with the orbitofrontal cortex (OFC) seed coordinate (see details below). The orientation of the coil was tilted such that the long axis of the figure-of-eight coil was approximately parallel to the long axis of the middle frontal gyrus. All participants were informed that they may experience muscle twitches in the forehead, eye area, and jaw during stimulation. We delivered two single test pulses to demonstrate the potential muscle twitches and test for tolerability before cTBS was delivered. Immediately after the last pulse of cTBS, the time was noted. All subsequent tests took place within 30 minutes of the end of stimulation.
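The pulse count of the cTBS protocol follows directly from its timing parameters: a 40 s train with one burst every 200 ms gives 200 bursts, and 3 pulses per burst gives 600 pulses. The short Python sketch below generates the pulse onset times in milliseconds to make this arithmetic explicit. The function name and the integer-millisecond representation are illustrative choices, not part of any stimulator software.

```python
def ctbs_pulse_times_ms(train_ms=40_000, burst_interval_ms=200,
                        pulses_per_burst=3, pulse_interval_ms=20):
    """Pulse onsets (ms) for a continuous theta burst train.

    Bursts start every 200 ms (5 Hz); within a burst, pulses are 20 ms apart
    (50 Hz). Integer milliseconds avoid floating-point rounding.
    """
    times = []
    for burst_onset in range(0, train_ms, burst_interval_ms):
        for pulse in range(pulses_per_burst):
            times.append(burst_onset + pulse * pulse_interval_ms)
    return times


pulse_times = ctbs_pulse_times_ms()
print(len(pulse_times))  # 600
```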

MRI data acquisition
The MRI data were acquired at the Northwestern University Center for Translational Imaging (CTI) using a Siemens 3T PRISMA system equipped with a 64-channel head coil. rs-fMRI scans were acquired with an echoplanar imaging (EPI) sequence with the following parameters: repetition time (TR), 2 s; echo time (TE), 22 ms; flip angle, 90°; slice thickness, 2 mm, no gap; number of slices, 58; interleaved slice acquisition order; matrix size, 104 x 96 voxels; field of view, 208 mm x 192 mm; multiband factor, 2. To minimize susceptibility artifacts in the OFC, the acquisition plane was tilted approximately 25° from anterior commissure (AC)-posterior commissure (PC) [...] the parietal lobes. In addition, a 1 mm isotropic T1-weighted structural scan was collected.

fMRI data preprocessing
Functional image preprocessing was performed using Statistical Parametric Mapping (SPM12, https://www.fil.ion.ucl.ac.uk/spm/). To correct for head motion during scanning, all rs-fMRI images were aligned to the first acquired image. The mean realigned images were then co-registered to the anatomical image, and the resulting registration parameters were applied to all realigned EPI images. Finally, co-registered EPI images were smoothed with a 6 x 6 x 6 mm Gaussian kernel. To generate forward and inverse deformation fields, the anatomical image was normalized to Montreal Neurological Institute (MNI) space using the 6-tissue probability map.

Coordinate selection for OFC-targeted TMS
Individualized stimulation coordinates on the right LPFC were determined based on rs-fMRI connectivity with a right central/lateral OFC seed region using a procedure described previously [13]. Briefly, we first created two spherical masks of 8-mm radius around an LPFC target coordinate (x=48, y=38, z=20) and an OFC seed coordinate (x=28, y=38, z=-16) in MNI space, both inclusively masked by the gray matter tissue probability map provided by SPM12 (thresholded at >0.1). These masks were then un-normalized to each participant's native space using the inverse deformation field generated by the normalization of the anatomical images. We then estimated a general linear model with the rs-fMRI time series in the inverse-normalized OFC sphere as the regressor of interest and realignment parameters as regressors of no interest. The voxel in the inverse-normalized LPFC mask that had the highest functional connectivity with the OFC seed was defined as the stimulation coordinate.
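The final selection step can be sketched in plain Python. For each voxel in the LPFC mask, an ordinary least squares model is fitted with the OFC seed time series as the regressor of interest and six realignment parameters as nuisance regressors, and the voxel with the largest seed coefficient is returned. This is a toy illustration rather than the SPM12 pipeline used in the study. It assumes that the seed regressor is a single time series (for example, the mean over the OFC sphere) and that "highest functional connectivity" means the largest regression coefficient. The voxel coordinates and the data are synthetic, with 255 volumes corresponding to 8.5 minutes at a TR of 2 s.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def seed_beta(voxel_ts, seed_ts, confounds):
    """OLS coefficient of the seed in: voxel = b0 + b1*seed + confounds + error."""
    rows = [[1.0, s] + list(c) for s, c in zip(seed_ts, confounds)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, voxel_ts)) for i in range(p)]
    return solve_linear(xtx, xty)[1]


def select_target(lpfc_voxels, seed_ts, confounds):
    """Coordinate of the LPFC voxel with the largest seed coefficient."""
    return max(lpfc_voxels, key=lambda xyz: seed_beta(lpfc_voxels[xyz], seed_ts, confounds))


# Synthetic data: 255 volumes, six motion regressors, three candidate voxels.
rng = random.Random(1)
n_vols = 255
seed_ts = [rng.gauss(0, 1) for _ in range(n_vols)]
motion = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(n_vols)]


def fake_voxel(coupling):
    """Voxel signal = coupling * seed + motion artifact + small noise."""
    return [coupling * s + 0.5 * m[0] + rng.gauss(0, 0.1) for s, m in zip(seed_ts, motion)]


lpfc_voxels = {
    (46, 40, 18): fake_voxel(0.2),
    (48, 38, 20): fake_voxel(0.9),
    (50, 36, 22): fake_voxel(0.5),
}
target = select_target(lpfc_voxels, seed_ts, motion)
print(target)
```

In practice this would be run on voxel time series extracted from the native-space masks, typically with an array library rather than pure-Python loops; the pure-Python solver is used here only to keep the sketch free of external dependencies.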