Figure 2. Rat performance in fixed-choice and free-choice blocks. *A*, A representative example of a rat's performance. Blue, red, and orange vertical lines indicate individual choices for left, right, and choice tones, respectively. A sequence of blocks consisted of two single fixed-choice blocks (left- or right-tone trials), two double fixed-choice blocks (mix of left- and right-tone trials), and four free-choice blocks with different reward probabilities (choice-tone trials). Bottom, Blue, red, and orange represent the probability of a left choice for each tone (average of the last 20 trials). When the choice frequency of the action that was associated with higher reward probability reached 80%, the block was changed. “e” indicates an extinction test, consisting of five trials without reward delivery. This block sequence was repeated two or three times in a one-day recording session. *B*, The average left-choice probability during extinction tests (five unrewarded trials for each cue tone) with 95% confidence intervals (shaded bands). Left-choice probabilities for left tones, right tones, and choice tones are plotted in blue, red, and orange, respectively. Left-choice probabilities for choice-tone trials were separated by the optimal action in the previous free-choice block (the upper graph for left, the lower graph for right). *C*, Averages of left-choice probabilities over five extinction trials for left tone (blue), choice tone (orange), and right tone (red) with 95% confidence intervals. Top and bottom orange plots represent the average of the upper and lower orange graphs, respectively, in *B*. ****p* < 0.0001 (χ^{2} test). *D*, The decision tree for choice tones: the left-choice probability for all possible experiences in the one and two previous trials. Four types of experiences in one trial [left or right choice, each either rewarded (1) or unrewarded (0)] are represented by different colors and line types. 
For instance, the left-choice probability after L1 is indicated by the right edge of a blue solid line (green arrow), and the left-choice probability after L1 R0 (L1 and then R0) is indicated by the right edge of a red broken line connected to the blue solid line (brown arrow). Values at trials = 0 (*x*-axis) represent the left-choice probability over all trials. Shaded bands indicate 95% confidence intervals. *E*, *F*, Decision trees for left tones and right tones, respectively. Conditional left-choice probabilities for left-tone (*E*) and right-tone trials (*F*) in single- and double-fixed blocks are represented in the same manner as in *D*. *G*, Accuracy of each model in predicting rat choices. Prediction accuracy was defined by the normalized likelihood of test data. Free parameters of each model were determined by maximization of the likelihood of training data. Markov *d* stands for the *d*th-order Markov model, a standard model that predicts choices from the past *d* trials. Q, FQ, and DFQ indicate variations of reinforcement learning models. Numbers following the model names indicate the number of free parameters of each model. “const” means that the parameters (α_{1}, α_{2}, κ_{1}, and κ_{2}) were assumed to be constant for all sessions, and “variable” means that the parameters were assumed to vary. ***p* < 0.01, significant difference from the prediction accuracy of FQ-learning (variable) (paired-sample Wilcoxon signed-rank tests). **p* < 0.05, significant difference from the prediction accuracy of FQ-learning (variable) (paired-sample Wilcoxon signed-rank tests). *H*, An example of predictions of rat choices based on the FQ model with time-varying parameters. Top, Green line indicates *P*_{L}(*t*), the probability that a rat would select left at trial *t*, estimated from the rat's past experiences e(1), e(2), …, e(*t*−1). Vertical lines indicate the rat's actual choice in each trial; upper and lower lines indicate left and right choices, respectively. 
Black and gray represent reward and no-reward trials, respectively. Middle, Estimated action values, *Q*_{L} and *Q*_{R}. Bottom, Estimated κ_{1} and κ_{2}.
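
The conditional choice probabilities in *D* and the Markov-model accuracy in *G* can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that each trial is coded as one of the strings "L1", "L0", "R1", or "R0", and that the normalized likelihood is the geometric mean of the per-trial likelihoods, a definition the caption does not state. The function names and the pseudo-count smoothing are likewise hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

def fit_markov(experiences, d, pseudo=1.0):
    """Count left/right choices that follow each history of the past d trials.

    `pseudo` is an assumed smoothing count (must be > 0) so that unseen
    histories yield a left-choice probability of 0.5 instead of an error.
    """
    counts = defaultdict(lambda: [pseudo, pseudo])  # history -> [n_left, n_right]
    for t in range(d, len(experiences)):
        history = tuple(experiences[t - d:t])
        chose_left = experiences[t][0] == "L"
        counts[history][0 if chose_left else 1] += 1
    return counts

def p_left(counts, history):
    """Conditional left-choice probability given the past d experiences."""
    n_left, n_right = counts[tuple(history)]
    return n_left / (n_left + n_right)

def normalized_likelihood(counts, experiences, d):
    """Geometric mean of the per-trial likelihoods on held-out data (assumed definition)."""
    if len(experiences) <= d:
        raise ValueError("need more than d trials to evaluate predictions")
    log_lik = 0.0
    n = 0
    for t in range(d, len(experiences)):
        p = p_left(counts, experiences[t - d:t])
        chose_left = experiences[t][0] == "L"
        log_lik += math.log(p if chose_left else 1.0 - p)
        n += 1
    return math.exp(log_lik / n)

# Toy usage: fit a first-order model on training trials, then score test trials.
train = ["L1", "L1", "R0", "L1", "L0", "R1", "R1", "L0"]
test = ["L1", "L1", "R0", "R1"]
model = fit_markov(train, d=1)
accuracy = normalized_likelihood(model, test, d=1)
```

Under this assumed definition, a model that predicts every choice with probability 0.5 scores exactly 0.5, so values above 0.5 indicate better-than-chance prediction.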