Abstract
Humans are much better in relative than in absolute judgments. This common assertion is based on findings that discrimination thresholds are much lower when measured with methods that allow interstimuli comparisons than when measured with methods that require classification of one stimulus at a time and are hence sensitive to memory load. We now challenged this notion by measuring discrimination thresholds and evoked potentials while listeners performed a two-tone frequency discrimination task. We tested various protocols that differed in the pattern of cross-trial tone repetition. We found that best performance was achieved only when listeners effectively used cross-trial repetition to avoid interstimulus comparisons with the repeated reference tone. Instead, they classified one tone, the nonreference tone, as either high or low by comparing it with a recently formed internal reference. Listeners were not aware of the switch from interstimulus comparison to classification. Its successful use was revealed by the conjunction of improved behavioral performance and an event-related potential component (P3), indicating an implicit perceptual decision, which followed the nonreference tone in each trial. Interestingly, tone repetition itself did not suffice for the switch, implying that the bottleneck to discrimination does not reside at the lower, sensory stage. Rather, the temporal consistency of repetition was important, suggesting the involvement of higher-level mechanisms with longer time constants. These findings suggest that classification is based on more automatic and accurate mechanisms than interstimulus comparisons and that the ability to effectively use them depends on a dynamic interplay between higher- and lower-level cortical mechanisms.
Introduction
One of the hallmarks of human perception is our remarkable discrimination abilities along simple physical dimensions. When performance is assessed with protocols that provide two stimuli between which participants are asked to compare (e.g., “which of the two stimuli is larger?”), measured thresholds are impressively low. This observation was reported for many dimensions, including frequency, intensity, and duration of sounds in the auditory modality (Wright et al., 1997; Karmarkar and Buonomano, 2003; Ulrich et al., 2006) and orientation, position, and contrast of stimuli in the visual modality (Regan, 1985; Regan and Beverley, 1985; Lages and Treisman, 1998; Ulrich et al., 2006). In contrast to this remarkable ability in relative judgment tasks, absolute judgments, in which participants are asked to classify a single stimulus with respect to the same dimensions, are typically quite crude, particularly when assessed with a broad range of stimuli (Garner, 1953; Pollack, 1953; Miller, 1956; Laming, 2004; Stewart et al., 2005).
In an attempt to assess the best thresholds that human performance can achieve, most studies use relative judgment protocols, asking participants to discriminate between sequentially presented stimuli. A relatively common protocol in psychophysical studies is therefore the two-interval two-alternative forced-choice paradigm (2AFC) (Green and Swets, 1966; Leek, 2001; Macmillan and Creelman, 2005), in which participants are asked “which of the two stimuli is longer (or higher, denser, etc.)?”. The common use of this method is based on two assumptions. The first is that our brains can implement interstimulus comparisons. This assumption was directly assessed and supported by electrophysiological studies in monkeys, showing that comparisons can be implemented by a distributed network that includes the prefrontal cortex and the premotor cortex, which can actively retain the information regarding the first stimulus during the temporal interval between the two stimuli (“delayed activity”) (Romo et al., 1999, 2004; Hernández et al., 2002; Zaksas and Pasternak, 2006). The second assumption is that these comparisons can be performed accurately with no loss of information, namely, that the comparison process itself does not impair thresholds.
In parallel to this main dogma, several studies noted that relative judgment procedures do not always yield better performance (Nachmias, 2006; Lapid et al., 2008; Yeshurun et al., 2008). In fact, 60 years ago, Harris (1948) showed that the sequence of stimulus pairs in a block affects performance. Thus, performance is substantially better when a consistent reference stimulus is one of the stimuli in every trial than when the two stimuli in each trial are randomly chosen (Harris, 1948; Moore, 1997). This improvement was attributed to a reduction in the internal noise of the sensory representation of the repeated stimulus (Viemeister, 1970; Lages and Treisman, 1998; Morgan et al., 2000; Nachmias, 2006). Namely, the averaged representation of the reference becomes gradually more accurate with repeated presentations, and this average may replace the reference stimulus when compared with the nonreference stimulus. Recent studies further showed that other cross-trial manipulations, such as the temporal position of the reference within the pair, may significantly affect performance (Morgan et al., 2000; Oxenham and Buus, 2000; Nachmias, 2006; Hairston and Nagarajan, 2007; Georgeson et al., 2008; Lapid et al., 2008; Yeshurun et al., 2008).
However, despite the accumulated findings pointing to the importance of the assessment protocol, the basic concept of a “retain-and-compare” process that can be implemented with no loss of information remained unchallenged. We now asked whether a retain-and-compare mechanism can indeed account for the impressively low discrimination thresholds reported in the literature. Applying a two-tone frequency discrimination paradigm, we found that the perceptual system actively attempts to bypass comparisons between two recently presented stimuli and replace them with a task-related classification (“high” or “low”) based on an internal reference. When the experimental protocol does not allow such classification, discrimination thresholds invariably increase even when a repeated stimulus is presented in each trial and its sensory representation is expected to be accurate. These findings suggest that the retain-and-compare process itself poses a severe bottleneck to perception.
Materials and Methods
Behavioral experiments
Participants
One hundred eighteen participants (mean age, 24 ± 4 years) were engaged in the behavioral part of the study, as detailed in Table 1. Each group included participants with and without previous musical training, i.e., at least 1 year of experience with a tonal instrument (Table 1). Groups 1–5 were composed of naive participants, with no previous experience with the task. Each participant performed one protocol. Group 6 was composed of highly trained laboratory members, each of whom performed the task in all three protocols (“reference first,” “reference second,” and “random”; see below).
Participant groups in the behavioral experiments
Two-tone frequency discrimination task
All stimuli were presented binaurally through Sennheiser HD-265 linear headphones using a TDT System III signal generator (Tucker Davis Technologies) controlled by in-house software in a sound attenuated room in the laboratory. Tone intensity was 65 dB.
In all protocols but the “implicit reference” protocol (detailed description follows), thresholds for frequency discrimination were measured using a 2AFC. The frequency difference between the two tones in each trial was changed in a three-down, one-up staircase procedure (Levitt, 1971) converging to 79.4% correct responses. Step size decreased every four reversals, from 4.5 to 2 to 1 to 0.5 to 0.1%.
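The 79.4% convergence point follows from the staircase rule itself. As a brief sketch, assuming the standard reading of the three-down, one-up rule (the difference decreases only after three consecutive correct responses and increases after any error), the track settles where a downward step is as likely as an upward step. With p denoting the probability of a correct response on a single trial:

```latex
p^{3} = \tfrac{1}{2}
\quad\Longrightarrow\quad
p = 2^{-1/3} \approx 0.794
```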
Each trial contained two 50 ms tones, presented with a 950 ms interstimulus interval (ISI). Participants were asked to indicate, by pressing a button, which of the two tones was higher. Pleasant visual feedback (a happy face) was provided after correct responses, whereas unpleasant feedback (a sad face) was presented after incorrect responses. The next trial began 1 s after the participant's response. Each assessment block contained 80 trials (except for the “reference interleaved” protocol; see below). Discrimination thresholds were determined as the mean of the frequency differences in the last seven reversals. Each group of participants (groups 1–5) (Table 1) was assigned to one of the following five protocols and performed three consecutive blocks in that protocol.
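The adaptive procedure described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the in-house software used in the study. The function names, the additive (percentage-point) interpretation of the step size, the 0.1% floor, and the simulated listener are all assumptions; the 20% starting difference is taken from the Figure 1 caption.

```python
import math
import random

STEP_SIZES = (4.5, 2.0, 1.0, 0.5, 0.1)  # percent, as listed in the text
REVERSALS_PER_STEP = 4                   # step size shrinks every four reversals


def run_staircase(respond, n_trials=80, start=20.0, floor=0.1):
    """Three-down, one-up staircase on the frequency difference (in percent).

    `respond(diff)` returns True for a correct response. Steps are applied as
    additive percentage points and clipped at `floor` (both are assumptions).
    Returns (track, reversals): the difference presented on each trial and the
    differences at which the track changed direction.
    """
    diff = start
    correct_run = 0
    last_direction = 0  # -1 = last step was down, +1 = up, 0 = no step yet
    track, reversals = [], []
    for _ in range(n_trials):
        track.append(diff)
        direction = 0
        if respond(diff):
            correct_run += 1
            if correct_run == 3:      # three correct in a row: make it harder
                correct_run = 0
                direction = -1
        else:                         # any error: make it easier
            correct_run = 0
            direction = +1
        if direction:
            if last_direction and direction != last_direction:
                reversals.append(diff)
            last_direction = direction
            stage = min(len(reversals) // REVERSALS_PER_STEP, len(STEP_SIZES) - 1)
            diff = max(floor, diff + direction * STEP_SIZES[stage])
    return track, reversals


def threshold_from_reversals(reversals, n_last=7):
    """Mean frequency difference over the last `n_last` reversals."""
    if not reversals:
        raise ValueError("no reversals recorded")
    last = reversals[-n_last:]
    return sum(last) / len(last)


def simulated_listener(scale_pct, rng):
    """Hypothetical listener whose accuracy rises from 50% toward 100%."""
    def respond(diff_pct):
        p_correct = 1.0 - 0.5 * math.exp(-diff_pct / scale_pct)
        return rng.random() < p_correct
    return respond
```

For example, `run_staircase(simulated_listener(3.0, random.Random(1)))` yields one 80-trial track, and passing its reversals to `threshold_from_reversals` gives the block threshold. In the reference interleaved protocol, two such staircases would be run in parallel, one per trial type.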
Reference first protocol.
A fixed reference tone (1000 Hz) was presented on the first interval of every trial. The tone on the second interval (nonreference) was randomly selected to be either higher or lower in frequency. The (absolute) frequency of the nonreference (second) tone was adapted during the assessment as described above.
Reference second protocol.
A fixed reference tone (1000 Hz) was presented on the second interval of every trial, whereas the nonreference tone (either higher or lower in frequency) was always presented first. The (absolute) frequency of the nonreference (first) tone was adapted as described above.
Random protocol.
There was no fixed reference. On every trial, one tone was selected from the range of 600–1400 Hz, and the other tone (randomly chosen as either higher or lower) differed in frequency according to the same adaptive procedure described above.
Implicit reference protocol.
A fixed reference tone (1000 Hz) was presented five times at the beginning of the session. After that, a single tone (the nonreference), which was randomly selected to be either higher or lower than the reference frequency, was presented on every trial. Its temporal position (with respect to the time of response in the previous trial) was equivalent to that of the reference first protocol. Participants were required to reply whether the tone was higher or lower than the initial reference. The (absolute) frequency of the tone was adapted during the assessment as described above.
Reference interleaved protocol.
Blocks were of double length, i.e., composed of 160 trials. Odd trials were of type reference first, whereas even trials were of type reference second. Separate staircases and thresholds were calculated for reference first and for reference second trials, each following the same adaptive procedure described above.
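The five protocols differ only in how the tone pair of each trial is constructed, which can be summarized in a short sketch. The function name, the protocol labels, and the 1-based trial numbering are illustrative assumptions. In the random protocol, the text does not say which interval receives the randomly drawn tone or what the percentage difference is relative to; the sketch places the drawn tone first and scales the difference to it. The five initial reference presentations of the implicit reference protocol are not modeled.

```python
import random

REFERENCE_HZ = 1000.0


def make_trial(protocol, diff_pct, trial_number, rng):
    """Return (first_hz, second_hz) for one trial.

    `diff_pct` is the current staircase value in percent and `trial_number`
    is 1-based. For the implicit reference protocol only one tone is played,
    so the second element is None.
    """
    sign = rng.choice((-1, 1))  # nonreference is higher or lower at random
    if protocol == "random":
        # Assumption: the drawn tone comes first; the other is scaled from it.
        base = rng.uniform(600.0, 1400.0)
        return (base, base * (1 + sign * diff_pct / 100))

    nonreference = REFERENCE_HZ * (1 + sign * diff_pct / 100)
    if protocol == "implicit_reference":
        return (nonreference, None)
    if protocol == "reference_interleaved":
        # Odd trials are reference first, even trials are reference second.
        protocol = "reference_first" if trial_number % 2 == 1 else "reference_second"
    if protocol == "reference_first":
        return (REFERENCE_HZ, nonreference)
    if protocol == "reference_second":
        return (nonreference, REFERENCE_HZ)
    raise ValueError(f"unknown protocol: {protocol}")
```

Written this way, the reference interleaved protocol is visibly the same two trial types as the separate protocols; only their ordering across trials changes.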
Data analysis
Comparison of thresholds under reference first, reference second, random, and implicit reference protocols was conducted using a univariate analysis with between-subjects factor of protocol (four levels). Post hoc Scheffé's contrast was performed to compare thresholds under the four protocols. Comparison of thresholds under reference first protocol when measured separately versus interleaved was done using an unpaired two-tailed Student's t test. A similar comparison was done for the reference second protocol. In addition, we compared thresholds obtained for the reference second trials of the reference interleaved protocol with those of the random protocol, using a similar t test.
Event-related potential measurements
Participants
A total of 16 participants (mean age, 25 ± 2 years; five males) took part in the event-related potential (ERP) measurements (Table 2). Four of them (laboratory members) participated in both ERP experiments and had previous experience with the task. These four participants include the experienced subject YC (laboratory member), who initially performed at chance level and was subsequently administered an intensive training protocol with the adaptive reference first protocol for 2 weeks. After training, this subject participated in the two ERP sessions of experiments 1 and 2, respectively.
The two participant groups in the ERP experiments
Stimuli
The same protocols used for the behavioral experiment were used for the ERP measurements. However, the ERP measurements were conducted using constant, rather than adaptively modified stimuli to collect more data under exactly the same stimulation conditions. ERP measures were taken in two separate experiments. In the first experiment, the frequency differences between the reference and nonreference tone were 1 or 2% (with equal probabilities), whereas in the second experiment, they were 4 and 8% (Table 2).
Protocols
In the first ERP experiment, two 2-tone discrimination protocols were used.
Reference first.
The first tone was always 1000 Hz, whereas the second tone was chosen to be 990, 980, 1010, or 1020 Hz with equal probability.
Reference second.
Same as reference first protocol, but the order of the tones within the trial was reversed, as described above.
Both protocols were assessed within a single session, composed of four blocks, each composed of 300 trials, in the following sequence: reference first, reference second, reference first, and reference second. The ERP waves reported for each protocol are an average of the trials in the corresponding two blocks.
In the second ERP experiment, the procedure was essentially the same. The frequency differences used were, however, larger, 4 and 8%. Three protocols were assessed in three separate blocks composed of 300 trials each, in the following sequence: reference first, reference second, and reference interleaved. In the reference first block, the first tone was always 1000 Hz, whereas the second tone was chosen from 920, 960, 1040, or 1080 Hz. In the reference second block, the same stimuli were used just in opposite order. In the reference interleaved block, odd trials were reference first trials, whereas even trials were reference second trials.
Experimental procedure
Electrophysiological activity was recorded in a sound-attenuated room while participants performed a two-tone frequency discrimination task. Two tones of 50 ms length and with an ISI of 600 ms between them were presented in each trial. Participants were asked to determine which tone was higher (first or second), by pressing either 1 or 2 on the computer keyboard, respectively. Participants were requested to respond immediately after hearing both tones and before the next trial, which began after 1.4 s. No feedback was presented to avoid ERP components related to the feedback stimulus. Based on our pilot studies in which we compared performance in these tasks with and without feedback, we concluded that, although participants prefer having feedback, their discrimination thresholds are not affected by its absence.
The recording session consisted of four blocks (three in the second ERP experiment), each containing 300 trials, with short breaks between them.
EEG recording and averaging
EEG was recorded from 32 active Ag-AgCl electrodes mounted on an elastic cap using the BioSemi ActiveTwo tools. Recordings were referenced to the tip of the nose. Electrode sites were based on the 10–20 system (American Electroencephalographic Society, 1991). Two additional electrodes were placed over the left and right mastoids. Horizontal electro-oculogram (EOG) was recorded from two electrodes placed at the outer canthi of both eyes. Vertical EOG was recorded from electrodes on the infraorbital and supraorbital regions of the right eye in line with the pupil.
EEG and EOG signals were sampled at 256 Hz, amplified, and filtered with an analog bandpass filter of 0.16–100 Hz. Offline analysis was performed using BrainVision Analyzer software. EEG was referenced to the averaged mastoids and was digitally filtered using a bandpass of 1–30 Hz. Artifact rejection was applied to the unsegmented data according to the following criteria. First, any data point at which the absolute EOG or EEG amplitude exceeded 100 μV was rejected along with the data ±300 ms around it. Second, if the difference between the maximum and the minimum amplitudes within an interval of 50 ms exceeded 100 μV, the data ±200 ms around it were rejected. Finally, if the difference between the maximum and the minimum amplitudes within an interval of 100 ms was below 0.5 μV, the data ±300 ms around it were rejected. Trials containing rejected data points were omitted from additional analysis, as were the first three trials of each block. For ERP averaging, the EEG was segmented into 1300 ms epochs starting 100 ms before the first interval and then averaged separately for each condition. The baseline was then adjusted by subtracting the mean amplitude of the prestimulus period of the averaged ERP epoch from each data point in the epoch.
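The three rejection criteria and the baseline adjustment can be illustrated with a minimal single-channel sketch. It is not the BrainVision Analyzer implementation. The function names are hypothetical, the padding is assumed to extend from the edges of the offending sample or window, and the brute-force sliding window is chosen for readability rather than speed.

```python
def _ms_to_samples(ms, fs):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * fs / 1000.0))


def rejection_mask(signal_uv, fs=256):
    """Return a list of booleans, True where a sample is rejected."""
    n = len(signal_uv)
    rejected = [False] * n

    def mark(lo, hi, pad_ms):
        # Reject samples lo..hi (inclusive) plus pad_ms on either side.
        pad = _ms_to_samples(pad_ms, fs)
        for i in range(max(0, lo - pad), min(n, hi + pad + 1)):
            rejected[i] = True

    # Criterion 1: absolute amplitude above 100 uV, padded by 300 ms.
    for i, value in enumerate(signal_uv):
        if abs(value) > 100.0:
            mark(i, i, 300)

    # Criterion 2: peak-to-peak above 100 uV within 50 ms, padded by 200 ms.
    # Criterion 3: peak-to-peak below 0.5 uV within 100 ms, padded by 300 ms.
    window_rules = (
        (50, 200, lambda span: span > 100.0),
        (100, 300, lambda span: span < 0.5),
    )
    for window_ms, pad_ms, is_artifact in window_rules:
        width = _ms_to_samples(window_ms, fs)
        for start in range(0, n - width + 1):
            window = signal_uv[start:start + width]
            if is_artifact(max(window) - min(window)):
                mark(start, start + width - 1, pad_ms)
    return rejected


def baseline_correct(erp_uv, fs=256, baseline_ms=100):
    """Subtract the mean of the prestimulus period from every sample."""
    n_base = _ms_to_samples(baseline_ms, fs)
    mean_base = sum(erp_uv[:n_base]) / n_base
    return [value - mean_base for value in erp_uv]
```

Any trial whose samples overlap a True entry in the mask would then be dropped before averaging, and `baseline_correct` would be applied to each averaged epoch.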
Analysis of P3 peaks
Quantitative analysis was specifically performed for the P3 component obtained from the Cz electrode (this electrode was chosen because the magnitude of the P3 component measured with this electrode was largest, as expected from the literature for the P3b component) (Squires et al., 1975; Polich, 2007). P3 peaks were defined as the maximal values between 350 and 500 ms after stimulus onset for each of the two stimuli in a trial. The amplitude and the latency of the P3 peaks were measured for each subject and for each protocol (or trial type) separately. An ANOVA with within-subjects factors of measurement (two levels: separate vs interleaved) and protocol (reference first vs reference second) was conducted separately for the first and second peak values.
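The peak definition above translates directly into a small helper. This is a sketch under stated assumptions: `p3_peak` is a hypothetical name, the epoch is assumed to start 100 ms before the first tone (as in the averaging procedure), and the second tone is assumed to begin 650 ms after the first (50 ms tone plus 600 ms ISI, taking the ISI as offset to onset).

```python
def p3_peak(erp_uv, stim_onset_ms, fs=256, epoch_start_ms=-100.0,
            window_ms=(350.0, 500.0)):
    """Find the P3 peak following one stimulus in an averaged ERP epoch.

    `stim_onset_ms` is measured from the onset of the first tone. Returns
    (amplitude_uv, latency_ms), with latency relative to that stimulus.
    """
    def to_index(t_ms):
        # Sample index of a time point given relative to first-tone onset.
        return int(round((t_ms - epoch_start_ms) * fs / 1000.0))

    lo = to_index(stim_onset_ms + window_ms[0])
    hi = to_index(stim_onset_ms + window_ms[1])
    segment = erp_uv[lo:hi + 1]
    if not segment:
        raise ValueError("P3 window lies outside the epoch")
    k = max(range(len(segment)), key=segment.__getitem__)  # first maximum
    latency_ms = (lo + k) * 1000.0 / fs + epoch_start_ms - stim_onset_ms
    return segment[k], latency_ms
```

Calling the helper once with `stim_onset_ms=0.0` and once with `stim_onset_ms=650.0` gives the first and second peak values that enter the ANOVA.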
Results
Behavioral experiments
Five groups of listeners were tested with five protocols, respectively (see Materials and Methods and Table 1). All groups performed a two-tone frequency discrimination task (with the exception of the implicit reference protocol; see below), in which participants were asked, “which of the two sequentially presented tones is higher?”.
Three groups were tested with three different protocols in which one interval, whose position was fixed throughout the assessment blocks (either first or second of the two stimuli in a trial), contained sufficient information for task performance. Discrimination thresholds in these three groups were similarly low. Figure 1a shows the dynamics of the first threshold assessment as a function of trial number in these groups, performing the reference first (reference tone presented in the first interval on each trial; solid blue line), reference second (reference tone presented in the second interval; solid red line), and implicit reference (a single tone, above or below the implicit 1000 Hz reference, presented in every trial; solid green line) protocols, respectively. Thresholds did not significantly differ between the groups (3.8 ± 1.4, 4.5 ± 1.1, and 4.9 ± 3% difference between the two tones in a trial, respectively), even when the reference was never explicitly presented (implicit reference).
Behavioral performance in two-tone frequency discrimination under five different protocols. a, Percentage frequency difference between tones as a function of trial number in the first assessment block under three protocols: reference first (blue curve), reference second (red curve), and implicit reference (green curve). Performance was measured using an adaptive staircase procedure (Levitt, 1971), which began with a 20% frequency difference under all protocols and was adaptively modified according to listener's performance. All three protocols yielded similarly low thresholds. b, Percentage frequency difference along the first assessment for two additional protocols: random (no repeated reference) and reference interleaved (reference first on odd trials and reference second on even trials) measured with the same adaptive paradigm. Note that the random and reference second trials yielded similarly high thresholds, whereas the reference first trials yielded low thresholds, almost as low as those measured separately. c, Thresholds (percentage frequency difference) in each of the three blocks performed consecutively on each of the five protocols: random (black bars), reference first (filled blue bars), reference second (filled red bars), implicit reference (green bars), and reference interleaved (open blue and red bars for reference first and reference second thresholds, respectively). Although all thresholds showed improvement, their ranks were retained across assessments. Cross-subject averages and SEMs are shown; n = 21 in each group, except for the reference interleaved group in which n = 25.
Two additional groups were tested with protocols in which there was no consistent interval that contained sufficient task-related information (Fig. 1b). One group (group 4) was tested with the random protocol (no repeated reference), which requires interstimulus comparisons between the two tones in a trial. As shown in Figure 1b (black line), thresholds measured for the random protocol were substantially higher (14 ± 2%) than those measured in the three protocols illustrated in Figure 1a (F(3,80) = 5.6; p < 0.003; post hoc Scheffé's contrast: p < 0.03 for reference first, reference second, and implicit reference compared with random; p = 0.99, NS, for comparing the first three protocols).
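The degrees of freedom in F(3,80) can be checked against the group sizes given in the Figure 1 caption (n = 21 in each of the four protocols compared here). As a brief sketch, assuming a standard one-way between-subjects ANOVA with k groups and N participants in total:

```latex
\mathrm{df}_{\text{between}} = k - 1 = 4 - 1 = 3,
\qquad
\mathrm{df}_{\text{within}} = N - k = 4 \times 21 - 4 = 80
```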
The other group (group 5) was tested with the reference interleaved protocol, in which the position of the reference tone was systematically switched between consecutive trials. Here successful performance requires either an interstimulus comparison or a trial-by-trial switch of the classified interval, which contains the nonreference tone. In this protocol, the thresholds obtained in reference first (odd) trials (Fig. 1b, dotted blue line) greatly differed from those obtained in reference second (even) trials (Fig. 1b, dotted red line). In the interleaved reference first trials, performance was just as good as in the group that performed the reference first protocol separately (4.4 ± 0.9% compared with 3.8 ± 1.4% for the separate measurement; F(1,43) = 1.5, p = 0.23) (compare with blue line in Fig. 1a). However, in the interleaved reference second trials, performance dramatically degraded compared with that of the group that performed the reference second protocol separately (15.5 ± 3% compared with 4.5 ± 1% for the separate measurement; F(1,43) = 8.8, p < 0.005) (compare with red line in Fig. 1a) and was as poor as in the random protocol (comparing reference second interleaved with random: F(1,44) = 0.75, p = 0.79). The asymmetry that was found between the odd and the even trials within the reference interleaved protocol cannot be explained in terms of the accuracy of the internal representation, because both trial types contained a repeated reference tone in a fixed position.
Administering these five protocols for three consecutive assessments induced some improvement in all of them but did not affect the magnitude of the benefit of the reference-containing protocols over the others (Fig. 1c). Thus, by the third block, thresholds in the random and reference second trials in the reference interleaved protocol were more than three times larger than the thresholds in the other protocols.
This more than threefold difference in thresholds, in favor of the protocols containing a reference at a fixed temporal position in the trial, was consistent across groups with various degrees of previous musical background and training with the specific task. Thus, when the group of naive participants (shown in Fig. 1) was subsequently divided according to individuals' previous musical experience, similar patterns showing large benefits for reference use were found among both subgroups: those with no musical background (Fig. 2a) and those with musical background (Fig. 2b).
The advantage of the reference-containing conditions over the random protocol in different participant groups. Performance on reference first (blue), reference second (red), and random (black) protocols is shown for three participant groups: a, naive to task with no musical training (n = 8 for reference first and reference second protocols; n = 10 for random protocol); b, naive to task with at least 1 year of musical training (n = 13 for reference first and reference second; n = 11 for random); and c, laboratory members who are highly trained with the task (n = 9; each subject did all three protocols). The data for the total group of participants shown in a and b was shown in Figure 1. Graphs show frequency difference as a function of trial number throughout the first assessment for the three protocols; overall performance is best for the expert group (c) and is worst for the naive with no musical training group (a). However, all groups show a similarly large benefit in thresholds for the reference-containing protocols compared with the no-reference (random) protocol (2.8, 6, and 3.7 for the threshold ratio between random and reference first protocols for the 3 subject groups, respectively).
We further assessed this effect among individuals with extensive previous experience with the specific task (laboratory members, group 6) (Table 1). As expected from previous results in the perceptual learning literature (Tremblay et al., 1997; Irvine et al., 2000; Delhommeau et al., 2002; Demany and Semal, 2002; Hawkey et al., 2004; Amitay et al., 2005; Wright and Fitzgerald, 2005; van Wassenhove and Nagarajan, 2007; Halliday et al., 2008), thresholds for these highly experienced individuals were substantially lower. However, they showed a similarly large impact of protocol as inexperienced participants (Fig. 2c), indicating that the benefit of the reference (explicit or implicit) does not reflect the first stage of a faster long-term perceptual learning mechanism.
Despite the dramatic effect of the protocol on discrimination performance, listeners were unaware of any cross-trial repetition and naturally unaware of its temporal consistency. When asked immediately after assessments, all participants reported that they listened to the first tone, retained it in memory, listened to the second tone, and then compared the two. Trained laboratory members (group 6), who all performed reference first, reference second, and random protocols, were also unaware of the protocol while performing the task and had a similar introspection regarding their strategy for task performance.
ERP experiments
Two ERP experiments were conducted. In the first experiment, measurements were taken while participants performed the reference first and reference second protocols in separate blocks. The protocols were similar to those used in the behavior-only experiments described above, except that constant (rather than adaptive) frequency differences (1–2%, in the range of the thresholds reached by most participants after a few hundred trials using the adaptive procedure) were used. The average ERP responses (at central electrode Cz) obtained under these two protocols are similar in terms of the N1, P2, and N2 components (Fig. 3, blue and red lines, for reference first and reference second, respectively). Each of these components was induced in response to each of the two tones in each trial; the N1 response to the second tone was somewhat smaller, in line with previous reports for successive tones with a temporal interval smaller than 1 s (Hari et al., 1982; Nahum et al., 2009).
ERP waves recorded while participants performed the two-tone frequency discrimination task with a repeated reference in a fixed position across trials. Participants performed the reference first (blue) and reference second (red) protocols in separate blocks, with 1–2% difference between the reference and nonreference tones. Most participants (9 of 10) were good listeners and reached ∼95% correct responses (averaged performance on reference first, 95 ± 4%; reference second, 93 ± 6%). One participant had chance-level performance on the first session, and her data were analyzed separately (see Fig. 4). The plotted waves are taken from the central electrode Cz and were averaged across participants (n = 9). The temporal location of the tones in a trial is marked by the black rectangles at the bottom of each plot. The relevant components are marked on the averaged waveforms. A clear P3 component can be seen after the second tone in the reference first protocol and after the first (nonreference) tone in the reference second protocol. The insets on the right show examples of two participants: ML had no previous experience with the task and no musical background, and LD had previous experience with the task and musical background (see Materials and Methods). Participants typically responded ∼400 ms after the second tone in both protocols.
The crucial difference between the ERP responses in the two protocols was in the position of the later P3 component. In the reference first trials (blue plot), in which the nonreference stimuli were presented in the second interval, a clear P3 was elicited at the end of the trial (average peak amplitude, 3.4 ± 0.5 μV), and no P3 was elicited after the first stimulus (average amplitude, 1.4 ± 0.2 μV). Conversely, in the reference second protocol (red plot), in which the informative (nonreference) stimulus was presented in the first interval and the second (reference) tone was not necessary for determining which tone is higher, no P3 was elicited after the second tone (average amplitude, 0.6 ± 0.2 μV); instead, a clear P3 response followed the first, nonreference tone (2.4 ± 0.5 μV). These differences were consistent across individuals (comparing the first P3 between protocols, p < 0.02; second P3, p < 0.0001). Thus, the P3 response was produced in each of the two protocols ∼350 ms after the presentation of the nonreference stimulus. Because this component is reliably, and even implicitly, produced when a perceptual decision is made (Donchin and Coles, 1988; Verleger, 1988; Nieuwenhuis et al., 2005), the shift of its position suggests that an implicit decision was made immediately after the presentation of the category determining stimulus, even when this stimulus was presented first (as in reference second). Thus, an implicit decision was made with no online interstimulus comparison, in contrast to listeners' subjective report and in contrast to the prevailing comparison assumption.
One participant was a poor listener and performed the task (the nonadaptive procedure used during ERP recordings) at chance level. This participant showed no P3 responses in either the reference first or the reference second blocks (Fig. 4, left, blue and red lines). However, after 2 weeks of intensive practice, performance improved, and ERPs measured during task performance were similar to those of good listeners (Fig. 4, right). These findings suggest that P3 is not produced when the task is not performed successfully. However, once successful performance is obtained, an implicit perceptual decision is made immediately after the informative, nonreference tone.
ERP waves recorded for the reference first and reference second protocols before (left) and after (right) 2 weeks of training. Participant YC performed around chance level in the initial ERP assessment of the reference-containing protocols (1–2% frequency difference; reference first, 59.5% correct; reference second, 41.5% correct). The initial ERP waves reflect this poor performance, showing no P3 component for reference first (blue curve) or reference second (red curve) trials. After training with the adaptive protocol, another ERP measurement session was administered. Performance reached 91 and 94% success for reference first and reference second blocks, respectively. Posttraining ERP measures (right) reflect this behavioral change. A clear P3 component is seen after the second (nonreference) tone in the reference first protocol (blue curve) and after the first but not the second tone for the reference second protocol (red curve).
The data presented thus far are consistent with a quick, implicit strategy change from an online comparison between the two tones to task-related classification of the tone presented in the informative, nonreference interval. To further test this hypothesis, we measured ERP responses under the reference interleaved protocol. Based on our previous behavioral results with the reference interleaved protocol (Fig. 1b), we chose to use larger frequency differences between the tones (4–8%) so that performance in both reference first and reference second trials would be adequate. To ensure that the use of a larger frequency difference does not impact the pattern of ERP responses in the separate protocols, we administered them as well. As shown in Figure 5a, ERP response patterns in the separate protocols were similar to those obtained with the smaller frequency differences (compare the right side of Fig. 5 with the plots in Fig. 3): a clear P3 was formed after the second tone in reference first and after the first tone in reference second blocks. However, the potentials induced in the reference interleaved protocol (Fig. 5b, dotted lines) were different and showed the same type of asymmetry between the reference first and reference second trials that was found in the behavioral experiment. A relatively clear P3 was induced at the end of the odd, reference first trials (3.1 ± 0.9 μV, similar in magnitude to the 3.5 ± 0.8 μV measured separately; effect of measurement: F(1,9) = 0.001, p = 0.98, NS). However, in the reference second trials, no clear P3 was induced after either the first (1.2 ± 0.4 μV) or the second (2.1 ± 0.4 μV; significant effect of protocol, F(1,9) = 6.6, p < 0.05; significant interaction of protocol × measurement, F(1,9) = 7.03, p < 0.03) interval. Only a marginal P3 was induced after the second interval (Fig. 5b, dotted red curve).
The lack of a clear P3 in reference second trials suggests that, in these trials, participants could not reach a clear perceptual decision at any consistent point in time with respect to stimulus presentation.
ERP measurements of the reference first and reference second performed separately (a) and in (reference) interleaved blocks (b). a, Averaged response traces recorded from participants (n = 10) performing reference first (blue trace; 96 ± 2% correct) and reference second (red trace; 87 ± 6% correct) protocols in separate blocks, using 4 and 8% frequency difference. A clear P3 component is evident after the second tone for reference first protocol and after the first tone for reference second protocol. b, Averaged response traces recorded from the same participants while performing the reference interleaved protocol (dotted blue trace, reference first trials, 81 ± 5% correct; dotted red trace, reference second trials, 84 ± 4% correct). The same frequency difference of 4 and 8% between the reference and nonreference tones was used. A clear P3 component was elicited at the end of the trial for the reference first trials but not after the first tone for the reference second trials.
Summary of results
We measured behavioral performance and ERP waves under several stimulation protocols of a two-tone frequency discrimination task that differed in the presence and location of a consistent reference tone. Behaviorally, we found that best performance (lowest thresholds) was achieved for reference first, reference second, and implicit reference protocols, as well as for reference first trials in the reference interleaved protocol. Thresholds were more than threefold higher (i.e., worse) in the random protocol (no repeated reference) and in the reference second trials of the reference interleaved protocol. The advantage of the reference-containing protocols was consistent across the degree of musical training and familiarity with the task.
ERP measurements, recorded while participants performed the reference first and reference second protocols, mainly differed in the location of the P3 component. P3 followed the second tone in reference first blocks and the first tone in reference second blocks. Thus, in both cases, it followed the informative, nonreference tone. In the reference interleaved protocol, it followed the second tone in reference first trials and was almost absent in reference second trials. This asymmetry is in line with the asymmetrical performance in these trial types when they are interleaved. Reference first performance is not affected, whereas reference second performance does not benefit from the repeated reference.
Discussion
The main conclusion of this study is that the classical approach, which uses fixed thresholds for characterizing human discrimination ability, should be modified. We found that seemingly minute changes in the assessment protocols induce dramatic changes to the measured thresholds. Specifically, discrimination thresholds are systematically dictated by the temporal cross-trial structure of the assessment protocol. Thus, when characterizing discrimination thresholds, the protocol used for assessment should be part of the description.
Some effects of the measurement protocol on performance have been reported previously (Morgan et al., 2000; Nachmias, 2006; Hairston and Nagarajan, 2007; Lapid et al., 2008). However, previous studies attributed the improvement in threshold to increased accuracy of the internal representation of the repeated reference (Durlach and Braida, 1969, 1988). We now find that discrimination thresholds may be quite poor, even when the reference is repeatedly presented on every trial and its temporal position is fully predictable, as in the reference second trials of the reference interleaved protocol. Thus, decreasing internal noise and providing sufficient information regarding the informative interval are not sufficient for attaining low thresholds. Moreover, impressively low thresholds were obtained even when the reference stimulus was never explicitly presented, as in the implicit reference protocol. Thus, decreasing internal noise by using a repeated reference is not necessary for attaining low thresholds. We propose that the protocols that afford impressively low discrimination thresholds are those that allow our perceptual system to switch the underlying mechanism from interstimulus comparison to classification, which is based on comparison of a single stimulus, typically in a fixed interval, to an internal reference. The dramatically improved performance stems from the greater accuracy of the classification mechanism.
Classification versus comparison mechanisms
The three protocols that consistently yielded lowest discrimination thresholds share two characteristics. First, successful performance (except for the first few trials) could be based on a single, nonreference tone (lower or higher than 1000 Hz). Second, this informative tone is presented at a fixed interval throughout the assessment block. These characteristics afford classification based on a single, fixed interval.
In these protocols, successful performance was correlated with the formation of a P3 response immediately (300–350 ms) after the presentation of the informative (nonreference) tone. A P3 response followed the first tone in the reference second protocol and the second tone in the reference first protocol. Several investigators suggested that the P3 component is produced when a clear task-related categorization (“high” or “low” in our case) is made (Donchin and Coles, 1988; Verleger, 1988; Mecklinger and Ullsperger, 1993; Nieuwenhuis et al., 2005). In our study, we could therefore use it as a marker for the point in time at which a decision had been made, even when listeners were unaware of such a decision. Our findings indicate that the implicit decision in the two-tone discrimination task had been made based on the single informative stimulus.
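The use of P3 as a timing marker can be illustrated with a minimal sketch that averages single-trial epochs and takes the mean amplitude in a window after a given tone onset. This is not the analysis pipeline of the study: the function name, the epoch layout (sample 0 is trial onset, one list of microvolt values per trial), and the absence of baseline correction are all assumptions made for illustration. Only the 300–350 ms latency range is taken from the text above.

```python
def mean_window_amplitude(epochs, sample_rate_hz, tone_onset_ms, window_ms=(300, 350)):
    """Mean amplitude of the trial-averaged ERP in a window after a tone onset.

    epochs         -- list of trials, each a list of voltages (hypothetical layout,
                      sample 0 = trial onset)
    sample_rate_hz -- sampling rate of the recording
    tone_onset_ms  -- onset of the tone of interest, relative to trial onset
    window_ms      -- latency window after tone onset (300-350 ms, as in the text)
    """
    n_trials = len(epochs)
    n_samples = len(epochs[0])

    # Average across trials, sample by sample, to obtain the ERP trace.
    erp = [sum(epoch[i] for epoch in epochs) / n_trials for i in range(n_samples)]

    # Convert the latency window from milliseconds to sample indices.
    start = int(round((tone_onset_ms + window_ms[0]) * sample_rate_hz / 1000))
    stop = int(round((tone_onset_ms + window_ms[1]) * sample_rate_hz / 1000))

    # Mean amplitude over the window (inclusive of both ends).
    window = erp[start:stop + 1]
    return sum(window) / len(window)
```

Calling the function once with the onset of the first tone and once with the onset of the second tone gives two amplitudes per protocol. Under the account above, the larger value would be expected after the nonreference tone.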
The difference in performance between the successful protocols and those inducing higher thresholds was already apparent early in the first assessment. We therefore propose that the switch to classification of stimuli that are evaluated as informative is quick. Moreover, it did not result from participants' explicit intention or even their awareness and may therefore be automatic. However, under some conditions, as in the random protocol, in which classification of a single stimulus is not sufficient for successful performance, a comparison between the representations of two recently presented stimuli is required. When classification of a single stimulus is sufficient but its position is not fixed, a mixed strategy may be used. Crucially, both protocols involve interstimulus comparisons, which require explicit decision and activation of working memory mechanisms to retain and compare the two recently presented stimuli. We suggest that the accuracy of the implicit classification processes is inherently better than the accuracy obtained under explicit comparison mechanisms.
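The contrast between the two strategies can be made concrete with a toy simulation. It is not a model proposed or fitted in this study. It assumes Gaussian sensory noise on every heard tone, a noise-free internal reference at 1000 Hz for classification, and an extra memory-noise term on the retained tone for comparison. The function name and both noise values are hypothetical and chosen only for illustration.

```python
import random


def simulate_accuracy(strategy, delta_percent, n_trials=5000,
                      sensory_sd_hz=10.0, memory_sd_hz=20.0, seed=0):
    """Fraction of correct high/low decisions for one decision strategy.

    strategy      -- "classify" (nonreference tone vs. internal reference) or
                     "compare" (nonreference tone vs. noisy retained reference)
    delta_percent -- frequency difference between the tones, in percent
    sensory_sd_hz, memory_sd_hz -- hypothetical noise levels, not measured values
    """
    if strategy not in ("classify", "compare"):
        raise ValueError("strategy must be 'classify' or 'compare'")

    rng = random.Random(seed)
    reference_hz = 1000.0
    n_correct = 0

    for _ in range(n_trials):
        # The nonreference tone is higher or lower than the reference at random.
        is_higher = rng.random() < 0.5
        sign = 1.0 if is_higher else -1.0
        nonreference_hz = reference_hz * (1.0 + sign * delta_percent / 100.0)
        heard_nonreference = nonreference_hz + rng.gauss(0.0, sensory_sd_hz)

        if strategy == "classify":
            # Assumption: the internal reference is stable and noise-free.
            decision = heard_nonreference - reference_hz
        else:
            # The reference heard on this trial carries sensory and memory noise.
            heard_reference = (reference_hz
                               + rng.gauss(0.0, sensory_sd_hz)
                               + rng.gauss(0.0, memory_sd_hz))
            decision = heard_nonreference - heard_reference

        if (decision > 0) == is_higher:
            n_correct += 1

    return n_correct / n_trials
```

With these assumed noise levels, `simulate_accuracy("classify", 1.0)` comes out well above `simulate_accuracy("compare", 1.0)`. The size of the gap depends entirely on the memory-noise value chosen, so the sketch illustrates the direction of the effect and should not be read as an estimate of its magnitude.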
There is ample evidence that these two types of comparisons are indeed implemented in the brain by different mechanisms. For example, Hernández et al. (1997) trained monkeys on a tactile discrimination task using a reference first protocol, until a low threshold was reached. After training, the monkeys were tested on the implicit reference protocol, in which they retained their trained level of accuracy, and on the random protocol, which they completely failed. This result suggests that, in the testing phase, the monkeys kept classifying the second stimulus with respect to the learned internal reference and ignored the first stimulus. Attaining successful performance in the random protocol, which requires the monkeys to retain-and-compare sequentially presented stimuli, needed intensive additional training. The authors concluded that monkeys (and probably humans, who also participated, but were not trained, in this study) have difficulties using retain-and-compare mechanisms and use classification when possible. The similarity in the impact of the assessment protocol on the relative thresholds of highly trained monkeys, highly trained humans, and naive human performers suggests that similar mechanisms underlie classification with respect to a recently formed reference and to a reference retrieved from long-term memory.
A similar separation between the mechanisms underlying seemingly similar tasks is seen when tasks that require indicating whether a stimulus was presented recently are compared with tasks that require indicating when it was presented. The first task requires only implicit automatic mechanisms, whereas the latter requires explicit working memory mechanisms. For example, when listeners are asked to perform an n-back task, i.e., report whether an element is a repetition of the element presented n steps earlier in the sequence, they have no difficulty reporting that the element was recently presented, supported by the implicit memory classification mechanism. However, they have great difficulty reporting its exact serial position, supported by explicit working memory (Gray et al., 2003). Similarly, when monkeys are asked to retrieve the serial order of stimulus presentation, their default behavior relies on the highly trained rather than on the most recent presentation order (Orlov et al., 2000).
Additional evidence for the separation of these two types of mechanisms comes from studies of dyslexia. Although activating explicit working memory mechanisms is typically more demanding, some individuals have specific difficulties in gaining benefits from the implicit mechanisms using cross-trial repetitions. These individuals have difficulties in switching from comparison to classification even when the protocols afford such a switch for the general population. Recent findings (Ahissar et al., 2006; Banai and Ahissar, 2006; Ahissar, 2007) indicate that such is the case for many individuals with reading disabilities, who perform the random protocols adequately, whereas their performance in the reference-containing protocols is impaired compared with that of controls. These findings suggest that they have adequate interstimulus comparison mechanisms but difficulties in the mechanisms underlying the formation of internal references and automatic stimulus classification. Therefore, a broad range of tasks remains effortful for them.
Underlying neuronal mechanisms
The formation of the P3 component that we recorded provides a useful indication that an implicit decision had been made. However, it is not very informative with respect to the exact timing of this decision, its neuronal origin, or its underlying mechanism. The accumulated findings suggest that P3 reflects the response of the norepinephrine system to the outcome of internal decision-making processes and the consequent effects of noradrenergic potentiation of information processing (Nieuwenhuis et al., 2005). As such, it is probably evoked by a widespread network of activity, is an outcome of a variety of decision-making mechanisms, and can only provide an upper boundary for when a decision had been made.
However, single-unit recordings in monkeys using similar tasks shed light on the neural origins of the response. A series of seminal studies by Romo and his colleagues directly assessed the neural mechanisms underlying performance in these two protocols. The monkeys were trained to perform a two-vibration tactile discrimination task, in which they were asked which of two sequentially administered vibrations had a higher frequency (Hernández et al., 2002; Romo et al., 2002). Neuronal activity was recorded in a series of cortical areas while monkeys performed an equivalent of our random protocol. Whereas S1 neurons responded only during the presentation of the stimuli, neurons in a series of higher-level areas, including the prefrontal cortex and ventral and medial premotor cortices, were active during the delay interval. This observation suggests that, in retain-and-compare demanding conditions, the magnitude of the first stimulus is actively retained with delayed, poststimulus activity. Delayed activity is not an automatic property of these areas (Miller et al., 1996), and its accurate retention may be difficult (Seung, 1996).
Indeed, when monkeys performed the reference first protocol, they ceased to produce delayed activity but exhibited excellent discrimination thresholds (Romo et al., 2002). Similar results were recently obtained when monkeys performed an auditory equivalent of this task, in which they were required to compare the rate of two sequentially presented auditory pulse trains. No delayed activity was found in primary auditory cortex A1 (Lemus et al., 2009a), whereas neurons in ventral premotor cortex showed clear delayed activity (Lemus et al., 2009b). Thus, although activity in A1 is related to both the behavioral condition (Fritz et al., 2003; Otazu et al., 2009) and the protocol of assessment (Ulanovsky et al., 2003), current data suggest that A1 (or primary somatosensory cortex S1) does not implement interstimulus comparisons (Salinas et al., 2000; Romo and Salinas, 2003; Yoon et al., 2006).
Pinpointing the neural mechanisms underlying classification is difficult, perhaps because it is the automatic or default mode of our perceptual system. We propose that the automatic, implicit mechanisms of classification benefit from the sharpening of the reference by adaptation mechanisms, whose impact is retained through intervening trials (Li et al., 1993; Ulanovsky et al., 2004).
Conclusion
Our findings suggest that our impressively low discrimination thresholds do not reflect refined explicit retain-and-compare mechanisms, as assumed by most behavioral studies, but rather an effective implicit strategy shift from discrimination to classification, based on quickly established internal references. The similarity between the impact of the assessment protocol on naive and trained (both human and monkey) performers suggests that similar mechanisms underlie classification, whether the reference was learned long ago or just recently. The impressive effectiveness of this strategy shift probably stems from the ability of higher-level mechanisms to quickly detect stimulus repetitions at lower levels, but only those repetitions that follow the specific temporal patterns that these levels are predisposed to detect. This shift may serve as an automatic mechanism for releasing the specialized high-level working memory networks to engage in novel comparisons (for such a distinction, see Zatorre et al., 1998), whereas the quickly learned classifications are “down-sourced” and more efficiently implemented by lower-level cortical networks (Schneider and Shiffrin, 1977; Ohl and Scheich, 2005).
Footnotes
- The research was supported by the Israel Science Foundation, a grant from the Israeli Institute of Psychobiology, and a subcontract from the National Institutes of Health (NIH Grant 2R01DC004855). We thank Ehud Ahissar, Laurent Demany, Shaul Hochstein, and Yonatan Loewenstein for insightful comments and discussions and Mor Levi and Nitzan Tal for assistance in collecting the data.
- Correspondence should be addressed to Prof. Merav Ahissar, Department of Psychology and Interdisciplinary Center for Neural Computation, Hebrew University of Jerusalem, Jerusalem, Israel 91905. msmerava{at}mscc.huji.ac.il