Abstract
Perceiving an object as salient from its surround often requires a preceding process of grouping the object and background elements as perceptual wholes. In humans, motion homogeneity provides a strong cue for grouping, yet it is unknown to what extent this occurs in nonprimate species. To explore this question, we studied the effects of visual motion homogeneity in barn owls of both genders, at the behavioral as well as the neural level. Our data show that the coherency of the background motion modulates the perceived saliency of the target object. An object moving in an odd direction relative to other objects attracted more attention when the other objects moved homogeneously compared with when they moved in a variety of directions. A possible neural correlate of this effect may arise in the population activity of the intermediate/deep layers of the optic tectum. In these layers, the neural responses to a moving element in the receptive field were suppressed when additional elements moved in the surround. However, when the surrounding elements all moved in one direction (homogeneously moving), they induced less suppression of the response compared with nonhomogeneously moving elements. Moreover, neural responses were more sensitive to the homogeneity of the background motion than to motion-direction contrasts between the receptive field and the surround. The findings suggest similar principles of saliency-by-motion in an avian species as in humans and show a locus in the optic tectum where the underlying neural circuitry may exist.
SIGNIFICANCE STATEMENT A critical task of the visual system is to arrange incoming visual information into a meaningful scene of objects and background. In humans, elements that move homogeneously are grouped perceptually to form a categorical whole object. We discovered a similar principle in the barn owl's visual system, whereby the homogeneity of the motion of elements in the scene allows perceptually distinguishing an object from its surround. The novel findings of these visual effects in an avian species, which lacks neocortical structure, suggest that our basic visual perception shares more universal principles across species than presently thought, and shed light on possible brain mechanisms for perceptual grouping.
Introduction
For humans, an object that is different from a uniform surrounding, such as a vertical bar embedded in an array of horizontal bars, tends to perceptually “pop out” (Treisman and Gelade, 1980; Wolfe and Horowitz, 2004). Perceiving an object as popping out relative to its neighbors often requires a preceding process of grouping in which the object and the background are grouped into perceptual wholes (Duncan and Humphreys, 1989; Kingstone and Bischof, 1999). In the early 20th century, Gestalt theorists attempted to explain factors that govern this organization, defining a set of laws of perceptual organization specifying how we construct simple individual elements into global wholes (Spillman and Ehrenstein, 2003).
Neurophysiological studies have sought the neural correlates of Gestalt-like figure-ground segregation mostly in primates (Lamme, 1995; Zipser et al., 1996; Zhou et al., 2000; Lee et al., 2002; Qiu and von der Heydt, 2005; Burrows and Moore, 2009), showing neurons that process contextual stimuli, which appear outside the classic receptive field (RF) of the cell but influence its response to a stimulus inside the RF. However, we hypothesize that the Gestalt principles derived from humans are a manifestation of visual mechanisms that arose early in evolution as a means of breaking camouflage. Therefore, we expect to find similar principles in a wide range of animal species, beyond primates and mammals.
To explore this hypothesis, we studied an avian species, the barn owl, and focused on motion pop-out stimuli both at the perceptual and neuronal levels. It has been shown that this animal expresses pop-out perception for orientation and luminance stimuli (Orlowski et al., 2015, 2018). At the physiological level, it has been shown that tectal neurons in the barn owl respond more strongly to an object moving in the RF when objects outside the RF move in an opposite direction compared with when the surrounding objects move in the same direction (Zahar et al., 2012). This is consistent with the proposed role of the optic tectum (OT) in saliency mapping (Mysore and Knudsen, 2011; Gutfreund, 2012). However, such a modulation from the surround does not necessarily imply perceptual grouping but can arise from simple motion-contrast sensitivity between the RF and its surround (Hegdé and Felleman, 2003). The pop-out effect in its classical interpretation involves global perception of a homogeneous surround as a separate whole (Duncan and Humphreys, 1989; Hochstein and Ahissar, 2002). Thus, to address whether barn owls can use the homogeneity of motion for saliency mapping, it is necessary to use a paradigm that distinguishes between motion contrast and homogeneity.
To achieve this, we designed a paradigm that compares responses between two cases: one in which the background elements move homogeneously, but in a direction that contrasts less with the direction of the target, and one in which the background elements move nonhomogeneously, in directions that contrast more with the target's direction. We first demonstrate in behavioral experiments that an object moving relative to homogeneously moving background elements is perceived by barn owls as being more salient compared with an object moving relative to a nonhomogeneous motion, consistent with perceptual grouping for figure-ground segregation. In complementary neurophysiological experiments, we found that tectal neurons in the intermediate/deep layers similarly tend to respond preferentially to targets embedded in homogeneous background motion compared with nonhomogeneous motion. Importantly, neural sensitivity to contrast and homogeneity matched the behavioral sensitivity of barn owls, suggesting a neural correlate of perceptual grouping by motion.
Materials and Methods
Animals.
Seven adult barn owls (Tyto alba) were used in this study: 5 owls in electrophysiological experiments and 2 owls in behavioral experiments. The owls were hatched and raised in captivity and kept in aviaries equipped with perching spots and nesting boxes. All experiments were performed in Haifa. All procedures were in accordance with the guidelines and approved by the Technion Institutional Animal Care and Use Committee. Surgical procedures were performed under isoflurane anesthesia; and in all recording sessions, the animals were sedated with a mixture of oxygen and nitrous oxide. During recording sessions, no painful procedures were performed.
Surgical procedures.
Owls were prepared for repeated electrophysiological experiments in a single surgical procedure. First, the owl was anesthetized with isoflurane (2%) and nitrous oxide in oxygen (4:5). Lidocaine (lidocaine HCl 2% and epinephrine) was injected locally at the incision site. A craniotomy of 1 cm diameter was performed 0.6 cm lateral to the midline and 1.7 cm anterior from the anterior tip of attachment of the dorsal neck muscles to the skull. Then a recording chamber was cemented to the skull (Unifast dental cement mixed with cyanoacrylate adhesive) over the craniotomy. The chamber was filled with chloramphenicol ointment (5%) and sealed with a cap. After surgery, the animal was left to recover overnight in an individual cage and then released back to its home cage.
Electrophysiological recordings.
Before each electrophysiological session, the owl was moved to an individual cage without food overnight. At the beginning of each electrophysiological session, the owl was anesthetized briefly with isoflurane (2%) and nitrous oxide in oxygen (4:5). Once anesthetized, the owl was wrapped in a soft leather jacket and positioned in a stereotaxic apparatus inside a double-walled, sound-attenuating booth (internal size 2.05 × 1.7 × 1.95 m). The head was bolted to the apparatus after aligning the visual axis using retinal landmarks (Reches and Gutfreund, 2008). After the bird was fixed, the isoflurane was removed and the bird was maintained on a steady mixture of nitrous oxide and oxygen (4:5). Small weights were attached to the feathers on the owl's eyelids to maintain eye opening throughout the recording session. The nictitating membrane was not restrained, allowing for spontaneous moistening of the cornea. The head chamber was opened, and a tungsten, parylene-coated, or glass-coated microelectrode (0.5–1.5 MΩ; Alpha Omega) was driven using a motorized manipulator. Because eye movements in barn owls are limited to a range <±2° (du Lac and Knudsen, 1990), we did not immobilize or control eye movements. The recorded electrical signal was amplified, digitized, and filtered (313–5000 Hz) using the AlphaLab SnR system (Alpha Omega). In each experiment, a threshold was set online to select the larger units in the recording sites and isolate action potentials from a small cluster of neurons (multiunit recording). At the end of each recording session, the recording chamber was treated with chloramphenicol ointment (5%) and closed. The owl was then returned to its home flying cage.
Identification of the location of the recording site was based on stereotaxic coordinates and on the expected physiological properties: the OT was recognized by characteristic bursting activity and spatially restricted visual and auditory RFs. Position within the OT was determined based on the location of the visual RF. Recordings were taken from all layers of the OT. The intermediate layers of the OT were located beneath the bursty layers and identified based on a transition from bursty activity to regular firing (Knudsen, 1982; Netser et al., 2010). The electrode was advanced in small steps to search for sites with clear units and visual responses. Recording sites were separated by at least 300 μm. All recording sites were from the anterior part of the OT having visual RFs between left and right 20° and up and down 20° relative to the center of the visual field.
Visual stimuli.
The visual stimuli were computed in MATLAB using the Psych Toolbox extension (Brainard, 1997) and either displayed on a computer screen (17 inch LCD screen, at a refresh rate of 76 Hz) for the behavioral experiments or projected (refresh rate 72 Hz, XD400U; Mitsubishi) on a calibrated screen inside the sound-attenuating chamber for the electrophysiological experiments (screen size 170 cm × 140 cm, 1.5 m away from the owl). The projector was positioned outside the chamber, projecting the image through a double-paned glass window. Visual stimuli were dark dots presented on a gray background (luminance of background screen was ∼20 cd/m² and luminance of dots was ∼8 cd/m²). In each recording site, we first estimated the visual RF by moving a visual stimulus on the screen (a dark dot ∼1° in diameter) and listening to the neural discharge. The point that elicited the strongest neural discharge was chosen as the center of the RF. Typical width of RF in these layers was estimated in a previous study to be ∼6°-10° (Zahar et al., 2012). After estimating the RF center, a test paradigm was applied in which a dark dot (∼1° in diameter) was positioned at the center of the RF (the target). The dot was embedded in an array of identical dots (the distractors) equally spaced at 10° intervals (see Figs. 1, insets, 2, insets). In several experiments, the dots in the rectangle surrounding the target dot were omitted, thus increasing the distance between the target dot and its surrounding dots (see Figs. 6, insets, 7, insets).
In each trial, the initial frame of the dot array was displayed static for 1.5 s, and then the target, sometimes together with the background elements, moved to the right for 600 ms on a straight horizontal line for a distance of 2.9°. At the end of the movement, the last frame of motion was maintained static on the screen for 500 ms until the initiation of the next trial. In a previous study, no average difference was observed in the response properties of tectal neurons between leftward and rightward motions, and only weak modulations by direction were observed (Zahar et al., 2012). Therefore, to reduce the number of experimental trials, in this study we only studied responses to a target moving rightward. The target dot (in the RF) was embedded in one of six possible contexts relative to its background dots: (1) the singleton condition in which the target moved rightward while the distractors were static; (2) the uniform condition in which the target and distractors moved coherently rightward; (3) the offset 180° condition in which the distractors moved coherently in the direction opposite to the target; (4) the offset 90° condition in which the distractors moved coherently upward (orthogonal to the target's rightward movement); (5) the offset 270° condition in which the distractors moved coherently downward (orthogonal to the target's motion); and (6) the mixed condition in which the target moved rightward while each of the distractors moved arbitrarily in one of the three directions, leftward, upward, or downward (for a graphical illustration of the six conditions, see Fig. 2, inset). Offset 135° and offset −135° conditions were also displayed (see Fig. 6, inset). In each test, conditions were interleaved randomly and repeated 15 times. In the mixed conditions, the dots moving leftward, upward, or downward were randomly reallocated in every trial.
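The six contexts and the random interleaving described above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code. The function names, the condition labels, and the encoding of directions in degrees (0° = rightward) are hypothetical, and drawing each mixed-condition direction independently with equal probability is an assumption; the text states only that the directions were allocated arbitrarily and reallocated on every trial.

```python
import random

# Direction of motion in degrees; 0 = rightward (the target's direction).
# Labels below are illustrative names, not identifiers from the study.
COHERENT_DIRECTIONS = {
    "uniform": 0,       # distractors move with the target
    "offset_90": 90,    # distractors move upward
    "offset_180": 180,  # distractors move opposite to the target
    "offset_270": 270,  # distractors move downward
}
CONDITIONS = ["singleton", "uniform", "offset_180",
              "offset_90", "offset_270", "mixed"]


def distractor_directions(condition, n_distractors, rng=None):
    """Return one direction per distractor; None marks a static dot."""
    rng = rng or random.Random()
    if condition == "singleton":
        return [None] * n_distractors
    if condition in COHERENT_DIRECTIONS:
        return [COHERENT_DIRECTIONS[condition]] * n_distractors
    if condition == "mixed":
        # Assumption: independent, equiprobable draw per dot, redrawn per trial.
        return [rng.choice([90, 180, 270]) for _ in range(n_distractors)]
    raise ValueError(f"unknown condition: {condition}")


def trial_sequence(conditions, repetitions=15, rng=None):
    """Randomly interleave conditions, each repeated `repetitions` times."""
    rng = rng or random.Random()
    trials = [c for c in conditions for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials
```

Calling `trial_sequence(CONDITIONS)` yields 90 trials (six conditions, 15 repetitions each) in shuffled order; `distractor_directions` would then be called once per trial so that the mixed pattern differs between trials.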
Behavioral experiments.
Two hand-raised barn owls (Owls DO and DK; females of ∼1 year of age) were used to measure the behavioral responses. For the experiment, the owl was placed on a perch in a darkened room with a computer screen that was facing upward in a pecking range below the owl. To track the owl's gaze, a lightweight wireless video camera (Owl-Cam, 30 frames per second, ∼60° view angle) was mounted on the owl's head. The camera was self-assembled from a miniature micro-camera combined with a video broadcasting chip (900 MHz) and a rechargeable lithium-polymer battery (weight together with mounting unit was ∼10.5 g). The Owl-Cam was attached to the head using a 3D printed attachment unit glued to the skull with dental cement. The unit was designed to maintain a fixed and reproducible relationship between the Owl-Cam and the head. Because barn owls lack substantial eye movements, a head-fixed camera can provide a reliable estimation of the owl's gaze position (Ohayon et al., 2008). Initially, the position of the gaze center (functional fovea) was calibrated for each owl by allowing the owl to fixate on multiple targets on the screen. The average position of targets on the video frame results in a single position corresponding to the point of gaze (Harmening et al., 2011; Hazan et al., 2015). Owls were pretrained in a previous project (Lev-Ari and Gutfreund, 2018) to initiate a trial by fixating on a red dot on the center of the screen, waiting until it disappears, and then searching for Gabor patches on the screen. Thus, the owls were well trained to initiate the trials and search the screen for rewarded targets, but they had never been trained for the specific task and stimulus at hand. In the current task, after fixation was achieved, the red fixation dot disappeared followed by one of the four stimulus conditions: singleton, offset 90°, offset 180°, and mixed. All four conditions were interleaved randomly. 
In each test, the odd target was located randomly at one of the four positions corresponding to 4 cm above, 4 cm below, 4 cm to the right, and 4 cm to the left of the screen center (see Fig. 1A, top; see also Movies 1, 2). Owls typically searched the screen from a distance of ∼25 cm. As in the electrophysiological experiments, dots were moved 1.3 cm on the screen (corresponding to a view angle of ∼3°) for a duration of 600 ms. However, unlike in the electrophysiological experiments, the movement was repeated continuously on the screen for up to 30 s (once a dot reached the end of the path, it disappeared and reappeared immediately at the motion starting point; see Movies 1, 2). Rewards (small chunks of chicken meat) were fed manually to the owl by the experimenter sitting behind a curtain. Food was given with forceps a few centimeters above the screen. The owls were rewarded approximately every second trial for initiating the trials and seeking the screen, but reward was not associated with a specific target. Owls performed ∼20–40 trials a day. We tested the owls on consecutive days until they reached 35 repetitions of each condition (a total of 140 repetitions per owl for all four stimuli).
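The correspondence between on-screen distance and view angle quoted above can be checked with a short Python sketch. It assumes the standard visual-angle formula for a segment centered on, and perpendicular to, the line of sight; the function names are hypothetical, and the actual viewing geometry of a freely perching owl is only approximated by this assumption.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by a segment of length size_cm viewed from
    distance_cm, assuming it is centered on and perpendicular to the
    line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_for_angle_cm(angle_deg, distance_cm):
    """Inverse of visual_angle_deg: segment length for a given angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)
```

Under these assumptions, `visual_angle_deg(1.3, 25.0)` evaluates to roughly 2.98°, in line with the ∼3° stated for the behavioral display.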
Data analysis and statistical testing.
Owl-Cam data were analyzed using a custom MATLAB GUI. Typically, owl search behavior consisted of stable fixation periods for 0.5–4 s durations terminated by rapid head saccades to a new fixation point (Movies 1, 2) (Hazan et al., 2015). To estimate the fixation target, we defined a circular area with a radius of 50 pixels around the center of gaze as estimated from the calibration process described above. This corresponds to a viewing angle of ∼8°. Any dot within this area maintained for 10 consecutive frames was considered to be a target of gaze. The relatively large window was chosen to account for the relatively large area centralis of barn owls (Wathey and Pettigrew, 1989) and to include errors that may arise from differences in distance and gaze angle to screen plane. In each trial, the time from stimulus onset to the first gaze on the target (search time [ST]) and number of head saccades to reach the target were registered. In addition, for control, the number of head saccades and time to the first gaze on the dot opposite the target were registered (for example, if the target was above the center, the control dot was below the center, etc.). A trial was considered a success if, during the 30 s window of stimulus presentation, the owl fixated on the target. Differences in success rates between conditions were tested using the Mann–Whitney test (nonpaired). Differences between success rates to target versus control were tested using the Wilcoxon test (paired). In the ST analysis, we discarded trials with STs longer than 3 times the SD of that test. This led to the exclusion of 5% and 6.5% of the trials for Owls DO and DK, respectively. STs were tested using one-way ANOVA with post hoc Tukey test.
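The gaze-on-target criterion described above (a dot held within 50 pixels of the gaze center for 10 consecutive frames) can be sketched in Python as follows. This is an illustration, not the custom MATLAB GUI used in the study. The function names and the input layout (one dot position per video frame, in frame pixel coordinates) are hypothetical, and timing the search from stimulus onset to the first frame of the qualifying run, rather than to its last frame, is an assumption.

```python
import math


def first_gaze_onset(dot_positions, gaze_center, radius_px=50, min_frames=10):
    """Return the index of the first frame of the first run of at least
    min_frames consecutive frames in which the dot lies within radius_px
    of the gaze center, or None if no such run exists.

    dot_positions: per-frame (x, y) of the dot in the video frame,
    or None when the dot is not visible in that frame.
    """
    run_start = None
    for i, pos in enumerate(dot_positions):
        inside = pos is not None and math.dist(pos, gaze_center) <= radius_px
        if inside:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_frames:
                return run_start
        else:
            run_start = None  # run broken; start over
    return None


def search_time_s(onset_frame, fps=30.0):
    """Convert a frame index (counted from stimulus onset) to seconds.
    Assumption: ST is timed to the start of the qualifying run."""
    return None if onset_frame is None else onset_frame / fps
```

With the 30 frames per second of the Owl-Cam, the 10-frame criterion corresponds to about a third of a second of stable gaze.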
Unit responses to a visual stimulus were calculated as the number of spikes in a given time window after stimulus onset minus the number of spikes during the same period of time immediately before stimulus onset (baseline activity). The duration of the time window for spike count was 600 ms, starting from the onset of motion. To observe the time course of the response, we generated poststimulus time histograms (PSTHs) with 15 ms time bins. PSTHs were normalized to the maximum value achieved in each experiment and averaged across the population. For graphical display, curves were smoothed (5-point sliding average). The SEMs were depicted as the width of the PSTH curves. Differences between population responses were analyzed using one-way ANOVA and post hoc Tukey tests. To quantify the contextual modulation, we calculated the modulation index (MI) as follows: MI = (Rcontext1 − Rcontext2)/(Rcontext1 + Rcontext2), where Rcontext1 is the response to the target motion in one surrounding context, and Rcontext2 is the response to the same target motion in a different context. Positive values of this index indicate a preference for context 1 over context 2. Distribution of MIs was tested using a binomial sign test.
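The response measure, the PSTH binning, and the MI defined above can be written out as a minimal Python sketch. The function names and the convention that spike times are given in milliseconds relative to motion onset are hypothetical; the window (600 ms) and bin width (15 ms) follow the text. Returning NaN when the two responses sum to zero is a choice made here for illustration, not something stated in the source.

```python
def baseline_subtracted_count(spike_times_ms, window_ms=600.0):
    """Spikes in [0, window) after motion onset minus spikes in the
    equally long window immediately before onset."""
    evoked = sum(1 for t in spike_times_ms if 0 <= t < window_ms)
    baseline = sum(1 for t in spike_times_ms if -window_ms <= t < 0)
    return evoked - baseline


def psth(trials, start_ms, stop_ms, bin_ms=15.0):
    """Mean spike count per bin across trials (each trial is a list of
    spike times in ms relative to motion onset)."""
    n_bins = int((stop_ms - start_ms) // bin_ms)
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            idx = int((t - start_ms) // bin_ms)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    return [c / len(trials) for c in counts]


def modulation_index(r_context1, r_context2):
    """MI = (R1 - R2) / (R1 + R2); positive values favor context 1."""
    denom = r_context1 + r_context2
    if denom == 0:
        return float("nan")  # undefined; handling chosen for illustration
    return (r_context1 - r_context2) / denom
```

Because the responses are baseline subtracted, they can in principle be negative, in which case the index is no longer bounded between −1 and 1; the sketch does not guard against that case.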
Results
Behavioral experiments
Behavioral measurements were conducted in owls spontaneously viewing displays of dot arrays on a computer screen. In all conditions, a single rightward moving dot served as the target that could appear in one of four locations (see Materials and Methods; Fig. 1A). Following trial initiation (fixation of a red dot), the barn owls typically scanned the computer screen and surrounding room with abrupt head saccades (Movies 1, 2). If, during the 30 s after the trial initiation, the target dot appeared within the gaze point window (see Materials and Methods), the trial was considered to be a successful trial in which the time (ST) and number of head saccades to gaze-on-target were registered. In the singleton condition, when only the target was moving while the rest of the distractors were stationary, both owls acquired the target in all trials (success rate of 1; Fig. 1B). Success rates dropped for the offset 180° condition (0.78 and 0.93 in Owls DO and DK, respectively) and the offset 90° condition (0.76 and 0.73 in Owls DO and DK, respectively), and further decreased for the mixed conditions to 0.31 and 0.4 in Owls DO and DK, respectively (Fig. 1B). The success rate in the mixed condition was significantly smaller than in the other three conditions in both owls (Mann–Whitney test, Z = −5.870, p < 0.001; Z = −5.249, p < 0.001; singleton vs mixed for Owls DO and DK, respectively; Z = −3.890, p < 0.001; Z = −4.664, p < 0.001; offset 180° vs mixed for Owls DO and DK, respectively; Z = −3.633, p < 0.001; Z = −2.696, p = 0.007; offset 90° vs mixed for Owls DO and DK, respectively). In each of the four conditions, the success rate for fixating on a control dot (the dot opposite the target) was also measured (Fig. 1B, white columns). 
Success rates for reaching control dots were significantly lower in all conditions than those for reaching the target (Wilcoxon signed-rank test; Z = −5.657, p < 0.001; Z = −4.600, p < 0.001 for Owl DO and Owl DK, respectively (singleton); Z = −4.264, p < 0.001; Z = −5.568, p < 0.001, offset 180°; Z = −5.477, p < 0.001; Z = −4.600, p < 0.001, offset 90°; Z = −1.897, p = 0.029; Z = −3.051, p = 0.001, mixed; for Owl DO and Owl DK, respectively). These data suggest that the target in the mixed condition is perceived to be less salient to the owls compared with the homogeneous conditions of both 180° and 90° offsets. However, even in the mixed conditions, the target attracts more gazing compared with the control targets (Fig. 1B, blue columns compared with corresponding white columns).
The perceived saliency of the target is expected to be reflected also in the speed with which the target is fixated. Therefore, we analyzed STs. Figure 1C shows the average ST for all four stimulus conditions. STs varied significantly in both owls, with the shortest average STs for the singleton conditions and the longest average STs for the mixed conditions (ANOVA, N = 33, 26, 25, 11; F(3,91) = 6.552, p < 0.001 for Owl DO; and N = 32, 31, 24, 14, F(3,96) = 10.427, p < 0.001 for Owl DK). The STs in the mixed conditions were significantly longer in Owl DO than the other conditions and in Owl DK significantly longer than the singleton and offset 180° conditions (post hoc Tukey test, p < 0.001, p = 0.037, p = 0.004 for Owl DO and p < 0.001, p = 0.003, p = 0.38 for Owl DK).
Figure 1D shows the cumulative distributions of the number of fixations (head saccades) to reach the target. For both owls, the curves in the singleton conditions (black curves) were shifted leftward and upward compared with the rest of the curves. In >70% of the trials, the singleton target was reached in fewer than five saccades (in both owls). On the other hand, in the mixed conditions, five saccades to target were observed in <20% of the trials. The curves representing the offset 180° and offset 90° conditions were in between the singleton and mixed conditions, indicating fewer saccades performed before reaching the target in the homogeneous conditions compared with the mixed conditions. Control curves (number of saccades to reach the control dot) in all cases were below the mixed condition curves (Fig. 1D, dashed lines). In summary, the results show that oddly moving dots were perceived to be more salient compared with dots moving coherently with other dots (control dots). However, the target dot attracted gaze faster, with fewer preceding saccades and more often when the background elements moved homogeneously compared with when they moved incoherently.
Electrophysiological experiments
In the first experiment, the responses of 99 multiunit recording sites from two owls were analyzed: 46 from the superficial bursty layers of the OT and 53 from the intermediate/deep layers (Ramon y Cajal layers 10–14) (Knudsen, 1982; Netser et al., 2010). In each recording site, the position of the RF was estimated, and the target dot was then positioned at approximately the center of the RF. We tested six conditions randomly interleaved across trials: the four conditions tested in the behavioral experiments, as well as a uniform condition and an offset 270° condition (Fig. 2, insets). In this study, we analyzed population data and report modulations at the population level. We therefore do not expect that restricting data to single-unit recordings would have qualitatively changed the results. Notably, in previous studies where we isolated single units in the OT and compared them with multiunit recordings, the population results did not differ qualitatively (Reches and Gutfreund, 2008; Zahar et al., 2012; Wasmuht et al., 2017).
An example of the responses of a single recording site from the intermediate/deep layers is shown in Figure 2. In all conditions, the neurons responded above baseline to the motion of the dot within the RF. However, the response was highly modulated by the background context. Maximal responses were achieved in the singleton (50.3 spikes per stimulus) and offset 180° (39.3 spikes per stimulus) conditions (Fig. 2A and Fig. 2C, respectively). The response in the uniform condition was considerably smaller (13.8 spikes per stimulus; Fig. 2B). The responses in the upward (25.66 spikes per stimulus) and downward (28.06 spikes per stimulus) background motion (Fig. 2D and Fig. 2E, respectively) were smaller compared with the offset 180° condition (t test, n = 15, p < 0.001 for upward and p = 0.015 for downward motion); however, they were larger than the uniform condition (t test, n = 15, p < 0.001 for upward and p < 0.001 for downward motion). Thus, this site responded mostly to motion contrast of 180°, less to 90°, and the least to zero contrast. However, the response in the mixed condition (17.93 spikes per stimulus) was smaller compared with the coherent upward and downward motions (compare Fig. 2F with Fig. 2D,E; t test, n = 15, p = 0.017 for upward and p = 0.0105 for downward motion). This single site example was typical of the population pattern shown below.
To compare the responses at the population level, we averaged the PSTHs from all recording sites in the intermediate/deep layers (n = 53). First, we compared the population responses in the singleton, uniform, offset 180°, offset 90°, and offset 270° conditions (Fig. 3A). The singleton condition gave rise to the maximal average response followed by a lower, albeit still prominent, average response in the offset 180° condition. The uniform motion resulted in a dramatic reduction in average response strength (∼75% attenuation of peak response from the offset 180° context). This agrees with previous findings that tectal neurons robustly prefer opposing motion over uniform motion (Frost and Nakayama, 1983; Zahar et al., 2012). The average PSTHs to the target motion embedded in a coherent upward or downward motion array were positioned in between the responses for opposite and uniform motion conditions (ANOVA, N = 53, F(3,208) = 38, p < 0.001; post hoc Tukey test, p < 0.001). Next, we compared the population responses in the mixed condition with the offset 90° and offset 270° conditions (Fig. 3B). The average response to the mixed conditions was below the average responses to the offset 90° and offset 270° conditions and above the average response to the uniform condition (ANOVA, N = 53, F(3,208) = 9.5, p < 0.001; post hoc Tukey test, p = 0.02 and p = 0.009 mixed compared with offset 90° and offset 270° conditions, p = 0.003 mixed compared with uniform condition). In all stimulus conditions, the initial response was followed by a decrease of the average firing rates below the baseline level, suggesting an effect of inhibition. The crossing of the response curve to below the baseline was earliest for the uniform condition, followed by the mixed conditions, and latest for the offset 90° and offset 270° conditions (Fig. 3B).
For each of the four contrasting conditions between the RF motion and the surrounding motions (mixed, offset 270°, offset 90°, and offset 180°), an MI (see Materials and Methods) was calculated to quantify by how much the responses deviated from responses in the uniform condition. Figure 3C depicts the MIs for the offset 270° versus offset 90° condition. Both resulted in mostly positive MIs, indicating a preference for a target moving oddly in a direction orthogonal to the direction of the background elements compared with a target moving uniformly with its surrounding elements. Dots were distributed evenly around the center line (binomial sign test, p = 1, n = 53), indicating no average difference between modulation of the upward versus downward background motion. Therefore, in the following graphs, we combined results for the offset 90° and offset 270° conditions to a single group of orthogonal offsets. Figure 3D shows the MIs for an orthogonal offset versus the offset 180° condition, showing a bias of distribution to larger MIs for the offset 180° condition (binomial sign test, p < 0.001, n = 106). The MIs of the offset 180° condition were significantly larger than the MIs obtained in the mixed conditions (Fig. 3E; binomial sign test, p < 0.001, n = 53). Similarly, the MIs of the orthogonal offsets were significantly larger than the MIs obtained in the mixed conditions (Fig. 3F; binomial sign test, p < 0.001, n = 106). Thus, neurons in the intermediate/deep layers of the OT tended to prefer homogeneous over mixed background motion.
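The paired MI comparisons above rely on a binomial sign test: each recording site contributes one point, and the test asks whether points fall on one side of the identity line more often than expected by chance. A minimal Python sketch of such a test follows. The function name is hypothetical, and the choices of a two-sided test and of discarding tied pairs are assumptions; the text does not specify either.

```python
from math import comb


def sign_test_p(x, y):
    """Two-sided binomial sign test on paired samples x and y.

    Counts pairs with x > y and x < y (ties are dropped, an assumption)
    and returns the probability, under a fair-coin null, of a split at
    least as unbalanced as the one observed.
    """
    pos = sum(1 for a, b in zip(x, y) if a > b)
    neg = sum(1 for a, b in zip(x, y) if a < b)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

An evenly split set of pairs gives p = 1 under this definition, which is the pattern expected when points scatter symmetrically around the center line, whereas a strongly one-sided split gives a small p value.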
In this study, we distinguished between recordings from the superficial layers of the OT that receive direct retinal inputs and recordings from the intermediate/deep layers that receive visual inputs from the superficial layers and forebrain areas (Luksch, 2003). An example of a recording site from the superficial layers is shown in Figure 4. Except for the singleton condition, which shows a somewhat stronger, albeit not significantly different, response from the uniform condition (t test, p = 0.17, n = 15), all other conditions produced responses that apparently did not differ from each other.
Figure 5 shows the population analysis of all 46 recording sites from the superficial layers. The average population PSTH in the singleton condition was significantly higher than the average PSTHs to the other conditions (Fig. 5A; ANOVA, F(4,225) = 32.4, p < 0.001; post hoc Tukey tests, p < 0.0001). The average response to the offset 180° was the second highest response and significantly larger compared with the uniform and orthogonal conditions (post hoc Tukey tests, p < 0.001 for the uniform condition, p = 0.002 for the offset 90° condition, and p = 0.005 for the offset 270° condition). The average responses to the rest of the conditions did not differ significantly from each other (Fig. 5B; ANOVA, F(3,180) = 0.51, p = 0.6783). Thus, the main difference between the superficial and intermediate/deep layers was that on average the neurons in the superficial layers responded similarly to the uniform, orthogonal, and mixed offsets, whereas in the intermediate/deep layers the neurons were significantly modulated by these offsets, responding more strongly to the homogeneous orthogonal conditions compared with the mixed condition (compare Fig. 3B with Fig. 5B). To directly compare recording sites in the superficial layers with results from recording sites in the intermediate/deep layers, we calculated for each site the average difference between the responses to orthogonal conditions and the mixed condition. The difference was significantly larger in intermediate/deep sites compared with superficial sites (two-tailed t test, df = 97; p = 0.0053).
In both the superficial and intermediate/deep layers, the MIs in the offset 90° condition did not differ systematically from the MIs in the offset 270° condition (Fig. 5C; binomial sign test, p = 1, n = 46). However, unlike in the intermediate/deep layers, the distribution of the MIs in the superficial layers was not significantly biased toward preferring the orthogonal background to the mixed background (Fig. 5F; binomial sign test, p = 0.08, n = 92).
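The logic of the sign test applied to the MIs can be illustrated with a minimal Python sketch. The MI formula used here (difference over sum) and the function names are assumptions made for illustration only; the definition used for the recordings is given in the Methods and is not restated in this section.

```python
from math import comb


def modulation_index(resp_a, resp_b):
    """Assumed MI: (a - b) / (a + b), which lies between -1 and 1
    for non-negative responses. Returns 0 when both responses are 0."""
    total = resp_a + resp_b
    return 0.0 if total == 0 else (resp_a - resp_b) / total


def sign_test_p(values):
    """Two-sided binomial sign test against a median of zero.
    Values equal to zero carry no sign and are dropped."""
    pos = sum(v > 0 for v in values)
    neg = sum(v < 0 for v in values)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # Probability of k or fewer signs of the rarer kind under p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Under this test, a population whose MIs are split evenly between positive and negative values gives p = 1, whereas a population in which most MIs share one sign gives a small p value.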
The main conclusion from the electrophysiological results presented above is that the responses in the intermediate/deep layers of the OT to multiple elements cannot be simply explained by center-surround motion contrasts. The motion homogeneity of the surrounding elements plays a role in shaping the responses. Therefore, we performed a second experiment to address modulation by homogeneity, independent of contrast. For this, as before, the center dot moved to the right; however, the surrounding dots moved at offsets of either 135° or −135° (Fig. 6, insets). By having two symmetrical offsets from the target, we could control the homogeneity in the stimulus array (percentage of dots moving in one direction) while maintaining the direction contrast between the center and the surrounding elements equal. In this paradigm, we omitted the dots from the rectangle close to the target (Fig. 6B, insets). Thus, the surrounding dots were not closer than 20° from the target dot. We tested 49 multiunit recording sites in the intermediate/deep layers with this paradigm. Figure 6A shows raster plots of the responses from one recording site to five stimuli ranging in the percentage of the surrounding dots moving in a direction of 135°: 0% (0 of 21), 28% (6 of 21), 48% (10 of 21), 71% (15 of 21), and 100% (21 of 21). The average response was smallest for the mixed background (48%) and increased in both directions with increasing levels of homogeneity of the surround. At the population level, the homogeneity of the background elements significantly modulated the response (Fig. 6B; ANOVA, F(4,225) = 32.4, p < 0.001). Both homogeneous conditions (0% and 100%) elicited average responses that were significantly larger than the mixed condition (post hoc Tukey tests, n = 49, p < 0.001). The intermediate conditions (28% and 71%) also elicited larger responses compared with the mixed condition (post hoc Tukey tests, n = 49, p < 0.05). 
Thus, the recorded population of neurons codes the motion homogeneity of the elements in the surround.
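The design of this second experiment can be checked with a short Python sketch: because the two offsets (135° and −135°) are symmetric about the target direction, the mean direction contrast stays fixed while the share of dots moving in one direction changes. The function names are hypothetical, and the target direction is taken as 0° (rightward) for convenience.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute angle between two directions, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360
    return min(diff, 360 - diff)


def surround_summary(n_plus, n_total=21, target_deg=0, offset_deg=135):
    """Return (fraction of dots moving at +offset, mean contrast to the target).

    n_plus dots move at +offset_deg; the remaining dots move at -offset_deg.
    """
    directions = [offset_deg] * n_plus + [-offset_deg] * (n_total - n_plus)
    contrast = sum(angular_distance(d, target_deg) for d in directions) / n_total
    return n_plus / n_total, contrast


# The five surround compositions described in the text.
for n_plus in (0, 6, 10, 15, 21):
    share, contrast = surround_summary(n_plus)
    print(f"{n_plus:2d} of 21: {share:.1%} at +135 deg, mean contrast {contrast:.0f} deg")
```

In this sketch the mean contrast is 135° for every composition, so any change in the response across the five stimuli cannot be attributed to a change in the average center-surround direction difference.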
Next, we asked how many elements are required for an opposing effect to take place. For this, we performed an experiment in which the number of dots in the surrounding area varied between 0 and 21. In each trial, the number of dots and their positions on the screen were chosen randomly and either moved uniformly with the target dot (in the RF) or opposite the target. Possible positions for the dots were chosen from the dots array as in the experiment above (10° spacing). Again, we omitted the dots from the rectangle close to the target (Fig. 7B, insets). Data were collected from 54 multiunit recording sites in the intermediate/deep layers. A single dot moving inside the RF induced a vigorous response in the site shown in Figure 7A (lower raster and gray bar). Adding a second dot (somewhere in the surrounding array) resulted in a suppression of the average response. The suppression seemed independent of whether the motion was uniform to or opposite the target's motion. Similarly, 2, 4, or 6 dots in the surrounding area suppressed the response relative to the singleton response, independent of direction of motion (uniform or offset 180°). Stronger responses to opposing over uniform motion began to emerge when eight dots were displayed in the surrounding area and continued with additional dots (Fig. 7A, compare blue bars with red bars).
Across the recorded population (n = 54; Fig. 7B), the suppression of the response to a target in the RF by the additional dots in the surround is clear for both uniform and opposing conditions. However, in the uniform condition (red curve), suppression increased gradually, reaching ∼75% at 21 surrounding dots. In the offset 180° condition (blue curve), the downward trend stopped at ∼4 surrounding dots, and the suppression level remained at ∼50% thereafter (Student's t test comparing regression slopes, t(860) = 4.57, p < 0.001). Thus, for the neurons to respond more strongly to a motion contrast between the target and its surrounding area, several coherently moving elements are required.
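A comparison of this kind can be sketched in Python as follows. The exact procedure used for the slope comparison is not described in this section, so the sketch assumes a common textbook form: an ordinary least-squares line is fitted to each condition separately, and the slope difference is divided by the combined standard error, with n1 + n2 − 4 degrees of freedom. All function names are hypothetical.

```python
from math import sqrt


def percent_suppression(response, singleton_response):
    """Suppression relative to the response to the target dot alone."""
    return 100.0 * (1.0 - response / singleton_response)


def fit_slope(x, y):
    """Ordinary least squares; returns (slope, standard error of the slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, sqrt(sse / (n - 2) / sxx)


def slope_difference_t(x1, y1, x2, y2):
    """t statistic and degrees of freedom for the difference of two slopes.

    Assumes two independent regressions with separate residual variances.
    """
    b1, se1 = fit_slope(x1, y1)
    b2, se2 = fit_slope(x2, y2)
    df = len(x1) + len(x2) - 4
    return (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2), df
```

Here x would hold the number of surrounding dots and y the normalized response (or percent suppression) for each site and dot count, with one pair of lists per condition.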
Discussion
Tectal neurons are known for their sensitivity to local motion. A small moving object gives rise to strong tectal responses if moving relative to a static background (Verhaal and Luksch, 2015) or if moving in a direction opposite to its background (Frost and Nakayama, 1983). By contrast, if an object moves in the same direction as the background, the neural responses can be highly suppressed and sometimes completely eliminated (Frost et al., 1981; Frost and Nakayama, 1983; Dellen et al., 2004; Mysore et al., 2010; Zahar et al., 2012). This robust property of tectal neurons, which has been observed in the OT of fish, birds, and mammals (Davidson and Bender, 1991; Zahar et al., 2012; Ben-Tov et al., 2015), is thought to allow rapid detection of localized motion and is consistent with the proposed role of the OT in the selection of the most salient stimulus (Mysore and Knudsen, 2011; Dutta and Gutfreund, 2014). Tectal sensitivity to opposing motion over uniform motion has also been associated with the ability to ignore self-induced motion cues (Frost et al., 1990) and with figure-ground segregation (Frost et al., 1988; Davidson and Bender, 1991). However, sensitivity to opposing motion between the RF and its surrounding area is not sufficient for motion-based figure-ground segregation. For this, it is essential to respond preferentially to targets moving oddly relative to a homogeneous motion in the background (Hegdé and Felleman, 2003). Because this requirement necessarily contains motion contrasts between the RF and the surrounding area, it is not trivial to experimentally differentiate sensitivity to local motion contrasts from figure-ground sensitivity per se.
In this study, we addressed to what extent tectal neurons are modulated by homogeneity of the background. Previous studies have addressed a similar question in visual cortical areas by making use of conjunction stimuli (Hegdé and Felleman, 2003; Burrows and Moore, 2009). Using this approach, it was shown that V1 neurons in monkeys are mostly sensitive to RF-surround contrasts rather than pop-out per se (Hegdé and Felleman, 2003). Sensitivity to the homogeneity of the surround rather than to local discontinuities between the RF and the surround seems to require a higher level of visual processing (Burrows and Moore, 2009). Here, we used a somewhat different approach, testing one sensory feature, the direction of motion. We compared responses to stimuli where background motion is contrasting and homogeneous (similar to pop-out stimuli) with responses to stimuli where background motion is contrasting but not homogeneous (a mixed combination of three possible directions: two orthogonal and one opposite the direction of the target). The advantage of this design is that the target-to-background difference is defined by one feature and can therefore be quantified easily as the average difference from the target direction across all elements. Thus, the motion direction contrast between the target object and the surrounding objects followed this order: offset 180° > mixed > offset 90° = offset 270° > uniform. Interestingly, the average population neural response in the intermediate/deep layers followed a different order: offset 180° > offset 90° = offset 270° > mixed > uniform. The responses to the orthogonal conditions exceeded those to the mixed condition, even though, in the latter, the target direction differed more from the directions of the background elements. Thus, sensitivity to center-surround motion contrast does not provide a good description of the responses.
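The contrast ordering stated above follows directly from averaging the angular difference to the target across the surrounding elements, as the following Python sketch shows. The sketch assumes 21 surrounding dots, a target direction of 0°, and a mixed background split into equal thirds; the equal split and the names used are assumptions made for illustration.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute angle between two directions, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360
    return min(diff, 360 - diff)


def mean_contrast(surround_dirs, target_deg=0):
    """Average angular difference between each surround dot and the target."""
    total = sum(angular_distance(d, target_deg) for d in surround_dirs)
    return total / len(surround_dirs)


conditions = {
    "uniform": [0] * 21,
    "offset 90": [90] * 21,
    "offset 270": [270] * 21,
    "offset 180": [180] * 21,
    # Assumed composition: equal thirds of the three contrasting directions.
    "mixed": [90] * 7 + [180] * 7 + [270] * 7,
}

contrasts = {name: mean_contrast(dirs) for name, dirs in conditions.items()}
ranking = sorted(contrasts, key=contrasts.get, reverse=True)

for name in ranking:
    print(f"{name:>10}: {contrasts[name]:.0f} deg")
```

Under these assumptions the mixed background has a mean contrast of 120°, which lies between the opposing (180°) and orthogonal (90°) backgrounds. A response driven by mean contrast alone would therefore rank the mixed condition above the orthogonal conditions, which is the opposite of what was observed in the intermediate/deep layers.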
The homogeneity or regularity of the surrounding area enhances the responses to the target, consistent with motion-based grouping for figure-ground segregation. This was also shown in an experiment where the homogeneity of the surround was varied while maintaining a constant background-to-center contrast (Fig. 6). The responses observed here are reminiscent of pop-out perception in humans (Duncan and Humphreys, 1989). In most visual search tasks, the strongest pop-out effect (shortest detection latencies and shallowest search slopes) is observed when background elements are similar and the target is dissimilar. Pop-out strength scales down in a continuous manner as the similarity between the target and distractors increases, and scales down further as the similarity between the background elements decreases.
Our behavioral experiments show that barn owls, too, perceive a target contrasting with a homogeneous background as more salient than a target contrasting with a mixed background (Fig. 1). In our behavioral experiments, the owls were not trained to search for the odd target. Reward was given to encourage search behavior, but the target selection was spontaneous. This suggests that motion contrasting with a homogeneous background is an innate and powerful salient feature for barn owls.
Comparing the neural responses with the behavioral results, we find that the population neural responses in the OT qualitatively matched the behavioral responses. The mixed conditions, which gave weaker neural responses compared with orthogonal and opposing motions, also gave slower responses and lower success rates at the behavioral level. However, at the neuronal level, the gap between the population responses to the offset 180° and the orthogonal conditions was larger than the gap between the responses to the orthogonal conditions and the mixed condition (Fig. 3A,B), whereas at the behavioral level, particularly in owl DO, the average difference between the responses to the offset 180° and orthogonal conditions was smaller than the difference to the mixed conditions (Fig. 1B,C). Thus, it seems that the advantage of the homogeneous conditions over the mixed condition is stronger at the behavioral level than at the neural level. This may indicate that processing downstream from the OT further separates the mixed and the homogeneous conditions and/or that the read-out for perception is from a subpopulation of the recorded neurons. Interestingly, the spontaneous search for the odd target in the mixed condition, even though resulting in poor performance relative to the homogeneous conditions, was still significantly above chance level. This, again, agrees with the neural responses in the intermediate/deep layers, which on average were greater in the mixed conditions than in the uniform condition (Fig. 3B).
The Gestalt principles for perceptually organizing the visual scene have been established by human psychologists. However, birds can demonstrate remarkably similar principles. For example, barn owls have been shown to perceive subjective contours (Nieder and Wagner, 1999), and pigeons are capable of grouping by shape and color (Cook et al., 1996). Our finding adds to these previous findings, proposing the hypothesis that human Gestalt principles are manifestations of general neural mechanisms evolved to cope with common ecological needs of visually foraging animals. This raises the intriguing possibility that birds and mammals share similar neural mechanisms for perceptual grouping.
The intermediate/deep layers of the OT provide the major output pathways of the OT (Luksch, 2003). Neural responses in these layers have been shown to be highly context-dependent, modulated by other modalities (Zahar et al., 2009; Mysore et al., 2010), by stimulation history (Reches and Gutfreund, 2008; Netser et al., 2011), and by stimuli well outside of the RF (Mysore et al., 2010; Zahar et al., 2012). The findings of neural correlates of perceptual grouping in these layers agree with the emerging notion that the intermediate/deep layers of the OT form a priority map to represent the most relevant stimulus for the control of orienting behavior (Mysore and Knudsen, 2011; Gutfreund, 2012). This evolutionary role of the OT seems to be conserved in vertebrates all the way to primates (Boehnke and Munoz, 2008; Kardamakis et al., 2015). Neurons in the monkey's superior colliculus are also highly modulated by the surroundings and the history of stimulation (Davidson and Bender, 1991; Boehnke et al., 2011), and can discriminate between the selected target and distractors in visual feature and conjunction search tasks (McPeek and Keller, 2002; Shen et al., 2011).
It is possible that the processing for obtaining selective modulation by a homogeneous contrasting background takes place in the retina (Olveczky et al., 2003). Our results do not support this. The superficial layers of the OT, which receive direct retinal input and relay visual information to the intermediate/deep layers, did not show selectivity to homogeneous versus mixed backgrounds. However, if the neural responses are shaped by the motion direction contrasts between the RF and its surround, independent of the homogeneity, the prediction is for the responses to the mixed conditions to be significantly larger than the responses to the orthogonal conditions. This prediction is fulfilled neither in the superficial layers (Fig. 5B) nor in the intermediate/deep layers (Fig. 3B). Therefore, a basic effect of homogeneity can also be traced in the superficial layers. The effect increases in the intermediate/deep layers where the responses were found to code the level of homogeneity independent of the motion direction contrast (Fig. 6).
Previous studies have revealed an extensive lateral inhibitory network in the avian OT. This network contains a feedback loop through the isthmi complex, which enables the more powerful stimulus to suppress responses to the less powerful stimulus, and thus give rise to competitive interactions (Wang, 2003; Marín et al., 2007; Mysore and Knudsen, 2013). However, the lateral inhibition mediated by the isthmi complex seems to be nonspecific to direction or orientation of the stimulus (Maczko et al., 2006; Saha et al., 2011) and therefore cannot explain selective modulation. Consistent with nonspecific lateral inhibition, we observed nonspecific suppression by the surrounding elements when only one or two elements in the surround were shown. The sensitivity to motion contrasts that we observed in the OT seems to require a group of homogeneously moving elements (Fig. 7). The neural circuitry to achieve this important property is yet to be discovered.
Footnotes
This work was supported by the Israel Science Foundation to Y.G., Deutsche Forschungsgemeinschaft to H.W. and Y.G., and Rappaport Institute for Biomedical Research Research Grant to Y.G.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Yoram Gutfreund, Rappaport Faculty of Medicine, Technion, Bat-Galim, Haifa 31096, Israel. yoramg@tx.technion.ac.il