Abstract
Neural signals recorded at different scales contain information about environment and behavior and have been used to control Brain Machine Interfaces with varying degrees of success. However, a direct comparison of their efficacy has not been possible due to different recording setups, tasks, species, etc. To address this, we implanted customized arrays having both microelectrodes and electrocorticogram (ECoG) electrodes in the primary visual cortex of 2 female macaque monkeys, and also recorded electroencephalogram (EEG), while they viewed a variety of naturalistic images and parametric gratings. Surprisingly, ECoG had higher information and decodability than all other signals. Combining a few ECoG electrodes allowed more accurate decoding than combining a much larger number of microelectrodes. Control analyses showed that higher decoding accuracy of ECoG compared with local field potential was not because of differences in low-level visual features captured by them but instead because of larger spatial summation of the ECoG. Information was high in the 30–80 Hz range and at lower frequencies. Information in different frequencies and scales was nonredundant. These results have strong implications for Brain Machine Interface applications and for study of population representation of visual stimuli.
SIGNIFICANCE STATEMENT Electrophysiological signals captured across scales by different recording electrodes are regularly used for Brain Machine Interfaces, but the information content varies due to electrode size and location. A systematic comparison of their efficiency for Brain Machine Interfaces is important but technically challenging. Here, we recorded simultaneous signals across four scales: spikes, local field potential, electrocorticogram (ECoG), and EEG, and compared their information and decoding accuracy for a large variety of naturalistic stimuli. We found that ECoGs were highly informative and outperformed other signals in information content and decoding accuracy.
Introduction
Electrical activity from the brain can be recorded at various levels of resolution. Microelectrode arrays, typically inserted in the cortex of animals, record extracellular action potentials from nearby neurons, yielding single-unit activity (SUA) or multiunit activity (MUA). The extracellular signal recorded by these electrodes can be low-pass filtered to obtain the local field potential (LFP), which is thought to mainly reflect the synaptic activity of a neural population around the microelectrode (Buzsáki et al., 2012). At the other extreme, noninvasive scalp electrodes are used to record the electroencephalogram (EEG). Between these extremes lies the electrocorticogram (ECoG), typically obtained by placing macroelectrodes on the surface of the brain and widely used in patients with epilepsy to localize the seizure focus (Lesser et al., 2010; Morshed and Khan, 2014; Yang et al., 2014).
These signals have been used in Brain Machine Interface (BMI) applications, which are especially useful in cases of motor disability (Andersen et al., 2014; Bockbrader et al., 2018) or speech impairment (Herff and Schultz, 2016). Apart from decoding voluntary activity (active BMIs), forays have been made into passive BMIs, which are used for cognitive monitoring and for providing contextual and sensory information that can supplement active BMIs (Zander and Kothe, 2011). In humans, EEG has been the popular choice for BMIs because it is noninvasive, but it suffers from a poor signal-to-noise ratio (Blankertz et al., 2006; McFarland et al., 2010; Sereshkeh et al., 2017; Padfield et al., 2019). More recently, BMIs based on spiking activity, LFPs, and ECoG signals have also been used (Moran, 2010; Filippini et al., 2017; Slutzky and Flint, 2017; Ibayashi et al., 2018), including in human subjects with paraplegia (Aflalo et al., 2015; Bouton et al., 2016; Milekovic et al., 2018, 2019). However, an objective comparison of the usefulness of these signals is lacking because of differences in recording setup, brain area and resolution, behavioral task, species, and so on.
Many BMI studies have focused on recordings from the motor cortex or associated areas, in which a motor command is generated from the signals when the subject imagines a particular movement (Aflalo et al., 2015; Meng et al., 2016). But the quality of motor imagery may differ among subjects (Marchesotti et al., 2016); and since the input is internally generated, it is not possible to directly quantify the variability in this input. An alternative approach is to record from a sensory area, in which the input is external and can be precisely controlled (Chechik et al., 2006; Chang et al., 2017). However, the information content in the signal may depend simply on the response properties of the neurons, such as their stimulus preferences and receptive field (RF) sizes, as well as the properties of the sensory stimulus. When combining across electrodes, the total information may also depend on interelectrode spacing.
Here, we addressed some of these concerns as follows. First, we designed a hybrid electrode array that contained both a 9 × 9 microelectrode array (400 μm separation, Blackrock Microsystems) and a 3 × 3 array of ECoG electrodes (2.3 mm in diameter, separated by 1 cm, Ad-Tech Medical Instrument), attached to the same connector and referenced to a single wire, allowing simultaneous spike, LFP, and ECoG recordings under almost identical noise conditions. These arrays were implanted in the primary visual cortex (V1) of 2 female monkeys. In some sessions, EEG was also recorded simultaneously. Second, to ensure that the stimuli did not favor any one signal over the others, we used a large variety of naturalistic stimuli (5 categories, 16 stimuli in each) and their grayscale and scrambled versions while the monkeys maintained fixation. In addition, we recorded responses to full-screen oriented gratings. Together, this mimics a “natural” situation of decoding the external visual world using popular and commercially available electrode arrays of different types. Finally, we used an information theoretic approach to identify the most informative features of the neural responses. To compare efficiency for BMI applications, we calculated decoding accuracies and measured the effect of adding more electrodes as well as multiple features.
Materials and Methods
Animal preparation and recording.
Two awake adult female monkeys (Macaca radiata), M1 and M2, weighing 3.3 and 4 kg, respectively, were used. Details of the surgery in these 2 monkeys have been presented previously (Dubey and Ray, 2019a) and are described here in brief. The experiments were done according to the guidelines of the Institutional Animal Ethics Committee of the Indian Institute of Science, Bangalore, and the Committee for the Purpose and Supervision of Experiments on Animals. A titanium headpost was surgically attached to the skull. The monkeys were trained on a visual fixation task, after which they underwent surgeries in which custom-made hybrid arrays were implanted in the left hemisphere. The hybrid array had 81 microelectrodes (9 × 9) from Blackrock Microsystems and 9 ECoG electrodes (3 × 3) from Ad-Tech Medical Instrument; both were connected to the same Blackrock 96 channel connector and had common reference wires. The platinum microelectrodes were 1 mm long, with a tip diameter of 3–5 μm and an interelectrode distance of 400 μm. The ECoG electrodes, also made of platinum, were 2.3 mm in diameter with an interelectrode distance of 10 mm. To make room for inserting the microarray, a hole was made in the silastic sheet in which the ECoG electrodes were embedded. During surgery, a large craniotomy (∼28 mm × 22 mm) and a smaller duratomy were performed under general anesthesia. The ECoG strip was inserted and slid under the surrounding dura (Dubey and Ray, 2019a, their Fig. 1). For M2, because of difficulty in sliding, a part of the strip with 3 electrodes was cut off before sliding. The hole in the silastic was aligned with the duratomy, and the microarray was placed in the hole and inserted into the cortex using a pneumatic inserter. The array was 10–15 mm rostral to the occipital ridge and 10–15 mm lateral to the midline. Six ECoG electrodes in M1 and four in M2 were located on V1, posterior to the lunate sulcus. The dura was sutured back and the bone flap replaced.
The reference wires were either inserted in the crevice of the craniotomy or wound around the titanium strap that secured the bone flap in place. Following a recovery period of ∼10 d, the monkeys performed the experimental tasks regularly. For simultaneous EEG recordings, 18 passive EEG electrodes (Grass) were placed on the scalp and connected to the same data acquisition system as the microelectrodes and ECoG electrodes. The EEG ground electrode was placed in front of the headpost, and the reference electrode was placed either behind or to the right of the headpost. Because of the titanium mesh and screws under the skin that secured the large craniotomies, EEG data were generally noisy, and we were able to get usable simultaneous EEG recordings only from a few occipital electrodes in M1.
Signals were recorded using the Cerebus Neural Signal Processor (Blackrock). LFP, ECoG, and EEG signals were obtained by bandpass filtering the raw data between 0.3 Hz (analog Butterworth filter, first order) and 500 Hz (digital Butterworth filter, fourth order) and sampling at 2000 Hz. MUA was obtained by filtering the raw data between 250 Hz (digital Butterworth filter, fourth order) and 7500 Hz (analog Butterworth filter, third order) and setting an amplitude threshold at 5 SDs of the signal on all microelectrode channels.
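The thresholding step can be sketched in Python/NumPy as below. This is a hypothetical illustration, not the online detector of the recording system: it assumes the trace has already been high-pass filtered, applies the 5 SD threshold to |x| because the polarity used is not stated above, and reports only threshold-crossing onsets. `threshold_crossings` is an illustrative name.

```python
import numpy as np

def threshold_crossings(x, n_sd=5.0):
    """Return onset indices where |x| first exceeds n_sd standard deviations.

    x is assumed to be an already high-pass filtered (spike-band) trace.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    thr = n_sd * x.std()
    above = np.abs(x) > thr
    # An onset is a supra-threshold sample whose predecessor was sub-threshold.
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    if above.size and above[0]:
        onsets = np.concatenate(([0], onsets))
    return onsets, thr

# Toy trace: unit-variance noise plus 10 negative-going "spikes".
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 1.0, 20000)
spike_idx = np.arange(1000, 20000, 2000)
for i in spike_idx:
    trace[i - 1:i + 2] += [-10.0, -20.0, -10.0]
onsets, thr = threshold_crossings(trace)
```

Note that the threshold is computed from the whole trace, spikes included, so large or frequent spikes raise it slightly.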
Behavioral task and stimulus.
For the grating stimuli, full-screen static oriented gratings at 100% contrast and a spatial frequency of 2 or 4 cycles per visual degree were displayed, with orientation varying in steps of 22.5° from 0° to 157.5°. For naturalistic stimuli, 64 images were chosen from the McGill Color Calibrated Image Database (Olmos and Kingdom, 2004) (http://tabby.vision.mcgill.ca/html/browsedownload.html) and grouped into 4 sets of 16 each: Fauna (birds and animals), Flora (flowers, foliage, and fruits), Textures, and Landscapes (natural and manmade). Images were cropped and downsampled to 1280 × 720 pixels. We also added a set of 16 images of human faces. We used the GIMP image editor to make grayscale versions of these images, which were displayed in separate sessions. Because the RF sizes and locations varied considerably across the different signals, it was impossible to equate low-level features, such as luminance, spatial frequency, contrast, and color, across images (although we were able to compare some of these properties post hoc; see Fig. 15). Therefore, all images were presented full screen, without any further calibration. For two sets (Fauna and Texture), we also displayed Fourier-scrambled versions of the images, interleaved with the originals. These were obtained by computing the Fourier transform of each image, randomizing the phase values, and taking the inverse Fourier transform. This procedure did not change the overall luminance but removed higher-order correlations. We computed one scrambled image for each original image. The stimuli are shown in Figure 4.
During a recording session, 8 (gratings), 16 (all stimuli from 1 image set), or 32 (from 2 image sets) stimuli were shown in a randomly interleaved fashion. The monkey sat in a chair with its head fixed via the headpost, in front of a gamma-corrected monitor (BenQ XL 2411, LCD, 1280 × 720, refresh rate 100 Hz). The distance from the screen to the monkey's eyes was fixed at ∼50 cm. Each trial began with the appearance of the fixation dot, followed by a blank gray screen for 1000 ms, after which 2–4 full-screen stimuli were displayed for 500 ms each with an interstimulus interval of 500 ms. The monkey passively fixated a white fixation dot (0.1° radius) at the center of the screen, keeping its gaze within 2° of the dot, and was rewarded with a drop of juice for maintaining fixation for the whole trial. The information and decoding analyses, especially population analyses involving multiple electrodes, required a large number of stimulus repeats. Across sessions and monkeys, we obtained an average of 113 ± 19.8 (SD) repeats per stimulus for the gratings and 67.7 ± 12.5 (SD) repeats per stimulus across all image sets.
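A minimal sketch of Fourier phase scrambling, written in Python/NumPy rather than the tools actually used. As an assumption, the random phase is taken from the FFT of a real-valued noise image and shared across the three color channels; this keeps the spectrum conjugate-symmetric, so the inverse transform is real, and it leaves both the zero-frequency term (mean luminance) and the amplitude spectrum unchanged. `phase_scramble` is a hypothetical helper name.

```python
import numpy as np

def phase_scramble(img, rng):
    """Fourier phase-scramble an (H, W) or (H, W, C) image.

    Keeps the amplitude spectrum and adds a random phase taken from the FFT of
    a real-valued noise image, so the inverse transform is real. The same
    random phase is used for every color channel.
    """
    img = np.asarray(img, dtype=float)
    spectrum = np.fft.fft2(img, axes=(0, 1))
    noise_phase = np.angle(np.fft.fft2(rng.random(img.shape[:2])))
    if img.ndim == 3:
        noise_phase = noise_phase[:, :, None]
    new_spectrum = np.abs(spectrum) * np.exp(1j * (np.angle(spectrum) + noise_phase))
    return np.fft.ifft2(new_spectrum, axes=(0, 1)).real

rng = np.random.default_rng(0)
image = rng.random((72, 128, 3))        # stand-in for a downsampled RGB image
scrambled = phase_scramble(image, rng)
```

The scrambled values can fall somewhat outside the original pixel range; how such values were handled for display is not described in the text.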
Electrode selection and RF mapping.
All data were analyzed using custom code written in MATLAB (The MathWorks, R2015b). Electrodes were selected on the basis of an RF mapping protocol described previously, in which we showed that ECoG RFs are surprisingly local, only ∼3 times the size of LFP RFs (Dubey and Ray, 2019a), and comparable with the RFs obtained in human subjects (Yoshor et al., 2007). RFs were estimated by flashing small gratings across the visual field. While this procedure reliably estimated the RFs of spikes, LFP, and ECoG (Dubey and Ray, 2019a, their Fig. 3), these stimuli were too small to induce a measurable response in the EEG. LFP and ECoG electrodes with consistent responses and reliable RF estimates across days were selected. For the ECoG electrodes, we chose those that were posterior to the lunate sulcus and had a minimum response value >100 μV. For M2, only a small part of the grid was active in the first few weeks after implantation, but a second patch of microelectrodes started showing reliable activity ∼4 weeks after implantation. The results shown here include electrodes from both patches. LFP and ECoG RF centers for both monkeys are shown in Figure 3B. Overall, we obtained 77 and 31 LFP electrodes and 5 and 4 ECoG electrodes from M1 and M2, respectively.
Spiking electrode selection.
Spike sorting was performed on the selected LFP electrodes (77 and 31 for the 2 monkeys) using Spikesort (Kelly et al., 2007) (http://www.smithlab.net/spikesort.html). For the sorted units, we calculated the signal-to-noise ratio and the trial-averaged change in firing rate (FR) in the 250–500 ms period relative to baseline (250–0 ms before stimulus onset) for each stimulus condition. We chose units that had a signal-to-noise ratio above 2 and a maximum FR change above 3. This procedure yielded an average of 30.3 ± 4.5 and 14.0 ± 2.8 electrodes for the grating protocols of M1 and M2, and 20.0 ± 7.0 and 13.7 ± 5.7 electrodes for the image protocols of the 2 monkeys. For all subsequent spike analyses, the FR in the 250–500 ms window after stimulus onset was used, unless otherwise mentioned.
Power calculation.
Power spectral density was calculated with the multitaper method using the Chronux toolbox (Bokil et al., 2010) (http://chronux.org/) with 3 Slepian tapers. Unless mentioned otherwise, the baseline period (spontaneous activity) was 250 to 0 ms before stimulus onset, and the response period was 250 to 500 ms after stimulus onset, chosen to avoid onset-related transients. These 250 ms windows yielded a frequency resolution of 4 Hz. Power was calculated for each trial separately and then averaged over trials to get the power spectral density for each electrode. The log of power (to base 10) was taken before averaging across electrodes. For the information and decoding analyses, the log of the trialwise power values for each electrode was used.
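To illustrate how a 250 ms window fixes the 4 Hz resolution (1/0.25 s), the sketch below builds Slepian tapers from the standard tridiagonal eigenproblem and averages the tapered periodograms. It is a NumPy illustration, not Chronux; the time-half-bandwidth product NW = 2 (the usual pairing with K = 2NW − 1 = 3 tapers) and the omission of one-sided scaling are assumptions, and the helper names are hypothetical.

```python
import numpy as np

def dpss_tapers(n, nw, k):
    """First k Slepian (DPSS) tapers of length n, as unit-energy rows.

    Built from the symmetric tridiagonal matrix whose leading eigenvectors
    are the discrete prolate spheroidal sequences.
    """
    w = nw / n                                   # half-bandwidth in cycles/sample
    i = np.arange(n)
    diag = ((n - 1 - 2 * i) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = i[1:] * (n - i[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)                # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T                # k leading eigenvectors, shape (k, n)

def multitaper_psd(x, fs, nw=2.0, k=3):
    """Mean of k tapered periodograms along the last axis (no one-sided scaling)."""
    x = np.asarray(x, dtype=float)
    tapers = dpss_tapers(x.shape[-1], nw, k)
    spectra = np.fft.rfft(tapers * x[..., None, :], axis=-1)
    psd = np.mean(np.abs(spectra) ** 2, axis=-2) / fs
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    return freqs, psd

fs = 2000.0
t = np.arange(int(0.25 * fs)) / fs               # 250 ms response window: 500 samples
rng = np.random.default_rng(1)
trials = np.sin(2 * np.pi * 60 * t) + 0.5 * rng.normal(size=(20, t.size))
freqs, psd = multitaper_psd(trials, fs)          # psd has shape (20 trials, 251 bins)
log_power = np.log10(psd)                        # trial-wise log power
mean_log_power = log_power.mean(axis=0)
```

With 500 samples at 2000 Hz, the bins below 150 Hz are 0, 4, ..., 148 Hz, which gives the 38 values later used as decoder predictors. Constant scale factors do not affect comparisons of log power across conditions.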
Field potential range (FPR) calculation.
For BMI applications, often a metric is desired that is simple, independent of arbitrary cutoffs (such as the gamma band range), and easy to compute. One such metric that can be computed in the time domain itself without any spectral analysis is FPR, which is simply the difference between the maximum and minimum potential of a signal (max(potential) − min(potential)) in the given time period (Liu et al., 2009). We calculated FPR for each trial of each stimulus presented for all selected LFP, ECoG, and EEG electrodes during either the early (0–250 ms) or late (250–500 ms) stimulus period and compared these with other measures based on power calculation.
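Because FPR needs no spectral analysis, it reduces to a few lines. A NumPy sketch, assuming trials are stored as rows of an array with a shared time vector in seconds; `field_potential_range` is a hypothetical name.

```python
import numpy as np

def field_potential_range(trials, t, t_start, t_stop):
    """FPR = max(potential) - min(potential) within [t_start, t_stop), per trial."""
    trials = np.asarray(trials, dtype=float)
    in_window = (t >= t_start) & (t < t_stop)
    window = trials[..., in_window]
    return window.max(axis=-1) - window.min(axis=-1)

fs = 2000.0
t = (np.arange(2500) - 1000) / fs                # -0.5 s to just under 0.75 s
trials = np.zeros((2, t.size))
trials[0, (t >= 0.30) & (t < 0.31)] = 30.0       # toy positive deflection (uV)
trials[0, (t >= 0.40) & (t < 0.41)] = -20.0      # toy negative deflection (uV)
fpr_early = field_potential_range(trials, t, 0.0, 0.25)
fpr_late = field_potential_range(trials, t, 0.25, 0.5)
```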
Coefficient of variation (CV) calculation.
The observed responses could vary for two reasons: noisy fluctuations across trials when the stimulus was fixed, and modulation by the presentation of different stimuli. We quantified both sources of variability using the CV, defined as the ratio of the SD to the mean. To measure the fluctuations across trials, we computed the noise CV (nCV). Here, “noise” refers to response variations for a fixed stimulus; these may themselves arise from important neural contributions, so the term is not used in its everyday sense (Belitski et al., 2008). To measure how much the response variability depended on stimulus-induced modulations, we calculated the signal CV (sCV), which also indicates how well the response can potentially encode the stimuli.
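The two CVs can be written compactly as below. The text does not specify every detail, so this Python sketch assumes equal trial counts per stimulus, population SDs (ddof = 0), and an nCV averaged over stimuli; `noise_and_signal_cv` is a hypothetical helper.

```python
import numpy as np

def noise_and_signal_cv(power):
    """power: (n_stimuli, n_trials) responses of one electrode at one frequency.

    nCV: across-trial SD / mean for each stimulus, averaged over stimuli.
    sCV: SD / mean of the trial-averaged responses across stimuli.
    """
    power = np.asarray(power, dtype=float)
    stim_means = power.mean(axis=1)
    ncv = float(np.mean(power.std(axis=1) / stim_means))
    scv = float(stim_means.std() / stim_means.mean())
    return ncv, scv

# Toy cases: stimulus-driven but trial-stable vs. trial-variable but stimulus-blind.
ncv_a, scv_a = noise_and_signal_cv(np.array([[1.0] * 5, [2.0] * 5, [3.0] * 5]))
ncv_b, scv_b = noise_and_signal_cv(np.tile([1.0, 3.0], (3, 2)))
```

The first toy case is the ideal for decoding (high sCV, zero nCV); the second carries no stimulus information at all despite its large trial-to-trial variability.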
Mutual information (MI) calculation.
To measure how well a signal encodes the stimuli, we computed the MI between the stimulus shown (S) and the response (R). MI, I(S;R), measures the amount of information shared between two random variables (Shannon, 1948; Cover and Thomas, 1999); here, S and R represent the stimulus shown and the response recorded. It is defined as follows:
$$I(S;R) = \sum_{s \in S} \sum_{r \in R} p(s,r) \log \frac{p(s,r)}{p(s)\,p(r)}$$
where s ∈ S, r ∈ R, p(s) and p(r) are the probability mass functions of S and R, with p(s) = Pr{S = s}, and p(s,r) is the joint probability mass function of S and R, that is, the probability of observing response r and stimulus s together across trials. When the logarithm is taken to base 2, MI has units of bits. MI can also be expressed in terms of entropy. Entropy is a measure of the uncertainty about a random variable, or the amount of information required to correctly say which instance r ∈ R occurred. It is defined as follows:
$$H(R) = -\sum_{r \in R} p(r) \log p(r)$$
Observing an instance s of another variable may provide some information about R; the remaining uncertainty then depends on the conditional probability p(r|s), that is, the probability of observing value r ∈ R when s is known to have occurred. This remaining uncertainty is the conditional entropy:
$$H(R \mid S) = -\sum_{s \in S} p(s) \sum_{r \in R} p(r \mid s) \log p(r \mid s)$$
The MI between R and S is the corresponding reduction in entropy:
$$I(S;R) = H(R) - H(R \mid S)$$
The response random variable R may be unidimensional or multidimensional. In the latter case, for a pair of responses (R1, R2), p(r) in the above equations is replaced by p(r1, r2), the probability of observing (r1, r2) across all s, and the conditional probability becomes p(r1, r2 | s), the probability of observing (r1, r2) in response to a given s.
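A plug-in (histogram) version of H(R) − H(R|S) for already discretized responses is sketched below in Python; it is not the ibtB implementation and includes no bias correction, and the helper names are hypothetical. The toy cases check the two extremes: a response that identifies each of 8 stimuli carries log2(8) = 3 bits, and a response whose distribution is identical for every stimulus carries 0 bits.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def plugin_mutual_info(stimuli, responses):
    """Plug-in estimate of I(S;R) = H(R) - H(R|S) for discrete responses."""
    stimuli = np.asarray(stimuli)
    responses = np.asarray(responses)
    _, r_counts = np.unique(responses, return_counts=True)
    h_r = entropy_bits(r_counts / r_counts.sum())
    h_r_given_s = 0.0
    for s in np.unique(stimuli):
        r_s = responses[stimuli == s]
        _, counts = np.unique(r_s, return_counts=True)
        # Weight each conditional entropy H(R|S=s) by p(s).
        h_r_given_s += (r_s.size / responses.size) * entropy_bits(counts / counts.sum())
    return h_r - h_r_given_s

stim = np.repeat(np.arange(8), 8)                  # 8 gratings, 8 trials each
mi_perfect = plugin_mutual_info(stim, stim)        # response identifies the stimulus
mi_none = plugin_mutual_info(stim, np.tile(np.arange(4), 16))  # same response mix for every stimulus
```

For a joint response (r1, r2) with bins 0 to 6, one simple encoding is the single label 7·r1 + r2, after which the same function applies.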
We measured the MI between the stimuli (S) and signal power (R) at each frequency between 0 and 150 Hz, for all LFP, ECoG, and EEG electrodes. The stimulus s could be from 8 gratings or 16 images depending on the session (entropy of 3 or 4 bits, respectively). For the sessions where we had displayed 32 images (from 2 sets), analysis was done separately for 2 (nonoverlapping) stimulus sets of 16 images each. The response r was the log of power at a given frequency. For calculating the joint information using two frequencies, the response was (r1, r2), where r1 was the log power at one frequency and r2 at the second frequency. In case of spiking or time domain analysis, the metric r was the FR or the FPR value.
To perform the calculations, we used the information breakdown toolbox (ibtB) (Magri et al., 2009). Trialwise responses for each electrode across stimuli were binned into 7 equipopulated bins. The resulting probability distributions were used to compute the response entropy and the conditional entropy, which yielded the MI as explained above. Empirical estimates of entropy are biased because only a finite number of samples is available, and this bias decreases as the sample size increases (Panzeri et al., 2007). Because we had a large number of trials, the bias was small; we removed it using a bootstrapping procedure (Optican et al., 1991; Magri et al., 2009). The responses were randomly shuffled to remove any information they had about the stimulus, and bootstrapped estimates (10 iterations) of the residual information were obtained. The average residual information was subtracted from the estimated MI, and any negative values were set to 0. We used the same number of trials for each stimulus condition by randomly dropping any extra trials; this was repeated over five iterations, and the MI values were averaged across them.
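The binning and shuffle correction can be sketched as follows (Python/NumPy, hypothetical helper names, not the ibtB code). Rank-based binning gives 7 bins whose counts differ by at most one, MI is computed from the joint stimulus-by-bin histogram, and the mean MI of 10 stimulus-shuffled copies is subtracted before flooring at zero. The trial-equalization loop over five iterations is omitted.

```python
import numpy as np

def equipopulated_bins(values, n_bins=7):
    """Rank-based binning into n_bins bins whose counts differ by at most one."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return ranks * n_bins // values.size

def mi_bits(stimuli, binned):
    """Plug-in I(S;R) in bits from the joint histogram of stimulus and response bin."""
    _, s_idx = np.unique(stimuli, return_inverse=True)
    _, r_idx = np.unique(binned, return_inverse=True)
    joint = np.zeros((s_idx.max() + 1, r_idx.max() + 1))
    np.add.at(joint, (s_idx, r_idx), 1.0)
    joint /= joint.sum()
    indep = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / indep[nz])))

def shuffle_corrected_mi(stimuli, values, n_bins=7, n_shuffles=10, rng=None):
    """MI minus the mean MI of stimulus-shuffled data, floored at zero."""
    rng = np.random.default_rng() if rng is None else rng
    stimuli = np.asarray(stimuli)
    binned = equipopulated_bins(values, n_bins)
    raw = mi_bits(stimuli, binned)
    null = [mi_bits(rng.permutation(stimuli), binned) for _ in range(n_shuffles)]
    return max(raw - float(np.mean(null)), 0.0)

rng = np.random.default_rng(0)
stim = np.repeat(np.arange(8), 60)                   # 8 gratings x 60 trials
tuned = stim + 0.3 * rng.normal(size=stim.size)      # stimulus-dependent log power
untuned = rng.normal(size=stim.size)                 # stimulus-independent log power
mi_tuned = shuffle_corrected_mi(stim, tuned, rng=rng)
mi_untuned = shuffle_corrected_mi(stim, untuned, rng=rng)
```

With 7 response bins, the MI cannot exceed log2(7) ≈ 2.81 bits even though 8 stimuli carry 3 bits of entropy.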
Classification.
To measure how well the signals could be used to decode the stimuli, we used a simple linear decoder based on Fisher's linear discriminant (Fisher, 1936). Linear discriminant analysis (LDA) has been used both as a classifier and as a dimensionality reduction tool. It assumes that the data are multivariate normal with a covariance matrix common to all classes, but it has been shown to be efficient even when these assumptions do not hold (Li et al., 2006). LDA projects the data into a lower dimensional space in which the class means are maximally separated while the within-class variance is minimized. LDA is easy to use and robust when the number of observations (n) is larger than the number of predictors (p). However, when p > n, the covariance matrix estimate can be singular, and estimation errors grow because there are too few observations. For such situations, regularization and shrinkage of the covariance matrix have been proposed (Friedman, 1989), leading to regularized discriminant analysis (Guo et al., 2007). This approach regularizes the covariance matrix toward its diagonal, and further shrinkage can be achieved by dropping features that have little discriminatory power. We used a simple LDA to measure the decoding accuracy of single-electrode FRs and of power at individual frequencies. We used a regularized LDA (rLDA) when all frequencies were used as predictors for single electrodes and when multiple electrodes were pooled together. To use the same number of trials for all stimulus conditions, we randomly dropped any extra trials, and accuracy values were averaged over five such iterations.
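A minimal NumPy sketch of the shrinkage idea: the pooled within-class covariance Σ is replaced by (1 − γ)Σ + γ·diag(Σ), which stays invertible even when predictors outnumber trials. It assumes equal class priors and leaves out the delta-based dropping of predictors; it is an illustration, not MATLAB's fitcdiscr. `ShrinkageLDA` is a hypothetical class name, and the toy data deliberately have 100 features but only 30 training trials.

```python
import numpy as np

class ShrinkageLDA:
    """LDA with the pooled covariance shrunk toward its diagonal (equal priors).

    Sigma_gamma = (1 - gamma) * Sigma + gamma * diag(Sigma); gamma = 0 is plain LDA.
    """

    def __init__(self, gamma=0.0):
        self.gamma = gamma

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        centered = X - self.means_[np.searchsorted(self.classes_, y)]
        sigma = centered.T @ centered / (X.shape[0] - self.classes_.size)  # pooled covariance
        sigma = (1 - self.gamma) * sigma + self.gamma * np.diag(np.diag(sigma))
        self.coef_ = np.linalg.solve(sigma, self.means_.T).T   # Sigma^-1 mu_k, one row per class
        self.intercept_ = -0.5 * np.sum(self.coef_ * self.means_, axis=1)
        return self

    def predict(self, X):
        scores = np.asarray(X, dtype=float) @ self.coef_.T + self.intercept_
        return self.classes_[np.argmax(scores, axis=1)]

rng = np.random.default_rng(0)
n_feat, n_train, n_test = 100, 10, 100
means = np.zeros((3, n_feat))
for c in range(3):
    means[c, c * 10:(c + 1) * 10] = 2.0           # each class differs on 10 features
y_train = np.repeat(np.arange(3), n_train)
y_test = np.repeat(np.arange(3), n_test)
X_train = means[y_train] + rng.normal(size=(y_train.size, n_feat))
X_test = means[y_test] + rng.normal(size=(y_test.size, n_feat))
model = ShrinkageLDA(gamma=0.5).fit(X_train, y_train)
accuracy = np.mean(model.predict(X_test) == y_test)
```

With γ = 0 and p > n, as here, the pooled covariance is rank deficient, which is exactly the situation the regularization addresses.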
The decoder was implemented using the MATLAB “fitcdiscr” function (Statistics Toolbox). The stimuli (8 gratings or 16 images) were given discrete numeric labels. Using threefold cross validation throughout, two-thirds of trials were used to train the decoder. Since the recordings were simultaneous, the same trials were used for training across all scales in the given session. Log power values from each of the LFP, ECoG, and EEG electrodes at each of the frequencies were used to train the decoder to obtain frequency wise decoding accuracy. Data from the remaining trials were used to get the test accuracy, using the MATLAB “predict” function. The frequencywise decoding accuracies were then averaged across electrodes for each of the scales.
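Because all scales were recorded simultaneously, one set of trial indices can drive the cross validation for every signal. The sketch below (Python, hypothetical helper name) equalizes trial counts per stimulus by random dropping and then assigns trials to three folds; stratifying the folds by stimulus is an assumption, since the text specifies only threefold cross validation.

```python
import numpy as np

def equalized_stratified_folds(labels, n_folds=3, rng=None):
    """Drop extra trials so every stimulus has the same count, then assign
    trials to folds with (nearly) equal counts per stimulus in each fold.

    Returns (kept_trial_indices, fold_of_each_kept_trial).
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    kept, folds = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))[:n_keep]
        kept.append(idx)
        folds.append(np.arange(n_keep) % n_folds)
    return np.concatenate(kept), np.concatenate(folds)

rng = np.random.default_rng(2)
labels = np.repeat(np.arange(8), [70, 65, 72, 68, 66, 70, 69, 71])
kept, fold = equalized_stratified_folds(labels, rng=rng)
# The same train/test indices select trials from every simultaneously recorded signal.
for k in range(3):
    train, test = kept[fold != k], kept[fold == k]
```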
To get the single-electrode decoding accuracy for spiking, we used the FR from each electrode with the LDA and averaged accuracies over folds and across electrodes. For the single-electrode accuracy of the other scales, we used power at all frequencies between 0 and 150 Hz (38 unique values, given the 4 Hz frequency resolution) as predictors and trained an rLDA. The model was first trained and the unregularized covariance matrix computed. The level of regularization was then determined iteratively using 20 levels (between 0 and 1) of the parameter gamma, which determines the regularized covariance matrix. For each of these levels, we also used 20 levels of the parameter delta, which determines which predictors can be dropped from the model. Iterations over gamma and delta were performed using MATLAB's “cvshrink” function, which computed an error estimate for each combination. From these, we chose gamma and delta using the Min-Min rule (Guo et al., 2007); if multiple pairs still remained, we chose the one with the smaller gamma. This gamma then determined the regularization of the trained model, and delta was a threshold applied to the predictor weights to drop those below it. The rLDA was then used to predict the test trials, and the process was repeated over 3 folds as before. Single-electrode accuracies were averaged across electrodes for each session.
We also performed the decoding analyses (using rLDA) with electrodes randomly pooled together. The same microelectrodes were used to calculate FR accuracy, LFP accuracy, and their combined accuracy. For each pool size, a maximum of 10 iterations were taken, each with a different subset of randomly chosen nonrepeating electrodes. To get pooled spiking accuracy, the FRs of all electrodes in the pool were used as predictors. Similarly, log power at all frequencies from all electrodes in a pool was concatenated and used to train the regularized decoder. The combined accuracy of spiking and LFP was calculated by using both the FR and power values of the same electrodes to train the decoder. ECoG electrodes were randomly pooled in the same way; because there were few ECoG electrodes, fewer distinct subsets, and hence fewer iterations, were available. Accuracies were averaged over folds and iterations. To get the pooled accuracy for selectively chosen electrodes, a similar process was used, but electrodes were not picked randomly. Instead, they were sorted according to their individual performance, and for each successive pool size, the next best electrode was added to the pool.
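The Min-Min selection with the smaller-gamma tie-break can be written as a small function over a grid of cross-validated errors and retained-predictor counts, such as those produced by the gamma and delta sweep described above. A Python sketch with hypothetical names and a toy 3 × 3 grid; which delta to keep when several remain at the same gamma is not specified in the text, so the first one is taken.

```python
import numpy as np

def min_min_select(err, n_pred, gammas, deltas):
    """Pick (gamma, delta): minimum CV error, then fewest predictors, then smallest gamma.

    err, n_pred: (n_gamma, n_delta) grids; gammas must be in ascending order.
    """
    err = np.asarray(err, dtype=float)
    n_pred = np.asarray(n_pred)
    best = np.isclose(err, err.min())                   # all pairs with minimum error
    fewest = best & (n_pred == n_pred[best].min())      # ... and the fewest predictors
    gi, di = np.argwhere(fewest)[0]                     # row-major: smallest gamma first
    return gammas[gi], deltas[di]

gammas = [0.0, 0.5, 1.0]
deltas = [0.0, 0.1, 0.2]
err = [[0.30, 0.20, 0.25],
       [0.20, 0.20, 0.30],
       [0.20, 0.40, 0.40]]
n_pred = [[38, 30, 20],
          [38, 25, 15],
          [25, 10, 5]]
choice = min_min_select(err, n_pred, gammas, deltas)
```

Here four pairs share the minimum error, two of them keep the fewest predictors (25), and the smaller gamma breaks the tie, giving gamma = 0.5 and delta = 0.1.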
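Both pooling schemes are sketched below in Python with hypothetical helper names. Random pools are drawn as distinct subsets, so the number of iterations is capped by the number of available combinations (for example, only C(5, 4) = 5 pools of 4 exist for 5 ECoG electrodes); ranked pools add electrodes in descending order of single-electrode accuracy.

```python
import math
import numpy as np

def random_pools(n_electrodes, pool_size, max_iter=10, rng=None):
    """Up to max_iter distinct random electrode subsets of a given size."""
    rng = np.random.default_rng() if rng is None else rng
    n_target = min(max_iter, math.comb(n_electrodes, pool_size))
    pools = set()
    while len(pools) < n_target:
        pick = rng.choice(n_electrodes, size=pool_size, replace=False)
        pools.add(tuple(sorted(pick.tolist())))  # order within a pool is irrelevant
    return [list(p) for p in sorted(pools)]

def ranked_pools(single_accuracy):
    """Nested pools: best single electrode first, then the next best, and so on."""
    order = np.argsort(single_accuracy)[::-1]
    return [order[:k].tolist() for k in range(1, len(order) + 1)]

rng = np.random.default_rng(0)
ecog_pools = random_pools(5, 4, rng=rng)         # only 5 distinct pools exist
lfp_pools = random_pools(77, 10, rng=rng)        # capped at 10 iterations
nested = ranked_pools([0.20, 0.50, 0.40])
```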
Image analysis.
We analyzed some of the low-level features in the images. Because gamma oscillations were highly dependent on the color of the stimuli, we focused on color features, in particular hue, saturation, and value (for details, see Shirhatti and Ray, 2018). For this, we first converted the RGB images into HSV space using the MATLAB command rgb2hsv. In HSV space, the hue values (H) represent colors as angles on a color wheel. Because hue is a circular variable, we used the cosine and sine of the hue angle to obtain linear measures. The saturation (S) represents the purity of the color (1 for a pure hue and 0 for grayscale). The value (V) represents the intensity (1 represents the highest intensity achievable by the monitor for a particular hue).
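The HSV conversion and the circular encoding of hue can be sketched with Python's standard `colorsys` module, used here as a stand-in for MATLAB's rgb2hsv (both return hue as a fraction of a full turn). Using cos(H) and sin(H) keeps hues near 0 and near 1, which are both red, close together. `hsv_features` is a hypothetical helper.

```python
import colorsys
import math

def hsv_features(rgb_pixels):
    """Map RGB pixels with components in [0, 1] to (cos H, sin H, S, V) tuples."""
    features = []
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)   # h is a fraction of a full turn
        angle = 2 * math.pi * h
        features.append((math.cos(angle), math.sin(angle), s, v))
    return features

feats = hsv_features([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.5)])
# red -> (1.0, 0.0, 1.0, 1.0); green -> hue 1/3 of a turn; gray -> saturation 0
```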
We first obtained the spatial frequency spectrum of these features using fft2 in MATLAB to get a 2D Fourier transform, and then performed radial averaging to get power spectral density. To get the distribution of the features in the images (see Fig. 15D, black curve), we used the cos(H), sin(H), S, and V values of all the pixels. To get the same distributions for LFP and ECoG, we extracted the image pixels falling in the RFs of LFP and ECoG electrodes and used those HSV values.
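A NumPy sketch of the radial averaging. Frequencies are expressed in cycles/pixel so that non-square images such as 1280 × 720 are binned isotropically; the number of bins, the removal of the mean before the FFT, and the helper name are assumptions. A vertical grating at 0.125 cycles/pixel should produce a peak in the corresponding radial bin.

```python
import numpy as np

def radial_power_spectrum(channel, n_bins=32):
    """Radially averaged 2D power spectrum of one feature map (e.g., S or V).

    Returns bin centers in cycles/pixel (0 to 0.5) and mean power per bin.
    """
    channel = np.asarray(channel, dtype=float)
    power = np.abs(np.fft.fftshift(np.fft.fft2(channel - channel.mean()))) ** 2
    rows, cols = channel.shape
    fy = np.fft.fftshift(np.fft.fftfreq(rows))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(cols))[None, :]
    radius = np.hypot(fx, fy).ravel()            # radial frequency of each coefficient
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    idx = np.digitize(radius, edges) - 1
    keep = idx < n_bins                          # drop corner frequencies beyond 0.5
    totals = np.bincount(idx[keep], weights=power.ravel()[keep], minlength=n_bins)
    counts = np.bincount(idx[keep], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, totals / np.maximum(counts, 1)

x = np.arange(64)
grating = np.tile(np.cos(2 * np.pi * 0.125 * x), (64, 1))   # 0.125 cycles/pixel along x
centers, spectrum = radial_power_spectrum(grating, n_bins=16)
```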
We calculated the mean features in the RF as follows. The mean S and the mean V were the average of the S and V values of all the pixels in that RF. The mean hue was obtained as a vector sum of the pixel hue angles, weighted by their saturation. We also calculated the overall average of electrodes by averaging in the same way, using the mean values obtained for each individual electrode.
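The saturation-weighted vector sum for hue can be sketched as below (NumPy; hypothetical helper name; hue assumed to be on a 0 to 1 scale). Unlike an arithmetic mean, it treats hue as circular: hues of 0.9 and 0.2 average to 0.05, not 0.55, and an unsaturated pixel contributes nothing to the mean hue.

```python
import numpy as np

def rf_mean_features(hue, sat, val):
    """Mean hue (saturation-weighted vector sum), mean S, and mean V of RF pixels.

    hue is a fraction of a full turn in [0, 1), as returned by rgb2hsv or colorsys.
    """
    hue, sat, val = (np.asarray(a, dtype=float).ravel() for a in (hue, sat, val))
    resultant = np.sum(sat * np.exp(1j * 2 * np.pi * hue))   # weighted vector sum
    mean_hue = (np.angle(resultant) / (2 * np.pi)) % 1.0
    return mean_hue, sat.mean(), val.mean()

h, s, v = rf_mean_features([0.9, 0.2], [1.0, 1.0], [0.2, 0.6])
h2, s2, _ = rf_mean_features([0.0, 0.25], [0.0, 1.0], [1.0, 1.0])
```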
To get the statistic values at different distances, we first identified the size and arrangement of the ECoG RFs. We then picked similarly arranged patches at different distances from the LFP grid. For each distance between the LFP grid and the ECoG cluster, we picked 5 random clusters. The average HSV of all LFP electrodes and all ECoG electrodes was calculated as described above, for all images (80 pairwise values), and their correlation was computed. Finally, the correlation values were averaged over the (5 randomly chosen) clusters.
ECoG modeling.
We modeled the ECoG signal as an average of LFP signals, as explained in a previous report (Dubey and Ray, 2019a). In short, we chose LFP electrodes in a square or rectangular grid (with the maximum difference between the length and breadth set to 1) and averaged their responses for each trial to get a “simulated ECoG” signal from the LFP signals.
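The averaging procedure can be sketched in NumPy as follows, with hypothetical helper names. The sketch assumes LFPs are stored on the 9 × 9 array layout with a boolean mask marking the selected electrodes, and enumerates every rectangular block whose side lengths differ by at most one electrode.

```python
import numpy as np

def square_ish_blocks(n_rows, n_cols, max_side_diff=1):
    """Yield (top, left, height, width) for every block with |height - width| <= max_side_diff."""
    for h in range(1, n_rows + 1):
        for w in range(1, n_cols + 1):
            if abs(h - w) > max_side_diff:
                continue
            for top in range(n_rows - h + 1):
                for left in range(n_cols - w + 1):
                    yield top, left, h, w

def simulated_ecog(lfp, valid, top, left, h, w):
    """Trial-wise mean of the selected LFP electrodes inside one block.

    lfp: (rows, cols, n_trials, n_samples); valid: (rows, cols) boolean mask.
    Returns (n_trials, n_samples); the block must contain at least one valid electrode.
    """
    block = lfp[top:top + h, left:left + w]
    mask = valid[top:top + h, left:left + w]
    return block[mask].mean(axis=0)

# Toy 9 x 9 array in which electrode (r, c) carries the constant value 10*r + c.
values = 10 * np.arange(9)[:, None] + np.arange(9)[None, :]
lfp = np.broadcast_to(values[:, :, None, None], (9, 9, 2, 5)).astype(float)
valid = np.ones((9, 9), dtype=bool)
valid[0, 0] = False                              # e.g., an unselected electrode
ecog = simulated_ecog(lfp, valid, 0, 0, 2, 2)    # mean of electrodes (0,1), (1,0), (1,1)
```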
Statistical analysis.
The frequencywise MI and decoding accuracy were compared against chance levels using a one-sample t test. The comparison of overall performance among pairs of scales was done using a two-sample t test.
Results
Spectral responses to oriented gratings and natural images
Using a customized hybrid array, we simultaneously recorded spiking activity, LFP, and ECoG responses in V1 from 2 awake monkeys while they fixated on full-screen oriented gratings or natural images. For 1 monkey, we also recorded occipital EEG responses to gratings. Figure 1A shows the trial-averaged power spectra in the 250–500 ms window after stimulus onset (colored traces) or during baseline (−250 to 0 ms; gray trace) when full-screen gratings at different orientations were presented. Figure 1B shows the change in power from baseline, obtained by subtracting the gray trace from the colored traces (because the spectra are on a log scale, this is the log of the ratio of stimulus to baseline power, in decibels). The LFP and ECoG power changes were averaged across electrodes from both monkeys. We observed gamma oscillations (30–80 Hz) across all scales (LFP, ECoG, and EEG) in V1. We have previously compared the tuning preferences and orientation selectivity of spikes, LFP, and EEG (Murty et al., 2018), as well as LFP versus ECoG (Dubey and Ray, 2019b), in the same 2 monkeys. As shown previously, spiking activity had high orientation selectivity, but the preferred orientation varied across the microelectrode array (Fig. 2) (see also Murty et al., 2018, their Fig. 8). On the other hand, LFP and ECoG had similar orientation preferences and selectivity across the array (Fig. 2C) (see also Dubey and Ray, 2019b, their Fig. 5). EEG power was lower in magnitude and had lower orientation selectivity but a similar orientation preference (Fig. 2) (see also Murty et al., 2018, their Fig. 2).
Figure 1. Frequency spectra for full-screen oriented gratings. A, Power spectral density of responses (calculated between 250 and 500 ms) to full-screen gratings of 4 cpd at 8 equidistant orientations between 0° and 157.5°, averaged across electrodes for LFP (M1: n = 77, M2: n = 31), ECoG (M1: n = 5, M2: n = 4), and EEG (M1: n = 10). Gray traces represent the baseline power (−250 to 0 ms). B, Change in power spectra for all 8 stimuli, averaged across electrodes from both monkeys (108 LFP, 9 ECoG, 10 EEG).
Figure 2. Responses to oriented gratings. A, Raster plots showing spiking responses to full-screen oriented gratings from two example electrodes from the 2 monkeys. The stimulus came on at 0 s and stayed on for 500 ms. B, The trial-averaged FR showing the tuning of the electrodes in A. C, Histograms of the preferred orientation and orientation selectivity of the different scales. For LFP, ECoG, and EEG, mean power in the range 45–70 Hz was used. For spiking electrodes, the FR values were used.
Full-screen colored natural images typically elicited a broadband increase in power, accompanied by a peak at gamma frequencies for some images (Fig. 3A), especially stimuli with reddish colors inside the RFs, which have been shown to induce very strong gamma (Shirhatti and Ray, 2018). Figure 3B shows one such image stimulus, along with the RFs of LFPs and ECoGs (for details, see Materials and Methods). As shown previously, ECoG RFs were local, only ∼3 times the size of LFP RFs (Dubey and Ray, 2019a). The other stimuli used are shown in Figure 4. Figure 3C shows the change in power from baseline averaged across all electrodes and image sets of both monkeys. Interestingly, the ECoG power change was greater than the LFP power change across frequencies. The gamma peak was smoothed out by averaging over image stimuli, many of which did not produce gamma. Gamma peaks were reduced when the corresponding grayscale images were presented (Fig. 3D) but remained high for scrambled images (Fig. 3E). Regardless of the presence of gamma oscillations, in all cases, the ECoG power change from baseline was higher than that of the LFP.
Figure 3. Frequency spectra for full-screen natural images. A, Power spectral density of responses to 16 full-screen images from one category (Texture) for the 2 monkeys, averaged across electrodes for LFP (M1: n = 77, M2: n = 31) and ECoG (M1: n = 5; M2: n = 4). Responses are taken 250–500 ms after onset. Gray traces represent the baseline power (−250 to 0 ms). Each colored trace represents one stimulus. B, A part of one stimulus image with the RF centers of LFP electrodes (blue dots) and RFs of ECoG electrodes (colored circles). The border is in the same color as the corresponding trace in A. White dot represents the center of the screen where the monkey was fixating. C, The change in power from baseline averaged across all LFP electrodes and ECoGs of both monkeys across all image sets (16 images × 5 sets; 108 × 5 LFP, 9 × 5 ECoG). D, Same as in C, using grayscale versions of the same images. E, Same as in C, for scrambled versions of the colored images in the Fauna and Texture categories (16 images × 2 sets; 108 × 2 LFP, 9 × 2 ECoG).
Figure 4. Stimulus images. A, Full-screen grating at one representative orientation. B, Representative natural image stimulus, with its grayscale and scrambled versions. C, Images used in each of the image classes: Fauna, Flora, Texture, Landscape, and Faces. Colored borders around the Texture images correspond to the response traces in Figure 3.
ECoG, LFP, and EEG response variations to stimuli
To understand the overall effect of the stimulus on the responses, we quantified the variability in power across different stimuli using the signal coefficient of variation (sCV), and the variability in power across trials of the same stimulus using the noise coefficient of variation (nCV). High information and decoding potential require a high sCV and a low nCV. Figure 5 shows the sCV and nCV for all stimulus sets, averaged across electrodes and monkeys. We observed the highest sCV in the gamma range for both LFP and ECoG, in all stimulus sets. Additionally, images produced a smaller sCV peak at lower frequencies (0–12 Hz). Gratings had two sCV peaks within the gamma range, centered at 36 and 56 Hz (discussed later). Occasionally, an additional peak was observed at ∼100 Hz, especially for textures, but this was simply a harmonic of the gamma peak. EEG responses to gratings showed a higher sCV between 50 and 70 Hz, although it was much lower than that of either of the other two scales.
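As a rough illustration of these two quantities, the sketch below assumes single-trial power stored as an array of shape (stimuli, trials, frequencies); takes the sCV as the across-stimulus standard deviation of trial-averaged power divided by its mean; and takes the nCV as the across-trial coefficient of variation for each stimulus, averaged over stimuli. These definitions, the function name `signal_and_noise_cv`, and the simulated data are assumptions for illustration; the exact definitions are given in Materials and Methods.

```python
import numpy as np

def signal_and_noise_cv(power):
    """Return (sCV, nCV) per frequency.

    power: (n_stimuli, n_trials, n_freqs) single-trial power estimates.
    """
    power = np.asarray(power, dtype=float)
    stim_mean = power.mean(axis=1)            # trial-averaged power per stimulus
    # sCV: spread of the mean response across stimuli, relative to its mean.
    scv = stim_mean.std(axis=0) / stim_mean.mean(axis=0)
    # nCV: spread across trials of one stimulus, relative to that stimulus's
    # mean, then averaged over stimuli.
    ncv = (power.std(axis=1) / stim_mean).mean(axis=0)
    return scv, ncv

# Hypothetical data: 16 images x 30 trials x 151 frequencies (0-150 Hz).
rng = np.random.default_rng(0)
n_stim, n_trials, n_freqs = 16, 30, 151
gain = np.ones((n_stim, 1, n_freqs))
gain[:, :, 30:80] = rng.uniform(1.0, 3.0, size=(n_stim, 1, 1))  # stimulus-dependent "gamma"
power = gain * rng.exponential(1.0, size=(n_stim, n_trials, n_freqs))
scv, ncv = signal_and_noise_cv(power)
```

In the simulated data only the 30–79 Hz bins carry a stimulus-dependent gain, so the sCV rises there, while the nCV stays near 1, the value expected when single-trial power is exponentially distributed.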
CV versus frequency. CVs for LFP, ECoG, and EEG responses, for 5 colored image sets and for 1 set of oriented gratings. Solid lines indicate the sCV. Dotted lines indicate the nCV. The values are averaged across electrodes from both monkeys (108 LFP, 9 ECoG, 10 EEG); shaded regions indicate SEM.
Importantly, for all image sets, the sCV for ECoGs was much higher than for LFPs, especially in the gamma range. This means that the image stimuli caused much greater interstimulus response variation in ECoG than in LFP. For gratings, the sCVs of LFPs and ECoGs were comparable. The nCVs of LFP, ECoG, and EEG settled around similar values across categories and frequencies; this reflects the variability of the spectral estimator itself rather than of the biological signal (Jarvis and Mitra, 2001; Chandran et al., 2018).
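Why the nCV settles near a fixed value can be illustrated with a simulation. Assuming a multitaper estimate with K orthonormal tapers applied to a Gaussian process with a locally flat spectrum, each single-trial power estimate behaves approximately like a scaled χ² variable with 2K degrees of freedom, so its coefficient of variation is about 1/√K, largely independent of the true power level. The sketch below uses sine tapers as a NumPy-only stand-in for the DPSS tapers of standard multitaper toolboxes; the taper counts, window length, and function names are hypothetical.

```python
import numpy as np

def sine_tapers(n, k):
    """k orthonormal sine tapers of length n (NumPy-only stand-in for DPSS)."""
    t = np.arange(1, n + 1)
    return np.array([np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * j * t / (n + 1))
                     for j in range(1, k + 1)])

def multitaper_power(x, tapers):
    """Average of the k eigenspectra |FFT(taper * x)|^2, one-sided bins."""
    return (np.abs(np.fft.rfft(tapers * x, axis=-1)) ** 2).mean(axis=0)

rng = np.random.default_rng(1)
n_samples, n_trials = 500, 2000      # hypothetical: e.g., 250 ms at 2 kHz
cvs = {}
for k in (1, 3, 5):
    tapers = sine_tapers(n_samples, k)
    # The process is identical on every trial, so all trial-to-trial
    # variability below comes from the estimator itself.
    power = np.array([multitaper_power(rng.standard_normal(n_samples), tapers)
                      for _ in range(n_trials)])
    interior = power[:, 10:-10]      # skip bins near DC and Nyquist
    cvs[k] = float((interior.std(axis=0) / interior.mean(axis=0)).mean())
    print(f"K={k}: CV={cvs[k]:.2f}, 1/sqrt(K)={1 / np.sqrt(k):.2f}")
```

With K = 1, 3, and 5 tapers the simulated CV should sit near 1, 0.58, and 0.45, respectively, so a noise CV computed from such estimates mostly reflects the taper count rather than the neural signal.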
Information and decoding of grating orientation across scales
Using single-trial power estimates for the response period, we calculated the MI (for details, see Materials and Methods) between the grating stimulus and the log power at all frequencies between 0 and 150 Hz, for each electrode and scale (Fig. 6A shows one representative electrode for each scale). Information averaged across electrodes from both monkeys (Fig. 6C) showed that the most informative frequencies were between 25 and 80 Hz. Further, this information appeared to have two peaks (one between 25 and 45 Hz, the other between 45 and 80 Hz) for LFP and ECoG. Averaged over electrodes, LFPs (maximum of ∼0.24 bits) and ECoGs (maximum of ∼0.15 bits) had much higher information than EEG (maximum of ∼0.03 bits). We also used a linear decoder to decode the orientation of the stimulus (Fig. 6B,D). We found the same trend as with MI, with high decoding accuracy between 25 and 80 Hz. LFPs (maximum of ∼0.2) and ECoGs (maximum of ∼0.19) had similar values, and EEGs (maximum of ∼0.15) were the least accurate. We also computed MI and decoding accuracy using FRs (black marker on the y axis) and using FPR, a metric that can be computed in the time domain without spectral analysis (simply the difference between the maximum and minimum of the signal between 250 and 500 ms; for details, see Materials and Methods). FRs showed high information (0.19 bits) and decoding accuracy (0.20), comparable with the maximum values obtained in the gamma range (MI: 0.24 bits, accuracy: 0.21 for LFP; MI: 0.15 bits, accuracy: 0.19 for ECoG). The values for LFP and ECoG FPR (MI: 0.03 bits, accuracy: 0.15 for both) were lower, comparable with the values obtained using power at low frequencies (the maxima between 0 and 12 Hz were as follows: MI: 0.03 bits, accuracy: 0.14 for both LFP and ECoG). This is not surprising: the absolute power at low frequencies is much higher, so the unfiltered raw signal is dominated by low frequencies.
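A plug-in estimate of the MI between a discrete stimulus label and a continuous feature (such as log power at one frequency) can be sketched by discretizing the feature into equally populated bins. This is an illustration only: the number of bins, the quantile binning, and the absence of any bias correction are assumptions, and the authors' estimator is described in Materials and Methods. The `fpr` helper (a hypothetical name) implements the max-minus-min definition given above.

```python
import numpy as np

def binned_mi(labels, values, n_bins=4):
    """Plug-in MI (bits) between discrete stimulus labels and a scalar feature
    discretized into n_bins equally populated bins (no bias correction)."""
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(values, edges)                  # bin index 0..n_bins-1
    joint = np.array([np.bincount(bins[labels == s], minlength=n_bins)
                      for s in np.unique(labels)], dtype=float)
    joint /= joint.sum()                               # p(stimulus, response bin)
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / outer[nz])))

def fpr(trace):
    """Maximum minus minimum of the raw trace within the analysis window."""
    return float(np.ptp(trace))

# Hypothetical use: 8 orientations x 50 trials, log power weakly tuned to orientation.
rng = np.random.default_rng(2)
labels = np.repeat(np.arange(8), 50)
log_power = 0.5 * np.cos(np.pi * labels / 4) + rng.standard_normal(labels.size)
print(binned_mi(labels, log_power))
```

Plug-in MI is biased upward when trials are few relative to the number of stimulus-by-bin cells, so in practice a bias correction or a shuffled-label baseline is needed before comparing values across signals.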
Frequency dependence of orientation information. Frequencywise MI and accuracy, for full screen grating stimuli of 8 orientations and 4 cpd spatial frequency. A, MI values obtained by using power from typical LFP, ECoG, and EEG electrodes. Markers on the y axis are MI values obtained by using the FPR (magenta, blue, green for the three signals) and FR (black). B, Same as in A, but with classification accuracies at each frequency. Dotted line indicates chance accuracy. C, MI at all frequencies for LFP, ECoG, and EEGs, averaged across electrodes pooled for both monkeys (108 LFP, 9 ECoG, 10 EEG). Shaded regions represent SEM. The markers are MI values obtained by using the FPR and FR averaged across electrodes for both monkeys. D, Same as in C, but with classification accuracies. Dotted line indicates chance performance. C, D, Bottom, Horizontal colored patches represent regions where values were significantly different from chance value (0 for MI, 0.125 for accuracy), using a one-sided t test (p < 0.05). E, Information provided jointly by using power at two frequencies, for the typical electrodes in A. F, Same as in E, but with classification accuracies for pairs of frequencies. G, H, Population results for MI and classification over pairs of frequencies, averaged across electrodes from both monkeys.
We also tested whether different frequencies provided independent information. For this, we used power values at two frequencies from each electrode to compute the joint MI about the stimulus, and also used them as features for the linear decoder. This allowed us to see which pairs of frequencies were most informative and complementary. The information increased when frequencies at ∼40 Hz were paired with those at ∼56 Hz (Fig. 6E,G). Using a pair of frequencies, the maximum information was ∼0.36 bits for LFPs, ∼0.26 bits for ECoGs, and ∼0.05 bits for EEG. Similar frequency pairs also yielded the best decoding, with maximum accuracies of ∼0.27 for LFPs, ∼0.24 for ECoGs, and ∼0.16 for EEGs (Fig. 6F,H). Overall, similar frequencies contributed the most information about stimulus orientation in LFP, ECoG, and EEG.
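The joint MI for a pair of frequencies can be sketched by binning each frequency's power separately and treating the pair of bin indices as a single response symbol. The toy data below are hypothetical, not from the recordings: they are built so that each of two features carries 1 bit about 4 stimuli while together they carry 2 bits, the kind of complementarity that makes a frequency pair more informative than either frequency alone. Function names and binning choices are assumptions.

```python
import numpy as np

def discretize(values, n_bins):
    """Equally populated bins; returns bin indices 0..n_bins-1."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

def mi_discrete(labels, responses):
    """Plug-in MI (bits) between two discrete sequences."""
    _, s = np.unique(labels, return_inverse=True)
    _, r = np.unique(responses, return_inverse=True)
    joint = np.zeros((s.max() + 1, r.max() + 1))
    np.add.at(joint, (s, r), 1)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / outer[nz])))

def joint_mi(labels, power_f1, power_f2, n_bins=4):
    """MI carried jointly by power at two frequencies (bin pair = one symbol)."""
    b1 = discretize(power_f1, n_bins)
    b2 = discretize(power_f2, n_bins)
    return mi_discrete(labels, b1 * n_bins + b2)

# Hypothetical toy data: 4 stimuli x 50 trials.
rng = np.random.default_rng(3)
labels = np.repeat(np.arange(4), 50)
f1 = (labels % 2) + 0.1 * rng.random(labels.size)    # separates stimuli {0,2} vs {1,3}
f2 = (labels // 2) + 0.1 * rng.random(labels.size)   # separates stimuli {0,1} vs {2,3}
single = [mi_discrete(labels, discretize(f, 2)) for f in (f1, f2)]
pair = joint_mi(labels, f1, f2, n_bins=2)
print(single, pair)
```

Joint binning multiplies the number of response cells, so the upward bias of plug-in MI grows quickly with the number of bins; with real data, fewer bins or a bias correction would be prudent.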
Information about natural images in different recording scales
Unlike for gratings, MI for natural images was significantly higher for ECoG than LFP at almost all frequencies (Fig. 7A). As expected from the sCV plots (Fig. 5), MI for ECoG had large peaks in the gamma frequencies (30–80 Hz), almost twice the MI for LFP. Power at lower frequencies (0–12 Hz) was also informative for both signals. MI of FR (0.22 bits averaged over categories) and FPR (0.17 bits for LFP, 0.25 bits for ECoG) were lower than the maximum value in the gamma band of ECoG (0.43 bits averaged over categories). Joint analysis using pairs of frequencies revealed that gamma frequencies combined with other frequencies, mainly the lower ones (0–12 Hz), provided more information (Fig. 7B), suggesting that different frequencies carried independent information about the natural scenes (Belitski et al., 2008). Combining higher frequencies (>80 Hz) with gamma and the lower frequencies also increased information somewhat.
Frequency dependence of information for image identity (250–500 ms). A, MI versus frequency, averaged across LFP and ECoG electrodes from both monkeys (108 LFP, 9 ECoG), for the five colored image sets. Shaded error bars indicate SEM. Horizontal colored patches represent regions where values are significantly different from chance (one-sided t test, p < 0.05). Markers on the y axis are MI values obtained by using the FPR (magenta, blue for LFP and ECoG) and FR (black). B, Joint MI using pairs of frequencies, averaged across all LFP and ECoG electrodes. C, D, Same as in A, B, but for scrambled versions of Fauna and Texture images.
We also analyzed responses to colored scrambled images (Fauna, Textures) and observed an information profile similar to that for colored images, with the highest information when gamma range frequencies were combined with lower frequencies (Fig. 7C,D). As before, MI of FPR (0.10 bits for LFP, 0.16 bits for ECoG) and FRs (0.15 bits) were lower than that obtained using power in the gamma range of ECoG (0.44 bits). The decoding analysis for these stimuli also revealed higher performance of ECoGs, especially in the gamma frequencies (Fig. 8). For grayscale versions of these images, the gamma peaks were substantially reduced (Fig. 9), with less information in ECoG (0.23 bits), comparable with the MI values for FRs (0.18 bits) and FPR (0.19 bits for ECoG).
Accuracy versus frequency for images (250–500 ms). Frequency dependence of decoding accuracy for LFP and ECoG electrodes across different image classes. A, Frequency-wise accuracy, averaged across all electrodes of both monkeys. Horizontal patches represent regions where values are significantly different from chance (one-sided t test, p < 0.05). B, Joint accuracy using frequency pairs. C, D, Same as in A, B, but for scrambled images from 2 sets.
MI versus frequency for grayscale images (250–500 ms). A, MI versus frequency, averaged across LFP and ECoG electrodes from 2 monkeys (108 LFP, 9 ECoG), for the five grayscale image sets. Shaded error bars indicate SEM. Horizontal colored patches represent regions where values are significantly different from chance (one-sided t test, p < 0.05). Markers on the y axis are MI values obtained by using the FPR (magenta, blue for LFP and ECoG) and FR (black). B, Joint MI using pairs of frequencies, averaged across all LFP and ECoG electrodes.
We also calculated the MI for images using responses in the early period (0–250 ms) just after stimulus onset (Fig. 10). Stimulus onset-related transients, which have power mainly at low frequencies, are typically strong during this period, whereas the gamma rhythm is weaker (Ray and Maunsell, 2010). The event-related potential is also more salient in the early period. This was reflected in the MI results, with a peak now at low frequencies (which consequently also yielded high FPR MI) and a less salient peak in the gamma range. Importantly, even in this period, MI for ECoGs was generally higher than for LFPs. MIs obtained using FRs remained lower than the maximum values obtained using FPR and power for both LFP and ECoG. Overall, across image types (Colored, Grayscale, Scrambled) and in both the early and late periods, we observed higher information in ECoGs than LFPs at most frequencies.
Frequency dependence of information for image identity (0–250 ms). Same as in Figure 7, but using responses in early period (0–250 ms).
Single-electrode decoding performance across scales
Because different frequencies supply nonredundant information, we used power at all frequencies between 0 and 150 Hz as features for a regularized LDA decoder (for details, see Materials and Methods). For spiking activity, we used the FR of each selected electrode in that session as a feature for the decoder. Figure 11A shows the accuracy for each scale, averaged over electrodes and sessions, for both monkeys.
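A regularized LDA decoder of this kind can be sketched with NumPy alone. The sketch assumes shrinkage of the pooled within-class covariance toward a scaled identity with a fixed, hypothetical shrinkage of 0.1, stratified 5-fold cross-validation, and accuracy as the performance measure; the regularization actually used is described in Materials and Methods and may differ. Function names and the simulated data are illustrative.

```python
import numpy as np

def fit_shrinkage_lda(X, y, shrinkage=0.1):
    """LDA with the pooled within-class covariance shrunk toward a scaled identity."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    resid = X - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(X) - len(classes))
    p = X.shape[1]
    cov = (1 - shrinkage) * cov + shrinkage * (np.trace(cov) / p) * np.eye(p)
    W = np.linalg.solve(cov, means.T)                    # (p, n_classes)
    priors = np.array([np.mean(y == c) for c in classes])
    b = -0.5 * np.sum(means.T * W, axis=0) + np.log(priors)
    return classes, W, b

def predict(model, X):
    classes, W, b = model
    return classes[np.argmax(X @ W + b, axis=1)]        # largest linear discriminant

def cv_accuracy(X, y, n_folds=5, shrinkage=0.1, seed=0):
    """Stratified k-fold cross-validated accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        folds[idx] = np.arange(idx.size) % n_folds
    correct = 0
    for k in range(n_folds):
        test = folds == k
        model = fit_shrinkage_lda(X[~test], y[~test], shrinkage)
        correct += int(np.sum(predict(model, X[test]) == y[test]))
    return correct / len(y)

# Hypothetical data: 16 images x 30 trials, log power at 151 frequencies,
# with image-specific structure only in a 50-bin "gamma" band.
rng = np.random.default_rng(4)
n_images, n_trials, n_freqs = 16, 30, 151
y = np.repeat(np.arange(n_images), n_trials)
class_means = np.zeros((n_images, n_freqs))
class_means[:, 30:80] = rng.normal(0.0, 0.7, size=(n_images, 50))
X = class_means[y] + rng.standard_normal((y.size, n_freqs))
print(cv_accuracy(X, y), "chance:", 1 / n_images)
```

With 16 images, chance accuracy is 1/16 = 0.0625, the image chance level marked in Figure 11; on features that carry no stimulus information, the sketch should hover near that value.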
Single-electrode decoding accuracy across scales. A, Decoding accuracy across FR, LFP, ECoG, and EEG for images, grayscale images, scrambled images, and gratings. Power at all frequencies and FR in the response period (250–500 ms) were used for decoding. Individual markers represent the values averaged across electrodes for each session, with SEM. Bar plots represent averages across sessions. All image bar plots are averaged across 2 sessions (1 per monkey), except the bar plots for colored Fauna and Texture images, which have 2 sessions per monkey. The grating bar plot averages results for sessions with 4 and 2 cpd spatial frequency (total 5 sessions). Dashed lines indicate chance levels (0.0625 for images, 0.125 for gratings). A two-sample unequal variance t test was used to compare each pair of scales. *p < 0.05, **p < 0.005. B, Same as in A, but using responses in the early (0–250 ms) period.
ECoGs showed the highest decoding accuracy for all colored image sets. For the grayscale versions, the overall accuracies of LFP and ECoG were lower than for colored images, but ECoGs still performed better than the other scales. For scrambled colored images, the accuracy of ECoGs was again the highest, with values comparable with those for colored images. Decoding accuracy of FR did not vary much across image types. These observations point toward two important aspects: (1) better identification of images at the ECoG scale than at the LFP scale; and (2) color being a major contributor toward image identification in area V1, perhaps more than the image contents. On the other hand, for grating stimuli, the accuracies of LFPs and ECoGs were not significantly different (p = 0.17). The accuracy of EEGs was lowest and very close to chance. Note that, although FRs were tuned to orientation, we obtained a higher accuracy using power values because all frequencies were used. Individual frequencies did not outperform individual FRs, as can be seen from the values in Figure 6 (maximum accuracy at any frequency was ∼0.2).
Figure 11B shows the corresponding plots when the early (0–250 ms) response was used. As seen from Figure 10, this period is dominated by the lower frequencies, which led to high overall LFP and ECoG accuracies. These results are consistent with a recent study that showed high image decoding accuracy using early responses in a free viewing task (Lewis et al., 2016), as well as with a study that performed category classification using ECoG signals from humans (Liu et al., 2009). Figure 12 shows the decoding accuracies for late (A) and early (B) periods using FPR. Accuracies were higher when using all frequencies (Fig. 11) than when using the broadband voltage signal (Fig. 12). Further, LFPs and ECoGs generally performed equally well using FPR, consistent with the small difference between LFP and ECoG in MI and decoding accuracy obtained using power at low frequencies.
Single-electrode decoding accuracy across scales using FPR. A, Decoding accuracy across LFP, ECoG, and EEG for images, grayscale images, scrambled images, and gratings. Accuracy was calculated using the FPR values for each electrode in 250–500 ms period. Dashed lines indicate chance levels (0.0625 for images, 0.125 for gratings). Two-sample unequal variance t test was used to compare each pair of scales. *p < 0.05, **p < 0.005. B, Same as in A, but using FPR in early (0–250 ms) period.
Increase in performance with combining channels and scales
We next investigated the improvement in decoding performance obtained by combining more electrodes. For this, we used a subset of LFP electrodes that also had good spiking activity. For each pool size, we randomly chose electrodes over 10 iterations and used their FR, power, or both as features for the regularized decoder (for details, see Materials and Methods). We pooled ECoGs in the same way.
Figure 13 shows the pooled decoding performance for all stimulus sets for both monkeys. As expected, adding more channels increased performance. ECoG performance increased very steeply, such that a few ECoG electrodes outperformed a much larger number of microelectrodes for image decoding. LFP and spikes contributed nonredundant information, and their joint decoding (Fig. 13, black curves) was better than either of the two, but lower than ECoG for image sets. For gratings, ECoG accuracy did not increase to similar levels: it was higher than that of a randomly chosen set of LFPs of the same pool size, but lower than the combined performance of FR and LFP.
Increase in decoding accuracy with pooling electrodes. Decoding accuracy as a function of number of electrodes, for colored image sets and gratings. Accuracy increases with pool size for all modalities. Black curve represents the combined decoding performance of FR and LFP. The same electrodes were used for FR, LFP, and combined activity at each pool size. Error bars indicate SEM over iterations of choosing each pool size.
The decoding performance did not keep increasing at the same rate, indicating that only a few channels contributed most of the classification performance. To test this further, we performed similar pooled decoding as above but added electrodes in a ranked manner. From the single-electrode performance (Fig. 11A), we ranked the individual electrodes for each session in order of their decoding performance, successively added them to the pool, and calculated the combined accuracy. We found that a small number of spiking or LFP electrodes were responsible for most of the decoding (Fig. 14). In image decoding, ECoGs outperformed the best ranked LFP and spiking electrodes, and ECoG pooled performance was close to (or higher than) that obtained by using all LFP electrodes. For gratings, however, the highest ranked microelectrodes outperformed the ECoGs.
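Random and ranked pooling can be contrasted in a short sketch. To keep it self-contained, it uses a nearest-class-mean decoder as a hypothetical stand-in for the regularized LDA, and simulated electrodes whose tuning strength varies; function names, pool sizes, and data are all assumptions. Features from the electrodes in a pool are concatenated, so each electrode adds new features rather than being averaged, and, as in the text, electrodes are ranked by their own single-electrode accuracy.

```python
import numpy as np

def nearest_centroid_accuracy(X, y, n_folds=5, seed=0):
    """Cross-validated accuracy of a nearest-class-mean decoder."""
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    correct = 0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        classes = np.unique(y[train])
        centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in classes])
        dist = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        correct += int(np.sum(classes[dist.argmin(axis=1)] == y[test]))
    return correct / len(y)

def pooled(features, electrodes):
    """Concatenate per-electrode features: (n_elec, n_trials, F) -> (n_trials, n * F)."""
    sub = features[list(electrodes)]
    return np.moveaxis(sub, 0, 1).reshape(sub.shape[1], -1)

def random_pool_curve(features, y, pool_sizes, n_iter=10, seed=0):
    """Mean accuracy over n_iter random draws of electrodes for each pool size."""
    rng = np.random.default_rng(seed)
    curve = []
    for size in pool_sizes:
        accs = [nearest_centroid_accuracy(
                    pooled(features, rng.choice(len(features), size, replace=False)), y)
                for _ in range(n_iter)]
        curve.append(float(np.mean(accs)))
    return curve

def ranked_pool_curve(features, y, pool_sizes):
    """Accuracy when electrodes are added in order of single-electrode accuracy."""
    single = [nearest_centroid_accuracy(pooled(features, [e]), y)
              for e in range(len(features))]
    order = np.argsort(single)[::-1]               # best electrode first
    return [float(nearest_centroid_accuracy(pooled(features, order[:size]), y))
            for size in pool_sizes]

# Hypothetical data: 20 electrodes, one feature each (e.g., FR), 16 images x 30 trials.
rng = np.random.default_rng(7)
n_elec, n_images, n_trials = 20, 16, 30
y = np.repeat(np.arange(n_images), n_trials)
gain = rng.uniform(0.3, 1.5, size=(n_elec, 1, 1))       # tuning strength varies
tuning = rng.standard_normal((n_elec, n_images, 1))
features = gain * tuning[:, y, :] + rng.standard_normal((n_elec, y.size, 1))
pool_sizes = [1, 2, 4, 8, 16, 20]
rand_curve = random_pool_curve(features, y, pool_sizes)
ranked_curve = ranked_pool_curve(features, y, pool_sizes)
print(rand_curve, ranked_curve)
```

Because the ranked curve starts from the best single electrode, its first point cannot fall below the average of randomly drawn single electrodes; how quickly either curve saturates depends on how concentrated the information is in a few electrodes.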
Increase in accuracy with pooling ranked electrodes. Decoding accuracy as a function of number of electrodes, when the individual electrodes are successively added according to their individual performance, for each scale. Asterisks mark 90% of the maximum accuracy attained by FR and LFP after combining electrodes.
Features of image stimuli used
Why did ECoG outperform LFP in MI and decoding in our data? Because the microelectrode and ECoG RFs covered different parts of the image, we tested whether the difference in performance could be due to differences in low-level stimulus features in their RFs. We focused our analyses on color-based features because the gamma rhythm depends critically on color (Shirhatti and Ray, 2018). We used the HSV space to obtain four statistics for our images: the cosine and sine of the angular hue, the saturation, and the value, which represents the intensity of the pixel (for details, see Materials and Methods). The spatial frequency spectra of these features (Fig. 15A) show that the features change slowly in space: lower spatial frequencies had higher amplitude for all image categories. Consistent with this, correlations between the feature values in LFP RFs and in dummy ECoG RFs were high when the distance between the LFP and ECoG centers was small, and decreased slowly with increasing distance. The actual distance between the LFP grid center and the center of the ECoG RF cluster was only a few degrees (Fig. 15B, vertical lines), resulting in similar low-level features inside the RFs of LFP and ECoG electrodes (Fig. 15C). Histograms of the feature values for the entire image (Fig. 15D, black trace) were similar to those inside the LFP and ECoG RFs (Fig. 15D, magenta and blue traces), suggesting that both LFPs and ECoGs sampled similar low-level features present in the images. Together, these results show that the higher decoding accuracy of ECoGs compared with LFPs cannot be attributed to differences in RF features.
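The four color features, their average within an RF, and a spatial amplitude spectrum can be sketched as below, assuming RGB images scaled to [0, 1], hue h mapped to the angle 2πh before taking its cosine and sine, and circular RFs given in pixel units. The conversion follows the same formulas as Python's `colorsys.rgb_to_hsv`, vectorized with NumPy; the image, RF geometry, and function names are hypothetical, and the exact procedure is in Materials and Methods.

```python
import colorsys
import numpy as np

def rgb_to_hsv(rgb):
    """Vectorized RGB -> HSV for floats in [0, 1] (same formulas as colorsys)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)
    c = v - rgb.min(axis=-1)                       # chroma
    s = np.where(v > 0, c / np.where(v > 0, v, 1.0), 0.0)
    cc = np.where(c > 0, c, 1.0)                   # avoid division by zero
    h = np.where(r == v, (g - b) / cc,
                 np.where(g == v, 2.0 + (b - r) / cc, 4.0 + (r - g) / cc))
    h = np.where(c > 0, (h / 6.0) % 1.0, 0.0)      # hue in [0, 1)
    return h, s, v

def hsv_features(rgb):
    """Stack cos(hue angle), sin(hue angle), saturation, value -> (..., 4)."""
    h, s, v = rgb_to_hsv(rgb)
    angle = 2 * np.pi * h
    return np.stack([np.cos(angle), np.sin(angle), s, v], axis=-1)

def mean_in_rf(features, center, radius):
    """Average each feature over pixels inside a circular RF (pixel units)."""
    rows, cols = np.indices(features.shape[:2])
    inside = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2
    return features[inside].mean(axis=0)

def radial_amplitude_spectrum(feature_map):
    """Amplitude spectrum of one feature map averaged over rings of equal spatial
    frequency (cycles per image), up to half the smaller image side."""
    amp = np.abs(np.fft.fftshift(np.fft.fft2(feature_map - feature_map.mean())))
    rows, cols = np.indices(amp.shape)
    ring = np.hypot(rows - amp.shape[0] // 2, cols - amp.shape[1] // 2).astype(int)
    sums = np.bincount(ring.ravel(), weights=amp.ravel())
    counts = np.bincount(ring.ravel())
    n = min(amp.shape) // 2
    return sums[:n] / counts[:n]

# Sanity check against the standard library on random pixels.
pixels = np.random.default_rng(0).random((50, 3))
max_err = np.abs(np.stack(rgb_to_hsv(pixels), axis=1)
                 - np.array([colorsys.rgb_to_hsv(*p) for p in pixels])).max()

# Hypothetical 64 x 64 image and RF; real images and RFs come from the experiment.
image = np.random.default_rng(1).random((64, 64, 3))
feats = hsv_features(image)                                  # (64, 64, 4)
rf_mean = mean_in_rf(feats, center=(20, 30), radius=5)
value_spectrum = radial_amplitude_spectrum(feats[..., 3])
```

Mapping hue to a (cos, sin) pair keeps hues near 0 and near 1, both reddish, close together, which raw hue values would place at opposite ends of the scale.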
Statistics of image stimuli showing four features from the HSV representation of the images: cosine and sine of angular hue, saturation, and value. A, Spatial frequency amplitude spectra for the four metrics. Colored lines indicate the 5 image categories (averaged over 16 images). Black represents their average. B, The correlation between the average feature in the LFP RFs and in dummy ECoG RFs obtained by maintaining the relative ECoG layout intact but moving the center of this ECoG cluster away from the LFP cluster. Vertical lines indicate the actual distance between the center of the LFP grid and the center of the ECoG cluster for the 2 monkeys. C, Scatter plot between the feature values averaged across electrode RFs (LFP, ECoG) for all images (80) for both monkeys. Insets, Correlation values for the 2 monkeys. ○, M1; □, M2. Different colors represent different categories. D, Distribution of pixelwise features in the full image (black), and in pixels falling in the RFs of LFP (magenta) and ECoG (blue) electrodes. Vertical lines indicate the mean values. E, Feature values across all electrode RFs (108 LFP, 9 ECoG) for images and their scrambled versions in 2 categories (16 fauna [black], 16 texture [red]). Correlation values for the 2 monkeys are shown at the top. ○, LFP; □, ECoG.
We also compared the features in the original versus scrambled images. Because scrambling was applied to the entire image rather than within the RFs, the features were preserved over the image as a whole but differed within the RFs of LFP and ECoG electrodes, leading to large differences between the original and scrambled image values (Fig. 15E).
Increase in performance by averaging LFP signals
Because the low-level color features changed slowly over space for the natural images (Fig. 15A), nearby brain areas could be coding for similar features, such that spatial averaging of local signals could lead to a better representation of the stimulus features. This could provide a simple explanation for the superior performance of ECoG compared with LFP (in the Discussion, we discuss a few other reasons as well). To test this hypothesis, we modeled the ECoG signal as an average of LFP signals over a grid of electrodes of varying sizes, as done in a previous study (Dubey and Ray, 2019a). Averaging LFP electrodes this way is different from the earlier pooling method where each electrode was added as a new feature. We observed that the sCV of the modeled ECoG signal indeed increased as the grid size increased (Fig. 16A). The nCV, which was mainly dependent on the spectral estimator, showed a negligible reduction (Fig. 16B). Consequently, the decoding accuracy (Fig. 16C) also increased as more electrodes were included, reaching a plateau after a grid size of ∼4 × 4.
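The grid-averaging idea can be sketched with a toy model in which every electrode carries the same stimulus-dependent gamma oscillation plus independent noise. Averaging a k × k sub-grid leaves the shared component intact while shrinking the noise power by roughly 1/k², which lowers the noise floor of the power estimate and raises the sCV at the gamma frequency. The 6 × 6 layout, amplitudes, sampling parameters, and function names are hypothetical, and real LFPs share far more than a single sinusoid.

```python
import numpy as np

def grid_averages(lfp, k):
    """Average LFPs over every k x k sub-grid of a (rows, cols, ...) array,
    a toy version of a "modeled ECoG"."""
    rows, cols = lfp.shape[:2]
    return [lfp[r:r + k, c:c + k].mean(axis=(0, 1))
            for r in range(rows - k + 1) for c in range(cols - k + 1)]

def scv_at_bin(signal, freq_bin):
    """sCV of trial-averaged periodogram power at one FFT bin.
    signal: (n_stim, n_trials, n_samples)."""
    power = np.abs(np.fft.rfft(signal, axis=-1)[..., freq_bin]) ** 2
    stim_mean = power.mean(axis=1)
    return float(stim_mean.std() / stim_mean.mean())

rng = np.random.default_rng(5)
rows = cols = 6                                   # hypothetical electrode grid
n_stim, n_trials, n_samples, fs = 8, 20, 256, 1000.0
gamma_bin = 15                                    # 15 * fs / n_samples ≈ 58.6 Hz
t = np.arange(n_samples) / fs
amp = rng.uniform(0.05, 0.25, size=n_stim)        # stimulus-dependent gamma amplitude
phase = rng.uniform(0, 2 * np.pi, size=(n_stim, n_trials, 1))
shared = amp[:, None, None] * np.sin(2 * np.pi * gamma_bin * fs / n_samples * t + phase)
noise = 2.0 * rng.standard_normal((rows, cols, n_stim, n_trials, n_samples))
lfp = shared + noise                              # same gamma on every electrode

scvs = {k: float(np.mean([scv_at_bin(avg, gamma_bin) for avg in grid_averages(lfp, k)]))
        for k in (1, 2, 4, 6)}
print(scvs)
```

In this toy model the gain from averaging slows once the noise floor is small compared with the gamma power; it does not model spatially varying or correlated signals, so it cannot by itself explain where the plateau in the data occurs.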
Averaging LFP signals over a grid. A, Signal CV versus frequency for increasingly larger grid sizes. The averaged response from LFP electrodes in a grid was used to obtain power and CV over trials (for details, see Materials and Methods). B, The nCV for the combinations in A. C, Decoding accuracy as a function of grid size. Horizontal line indicates the average decoding accuracy of all ECoG electrodes across categories. A–C, Values were averaged over 5 image categories, 2 monkeys, and all iterations of a given grid size. Error bars indicate SEM.
Discussion
We simultaneously recorded signals from four scales (spiking, LFP, ECoG, and EEG) from monkey V1 using a hybrid array having both microelectrodes and ECoGs. We investigated which frequencies and scales were informative about stimuli by using both information theoretic and decoding approaches. ECoG responses were highly informative and outperformed the other signals in decoding image identity. Gamma range frequencies (30–80 Hz) were informative across scales and stimuli, especially during the late stimulus period (250–500 ms). Low frequencies also had high information about natural images, especially during the early stimulus period (0–250 ms). Adding electrodes within and across scales led to better accuracy, suggesting that they conveyed nonredundant information. The higher performance of ECoG electrodes compared with LFP was not due to differences in the low-level visual features in their RFs but instead due to larger spatial summation: a “modeled” ECoG signal obtained by averaging LFP signals over a grid of electrodes also improved performance. Although LFP and ECoG responses to naturalistic scenes have been reported separately (Kayser et al., 2003; Belitski et al., 2008; Liu et al., 2009; Brunet et al., 2015; Hermes et al., 2015), to our knowledge, this is the first study where simultaneous responses have been recorded from four scales.
It is unclear whether these results are specific to V1 or generalizable to other brain areas. As discussed later, the information content may depend on the RF sizes of the neurons, the statistical properties of the images, the spatial spreads of the various signals, as well as the spatial extent of the networks that generate gamma or low-frequency oscillations. Although some of these properties are brain area specific (e.g., RFs), others may be more intrinsic (e.g., spatial spreads of signals). Similar studies in other visual areas are necessary to test the generalizability of our results. We note that the EEG signal quality in our recording setup was poor due to the presence of a large amount of metal hardware on the skull (see Materials and Methods). Additionally, we used only power values and did not explore other features, such as cross-frequency coupling (Whittingstall and Logothetis, 2009) or steady-state visual evoked potentials, which can both be quite informative (for a review of features used in BMIs, see Padfield et al., 2019). Further, task training has been used to improve the performance of EEG-based BMIs (Wolpaw et al., 1991; McFarland et al., 2010), but we wanted to exclude such additional effects.
Relationship with previous studies
ECoGs have previously been used for decoding movements (Hu et al., 2018), speech (Mugler et al., 2014), object categories (Liu et al., 2009; Majima et al., 2014), and stimulus location and images (Lewis et al., 2016). We add to this literature by performing a direct comparison across simultaneously recorded scales, and showing that not only is ECoG efficient, but it outperforms other signals, at least in our recording conditions. Our results are consistent with recent attempts at comparing cortical signals using specialized arrays (Toda et al., 2011; Miyakawa and Hasegawa, 2013; Ibayashi et al., 2018) that have also shown that ECoGs have high decoding accuracy.
In both LFP and ECoG, we found high image information in gamma (30–80 Hz) and lower frequencies (1–12 Hz). Similar results in a free viewing task have been reported previously (Lewis et al., 2016). We observed increased information when these frequencies were combined. That different frequency ranges provide independent information in V1 has also been shown using movies (Belitski et al., 2008). Full-screen gratings elicit reliable gamma oscillations, which can have two components (slow and fast) preferring different orientations (Murty et al., 2018). We observed high information in gamma frequencies, with two peaks (Fig. 6). The different preferred orientations of the two gamma components could account for the increased information obtained by combining two frequencies within the gamma range (Fig. 6). However, we did not see two clear peaks in the power spectra, especially for M1 (Fig. 1). This could be due to the lower frequency resolution, shorter stimulus duration, and shorter analysis period compared with the previous study. The gamma peak frequency of M1 also shifted with orientation, which may have contributed to the two peaks in the population information. Consistent with the previous study, however, we found that the 90° orientation had higher power in the 45–70 Hz range for both monkeys.
In recent years, there has been some debate about whether natural scenes elicit gamma (30–80 Hz) oscillations, with some studies reporting a narrowband gamma peak in the power spectral density during free viewing of images (Brunet et al., 2015) and others showing weak or no narrowband gamma but a broadband increase in power above ∼80 Hz (Hermes et al., 2015), which has a different origin than narrowband gamma (Ray and Maunsell, 2011). We recently showed that reddish hues elicit strong gamma oscillations (Shirhatti and Ray, 2018). Consistent with this, gamma oscillations here were induced mainly by images with reddish hues in the RFs, and their grayscale counterparts did not elicit comparable gamma (Fig. 3C,D). Gamma peaks were also observed for colored scrambled images (Fig. 3E), further showing that color is an important feature for gamma rhythm generation.
In the early period (0–250 ms), the gamma rhythm was less salient, and we observed better decoding at lower frequencies. This could simply be because of a more variable event-related potential in the early period, which affects the lower frequencies. The same pattern was seen with the FPR metric, which showed high decoding accuracy in this period. In general, narrowband gamma is induced only by some stimuli, such as bars, gratings, and reddish hues (Bartoli et al., 2019), whereas broadband responses are elicited by all stimuli. Therefore, using the low-frequency components of the signal (or a time-domain metric such as FPR) during the early period is a useful strategy for making quick decisions. On the other hand, the gamma range has reasonably high accuracy even in the early period (Fig. 10) and outperforms all other frequency ranges and metrics in the late period (Fig. 7); it is therefore a useful frequency range when more analysis time is available in BMI applications. High object category classification accuracy at short latencies (≤200 ms) has been observed in ECoG responses from human visual cortex (Liu et al., 2009, their Fig. 3). Because the early period is less affected by feedback, better classification at short latencies has been argued to be consistent with short recurrent loops and a feedforward mechanism of object recognition (Liu et al., 2009).
Why did ECoG outperform other signals in V1?
It is likely that the statistics of the images, in particular the spatial amplitude distribution (Fig. 15A), played an important role in the high performance of ECoG signals. As described in Results, because of the high amplitude at very low spatial frequencies, image features are likely to change slowly over space, so that nearby neural assemblies code similar features. In such a situation, averaging the responses of these assemblies (which the ECoG electrode effectively does because of its larger size) cancels random noise across assemblies while preserving the common signal, and hence improves the information content. It could therefore be argued that our results are specific to the images shown here. However, we used natural images from several categories, all of which showed a similar spatial amplitude distribution (Fig. 15A). Further, other studies have shown similar luminance amplitude spectra for natural images (Párraga et al., 1998, 2002) and high color correlation at short distances (Cecchi et al., 2010). Therefore, these results are likely to hold for natural images in general. The preservation of MI and decoding accuracy after image scrambling can also be explained on this basis: although scrambling changes the features inside the RFs (Fig. 15E), the overall spatial distribution of the image remains the same.
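The noise-cancellation argument can be made explicit with an idealized model. Assume, hypothetically, that each of N neural assemblies under the electrode carries the same signal s plus noise η_i with variance σ² and pairwise correlation ρ; the averaged response and the variance of its noise term are then:

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} (s + \eta_i) = s + \frac{1}{N}\sum_{i=1}^{N}\eta_i,
\qquad
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N}\eta_i\right)
  = \frac{N\sigma^2 + N(N-1)\rho\sigma^2}{N^2}
  = \frac{\sigma^2}{N}\bigl[1 + (N-1)\rho\bigr].
```

For independent noise (ρ = 0) the noise variance falls as 1/N while s is untouched; with correlated noise it approaches ρσ² as N grows, so the benefit saturates. The model also stops applying once the assemblies no longer share s, which is why slowly varying image features over short distances matter for the argument.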
Apart from image statistics, other intrinsic features could have contributed to the high performance of ECoG. For example, gamma oscillations, which contributed to improved performance, were stronger (Fig. 3A), and the sCV was higher (Fig. 5) in ECoG than LFP. The representation of the gamma rhythm in a signal is likely to depend on the spatial spread of the network that generates gamma, as well as the spatial spread of the signal itself (the cortical area around the electrode that contributes to it). In a previous study comparing the relative spreads of LFP and ECoG, we found that the ECoG spread is surprisingly local (only three times the LFP), with a diameter of ∼3 mm (Dubey and Ray, 2019a). The size of a “coherent gamma network” can be estimated by observing how the coherence between signals recorded from microelectrode pairs decreases with interelectrode distance. We computed this for the same 2 monkeys (but with different arrays) in a previous study (Murty et al., 2018, their Figs. 7 and 8), and it has also been reported by Jia et al. (2011, their Fig. 8). Gamma coherence appears to decrease at interelectrode distances of ≥3–4 mm. Thus, ECoGs, at least in V1, may record from a cortical area comparable with that over which coherent gamma oscillations are generated (which may depend on the projections of the inhibitory networks thought to generate gamma), and may therefore capture these oscillations much better than the LFP. If so, our results may change with the size of the gamma network, or with the spatial spread of the signal itself (Lindén et al., 2011; Pesaran et al., 2018).
Even for stimuli that do not generate strong gamma, the local features that drive V1 responses, such as orientation, contrast, and spatial frequency, can be represented more effectively by a larger neural population. Because they have larger RFs and a larger cortical spread, ECoGs may pick up such activity better than LFPs, leading to stronger modulation by image features. Indeed, averaging signals over a larger area (to simulate ECoGs) increased decoding accuracy (Fig. 16). These considerations encourage the development of appropriate models to study the relationship between response, electrode size, image features, and visual spread, as has been attempted recently for LFP gamma (Hermes et al., 2019).
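Simulating an ECoG by spatial averaging can be sketched as follows. We assume, for illustration only, a 10 × 10 microelectrode grid with 400 μm spacing and an unweighted average of all LFPs within a chosen radius of a central electrode. The function `virtual_ecog` is hypothetical, and the exact pooling used for Fig. 16 may differ; in particular, a real surface electrode would not weight all contributing sites equally.

```python
import numpy as np

def virtual_ecog(lfp, center, radius_mm, spacing_mm=0.4):
    """Hypothetical 'virtual ECoG': unweighted mean of grid LFPs within radius_mm.

    lfp: array of shape (n_rows, n_cols, n_samples)
    center: (row, col) index of the central electrode
    Returns (pooled signal of shape (n_samples,), number of electrodes pooled).
    """
    n_rows, n_cols, _ = lfp.shape
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    # Euclidean distance (mm) of every electrode from the central one
    dist = spacing_mm * np.hypot(rows - center[0], cols - center[1])
    mask = dist <= radius_mm
    # Boolean indexing over the first two axes gives shape (k, n_samples)
    return lfp[mask].mean(axis=0), int(mask.sum())

rng = np.random.default_rng(0)
lfp = rng.standard_normal((10, 10, 500))  # placeholder data, not recordings
for radius in (0.0, 0.4, 1.0, 2.0):
    _, k = virtual_ecog(lfp, (5, 5), radius)
    print(f"radius {radius:.1f} mm -> {k} electrodes pooled")
```

Increasing the radius pools more electrodes, which is the sense in which the averaged signal mimics the larger spatial summation of an ECoG electrode.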
The relationship between interelectrode distance and RF is likely to be another key factor. When electrodes were combined, ECoG accuracy may have increased more steeply simply because the ECoGs were farther apart (separated by at least 10 mm), had nonoverlapping RFs, and therefore sampled different locations in visual space. Microelectrodes, on the other hand, had largely overlapping RFs and consequently carried more redundant information. The interelectrode separation of the popular Utah arrays that we used is ∼400 μm, which is presumably enough to avoid picking up the same SUA on neighboring electrodes. However, it may not be sufficient for LFPs, which have a larger spread (diameter 0.5–1 mm) (Katzner et al., 2009; Xing et al., 2009; Dubey and Ray, 2016, 2019a). Given that LFPs were more informative than FRs, larger interelectrode separation may be useful for BMI applications. More recordings with coarsely spaced microelectrodes or finely spaced smaller ECoGs are required to find the optimal configuration for BMIs.
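A back-of-the-envelope calculation makes the redundancy argument concrete. Assuming, as a simplification of our own, that each electrode "listens" to a uniform disk of cortex whose diameter equals the signal spread quoted above, the fraction of one electrode's disk shared with its neighbor follows from the area of intersection of two equal circles. The function name is hypothetical, and real spread profiles decay gradually rather than ending at a sharp edge.

```python
import math

def overlap_fraction(spread_diameter_mm, separation_mm):
    """Fraction of one electrode's uniform 'listening' disk shared with a
    neighbor's disk of equal size (a simplified geometric model)."""
    r = spread_diameter_mm / 2.0
    d = separation_mm
    if d >= 2 * r:
        return 0.0  # the disks do not overlap at all
    # Area of the lens-shaped intersection of two equal circles of radius r
    lens = 2 * r**2 * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r**2 - d**2)
    return lens / (math.pi * r**2)

print(round(overlap_fraction(1.0, 0.4), 2))   # LFP, 1 mm spread, Utah spacing: ~0.50
print(round(overlap_fraction(0.5, 0.4), 2))   # LFP, 0.5 mm spread: ~0.10
print(overlap_fraction(3.0, 10.0))            # ECoG, ~3 mm spread, >=10 mm apart: 0.0
```

Under this model, neighboring LFPs on a 400 μm grid share roughly 10–50% of their sampled cortex, whereas ECoGs spaced at least 10 mm apart share none.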
Finally, ECoG performance could be superior because the electrodes were farther from the craniotomy and sampled a healthier neural population. While this is a technical issue, any recording setup using electrodes inserted in the brain is likely to cause some unavoidable tissue damage, which can affect BMI performance.
Implications for BMIs
Along with accuracy, the longevity of a signal is important for BMIs. Because most human ECoG recordings last only a week or two, it has been difficult to estimate signal stability over time. In our recordings, we obtained clean signals for several months; the data shown here were recorded within 8 (M1) and 13 (M2) weeks after surgery. Although we did not quantify signal quality as a function of time, the stability of ECoG was comparable to, if not better than, that of microelectrode recordings. Clear gamma peaks could be observed in recordings taken ∼8 months after surgery in M1. Human ECoG signals have been shown to be stable over several days and have been used for decoding objects (Bansal et al., 2012). More recently, ECoG signals were used by a tetraplegic patient to control an exoskeleton in a study spanning ∼2 years (Benabid et al., 2019). These results have implications for BMIs, for which ECoGs could prove to be the most desirable implants because they are less invasive and have a long history of medical use.
Footnotes
This work was supported by Wellcome Trust/DBT India Alliance 500145/Z/09/Z Intermediate Fellowship to S.R., and Tata Trusts Grant and DBT-IISc Partnership Programme.
The authors declare no competing financial interests.
Correspondence should be addressed to Supratim Ray at sray{at}iisc.ac.in