Featured Article | Research Articles, Behavioral/Cognitive

Power-up: A Reanalysis of 'Power Failure' in Neuroscience Using Mixture Modeling

Camilla L. Nord, Vincent Valton, John Wood and Jonathan P. Roiser
Journal of Neuroscience 23 August 2017, 37 (34) 8051-8061; DOI: https://doi.org/10.1523/JNEUROSCI.3592-16.2017
1Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, United Kingdom (C.L.N., V.V., J.P.R.)
2Research Department of Primary Care and Population Health, University College London Medical School, London NW3 2PF, United Kingdom (J.W.)

Abstract

Recently, evidence for endemically low statistical power has cast neuroscience findings into doubt. If low statistical power plagues neuroscience, then this reduces confidence in the reported effects. However, if statistical power is not uniformly low, then such blanket mistrust might not be warranted. Here, we provide a different perspective on this issue, analyzing data from an influential study reporting a median power of 21% across 49 meta-analyses (Button et al., 2013). We demonstrate, using Gaussian mixture modeling, that the sample of 730 studies included in that analysis comprises several subcomponents, so the use of a single summary statistic is insufficient to characterize the nature of the distribution. We find that statistical power is extremely low for studies included in meta-analyses that reported a null result and that it varies substantially across subfields of neuroscience, with particularly low power in candidate gene association studies. Therefore, whereas power in neuroscience remains a critical issue, the notion that studies are systematically underpowered is not the full story: low power is far from a universal problem.

SIGNIFICANCE STATEMENT Recently, researchers across the biomedical and psychological sciences have become concerned with the reliability of results. One marker for reliability is statistical power: the probability of finding a statistically significant result given that the effect exists. Previous evidence suggests that statistical power is low across the field of neuroscience. Our results present a more comprehensive picture of statistical power in neuroscience: on average, studies are indeed underpowered—some very seriously so—but many studies show acceptable or even exemplary statistical power. We show that this heterogeneity in statistical power is common across most subfields in neuroscience. This new, more nuanced picture of statistical power in neuroscience could affect not only scientific understanding, but potentially policy and funding decisions for neuroscience research.

  • neuroscience
  • power
  • statistics

Introduction

Trust in empirical findings is of vital importance to scientific advancement, but publishing biases and questionable research practices can cause unreliable results (Nosek et al., 2012; Button et al., 2013). In recent years, scientists and funders across the biomedical and psychological sciences have become concerned with what has been termed a crisis of replication and reliability (Barch and Yarkoni, 2013).

One putative marker for the reliability of results is statistical power: the probability that a statistically significant result will be declared given that the null hypothesis is false (i.e., a real effect exists). It can be shown that, in the context of field-wide underpowered studies, a smaller proportion of significant findings will reflect true positives than if power is universally high (Ioannidis, 2005). A recent influential study by Button et al. (2013) calculated statistical power across all meta-analyses published in 2011 that were labeled as “neuroscience” by Thomson Reuters Web of Science. It concluded that neuroscience studies were systematically underpowered, with a median statistical power of 21%, and that the proportion of statistically significant results that reflect true positives is therefore likely to be low. The prevalence of very low power has serious implications for the field. If the majority of studies are indeed underpowered, then statistically significant findings are untrustworthy and scientific inference will often be misinformed. This analysis provoked considerable debate in the field about whether neuroscience does indeed suffer from endemic low statistical power (Bacchetti, 2013; Quinlan, 2013). We sought to add nuance to this debate by reanalyzing the original dataset using a more fine-grained approach and to provide a different perspective on statistical power in neuroscience.

We extended the analyses of Button et al. (2013) using data from all 730 individual studies, which provided initial results that were consistent with the original report (which used only the median-sized study in 49 meta-analyses). To quantify the heterogeneity of the dataset we made use of Gaussian mixture modeling (GMM) (Corduneanu and Bishop, 2001), which assumes that the data may be described as being composed of multiple Gaussian components. We then used model comparison to find the most parsimonious model for the data. We also categorized each study based on its methodology to examine whether low power is common to all fields of neuroscience.

We find strong evidence that the distribution of power across studies is multimodal, with the most parsimonious model tested including four components. Moreover, we show that candidate gene association studies and studies from meta-analyses with null results make up the majority of extremely low-powered studies in the analysis of Button et al. (2013). Although median power in neuroscience is low, the distribution of power is heterogeneous and there are clusters of adequately and even well powered studies in the field. Therefore, our in-depth analysis reveals that the crisis of power is not uniform: instead, statistical power is extremely diverse across neuroscience.

Materials and Methods

Reanalyzing “power failures”

Our initial analysis took a similar approach to that of Button et al. (2013), but, contrary to their protocol (which reported power only for the median-sized study in each meta-analysis: N = 49), we report power for each of the 730 individual studies (see Fig. 3a, Table 1). As in the original analysis, we defined power as the probability that a given study would declare a significant result assuming that the population effect size was equal to the weighted mean effect size derived from the corresponding meta-analysis (note that this differs from post hoc power, in which the effect size would be assumed to be equal to the reported effect size from each individual study; O'Keefe, 2007).

Table 1. Characteristics and classification of included meta-analyses

For experiments with a binary outcome, power was calculated by assuming that the expected incidence or response rate for the control group (i.e., the base rate) was equal to that reported in the corresponding meta-analysis and, similarly, used an assumed “treatment effect” (odds or risk ratio) equal to that given by each meta-analysis. The test statistic used for the calculation was the log odds ratio divided by its SE. The latter was derived from a first-order approximation and estimated by the square root of the sum of the reciprocals of the expected values of the counts in the 2-by-2 summary table. The test statistic itself was then referenced to the standard normal distribution for the purposes of the power calculation. For studies reporting Cohen's d, the assumed treatment effect was again taken directly from the corresponding meta-analysis and all power calculations were based on the standard noncentral t distribution. For comparability with the original study, we calculated the median power across all 730 individual studies, which was 23%, close to the 21% reported by Button et al. (2013).
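A minimal MATLAB sketch of the two calculations described above may be helpful (this is our reconstruction, not the authors' analysis script; the base rate, effect sizes, and sample sizes are illustrative placeholders, and equal arm sizes are assumed for the binary case):

```matlab
% Sketch of both power calculations (illustrative values throughout)
alpha  = 0.05;
z_crit = norminv(1 - alpha/2);        % two-sided normal critical value

% --- Binary outcome: log odds ratio referenced to the standard normal ---
baseRate = 0.20;                      % control-group event rate (from meta-analysis)
OR       = 1.50;                      % assumed treatment effect (from meta-analysis)
nPerArm  = 100;                       % assumed sample size per arm
oddsT    = OR * baseRate/(1 - baseRate);
pTreat   = oddsT / (1 + oddsT);       % implied treatment-group event rate
% Expected counts in the 2-by-2 table: events/non-events in each arm
e = [nPerArm*pTreat, nPerArm*(1-pTreat), nPerArm*baseRate, nPerArm*(1-baseRate)];
se_logOR = sqrt(sum(1 ./ e));         % first-order approximation to the SE
z = log(OR) / se_logOR;
power_bin = 1 - normcdf(z_crit - z) + normcdf(-z_crit - z);

% --- Continuous outcome (Cohen's d): noncentral t distribution ---
d  = 0.5;                             % assumed effect size (from meta-analysis)
n1 = 30; n2 = 30;                     % assumed group sizes
df  = n1 + n2 - 2;
ncp = d * sqrt(n1*n2/(n1 + n2));      % noncentrality parameter
t_crit  = tinv(1 - alpha/2, df);
power_d = 1 - nctcdf(t_crit, df, ncp) + nctcdf(-t_crit, df, ncp);
fprintf('binary: %.2f, Cohen''s d: %.2f\n', power_bin, power_d);
```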

Figure 1 shows an overview of our analytical process. We additionally classified each study according to methodology: candidate gene association studies (N = 234), psychology (N = 198), neuroimaging (N = 65), treatment trials (N = 145), neurochemistry (N = 50), and a miscellaneous category (N = 38 studies from N = 2 meta-analyses). Two independent raters categorized the 49 meta-analyses into these six subfields, with 47/49 classified consistently; the remaining two were resolved after discussion. Before continuing our analysis in more depth, we present results that are directly comparable to the analysis of Button et al. (2013) (with the addition of the subfields; Table 2). These results are intended for comparison with our more nuanced characterization of the distributions using GMMs presented below; given the results of those GMMs (which suggest that these distributions are multimodal and therefore not well characterized by a single measure of central tendency), they should not be used to draw strong inferences.

Figure 1. Classification of studies for analysis. Description of study methodology. GMM = Gaussian mixture model.

Table 2. Median, maximum, and minimum power subdivided by study type

One or many populations?

The common measures of central tendency (mean, median, and mode) may not always characterize populations accurately because distributions can be complex and made up of multiple “hidden” subpopulations. Consider the distribution of height in the United States: the mean is 168.8 cm (Fryar et al., 2012). This statistic is rarely reported because the distribution comprises two distinct populations: male (175.9 cm, 5th–95th percentile 163.2–188.2 cm) and female (162.1 cm, 5th–95th percentile 150.7–173.7 cm). The mean of the male population is greater than the 95th percentile of the female population. Therefore, a single measure of central tendency fails to describe this distribution adequately.

In an analogous fashion, the original study by Button et al. (2013) reported a median power of 21%, which could be interpreted as implying a degree of statistical homogeneity across neuroscience. The median, although it has the straightforward interpretation of “half above and half below,” also implies that the power statistics are drawn from a distribution with a single central tendency. As we show below, this assumption is contradicted by our analyses, which makes the median difficult to interpret. It should be noted that Button et al. (2013) themselves described their results as demonstrating a “clear bimodal distribution.” We therefore explored the possibility that the power data originated from a combination of multiple distributions using GMM.

GMM (similar to latent class analysis and factor models; Lubke and Muthén, 2005) can be used to represent complex density functions that are not well described by a single Gaussian, such as bimodal or multimodal distributions. We fit GMMs with varying numbers k of components to the data and performed model selection using Bayesian information criterion (BIC) scores to compare models of differing fit and complexity (the larger k, the more complex the model). This allowed us to take a data-driven approach, as opposed to fitting a mixture model with a preset number of components: we were therefore agnostic as to the number of components that would emerge from the model. The GMM with the lowest BIC identifies the most parsimonious model, trading model fit against model complexity. A difference in BIC between models of 10 or more on a natural logarithm scale indicates strong evidence in support of the model with the lower score (Kass and Raftery, 1995). To ensure that we used the most suitable GMM for this dataset, we ran different GMM variants: standard GMMs, regularized GMMs, and Dirichlet process GMMs (DPGMMs; see below for full methods and Fig. 2 for model comparison and selection). The results were similar using each of these techniques (Fig. 2).
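In outline, the fit-and-select loop can be sketched in MATLAB as follows (a minimal illustration using fitgmdist from the Statistics and Machine Learning Toolbox, not necessarily the authors' exact implementation; a vector `powers` holding the 730 per-study power estimates is assumed):

```matlab
% Fit GMMs with k = 1..15 components and select the lowest-BIC model
maxK = 15;
bic  = nan(maxK, 1);
opts = statset('MaxIter', 1000);
for k = 1:maxK
    gm = fitgmdist(powers(:), k, ...
        'Replicates', 50, ...             % 50 restarts, as described below
        'Start', 'plus', ...              % kmeans++ initialization
        'RegularizationValue', 1e-3, ...  % regularized variant (see below)
        'Options', opts);
    bic(k) = gm.BIC;                      % fit vs. complexity trade-off
end
[~, bestK] = min(bic);                    % most parsimonious model
fprintf('Winning model: %d components\n', bestK);
```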

Figure 2. Model comparison and model selection analysis for GMMs, regularized GMMs, and DPGMMs. The blue and red lines display BIC scores (natural log scale) for nonregularized and regularized GMMs, respectively, at different levels of model complexity (number of mixture components). The lowest BIC score indicates the model that provides the best compromise between model fit (likelihood) and model complexity for the given dataset. Winning models for GMMs (purple dot-dashed vertical line), regularized GMMs (yellow dashed vertical line), and DPGMMs (green dotted vertical line) are clearly present for each dataset, enabling direct comparison of the output of each methodology. The regularized GMM approach provided the most parsimonious interpretation of the data on the two main datasets, all studies (a) and excluding null studies (b), as well as on five of the six subfield datasets (c–h).

Finite Gaussian mixture model.

For a finite GMM, the corresponding likelihood function is given by the following (Corduneanu and Bishop, 2001):

$$p(D \mid \pi, \theta) = \prod_{n=1}^{N} \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x_n \mid \theta_i)$$

where π_i denotes the mixing coefficient (the proportion of the ith component), 𝒩(x_n|θ_i) denotes the conditional probability of the observation x_n given by a Gaussian distribution with parameters θ_i, and D denotes the whole dataset of observations x_n. Generally speaking, this means that we believe that there is an underlying generative structure to the observed data and that a mixture of Gaussian components would be a reasonable description/approximation of the true generative process of these data. That is, we assume that the data D have been generated from a mixture of Gaussian distributions with varying means, variances, and weights (model parameters), which we want to uncover. To do so, we perform model inversion and find the point estimates of the model parameters that maximize the likelihood (see the equation above) of the observed data (maximum likelihood estimation).

Model inversion is performed using the iterative expectation–maximization algorithm, which finds a local maximum of the likelihood function given initial starting parameters. We performed 50 restarts with kmeans++ initialization (Arthur and Vassilvitskii, 2007). Multiple restarts were performed to find the global maximum of the likelihood (i.e., the best GMM for the data: the parameters that maximize the chance of observing the data), as opposed to a local maximum. This allowed us to ensure that convergence was achieved for all GMMs on all datasets.

Traditionally, finite mixture modeling approaches require the number of components to be specified in advance of analyzing the data. That is, for each finite Gaussian mixture model fitted to the data, one is required to input the number of components K present in the mixture (model inversion only estimates the parameters for each component). Finding the number of components present in the data is a model selection problem and requires fitting multiple GMMs with varying numbers of components to the data, comparing the model evidence for each fit, and selecting the most parsimonious model for the data in question (Bishop, 2006; Gershman and Blei, 2012; Murphy, 2012).

It is worth noting, however, that GMMs can be subject to instabilities such as singularities of the likelihood function. Specifically, it is possible for one component to “collapse” all of its variance onto a single data point, leading to an infinite likelihood (Bishop, 2006; Murphy, 2012), and to incorrect parameter estimation for the model. Multiple techniques have been developed to address this problem. The simplest and most commonly used technique is to introduce a regularization parameter. Another is to adopt a fully Bayesian approach and apply soft constraints on the possible range of likely parameter values, therefore preventing problematic and unrealistic parameter values. Both methodologies were used in this study and we report on the resulting analysis for both implementations in the model selection section (below).

Finite Gaussian mixture model with regularization.

In typical finite mixture models, a regularization parameter can be added to avoid likelihood singularities. To do so, a very small value is added to the diagonal of the covariance matrix, enforcing a positive-definite covariance and preventing infinitely small variance estimates for individual components. This specification addresses the issue of “collapsing” components, but also enforces simpler explanations of the data, favoring models with fewer components. The larger the regularization parameter, the simpler the models will be, because single components will tend to encompass a larger subspace of the data partition. In this study, we introduced a regularization parameter of 0.001, which represents a reasonable trade-off: it prevents components from overfitting to noise in the dataset while still capturing the most salient features of the data (the separate peaks), thus providing a better generative model than nonregularized GMMs. We used this approach for our primary inferences.
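The following sketch illustrates why the regularization matters (values are illustrative; in one dimension the covariance matrix reduces to a scalar variance):

```matlab
% A component "collapsing" onto a single point drives its variance toward
% zero and its likelihood toward infinity; adding a small constant to the
% (diagonal of the) covariance keeps the variance bounded away from zero.
lambda = 0.001;                          % regularization parameter used above
sigma2_collapsed = 1e-12;                % near-singular component variance
sigma2_reg = sigma2_collapsed + lambda;  % regularized variance >= lambda
% In d dimensions: Sigma_reg = Sigma + lambda*eye(d), enforcing positive
% definiteness and capping the density assigned to any single data point.
x = 0.5; mu = 0.5;
L_unreg = normpdf(x, mu, sqrt(sigma2_collapsed));  % explodes as variance -> 0
L_reg   = normpdf(x, mu, sqrt(sigma2_reg));        % remains bounded
fprintf('unregularized %.3g vs regularized %.3g\n', L_unreg, L_reg);
```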

Dirichlet process Gaussian mixture model.

DPGMMs are a class of Bayesian nonparametric methods that avoid the issue of model selection when identifying the optimal number of components in a mixture model (Gershman and Blei, 2012; Murphy, 2012). With DPGMMs, we expand the original GMM model to incorporate a prior over the mixing distribution and a prior over the component parameters (mean and variance of components). Common choices for DPGMM priors are conjugate priors such as the normal-inverse-Wishart distribution over the mean and covariance matrix of components and a nonparametric prior over mixing proportions based on the DP.

The DP, often referred to as the Chinese restaurant process or the stick-breaking process, is a distribution over infinite partitions of integers (Gershman and Blei, 2012; Murphy, 2012). As a result, the DPGMM theoretically allows for an infinite number of components because it lets the number of components grow as the amount of data increases. The DP assigns each observation to a cluster with a probability that is proportional to the number of observations already assigned to that cluster. That is, the process will tend to cluster data points together, depending on the population of the existing cluster and a concentration parameter α. The smaller the α parameter, the more likely it is that an observation will be assigned to an existing cluster with probability proportional to the number of elements already assigned to this cluster. This phenomenon is often referred to as the “rich get richer.” This hyperparameter α indirectly controls how many clusters one expects to see from the data (another approach is to treat α as unknown, using a gamma hyperprior over α, and letting the Bayesian machinery infer the value; Blei and Jordan, 2006).
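A short sketch of the stick-breaking construction illustrates how α controls the expected number of clusters (this is an illustration of the DP prior only, not the vdpgm implementation used for the actual analyses):

```matlab
% Stick-breaking weights: w_k = v_k * prod_{j<k}(1 - v_j), v_k ~ Beta(1, alpha).
% Smaller alpha concentrates mass on fewer components ("rich get richer").
rng(1);
K = 50;                                      % truncation level
for alpha = [0.5, 5]
    v = betarnd(1, alpha, K, 1);             % stick-breaking proportions
    w = v .* cumprod([1; 1 - v(1:end-1)]);   % mixture weights
    fprintf('alpha = %.1f: %d components carry 99%% of the mass\n', ...
        alpha, find(cumsum(sort(w, 'descend')) > 0.99, 1));
end
```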

Implementation and analysis for the nonregularized finite GMMs, regularized finite GMMs, and DPGMMs were performed in MATLAB R2015b (The MathWorks) with the Statistics and Machine Learning Toolbox, the Lightspeed toolbox, and the vdpgm toolbox (Kurihara et al., 2007).

Model selection

To identify the winning model we used the BIC, which allows one to compute an approximation to the Bayes factor (relative evidence) for a model. The BIC comprises two terms: the likelihood (how well the model fits the data) and a complexity term that penalizes models with more free parameters (e.g., more components). The model with the lowest BIC is usually preferred because it provides the most parsimonious and generalizable model of the data.
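In the standard formulation (with L̂ the maximized likelihood, p the number of free parameters, and N the number of observations), these two terms combine as follows:

$$\mathrm{BIC} = -2\ln\hat{L} + p\ln N$$

so a better fit lowers the first term while added components raise the second.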

For each of the following datasets, model fits were performed using nonregularized and regularized finite mixtures with up to 15 components (up to 10 components for the subfield categories; Fig. 2): the original dataset; the original dataset excluding null studies; each methodological subfield within the original dataset (genetics, psychology, neurochemistry, treatment, imaging, and miscellaneous studies); and the original dataset excluding each methodological subfield. Model selection was then performed using the BIC to select the most parsimonious model for each dataset. Figure 2 presents, for each dataset, the corresponding BIC metric for increasing levels of model complexity. Plain blue lines denote the BIC metric using nonregularized GMMs and plain red lines denote the BIC using regularized GMMs. The BIC curve for nonregularized GMMs (blue line) exhibits wide jumps (Fig. 2), whereas the function should remain relatively smooth, as seen with regularized GMMs (red line). This suggests that the nonregularized GMM results were prone to overfitting and were inadequate for some of our datasets.

Finally, we compared different modeling methodologies to select and report the most robust findings in terms of the estimation of the number of components. We compared nonregularized GMMs, regularized GMMs, and DPGMMs on the same datasets (Fig. 2) and found that regularized GMMs generally provided the most conservative estimation of the number of components. We therefore opted to report these results as the main findings.

Results

We analyzed the original sample of 730 powers (see histogram in Fig. 3a). If the median were the most appropriate metric to describe the distribution of powers across studies, then we would expect the GMM to produce a solution containing only a single component. Instead, the most parsimonious GMM solution included four components, with strong evidence in favor of this model versus either of the next best models (i.e., GMMs with three or five components; Fig. 2). Importantly, this model revealed that the overall distribution of power appears to be composed of subgroups of lower- and higher-powered studies (Fig. 3a, overlay). We next explored possible sources of this variability, considering the influence of both null effects and specific subfields of neuroscience.

Figure 3.

Power of studies. Shown are histograms depicting the distribution of study powers across all 730 studies (a) and across studies excluding null meta-analyses (b). However, we note that excluding power statistics from studies included in null meta-analyses may provide an overestimation of power because, in many instances, there remains uncertainty as to whether a true effect exists. Pale overlay shows the results of the regularized GMM, identifying four components (C1, C2, C3, and C4) and their relative weights within the dataset. Below the histogram, pie charts depict methodological subfields and null meta-analyses contributing to each component. The null studies (white pie-chart sections) comprise 52 genetic studies and 40 treatment studies. The dark blue line shows the sum of the components (overall GMM prediction). c–h, Histograms depicting the distribution of study powers across all meta-analyses separated by subfield: candidate gene association studies (c), psychology studies (d), neurochemistry studies (e), treatment studies (f), imaging studies (g), and miscellaneous studies (h). Pale overlays show the results of the regularized GMM for each subfield; the dark lines show the sum of the components (overall GMM prediction).

When is an effect not an effect?

The first important source of variability that we considered relates to the concept of power itself. The calculation of power depends not just on the precision of the experiment (heavily influenced by the sample size), but also on the true population effect size. Logically, power analysis requires that an effect (the difference between population distributions) actually exists. Conducting a power analysis when no effect exists violates this predicate and will therefore yield an uninterpretable result. Indeed, when no effect exists, the power statistic becomes independent of the sample size and is simply equal to the type I error rate, which by definition is the probability of declaring a significant result under the null hypothesis.

To illustrate this point, consider the meta-analysis titled “No association between APOE ε 4 allele and multiple sclerosis susceptibility” (Xuan et al., 2011), which included a total of 5472 cases and 4727 controls. The median effect size (odds ratio) reported was precisely 1.00, with a 95% confidence interval (CI) from 0.861 to 1.156. Button et al. (2013) calculated the median power to be 5%, which is equal to the type I error rate. However, as is evident from the study's title, this meta-analysis was clearly interpreted by its authors as indicating a null effect, which is consistent with the observed result. Indeed, in this case, the power is 5% for both the largest (N > 3000) and the smallest (N < 150) study in the meta-analysis. In such cases, the estimate of 5% power is not easily interpretable.
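A quick numerical check (our illustration, with arbitrary standard errors) confirms that, when the assumed effect is null (OR = 1, so log OR = 0), the calculation returns the type I error rate whatever the sample size:

```matlab
% Under a null effect the test statistic's mean is zero, so two-sided
% power equals alpha regardless of how the SE shrinks with N.
alpha  = 0.05;
z_crit = norminv(1 - alpha/2);
for N = [150, 3000]                  % roughly the smallest and largest study
    se = 1/sqrt(N);                  % any SE > 0; shrinks with N
    z  = 0/se;                       % null effect: z = 0
    power = 1 - normcdf(z_crit - z) + normcdf(-z_crit - z);
    fprintf('N = %4d: power = %.3f\n', N, power);   % always 0.050
end
```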

Conversely, it is problematic to assume that a nonsignificant meta-analytic finding can be taken as evidence that there is no true effect; in the frequentist statistical framework, failure to reject the null hypothesis cannot be interpreted as unambiguous evidence that no effect exists (due to the potential for false-negative results). For example, the study by Chung and Chua (2011) entitled “Effects on prolongation of Bazett's corrected QT interval of seven second-generation antipsychotics in the treatment of schizophrenia: a meta-analysis” reported a median effect size (odds ratio) of 0.67, with a 95% CI from 0.43 to 1.04. Although this result was nonsignificant, the point estimate of the effect size is greater than those from several meta-analyses that did achieve statistical significance and, in our view, it would be premature to conclude that this effect does not exist.

These examples illustrate the difficulty in deciding whether conducting a power analysis is appropriate. Even tiny effect sizes could hypothetically still exist: in any biological system, the probability that an effect is precisely null is itself zero; therefore, all effects “exist” by this definition (with certain exceptions, e.g., in the context of randomization), even if to detect them we might need to test more individuals than are currently alive. However, the notion of “falsely rejecting the null hypothesis” then loses its meaning (Cohen, 1994). One approach would be to assume that an effect does not exist until the observed evidence suggests that the null hypothesis can be rejected, consistent with the logical basis of classical statistical inference. This would avoid any potential bias toward very-low-power estimates due to nonexistent effects. However, this approach raises the potential problem of excluding effects that are genuinely very small, which may cause a bias in the other direction. Within the constraints of the null hypothesis significance testing framework, it is impossible to be confident that an effect does not exist at all. Therefore, we cannot simply assume that an effect does not exist after failing to reject the null hypothesis, because a small effect could go undetected.

Motivated by this logic (specifically, that excluding power statistics from studies included in null meta-analyses may lead to an overestimation of power because, in many instances, there remains uncertainty as to whether a true effect exists), we initially included studies from “null meta-analyses” (i.e., those in which the estimated effect size from the meta-analysis was not significantly different from the null at the conventional α = 0.05) in our GMMs (Fig. 3a). Nonetheless, with the above caveats in mind, we also wished to assess the degree to which null meta-analyses may have affected the results. Null results occurred in seven of the 49 meta-analyses (92 of the 730 individual studies), contributing a substantial proportion of the extremely low-powered studies (<10% power; Fig. 3a, white pie chart segment of C1). When we restricted our analysis to studies within meta-analyses that reported statistically significant results (“non-null” meta-analyses), the median study power (unsurprisingly) increased, but only slightly, to 30%, and the nature of the resulting GMM distribution did not change substantially (Fig. 3b). In other words, excluding null meta-analyses does not produce a radically different picture. We therefore also examined another potential contributor to power variability in neuroscience: the influence of specific subfields.

Power in neuroscience subfields

As described above, we categorized each meta-analysis into one of six methodological subfields. Interestingly, statistical power varied significantly according to subfield (permutation test of equivalence: p < 0.001), with candidate gene association studies showing lower power (11% median) than any other subfield examined (all p < 0.001, Mann–Whitney U tests). Such variability across neuroscience subfields is consistent with the original report by Button et al. (2013), which reported the median power of animal studies (18% and 31% for two meta-analyses) and of case-control structural brain imaging studies (8% across 41 meta-analyses). However, even within specific subfields, the distribution of power is multimodal (Fig. 3c–h). This could represent variability in statistical practices across studies, but another possible explanation is that the size of the effect being studied varies substantially between meta-analyses, even within the same subfield. This alternative explanation may, at least in part, account for the variability between (and within) subfields of neuroscience.
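For readers unfamiliar with label-permutation tests, a generic sketch follows (the exact test statistic used for our analysis is not specified here, so the spread of subfield medians is used as an illustrative choice; the variables `powers` and `subfield`, a categorical vector of subfield labels, are assumed):

```matlab
% Permute subfield labels and compare an observed dispersion statistic
% (variance of subfield medians) against its permutation distribution.
rng(1);
groups  = categories(subfield);
medstat = @(lab) var(cellfun(@(g) median(powers(lab == g)), groups));
obs     = medstat(subfield);
nPerm   = 10000;
exceed  = 0;
for i = 1:nPerm
    permLab = subfield(randperm(numel(subfield)));
    exceed  = exceed + (medstat(permLab) >= obs);
end
p = (exceed + 1) / (nPerm + 1);      % permutation p-value
```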

The large number of extremely low-powered candidate gene association studies warrants additional comment. These were included in the original analysis because the Web of Science classifies such studies as “neuroscience” if the phenotypes in question are neurological or psychiatric disorders. However, modern genome-wide association studies have revealed that the overwhelming majority of candidate gene association studies have been underpowered because the reliable associations that have been identified are extremely weak (Flint and Munafò, 2013); very low power is therefore expected within this subgroup, which our analysis confirms (Fig. 3c). This subgroup of studies can offer important lessons to the rest of neuroscience: without large genetic consortia, the field of neuropsychiatric genetics might still be laboring under the misapprehension that individual common variants make substantial contributions to the risk for developing disorders. Provided that sampling and measurement are standardized, pooling data across multiple sites has the potential to dramatically improve not only statistical power but also the precision of effect size estimates.

Because numerous studies report that candidate gene association studies are severely underpowered (Klerk et al., 2002; Colhoun et al., 2003; Duncan and Keller, 2011), and given that candidate gene association studies comprised more than one-third of our total sample of studies, we suspected that they might contribute heavily to the lowest-power peak in our distribution. We confirmed this: in the absence of genetic studies, many studies remained underpowered, but the distribution contained proportionally fewer studies in the lowest-power peak (∼10% power; Fig. 4a). Although low power is clearly not limited to candidate gene association studies, they have a greater influence on the overall power distribution than any other subfield, skewing the distribution toward the lowest-power peak (Fig. 4b–f).

Figure 4.

GMMs excluding each subfield. GMMs for the whole population of studies excluding genetic studies (a), psychology studies (b), neurochemistry studies (c), treatment studies (d), imaging studies (e), and the remaining miscellaneous studies (f). Compare with the distribution including all studies (Fig. 3a).

Simulating power in hypothetical fields

One clear conclusion of our analyses is that the interplay between the proportion of true effects and the power to detect those effects is crucial in determining the power distribution of a field. We simulated four power graphs for hypothetical fields to illustrate this point: one with low power (∼50%) in which all effects exist (Fig. 5a); one with high power (∼90%) in which all effects exist (Fig. 5b); one with low power (∼50%), in which only a minority (25%) of effects exist (Fig. 5c); and one with high power (∼90%) in which only a minority (25%) of effects exist (Fig. 5d). We found that the “low-power” field did not resemble the distribution of power in neuroscience that we observed (Fig. 3a). Instead, our findings were closest to a mixture of two distributions: Figure 5c with low (∼50%) power in which only 25% of findings are true effects and Figure 5d with high (∼90%) power in which only 25% of findings are true effects. This would be consistent with the notion that the absence of true effects may contribute to the distribution of statistical power in neuroscience.

Figure 5.

Simulated power distributions for four hypothetical fields: “easy field” with low power (∼0.5), in which all effects exist (a); “easy field” with high power (∼0.9), in which all effects exist (b); “hard field” with low power (∼0.5) for those effects that exist, but where effects exist in only 25% of cases (c); and “hard field” with high power (∼0.9) for those effects that exist, but where effects exist in only 25% of cases (d). Power distributions were simulated by generating 50,000 power estimates for a one-sample t test with a fixed sample size (N = 45) while varying effect size. For each panel, the effect size was sampled from a truncated (effect size >0) Gaussian distribution with mean 0.3 (a, c) or 0.49 (b, d) to represent low or high power, respectively. For the “hard” fields (c, d), 75% of the effect size sample was generated from a half-Gaussian distribution with mean = 0. The SD was set to 0.07 for all effect size distributions. Similar results can be obtained by fixing the effect size and varying the sample size.
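A sketch reimplementing this simulation recipe for panel c (our reconstruction in MATLAB, not the original script; panel choices are set by muEff and propNull):

```matlab
% 50,000 power estimates for a one-sample t test, N = 45, effect sizes
% drawn as described in the legend for the "hard, low-power" field.
rng(1);
nSim  = 50000;  N = 45;  sd = 0.07;  alphaLvl = 0.05;
muEff = 0.3;    propNull = 0.75;     % panel c; muEff = 0.49 gives panel d
d = muEff + sd*randn(nSim, 1);
d(d < 0) = -d(d < 0);                % enforce effect size > 0 (negligible here)
isNull = rand(nSim, 1) < propNull;   % 75% of effects drawn instead...
d(isNull) = abs(sd*randn(nnz(isNull), 1));  % ...from a half-Gaussian, mean 0
df    = N - 1;
ncp   = d * sqrt(N);                 % noncentrality for a one-sample t test
tcrit = tinv(1 - alphaLvl/2, df);
pw    = 1 - nctcdf(tcrit, df, ncp) + nctcdf(-tcrit, df, ncp);
histogram(pw, 50)                    % compare with Figure 5c
```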

Discussion

Implications for neuroscience

We argue that a very influential analysis (cited >1500 times at the time of writing) does not adequately describe the full variety of statistical power in neuroscience. Our analyses show that the dataset is insufficiently characterized by a single distribution. Instead, power varies considerably, including between subfields of neuroscience, and is particularly low for candidate gene association studies. Conducting power analyses for null effects may also contribute to low estimates in some cases, though determining when this has occurred is challenging. Importantly, however, power is far from adequate in every subfield.

Our analyses do not negate the importance of the original work in highlighting poor statistical practice in the field, but they do reveal a more nuanced picture. In such a diverse field as neuroscience, it is not surprising that statistical practices differ. Whereas Button et al. (2013) were careful to point out that they identified a range of powers in neuroscience, their reporting of a median result could be interpreted as implying that the results were drawn from a single distribution, which our analyses suggest is not the case. We confirm that low power is clearly present in many studies and agree that focusing on power is a critical step in improving the replicability and reliability of findings in neuroscience. However, we also argue that low statistical power in neuroscience is neither consistent nor universal.

Ethical issues accompany both underpowered and overpowered studies. Animal deaths, drugs taken to human trials, and government funding are all wasted if power is too low. However, blindly increasing sample size across the board simply to satisfy concerns about field-wide power failures is also not the best use of resources. Instead, each study design needs to be considered on its own merits. In this vein, one response to the original article pointed out that any measure of a study's projected value suffers from diminishing marginal returns: every additional animal or human participant adds less statistical value than the previous one (Bacchetti et al., 2005, 2008; Bacchetti, 2013).

Studies with extremely large sample sizes can also fall prey to statistically significant findings for trivial effects that are unlikely to be either theoretically or clinically important (Lenth, 2001; Ioannidis, 2005; Friston, 2012; Quinlan, 2013). In other words, the assessment of power is determined by what we consider to be an interesting (i.e., nontrivial) effect size (Cohen, 1988). This dependency means that power considerations are meaningless in the absence of assumptions about how large effect sizes need to be to be considered theoretically or clinically important, and this may vary dramatically across fields. This is particularly relevant in fields in which multiple comparisons are performed routinely, such as genetics and neuroimaging (Friston, 2012). Conversely, smaller studies can only detect large effect sizes and may suffer from imprecise estimates of effect size and interpretive difficulties. Crucially, there is no single study design that will optimize power for every genetic locus or brain area. In fact, power estimates for individual studies are themselves extremely noisy and may say little about the actual power of any given study. A move away from presenting only p-values and toward reporting point estimates and CIs (as long advocated by statisticians), and toward sharing data to improve such estimates, would allow researchers to make better informed decisions about whether an effect is likely to be clinically or theoretically useful.

Estimations of effect size

An important factor contributing to the estimation of power (at least using the approach followed here) is whether the effect size was estimated accurately a priori. If researchers initially overestimated the effect size, then even the sample size specified by a power calculation would be insufficient to detect a real, but smaller effect. Interestingly, our analysis also shows the existence of very-high-powered studies within neuroscience, in which far more subjects have been included than would technically be warranted by a power analysis. In this case, an a priori underestimate of effect size could yield a very-high-powered study if an effect proves to be larger than initially expected (which has occasionally been reported; Open Science Collaboration, 2015). Another important consideration is that an overestimation of effect size might occur due to publication bias, which will skew effect size estimates from meta-analyses upwards, resulting in an optimistic power estimate. This is an important caveat to the results that we report here: a bias toward publishing significant results means that the power estimates that we report will represent upper bounds on the true power statistics. Unfortunately, we could not adequately address this potential confound directly because tests of publication bias themselves have very low power, particularly if the number of studies in a meta-analysis is low. However, publication bias has long been reported in psychology (Francis, 2012) and neuroscience (Sena et al., 2010), so it is reasonable to assume that it has inflated estimates of statistical power in these analyses.

Conclusion

We have demonstrated the great diversity of statistical power in neuroscience. Do our findings lessen concerns about statistical power in neuroscience? Unfortunately not. In fact, the finding that the distribution of power is highly heterogeneous demonstrates an undesirable inconsistency both within and between methodological subfields. However, within this variability are several appropriately powered and even very-high-powered studies. Therefore, we should not tar all studies with the same brush, but instead should encourage investigators to engage in the best research practices, including preregistration of study protocols (ensuring that the study will have sufficient power), routine publication of null results, and avoiding practices such as p-hacking that lead to biases in the published literature.

Notes

Supplemental material for this article is available at http://osf.io/duyxb. Available are open data, including analysis scripts and meta-analysis data. This material has not been peer reviewed.

Footnotes

  • We thank Karl Friston, Geraint Rees, James Kilner, and Oliver Robinson for comments on an earlier draft of the manuscript; Katherine Button and Marcus Munafò for invaluable help with the replication portion of the analysis; Oon Him Peh for assistance with publication bias analyses; and the reviewers of the manuscript for helpful comments. J.P.R. is a consultant for Cambridge Cognition and Takeda. The remaining authors declare no competing financial interests.

Correspondence should be addressed to Camilla L. Nord, Institute of Cognitive Neuroscience, University College London, 17 Queen Square, London WC1N 3AZ, UK. camilla.nord.11@ucl.ac.uk

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

References

Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: SODA '07: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, pp 1027–1035. Society for Industrial and Applied Mathematics.
Babbage DR, Yim J, Zupan B, Neumann D, Tomita MR, Willer B (2011) Meta-analysis of facial affect recognition difficulties after traumatic brain injury. Neuropsychology 25:277–285. doi:10.1037/a0021908 pmid:21463043
Bacchetti P (2013) Small sample size is not the real problem. Nat Rev Neurosci 14:585. doi:10.1038/nrn3475-c3 pmid:23820775
Bacchetti P, Wolf LE, Segal MR, McCulloch CE (2005) Ethics and sample size. Am J Epidemiol 161:105–110. doi:10.1093/aje/kwi014 pmid:15632258
Bacchetti P, McCulloch CE, Segal MR (2008) Simple, defensible sample sizes based on cost efficiency. Biometrics 64:577–585; discussion 586–594. doi:10.1111/j.1541-0420.2008.01004_1.x pmid:18482055
Bai H (2011) Meta-analysis of 5,10-methylenetetrahydrofolate reductase gene polymorphism as a risk factor for ischemic cerebrovascular disease in a Chinese Han population. Neural Regen Res 6:277–285.
Barch DM, Yarkoni T (2013) Introduction to the special issue on reliability and replication in cognitive and affective neuroscience research. Cogn Affect Behav Neurosci 13:687–689. doi:10.3758/s13415-013-0201-7 pmid:23922199
Bishop CM (2006) Pattern recognition and machine learning. New York: Springer.
Björkhem-Bergman L, Asplund AB, Lindh JD (2011) Metformin for weight reduction in non-diabetic patients on antipsychotic drugs: a systematic review and meta-analysis. J Psychopharmacol 25:299–305. doi:10.1177/0269881109353461 pmid:20080925
Blei DM, Jordan MI (2006) Variational inference for Dirichlet process mixtures. Bayesian Anal 1:121–143. doi:10.1214/06-BA104
Bucossi S, Ventriglia M, Panetta V, Salustri C, Pasqualetti P, Mariani S, Siotto M, Rossini PM, Squitti R (2011) Copper in Alzheimer's disease: a meta-analysis of serum, plasma, and cerebrospinal fluid studies. J Alzheimers Dis 24:175–185. doi:10.3233/JAD-2010-101473 pmid:21187586
Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR (2013) Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14:365–376. doi:10.1038/nrn3475 pmid:23571845
Chamberlain SR, Robbins TW, Winder-Rhodes S, Müller U, Sahakian BJ, Blackwell AD, Barnett JH (2011) Translational approaches to frontostriatal dysfunction in attention-deficit/hyperactivity disorder using a computerized neuropsychological battery. Biol Psychiatry 69:1192–1203. doi:10.1016/j.biopsych.2010.08.019 pmid:21047621
Chang WP, Arfken CL, Sangal MP, Boutros NN (2011a) Probing the relative contribution of the first and second responses to sensory gating indices: a meta-analysis. Psychophysiology 48:980–992. doi:10.1111/j.1469-8986.2010.01168.x pmid:21214588
Chang XL, Mao XY, Li HH, Zhang JH, Li NN, Burgunder JM, Peng R, Tan EK (2011b) Functional parkin promoter polymorphism in Parkinson's disease: new data and meta-analysis. J Neurol Sci 302:68–71. doi:10.1016/j.jns.2010.11.023 pmid:21176923
Chen C, Xu T, Chen J, Zhou J, Yan Y, Lu Y, Wu S (2011) Allergy and risk of glioma: a meta-analysis. Eur J Neurol 18:387–395. doi:10.1111/j.1468-1331.2010.03187.x pmid:20722711
Chung AK, Chua SE (2011) Effects on prolongation of Bazett's corrected QT interval of seven second-generation antipsychotics in the treatment of schizophrenia: a meta-analysis. J Psychopharmacol 25:646–666. doi:10.1177/0269881110376685 pmid:20826552
Cohen J (1988) Statistical power analysis for the behavioral sciences, Ed 2. Hillsdale, NJ: Lawrence Erlbaum.
Cohen J (1994) The earth is round (p < 0.05). Am Psychol 49:997–1003. doi:10.1037/0003-066X.49.12.997
Colhoun HM, McKeigue PM, Smith GD (2003) Problems of reporting genetic associations with complex outcomes. Lancet 361:865–872. doi:10.1016/s0140-6736(03)12715-8 pmid:12642066
Corduneanu A, Bishop C (2001) Variational Bayesian model selection for mixture distributions. In: Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics, pp 27–34. Morgan Kaufmann.
Domellöf E, Johansson AM, Rönnqvist L (2011) Handedness in preterm born children: a systematic review and a meta-analysis. Neuropsychologia 49:2299–2310. doi:10.1016/j.neuropsychologia.2011.04.033 pmid:21601584
Duncan LE, Keller MC (2011) A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry. Am J Psychiatry 168:1041–1049. doi:10.1176/appi.ajp.2011.11020191 pmid:21890791
Etminan N, Vergouwen MD, Ilodigwe D, Macdonald RL (2011) Effect of pharmaceutical treatment on vasospasm, delayed cerebral ischemia, and clinical outcome in patients with aneurysmal subarachnoid hemorrhage: a systematic review and meta-analysis. J Cereb Blood Flow Metab 31:1443–1451. doi:10.1038/jcbfm.2011.7 pmid:21285966
Feng XL, Wang F, Zou YF, Li WF, Tian YH, Pan FM, Huang F (2011) Association of FK506 binding protein 5 (FKBP5) gene rs4713916 polymorphism with mood disorders: a meta-analysis. Acta Neuropsychiatr 23:12–19. doi:10.1111/j.1601-5215.2010.00514.x pmid:25379692
Flint J, Munafò MR (2013) Candidate and non-candidate genes in behavior genetics. Curr Opin Neurobiol 23:57–61. doi:10.1016/j.conb.2012.07.005 pmid:22878161
Francis G (2012) Too good to be true: publication bias in two prominent studies from experimental psychology. Psychon Bull Rev 19:151–156. doi:10.3758/s13423-012-0227-9 pmid:22351589
Friston K (2012) Ten ironic rules for non-statistical reviewers. Neuroimage 61:1300–1310. doi:10.1016/j.neuroimage.2012.04.018 pmid:22521475
Fryar CD, Gu Q, Ogden CL (2012) Anthropometric reference data for children and adults: United States, 2007–2010. Washington, DC: U.S. Department of Health and Human Services.
Gershman SJ, Blei DM (2012) A tutorial on Bayesian nonparametric models. J Math Psychol 56:1–12. doi:10.1016/j.jmp.2011.08.004
Green MJ, Matheson SL, Shepherd A, Weickert CS, Carr VJ (2011) Brain-derived neurotrophic factor levels in schizophrenia: a systematic review with meta-analysis. Mol Psychiatry 16:960–972. doi:10.1038/mp.2010.88 pmid:20733577
Han XM, Wang CH, Sima X, Liu SY (2011) Interleukin-6-174G/C polymorphism and the risk of Alzheimer's disease in Caucasians: a meta-analysis. Neurosci Lett 504:4–8. doi:10.1016/j.neulet.2011.06.055 pmid:21767605
Hannestad J, DellaGioia N, Bloch M (2011) The effect of antidepressant medication treatment on serum levels of inflammatory cytokines: a meta-analysis. Neuropsychopharmacology 36:2452–2459. doi:10.1038/npp.2011.132 pmid:21796103
Hua Y, Zhao H, Kong Y, Ye M (2011) Association between the MTHFR gene and Alzheimer's disease: a meta-analysis. Int J Neurosci 121:462–471. doi:10.3109/00207454.2011.578778 pmid:21663380
Hyndman RJ (1996) Computing and graphing highest density regions. Am Stat 50:120–126.
Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2:e124. doi:10.1371/journal.pmed.0020124 pmid:16060722
Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795. doi:10.1080/01621459.1995.10476572
Klerk M, Verhoef P, Clarke R, Blom HJ, Kok FJ, Schouten EG (2002) MTHFR 677C→T polymorphism and risk of coronary heart disease: a meta-analysis. JAMA 288:2023–2031. pmid:12387655
Kurihara K, Welling M, Teh YW (2007) Collapsed variational Dirichlet process mixture models. In: IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp 2796–2801. Morgan Kaufmann.
Lenth RV (2001) Some practical guidelines for effective sample size determination. Am Stat 55:187–193. doi:10.1198/000313001317098149
Lindson N, Aveyard P (2011) An updated meta-analysis of nicotine preloading for smoking cessation: investigating mediators of the effect. Psychopharmacology (Berl) 214:579–592. doi:10.1007/s00213-010-2069-3 pmid:21060996
Liu H, Liu M, Wang Y, Wang XM, Qiu Y, Long JF, Zhang SP (2011a) Association of 5-HTT gene polymorphisms with migraine: a systematic review and meta-analysis. J Neurol Sci 305:57–66. doi:10.1016/j.jns.2011.03.016 pmid:21450309
Liu J, Sun QY, Tang BS, Hu L, Yu RH, Wang L, Shi CH, Yan XX, Pan Q, Xia K, Guo JF (2011b) PITX3 gene polymorphism is associated with Parkinson's disease in Chinese population. Brain Res 1392:116–120. doi:10.1016/j.brainres.2011.03.064 pmid:21524731
Lubke GH, Muthén B (2005) Investigating population heterogeneity with factor mixture models. Psychol Methods 10:21–39. doi:10.1037/1082-989X.10.1.21 pmid:15810867
MacKillop J, Amlung MT, Few LR, Ray LA, Sweet LH, Munafò MR (2011) Delayed reward discounting and addictive behavior: a meta-analysis. Psychopharmacology (Berl) 216:305–321. doi:10.1007/s00213-011-2229-0 pmid:21373791
Maneeton N, Maneeton B, Srisurapanont M, Martin SD (2011) Bupropion for adults with attention-deficit hyperactivity disorder: meta-analysis of randomized, placebo-controlled trials. Psychiatry Clin Neurosci 65:611–617. doi:10.1111/j.1440-1819.2011.02264.x pmid:22176279
Murphy KP (2012) Machine learning: a probabilistic perspective. Cambridge, MA: MIT.
Nosek BA, Spies JR, Motyl M (2012) Scientific utopia II. Restructuring incentives and practices to promote truth over publishability. Perspect Psychol Sci 7:615–631. doi:10.1177/1745691612459058 pmid:26168121
Ohi K, Hashimoto R, Yasuda Y, Fukumoto M, Yamamori H, Umeda-Yano S, Kamino K, Ikezawa K, Azechi M, Iwase M, Kazui H, Kasai K, Takeda M (2011) The SIGMAR1 gene is associated with a risk of schizophrenia and activation of the prefrontal cortex. Prog Neuropsychopharmacol Biol Psychiatry 35:1309–1315. doi:10.1016/j.pnpbp.2011.04.008 pmid:21549171
O'Keefe DJ (2007) Brief report: post hoc power, observed power, a priori power, retrospective power, prospective power, achieved power: sorting out appropriate uses of statistical power analyses. Commun Methods Meas 1:291–299. doi:10.1080/19312450701641375
Olabi B, Ellison-Wright I, McIntosh AM, Wood SJ, Bullmore E, Lawrie SM (2011) Are there progressive brain changes in schizophrenia? A meta-analysis of structural magnetic resonance imaging studies. Biol Psychiatry 70:88–96. doi:10.1016/j.biopsych.2011.01.032 pmid:21457946
Oldershaw A, Hambrook D, Stahl D, Tchanturia K, Treasure J, Schmidt U (2011) The socio-emotional processing stream in anorexia nervosa. Neurosci Biobehav Rev 35:970–988. doi:10.1016/j.neubiorev.2010.11.001 pmid:21070808
Oliver BJ, Kohli E, Kasper LH (2011) Interferon therapy in relapsing-remitting multiple sclerosis: a systematic review and meta-analysis of the comparative trials. J Neurol Sci 302:96–105. doi:10.1016/j.jns.2010.11.003 pmid:21167504
Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349:aac4716. doi:10.1126/science.aac4716 pmid:26315443
Peerbooms OL, van Os J, Drukker M, Kenis G, Hoogveld L, de Hert M, Delespaul P, van Winkel R, Rutten BP (2011) Meta-analysis of MTHFR gene variants in schizophrenia, bipolar disorder and unipolar depressive disorder: evidence for a common genetic vulnerability? Brain Behav Immun 25:1530–1543. doi:10.1016/j.bbi.2010.12.006 pmid:21185933
Pizzagalli DA (2011) Frontocingulate dysfunction in depression: toward biomarkers of treatment response. Neuropsychopharmacology 36:183–206. doi:10.1038/npp.2010.166 pmid:20861828
Quinlan PT (2013) Misuse of power: in defence of small-scale science. Nat Rev Neurosci 14:585. doi:10.1038/nrn3475-c1 pmid:23820772
Rist PM, Diener HC, Kurth T, Schürks M (2011) Migraine, migraine aura, and cervical artery dissection: a systematic review and meta-analysis. Cephalalgia 31:886–896. doi:10.1177/0333102411401634 pmid:21511950
Samworth R, Wand M (2010) Asymptotics and optimal bandwidth selection for highest density region estimation. Ann Stat 38:1767–1792. doi:10.1214/09-AOS766
Sena ES, van der Worp HB, Bath PM, Howells DW, Macleod MR (2010) Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol 8:e1000344. doi:10.1371/journal.pbio.1000344 pmid:20361022
Sexton CE, Kalu UG, Filippini N, Mackay CE, Ebmeier KP (2011) A meta-analysis of diffusion tensor imaging in mild cognitive impairment and Alzheimer's disease. Neurobiol Aging 32:2322.e5–e18. doi:10.1016/j.neurobiolaging.2010.05.019 pmid:20619504
Shum D, Levin H, Chan RC (2011) Prospective memory in patients with closed head injury: a review. Neuropsychologia 49:2156–2165. doi:10.1016/j.neuropsychologia.2011.02.006 pmid:21315750
Sim H, Shin BC, Lee MS, Jung A, Lee H, Ernst E (2011) Acupuncture for carpal tunnel syndrome: a systematic review of randomized controlled trials. J Pain 12:307–314. doi:10.1016/j.jpain.2010.08.006 pmid:21093382
Song F, Poljak A, Valenzuela M, Mayeux R, Smythe GA, Sachdev PS (2011) Meta-analysis of plasma amyloid-β levels in Alzheimer's disease. J Alzheimers Dis 26:365–375. doi:10.3233/JAD-2011-101977 pmid:21709378
Sun Q, Fu Y, Sun A, Shou Y, Zheng M, Li X, Fan D (2011) Correlation of E-selectin gene polymorphisms with risk of ischemic stroke: a meta-analysis. Neural Regen Res 6.
    1. Tian Y,
    2. Kang L,
    3. Wang H,
    4. Liu Z
    (2011) Meta-analysis of transcranial magnetic stimulation to treat post-stroke dysfunction. Neural Regen Res 6.
    1. Trzesniak C,
    2. Kempton MJ,
    3. Busatto GF,
    4. de Oliveira IR,
    5. Galvão-de Almeida A,
    6. Kambeitz J,
    7. Ferrari MC,
    8. Filho AS,
    9. Chagas MH,
    10. Zuardi AW,
    11. Hallak JE,
    12. McGuire PK,
    13. Crippa JA
    (2011) Adhesio interthalamica alterations in schizophrenia spectrum disorders: A systematic review and meta-analysis. Prog Neuropsychopharmacol Biol Psychiatry 35:877–886. doi:10.1016/j.pnpbp.2010.12.024 pmid:21300129
    OpenUrlCrossRefPubMed
    1. Veehof MM,
    2. Oskam MJ,
    3. Schreurs KM,
    4. Bohlmeijer ET
    (2011) Acceptance-based interventions for the treatment of chronic pain: a systematic review and meta-analysis. Pain 152:533–542. doi:10.1016/j.pain.2010.11.002 pmid:21251756
    OpenUrlCrossRefPubMed
    1. Vergouwen MD,
    2. Etminan N,
    3. Ilodigwe D,
    4. Macdonald RL
    (2011) Lower incidence of cerebral infarction correlates with improved functional outcome after aneurysmal subarachnoid hemorrhage. J Cereb Blood Flow Metab 31:1545–1553. doi:10.1038/jcbfm.2011.56 pmid:21505477
    OpenUrlCrossRefPubMed
    1. Vieta E,
    2. Günther O,
    3. Locklear J,
    4. Ekman M,
    5. Miltenburger C,
    6. Chatterton ML,
    7. Åström M,
    8. Paulsson B
    (2011) Effectiveness of psychotropic medications in the maintenance phase of bipolar disorder: a meta-analysis of randomized controlled trials. Int J Neuropsychopharmacol 14:1029–1049. doi:10.1017/S1461145711000885 pmid:21733231
    OpenUrlCrossRefPubMed
    1. Wand MP,
    2. Marron JS,
    3. Ruppert D
    (1991) Transformations in density estimation. J Am Stat Assoc 86:343–353. doi:10.1080/01621459.1991.10475041
    OpenUrlCrossRef
    1. Wisdom NM,
    2. Callahan JL,
    3. Hawkins KA
    (2011) The effects of apolipoprotein E on non-impaired cognitive functioning: a meta-analysis. Neurobiol Aging 32:63–74. doi:10.1016/j.neurobiolaging.2009.02.003 pmid:19285755
    OpenUrlCrossRefPubMed
    1. Witteman J,
    2. van Ijzendoorn MH,
    3. van de Velde D,
    4. van Heuven VJ,
    5. Schiller NO
    (2011) The nature of hemispheric specialization for linguistic and emotional prosodic perception: a meta-analysis of the lesion literature. Neuropsychologia 49:3722–3738. doi:10.1016/j.neuropsychologia.2011.09.028 pmid:21964199
    OpenUrlCrossRefPubMed
    1. Woon F,
    2. Hedges DW
    (2011) Gender does not moderate hippocampal volume deficits in adults with posttraumatic stress disorder: A meta-analysis. Hippocampus 21:243–252. doi:10.1002/hipo.20746 pmid:20882539
    OpenUrlCrossRefPubMed
  32. ↵
    1. Xuan C,
    2. Zhang BB,
    3. Li M,
    4. Deng KF,
    5. Yang T,
    6. Zhang XE
    (2011) No association between APOE epsilon 4 allele and multiple sclerosis susceptibility: a meta-analysis from 5472 cases and 4727 controls. J Neurol Sci 308:110–116. doi:10.1016/j.jns.2011.05.040 pmid:21679970
    OpenUrlCrossRefPubMed
    1. Yang W,
    2. Kong F,
    3. Liu M,
    4. Hao Z
    (2011a) Systematic review of risk factors for progressive ischemic stroke. Neural Regen Res 6:346–352.
    OpenUrl
    1. Yang Z,
    2. Li W,
    3. Huang T,
    4. Chen J,
    5. Zhang X
    (2011b) Meta-analysis of Ginkgo biloba extract for the treatment of Alzheimer's disease. Neural Regen Res 6:1125–1129.
    OpenUrl
    1. Yuan H,
    2. Yang X,
    3. Kang H,
    4. Cheng Y,
    5. Ren H,
    6. Wang X
    (2011) Meta-analysis of tau genetic polymorphism and sporadic progressive supranuclear palsy susceptibility. Neural Regen Res 6:353–359.
    OpenUrl
    1. Zafar SN,
    2. Iqbal A,
    3. Farez MF,
    4. Kamatkar S,
    5. de Moya MA
    (2011) Intensive insulin therapy in brain injury: a meta-analysis. J Neurotrauma 28:1307–1317. doi:10.1089/neu.2010.1724 pmid:21534731
    OpenUrlCrossRefPubMed
    1. Zhang Y,
    2. Zhang J,
    3. Tian C,
    4. Xiao Y,
    5. Li X,
    6. He C,
    7. Huang J,
    8. Fan H
    (2011) The 1082G/A polymorphism in IL-10 gene is associated with risk of Alzheimer's disease: a meta-analysis. J Neurol Sci 303:133–138. doi:10.1016/j.jns.2010.12.005 pmid:21255795
    OpenUrlCrossRefPubMed
    1. Zhu Y,
    2. He ZY,
    3. Liu HN
    (2011) Meta-analysis of the relationship between homocysteine, vitamin B 12, folate, and multiple sclerosis. J Clin Neurosci 18:933–938. doi:10.1016/j.jocn.2010.12.022 pmid:21570300
    OpenUrlCrossRefPubMed
Keywords: neuroscience, power, statistics