FlashReport
Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median

https://doi.org/10.1016/j.jesp.2013.03.013

Abstract

A survey revealed that researchers still seem to have difficulty coping with outliers. Detecting outliers by constructing an interval spanning the mean plus or minus three standard deviations remains common practice. However, since both the mean and the standard deviation are particularly sensitive to outliers, this method is problematic. We highlight the disadvantages of this method and present the median absolute deviation, an alternative and more robust measure of dispersion that is easy to implement. We also explain how to calculate this indicator in SPSS and R.

Section snippets

The mean plus or minus three standard deviations

Whatever the decision to remove, correct or leave an outlier (for a discussion of this topic see McClelland, 2000), it is necessary to be able to detect its presence. The method of the mean plus or minus three SD is based on the characteristics of a normal distribution, for which 99.73% of the data lie within this range (Howell, 1998). Therefore, the decision to remove values that occur in only 0.27% of all cases does not seem too conservative. Other authors (e.g.,
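To make the rule concrete, here is a minimal Python sketch of the mean plus or minus three SD criterion. The function name `sd_rule_outliers`, the parameter `k`, and the data are hypothetical illustrations, not taken from the article. The sketch assumes the sample standard deviation (n - 1 denominator).

```python
import statistics

def sd_rule_outliers(values, k=3.0):
    """Return the values lying outside mean +/- k sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    lower, upper = mean - k * sd, mean + k * sd
    return [v for v in values if v < lower or v > upper]

# Hypothetical data: the last value is far from the rest.
sample = [1, 3, 3, 6, 8, 10, 10, 1000]
flagged = sd_rule_outliers(sample)
print(flagged)  # prints []
```

In this small sample the extreme value inflates both the mean and the SD so much that it falls inside its own interval, so nothing is flagged. This is one way to see why an interval built from non-robust estimates can mask the very observations it is meant to detect.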

An alternative: the median absolute deviation (MAD)

Absolute deviation from the median was (re-)discovered and popularized by Hampel (1974) who attributes the idea to Carl Friedrich Gauss (1777–1855). The median (M) is, like the mean, a measure of central tendency but offers the advantage of being very insensitive to the presence of outliers. One indicator of this insensitivity is the “breakdown point” (see, e.g., Donoho & Huber, 1983). The estimator's breakdown point is the maximum proportion of observations that can be contaminated (i.e., set
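The contrast in sensitivity between the mean and the median can be illustrated with a short Python sketch. The helper `contaminate`, the value `1e6`, and the data are hypothetical. The sketch relies on the commonly stated result that the median tolerates contamination of up to half of the observations, whereas a single extreme value is enough to move the mean arbitrarily far.

```python
import statistics

def contaminate(values, n_bad, bad_value=1e6):
    """Return a copy of values with the last n_bad entries replaced by bad_value."""
    if n_bad == 0:
        return list(values)
    return list(values[:-n_bad]) + [bad_value] * n_bad

clean = [2, 4, 4, 5, 6, 7, 9]  # hypothetical observations, already sorted

for n_bad in (0, 1, 3, 4):
    data = contaminate(clean, n_bad)
    # Print the number of contaminated points, then the mean and the median.
    print(n_bad, statistics.mean(data), statistics.median(data))
```

With one or three of the seven values contaminated, the median stays at 5 while the mean is dragged far away. Only when four of the seven values (more than half) are contaminated does the median itself jump to the contaminating value.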

Procedure implemented in the statistical software SPSS and R

SPSS (Statistical Package for the Social Sciences) is software commonly used by researchers in the social sciences. The procedure for calculating the MAD is simple: (a) compute the median using the menu “Analysis” and the command “Frequency”; (b) subtract this value from all observations in the statistical series, taking the absolute value of each difference, using the command “Compute” in the menu “Transform”; (c) compute the median of the resulting new variable as in the first point, and (d) multiply this value by 1.4826 (if
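Steps (a) to (d) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's SPSS or R procedure: the function names `mad` and `mad_outliers`, the default `threshold=2.5`, and the data are hypothetical. The deviations in step (b) are taken in absolute value, as the definition of the MAD requires, and the constant 1.4826 is the one named in the text (it is also the default `constant` of R's built-in `mad` function).

```python
import statistics

def mad(values, constant=1.4826):
    """Median absolute deviation, scaled by the constant named in the text."""
    med = statistics.median(values)                 # step (a): median
    abs_dev = [abs(v - med) for v in values]        # step (b): absolute deviations
    return constant * statistics.median(abs_dev)    # steps (c) and (d)

def mad_outliers(values, threshold=2.5):
    """Return values whose distance from the median exceeds threshold * MAD."""
    med = statistics.median(values)
    scale = mad(values)
    if scale == 0:
        # More than half of the values are identical; the ratio is undefined.
        raise ValueError("MAD is zero; the criterion cannot be applied")
    return [v for v in values if abs(v - med) / scale > threshold]

# Hypothetical data: the last value is far from the rest.
sample = [1, 3, 3, 6, 8, 10, 10, 1000]
print(mad(sample))
print(mad_outliers(sample))  # prints [1000]
```

Here the median is 7 and the median of the absolute deviations is 3.5, so the MAD is 3.5 × 1.4826. The extreme value lies far beyond 2.5 MAD units from the median and is flagged. The threshold is a judgment call and is left as a parameter.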

Discussion

Our survey of two journals pointed to poor management of outliers. We showed that the conventional method (the “mean plus or minus three standard deviations” rule) is problematic, and we argued in favor of a robust alternative. Finally, we explained that, whatever the method selected, the choice of exclusion criterion for outliers (a deviation of 3, 2.5 or 2 units) is necessarily subjective. This leads us to three important recommendations:

  • 1.

    In

References (11)

  • D. Cousineau et al. Outliers detection and treatment: A review. International Journal of Psychological Research (2010)
  • D.L. Donoho et al.
  • F.R. Hampel. The influence curve and its role in robust estimation. Journal of the American Statistical Association (1974)
  • D.C. Howell. Statistical methods in human sciences (1998)
  • P.J. Huber. Robust statistics (1981)
There are more references available in the full text version of this article.


Christophe Ley and Philippe Bernard thank the Fonds National de la Recherche Scientifique, Communauté Française de Belgique, for financial support via a Mandat de Chargé de Recherche FNRS.
