Signal Processing

Volume 75, Issue 1, 5 January 1999, Pages 51-63

A statistic to estimate the variance of the histogram-based mutual information estimator based on dependent pairs of observations

https://doi.org/10.1016/S0165-1684(98)00224-2

Abstract

In the case of two signals with independent pairs of observations $(x_n, y_n)$, a statistic to estimate the variance of the histogram-based mutual information estimator has been derived earlier. We present such a statistic for dependent pairs. To derive this statistic it is necessary to have a reliable statistic to estimate the variance of the sample mean in the case of dependent observations. We derive and discuss this statistic as well as a statistic to estimate the variance of the mutual information estimator. These statistics are validated by simulations.

Introduction

To estimate time delays between recordings of electroencephalogram (EEG) signals, Mars and van Arragon introduced a method based on maximum mutual information [18]. Consider two stationary and ergodic stochastic processes x and y and the point measurements (samples) $x_n = x(n\Delta t)$ and $y_n = y(n\Delta t)$. To avoid confusion about the meaning of the word samples, which depends on the context (signal processing or statistics), we use the term point measurements instead. The mutual information $I\{x_n; y_{n+m}\}$, given an observed sequence of pairs of point measurements $(x_n, y_n)$ with $1 \le n \le N$, is estimated as a function of the lag m. Due to the stationarity of these processes this mutual information is independent of n. The lag for which the mutual information reaches a maximum is considered to be the delay of y with respect to x. Maximum mutual information is typically used in the alignment of 1D (time signals), 2D (images) or even 3D signals.

To validate the estimated lag, it is necessary to know the accuracy, e.g. the variance, of the mutual information estimates. In practical situations, e.g. the analysis of electroencephalogram (EEG) signals, the estimation of this variance is complicated because in general the subsequent pairs of point measurements $(x_n, y_{n+m})$ are not independent. Therefore, we need a statistic to estimate the variance of the mutual information estimator in the case of dependent pairs of observations.

The concept of mutual information originates from the theory of communication. The mutual information measures the information about one random variable contained in another random variable. The definition of mutual information goes back to Shannon [28,29,30]. The theory was extended and generalized by Gel'fand and Yaglom [6]. The mutual information of two discrete random variables x and y is defined by
$$I\{x;y\} = \sum_{i,j} p_{ij} \log\frac{p_{ij}}{p_{\cdot j}\, p_{i\cdot}},$$
where $p_{ij} = \Pr\{x = X_i, y = Y_j\}$, $p_{\cdot j} = \Pr\{y = Y_j\}$ and $p_{i\cdot} = \Pr\{x = X_i\}$. The outcomes $X_i$ and $Y_j$ are the possible (discrete) values of x and y, e.g. $x \in \{X_1, \ldots, X_I\}$ and $y \in \{Y_1, \ldots, Y_J\}$. The mutual information between continuous random variables can be estimated by the same technique after quantization of the random variables. We assume base e for the logarithm, so the unit of measurement is the nat; base 2 (bit) and base 10 (Hartley) can also be used. The mutual information measures the dependence of x and y: $I\{x;y\} = 0$ if and only if x and y are independent, otherwise $I\{x;y\} > 0$. The mutual information is invariant under any transformation of the arguments that has a unique inverse (bijection), and it reaches a maximum if there is a bijective relation between x and y. We can express the mutual information using entropies:
$$I\{x;y\} = H\{x\} + H\{y\} - H\{x,y\},$$
where
$$H\{x\} = -\sum_i p_{i\cdot}\log p_{i\cdot}, \qquad H\{x,y\} = -\sum_{i,j} p_{ij}\log p_{ij}.$$
There exist essentially three different methods to estimate mutual information:

1. histogram-based estimators [20,21],
2. kernel-based estimators [18,22], and
3. parametric methods.

For the application of parametric methods a model of the stochastic processes is necessary. Both the histogram-based and the kernel-based estimator can be used as general-purpose tools. Because of its relative simplicity, the properties of the histogram-based estimator, such as bias, variance (for independent pairs of observations) and robustness, are fairly well known [20,21]. In this article we construct a statistic to estimate the variance which can be applied to sequences of dependent pairs of observations. The kernel-based estimators have too many adjustable parameters, such as the optimal kernel width and the optimal kernel form, and properties like bias and variance are still unknown. Especially in the case of the iterative determination of the optimal kernel form as proposed by Mars and van Arragon [18], the method is computationally intensive.
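
To make the plug-in estimator concrete, the following Python sketch computes the histogram-based estimate of $I\{x;y\}$ by replacing the probabilities $p_{ij}$, $p_{i\cdot}$ and $p_{\cdot j}$ in the definition above with relative cell frequencies. The function name, the use of numpy and the default number of bins are our own illustrative choices; the bias corrections discussed in [20,21] are not included.

```python
import numpy as np

def mi_hist(x, y, bins=16):
    """Plug-in (histogram-based) estimate of I{x;y} in nats.

    Relative cell frequencies replace p_ij, p_i. and p_.j in the
    definition of the mutual information; no bias correction is applied.
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_ij = counts / counts.sum()              # joint cell probabilities
    p_i = p_ij.sum(axis=1, keepdims=True)     # marginal of x (column vector)
    p_j = p_ij.sum(axis=0, keepdims=True)     # marginal of y (row vector)
    nz = p_ij > 0                             # empty cells contribute nothing
    return float(np.sum(p_ij[nz] * np.log(p_ij[nz] / (p_i @ p_j)[nz])))
```

For the time-delay application of the Introduction one would, for example, evaluate mi_hist(x[:N-m], y[m:]) over a range of lags m and take the maximizing lag as the delay estimate; this usage is our own illustration.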

Mutual information is often used to replace the correlation coefficient. Unlike the correlation coefficient, the mutual information does not presuppose a linear relation between x and y, and the possible outcomes of these random variables need not be numerical or ordered (e.g. words in language processing). Due to these properties mutual information can be applied as a correlator in many areas where the correlation coefficient fails. To compare mutual information estimates with one another, it is necessary to know whether differences are significant or whether they are caused by the bias of the statistic or by statistical fluctuations.

The histogram-based statistic to estimate mutual information suffers from

1. variance,
2. bias caused by the finite number of observations,
3. bias caused by quantization, and
4. bias caused by the finite histogram.

The relative contribution of these causes depends on the application; in particular it depends on the number of observations, the configuration of the histogram cells and the smoothness of the probability density function. The last two causes are independent of the number of observations and are only relevant in the case of continuous random variables. In the case of independent pairs of observations these error causes were studied in earlier work [20,21]. Both the variance and the bias caused by the finite number of observations depend on the number of observed pairs and on the dependencies amongst those pairs. An estimator for the variance in the case of dependent pairs will be presented. An estimator for the bias due to the finite number of observations in the case of dependent pairs of observations is unknown and is beyond the scope of this article. In a first-order approximation, this bias will probably depend only on the number of histogram cells and on the interdependence of the pairs $(x_n, y_n)$, and not on the joint distribution of $x_n$ and $y_n$.

Mutual information is used in several areas of science: language processing [4,14,15], speech processing [24], image processing [8,16,17,25,32], analysis of non-linear systems [10,11,13], time-delay estimation [11,18,20,21], neural networks [1,2] and biomedical applications [27].

In Section 2 we develop the theory to estimate the variance of the mutual information estimator in the case of dependent pairs. It is necessary to have a reliable estimator for the variance of the sample mean in the case of dependent point measurements of a stationary stochastic signal. In Section 3 we evaluate three methods to estimate this variance. Finally, we verify our theory by simulations in Section 4.

Section snippets

The variance of the mutual information estimator

In this section we show that both the entropy estimator and the mutual information estimator are approximately sample means of stochastic signals, which can be derived from x and y. Applying the estimator for the variance of the sample mean (to be reviewed in Section 3) to these stochastic signals provides us with an estimator for the variance of the entropy estimator and an estimator for the variance of the mutual information estimator in the case of dependent pairs of point measurements.

We
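
A minimal sketch of this idea, in Python and under our own naming, is given below: for every observed pair the log-likelihood ratio of its histogram cell is evaluated, so the resulting derived sequence has the plug-in mutual information estimate as its sample mean; the variance of that sample mean can then be estimated with the statistics reviewed in Section 3. This is an illustration, not the paper's exact statistic.

```python
import numpy as np

def llr_sequence(x, y, bins=16):
    """Derived signal z_n = log( p[i,j] / (p[i,.] * p[.,j]) ) evaluated at the
    histogram cell containing (x_n, y_n); mean(z) equals the plug-in
    mutual information estimate. Binning choices are illustrative."""
    counts, xe, ye = np.histogram2d(x, y, bins=bins)
    p_ij = counts / counts.sum()
    p_i = p_ij.sum(axis=1)
    p_j = p_ij.sum(axis=0)
    # bin index of each observation; the right-most edge is mapped into the last cell
    ix = np.clip(np.searchsorted(xe, x, side='right') - 1, 0, bins - 1)
    iy = np.clip(np.searchsorted(ye, y, side='right') - 1, 0, bins - 1)
    return np.log(p_ij[ix, iy] / (p_i[ix] * p_j[iy]))

# The mutual information estimate is the sample mean of the derived signal:
#   z = llr_sequence(x, y);  I_hat = z.mean()
# Var{I_hat} can then be estimated with the variance-of-sample-mean statistics of Section 3.
```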

The variance of the sample mean

Our method to derive statistics to estimate the variance of the entropy estimator and of the mutual information estimator depends on the availability of reliable statistics to estimate the variance of a sample mean in the case of dependent point measurements. Therefore, we review in this section three statistics to estimate the variance of the sample mean.

Assume N point measurements $u_n$ and $v_n$, where $1 \le n \le N$, of the stationary stochastic processes u and v. In this section, we adopt the convention that the
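
Of the three statistics reviewed here, the window-based one is the simplest to sketch. The following Python fragment is our own hedged illustration rather than the exact formulation used in the paper: it estimates the variance of the sample mean of a dependent, stationary sequence by summing its sample autocovariance over a finite window of lags; for window length zero it reduces to the familiar independent-data estimate.

```python
import numpy as np

def var_of_mean_windowed(u, L=20):
    """Estimate Var{ mean(u) } for a stationary but dependent sequence by
    summing the (biased) sample autocovariance over lags -L..L:
        Var{mean(u)} ~ (1/N) * ( c(0) + 2 * sum_{k=1..L} c(k) ).
    With L = 0 this reduces to the usual independent-data estimate c(0)/N.
    The window length L is a free parameter of this sketch."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    d = u - u.mean()
    c = np.array([np.dot(d[:N - k], d[k:]) / N for k in range(L + 1)])  # c(0..L)
    return (c[0] + 2.0 * c[1:].sum()) / N
```

A rectangular window is used here for brevity; in practice a tapered window is usually preferred, among other reasons to reduce the chance of a negative variance estimate.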

Simulations

The variance of the random variable average log-likelihood ratio on the right-hand side of Eq. (16) can be estimated from experimental data with methods (a)–(c) of Section 3. This was done for 100 experiments; the results and the associated accuracies are given in Table 2. This table is the analogue of Table 1. The signal model used is
$$x_n = \lambda x_{n-1} + \sqrt{1-\lambda^2}\,\varepsilon_n,\qquad y_n = \rho x_n + \sqrt{1-\rho^2}\,\nu_n,\qquad \nu_n = \lambda \nu_{n-1} + \sqrt{1-\lambda^2}\,\mu_n,$$
where ε and μ are independent and are both normally distributed white noise. The mutual information is $I\{x_n;y_n\} = -\tfrac{1}{2}\log$
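
For illustration, the following Python sketch generates data from the signal model above. The function name, the use of numpy and the parameter values are our own assumptions, chosen only to make the simulation setup concrete.

```python
import numpy as np

def simulate_pair(N, lam=0.5, rho=0.6, rng=None):
    """Generate one realization of the signal model
        x_n  = lam*x_{n-1}  + sqrt(1-lam^2)*eps_n
        nu_n = lam*nu_{n-1} + sqrt(1-lam^2)*mu_n
        y_n  = rho*x_n      + sqrt(1-rho^2)*nu_n
    with eps and mu independent, normally distributed white noise.
    Parameter values are illustrative only."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(N)
    mu = rng.standard_normal(N)
    x = np.zeros(N)
    nu = np.zeros(N)
    for n in range(1, N):
        x[n] = lam * x[n - 1] + np.sqrt(1.0 - lam**2) * eps[n]
        nu[n] = lam * nu[n - 1] + np.sqrt(1.0 - lam**2) * mu[n]
    y = rho * x + np.sqrt(1.0 - rho**2) * nu
    return x, y

# A Monte Carlo check in the spirit of Table 2 (our own usage example): repeat the
# experiment many times, estimate the mutual information and its variance for each
# realization, and compare the estimated variance with the empirical variance of the
# estimates over the repetitions.
```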

Conclusions

We have reviewed three statistics to estimate the variance of the sample mean of dependent point measurements of a stationary stochastic signal. These statistics are based on: windowing of the correlation function, the jackknife method and AR-models. If the point measurements are not strongly correlated, all methods produce acceptable results. In our simulations the results obtained with the AR-method are slightly better than those of the others.

These statistics to estimate the variance were used to derive

References (32)

  • N.J.I. Mars et al., Time delay estimation in non-linear systems using average amount of mutual information analysis, Signal Processing (1982).
  • R. Moddemeijer, On estimation of entropy and mutual information of continuous distributions, Signal Processing (1989).
  • R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Trans. Neural Networks (1994).
  • B.V. Bonnlander, A.S. Weigend, Selecting input variables using mutual information and nonparametric density estimation, ...
  • S. Brandt, Statistical and Computational Methods in Data Analysis, 2nd ed., North-Holland, Amsterdam, 1976. ISBN ...
  • P.F. Brown, Class-based n-gram models of natural language, Comput. Linguistics (1992).
  • B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Monographs on Statistics and Applied Probability, Vol. 57, ...
  • I.M. Gel'fand et al., Calculation of amount of information about a random function contained in another such function, Amer. Math. Soc. Trans. (1959).
  • F.J. Harris, On the use of windows for harmonic analysis with the discrete Fourier transform, Proc. IEEE (1978).
  • P. Hastreiter et al., Fast mutual information based registration and fusion of registered tomographic image data, in: ...
  • S. Haykin, S. Kesler, Prediction-error filtering and maximum entropy estimation, in: S. Haykin (Ed.), Nonlinear Methods ...
  • K. Henning, Die Entropie in der Systemtheorie, Technical Report, Rheinisch-Westfälische Technische Hochschule, Aachen, ...
  • K. Henning et al., Zur Bestimmung von Totzeiten dynamischer nichtlinearer Übertragungssysteme mittels Entropiefunktionen, Automatisierungstechnik (1986).
  • G.M. Jenkins, D.G. Watts, Spectral Analysis and its Applications, Holden-Day, San Francisco, CA, ...
  • S. Kutscha, Statistische Bewertungskriterien für die Entropieanalyse dynamischer Systeme, Ph.D. Thesis, ...
  • M.M. Lankhorst, R. Moddemeijer, Automatic word categorization: an information-theoretic approach, Technical Report, ...