A statistic to estimate the variance of the histogram-based mutual information estimator based on dependent pairs of observations
Introduction
To estimate time delays between recordings of electroencephalogram (EEG) signals, Mars and van Arragon introduced a method based on maximum mutual information [18]. Consider two stationary and ergodic stochastic processes x and y and the point measurements (samples) xn=x(nΔt) and yn=y(nΔt). Because the meaning of the term "sample" depends on the context (signal processing or statistics), we use the term point measurements instead. The mutual information I(xn; yn+m), given an observed sequence of pairs of point measurements (xn, yn+m) where 1⩽n⩽N, is estimated as a function of the lag m. Due to the stationarity of these processes this mutual information is independent of n. The lag for which the mutual information reaches its maximum is considered to be the delay of y with respect to x. Maximum mutual information is typically used in the alignment of 1D (time signals), 2D (images) or even 3D signals.
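The procedure above can be sketched in a few lines of Python: estimate the mutual information on a joint histogram for each candidate lag and take the maximizing lag. The bin count, signal length and the toy delayed-signal model below are illustrative choices, not those of the paper.

```python
import numpy as np

def mi_hist(x, y, bins=16):
    """Histogram-based (plug-in) mutual information estimate in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # empty cells contribute nothing to the sum
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def estimate_delay(x, y, max_lag=20, bins=16):
    """Return the lag m maximizing I(x_n; y_{n+m}) over -max_lag..max_lag."""
    lags = list(range(-max_lag, max_lag + 1))
    mis = []
    for m in lags:
        if m >= 0:
            xs, ys = x[:len(x) - m], y[m:]
        else:
            xs, ys = x[-m:], y[:len(y) + m]
        mis.append(mi_hist(xs, ys, bins))
    return lags[int(np.argmax(mis))]

# Toy example: y is x delayed by 5 samples plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.roll(x, 5) + 0.1 * rng.normal(size=5000)
print(estimate_delay(x, y))  # expect a lag near 5
```

In practice the bin count and lag range must be chosen for the data at hand; too many bins with too few observations inflates both bias and variance, which is the subject of this paper.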
To validate the estimated lag, it is necessary to know the accuracy, e.g. the variance, of the mutual information estimates. In practical situations, e.g. the analysis of EEG signals, the estimation of this variance is complicated because, in general, subsequent pairs of point measurements are not independent. Therefore, we need a statistic to estimate the variance of the mutual information estimator in the case of dependent pairs of observations.
The concept of mutual information originates from the theory of communication. The mutual information measures the information about one random variable contained in another random variable. The definition of mutual information goes back to Shannon [28,29,30]. The theory was extended and generalized by Gel'fand and Yaglom [6]. The mutual information of two discrete random variables x and y is defined by

I(x; y) = ∑i ∑j pij ln( pij / (pi· p·j) ),

where pij=Pr{x=Xi, y=Yj}, p·j=Pr{y=Yj} and pi·=Pr{x=Xi}. The outcomes Xi and Yj are the possible (discrete) values of x and y, e.g. x∈{X1,…,XI} and y∈{Y1,…,YJ}. The mutual information between continuous random variables can be estimated by the same technique after quantization of the random variables. We assume base e for the logarithm; consequently, the unit of measurement is the nat. Base 2 (bit) and base 10 (Hartley) can also be used. The mutual information measures the dependence of x and y: I(x; y)=0 if and only if x and y are independent, otherwise I(x; y)>0. The mutual information is invariant under any transformation of the arguments having a unique inverse (bijection). The mutual information reaches a maximum if there is a bijective relation between x and y. We can express the mutual information using entropies:

I(x; y) = H(x) + H(y) − H(x, y),

where

H(x) = −∑i pi· ln pi·,  H(y) = −∑j p·j ln p·j,  H(x, y) = −∑i ∑j pij ln pij.

There exist essentially three different methods to estimate mutual information:
1. histogram-based estimators [20,21],
2. kernel-based estimators [18,22], and
3. parametric methods.
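For the first, histogram-based approach the definition above translates directly into code. The sketch below (NumPy; the bin count and the toy correlated pair are illustrative choices, not the paper's) also verifies the entropy decomposition I(x; y) = H(x) + H(y) − H(x, y):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a (possibly multi-dimensional) pmf."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(1)
x = rng.normal(size=10000)
y = 0.8 * x + 0.6 * rng.normal(size=10000)  # correlated toy pair

# Quantize the continuous pair onto a joint histogram.
pxy, _, _ = np.histogram2d(x, y, bins=12)
pxy /= pxy.sum()
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# Direct plug-in estimate: sum_ij pij * ln( pij / (pi. * p.j) )
nz = pxy > 0
outer = np.outer(px, py)
mi_direct = float(np.sum(pxy[nz] * np.log(pxy[nz] / outer[nz])))

# Entropy form: I(x;y) = H(x) + H(y) - H(x,y)
mi_entropy = entropy(px) + entropy(py) - entropy(pxy)

print(mi_direct, mi_entropy)  # the two forms agree to rounding error
```

The two expressions are algebraically identical, so they serve as a useful cross-check of an implementation.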
Mutual information is often used to replace the correlation coefficient. Unlike the correlation coefficient, the mutual information does not presuppose a linear relation between x and y. Moreover, the possible outcomes of these random variables need not be numerical or ordered (e.g. words in language processing). Due to these properties, mutual information can be applied as a correlator in many areas where the correlation coefficient fails. To compare mutual information estimates with each other, it is necessary to know whether differences are significant or whether they are caused by the bias of the statistic or by statistical fluctuations.
The histogram-based statistic to estimate mutual information suffers from:
1. variance,
2. bias caused by the finite number of observations,
3. bias caused by quantization, and
4. bias caused by the finite histogram.
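Item 2 can be made concrete by simulation: for independent x and y the true mutual information is zero, yet the plug-in estimate is positive on average. A first-order analysis of Miller–Madow type (stated here as general background, not a result of this paper) predicts a bias of roughly (I−1)(J−1)/(2N) nats for an I×J histogram with N observations:

```python
import numpy as np

rng = np.random.default_rng(2)

def mi_hist(x, y, bins):
    """Histogram-based (plug-in) mutual information estimate in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

# Independent Gaussian pairs: true MI is 0, but each estimate is >= 0
# (the plug-in MI is a KL divergence), so the mean is biased upwards.
N, bins, trials = 500, 8, 200
est = [mi_hist(rng.normal(size=N), rng.normal(size=N), bins)
       for _ in range(trials)]
print(np.mean(est))                       # clearly above zero
print((bins - 1) * (bins - 1) / (2 * N))  # = 0.049, first-order prediction
```

The first-order formula is only a rough guide when many histogram cells are sparsely occupied, which is precisely the regime where careful bias and variance analysis matters.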
Mutual information is used in several areas of science: language processing [4,14,15], speech processing [24], image processing [8,16,17,25,32], analyses of non-linear systems [10,11,13], time delay estimation [11,18,20,21], neural networks [1,2] and bio-medical applications [27].
In Section 2 we develop the theory to estimate the variance of the mutual information estimator in the case of dependent pairs. For this, we need a reliable estimator for the variance of the sample mean in the case of dependent point measurements of a stationary stochastic signal. In Section 3, we evaluate three methods to estimate this variance. Finally, we verify our theory by simulations in Section 4.
The variance of the mutual information estimator
In this section we show that both the entropy estimator and the mutual information estimator are approximately sample means of stochastic signals which can be derived from x and y. Applying the estimator for the variance of the sample mean (to be reviewed in Section 3) to these derived signals provides an estimator for the variance of the entropy estimator and for the variance of the mutual information estimator in the case of dependent pairs of point measurements.
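One way to read this construction (a sketch under my interpretation of the text, not the paper's exact derivation): assign to each observed pair the log-likelihood ratio of its histogram cell. The sample mean of this derived signal equals the plug-in mutual information estimate, so variance-of-sample-mean machinery can be applied to it:

```python
import numpy as np

rng = np.random.default_rng(3)
N, bins = 4000, 10
x = rng.normal(size=N)
y = 0.7 * x + 0.7 * rng.normal(size=N)

# Histogram cell probabilities, and the cell each observation fell into.
pxy, xe, ye = np.histogram2d(x, y, bins=bins)
pxy /= pxy.sum()
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
ix = np.clip(np.searchsorted(xe, x, side="right") - 1, 0, bins - 1)
iy = np.clip(np.searchsorted(ye, y, side="right") - 1, 0, bins - 1)

# Derived signal: z_n = ln( pij / (pi. * p.j) ) at the occupied cell.
# Its sample mean is exactly the plug-in mutual information estimate.
z = np.log(pxy[ix, iy] / (px[ix] * py[iy]))
mi_from_mean = z.mean()

nz = pxy > 0
mi_plugin = float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))
print(mi_from_mean, mi_plugin)  # identical by construction

# For independent pairs, Var(MI-hat) could be estimated as z.var(ddof=1)/N;
# for dependent pairs, the variance-of-sample-mean estimators of Section 3
# are applied to the derived signal z instead.
```

(This ignores the dependence of the estimated cell probabilities on the same data, which is where the bias terms of the paper enter.)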
The variance of the sample mean
Our method to derive statistics to estimate the variance of the entropy and the mutual information estimator depends on the availability of reliable statistics to estimate the variance of a sample mean in the case of dependent point measurements. Therefore, we review in this section three statistics to estimate the variance of the sample mean.
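As one concrete instance of the first such statistic (windowing of the correlation function), the sketch below sums Bartlett-windowed sample autocovariances. The window, the truncation point M ≈ √N and the AR(1) test signal are illustrative choices, not necessarily those of the paper:

```python
import numpy as np

def var_of_mean(u, max_lag=None):
    """Variance of the sample mean of a dependent, stationary series,
    via a windowed sum of sample autocovariances:
        Var(u_bar) ~= (1/N) * sum_{k=-M..M} w_k * c(k)."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    if max_lag is None:
        max_lag = int(np.sqrt(N))  # heuristic truncation point
    d = u - u.mean()
    c = np.array([np.dot(d[:N - k], d[k:]) / N for k in range(max_lag + 1)])
    w = 1.0 - np.arange(max_lag + 1) / (max_lag + 1)  # Bartlett window
    return (c[0] + 2.0 * np.sum(w[1:] * c[1:])) / N

# Check against the known answer for an AR(1) process u_n = a*u_{n-1} + e_n:
# Var(u_bar) -> sigma_e^2 / ((1-a)^2 * N) for large N.
rng = np.random.default_rng(4)
a, N, trials = 0.6, 2000, 300
means, ests = [], []
for _ in range(trials):
    e = rng.normal(size=N)
    u = np.empty(N)
    u[0] = e[0]
    for n in range(1, N):
        u[n] = a * u[n - 1] + e[n]
    means.append(u.mean())
    ests.append(var_of_mean(u))
print(np.var(means), np.mean(ests))  # both near 1/((1-0.6)^2 * 2000)
```

The naive estimate c(0)/N would be badly wrong here, since positive correlation between successive measurements inflates the variance of the mean; that is exactly the situation described in the introduction.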
Assume N point measurements un and vn, where 1⩽n⩽N, of the stationary stochastic processes u and v. In this section, we adopt the convention that the
Simulations
The variance of the random variable average log-likelihood ratio on the right-hand side of Eq. (16) can be estimated from experimental data with methods (a)–(c) of Section 3. This was done for 100 experiments; the results and the associated accuracies are given in Table 2. This table is the analogue of Table 1. The signal model used is … where ε and μ are independent and both normally distributed white noise. The mutual information is
Conclusions
We have reviewed three statistics to estimate the variance of the sample mean of dependent point measurements of a stationary stochastic signal. These statistics are based on: windowing of the correlation function, the jackknife method and AR-models. If the point measurements are not strongly correlated, all methods produce acceptable results. In our simulations the results using the AR-method are slightly better than the others.
These statistics to estimate the variance were used to derive …
References (32)
- et al., Time delay estimation in non-linear systems using average amount of mutual information analysis, Signal Processing (1982)
- On estimation of entropy and mutual information of continuous distributions, Signal Processing (1989)
- Using mutual information for selecting features in supervised neural net learning, IEEE Trans. Neural Networks (1994)
- B.V. Bonnlander, A.S. Weigend, Selecting input variables using mutual information and nonparametric density estimation, …
- S. Brandt, Statistical and Computational Methods in Data Analysis, 2nd ed., North-Holland, Amsterdam, 1976, ISBN …
- Class based n-gram models of natural language, Comput. Linguistics (1992)
- B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Monographs on Statistics and Applied Probability, Vol. 57, …
- et al., Calculation of amount of information about a random function contained in another such function, Amer. Math. Soc. Trans. (1959)
- On the use of windows for harmonic analysis with discrete Fourier transform, Proc. IEEE (1978)
- P. Hastreiter et al., Fast mutual information based registration and fusion of registered tomographic image data, in: …
- Zur Bestimmung von Totzeiten dynamischer nichtlinearer Übertragungssysteme mittels Entropiefunktionen [On the determination of dead times of dynamic non-linear transfer systems by means of entropy functions], Automatisierungstechnik