Signal Processing

Volume 75, Issue 1, 5 January 1999, Pages 51-63

A statistic to estimate the variance of the histogram-based mutual information estimator based on dependent pairs of observations

https://doi.org/10.1016/S0165-1684(98)00224-2

Abstract

In the case of two signals with independent pairs of observations $(x_n, y_n)$, a statistic to estimate the variance of the histogram-based mutual information estimator has been derived earlier. We present such a statistic for dependent pairs. To derive this statistic it is necessary to have a reliable statistic to estimate the variance of the sample mean in the case of dependent observations. We derive and discuss this statistic as well as a statistic to estimate the variance of the mutual information estimator. These statistics are validated by simulations.

Introduction

To estimate time delays between recordings of electroencephalogram (EEG) signals, Mars and van Arragon introduced a method based on maximum mutual information [18]. Consider two stationary and ergodic stochastic processes x and y and the point measurements (samples) $x_n = x(n\Delta t)$ and $y_n = y(n\Delta t)$. To avoid confusion about the meaning of the word samples, which depends on the context (signal processing or statistics), we use the term point measurements instead. The mutual information $I\{x_n; y_{n+m}\}$, given an observed sequence of pairs of point measurements $(x_n, y_n)$ with $1 \le n \le N$, is estimated as a function of the lag m. Due to the stationarity of these processes this mutual information is independent of n. The lag for which the mutual information reaches a maximum is considered to be the delay of y with respect to x. Maximum mutual information is typically used in the alignment of 1D (time signals), 2D (images) or even 3D signals.

To validate the estimated lag, it is necessary to know the accuracy, e.g. the variance, of the mutual information estimates. In practical situations, e.g. the analysis of electroencephalogram (EEG) signals, the estimation of this variance is complicated because in general the subsequent pairs of point measurements $(x_n, y_{n+m})$ are not independent. Therefore, we need a statistic to estimate the variance of the mutual information estimator in the case of dependent pairs of observations.

The concept of mutual information originates from the theory of communication. The mutual information measures the information about one random variable contained in another random variable. The definition of mutual information goes back to Shannon [28,29,30]. The theory was extended and generalized by Gel'fand and Yaglom [6]. The mutual information of two discrete random variables x and y is defined by
$$I\{x;y\} = \sum_{i,j} p_{ij} \log\frac{p_{ij}}{p_{\cdot j}\, p_{i\cdot}},$$
where $p_{ij} = \Pr\{x = X_i, y = Y_j\}$, $p_{\cdot j} = \Pr\{y = Y_j\}$ and $p_{i\cdot} = \Pr\{x = X_i\}$. The outcomes $X_i$ and $Y_j$ are the possible (discrete) values of x and y, e.g. $x \in \{X_1, \ldots, X_I\}$ and $y \in \{Y_1, \ldots, Y_J\}$. The mutual information between continuous random variables can be estimated by the same technique after quantization of the random variables. We assume base e for the logarithm, so the unit of measurement is the nat; base 2 (bit) and base 10 (Hartley) can also be used. The mutual information measures the dependence of x and y: $I\{x;y\} = 0$ if and only if x and y are independent, otherwise $I\{x;y\} > 0$. The mutual information is invariant under any transformation of the arguments that has a unique inverse (bijection), and it reaches a maximum if there is a bijective relation between x and y. We can express the mutual information using entropies:
$$I\{x;y\} = H\{x\} + H\{y\} - H\{x,y\},$$
where
$$H\{x\} = -\sum_i p_{i\cdot}\log p_{i\cdot}, \qquad H\{x,y\} = -\sum_{i,j} p_{ij}\log p_{ij}.$$
There exist essentially three different methods to estimate mutual information:

1. histogram-based estimators [20,21],
2. kernel-based estimators [18,22], and
3. parametric methods.

For the application of parametric methods a model of the stochastic processes is necessary. Both the histogram-based and the kernel-based estimator can be used as general-purpose tools. Because of its relative simplicity, the properties of the histogram-based estimator, such as bias, variance (for independent pairs of observations) and robustness, are fairly well known [20,21]. In this article we construct a statistic to estimate the variance which can be applied to sequences of dependent pairs of observations. The kernel-based estimators have too many adjustable parameters, such as the optimal kernel width and the optimal kernel form, and properties like bias and variance are still unknown. Especially in the case of the iterative determination of the optimal kernel form as proposed by Mars and van Arragon [18], the method is computationally intensive.
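
To make the plug-in estimator concrete, the following Python sketch computes the histogram-based estimate of $I\{x;y\}$ by replacing the probabilities $p_{ij}$, $p_{i\cdot}$ and $p_{\cdot j}$ in the definition above with relative cell frequencies. The function name, the use of numpy and the default number of bins are our own illustrative choices; the bias corrections discussed in [20,21] are not included.

```python
import numpy as np

def mi_hist(x, y, bins=16):
    """Plug-in (histogram-based) estimate of I{x;y} in nats.

    Relative cell frequencies replace p_ij, p_i. and p_.j in the
    definition of the mutual information; no bias correction is applied.
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_ij = counts / counts.sum()              # joint cell probabilities
    p_i = p_ij.sum(axis=1, keepdims=True)     # marginal of x (column vector)
    p_j = p_ij.sum(axis=0, keepdims=True)     # marginal of y (row vector)
    nz = p_ij > 0                             # empty cells contribute nothing
    return float(np.sum(p_ij[nz] * np.log(p_ij[nz] / (p_i @ p_j)[nz])))
```

For the time-delay application of the Introduction one would, for example, evaluate mi_hist(x[:N-m], y[m:]) over a range of lags m and take the maximizing lag as the delay estimate; this usage is our own illustration.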

Mutual information is often used to replace the correlation coefficient. Unlike the correlation coefficient, the mutual information does not presuppose a linear relation between x and y, and the possible outcomes of these random variables need not be numerical or ordered (e.g. words in language processing). Due to these properties mutual information can be applied as a correlator in many areas where the correlation coefficient fails. To compare mutual information estimates with one another, it is necessary to know whether differences are significant or whether they are caused by the bias of the statistic or by statistical fluctuations.

The histogram-based statistic to estimate mutual information suffers from

1. variance,
2. bias caused by the finite number of observations,
3. bias caused by quantization, and
4. bias caused by the finite histogram.

The relative contribution of these causes depends on the application; in particular it depends on the number of observations, the configuration of the histogram cells and the smoothness of the probability density function. The last two causes are independent of the number of observations and are only relevant in the case of continuous random variables. In the case of independent pairs of observations these error causes were studied in earlier work [20,21]. Both the variance and the bias caused by the finite number of observations depend on the number of observed pairs and on the dependencies amongst those pairs. An estimator for the variance in the case of dependent pairs will be presented. An estimator for the bias due to the finite number of observations in the case of dependent pairs of observations is unknown and is beyond the scope of this article. In a first-order approximation, this bias will probably depend only on the number of histogram cells and on the interdependence of the pairs $(x_n, y_n)$, and not on the joint distribution of $x_n$ and $y_n$.

Mutual information is used in several areas of science: language processing [4,14,15], speech processing [24], image processing [8,16,17,25,32], analysis of non-linear systems [10,11,13], time-delay estimation [11,18,20,21], neural networks [1,2] and biomedical applications [27].

In Section 2 we develop the theory to estimate the variance of the mutual information estimator in the case of dependent pairs. It is necessary to have a reliable estimator for the variance of the sample mean in the case of dependent point measurements of a stationary stochastic signal. In Section 3 we evaluate three methods to estimate this variance. Finally, we verify our theory by simulations in Section 4.

Section snippets

The variance of the mutual information estimator

In this section we show that both the entropy estimator and the mutual information estimator are approximately sample means of stochastic signals, which can be derived from x and y. Applying the estimator for the variance of the sample mean (to be reviewed in Section 3) to these stochastic signals provides us with an estimator for the variance of the entropy estimator and an estimator for the variance of the mutual information estimator in the case of dependent pairs of point measurements.

We
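
A minimal sketch of this idea, in Python and under our own naming, is given below: for every observed pair the log-likelihood ratio of its histogram cell is evaluated, so the resulting derived sequence has the plug-in mutual information estimate as its sample mean; the variance of that sample mean can then be estimated with the statistics reviewed in Section 3. This is an illustration, not the paper's exact statistic.

```python
import numpy as np

def llr_sequence(x, y, bins=16):
    """Derived signal z_n = log( p[i,j] / (p[i,.] * p[.,j]) ) evaluated at the
    histogram cell containing (x_n, y_n); mean(z) equals the plug-in
    mutual information estimate. Binning choices are illustrative."""
    counts, xe, ye = np.histogram2d(x, y, bins=bins)
    p_ij = counts / counts.sum()
    p_i = p_ij.sum(axis=1)
    p_j = p_ij.sum(axis=0)
    # bin index of each observation; the right-most edge is mapped into the last cell
    ix = np.clip(np.searchsorted(xe, x, side='right') - 1, 0, bins - 1)
    iy = np.clip(np.searchsorted(ye, y, side='right') - 1, 0, bins - 1)
    return np.log(p_ij[ix, iy] / (p_i[ix] * p_j[iy]))

# The mutual information estimate is the sample mean of the derived signal:
#   z = llr_sequence(x, y);  I_hat = z.mean()
# Var{I_hat} can then be estimated with the variance-of-sample-mean statistics of Section 3.
```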

The variance of the sample mean

Our method to derive statistics to estimate the variance of the entropy estimator and of the mutual information estimator depends on the availability of reliable statistics to estimate the variance of a sample mean in the case of dependent point measurements. Therefore, we review in this section three statistics to estimate the variance of the sample mean.

Assume N point measurements $u_n$ and $v_n$, where $1 \le n \le N$, of the stationary stochastic processes u and v. In this section, we adopt the convention that the
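
Of the three statistics reviewed here, the window-based one is the simplest to sketch. The following Python fragment is our own hedged illustration rather than the exact formulation used in the paper: it estimates the variance of the sample mean of a dependent, stationary sequence by summing its sample autocovariance over a finite window of lags; for window length zero it reduces to the familiar independent-data estimate.

```python
import numpy as np

def var_of_mean_windowed(u, L=20):
    """Estimate Var{ mean(u) } for a stationary but dependent sequence by
    summing the (biased) sample autocovariance over lags -L..L:
        Var{mean(u)} ~ (1/N) * ( c(0) + 2 * sum_{k=1..L} c(k) ).
    With L = 0 this reduces to the usual independent-data estimate c(0)/N.
    The window length L is a free parameter of this sketch."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    d = u - u.mean()
    c = np.array([np.dot(d[:N - k], d[k:]) / N for k in range(L + 1)])  # c(0..L)
    return (c[0] + 2.0 * c[1:].sum()) / N
```

A rectangular window is used here for brevity; in practice a tapered window is usually preferred, among other reasons to reduce the chance of a negative variance estimate.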

Simulations

The variance of the random variable average log-likelihood ratio on the right-hand side of Eq. (16) can be estimated from experimental data with methods (a)–(c) of Section 3. This was done for 100 experiments; the results and the associated accuracies are given in Table 2. This table is the analogue of Table 1. The signal model used is
$$x_n = \lambda x_{n-1} + \sqrt{1-\lambda^2}\,\varepsilon_n,\qquad y_n = \rho x_n + \sqrt{1-\rho^2}\,\nu_n,\qquad \nu_n = \lambda \nu_{n-1} + \sqrt{1-\lambda^2}\,\mu_n,$$
where ε and μ are independent and are both normally distributed white noise. The mutual information is $I\{x_n;y_n\} = -\tfrac{1}{2}\log$
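
For illustration, the following Python sketch generates data from the signal model above. The function name, the use of numpy and the parameter values are our own assumptions, chosen only to make the simulation setup concrete.

```python
import numpy as np

def simulate_pair(N, lam=0.5, rho=0.6, rng=None):
    """Generate one realization of the signal model
        x_n  = lam*x_{n-1}  + sqrt(1-lam^2)*eps_n
        nu_n = lam*nu_{n-1} + sqrt(1-lam^2)*mu_n
        y_n  = rho*x_n      + sqrt(1-rho^2)*nu_n
    with eps and mu independent, normally distributed white noise.
    Parameter values are illustrative only."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(N)
    mu = rng.standard_normal(N)
    x = np.zeros(N)
    nu = np.zeros(N)
    for n in range(1, N):
        x[n] = lam * x[n - 1] + np.sqrt(1.0 - lam**2) * eps[n]
        nu[n] = lam * nu[n - 1] + np.sqrt(1.0 - lam**2) * mu[n]
    y = rho * x + np.sqrt(1.0 - rho**2) * nu
    return x, y

# A Monte Carlo check in the spirit of Table 2 (our own usage example): repeat the
# experiment many times, estimate the mutual information and its variance for each
# realization, and compare the estimated variance with the empirical variance of the
# estimates over the repetitions.
```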

Conclusions

We have reviewed three statistics to estimate the variance of the sample mean of dependent point measurements of a stationary stochastic signal. These statistics are based on: windowing of the correlation function, the jackknife method and AR-models. If the point measurements are not strongly correlated, all methods produce acceptable results. In our simulations the results obtained with the AR-method are slightly better than those of the others.

These statistics to estimate the variance were used to derive

References (32)

  • N.J.I. Mars et al., Time delay estimation in non-linear systems using average amount of mutual information analysis, Signal Processing (1982).
  • R. Moddemeijer, On estimation of entropy and mutual information of continuous distributions, Signal Processing (1989).
  • R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Trans. Neural Networks (1994).
  • B.V. Bonnlander, A.S. Weigend, Selecting input variables using mutual information and nonparametric density estimation, ...
  • S. Brandt, Statistical and Computational Methods in Data Analysis, 2nd ed., North-Holland, Amsterdam, 1976. ISBN ...
  • P.F. Brown, Class-based n-gram models of natural language, Comput. Linguistics (1992).
  • B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Monographs on Statistics and Applied Probability, Vol. 57, ...
  • I.M. Gel'fand et al., Calculation of amount of information about a random function contained in another such function, Amer. Math. Soc. Trans. (1959).
  • F.J. Harris, On the use of windows for harmonic analysis with the discrete Fourier transform, Proc. IEEE (1978).
  • P. Hastreiter et al., Fast mutual information based registration and fusion of registered tomographic image data, in: ...
  • S. Haykin, S. Kesler, Prediction-error filtering and maximum entropy estimation, in: S. Haykin (Ed.), Nonlinear Methods ...
  • K. Henning, Die Entropie in der Systemtheorie, Technical Report, Rheinisch-Westfälische Technische Hochschule, Aachen, ...
  • K. Henning et al., Zur Bestimmung von Totzeiten dynamischer nichtlinearer Übertragungssysteme mittels Entropiefunktionen, Automatisierungstechnik (1986).
  • G.M. Jenkins, D.G. Watts, Spectral Analysis and its Applications, Holden-Day, San Francisco, CA, ...
  • S. Kutscha, Statistische Bewertungskriterien für die Entropieanalyse dynamischer Systeme, Ph.D. Thesis, ...
  • M.M. Lankhorst, R. Moddemeijer, Automatic word categorization: an information-theoretic approach, Technical Report, ...