Estimating the errors on measured entropy and mutual information☆
Introduction
Information theoretic functionals such as entropy and the related quantity of mutual information can be used to identify general relationships between variables. Information entropy has been used to analyze the behavior of nonlinear dynamical systems and time series [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. Information entropy is also used to quantify the complexity of symbol sequences such as DNA sequences [11]. Invariably the analysis of real data involves a finite amount of data, and in the case of continuous variables a quantization must also be chosen. The calculated entropy of the data will therefore depend on the amount of data and on the quantization chosen. To assess the significance of a calculated entropy, the effect of finite data and quantization on the probability distribution of the calculated entropy should be known. Expressions for the systematic and random errors in observed entropies have been derived before by Basharin [12], Harris [13] and Herzel et al. [14], but such expressions have rarely been used in the nonlinear dynamics literature.
In this paper error estimates will be derived using the standard error formulae familiar to physicists. The formulae are presented in concise form in Section 6.
These formulae will be verified by numerical experiments before being applied to the well-known logistic equation. This will demonstrate that when small datasets are analyzed, the bias and random error in mutual information can be significant and should be estimated.
Entropy and mutual information
The most common information theoretic functional is entropy. For a discrete variable, X, the entropy is defined as
$$H(X) = -\sum_{i=1}^{B_X} p_i \ln p_i,$$
where the sum is over the B_X "states" that X can assume and p_i is the probability that X will be in state i. The joint entropy of two discrete variables, X and Y, is defined as
$$H(X,Y) = -\sum_{i=1}^{B_X} \sum_{j=1}^{B_Y} p_{ij} \ln p_{ij},$$
where the sum is over the B_X states that X can assume and the B_Y states that Y can assume, and p_{ij} is the probability that X is in state i and Y is in state j.
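As a concrete illustration (this code is not from the paper), a minimal Python sketch of these plug-in estimates, using natural logarithms so that entropies are in nats; the function names are my own:

```python
import numpy as np

def entropy(counts):
    """Plug-in entropy, in nats, from a vector of bin counts n_i."""
    counts = np.asarray(counts, dtype=float)
    q = counts / counts.sum()
    q = q[q > 0]                      # empty bins contribute nothing to the sum
    return -np.sum(q * np.log(q))

def joint_entropy(table):
    """Plug-in joint entropy H(X, Y) from a B_X-by-B_Y table of pair counts."""
    return entropy(np.asarray(table).ravel())
```

With these, the mutual information discussed below is entropy(n_x) + entropy(n_y) - joint_entropy(n_xy).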
Estimating the error on an observed entropy
In this section the systematic and random errors of the observed entropy of a series of values will be estimated.
Consider an ensemble of series. Let there be N values in each series. Let each value be assigned to one of B states, which will be labeled i (i = 1, 2, …, B). Let the probability that a value will be in the ith state be p_i. Let the number of values in the ith state be n_i. The number of values in the ith state, n_i, is a binomial random variable. This can be seen by considering each member of the series as an independent trial that falls in the ith state with probability p_i, so that n_i is the number of successes in N such trials.
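To make the ensemble picture concrete, here is a small Monte Carlo sketch (my construction; the non-uniform p_i is an arbitrary choice for illustration) that draws multinomial counts, so each n_i is marginally binomial, and compares the sample bias and spread of the observed entropy with the first-order expressions quoted in the Summary:

```python
import numpy as np

# Ensemble of series: N values over B states, repeated over many trials.
rng = np.random.default_rng(0)
B, N, trials = 10, 200, 20000

p = np.arange(1.0, B + 1)
p /= p.sum()                             # assumed "true" distribution p_i
H_true = -np.sum(p * np.log(p))          # true entropy, in nats

H_obs = np.empty(trials)
for t in range(trials):
    n = rng.multinomial(N, p)            # each n_i is marginally binomial(N, p_i)
    q = n[n > 0] / N
    H_obs[t] = -np.sum(q * np.log(q))

print("sample bias    :", H_obs.mean() - H_true)
print("predicted bias :", -(B - 1) / (2 * N))
print("sample std     :", H_obs.std())
print("predicted std  :", np.sqrt((np.sum(p * np.log(p) ** 2) - H_true**2) / N))
```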
Estimating the error on an observed mutual information
The error analysis of an observed mutual information can be performed in a similar manner to that of an observed entropy.
The observed mutual information, I_obs, is given by
$$I_{\rm obs} = H_{\rm obs}(X) + H_{\rm obs}(Y) - H_{\rm obs}(X,Y).$$
If X can assume one of B_X states and Y can assume one of B_Y states, and if there are N pairs (X, Y), then the expectation value of I_obs is given by
$$\langle I_{\rm obs} \rangle = I_\infty + \frac{(B_X - 1)(B_Y - 1)}{2N},$$
where I_∞ is the "true" mutual information which would be measured when N → ∞.
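A sketch of how these estimates might be computed in practice (my own helper, in nats; the bias is written with the occupied-bin counts B* used in the Summary, which reduces to the expression above when every bin is occupied):

```python
import numpy as np

def mutual_information_with_errors(n_xy):
    """Plug-in mutual information (nats) from a B_X-by-B_Y table of pair
    counts, with first-order bias and standard-error estimates."""
    n_xy = np.asarray(n_xy, dtype=float)
    N = n_xy.sum()
    q_xy = n_xy / N                      # observed joint distribution
    q_x = q_xy.sum(axis=1)               # observed marginal of X
    q_y = q_xy.sum(axis=0)               # observed marginal of Y

    m = q_xy > 0
    log_ratio = np.log(q_xy[m] / np.outer(q_x, q_y)[m])
    I_obs = np.sum(q_xy[m] * log_ratio)

    # systematic error <I_obs> - I_inf, from the numbers of occupied bins
    bias = (m.sum() - (q_x > 0).sum() - (q_y > 0).sum() + 1) / (2 * N)

    # first-order (propagation-of-errors) standard error
    err = np.sqrt(np.sum(q_xy[m] * (log_ratio - I_obs) ** 2) / N)
    return I_obs, bias, err
```

A bias-corrected estimate is then I_obs - bias, quoted with the standard error err.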
Application to the logistic equation
To illustrate how the error estimates can be used, they were applied to datasets generated using the famous logistic equation
$$x_{n+1} = 4x_n(1 - x_n).$$
Time series of N=5000, N=500 and N=200 data points were generated. The points lay on the real interval [0,1] but were binned into 10 bins, each of width 0.1. The mutual information of each time series and a lagged version of itself was then calculated. The results are shown in Fig. 3. In each panel the solid line shows the observed mutual information as a function of the lag.
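A self-contained sketch of this experiment (the map parameter 4 and the seed x_0 = 0.3 are assumptions on my part, and entropies are in nats):

```python
import numpy as np

# Logistic-map experiment: N points binned into 10 bins of width 0.1,
# mutual information between the series and a lagged copy of itself.
def lagged_mi(x, lag, bins=10):
    n_xy, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins,
                                range=[[0.0, 1.0], [0.0, 1.0]])
    N = n_xy.sum()
    q = n_xy / N
    q_x, q_y = q.sum(axis=1), q.sum(axis=0)
    m = q > 0
    log_ratio = np.log(q[m] / np.outer(q_x, q_y)[m])
    I = np.sum(q[m] * log_ratio)
    bias = (m.sum() - (q_x > 0).sum() - (q_y > 0).sum() + 1) / (2 * N)
    err = np.sqrt(np.sum(q[m] * (log_ratio - I) ** 2) / N)
    return I - bias, err                 # bias-corrected estimate, standard error

x = np.empty(500)                        # the N = 500 case
x[0] = 0.3
for n in range(len(x) - 1):
    x[n + 1] = 4.0 * x[n] * (1.0 - x[n])

for lag in (1, 2, 3, 4, 5):
    I, err = lagged_mi(x, lag)
    print(f"lag {lag}: I = {I:.3f} +/- {err:.3f} nats")
```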
Summary
Estimates of the systematic and standard error on observed entropies and mutual informations have been derived. The result for entropy is
$$\langle H_{\rm obs} \rangle = H_\infty - \frac{B^* - 1}{2N}, \qquad \sigma^2_{H_{\rm obs}} = \frac{1}{N} \sum_{i=1}^{B^*} q_i \left( \ln q_i + H_{\rm obs} \right)^2,$$
where q_i is the observed distribution of states and B^* is the number of bins for which q_i ≠ 0. The result for mutual information is
$$\langle I_{\rm obs} \rangle = I_\infty + \frac{B^*_{XY} - B^*_X - B^*_Y + 1}{2N}, \qquad \sigma^2_{I_{\rm obs}} = \frac{1}{N} \sum_{i,j} q_{ij} \left( \ln \frac{q_{ij}}{q_{X,i}\, q_{Y,j}} - I_{\rm obs} \right)^2,$$
where q_X and q_Y are the observed distributions of X and Y respectively, that is q_{X,i} = Σ_j q_{ij} and q_{Y,j} = Σ_i q_{ij}, and B^*_X, B^*_Y and B^*_{XY} are the numbers of occupied bins in q_X, q_Y and q_{XY} respectively.
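For reference, a compact sketch of the entropy result (natural logarithms; the helper name is mine, not the paper's):

```python
import numpy as np

def entropy_with_errors(counts):
    """Observed entropy (nats) with the corrections summarized above:
    <H_obs> = H_inf - (B* - 1)/(2N) and
    sigma^2 = (1/N) * sum_i q_i * (ln q_i + H_obs)^2."""
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    q = counts[counts > 0] / N           # observed distribution; len(q) = B*
    H_obs = -np.sum(q * np.log(q))
    H_corrected = H_obs + (len(q) - 1) / (2 * N)   # remove the systematic error
    err = np.sqrt(np.sum(q * (np.log(q) + H_obs) ** 2) / N)
    return H_corrected, err
```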
Acknowledgements
The author would like to thank Hans-Peter Herzel for drawing his attention to some of the previous work in this field, and the two anonymous reviewers, whose suggestions greatly improved this paper.
References (18)
Singular-value decomposition in attractor reconstruction – pitfalls and precautions, Physica D (1992)
Information theoretic test for nonlinearity in time-series, Phys. Lett. A (1993)
Testing for nonlinearity in weather records, Phys. Lett. A (1994)
Testing for nonlinearity using redundancies – quantitative and qualitative aspects, Physica D (1995)
Detecting nonlinearity in multivariate time-series, Phys. Lett. A (1996)
Coarse-grained entropy rates for characterization of complex time-series, Physica D (1996)
Extraction of delay information from chaotic time series based on information entropy, Physica D (1997)
Significance testing of information theoretic functionals, Physica D (1997)
Finite sample effects in sequence analysis, Chaos, Solitons and Fractals (1994)
☆ Contribution number 5759, California Institute of Technology Division of Geological and Planetary Sciences.