Contextually guided unsupervised learning using local multivariate binary processors

Neural Netw. 1998 Jan;11(1):117-140. doi: 10.1016/s0893-6080(97)00110-x.

Abstract

We consider the role of contextual guidance in learning and processing within multi-stream neural networks. Earlier work (Kay and Phillips, 1994, 1996; Phillips et al., 1995) showed how the goals of feature discovery and associative learning could be fused within a single objective and made precise using information theory in such a way that local binary processors could extract a single feature that is coherent across streams. In this paper, we consider multi-unit local processors with multivariate binary outputs that enable a greater number of coherent features to be extracted. Using the Ising model, we define a class of information-theoretic objective functions and also local approximations and derive the learning rules in both cases. These rules have similarities to, and differences from, the celebrated BCM rule. Local and global versions of infomax appear as by-products of the general approach, as well as multivariate versions of coherent infomax. Focussing on the more biologically plausible local rules, we describe some computational experiments designed to investigate specific properties of the processors and the general approach. The main conclusions are: (1) The local methodology introduced in the paper has the required functionality. (2) Different units within the multi-unit processors learned to respond to different aspects of their receptive fields. (3) The units within each processor generally produced a distributed code in which the outputs were correlated and which was robust to damage; in the special case where the number of units available was only just sufficient to transmit the relevant information, a form of competitive learning was produced. (4) The contextual connections enabled the information correlated across streams to be extracted and, by improving feature detection with weak or noisy inputs, they played a useful role in short-term processing and in improving generalization. (5) The methodology allows the statistical associations between distributed self-organizing population codes to be learned.
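
As a rough illustration of what "information correlated across streams" means for binary outputs, the sketch below estimates the empirical mutual information between the outputs of two binary units, one per stream. It is not the objective function or learning rule of the paper; the function name `mutual_information_bits` and the toy sequences are assumptions made purely for illustration.

```python
import math
from collections import Counter


def mutual_information_bits(x, y):
    """Empirical mutual information I(X;Y), in bits, between two
    equal-length sequences of binary outputs (hypothetical helper)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of each (x, y) pair
    count_x = Counter(x)        # marginal counts for stream 1
    count_y = Counter(y)        # marginal counts for stream 2
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_x[a] * count_y[b])
        mi += p_ab * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Toy outputs: two units that always agree share one full bit of information;
# two units whose outputs are statistically independent share none.
coherent = mutual_information_bits([0, 1, 0, 1], [0, 1, 0, 1])
independent = mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy setting a coherent feature corresponds to a high value of the estimate, whereas units responding to unrelated aspects of their inputs give a value near zero.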