An efficient discriminant-based solution for small sample size problem
Introduction
Feature extraction can be defined as a mapping from a typically high-dimensional data space to a space of reduced dimension, while maintaining some key data properties [1, p. 11]. Designing efficient feature extraction techniques that preserve, or even enhance, class separability has been an active research area [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. In general, feature extraction renders data management less cumbersome, while enabling more accurate estimates of data statistics [11], [12]. Modern technologies generate data at an unprecedented rate and scale, and for classifying such large volumes of data, feature extraction becomes ubiquitous. The process is especially challenging under small sample size conditions [13, p. 39], common to many applications such as face recognition [14], [15], object recognition [16], bioinformatics [17], [18], or large-scale biomedical data classification [19], [20], where the dimension of the data, n, by far exceeds the number of samples, N, in the training database. Under small sample size conditions, the data space is extremely sparse [13, p. 70], [21], and consequently the sample statistics may be grossly inaccurate. For the predominantly used second-order techniques, small sample size conditions imply highly singular covariance matrices; hence, a direct implementation of classical discriminant analysis methods like linear discriminant analysis (LDA) [2], [3] is not possible.
A standard technique for coping with the singular covariance matrix problem is covariance matrix shrinkage [22], [23], [24], [25]. While shrinkage-based methods regularize a covariance matrix, they do not address the problem of high-dimensional data. This problem, commonly referred to as the curse of dimensionality, presents challenges in handling and manipulating statistical data. In particular, the inversion and spectral decomposition of large matrices may not be feasible with standard computer architectures. In addition, one or more shrinkage parameters need to be estimated, which typically requires expensive procedures [26], [27].
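As an illustration, one common shrinkage form blends the sample covariance S with a scaled-identity target, (1 − lam) S + lam (tr(S)/n) I. The sketch below assumes a fixed shrinkage parameter lam for simplicity; in practice this parameter would be estimated, e.g. by the procedures of [26], [27].

```python
import numpy as np

def shrink_covariance(X, lam=0.1):
    """Shrink the sample covariance of X (rows are samples) toward a
    scaled-identity target: (1 - lam) * S + lam * (tr(S)/n) * I."""
    S = np.cov(X, rowvar=False)             # n x n sample covariance
    n = S.shape[0]
    target = (np.trace(S) / n) * np.eye(n)  # scaled-identity target
    return (1.0 - lam) * S + lam * target
```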
A popular approach to the small sample size problem is the removal of non-informative features via subspace-based decomposition techniques. Based on the assumption that the sparse portion of the data space carries little or no useful information, this approach proceeds by discarding the irrelevant subspace. Perhaps the most widely used subspace-based approach is to reduce the data dimension via principal component analysis (PCA) [28]. More elaborate subspace decomposition techniques have been developed, especially in the context of face recognition [14], [15], [29], [30], [31], [32]. Motivated by the success of PCA as a face representation technique [33], Turk and Pentland [14] proposed a face recognition technique widely referred to as the eigenface method. Upon removal of the extraneous subspace, this technique facilitates recognition by calculating the distance of a test face from the known face classes in the feature space. The related Fisherface method [15] uses PCA on the input data to make the within-class-scatter matrix regular, followed by LDA on the reduced-dimension data. The coupling of LDA to PCA in the Fisherface method improves its face recognition performance with respect to the eigenface method. Several other LDA-based approaches addressing the small sample size problem have been developed recently [29], [30], [31], [32], [34], [35], [36]. Yu and Yang in Ref. [30] proposed a direct linear discriminant analysis (DLDA) method, which discards the null space of the between-class-scatter matrix under the assumption that this space contains no discriminative information. This assumption was subsequently disproved in Refs. [31], [37]. Wang and Tang in Ref. [31] presented a dual-space LDA, which combines two complementary LDA subspaces at the feature level; their approach has similarities with the idea of regularization by shrinkage. Huang et al. in Ref. [34] suggested removing the null space of the total-scatter matrix, followed by computing the principal components of the between-class-scatter matrix restricted to the null space of the within-class-scatter matrix. Recently, Zhang and Sim [36] used the Fukunaga–Koontz transform [38] to analyze the discriminant subspace and suggested an alternative subspace decomposition technique for LDA.
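A Fisherface-style pipeline can be sketched as PCA followed by LDA. The sketch below uses scikit-learn and caps the number of principal components at N − c (samples minus classes) so that the within-class-scatter matrix becomes regular; this cap is a standard but here assumed choice, not a prescription from Ref. [15].

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

def fisherface_style_model(X, y):
    """PCA reduces the data so the within-class scatter becomes
    nonsingular; LDA is then applied in the reduced space."""
    N, n = X.shape
    c = len(np.unique(y))
    n_components = min(N - c, n)   # keep at most N - c components
    return make_pipeline(PCA(n_components=n_components),
                         LinearDiscriminantAnalysis()).fit(X, y)
```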
An arsenal of kernel-based feature extraction techniques, such as kernel principal component analysis (KPCA) [39] and several variants of kernel Fisher discriminant (KFD) [40], [41], [42], [43], [44], [45], has gained attention recently. Inspired by the success of nonlinear dimensionality reduction techniques [46], [47], an important class of manifold-based learning methods [48], [49], [50], [51], [52], [53] has also been proposed. A common assumption in these methods is that high-dimensional data lie on a low-dimensional submanifold, which is then linearly “flattened” into a subspace. The flattening is typically performed by preserving the local data structure, usually through a nearest-neighbor construction. For example, He et al. [48], [54] developed the so-called locality preserving projection (LPP) approach, where the data submanifold is uncovered by constructing a weighted graph of local interactions. They also showed that PCA and LDA can be derived within this framework by an appropriate choice of graph weights. Fu and Huang's method [49], termed locally embedded analysis (LEA), represents a simplification of the method in Ref. [46], in that nonlinear transformations are replaced by linear projections. The method of Chen et al. [50] is similar to the LPP approach, except that class labels are explicitly utilized in the manifold learning. Further refinement and unification of the above ideas within a graph-theoretic framework were presented in Ref. [51], which also introduced the so-called marginal Fisher analysis (MFA). A related technique was developed in Ref. [52], which essentially combines the above graph-theoretic framework with LEA. A feature common to all of these techniques is that they result in a single (linear) subspace found by solving a generalized eigenvalue/eigenvector problem, similar to the classical LDA method. Therefore, under small sample size conditions, these methods face the same matrix singularity problems as LDA, and a typical solution is to apply PCA to remove the excess dimension and render the matrices at hand regular [48], [51], [52]. Many of these methods also come with kernel versions, e.g. Refs. [50], [51], which are suitable for problems where linear projections seem inadequate. A related extension of these ideas is based on tensor analysis, where the structure in high-dimensional data is preserved by treating samples as tensors, rather than vectors. By applying the rules of multilinear algebra, the generalization of vector spaces and submanifolds to tensor spaces and submanifolds follows readily. In particular, tensor versions of PCA, LDA, LPP and MFA have been developed in Refs. [51], [55], [56], [57], respectively. While conceived as general dimensionality reduction/feature extraction techniques, the above methods have been mainly applied and tested on face and object recognition problems.
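The shared computational core of these methods is a generalized eigenvalue problem of the form A w = lam B w, where A and B are method-specific scatter or graph matrices. A minimal sketch of this step, assuming symmetric A and positive definite B (e.g. after the PCA preprocessing mentioned above), is:

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding_directions(A, B, d):
    """Solve A w = lam * B w for symmetric A and symmetric positive
    definite B, and return the d directions with largest eigenvalues,
    as in LDA-, LPP- and MFA-type formulations."""
    evals, evecs = eigh(A, B)      # eigenvalues in ascending order
    return evecs[:, ::-1][:, :d]   # top-d generalized eigenvectors
```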
To overcome difficulties imposed by singular covariance matrices and/or the curse of dimensionality, popular pattern recognition approaches rely on either global dimensionality reduction techniques, such as PCA [14], or the discarding of some specific subspace, typically within the LDA framework [29], [30], [31], [32], [34], [36]. Since PCA utilizes the statistical properties of a common data distribution while ignoring class-conditional statistics, it can be argued that PCA and other global dimensionality reduction techniques are suboptimal for classification purposes. On the other hand, LDA-based subspace decomposition techniques, as well as the manifold learning techniques described in Section 1.1, are well suited to pattern recognition problems, but have not been tested outside of their primary application domain (face recognition). Our preliminary analysis [58], [59] shows that some of these techniques perform poorly when tested in low signal-to-noise ratio (SNR) conditions, characteristic of bioinformatics and biomedical measurements. Additionally, some of the methods, such as those of Refs. [48], [49], [50], [51], [53], [57], are computationally expensive when applied to large-scale biomedical data analysis.
In bioinformatics, the small sample size problem is quite common; for example, a typical microarray study consists of up to a hundred samples with thousands of gene-expression measurements. Reduction of data dimension also becomes essential for tasks like biomarker discovery [18] and interpretive analysis [17]. Classification of brain signals, such as electroencephalograms (EEG), electrocorticograms (ECoG) [20], [58], [59] and functional magnetic resonance images (fMRI) [60], faces similar feature extraction challenges. Usual solutions to the small sample size problem in these applications are mostly heuristic and include data binning [19], [20], ranking and scoring of features based on some criterion [17], [18], [19] (e.g. the Fisher score), as well as the selection of ad hoc features [61], [62] (e.g. the power spectrum). In the case of high-dimensional spatio-temporal signals, a space–time separability assumption is often invoked; the data are first processed spatially (e.g. Laplacian filtering), followed by temporal processing (e.g. autoregressive frequency analysis) [61], [62].
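For reference, one common definition of the Fisher score mentioned above computes, per feature, the ratio of between-class variance of the class means to the pooled within-class variance; the exact normalization varies across the cited studies, so the sketch below is illustrative rather than canonical.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score: between-class scatter of class means
    over the pooled within-class scatter (any number of classes)."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += Xc.shape[0] * (Xc.mean(axis=0) - mu) ** 2
        den += Xc.shape[0] * Xc.var(axis=0)
    return num / (den + 1e-12)   # epsilon guards zero-variance features
```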
In this article a computationally efficient, locally adaptable feature extraction and classification technique is developed, suitable for statistical data under small sample size conditions. The proposed method exploits the strength of PCA as a dimensionality reduction technique, while preserving the class-specific information needed to facilitate subsequent classification. The technique is based on a classwise principal component analysis (CPCA), resulting in a simple piecewise linear dimensionality reduction. The idea behind the proposed approach is simple: a useless (non-informative) subspace in the data is identified and discarded by applying PCA to each class. Classification is then carried out in the residual space, where the small sample size conditions and the curse of dimensionality are no longer concerns. Fig. 1 illustrates the advantage of piecewise linear feature extraction over single-subspace techniques such as PCA and LDA. Monte Carlo integration confirms this assertion: for both examples in Fig. 1, the classification error estimated with the Bayes classifier in the CPCA subspaces is lower than that in the PCA and LDA subspaces (Figs. 1A and 1B).
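To make the idea concrete, the following sketch shows one simplified reading of the CPCA scheme: one principal subspace is fitted per class, and a test sample is assigned by a maximum a posteriori rule using a diagonal-Gaussian model in each class subspace. The subspace dimension m and the diagonal-Gaussian assumption are illustrative simplifications; the full procedure is specified in Section 2.

```python
import numpy as np

class CPCAClassifier:
    """Classwise PCA followed by a MAP rule: one principal subspace is
    fitted per class, and a diagonal Gaussian is evaluated in each
    class's own subspace (a simplified, illustrative reading)."""

    def fit(self, X, y, m=2):
        # m is assumed to satisfy m <= min(class size - 1, data dimension).
        self.classes_ = np.unique(y)
        self.models_, self.priors_ = {}, {}
        for c in self.classes_:
            Xc = X[y == c]                      # class-c training samples
            mu = Xc.mean(axis=0)
            _, s, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
            W = Vt[:m].T                        # n x m class subspace basis
            var = s[:m] ** 2 / max(Xc.shape[0] - 1, 1) + 1e-12
            self.models_[c] = (mu, W, var)
            self.priors_[c] = Xc.shape[0] / X.shape[0]
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            mu, W, var = self.models_[c]
            Z = (X - mu) @ W                    # coordinates in class subspace
            loglik = -0.5 * np.sum(Z ** 2 / var + np.log(2 * np.pi * var), axis=1)
            scores.append(loglik + np.log(self.priors_[c]))
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]
```

A call such as CPCAClassifier().fit(X_train, y_train, m=3).predict(X_test) then runs the complete extract-and-classify loop.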
There have been several attempts to develop a CPCA-based classifier. For example, CPCA has been explored as a face recognition technique in Refs. [63], [64], although these methods represent a mere extension of the eigenface technique [33]. Namely, the test data are projected onto the classwise principal component subspaces and associated with the class whose subspace provides the best data approximation. Apart from using CPCA, this “classification by approximation” approach is otherwise ignorant of class-discriminant information. Liu et al. [65] developed a CPCA system for unsupervised detection and classification of stars and galaxies in astronomical images. Because of its unsupervised nature, the method relies on model selection within the Gaussian mixture framework. The classwise principal components are learned through a neural network, and the features from multiple subspaces are classified with a back-propagation neural network. Min et al. [66] developed a local PCA embedding technique, suitable for both image representation and classification. The local PCA, however, is performed over neighborhoods (rather than classes), and the method ultimately results in a single (linear) subspace. The method proposed in this article, by contrast, is piecewise linear and results in multiple subspaces. To the best of the authors' knowledge, the proposed method is original and has not been published previously.1
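The “classification by approximation” scheme of Refs. [63], [64] described above reduces to a few lines: the test sample is assigned to the class whose principal subspace yields the smallest reconstruction error. A sketch, with hypothetical per-class means and orthonormal bases as inputs, is:

```python
import numpy as np

def classify_by_approximation(x, subspaces):
    """subspaces: dict mapping class label -> (mean mu, basis W with
    orthonormal columns). Returns the label whose subspace gives the
    smallest reconstruction error for x."""
    errors = {}
    for c, (mu, W) in subspaces.items():
        d = x - mu
        errors[c] = np.linalg.norm(d - W @ (W.T @ d))
    return min(errors, key=errors.get)
```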
The article is organized as follows. In Section 2, the CPCA method is introduced by describing the process of piecewise linear subspace estimation (Section 2.1), followed by a detailed description of the feature extraction and classification technique (Section 2.2). In Section 3 the performance of the proposed technique is validated on real-world datasets. Discussion and concluding remarks are given in Section 4.
Classwise PCA
The basic principle behind CPCA will be illustrated on a simple two-class example (see Fig. 2). In this example, two-dimensional (n = 2) data are principally confined to a one-dimensional (m = 1) manifold, and so the class-specific principal component subspaces, S1 and S2, can be seen as a piecewise linear approximation of the data manifold. Both classes are then projected to S1 and S2, together with an unlabeled test datum, x. Within each subspace, statistical tests can be performed to determine
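The piecewise linear character of this approximation can be checked numerically. In the hypothetical example below (synthetic data on a parabolic manifold, not the data of Fig. 2), the first principal direction of each class approximates the local tangent of the manifold on that class's segment:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two classes on different segments of a curved 1-D manifold (n = 2).
t1, t2 = rng.uniform(0, 1, 100), rng.uniform(1, 2, 100)
X1 = np.c_[t1, t1 ** 2] + 0.02 * rng.standard_normal((100, 2))
X2 = np.c_[t2, t2 ** 2] + 0.02 * rng.standard_normal((100, 2))

for name, Xc in (("S1", X1), ("S2", X2)):
    _, _, Vt = np.linalg.svd(Xc - Xc.mean(axis=0), full_matrices=False)
    print(name, "direction:", Vt[0])  # first PC tracks the local tangent
```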
Description of experiments
The proposed feature extraction/classification technique was applied to various datasets whose properties are summarized in Table 1. Five datasets [(c)–(g)] present genuine small sample size problems (N ≪ n), whereas datasets (a) and (b) yield singular covariance matrices and are not directly amenable to standard second-order techniques, such as [2], [3], [10], [11], [12], [70], [71].
The datasets (a) and (b) are taken from the UCI Machine Learning Repository [72]. The
Discussion
The superior performance of the method may be attributed to its ability to extract features in a nonlinear (piecewise linear) fashion, while retaining the computational simplicity characteristic of eigenvalue-based techniques. The competing techniques, whether LDA- or manifold-based, are linear and consequently result in a single feature extraction subspace. This feature of the proposed technique may be especially advantageous in dealing with classes that are highly overlapped and where
Conclusion
A novel piecewise linear dimensionality reduction technique has been developed, suitable for classification of high-dimensional data under the small sample size conditions. At the core of the technique is a classwise (local) principal component analysis, allied to linear discriminant tools. The technique yields as many linear feature subspaces as the number of classes. An unlabeled sample is then classified according to the maximum a posteriori probability rule, and assigned to the class with
References (78)
- et al., Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition, Speech Commun. (1998)
- et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition (2000)
- et al., A direct LDA algorithm for high-dimensional data—with application to face recognition, Pattern Recognition Lett. (2001)
- et al., Why direct LDA is not equivalent to LDA, Pattern Recognition (2006)
- et al., EEG-based discrimination between imagination of right and left hand movement, Electroen. Clin. Neuro. (1997)
- et al., Improved system for object detection and star/galaxy classification via local subspace analysis, Neural Networks (2003)
- et al., Locality pursuit embedding, Pattern Recognition (2004)
- et al., Linear dimension reduction and Bayes classification with unknown population parameters, Pattern Recognition (1982)
- et al., Feature reduction for classification of multidimensional data, Pattern Recognition (2000)
- et al., Use of proteomic patterns in serum to identify ovarian cancer, Lancet (2002)
- Pattern Classification
- The use of multiple measurements in taxonomic problems, Ann. Eugenics
- The utilization of multiple measurements in problems of biological classification, J. R. Stat. Soc. B Methods
- The characteristic selection problem in recognition systems, IEEE Trans. Inf. Theory
- Pattern Recognition: A Statistical Approach
- Discriminative features for document classification
- Linear dimensionality reduction via a heteroscedastic extension of LDA: the Chernoff criterion, IEEE Trans. Pattern Anal.
- Information discriminant analysis: feature extraction with an information theoretic objective, IEEE Trans. Pattern Anal.
- Approximate information discriminant analysis: a computationally simple heteroscedastic feature extraction technique, Pattern Recognition
- Introduction to Statistical Pattern Recognition
- Eigenfaces for recognition, J. Cognitive Neurosci.
- Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal.
- Support vector machines for 3D object recognition, IEEE Trans. Pattern Anal.
- Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science
- Significance analysis of microarrays applied to the ionizing radiation response, Proc. Natl. Acad. Sci. USA
- Spatial selectivity in human ventrolateral prefrontal cortex, Nat. Neurosci.
- Advances in cognitive neural prosthesis: recognition of neural data with an information-theoretic objective
- Projection pursuit, Ann. Stat.
- A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics, Stat. Appl. Genet. Mol. Biol.
- Shrinkage estimators for covariance matrices, Biometrics
- The Elements of Statistical Learning
- Covariance matrix estimation and classification with limited training data, IEEE Trans. Pattern Anal.
- Principal Component Analysis
- Dual-space linear discriminant analysis for face recognition, in: 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04)
- Subclass discriminant analysis, IEEE Trans. Pattern Anal.
About the Author—KOEL DAS received her M.S. in electrical engineering from Wright State University, OH, 2003 and Ph.D. in electrical engineering and computer science from University of California, Irvine, CA, 2007. She is currently working as a research scholar at University of California, Santa Barbara. Her research interests are in the area of brain–machine interfaces, pattern recognition, image processing and geometric processing.
About the Author—ZORAN NENADIC received the diploma in control engineering from the University of Belgrade, Serbia, in 1995 and the M.S. and D.Sc. degrees in systems science and mathematics from Washington University, St. Louis, Missouri, in 1998 and 2001, respectively. From 2001 to 2005, he was a postdoctoral fellow with the Division of Engineering and Applied Science at the California Institute of Technology, Pasadena. Since 2005, he has been with the Department of Biomedical Engineering, University of California, Irvine, where he is currently an assistant professor. His research interests are in the area of adaptive biomedical signal processing, control algorithms for biomedical devices, brain–machine interfaces, and modeling and analysis of biological neural networks. He is a member of the IEEE, the Mathematical Association of America, and the Society for Neuroscience.
1 Present address: University of California, Santa Barbara, CA 92697, USA.