Abstract
We present a novel pairwise clustering method. Given a proximity matrix of pairwise relations (i.e., pairwise similarity or dissimilarity estimates) between data points, our algorithm extracts the two most prominent clusters in the data set. The algorithm, which is completely nonparametric, iteratively applies a two-step transformation to the proximity matrix. The first step represents each point by its relations to all other data points, and the second step re-estimates the pairwise distances using a statistically motivated proximity measure on these representations. Using this transformation, the algorithm iteratively partitions the data points until it converges to two clusters. Although the algorithm is simple and intuitive, it generates complex dynamics of the proximity matrices. Based on this bipartition procedure we devise a hierarchical clustering algorithm, which applies the basic bipartition algorithm in a straightforward divisive manner. The hierarchical algorithm addresses the model validation problem using a general cross-validation approach, which may be combined with various hierarchical clustering methods.
We further present an experimental study of this algorithm, examining its properties and performance on synthetic and ‘standard’ data sets. The experiments demonstrate the robustness of the algorithm and indicate that it produces a good clustering partition even when the data are noisy or corrupted.
Dubnov, S., El-Yaniv, R., Gdalyahu, Y. et al. A New Nonparametric Pairwise Clustering Algorithm Based on Iterative Estimation of Distance Profiles. Machine Learning 47, 35–61 (2002). https://doi.org/10.1023/A:1013631728342