An Information Theoretic Approach to Machine Learning (2005)
Abstract
In this thesis, theory and applications of machine learning systems based on information theoretic criteria as performance measures are studied. A new clustering algorithm based on maximizing the Cauchy-Schwarz (CS) divergence measure between probability density functions (pdfs) is proposed. The CS divergence is estimated non-parametrically using the Parzen window technique for density estimation. The problem domain is transformed from discrete 0/1 cluster membership values to continuous membership values. A constrained gradient descent maximization algorithm is implemented. The gradients are stochastically approximated to reduce computational complexity, making the algorithm more practical. Parzen window annealing is incorporated into the algorithm to help avoid convergence to a local maximum. The clustering results obtained on synthetic and real data are encouraging. The Parzen window-based estimator for the CS divergence is shown to have a dual expression as a measure of the cosine of the angle between cluster mean vectors in a feature space determined by the eigenspectrum of a Mercer kernel matrix. A spectral clustering
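The Parzen window estimator and its feature-space dual mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes an isotropic Gaussian Parzen window with a single fixed width `sigma`, two clusters with hard memberships, and the standard form D_CS(p, q) = -log( ∫pq / sqrt(∫p² ∫q²) ). The function names (`gaussian_gram`, `cs_divergence`, `cs_cosine_via_eigenspectrum`) are hypothetical, and the clustering optimization, stochastic gradients, and window annealing are left out.

```python
import numpy as np


def gaussian_gram(x, y, sigma):
    """Matrix of isotropic Gaussian kernel values between rows of x and rows of y."""
    d = x.shape[1]
    sq = (np.sum(x ** 2, axis=1)[:, None]
          + np.sum(y ** 2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from round-off
    norm = (2.0 * np.pi * sigma ** 2) ** (-d / 2.0)
    return norm * np.exp(-sq / (2.0 * sigma ** 2))


def cs_divergence(x, y, sigma):
    """Parzen window estimate of the CS divergence between samples x and y.

    Assumption: Gaussian windows of width sigma. The integral of a product of
    two such windows is a Gaussian of width sigma * sqrt(2) evaluated at the
    difference of their centres, so every integral becomes a mean of kernel values.
    """
    s = sigma * np.sqrt(2.0)
    cross = gaussian_gram(x, y, s).mean()   # estimate of the integral of p*q
    self_x = gaussian_gram(x, x, s).mean()  # estimate of the integral of p^2
    self_y = gaussian_gram(y, y, s).mean()  # estimate of the integral of q^2
    return -np.log(cross / np.sqrt(self_x * self_y))


def cs_cosine_via_eigenspectrum(x, y, sigma):
    """Cosine of the angle between cluster mean vectors in a kernel feature space.

    The feature vectors are built from the eigendecomposition of the kernel
    matrix over all points, K = E diag(lam) E^T, as the rows of E * sqrt(lam).
    """
    s = sigma * np.sqrt(2.0)
    z = np.vstack([x, y])
    k = gaussian_gram(z, z, s)
    lam, e = np.linalg.eigh(k)
    lam = np.clip(lam, 0.0, None)  # remove tiny negative eigenvalues from round-off
    phi = e * np.sqrt(lam)         # rows are feature vectors; phi @ phi.T approximates k
    m1 = phi[: len(x)].mean(axis=0)
    m2 = phi[len(x):].mean(axis=0)
    return float(m1 @ m2 / (np.linalg.norm(m1) * np.linalg.norm(m2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=(50, 2))
    b = rng.normal(5.0, 1.0, size=(50, 2))
    print("CS divergence:", cs_divergence(a, b, sigma=1.0))
    print("exp(-D_CS):   ", np.exp(-cs_divergence(a, b, sigma=1.0)))
    print("cosine:       ", cs_cosine_via_eigenspectrum(a, b, sigma=1.0))
```

Under these assumptions, `exp(-cs_divergence(a, b, sigma))` and `cs_cosine_via_eigenspectrum(a, b, sigma)` agree up to floating-point error, which is the dual expression in miniature: well-separated clusters give a large divergence and a cosine near zero.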
Publication details