Speaker: Anna Little
Title: Clustering High-dimensional Data with Path Metrics: A Balance of Density and Geometry
Abstract: This talk discusses multiple methods for clustering high-dimensional data, and explores the delicate balance between utilizing data density and data geometry. I will first present path-based spectral clustering, a novel approach which combines a density-based metric with graph-based clustering. This density-based path metric allows for fast algorithms and strong theoretical guarantees when clusters concentrate around low-dimensional sets. However, the method suffers from a loss of geometric information, information which is preserved by simple linear dimension reduction methods such as classic multidimensional scaling (CMDS). The second part of the talk will explore when CMDS followed by a simple clustering algorithm can exactly recover all cluster labels with high probability. However, scaling conditions become increasingly restrictive as the ambient dimension increases, and the method will fail for irregularly shaped clusters. Finally, I will discuss how a more general family of path metrics, combined with MDS, give low-dimensional embeddings which respect both data density and data geometry. This new method exhibits promising performance on single cell RNA sequence data and can be computed efficiently by restriction to a sparse graph.