DOI
https://doi.org/10.25772/PJKZ-Q880
Author ORCID Identifier
Defense Date
2021
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Computer Science
First Advisor
Dr. Krzysztof J Cios
Abstract
Traditional density-based clustering approaches rely on a distance-based parameter to define data connectivity and density. However, an appropriate value of this parameter can be difficult to determine, as it depends heavily on the underlying distribution of the data. In particular, distribution parameters (e.g., variance) affect the scale of inter-group distances; this dependence leads to a well-known inability to simultaneously detect clusters at varying levels of density. In this work, connectivity and density are defined according to the rank-order induced by the distance metric (i.e., invariant to the expected scale of the distances). Connectivity is defined by k-nearest neighbors, and density by the number of reverse k-nearest neighbors (i.e., vertex in-degree in the directed k-nearest neighbors graph).
Two novel density-based clustering algorithms are proposed: the non-hierarchical RNN-DBSCAN and its hierarchical generalization Hk-DC. The advantage of RNN-DBSCAN is that it requires a single parameter k and is robust to varying levels of cluster density, whereas Hk-DC provides an efficient solution for producing a hierarchical clustering of RNN-DBSCAN solutions over k for a fixed density threshold. Importantly, heuristics are proposed for selecting k and the density threshold for RNN-DBSCAN and Hk-DC, along with a method for extracting a flat clustering solution from the hierarchy. Additionally, a cluster-dependent solution for handling noise is proposed.
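The rank-order density described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `knn_indices` and `reverse_knn_counts` are hypothetical, Euclidean distance is assumed, and a brute-force distance matrix is used for brevity. It only shows how the reverse k-nearest neighbor count equals a point's in-degree in the directed k-nearest neighbors graph.

```python
import numpy as np


def knn_indices(X, k):
    """Return an (n, k) array; row i lists the k nearest neighbors of point i.

    Assumes Euclidean distance and brute-force search (illustration only).
    """
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # Stable sort so that ties are broken by index, deterministically.
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


def reverse_knn_counts(X, k):
    """Count, for each point, how many other points list it among their k nearest.

    This is the in-degree of each vertex in the directed kNN graph.
    """
    nbrs = knn_indices(X, k)
    return np.bincount(nbrs.ravel(), minlength=X.shape[0])


# Small 1-D example with k = 1.
X_demo = np.array([[0.0], [1.0], [2.5], [10.0]])
demo_counts = reverse_knn_counts(X_demo, k=1)
```

Because only the ordering of distances matters, rescaling the data (for example, `reverse_knn_counts(4.0 * X_demo, k=1)`) leaves the counts unchanged, which is the scale invariance the abstract refers to.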
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
8-13-2021