DOI

https://doi.org/10.25772/PJKZ-Q880

Author ORCID Identifier

0000-0001-9653-6704

Defense Date

2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Computer Science

First Advisor

Dr. Krzysztof J Cios

Abstract

Traditional density-based clustering approaches rely on a distance-based parameter to define data connectivity and density. However, an appropriate value of this parameter can be difficult to determine because it depends heavily on the underlying distribution of the data. In particular, distribution parameters (e.g., variance) affect the scale of inter-group distances, and this dependence leads to a well-known inability to detect clusters at varying levels of density simultaneously. In this work, connectivity and density are defined according to the rank order induced by the distance metric (i.e., invariant to the expected scale of the distances). Connectivity is defined by k-nearest neighbors, and density by the number of reverse k-nearest neighbors (i.e., the vertex in-degree in the directed k-nearest neighbors graph).
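The following minimal Python sketch illustrates these rank-based definitions, assuming Euclidean distance and a brute-force neighbor search. The function names `knn_indices` and `reverse_knn_counts` are hypothetical and are not taken from the dissertation. Because only the ordering of distances matters, uniformly rescaling the data leaves both the neighbor lists and the densities unchanged.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_indices(points: List[Point], k: int) -> List[List[int]]:
    """Return, for each point, the indices of its k nearest neighbors (itself excluded).

    Hypothetical helper: brute force with Euclidean distance; ties are broken
    by index so the result is deterministic.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    neighbors = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        neighbors.append(order[:k])
    return neighbors


def reverse_knn_counts(neighbors: List[List[int]]) -> List[int]:
    """Density of each point: how many points list it among their k nearest
    neighbors, i.e. its in-degree in the directed kNN graph."""
    counts = [0] * len(neighbors)
    for row in neighbors:
        for j in row:
            counts[j] += 1
    return counts
```

For the points (0,0), (1,0), (2,0), (10,0) with k = 1, the neighbor lists are [[1], [0], [1], [2]] and the reverse-kNN counts are [1, 2, 1, 0], so the isolated point (10,0) has density 0.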

Two novel density-based clustering algorithms are proposed: the non-hierarchical RNN-DBSCAN and its hierarchical generalization, Hk-DC. RNN-DBSCAN requires only a single parameter, k, and is robust to varying levels of cluster density, whereas Hk-DC efficiently produces a hierarchical clustering of RNN-DBSCAN solutions over k for a fixed density threshold. Importantly, heuristics are proposed for selecting k and the density threshold for RNN-DBSCAN and Hk-DC, along with a method for extracting a flat clustering solution from the hierarchy. Additionally, a cluster-dependent solution for handling noise is proposed.
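To make the single-parameter idea concrete, here is a simplified, hypothetical sketch of a DBSCAN-style expansion over the kNN graph. It assumes (i) a point is core when its reverse-kNN count is at least k, and (ii) a core point's neighborhood is its k nearest neighbors plus those of its reverse neighbors that are themselves core. It is not presented as the published RNN-DBSCAN procedure: it omits the proposed heuristics for choosing k, the cluster-dependent noise handling, and everything specific to Hk-DC. The name `rank_density_clusters` is illustrative only.

```python
import math
from collections import deque
from typing import List, Sequence

Point = Sequence[float]


def knn_indices(points: List[Point], k: int) -> List[List[int]]:
    """Brute-force k nearest neighbors (Euclidean), ties broken by index."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: (math.dist(points[i], points[j]), j))[:k]
        for i in range(n)
    ]


def rank_density_clusters(points: List[Point], k: int) -> List[int]:
    """Hypothetical simplified sketch; returns one label per point, -1 = noise."""
    nbrs = knn_indices(points, k)
    n = len(points)
    # Reverse neighbor lists: rev[j] holds every i that has j among its kNN.
    rev: List[List[int]] = [[] for _ in range(n)]
    for i, row in enumerate(nbrs):
        for j in row:
            rev[j].append(i)
    # Assumption: a point is core if its density (reverse-kNN count) is >= k.
    core = [len(r) >= k for r in rev]
    labels = [-1] * n
    cluster = 0
    for seed in range(n):
        if not core[seed] or labels[seed] != -1:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            x = queue.popleft()
            # Assumed neighborhood: kNN of x plus the core reverse neighbors of x.
            for y in nbrs[x] + [r for r in rev[x] if core[r]]:
                if labels[y] == -1:
                    labels[y] = cluster
                    if core[y]:  # only core points keep growing the cluster
                        queue.append(y)
        cluster += 1
    return labels
```

On a 2×2 grid with spacing 1 at the origin, a second 2×2 grid with spacing 100 at (1000, 1000), and a distant point at (5000, 5000), k = 2 yields the labels [0, 0, 0, 0, 1, 1, 1, 1, -1]: both grids are recovered despite their 100-fold difference in spacing, and the isolated point is left as noise. Since only neighbor ranks are used, rescaling all coordinates gives the same labels.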

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

8-13-2021
