DOI

https://doi.org/10.25772/SYNN-TJ31

Author ORCID Identifier

https://orcid.org/0000-0001-7070-4810

Defense Date

2021

Document Type

Thesis

Degree Name

Master of Science

Department

Computer Science

First Advisor

Alberto Cano

Second Advisor

Bartosz Krawczyk

Third Advisor

Semi Ryu

Abstract

Multi-label data streams are sequences of multi-label instances arriving over time at a multi-label classifier. The properties of the data stream may change continuously due to concept drift, so algorithms must constantly adapt to new data distributions. In this thesis, we propose a novel ensemble method for multi-label drifting streams named Homogeneous Ensemble of Self-Adjusting Nearest Neighbors (HESAkNN). It combines a self-adjusting kNN base classifier with the advantages of ensembles to adapt to concept drift in the multi-label environment. To promote diverse knowledge within the ensemble, each base classifier is given a unique subset of features and samples to train on. The samples are distributed to the classifiers probabilistically, following a Poisson distribution as in online bagging. Alongside these mechanisms, a collection of ADWIN detectors monitors each classifier for the occurrence of concept drift. Upon detection, the algorithm automatically trains additional classifiers in the background to attempt to capture new concepts. After a predetermined number of instances, the active and background classifiers are compared, and only the most accurate classifiers are selected to populate the new active ensemble. The experimental study compares the proposed approach with 30 other classifiers, including problem transformation, algorithm adaptation, kNNs, and ensembles, on 30 diverse multi-label datasets and 11 performance metrics. Results validated using non-parametric statistical analysis support the better performance of the proposed ensemble and highlight the contribution of feature and instance diversity to improving its performance.
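The instance and feature diversity described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the names `Member`, `build_ensemble`, and `train_on_instance`, the rate `lam = 1.0`, and the simple list used as a window are all assumptions made for illustration. Feature subsets are drawn independently here, so they are not guaranteed to be unique across members, and the self-adjusting kNN, the ADWIN detectors, and the background classifiers are omitted.

```python
import math
import random


def poisson(rng: random.Random, lam: float = 1.0) -> int:
    """Draw k ~ Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


class Member:
    """Hypothetical ensemble member that only sees its own feature subset."""

    def __init__(self, feature_idx):
        self.feature_idx = feature_idx  # indices of the features this member uses
        self.window = []                # stored (projected instance, label set) pairs

    def learn(self, x, labels, weight):
        projected = [x[i] for i in self.feature_idx]
        # A weight of k is treated as presenting the instance k times.
        self.window.extend([(projected, labels)] * weight)


def build_ensemble(n_members, n_features, subset_size, rng):
    """Give each member a random feature subset (assumed sampling scheme)."""
    return [
        Member(sorted(rng.sample(range(n_features), subset_size)))
        for _ in range(n_members)
    ]


def train_on_instance(ensemble, x, labels, rng, lam=1.0):
    """Online bagging step: each member receives the instance Poisson(lam) times."""
    for member in ensemble:
        k = poisson(rng, lam)
        if k > 0:  # k == 0 means this member skips the instance
            member.learn(x, labels, k)


if __name__ == "__main__":
    rng = random.Random(42)
    ensemble = build_ensemble(n_members=5, n_features=10, subset_size=4, rng=rng)
    instance = [float(i) for i in range(10)]
    train_on_instance(ensemble, instance, {"sports", "news"}, rng)
    for m in ensemble:
        print(m.feature_idx, len(m.window))
```

Because each member draws its own Poisson weight, some members skip a given instance while others see it more than once, which is how online bagging approximates bootstrap resampling on a stream.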

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

5-9-2021

Included in

Data Science Commons
