Defense Date

2024

Document Type

Thesis

Degree Name

Master of Science

Department

Chemical and Life Science Engineering

First Advisor

Dr. Charles McGill

Abstract

In this study, we investigate active learning in machine learning for chemical systems, with a focus on improving model performance and reducing data collection costs through uncertainty quantification methods. Data collection in chemistry is often resource-intensive, and active learning offers a cost-effective alternative by selectively choosing informative data points for iterative training. We evaluated various uncertainty quantification techniques, including ensemble-based methods (Evidential Ensemble and MVE Ensemble), dropout, and random selection, across several chemical datasets of varying properties. Our findings highlight that ensemble-based methods significantly improve predictive performance and are effective across a range of datasets. Evidential Ensemble and MVE Ensemble consistently outperformed other methods, underscoring the robustness of ensemble approaches for uncertainty estimation in chemical applications. Conversely, dropout and random selection proved less effective, with dropout yielding suboptimal results across all datasets tested. To assess practical scenarios, we simulated clustered initial training data, reflecting situations where initial samples are highly similar. Our analysis revealed that clustering impacts early active learning stages, with random selection performing reasonably well in the early iterations until ensemble-based methods accumulated sufficient data to excel. This underscores the influence of dataset structure on active learning efficacy and emphasizes the need to tailor uncertainty methods to dataset characteristics. Ultimately, our study demonstrates that strategic selection of uncertainty quantification methods based on the target dataset can optimize resource use and enhance model performance, allowing for effective data-driven insights in chemical systems with minimal data requirements.
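
The selection loop summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it swaps in a bootstrap ensemble of linear least-squares models in place of the Evidential and MVE ensembles, uses synthetic data, and every function name and parameter below (`fit_ensemble`, `ensemble_predict`, `select_most_uncertain`, `active_learning`, `n_members`, `batch_size`) is hypothetical. It only shows the general idea of ranking unlabeled points by ensemble disagreement and comparing that against a random-selection baseline.

```python
import numpy as np


def _design(X):
    """Append a bias column so each linear model has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ensemble(X, y, n_members, rng):
    """Fit linear least-squares models on bootstrap resamples of (X, y).

    Stand-in for a neural-network ensemble; assumed here for illustration only.
    """
    A = _design(X)
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap resample
        w, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        members.append(w)
    return np.stack(members)  # shape: (n_members, n_features + 1)


def ensemble_predict(members, X):
    """Return the ensemble mean and the variance across members per sample."""
    preds = _design(X) @ members.T  # shape: (n_samples, n_members)
    return preds.mean(axis=1), preds.var(axis=1)


def select_most_uncertain(variance, k):
    """Indices of the k largest variances, most uncertain first."""
    return np.argsort(variance)[::-1][:k]


def active_learning(X, y, labeled, n_rounds, batch_size,
                    strategy="ensemble", seed=0):
    """Grow the labeled set by batch_size points per round.

    strategy="ensemble" picks the pool points with the highest ensemble
    variance; strategy="random" is the random-selection baseline.
    """
    rng = np.random.default_rng(seed)
    labeled = list(labeled)
    for _ in range(n_rounds):
        pool = np.setdiff1d(np.arange(len(y)), labeled)  # unlabeled indices
        if len(pool) == 0:
            break
        k = min(batch_size, len(pool))
        if strategy == "random":
            picked = rng.choice(pool, size=k, replace=False)
        else:
            members = fit_ensemble(X[labeled], y[labeled], n_members=5, rng=rng)
            _, variance = ensemble_predict(members, X[pool])
            picked = pool[select_most_uncertain(variance, k)]
        labeled.extend(int(i) for i in picked)
    return labeled


# Synthetic regression data (assumed; not one of the chemical datasets).
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * data_rng.normal(size=200)

labeled = active_learning(X, y, labeled=range(10), n_rounds=5, batch_size=4)
print(len(labeled))  # 10 initial points + 5 rounds * 4 points = 30
```

Passing `strategy="random"` runs the same loop with the baseline, so the two labeled sets can be used to train and compare models at equal data budgets. A clustered starting set, as simulated in the study, could be mimicked by choosing the initial `labeled` indices from one region of feature space rather than at random.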

Comments

Hello, because I was starting my engineering career, I could not spend as much time on my thesis as I would have liked. Please let me know if I missed anything in the formatting, and I will fix it as soon as possible.

Thanks

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

11-11-2024

Available for download on Tuesday, November 11, 2025
