Defense Date
2024
Document Type
Thesis
Degree Name
Master of Science
Department
Chemical and Life Science Engineering
First Advisor
Dr. Charles McGill
Abstract
In this study, we investigate active learning in machine learning for chemical systems, with a focus on improving model performance and reducing data collection costs through uncertainty quantification methods. Data collection in chemistry is often resource-intensive, and active learning offers a cost-effective alternative by selectively choosing informative data points for iterative training. We evaluated various uncertainty quantification techniques, including ensemble-based methods (Evidential Ensemble and MVE Ensemble), dropout, and random selection, across several chemical datasets of varying properties. Our findings highlight that ensemble-based methods significantly improve predictive performance and are effective across a range of datasets. Evidential Ensemble and MVE Ensemble consistently outperformed other methods, underscoring the robustness of ensemble approaches for uncertainty estimation in chemical applications. Conversely, dropout and random selection proved less effective, with dropout yielding suboptimal results across all datasets tested. To assess practical scenarios, we simulated clustered initial training data, reflecting situations where initial samples are highly similar. Our analysis revealed that clustering impacts early active learning stages, with random selection performing reasonably well in the early iterations until ensemble-based methods accumulated sufficient data to excel. This underscores the influence of dataset structure on active learning efficacy and emphasizes the need to tailor uncertainty methods to dataset characteristics. Ultimately, our study demonstrates that strategic selection of uncertainty quantification methods based on the target dataset can optimize resource use and enhance model performance, allowing for effective data-driven insights in chemical systems with minimal data requirements.
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
11-11-2024