Author ORCID Identifier
https://orcid.org/0000-0003-4921-6454
Defense Date
2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Systems Modeling and Analysis
First Advisor
Paul Brooks
Second Advisor
Craig Larson
Third Advisor
Yanjun Qian
Fourth Advisor
Cesar Zamudio
Fifth Advisor
David Edwards
Abstract
Recommendation systems are essential for providing personalized user experiences, but their performance can be degraded by outliers, especially in traditional collaborative filtering methods that use the L2-norm. To address this challenge, we developed two new algorithms, SharpEl1rs and SharpEl1rs-Impute, based on the L1-norm to improve resistance to extreme values and to handle missing data effectively. We designed an experimental setting to compare these proposed methods with existing techniques, and then applied our algorithms to real datasets to assess their performance. The findings indicate that the proposed models offer improved accuracy in some cases and solid performance in others for industrial-scale recommendation systems.
Following this, we introduced a third algorithm, SparseL1rs, which incorporates a Lasso penalty into the optimization problem. Traditional Lasso-penalized models generally assume fully observed data, limiting their applicability in real-world scenarios where datasets are often incomplete. Our integration allows for effective management of missing data and promotes sparsity, enhancing both resistance to outliers and interpretability.
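As a minimal illustration of why an L1-norm criterion resists extreme values better than an L2-norm criterion, consider the simplest case of fitting a single constant to a list of values: the L2-optimal constant is the mean and the L1-optimal constant is the median. This is only a toy sketch, not SharpEl1rs, SharpEl1rs-Impute, or SparseL1rs; the data and the function names `l1_loss` and `l2_loss` are assumptions made for the example.

```python
from statistics import mean, median


def l2_loss(values, c):
    """Sum of squared deviations from the constant c (L2 criterion)."""
    return sum((v - c) ** 2 for v in values)


def l1_loss(values, c):
    """Sum of absolute deviations from the constant c (L1 criterion)."""
    return sum(abs(v - c) for v in values)


# Toy data (assumed for illustration): five ordinary values plus one
# corrupted entry acting as an outlier.
clean = [3, 4, 4, 5, 4]
with_outlier = clean + [100]

# The mean minimizes the L2 loss; the median minimizes the L1 loss.
l2_fit_clean = mean(clean)
l2_fit_outlier = mean(with_outlier)
l1_fit_clean = median(clean)
l1_fit_outlier = median(with_outlier)

print("L2 fit:", l2_fit_clean, "->", l2_fit_outlier)
print("L1 fit:", l1_fit_clean, "->", l1_fit_outlier)
```

In this toy case the single outlier pulls the L2 fit far from the bulk of the data, while the L1 fit stays where it was.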
In another part of this dissertation, the Conjecturing method was used as a machine learning technique to identify complex patterns and relationships within data, expressed as sufficient conditions. Its ability to capture the unique characteristics of each class of a target variable and to handle both numerical and categorical features motivated us to use these conjectures for classification tasks. The conjectures were transformed into binary properties (true if a sample satisfied the conditions and false otherwise) and added to the original feature set to provide additional information about each class. This approach not only offers deeper insights into the factors associated with each class but also enhances model transparency by clarifying the underlying relationships driving predictions. We evaluated the impact of the Conjecturing method by incorporating these sufficient conditions into various models and comparing their performance with and without conjectures. Applied to three datasets (Heart Disease, Titanic, and Body Fluid), the method generated meaningful conditions for each class, improving interpretability and, in some cases, enhancing model accuracy.
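The step of turning sufficient conditions into binary properties can be sketched as follows. This is a hedged illustration only: the two conjectures, their names, the feature names, and the sample rows are hypothetical and are not taken from the dissertation or its datasets, and `add_conjecture_features` is a helper invented for this example.

```python
def add_conjecture_features(rows, conjectures):
    """Return copies of rows with one boolean column per conjecture.

    Each conjecture is a predicate over a row; the new column is True
    when the row satisfies the sufficient condition and False otherwise.
    The input rows are left unmodified.
    """
    augmented = []
    for row in rows:
        new_row = dict(row)  # copy so the original feature set is preserved
        for name, condition in conjectures.items():
            new_row[name] = bool(condition(row))
        augmented.append(new_row)
    return augmented


# Hypothetical sufficient conditions mixing numerical and categorical features.
conjectures = {
    "conj_class1": lambda r: r["age"] > 50 and r["chest_pain"] == "typical",
    "conj_class0": lambda r: r["max_heart_rate"] >= 160,
}

# Hypothetical samples.
rows = [
    {"age": 63, "chest_pain": "typical", "max_heart_rate": 150},
    {"age": 41, "chest_pain": "atypical", "max_heart_rate": 172},
]

augmented = add_conjecture_features(rows, conjectures)
for row in augmented:
    print(row)
```

The augmented rows can then be passed to any classifier in place of the original rows, which allows a comparison of performance with and without the conjecture columns.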
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
12-12-2024