Author ORCID Identifier

https://orcid.org/0000-0003-4921-6454

Defense Date

2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Systems Modeling and Analysis

First Advisor

Paul Brooks

Second Advisor

Craig Larson

Third Advisor

Yanjun Qian

Fourth Advisor

Cesar Zamudio

Fifth Advisor

David Edwards

Abstract

Recommendation systems are essential for providing personalized user experiences, but their performance can be degraded by outliers, especially in traditional collaborative filtering methods that use the L2-norm. To address this challenge, we developed two new algorithms, SharpEl1rs and SharpEl1rs-Impute, based on the L1-norm to improve resistance to extreme values and to handle missing data effectively. We designed experiments to compare the proposed methods with existing techniques and then applied our algorithms to real datasets to assess their performance. The findings indicate that the proposed models offer improved accuracy in some cases and solid performance in others for industrial-scale recommendation systems.
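As a rough illustration of why an L1-norm criterion resists extreme values better than an L2-norm criterion, the sketch below fits a single "typical rating" to a small list of scores. It is not the SharpEl1rs algorithm; it only uses the standard fact that the mean minimizes the sum of squared errors while the median minimizes the sum of absolute errors. The rating values and the helper names `l1_loss` and `l2_loss` are assumptions made for this example.

```python
import statistics

def l2_loss(center, values):
    """Sum of squared deviations (the L2-norm criterion, squared)."""
    return sum((v - center) ** 2 for v in values)

def l1_loss(center, values):
    """Sum of absolute deviations (the L1-norm criterion)."""
    return sum(abs(v - center) for v in values)

# Hypothetical ratings from one user; the last list adds one corrupted entry.
ratings = [4.0, 4.0, 5.0, 3.0, 4.0]
ratings_with_outlier = ratings + [100.0]

# The mean is the L2-optimal constant fit; the median is the L1-optimal one.
mean_clean = statistics.mean(ratings)
median_clean = statistics.median(ratings)
mean_dirty = statistics.mean(ratings_with_outlier)
median_dirty = statistics.median(ratings_with_outlier)

print("clean data:   mean =", mean_clean, " median =", median_clean)
print("with outlier: mean =", mean_dirty, " median =", median_dirty)
print("L1 loss at median:", l1_loss(median_dirty, ratings_with_outlier))
print("L1 loss at mean:  ", l1_loss(mean_dirty, ratings_with_outlier))
```

In this toy setting a single extreme value moves the L2-optimal fit far from the bulk of the ratings, while the L1-optimal fit stays where it was. The same intuition motivates L1-norm objectives in low-rank recommendation models, although the dissertation's algorithms solve a more involved optimization problem.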

Following this, we introduced a third algorithm, SparseL1rs, which incorporates a Lasso penalty into the optimization problem. Traditional Lasso-penalized models generally assume fully observed data, limiting their applicability in real-world scenarios where datasets are often incomplete. Our integration allows for effective management of missing data and promotes sparsity, enhancing both resistance to outliers and interpretability.

In another part of this dissertation, the Conjecturing method was used as a machine learning technique to identify complex patterns and relationships within data, expressed as sufficient conditions. Its ability to capture the unique characteristics of each class of a target variable and to handle both numerical and categorical features motivated us to use these conjectures for classification tasks. The conjectures were transformed into binary properties (true if a sample satisfied the conditions, false otherwise) and added to the original feature set to provide additional information about each class. This approach not only offers deeper insight into the factors associated with each class but also enhances model transparency by clarifying the relationships driving predictions. We evaluated the impact of the Conjecturing method by incorporating these sufficient conditions into various models and comparing their performance with and without conjectures. Applied to three datasets (Heart Disease, Titanic, and Body Fluid), the method generated meaningful conditions for each class, improving interpretability and, in some cases, model accuracy.
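The feature-augmentation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the two conjectures, their thresholds, the feature names (`age`, `chol`, `max_hr`), and the helper `augment` are all hypothetical, and the step that actually generates conjectures from data is not shown.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]
Conjecture = Callable[[Sample], bool]

# Hypothetical sufficient conditions, one per class. In practice these would
# come from the Conjecturing method rather than being written by hand.
conjectures: Dict[str, Conjecture] = {
    "conj_high_risk": lambda s: s["age"] > 50 and s["chol"] > 240,
    "conj_low_risk": lambda s: s["age"] <= 40 and s["max_hr"] >= 160,
}

def augment(samples: List[Sample], rules: Dict[str, Conjecture]) -> List[dict]:
    """Return copies of the samples with one boolean column per conjecture."""
    augmented = []
    for sample in samples:
        row = dict(sample)  # copy so the original sample is left unchanged
        for name, condition in rules.items():
            row[name] = bool(condition(sample))
        augmented.append(row)
    return augmented

# Made-up samples for demonstration only.
samples = [
    {"age": 63, "chol": 260, "max_hr": 120},
    {"age": 35, "chol": 180, "max_hr": 172},
    {"age": 55, "chol": 200, "max_hr": 150},
]

augmented = augment(samples, conjectures)
for row in augmented:
    print(row)
```

The augmented rows can then be passed to any classifier alongside the original features, which is one way to compare model performance with and without the conjecture-derived columns.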

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

12-12-2024
