DOI

https://doi.org/10.25772/CKCS-BY33

Defense Date

2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Systems Modeling and Analysis

First Advisor

Dr. Chenlu Ke

Abstract

Variable selection and dimension reduction are two fundamental components of modern statistical and machine learning methodologies for analyzing high-dimensional datasets, which have become increasingly prevalent in the era of Big Data. However, most existing methods focus on continuous data, even though many practical datasets contain both continuous and categorical variables. To address this challenge, this dissertation develops novel approaches for variable screening and dimension reduction specifically tailored to high-dimensional data involving categorical predictors.

For regression analyses with mixed predictor types, we propose a unified framework that constrains sufficient reduction of continuous variables through subpopulations defined by categorical variables. Leveraging reproducing kernel-based ANOVA statistics, a model-free extension of classical ANOVA methods used in linear models, we identify important individual predictors and linear combinations of predictors without imposing stringent modeling assumptions. Unlike traditional marginal screening methods, our screening approach evaluates each predictor in the presence of others, and hence improves variable selection accuracy. Following the identification of candidate predictors, we further introduce a kernel-based sequential least squares method that efficiently reduces dimensionality by extracting a few critical linear combinations from the selected predictors. Compared to existing partial sufficient dimension reduction methods, our technique offers greater flexibility as it requires neither predefined model structures nor strong assumptions about predictor distributions. Additionally, our method accommodates both continuous and categorical response variables and does not rely on slicing when dealing with continuous responses. Theoretical and computational aspects of the proposed methods are developed. Comprehensive simulation studies demonstrate their effectiveness across various regression and classification scenarios, supported by illustrative real-data applications.

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

5-9-2025

Available for download on Sunday, May 09, 2027
