Author ORCID Identifier
https://orcid.org/0000-0002-8936-5684
Defense Date
2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Biostatistics
First Advisor
Nitai Mukhopadhyay
Abstract
Testing conformity of observed data with theoretical probability distributions is fundamental to statistical modeling, yet current methods face significant limitations when extended to multivariate settings, where geometric structure and interpretability become increasingly important. This dissertation develops a novel Topological Data Analysis (TDA) framework that uses probability density level sets (regions where the density exceeds specified thresholds) to test distributional conformity across dimensions. By quantifying the overlap between observed and reference level sets using the Dice Similarity Coefficient (DSC), we create a geometrically motivated measure of distributional agreement that scales naturally from univariate to multivariate applications. The methodology employs adaptive permutation-based bootstrap procedures for statistical inference, with computational optimizations including early stopping criteria that reduce bootstrap replications by 35-40% while maintaining rigorous Type I error control at nominal levels. Through extensive simulations with 10,000 replications across multiple sample sizes and dimensionalities, the TDA-based approach demonstrates competitive performance against established tests, including Shapiro-Wilk, Anderson-Darling, Henze-Zirkler, and the energy-based E-statistic. Its advantages are most pronounced for detecting non-elliptical departures from normality, with power gains of 40-96% for skewed alternatives. Real-world applications to the NHANES, Pima Indians diabetes, and Cleveland Heart Disease datasets reveal clinically meaningful patterns undetected by traditional methods, including metabolic heterogeneity in glucose-insulin relationships and age-related decoupling of cardiovascular risk factors.
The framework provides unique geometric interpretability through visualization of spatially specific distributional differences, offers a bounded effect size metric that complements hypothesis testing, and extends naturally to three-dimensional settings despite computational challenges. This work bridges topological thinking with statistical inference, providing practitioners with powerful tools for precision medicine, where understanding distributional shape characteristics can identify at-risk individuals within ostensibly normal parameter ranges and enable more targeted therapeutic strategies.
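The level-set overlap idea described in the abstract can be illustrated with a minimal univariate sketch. This is not the dissertation's implementation. It assumes a Gaussian kernel density estimate with a normal-reference rule-of-thumb bandwidth, a single threshold set at a fraction of the reference density's peak, and a regular grid on which level sets are compared. The helper names (kde_pdf, upper_level_set, dice, dsc_against_normal) and the parameters frac and grid_size are hypothetical. The permutation-based bootstrap and early stopping steps are not shown.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def kde_pdf(sample, bandwidth):
    """Return a Gaussian kernel density estimate as a function of x."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return pdf


def upper_level_set(pdf, grid, threshold):
    """Indices of grid points where the density exceeds the threshold."""
    return {i for i, x in enumerate(grid) if pdf(x) > threshold}


def dice(a, b):
    """Dice Similarity Coefficient 2|A n B| / (|A| + |B|), bounded in [0, 1]."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def dsc_against_normal(sample, frac=0.25, grid_size=400):
    """DSC between observed and fitted-normal level sets (illustrative only)."""
    mu, sd = mean(sample), stdev(sample)
    reference = NormalDist(mu, sd)
    # Assumption: normal-reference rule-of-thumb bandwidth.
    bandwidth = 1.06 * sd * len(sample) ** (-0.2)
    lo, hi = min(sample) - 3 * sd, max(sample) + 3 * sd
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    # Assumption: one threshold at a fraction of the reference peak density.
    threshold = frac * reference.pdf(mu)
    observed_set = upper_level_set(kde_pdf(sample, bandwidth), grid, threshold)
    reference_set = upper_level_set(reference.pdf, grid, threshold)
    return dice(observed_set, reference_set)


random.seed(0)
normal_sample = [random.gauss(0.0, 1.0) for _ in range(200)]
skewed_sample = [random.expovariate(1.0) for _ in range(200)]
print("DSC, normal sample:", dsc_against_normal(normal_sample))
print("DSC, skewed sample:", dsc_against_normal(skewed_sample))
```

Because the DSC lies between 0 and 1, it can be read as a bounded measure of agreement, with values near 1 indicating that the observed and reference level sets nearly coincide. A multivariate version would replace the one-dimensional grid with a grid over the plane or volume.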
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
12-10-2025