Author ORCID Identifier

https://orcid.org/0000-0002-8936-5684

Defense Date

2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Biostatistics

First Advisor

Nitai Mukhopadhyay

Abstract

Testing conformity of observed data with theoretical probability distributions is fundamental to statistical modeling, yet current methods face significant limitations when extending to multivariate settings where geometric structure and interpretability become increasingly important. This dissertation develops a novel Topological Data Analysis (TDA) framework that uses probability density level sets (regions where the density exceeds specified thresholds) to test distributional conformity across dimensions. By quantifying the overlap between observed and reference level sets using the Dice Similarity Coefficient (DSC), we create a geometrically motivated measure of distributional agreement that naturally scales from univariate to multivariate applications. The methodology employs adaptive permutation-based bootstrap procedures for statistical inference, with computational optimizations including early stopping criteria that reduce bootstrap replications by 35-40% while maintaining rigorous Type I error control at nominal levels. Through extensive simulations with 10,000 replications across multiple sample sizes and dimensionalities, the TDA-based approach demonstrates competitive performance against established tests including Shapiro-Wilk, Anderson-Darling, Henze-Zirkler, and the energy-based E-statistic. It shows particular advantages for detecting non-elliptical departures from normality, with power gains of 40-96% for skewed alternatives. Real-world applications to NHANES, Pima Indians diabetes, and Cleveland Heart Disease datasets reveal clinically meaningful patterns undetected by traditional methods, including metabolic heterogeneity in glucose-insulin relationships and age-related decoupling of cardiovascular risk factors.
The framework provides unique geometric interpretability through visualization of spatially specific distributional differences, offers a bounded effect size metric complementing hypothesis testing, and extends naturally to three-dimensional settings despite computational challenges. This work bridges topological thinking with statistical inference, providing practitioners with powerful tools for precision medicine, where understanding distributional shape characteristics can identify at-risk individuals within ostensibly normal parameter ranges and enable more targeted therapeutic strategies.
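
As an illustration of the level-set overlap idea described in the abstract, the following is a minimal univariate sketch, not the dissertation's implementation. It assumes a Gaussian kernel density estimate with a normal-reference rule-of-thumb bandwidth, a fixed evaluation grid on [-4, 4], a single density threshold of 0.1, and a standard normal reference; all function names and parameter values are hypothetical choices made for this example. The Dice Similarity Coefficient itself is the standard 2|A ∩ B| / (|A| + |B|).

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5) (assumed choice)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate as a function of x."""
    n = len(sample)
    return lambda x: sum(normal_pdf(x, xi, bandwidth) for xi in sample) / n


def level_set(density, grid, threshold):
    """Indices of grid points where the density is at least the threshold."""
    return {i for i, x in enumerate(grid) if density(x) >= threshold}


def dice(a, b):
    """Dice Similarity Coefficient 2|A & B| / (|A| + |B|), bounded in [0, 1]."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def level_set_dsc(sample, reference_pdf, grid, threshold):
    """Overlap between the estimated and the reference level sets on the grid."""
    estimate = kde(sample, rule_of_thumb_bandwidth(sample))
    observed = level_set(estimate, grid, threshold)
    reference = level_set(reference_pdf, grid, threshold)
    return dice(observed, reference)


# Hypothetical illustration data: one normal sample, one right-skewed sample.
grid = [-4.0 + 8.0 * i / 400 for i in range(401)]
rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) - 1.0 for _ in range(200)]  # mean 0

dsc_normal = level_set_dsc(normal_sample, normal_pdf, grid, 0.1)
dsc_skewed = level_set_dsc(skewed_sample, normal_pdf, grid, 0.1)
print(f"DSC vs N(0,1): normal sample {dsc_normal:.3f}, skewed sample {dsc_skewed:.3f}")
```

A DSC near 1 indicates that the estimated and reference level sets nearly coincide at the chosen threshold, while values near 0 indicate little overlap. Turning this statistic into a test would additionally require a resampling scheme such as the permutation-based bootstrap the abstract mentions, which is not sketched here.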

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

12-10-2025

Available for download on Friday, December 10, 2027

Included in

Biostatistics Commons
