Defense Date


Document Type


Degree Name

Doctor of Philosophy



First Advisor

David Wheeler


Investigations into the association between chemical exposure and health outcomes increasingly focus on the role of chemical mixtures rather than individual chemicals. The analysis of chemical mixture data has required the development of novel statistical methods, one of which is Bayesian group index regression. A statistical challenge common to all chemical mixture analyses is the ubiquitous presence of below detection limit (BDL) data. We propose an extension of Bayesian group index regression that treats both regression effects and missing BDL observations as parameters in a model estimated through a Markov chain Monte Carlo algorithm that we refer to as Pseudo-Gibbs imputation. The Pseudo-Gibbs approach enables the estimated parameters of the health effects model to inform the missing data imputations and vice versa, and it accounts for the true variance of the BDL imputations. We conduct a simulation study showing greater power to detect chemical indices significantly associated with an outcome and sensitivity for identifying important chemicals within indices at high levels of BDL missing data. We apply our model to a case-control study on the effects of chemical exposure on childhood leukemia. We next address a problem specific to group index models: how to partition a given set of chemicals into groups to form the requisite indices. We first propose a novel variable clustering algorithm using a variant of the traditional PCA algorithm called Robust PCA. We compare this clustering method with other variable clustering methods from the literature in a simulation study. Finally, we extend the variable clustering method identified previously to incorporate information from an outcome variable. This semi-supervised clustering extension adds the ability to constrain clusters based on the direction of association of individual chemicals with the outcome of interest.
We apply both the unsupervised and semi-supervised variable clustering methods to a case-control study on the effects of chemical exposure on non-Hodgkin’s lymphoma.


© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission


Included in

Biostatistics Commons