DOI

https://doi.org/10.25772/2FB3-AF10

Defense Date

2016

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Biostatistics

First Advisor

Roy T. Sabo

Second Advisor

Le Kang

Third Advisor

Adam Sima

Fourth Advisor

Qiqi Lu

Fifth Advisor

Edward L. Boone

Abstract

Clustered data often feature nested structures and repeated measures. When coupled with binary outcomes and large samples (>10,000), this complexity can lead to non-convergence of the desired model, especially if random effects are used to account for the clustering. One way to bypass the convergence problem is to split the dataset into sub-samples small enough that the desired model converges, and then recombine the results from those sub-samples through meta-analysis. We consider two ways to generate sub-samples: the K independent samples approach, where the data are split into K mutually exclusive sub-samples, and the cluster-based approach, where naturally existing clusters serve as sub-samples. Estimates or test statistics from either sub-sampling approach can then be recombined using a univariate or multivariate meta-analytic approach. We also provide an innovative approach for simulating clustered and dependent binary data by simulating parameter templates that yield the desired cluster behavior. This approach is used to conduct simulation studies comparing the performance of the K independent samples and cluster-based approaches to generating sub-samples, the results from which are combined with either univariate or multivariate meta-analytic techniques. These studies show that, for both the univariate and multivariate meta-analytic approaches, using natural clusters led to less biased test statistics than the K independent samples approach when the number of clusters and the treatment effect were large. The K independent samples approach was preferred when the number of clusters and the treatment effect were small. We also apply these methods to data on cancer screening behaviors obtained from electronic health records of n=15,652 individuals and show that the resulting estimates support the conclusions from the simulation studies.
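The split-and-recombine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: it assumes a much simpler per-sub-sample estimator (a log odds ratio from a 2x2 table with a 0.5 continuity correction) in place of the random-effects model, and it recombines the estimates with a standard fixed-effect, inverse-variance univariate meta-analysis. The function names `split_k`, `log_odds_ratio`, and `pool_fixed_effect`, and the simulated records, are hypothetical and chosen only for illustration.

```python
import math
import random


def split_k(records, k, seed=0):
    """Shuffle the records and deal them into k mutually exclusive sub-samples."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def log_odds_ratio(sample):
    """Return (log odds ratio, variance) for (treated, outcome) pairs.

    Assumption: a simple 2x2-table estimator with a 0.5 continuity
    correction stands in for the mixed model discussed in the abstract.
    """
    a = sum(1 for t, y in sample if t == 1 and y == 1) + 0.5
    b = sum(1 for t, y in sample if t == 1 and y == 0) + 0.5
    c = sum(1 for t, y in sample if t == 0 and y == 1) + 0.5
    d = sum(1 for t, y in sample if t == 0 and y == 0) + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of (estimate, variance) pairs."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return pooled, 1.0 / total


# Illustrative data only: simulated independent binary outcomes (no clustering).
rng = random.Random(1)
records = []
for _ in range(2000):
    treated = rng.randint(0, 1)
    p_event = 0.5 if treated == 1 else 0.3
    records.append((treated, 1 if rng.random() < p_event else 0))

subsamples = split_k(records, k=4)
per_sample = [log_odds_ratio(s) for s in subsamples]
pooled_estimate, pooled_variance = pool_fixed_effect(per_sample)
print(pooled_estimate, pooled_variance)
```

In the cluster-based variant, the call to `split_k` would be replaced by grouping records on an existing cluster identifier, with the pooling step left unchanged.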

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

5-11-2016

Included in

Biostatistics Commons
