DOI

https://doi.org/10.25772/CTR6-EQ68

Defense Date

2022

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Biostatistics

First Advisor

Dr. Dipankar Bandyopadhyay

Abstract

Features of non-Gaussianity of responses, manifested via skewness and heavy tails, are ubiquitous in data generated from large-scale observational studies. For example, in periodontal disease (PD) studies, the clinical biomarkers, such as clinical attachment level (CAL) and probed pocket depth (PPD), are non-Gaussian. Yet, they continue to be routinely analyzed via linear/non-linear mixed effects models under standard Gaussian assumptions for the random terms. The modeling framework in (longitudinal) observational studies is further complicated by irregular observation times and by the data size resulting from a large number of subjects with long time profiles. This dissertation attempts to address these shortcomings, utilizing a classical framework. In the first project, we define and elucidate an extension of the (parametric) skew-t linear mixed model for the CAL response, where the model fitting addresses the big data setting via the implementation of divide-and-conquer techniques that utilize a novel distributed alternating expectation conditional-maximization (AECM) algorithm. Specifically, the E-steps of the AECM algorithm are run in parallel on multiple worker processes, while manager processes perform the M-steps with an updated fraction of the results from the local expectation steps. We prove convergence properties of this algorithm and compare its performance to traditional modelling techniques using both synthetic data and a real dataset generated from the HealthPartners® (HP) HMO. In the second project, we extend this framework and formulate a novel matrix-variate skew-t model, suitable for simultaneous modeling of bivariate (CAL and PPD) responses. This model utilizes matrix-variate distributions, while allowing for novelties such as varying-dimensional matrices, not traditionally observed in matrix-variate regressions.
For model-fitting, we propose versions of the acceleration techniques developed in Project 1, which accelerate the expectation conditional-maximization (ECM) algorithm used to fit this model by distributing its E-steps and fractionally updating the statistics used in the M-steps. Once again, we study model performance using both synthetic data and the HP data. A major contribution is the dissemination of well-documented software (utilizing R and Rcpp) for ready implementation of our proposed models. In sum, our proposed methodological and computational advances (with theoretical validations) are readily generalizable for efficient modeling of both univariate and multivariate non-Gaussian responses in a variety of observational longitudinal settings.

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

8-12-2022

Available for download on Wednesday, August 11, 2027

Included in

Biostatistics Commons
