DOI
https://doi.org/10.25772/4CY7-PH72
Defense Date
2015
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Biostatistics
First Advisor
Kellie J. Archer
Abstract
The advent of high-throughput sequencing has brought about the creation of an unprecedented amount of research data. Analytical methodology has not been able to keep pace with the plethora of data being produced. Two assays are considered here, ImmunoSEQ and the cytokinesis-block micronucleus (CBMN) assay; both produce count data and have few methods available for their analysis.
ImmunoSEQ is a sequencing assay that measures the beta T-cell receptor (TCR) repertoire. The ImmunoSEQ assay was used to describe the TCR repertoires of patients who have undergone hematopoietic stem cell transplantation (HSCT). Several different methods for spectratype analysis were extended to the TCR sequencing setting and then applied to these data to demonstrate different ways the data set can be analyzed. The different methods include CDR3 distribution perturbation, Oligoscores, Simpson's diversity, Shannon diversity, Kullback-Leibler divergence, a non-parametric method, and a proportion logit transformation method. Herein we also demonstrate adapting compositional data analysis methods to the TCR sequencing setting. The various methods were compared when analyzing a set of 13 subjects who underwent HSCT. The eight subjects who developed graft versus host disease were compared to the five who did not. There was little overlap in the results of the different methods, showing that researchers must choose the appropriate method for their research question of interest.
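Three of the summaries named above (Shannon diversity, Simpson's diversity, and Kullback-Leibler divergence) can be illustrated on clonotype count vectors. The following is a minimal sketch of the textbook definitions, not the dissertation's implementation. The function names and the example counts are hypothetical, and Simpson's diversity is assumed here to mean the Gini-Simpson form, 1 minus the sum of squared proportions.

```python
import math


def proportions(counts):
    """Convert raw clonotype counts to relative frequencies."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def shannon_diversity(counts):
    """Shannon entropy (natural log) of the clonotype frequencies."""
    return -sum(p * math.log(p) for p in proportions(counts) if p > 0)


def simpson_diversity(counts):
    """Gini-Simpson index: 1 - sum(p_i^2). Assumed form, see lead-in."""
    return 1.0 - sum(p * p for p in proportions(counts))


def kl_divergence(counts_p, counts_q):
    """Kullback-Leibler divergence D(P || Q) between two repertoires.

    Both vectors must list the same clonotypes in the same order.
    Returns infinity if Q has zero mass where P does not.
    """
    if len(counts_p) != len(counts_q):
        raise ValueError("count vectors must have the same length")
    total = 0.0
    for p, q in zip(proportions(counts_p), proportions(counts_q)):
        if p == 0:
            continue
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total


# Hypothetical counts for four clonotypes in two samples.
even_sample = [25, 25, 25, 25]
skewed_sample = [85, 5, 5, 5]
print(shannon_diversity(even_sample), shannon_diversity(skewed_sample))
print(simpson_diversity(even_sample), simpson_diversity(skewed_sample))
print(kl_divergence(skewed_sample, even_sample))
```

A repertoire dominated by one clone scores lower on both diversity indices than an even repertoire, while the divergence measures how far one sample's frequencies sit from another's.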
The CBMN assay measures the rate of micronuclei (MN) formation in a sample of cells and can be paired with gene expression or methylation assays to determine the association between MN formation and other genetic markers. Herein we extended the generalized monotone incremental forward stagewise (GMIFS) method to the situation where the response is count data and there are more independent variables than there are samples. Our Poisson GMIFS method was compared to a popular alternative, glmpath, by using simulations and by applying both to real data. Simulations showed that both methods perform similarly in accurately choosing truly significant variables. However, glmpath appears to overfit compared to our GMIFS method. Finally, when both methods were applied to two data sets, GMIFS appeared to be more stable than glmpath.
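The general idea of a monotone incremental forward stagewise fit for a Poisson response can be sketched as follows. This is an illustrative sketch under stated assumptions, not the dissertation's Poisson GMIFS code: it assumes a log link, an unpenalized intercept updated in closed form, predictors expanded to [X, -X] so that every coefficient only grows, and a fixed step count with a gradient tolerance as the stopping rule. The function name, parameters, and data are hypothetical.

```python
import math


def poisson_forward_stagewise(X, y, epsilon=0.01, n_steps=200, tol=1e-8):
    """Monotone incremental forward stagewise fit for Poisson counts.

    X is a list of rows (n samples by p predictors), y a list of counts.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    total_y = sum(y)
    if total_y <= 0:
        raise ValueError("response must contain at least one positive count")
    # Coefficients on the expanded design [X, -X]; each only increases.
    beta_pos = [0.0] * p
    beta_neg = [0.0] * p
    alpha = math.log(total_y / n)
    for _ in range(n_steps):
        beta = [bp - bn for bp, bn in zip(beta_pos, beta_neg)]
        eta = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # With beta fixed, the Poisson MLE of the intercept is closed form.
        alpha = math.log(total_y / sum(math.exp(e) for e in eta))
        resid = [y[i] - math.exp(alpha + eta[i]) for i in range(n)]
        # Gradient of the log-likelihood with respect to each coefficient.
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        j_best = max(range(p), key=lambda j: abs(grad[j]))
        if abs(grad[j_best]) < tol:
            break
        # Step the expanded column (x_j or -x_j) with the largest gradient.
        if grad[j_best] > 0:
            beta_pos[j_best] += epsilon
        else:
            beta_neg[j_best] += epsilon
    return alpha, [bp - bn for bp, bn in zip(beta_pos, beta_neg)]


# Hypothetical toy data: counts rise with the first predictor only.
X = [[-1.0, 0.5], [-0.5, -0.5], [0.0, 0.5],
     [0.5, -0.5], [1.0, 0.5], [1.5, -0.5]]
y = [1, 1, 2, 3, 4, 7]
alpha, beta = poisson_forward_stagewise(X, y, n_steps=30)
print(alpha, beta)
```

Because each step changes a single coefficient by a small fixed amount, stopping early leaves most coefficients at exactly zero, which is what makes this style of fit usable when predictors outnumber samples.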
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
7-30-2015
Included in
Biostatistics Commons, Genetic Processes Commons, Other Immunology and Infectious Disease Commons