DOI

https://doi.org/10.25772/34W4-V547

Defense Date

2014

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Biostatistics

First Advisor

Mark Reimers

Second Advisor

Michael Neale

Third Advisor

Kellie Archer

Fourth Advisor

Nitai Mukhopadhyay

Fifth Advisor

Shirley Taylor

Abstract

In recent years, new genomic technologies have made it possible to investigate many regulatory epigenetic marks, in addition to gene expression levels, on a genome-wide scale. As the cost of these technologies continues to fall, study sizes will increase, and several different assays are also increasingly being applied to the same samples. It is therefore desirable to develop statistical methods that integrate multiple data types and can handle the increased computational burden of large data sets. It is also important to develop sound quality control and normalization methods, because technical errors can compound when multiple genomic assays are integrated. DNA methylation is a commonly studied epigenetic mark. The Infinium HumanMethylation450 BeadChip has become a popular microarray for measuring it because it provides genome-wide coverage and is affordable enough to scale to larger studies. However, its complex array design has complicated efforts to develop normalization methods. We propose a novel normalization method that uses a set of stable methylation sites in housekeeping genes as empirical controls to fit a local regression hypersurface to signal intensities. We demonstrate that our method performs favorably compared with other popular normalization methods for the array. We also discuss an approach to estimating cell-type admixture, a frequent biological confounder in these studies. For data integration, we propose a gene-centric procedure that uses canonical correlation and subsequent permutation testing to examine correlation, or other measures of association, and the co-localization of epigenetic marks on the genome. Specifically, a likelihood ratio test for general association between data modalities is performed after an initial dimension reduction step. Canonical scores are then regressed against covariates of interest using linear mixed effects models.
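The association test described above can be illustrated with a minimal sketch. The abstract does not specify the dimension reduction or the form of the likelihood ratio test, so this example assumes principal components for the reduction step and Bartlett's chi-square approximation to the likelihood ratio (Wilks' lambda) test of the null hypothesis that all canonical correlations are zero. The function names and simulated data are hypothetical, and the linear mixed effects step is not shown.

```python
import numpy as np

def pca_reduce(M, k):
    """Assumed reduction step: scores of centred M (samples x features) on its top k PCs."""
    Mc = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, :k] * s[:k]

def canonical_correlations(X, Y):
    """Canonical correlations between the column spaces of X and Y (samples x variables)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    r = np.linalg.svd(Qx.T @ Qy, compute_uv=False)  # returned in descending order
    return np.clip(r, 0.0, 1.0 - 1e-12)  # keep log(1 - r^2) finite

def bartlett_lrt(X, Y):
    """Bartlett's chi-square approximation to the LRT of H0: all canonical correlations are 0.
    Returns (statistic, degrees of freedom)."""
    n, p = X.shape
    q = Y.shape[1]
    r = canonical_correlations(X, Y)
    stat = -(n - 1 - (p + q + 1) / 2) * np.sum(np.log(1 - r ** 2))
    return stat, p * q

# Hypothetical example: 40 samples, methylation and expression sharing one latent factor.
rng = np.random.default_rng(0)
n = 40
latent = rng.normal(size=(n, 1))
meth = latent @ rng.normal(size=(1, 200)) + rng.normal(size=(n, 200))
expr = latent @ rng.normal(size=(1, 20)) + rng.normal(size=(n, 20))

Xr, Yr = pca_reduce(meth, 3), pca_reduce(expr, 3)
r = canonical_correlations(Xr, Yr)
stat, df = bartlett_lrt(Xr, Yr)
```

Reducing each modality to a few components before the test keeps the number of variables well below the number of samples, which the canonical correlation computation requires. A p-value can then be obtained from `scipy.stats.chi2.sf(stat, df)`.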
Lastly, permutation testing is performed on weighted correlation matrices to test whether these relationships co-localize with physical locations in the genome. We demonstrate these methods on a set of developmental brain samples from the BrainSpan consortium and find substantial relationships among DNA methylation, gene expression, and alternative promoter usage, primarily in genes related to axon guidance. We perform a second integrative analysis on another set of brain samples from the Stanley Medical Research Institute.
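The co-localization test mentioned above can be illustrated with a generic permutation sketch. The abstract does not define the weighting or the permutation scheme, so this example assumes a hypothetical matrix `W` of association weights between methylation sites (rows) and expression features such as promoters (columns). It uses the mean absolute weight of pairs lying within a genomic `window` as the statistic and builds its null distribution by shuffling CpG positions. All names and data here are illustrative assumptions.

```python
import numpy as np

def colocalization_stat(W, cpg_pos, feat_pos, window):
    """Mean |weight| over (CpG, feature) pairs no more than `window` bp apart."""
    dist = np.abs(cpg_pos[:, None] - feat_pos[None, :])
    near = dist <= window
    if not near.any():
        return 0.0
    return np.abs(W)[near].mean()

def colocalization_perm_test(W, cpg_pos, feat_pos, window, n_perm=1000, seed=0):
    """Permutation p-value: shuffle CpG positions to break any weight-location link."""
    rng = np.random.default_rng(seed)
    observed = colocalization_stat(W, cpg_pos, feat_pos, window)
    null = np.empty(n_perm)
    for b in range(n_perm):
        null[b] = colocalization_stat(W, rng.permutation(cpg_pos), feat_pos, window)
    # Add-one correction so the estimated p-value is never exactly zero.
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p

# Hypothetical example: 100 CpGs every 100 bp, two promoters, strong weights only nearby.
rng = np.random.default_rng(1)
cpg_pos = np.arange(0, 10000, 100)
feat_pos = np.array([2000, 7000])
W = 0.1 * rng.normal(size=(cpg_pos.size, feat_pos.size))
W[np.abs(cpg_pos[:, None] - feat_pos[None, :]) <= 300] += 1.0

obs, p = colocalization_perm_test(W, cpg_pos, feat_pos, window=300)
```

Shuffling positions keeps the distribution of weights fixed and only breaks their link to genomic location. As a result, a small p-value points to co-localization rather than simply to strong associations.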

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

12-9-2014
