Power Analysis in Applied Linear Regression for Cell Type-Specific Differential Expression Detection
DOI
https://doi.org/10.25772/VPMR-W673
Defense Date
2016
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Biostatistics
First Advisor
Mikhail G. Dozmorov
Second Advisor
Nak-Kyeong Kim
Third Advisor
Roy T. Sabo
Fourth Advisor
Vladimir I. Vladimirov
Fifth Advisor
Timothy P. York
Abstract
The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from any tissues suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. However, this variability may actually be advantageous, as heterogeneous gene expression measurements coupled with cell counts may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression. Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect the performance of linear regression. In this dissertation we specifically address the parameter space involved in the most rigorous use of linear regression to estimate cell type-specific differential expression and report under which conditions significant detection is probable. We define parameters affecting the sensitivity of cell type-specific differential expression estimation as follows: sample size, cell type-specific proportion variability, mean squared error (spread of observations around linear regression line), conditioning of the cell proportions predictor matrix, and the size of actual cell type-specific differential expression. Each parameter, with the exception of cell type-specific differential expression (effect size), affects the variability of cell type-specific differential expression estimates. We have developed a power-analysis approach to cell type by cell type and genomic site by site differential expression detection which relies upon Welch’s two-sample t-test and factors in differences in cell type-specific expression estimate variability and reduces false discovery. To this end we have published an R package, LRCDE, available in GitHub (http://www.github.com/ERGlass/lrcde.dev) which outputs observed statistics of cell type-specific differential expression, including two-sample t- statistic, t-statistic p-value, and power calculated from two-sample t-statistic on a genomic site- by-site basis.
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
9-2-2016