Defense Date

2016

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Biostatistics

First Advisor

Mikhail G. Dozmorov

Second Advisor

Nak-Kyeong Kim

Third Advisor

Roy T. Sabo

Fourth Advisor

Vladimir I. Vladimirov

Fifth Advisor

Timothy P. York

Abstract

The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from any tissues suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. However, this variability may actually be advantageous, as heterogeneous gene expression measurements coupled with cell counts may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression. Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect the performance of linear regression. In this dissertation we specifically address the parameter space involved in the most rigorous use of linear regression to estimate cell type-specific differential expression and report under which conditions significant detection is probable. We define parameters affecting the sensitivity of cell type-specific differential expression estimation as follows: sample size, cell type-specific proportion variability, mean squared error (spread of observations around linear regression line), conditioning of the cell proportions predictor matrix, and the size of actual cell type-specific differential expression. Each parameter, with the exception of cell type-specific differential expression (effect size), affects the variability of cell type-specific differential expression estimates. We have developed a power-analysis approach to cell type by cell type and genomic site by site differential expression detection which relies upon Welch’s two-sample t-test and factors in differences in cell type-specific expression estimate variability and reduces false discovery. To this end we have published an R package, LRCDE, available in GitHub (http://www.github.com/ERGlass/lrcde.dev) which outputs observed statistics of cell type-specific differential expression, including two-sample t- statistic, t-statistic p-value, and power calculated from two-sample t-statistic on a genomic site- by-site basis.

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

9-2-2016

Share

COinS