Defense Date


Document Type


Degree Name

Doctor of Philosophy


Integrative Life Sciences

First Advisor

Maria C. Rivera


Microbial communities are recognized as major drivers of global biogeochemical processes. However, the genetic diversity and composition, as well as processes leading to the origin and diversification of these communities in space and time, are poorly understood. Characterization of microbial communities using high-throughput sequencing of 16S tags shows that Operational Taxonomic Unit (OTU) abundances can be approximated by a gamma distribution, which suggests structuring around small numbers of highly abundant OTUs and a large proportion of rare OTUs. The current methods used to characterize how communities are structured rely on multivariate statistics, which operate on pair-wise distance matrices. My analyses demonstrate that use of these methods, by reducing a high-dimensional data set (tens of samples, thousands of OTUs), results in a significant loss of information. I demonstrate that, in some cases, up to 80% of the least abundant OTUs may be removed while still recovering the same community relationships; this indicates these metrics are biased toward the highly abundant OTUs. I also demonstrate that the observed patterns of OTU abundance detected from microbial communities can be robustly modeled using techniques similar to those used to model the presence and absence of genes in genome evolution. Using simulation studies, I show that general Markov models in a Bayesian inference framework outperform traditional, multivariate ecological methods in recovering true community structure. Applying this new methodology to Atlantic Ocean communities uncovered a distance-decay effect which was not revealed by the traditional methods; applying it to communities discovered on Hog Island points toward mechanisms of thicket establishment.
Although the ocean data set spans a much larger, continental scale, the sequence data generated from the nutrient-poor soil on Hog Island, a barrier island off the Virginia coast, allow for a better characterization of the processes affecting these communities on a much smaller scale. Finally, using 16S data from the Human Vaginal Microbiome Project, generated here at VCU under the umbrella of the overall NIH HMP initiative, I give examples of the quality control, analysis and visualization pipeline that I developed to support the efforts of this project. In conclusion, my analyses of the metagenomic sequence data from bacterial communities sampled from different environments demonstrate that the proper identification of the biological processes influencing these communities requires the development and implementation of new statistical and computational methodologies that take advantage of the extensive amount of information generated in next-generation, high-throughput sequencing projects.


© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

May 2013

Included in

Life Sciences Commons