Author ORCID Identifier

https://orcid.org/0009-0001-7960-512X

Defense Date

2025

Document Type

Thesis

Degree Name

Master of Science

Department

Bioinformatics

First Advisor

Dr. Bret Boyd

Abstract

Part 1

I examined a potential source of variation in phylogenetic inference: whether the method used to handle single-nucleotide polymorphisms (SNPs) during reference-guided genome assembly of pigeon and dove species affects phylogenetic reconstruction. Specifically, I created a custom consensus base-calling tool that handles allelic variation in one of two ways, either masking heterozygous sites with the ambiguous character “N” or making a pseudo-random choice between the supported base variants. Through reference-guided assembly of 108 dove species, I generated two alignment datasets, one per consensus-calling method, built two trees from each dataset using supermatrix and supertree approaches, and compared the trees for differences in topology and in pairwise distances between species. While all four resulting trees were well supported and largely in agreement with each other, some small differences in topology and branch lengths were observed. The differing branch lengths appeared to be influenced by a higher count of polymorphic sites in certain species, which were therefore affected more noticeably by the SNP-handling method. However, differences between datasets in pairwise patristic distances to the reference were small overall, and the longer branches in the trees were not accompanied by a change in species relationships. The topological variations that were present did not consistently track either the SNP-handling method or the tree-building approach, and more likely reflect stochastic inconsistency at points of weak phylogenetic signal. Overall, the method of handling polymorphic sites within the samples did not appear to change the inferred relationships between species.
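The two SNP-handling strategies described above can be illustrated with a minimal Python sketch. This is not the thesis's tool, whose implementation is not shown in this record; the function name `call_consensus_base`, the `min_fraction` support threshold, and the mode labels are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def call_consensus_base(bases, mode="ambiguous", min_fraction=0.2, rng=None):
    """Call one consensus base from the read bases observed at a single site.

    Hypothetical illustration: a base counts as "supported" when it makes up
    at least `min_fraction` of the valid reads at the site (an assumed rule).
    """
    if mode not in ("ambiguous", "random"):
        raise ValueError("mode must be 'ambiguous' or 'random'")

    # Count only unambiguous nucleotide calls, ignoring case.
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    total = sum(counts.values())
    if total == 0:
        return "N"  # no usable reads at this site

    supported = sorted(b for b, n in counts.items() if n / total >= min_fraction)
    if not supported:
        return "N"  # no base reaches the support threshold
    if len(supported) == 1:
        return supported[0]  # homozygous site: nothing to resolve

    if mode == "ambiguous":
        return "N"  # strategy 1: mask the heterozygous site
    # Strategy 2: pseudo-random choice between the supported variants.
    return (rng or random).choice(supported)
```

For example, `call_consensus_base("AAGG")` masks the site as `"N"`, while `call_consensus_base("AAGG", mode="random")` returns either `"A"` or `"G"`. Passing a seeded `random.Random` instance as `rng` makes the second strategy reproducible across runs.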

Part 2

In the second part of my thesis, I examined the effectiveness of reference-guided genome assembly for heritable bacteria associated with dove-parasitizing lice (Columbicola). Parasitic lice, like many other insects, host heritable bacteria (endosymbionts) that provide them with essential metabolites. These bacteria have genomes that are substantially smaller than those of their asymbiotic relatives, likely a product of genome reduction following the transition to vertical inheritance by lice. Commonly used de novo approaches to assembling endosymbiont genomes from whole-insect gDNA can be time-consuming, and more efficient reference-guided alternatives have primarily been applied to endosymbionts closely related to free-living reference taxa. In this study, I developed a pipeline for reference-guided assembly of endosymbiont genomes from whole-insect gDNA, designed to run in a high-performance computing environment. The pipeline was first developed and tested by assembling Sodalis sp. endosymbiont genomes from Columbicola passerinae 1, using the genome of the closely related, free-living Sodalis praecaptivus as a reference. Comparison of the resulting assembly to a prior complete Sodalis assembly, which had been generated from the same read library, showed the pipeline’s behavior to be consistent overall. To test the pipeline’s performance with more distantly related taxa, I attempted to use it to assemble an Enterobacter sp. endosymbiont from Columbicola macrourae 1, using a sample of multiple reference genomes from the family Enterobacteriaceae. The pipeline on its own failed to recover a continuous Enterobacter assembly, lacking alignments outside of the most conserved protein-coding regions, so a combined de novo approach was required to complete the remainder of the assembly. The final assembly appeared largely complete and yielded a full suite of tRNAs, in contrast to an assembly from a prior study that used the same read library and lacked all 20 canonical tRNAs. While the pipeline on its own did not succeed in completely assembling an Enterobacter sp. genome, it does appear to recover most, if not all, functionally intact genes. Future research could validate and tailor the pipeline for this purpose, which could make it a more efficient alternative to whole de novo genome assembly when the primary goal is to obtain an endosymbiont’s genes.

Rights

© Mariam Topchyan

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

4-24-2025
