DOI

https://doi.org/10.25772/4NAD-TG96

Defense Date

2021

Document Type

Thesis

Degree Name

Doctor of Philosophy

Department

Computer Science

First Advisor

Lukasz Kurgan

Abstract

COMPUTATIONAL ANALYSIS AND PREDICTION OF INTRINSIC DISORDER AND INTRINSIC DISORDER FUNCTIONS IN PROTEINS

By Akila Imesha Katuwawala

A dissertation submitted in partial fulfillment of the requirements for the degree of Engineering, Doctor of Philosophy with a concentration in Computer Science at Virginia Commonwealth University.

Virginia Commonwealth University, 2021

Director: Lukasz Kurgan, Professor, Department of Computer Science

Proteins, as a fundamental class of biomolecules, have been studied from various perspectives over the past two centuries. The traditional notion is that proteins require fixed and stable three-dimensional structures to carry out biological functions. However, there is mounting evidence regarding a “special” class of proteins, named intrinsically disordered proteins, which do not have fixed three-dimensional structures though they perform a number of important biological functions. Computational approaches have been a vital component of the study of these intrinsically disordered proteins over the past few decades. Prediction of intrinsic disorder and of the functions of intrinsic disorder from protein sequences is one such important computational approach that has recently gained attention, particularly with the advent of modern machine learning techniques. This dissertation runs along two basic themes, namely, prediction of the intrinsic disorder and prediction of the intrinsic disorder functions. The work related to the prediction of intrinsic disorder covers a novel approach to evaluating the predictive performance of the current computational disorder predictors. This approach evaluates the intrinsic disorder predictors at the individual protein level, in contrast to traditional studies that evaluate them over large protein datasets. We address several interesting aspects concerning the differences in the protein-level vs. dataset-level predictive quality, complementarity and predictive performance of the current predictors. Based on the findings from this assessment we have conceptualized, developed, tested and deployed an innovative platform called DISOselect that recommends the most suitable computational disorder predictors for a given protein, with the underlying goal of maximizing the predictive performance.
DISOselect provides advice on whether a given disorder predictor would provide an accurate prediction for a given protein of the user’s interest, and recommends the most suitable disorder predictor together with an estimate of its expected predictive quality. The second theme, prediction of the intrinsic disorder functions, includes a first-of-its-kind evaluation of the current computational disorder predictors on two functional sub-classes of the intrinsically disordered proteins. This study introduces several novel evaluation strategies to assess the predictive performance of disorder prediction methods and focuses on the evaluation for disorder functions associated with interactions with partner molecules. Results of this analysis motivated us to conceptualize, design, test and deploy a new and accurate machine learning-based predictor of the disordered lipid-binding residues, DisoLipPred. We empirically show that the strong predictive performance of DisoLipPred stems from several innovative design features and that its predictions complement results produced by current disorder predictors, disorder function predictors and predictors of transmembrane regions. We deploy DisoLipPred as a convenient webserver and discuss its predictions on the yeast proteome.

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

5-13-2021
