Defense Date


Document Type


Degree Name

Doctor of Philosophy


Department

Computer Science

First Advisor

Lukasz Kurgan


Drugs exert their (therapeutic) effects via molecular-level interactions with proteins and other biomolecules. Computational prediction of drug-protein interactions plays a significant role in the effort to improve our currently limited knowledge of these interactions. Putative drug-protein interactions could facilitate the discovery of novel applications of drugs, assist in cataloging their targets, and help to explain the details of the medicinal efficacy and side effects of drugs. We survey current studies on the computational prediction of drug-protein interactions and categorize them into protein structure-based and similarity-based methods. We evaluate three representative structure-based predictors and develop the Protein-Drug Interaction Database (PDID), which includes the putative drug targets generated by these three methods for the entire structural human proteome. Because only a limited set of proteins has known structures, we also study similarity-based methods, which do not require this information. We review a comprehensive set of 35 high-impact similarity-based predictors and develop a novel, high-quality benchmark database. We group these predictors by the three types of similarities, and the combinations of them, that they use. We discuss and compare key architectural aspects of these methods, including their source databases, internal databases, and predictive models. Using our benchmark database, we perform a comparative empirical analysis of the predictive performance of seven types of representative predictors that use each type of similarity individually or in all possible combinations. We assess predictive quality at the database-wide drug-protein interaction level, and we are the first to also evaluate performance across individual drugs.
Our comprehensive analysis shows that predictors that use more similarity types outperform methods that employ fewer, and that the model combining all three types of similarities achieves an AUC of 0.93. We offer a first-of-its-kind analysis of the sensitivity of predictive performance to intrinsic and extrinsic characteristics of the considered predictors. We find that predictive performance is sensitive to low levels of similarity between the sequences of the drug targets and to several extrinsic properties of the input drug structures, drug profiles, and drug targets.


© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission