Defense Date


Document Type


Degree Name

Doctor of Philosophy


Pharmaceutical Sciences

First Advisor

Dr. Pramit Nadpara


Objective: Our objectives were to characterize the population with both cancer and cardiovascular disease (CVD) relative to cancer patients without CVD, and to build a predictive model using machine learning (ML) algorithms to predict the risk of CVD in cancer patients. We also sought to evaluate characteristics associated with cardiotoxic adverse events of breast cancer therapies and to develop a multiple criteria decision analysis (MCDA) model for benefit-risk assessment of breast cancer therapy regimens.

Methods: We used the Medical Expenditure Panel Survey (MEPS) and FDA Adverse Event Reporting System (FAERS) 2005-2015 files, along with literature evidence. We trained predictive models on the MEPS data using ML algorithms (random forest (RF), gradient boosting, and deep learning) and compared them with standard regression models. Separate predictive models were built for chronic and acute CVD. We used multinomial logistic models to characterize the population with both cancer and CVD and the population with cancer therapy-associated cardiotoxic adverse events. FAERS data and literature evidence were also used to build the MCDA model, which ranked the breast cancer therapy regimens by the benefits and risks of each treatment alternative.

Results: Our study sample consisted of 44,217 cancer patients identified in the MEPS 2005-2015 files, of whom 12,339 (28.7%) were also diagnosed with CVD. Age, marital status, education, and employment status were the sociodemographic characteristics that differed significantly between cancer patients with and without CVD. Most of the ML models for chronic CVD (RF c-statistic: 0.9872; gradient boosting c-statistic: 0.7608; deep learning c-statistic: 0.7662) and acute CVD (RF c-statistic: 0.9738; gradient boosting c-statistic: 0.7853; deep learning c-statistic: 0.8267) were more accurate than the standard regression models for chronic CVD (standard regression model c-statistic: 0.7641; GLM net model c-statistic: 0.7349) and acute CVD (standard regression model c-statistic: 0.7534; GLM net model c-statistic: 0.7853). We used the most accurate model, RF, to build a web-based application that predicts CVD risk. In the FAERS dataset, we identified 35,630,544 breast cancer patients. Breast cancer patients receiving targeted therapies were more likely to be diagnosed with CVD than those receiving conventional therapies (OR = 1.213; 95% CI: 1.180, 1.247). In the MCDA, breast cancer therapy regimen 3, consisting of trastuzumab, cyclophosphamide/carboplatin, and a taxane (paclitaxel/docetaxel), was the most preferred therapy alternative given the benefits and risks associated with each alternative.

Conclusion: Our study evaluated newer analytical techniques, namely ML algorithms and MCDA, for predicting CVD risk in cancer patients and for assessing the benefits and risks of breast cancer therapy regimens. Our findings suggest that ML algorithms predicted CVD risk in cancer patients more accurately than standard regression models. In addition, our MCDA model suggested that the breast cancer therapy regimen with trastuzumab, cyclophosphamide/carboplatin, and a taxane was the most preferred alternative when survival benefits and adverse event risks were considered.
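
For readers unfamiliar with the c-statistic used above to compare models, the following is a minimal illustrative sketch and not the study's code. It computes the c-statistic as the share of (event, non-event) pairs in which the patient with the event received the higher predicted risk, counting ties as half. The function name `c_statistic` and the sample labels and scores are hypothetical and bear no relation to the MEPS data.

```python
def c_statistic(labels, scores):
    """Concordance statistic for a binary outcome.

    labels: 1 for patients with the event (e.g. CVD), 0 otherwise.
    scores: predicted risk for each patient, in the same order.
    """
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event patient ranked higher: concordant pair
            elif e == n:
                concordant += 0.5   # tied predictions count as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data, for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
print(c_statistic(labels, scores))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect discrimination, which is the scale on which the model c-statistics above should be read.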


© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission