Our objective was to develop a model that predicts patients' length of stay using data from MCV. We analyzed a dataset of over 130,000 patients described by 66 features, covering clinical characteristics (e.g., diagnosis), facility characteristics (e.g., bed type), and socioeconomic characteristics (e.g., insurance type). The study focused on patients who stayed in the hospital. To cope with data imperfections such as missing values, we applied data cleaning methods. Drawing on domain knowledge acquired during the project, we selected 9 features for our predictive models: admit source, primary insurance, discharge disposition, admit unit, iso result, icu order, stepdown order, general care order, and age. We then applied regression algorithms to predict length of stay under two views: one using the complete dataset, and one in which patients were split by diagnosis into the ten most common diagnoses, each modeled independently. This division was motivated by the high variance within the data. The resulting machine learning models were embedded in a web application built with Angular. The app lets the user pick the disease being modeled, the model(s) to use, and the values of the input variables; it then computes the prediction and displays a visualization of the weights.
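The two modeling views described in the abstract can be illustrated with a minimal sketch. The abstract does not name the regression algorithm or the data schema, so everything here is an assumption made for illustration: ordinary least squares on a single feature (age) stands in for the actual models, the field names `diagnosis`, `age`, and `los_days` are hypothetical, and the toy records are made up rather than drawn from the study data.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_views(records, top_k=10):
    """Fit one pooled model and one model per diagnosis (top_k most frequent)."""
    # View 1: a single model over the complete dataset.
    pooled = fit_line([r["age"] for r in records],
                      [r["los_days"] for r in records])

    # View 2: split patients by diagnosis and model each group independently.
    groups = defaultdict(list)
    for r in records:
        groups[r["diagnosis"]].append(r)
    most_common = sorted(groups, key=lambda d: len(groups[d]), reverse=True)[:top_k]
    per_diagnosis = {
        d: fit_line([r["age"] for r in groups[d]],
                    [r["los_days"] for r in groups[d]])
        for d in most_common
    }
    return pooled, per_diagnosis


# Made-up toy records (hypothetical schema, not study data).
records = [
    {"diagnosis": "A", "age": 40, "los_days": 2.0},
    {"diagnosis": "A", "age": 60, "los_days": 3.0},
    {"diagnosis": "A", "age": 80, "los_days": 4.0},
    {"diagnosis": "B", "age": 40, "los_days": 6.0},
    {"diagnosis": "B", "age": 60, "los_days": 9.0},
    {"diagnosis": "B", "age": 80, "los_days": 12.0},
]

pooled, per_diagnosis = fit_views(records)
for name, (slope, intercept) in per_diagnosis.items():
    print(f"diagnosis {name}: slope={slope:.3f}, intercept={intercept:.3f}")
print(f"pooled: slope={pooled[0]:.3f}, intercept={pooled[1]:.3f}")
```

In this toy setup the two diagnosis groups have different slopes, and the pooled fit averages them, which is one way high variance across diagnoses could make per-diagnosis models more informative than a single model.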

Publication Date



Machine Learning, Length of Stay, Clinical Decision Support, Medical


Computer Engineering | Engineering

Faculty Advisor/Mentor

Bartosz Krawczyk

Faculty Advisor/Mentor

Vimal Mishra

VCU Capstone Design Expo Posters


© The Author(s)

Date of Submission

May 2018

Internal Medicine