Degree Name

Doctor of Philosophy



First Advisor

Rosalyn Hobson


Researchers have long sought to understand the reasons behind the low enrollment and retention rates of Underrepresented Minority (URM) students (African Americans, Hispanic Americans, and Native Americans) in science, technology, engineering, and mathematics (STEM) disciplines. Numerous studies have used traditional statistical methods to identify factors that affect and predict student retention. More recently, researchers have turned to data mining techniques to model student retention in higher education [1]. This research used neural networks with a feed-forward back-propagation architecture to model performance, with the aim of understanding the factors related to first-year academic success and retention of URM students at Virginia Commonwealth University (VCU). The student retention model was based on fall-to-fall retention in STEM majors, and the overall freshman-year GPA was used to model student academic success. Each model was built in two ways: first using all available student inputs, and second using an optimized subset of student inputs. Genetic algorithms were used to form this optimized subset from the most relevant features that students bring with them, such as demographic attributes, high school rank, and SAT scores. As a further step toward understanding the retention of URM groups in STEM fields, a series of focus groups was conducted with participants in an intervention program at VCU. The focus groups were designed to elicit the factors that participants felt most affected their retention and to learn more about their first-year experiences, both academic and social. Results of the genetic algorithm and the focus groups were incorporated into a hybrid model built from the most relevant student inputs.
The developed hybrid model is shown to be a valuable tool for analyzing and predicting student academic success and retention. In particular, we have shown that the most relevant student inputs, identified from the student’s perspective, can be combined with quantitative methodologies to build a tool that people working in STEM retention and education can use and interpret effectively. Further, the hybrid model performed comparably to the model developed using the optimized set of inputs produced by the genetic algorithm. The GPA prediction hybrid model was tested to determine how well it would predict GPA for all students, majority students, and URM students. The root mean squared error (RMSE) on a 4.0 scale was 0.45 for all students, 0.47 for majority students, and 0.45 for URM students. The hybrid retention model correctly predicted retention for 74% of all students, 79% of majority students, and 60% of URM students. The hybrid model’s accuracy was 3% higher than that of the model that used the optimized set of inputs.
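For readers unfamiliar with the two evaluation measures reported above, the following is a minimal sketch of how RMSE and classification accuracy might be computed for all students and for each subgroup. It is not the dissertation's code: the function names (`rmse`, `accuracy`, `by_group`) and the four toy records are assumptions made up purely for illustration, and the values are not taken from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def accuracy(actual, predicted):
    """Fraction of records whose predicted label matches the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def by_group(metric, groups, actual, predicted):
    """Apply a metric to all records and to each subgroup separately."""
    results = {"all": metric(actual, predicted)}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        results[g] = metric([actual[i] for i in idx],
                            [predicted[i] for i in idx])
    return results

# Hypothetical toy records (illustrative only, not data from the study).
groups = ["majority", "majority", "URM", "URM"]
gpa_actual = [3.0, 2.0, 3.5, 2.5]          # freshman-year GPA on a 4.0 scale
gpa_predicted = [3.5, 2.0, 3.5, 3.0]       # model output for the same students
retained_actual = [1, 0, 1, 1]             # 1 = retained fall to fall
retained_predicted = [1, 0, 0, 1]

gpa_rmse = by_group(rmse, groups, gpa_actual, gpa_predicted)
retention_acc = by_group(accuracy, groups, retained_actual, retained_predicted)
print(gpa_rmse)
print(retention_acc)
```

Reporting each measure per subgroup, as the abstract does, makes it visible when a model that looks adequate overall performs noticeably worse for one group.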


© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

August 2011

Included in

Engineering Commons