Using machine learning methods to identify significant variables for the prediction of first-year Informatics Engineering students dropout

Abstract

Student dropout is a phenomenon that affects all higher education institutions in Chile, with costs for people, institutions and the State; the reported first-year retention rate across all Chilean universities was 75%. Despite extensive research and the implementation of various models to identify dropout causes and risk groups, few such studies have been carried out in the Chilean higher education context. Using machine learning methods, our work attempts to identify the variables with the highest predictive value for student dropout by the end of the first year of study in a 6-year Informatics Engineering programme, which reported a relatively high dropout rate of 21.9% in 2018. To that end, we use data from 4 cohorts of students (2012-2016) enrolled in the programme to feed a random forest feature selection process. We then build a decision tree from the relevant features identified and test it on data from the 2017-2018 student cohorts. Although the decision tree is overfitted (97.21% training accuracy against 81.01% test accuracy), the process sheds light on the nature of the variables that determine whether a student remains at the university at the end of their first year of study. Six of the seven identified factors are academic, and the remaining one is socio-cultural. © 2020 IEEE.
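The abstract does not describe the implementation, so the following is only a minimal sketch of the two-stage pipeline it outlines, written with scikit-learn under several assumptions: features are ranked by the random forest's impurity-based `feature_importances_`, the top `k` are kept (here `k = 7`, mirroring the number of factors reported, although the actual cutoff rule is unknown), and an unpruned decision tree is then trained on earlier cohorts and tested on later ones. The helper functions `select_features` and `train_and_evaluate`, the variable names `var_0` ... `var_9` and the data are all hypothetical; the data are synthetic, not the study's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def select_features(X, y, feature_names, top_k, seed=0):
    """Rank features by random-forest impurity importance; keep the top_k (assumed rule)."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    cols = order[:top_k]
    return [feature_names[i] for i in cols], cols


def train_and_evaluate(X_train, y_train, X_test, y_test, cols, seed=0):
    """Fit an unpruned decision tree on the selected columns; report train/test accuracy."""
    tree = DecisionTreeClassifier(random_state=seed)
    tree.fit(X_train[:, cols], y_train)
    train_acc = accuracy_score(y_train, tree.predict(X_train[:, cols]))
    test_acc = accuracy_score(y_test, tree.predict(X_test[:, cols]))
    return tree, train_acc, test_acc


# Synthetic stand-in data: 10 hypothetical variables, one row per student.
rng = np.random.default_rng(0)
n = 1000
feature_names = [f"var_{i}" for i in range(10)]
cohort = rng.integers(2012, 2019, size=n)  # entry years 2012..2018
X = rng.normal(size=(n, 10))
# Synthetic rule: dropout (label 1) driven mainly by var_0 and var_1, plus noise.
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 1.2
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Temporal split by cohort: earlier cohorts for training, later ones for testing.
train = cohort <= 2016
test = cohort >= 2017

names, cols = select_features(X[train], y[train], feature_names, top_k=7)
tree, train_acc, test_acc = train_and_evaluate(X[train], y[train], X[test], y[test], cols)
print("selected:", names)
print(f"train accuracy {train_acc:.2%}, test accuracy {test_acc:.2%}")
```

Because the tree is grown until its leaves are pure, it fits the training cohorts almost perfectly and scores noticeably lower on the held-out cohorts, which is the kind of train/test gap the abstract reports. Constraining the tree, for example through the `max_depth` or `ccp_alpha` parameters of `DecisionTreeClassifier`, is one common way to narrow that gap.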

Publication
Proceedings - International Conference of the Chilean Computer Science Society, SCCC
Felipe Bello
Assistant Professor
José Luis Jara
Associate Professor