Comparative Analysis of Gradient Boosting, XGBoost, and KNN on Predicting Student Graduation in Imbalance and Balance Data Schemes

Authors

  • Muhammad Rizki Hubu, Universitas Mercu Buana Yogyakarta, Yogyakarta, Indonesia
  • Irfan Pratama, Universitas Mercu Buana Yogyakarta, Yogyakarta, Indonesia

DOI:

https://doi.org/10.38035/dijemss.v6i6.5362

Keywords:

Graduation Prediction, Data Imbalance, Gradient Boosting, XGBoost

Abstract

This study compares the performance of three machine learning algorithms, Gradient Boosting, XGBoost, and K-Nearest Neighbors (KNN), in predicting student graduation, using a quantitative approach and a comparative experimental method. The analysis follows the CRISP-DM stages: business understanding, data understanding, data preparation, modeling, evaluation, and implementation. The dataset consists of approximately 1,251 records of students from the 2018–2020 cohorts and is imbalanced: 73.78% graduated on time and 26.22% did not. The variables cover academic and non-academic data, such as total credits, GPA per semester, number of repeated courses, and number of leaves of absence. To address the imbalance, the SMOTE-Tomek balancing technique was applied. XGBoost improved after balancing, with accuracy, precision, recall, and F1-score all reaching 1.0000. Gradient Boosting performed consistently, with a score of 0.9992 both before and after balancing. KNN's accuracy increased from 0.9928 to 0.9968 after balancing. The confusion matrices likewise show a significant improvement in classification after balancing. The SMOTE-Tomek technique therefore proved effective in improving the performance of classification models on this imbalanced data, and XGBoost is recommended as the main algorithm for predicting student graduation.
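The four reported metrics can behave very differently on imbalanced data, which is why the abstract reports precision, recall, and F1-score alongside accuracy. The following is a minimal illustrative sketch, not the authors' code. It assumes that about 923 of the 1,251 records (73.78%) belong to the on-time class, treats the remaining records as the positive "late" class, and applies the metrics to the whole dataset rather than to a held-out test split, whose size the abstract does not state. The names `binary_metrics`, `baseline`, and `one_error` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> BinaryMetrics:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Undefined ratios (zero denominator) are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Assumed class counts derived from the abstract's figures (illustrative only).
N_TOTAL = 1251
N_ON_TIME = 923               # about 73.78% of 1,251
N_LATE = N_TOTAL - N_ON_TIME  # remaining minority-class records

# Trivial baseline: always predict "on time", so no late student is detected.
baseline = binary_metrics(tp=0, fp=0, fn=N_LATE, tn=N_ON_TIME)

# Hypothetical classifier that misses exactly one late student.
one_error = binary_metrics(tp=N_LATE - 1, fp=0, fn=1, tn=N_ON_TIME)

for name, m in [("baseline", baseline), ("one_error", one_error)]:
    print(f"{name}: acc={m.accuracy:.4f} prec={m.precision:.4f} "
          f"rec={m.recall:.4f} f1={m.f1:.4f}")
```

Under these assumptions the always-on-time baseline reaches an accuracy of about 0.7378 while its recall and F1-score on the late class are 0, so accuracy alone would overstate its usefulness. In practice, SMOTE-Tomek resampling of the kind described above is available as `SMOTETomek` in the imbalanced-learn library; whether the authors used that implementation is not stated here.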

Published

2025-08-31

How to Cite

Hubu, M. R., & Pratama, I. (2025). Comparative Analysis of Gradient Boosting, XGBoost, and KNN on Predicting Student Graduation in Imbalance and Balance Data Schemes. Dinasti International Journal of Education Management and Social Science, 6(6), 5016–5026. https://doi.org/10.38035/dijemss.v6i6.5362