Comparative Analysis of Gradient Boosting, XGBoost, and KNN on Predicting Student Graduation in Imbalance and Balance Data Schemes
DOI:
https://doi.org/10.38035/dijemss.v6i6.5362
Keywords:
Graduation Prediction, Data Imbalance, Gradient Boosting, XGBoost
Abstract
This research compares the performance of three machine learning algorithms, Gradient Boosting, XGBoost, and K-Nearest Neighbors (KNN), in predicting student graduation, using a quantitative approach and a comparative experimental method. The analysis follows the CRISP-DM stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The dataset consists of approximately 1,251 records of students from the 2018–2020 cohorts and has an imbalanced class distribution, namely 73.78% graduated on time and 6.22% did not graduate on time. The variables analyzed include academic and non-academic data, such as total credits, GPA per semester, number of repeated courses, and number of leaves of absence. To address the class imbalance, the SMOTE-TOMEK balancing technique was applied. The results indicate that XGBoost improved after balancing, with accuracy, precision, recall, and F1-score all reaching 1.0000. Gradient Boosting performed consistently, with a score of 0.9992 both before and after balancing. KNN's accuracy increased from 0.9928 to 0.9968 after balancing. The confusion matrices also show a significant improvement in classification. The SMOTE-TOMEK technique therefore proved effective in improving the performance of classification models on imbalanced data, and XGBoost is recommended as the main algorithm for predicting student graduation.
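The four metrics reported in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of that calculation, not the authors' code. The class name `ConfusionMatrix` and the counts used are hypothetical and are not taken from the paper. The minority class ("did not graduate on time") is assumed to be the positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; the positive class is the minority class."""
    tp: int  # minority-class records predicted as minority
    fp: int  # majority-class records predicted as minority
    fn: int  # minority-class records predicted as majority
    tn: int  # majority-class records predicted as majority

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, NOT from the paper: 1,000 test records,
# 60 of which belong to the minority class.
always_majority = ConfusionMatrix(tp=0, fp=0, fn=60, tn=940)
useful_model = ConfusionMatrix(tp=54, fp=10, fn=6, tn=930)

for name, cm in [("always majority", always_majority),
                 ("useful model", useful_model)]:
    print(f"{name}: acc={cm.accuracy():.4f} prec={cm.precision():.4f} "
          f"rec={cm.recall():.4f} f1={cm.f1():.4f}")
```

In this hypothetical case, a model that always predicts the majority class still reaches an accuracy of 0.94 while its recall and F1-score on the minority class are 0. This is why precision, recall, and F1-score are usually reported alongside accuracy when the classes are imbalanced.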
License
Copyright (c) 2025 Muhammad Rizki Hubu, Irfan Pratama

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish their manuscripts in this journal agree to the following conditions:
- The copyright on each article belongs to the author(s).
- The author acknowledges that the Dinasti International Journal of Education Management and Social Science (DIJEMSS) has the right to be the first to publish with a Creative Commons Attribution 4.0 International license (CC BY 4.0).
- Authors can submit articles separately, arrange for the non-exclusive distribution of manuscripts that have been published in this journal into other versions (e.g., sent to the author's institutional repository, publication into books, etc.), by acknowledging that the manuscript has been published for the first time in the Dinasti International Journal of Education Management and Social Science (DIJEMSS).