Boosting algorithms for predicting end-point temperature in BOF steelmaking using big industrial datasets

Jian-bo Zhang, Maryam Khaksar Ghalati, Jun Fu, Xiao-an Yang, G. M. A. M. El-Fallah and Hongbiao Dong

Abstract

The application of machine learning was investigated for predicting end-point temperature in the basic oxygen furnace steelmaking process, addressing gaps in the field, particularly the limited scale of previous datasets and the underutilization of boosting algorithms. Using a substantial dataset of over 20,000 heats, significantly larger than those in previous studies, a comprehensive evaluation of five advanced machine learning models was conducted. These comprise four ensemble learning algorithms, namely three boosting algorithms (XGBoost, LightGBM, and CatBoost) and one bagging algorithm (random forest), as well as a neural network model, the multilayer perceptron. Our comparative analysis reveals that Bayesian-optimized boosting models demonstrate exceptional robustness and accuracy, achieving the highest R-squared values, the lowest root mean square error and mean absolute error, and the best hit ratio. CatBoost exhibited superior performance, with its test R-squared improving by 4.2% over that of the random forest and by 0.8% over that of the multilayer perceptron. This highlights the efficacy of boosting algorithms in modelling complex industrial processes. Additionally, our investigation into the impact of varying dataset sizes, ranging from 500 to 20,000 heats, on model accuracy underscores the importance of leveraging larger-scale datasets to improve the accuracy and stability of predictive models.
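The evaluation protocol described above (fitting a boosting regressor and a bagging regressor, then comparing R-squared, root mean square error, mean absolute error, and hit ratio on held-out heats) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it uses scikit-learn's `GradientBoostingRegressor` and `RandomForestRegressor` as stand-ins for the paper's CatBoost/XGBoost/LightGBM and random forest models, synthetic data in place of the industrial heats, and an assumed hit-ratio tolerance of ±15 °C (the paper's exact tolerance is not stated in the abstract).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the BOF heat data: six hypothetical process
# features; target is an end-point temperature in degrees Celsius.
rng = np.random.default_rng(0)
n_heats = 2000
X = rng.normal(size=(n_heats, 6))
y = 1650 + 20 * X[:, 0] - 15 * X[:, 1] + 5 * X[:, 2] \
    + rng.normal(scale=8.0, size=n_heats)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

def evaluate(model, tolerance=15.0):
    """Fit a regressor and return (R2, RMSE, MAE, hit ratio) on the test set."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    mae = mean_absolute_error(y_test, pred)
    r2 = r2_score(y_test, pred)
    # Hit ratio: fraction of heats predicted within +/- tolerance degrees C.
    hit = float(np.mean(np.abs(pred - y_test) <= tolerance))
    return r2, rmse, mae, hit

boost_scores = evaluate(GradientBoostingRegressor(random_state=0))
forest_scores = evaluate(RandomForestRegressor(random_state=0))
for name, s in [("boosting", boost_scores), ("forest", forest_scores)]:
    print(f"{name}: R2={s[0]:.3f} RMSE={s[1]:.2f} "
          f"MAE={s[2]:.2f} hit={s[3]:.2%}")
```

In the paper's setting, the boosting models were additionally tuned with Bayesian optimization; in a sketch like this that step would typically be handled by a library such as Optuna or scikit-optimize searching over hyperparameters like learning rate and tree depth.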