Improving the prediction of global solar radiation using interpretable boosting algorithms coupled SHAP and LIME analysis: a comparative study

Merabet, K.; Daif, N.; Di Nunno, F.; Granata, F.; Difi, S.; Kisi, O.; Heddam, S.; Kim, S.; Zounemat-Kermani, M.

doi:10.1007/s00704-025-05507-x

Solar radiation prediction plays a vital role in many areas of hydrological and water resources planning and management. At the same time, the need for interpretable and explainable machine learning (ML) models has motivated the use of various interpretability methods. For these reasons, the present study develops robust ML models based on boosting algorithms and interprets them using the SHapley Additive exPlanations (SHAP) and local interpretable model-agnostic explanations (LIME) algorithms. Six boosting algorithms were used: adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), light gradient boosting machine (LightGBM), natural gradient boosting (NGBoost), and histogram gradient boosting (HistGBRT). All models were developed using data collected at the USGS 02187010 station, comprising various weather variables. The models were evaluated using the root mean squared error (RMSE), the mean absolute error (MAE), the correlation coefficient (R), and the Nash–Sutcliffe efficiency (NSE) under two scenarios: (i) scenario 1, using only weather variables, and (ii) scenario 2, using weather variables combined with periodicity numbers, i.e., day, month, and year number. The results indicate that the boosting models using periodicity outperform the models without periodicity. For scenario 1, the best accuracy was obtained using CatBoost1, with R, NSE, RMSE, and MAE values of 0.835, 0.697, 44.407 W/m², and 34.721 W/m², respectively. Under scenario 2, model performance improved markedly, with R, NSE, RMSE, and MAE reaching 0.925, 0.856, 30.617 W/m², and 22.925 W/m², respectively, obtained using CatBoost1 and HistGBRT1.
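The four evaluation metrics named above can be illustrated with a minimal sketch. It uses the standard textbook definitions, which are assumed here and not taken from the paper: RMSE and MAE are computed from the prediction errors, R is the Pearson correlation coefficient, and NSE is one minus the ratio of the squared-error sum to the variance sum of the observations. The function name `evaluate` and the sample values are hypothetical and are not data from the study.

```python
import math


def evaluate(observed, predicted):
    """Return RMSE, MAE, Pearson R, and NSE for paired observed/predicted values.

    Illustrative helper using the standard metric definitions (an assumption;
    the paper's exact formulas are not given in the abstract).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    squared_error_sum = sum(e * e for e in errors)

    # Error-based metrics, in the same units as the target (e.g. W/m²).
    rmse = math.sqrt(squared_error_sum / n)
    mae = sum(abs(e) for e in errors) / n

    # Deviations from the means, needed for both R and NSE.
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)

    # Pearson correlation coefficient (dimensionless, between -1 and 1).
    r = cov / math.sqrt(var_obs * var_pred)

    # Nash–Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean.
    nse = 1.0 - squared_error_sum / var_obs

    return {"RMSE": rmse, "MAE": mae, "R": r, "NSE": nse}


# Hypothetical values for illustration only, not data from the study.
sample_observed = [100.0, 200.0, 300.0, 400.0]
sample_predicted = [110.0, 190.0, 310.0, 390.0]
sample_scores = evaluate(sample_observed, sample_predicted)
```

Because R measures only linear association while NSE also penalizes bias, a model can score a high R and a noticeably lower NSE, which is consistent with the gap between the two values reported for both scenarios.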