Machine learning approaches for enhanced estimation of reference evapotranspiration (ETo): a comparative evaluation

This expanded evaluation framework provides a holistic assessment of model performance, capturing absolute error, bias, percentage-based error, and overall predictive agreement across different input feature scenarios.

To examine the temporal characteristics of the meteorological variables used in the modeling process, autocorrelation (ACF) and partial autocorrelation (PACF) analyses were performed. These analyses were applied to maximum temperature (Tx), minimum temperature (Tn), wind speed at 2 m (U2), solar radiation (Rs), maximum relative humidity (HRx), minimum relative humidity (HRn), and reference evapotranspiration (ETrs).
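As a minimal sketch of this step (assuming the daily records sit in a pandas DataFrame `df` whose column names match the variable abbreviations above, which are assumptions), the ACF/PACF diagnostics can be produced with statsmodels:

```python
# ACF/PACF diagnostics for each daily meteorological series.
# Assumes a pandas DataFrame `df` with columns named after the study's variables.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

variables = ["Tx", "Tn", "U2", "Rs", "HRx", "HRn", "ETrs"]  # assumed column names

for var in variables:
    fig, (ax_acf, ax_pacf) = plt.subplots(1, 2, figsize=(10, 3))
    series = df[var].dropna()
    plot_acf(series, lags=60, ax=ax_acf, title=f"ACF: {var}")
    plot_pacf(series, lags=60, ax=ax_pacf, method="ywm", title=f"PACF: {var}")
    fig.tight_layout()
plt.show()
```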

Figures 3, 4, 5, 6, 7, 8 and 9 display the ACF and PACF plots for each variable. As shown in Figs. 3 and 4, both Tx and Tn exhibit strong seasonal behavior, with ACF patterns displaying a sinusoidal structure and slowly decaying correlations. Their PACF plots reveal significant correlations at lag 1 and lag 2, followed by a rapid decline, indicating that while these variables exhibit long-term seasonal dependencies, short-term autoregressive influence is relatively limited.

In contrast, U2 (Fig. 5) shows a sharp initial drop in the ACF and weak persistence beyond lag 10, while its PACF indicates significance only at the first few lags. This suggests limited temporal memory and minimal autocorrelation in the wind speed data.

Solar radiation (Fig. 6) also displays a pronounced seasonal pattern similar to temperature, with high autocorrelation at lags associated with the annual cycle. The PACF for Rs confirms significant short-term lags (1-3), supporting the relevance of recent values in predictive modeling.

Relative humidity (Figs. 7 and 8) demonstrates substantially weaker autocorrelation. Both HRx and HRn exhibit a rapid decline in the ACF and limited significant lags in the PACF, indicating near-random temporal behavior. This justifies treating these variables as independent across daily time steps in non-sequential machine learning models.

Reference evapotranspiration (ETrs) follows a seasonal autocorrelation pattern (Fig. 9), with high ACF values persisting over annual cycles, and significant PACF lags at positions 1-5. This suggests that ETrs, while seasonally dependent, can be effectively modeled using daily meteorological inputs without requiring explicit time-series modeling frameworks.

These findings collectively support the application of machine learning models in a static (non-temporal) framework, as most variables exhibit weak short-term temporal dependencies that can be implicitly captured by data-driven models.
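For context, the ETrs notation conventionally denotes the ASCE standardized reference evapotranspiration for a tall (alfalfa) reference surface; assuming the study follows that convention (the section does not state how ETrs was computed), the daily value is given by the standardized Penman-Monteith equation:

$$
ET_{sz} = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{C_n}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + C_d\,u_2)}
$$

where Δ is the slope of the saturation vapor pressure curve (kPa °C⁻¹), Rn the net radiation and G the soil heat flux density (MJ m⁻² day⁻¹), γ the psychrometric constant (kPa °C⁻¹), T the mean daily air temperature (°C), u2 the wind speed at 2 m (m s⁻¹), and es − ea the vapor pressure deficit (kPa). For daily time steps, Cn = 1600 and Cd = 0.38 give the tall reference (ETrs), while Cn = 900 and Cd = 0.34 recover the FAO-56 short (grass) reference.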

The correlation matrix presented in Fig. 10 provides insight into the linear relationships among the input variables and ETrs. Solar radiation (Rs) exhibited the highest positive correlation with ETrs (r = 0.83), followed by maximum temperature (r = 0.78) and minimum temperature (r = 0.72). These strong correlations highlight the critical role of radiative and thermal energy in driving evapotranspiration.

Wind speed (U2) showed a moderate positive correlation with ETrs (r = 0.57), reflecting its contribution to vapor transport and surface moisture removal. In contrast, maximum and minimum relative humidity (HRx and HRn) demonstrated negative correlations with ETrs (r = -0.40 and r = -0.62, respectively), indicating that higher humidity suppresses evapotranspiration by reducing the vapor pressure deficit.

Strong inter-variable correlations were also observed, particularly between Tx and Tn (r = 0.88), and between HRx and HRn (r = 0.54), suggesting potential multicollinearity. These relationships were considered during feature selection and scenario design to ensure model generalizability and reduce redundancy.

Overall, the correlation analysis guided the selection of the most informative features for predictive modeling. Rs, Tx, and Tn emerged as the most influential variables, while U2 and humidity metrics provided complementary information useful in capturing complex evapotranspiration dynamics.
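A minimal sketch of this correlation analysis, again assuming a DataFrame `df` with the same (assumed) column names:

```python
# Pearson correlation matrix among the six predictors and ETrs (assumed column names).
# `df` is assumed to be a pandas DataFrame of the daily records.
cols = ["Tx", "Tn", "U2", "Rs", "HRx", "HRn", "ETrs"]
corr = df[cols].corr(method="pearson")

# Rank predictors by their linear association with the target.
print(corr["ETrs"].drop("ETrs").sort_values(ascending=False))
```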

The performance of three machine learning models -- K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF) -- was evaluated under four input feature scenarios: (i) all features combined, (ii) three-feature combinations, (iii) two-feature combinations, and (iv) single-feature inputs. Model performance was assessed using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), the coefficient of determination (R²), Mean Bias Error (MBE), Nash-Sutcliffe Efficiency (NSE), and Kling-Gupta Efficiency (KGE). Summary results are presented in Figs. 11, 12, 13, 14, 15, 16 and 17.
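The paper's exact metric formulations are not reproduced here; a minimal sketch under the usual definitions (with R² taken as the squared Pearson correlation, an assumption) is:

```python
# Standard formulations of the six evaluation metrics (assumed definitions).
import numpy as np

def evaluate(obs, pred):
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mbe = np.mean(err)  # positive: overestimation; negative: underestimation
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    r = np.corrcoef(obs, pred)[0, 1]
    # Kling-Gupta Efficiency: correlation, variability-ratio, and bias-ratio terms.
    alpha = pred.std() / obs.std()
    beta = pred.mean() / obs.mean()
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return {"RMSE": rmse, "MAE": mae, "R2": r ** 2, "MBE": mbe, "NSE": nse, "KGE": kge}
```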

The effect of algorithmic parameter tuning was assessed for KNN, DT, and RF using ten discrete parameter levels (1-10). Evaluation metrics included RMSE, MAE, R², MBE, KGE, and NSE. Tukey's HSD test (α = 0.05) was applied for statistical grouping.
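The section does not state which hyperparameter the ten levels control for each algorithm, so the mapping in the following sketch is an assumption (KNN: n_neighbors; DT: max_depth; RF: n_estimators scaled by ten), as are the feature matrix `X` and target `y`:

```python
# Ten-level parameter sweep for the three regressors (hypothetical hyperparameter mapping).
# X (feature matrix) and y (ETrs target) are assumed to be predefined.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

models = {
    "KNN": lambda p: KNeighborsRegressor(n_neighbors=p),
    "DT": lambda p: DecisionTreeRegressor(max_depth=p, random_state=0),
    "RF": lambda p: RandomForestRegressor(n_estimators=10 * p, random_state=0),
}

for name, build in models.items():
    for level in range(1, 11):
        scores = cross_val_score(build(level), X, y, cv=5,
                                 scoring="neg_root_mean_squared_error")
        print(f"{name} level {level}: RMSE = {-scores.mean():.3f} +/- {scores.std():.3f}")
```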

As illustrated in Fig. 11, all algorithms exhibited decreasing RMSE with increasing parameter values. Decision Tree consistently produced the highest RMSE values, starting at 1.97 (group A) at parameter 1 and improving to 0.65 (group J) at parameter 10. KNN demonstrated a gradual reduction in RMSE, from 0.64 (group A) to 0.45 (group J). Random Forest yielded the lowest RMSE values across all parameter settings, narrowing from 0.47 to 0.43 and remaining in the best-performing statistical groups throughout.

Significant differences were observed between algorithms at each parameter value. Random Forest outperformed Decision Tree at all levels, and its statistical groupings indicate significantly better performance. Performance gains plateaued after parameter value 7 for all models.
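A sketch of the grouping step behind these comparisons, assuming a long-format results table (hypothetical column names) with one metric value per repeated run:

```python
# Tukey HSD pairwise comparison of per-run RMSE across model/parameter settings.
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# `results` is a hypothetical DataFrame with columns:
#   "rmse"    - metric value for one run
#   "setting" - label such as "RF_p7" (model x parameter level)
tukey = pairwise_tukeyhsd(endog=results["rmse"], groups=results["setting"], alpha=0.05)
print(tukey.summary())  # the letter groups (A-J) are derived from these pairwise tests
```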

Figure 12 shows that MAE trends mirrored those of RMSE. Decision Tree exhibited the highest MAE at parameter 1 (1.48, group A) and its lowest at parameter 10 (0.43, group J). KNN improved from 0.41 (group A) to 0.32 (group J), while Random Forest again yielded the lowest MAE, decreasing from 0.24 to 0.22 and remaining in the best statistical groups.

Random Forest consistently fell into the best-performing statistical groupings, indicating its superior accuracy in minimizing absolute prediction error.

Figure 13 shows that coefficient of determination (R²) values improved with increasing parameter settings. Random Forest achieved near-perfect R² values across all levels (0.96 to 0.99), consistently occupying the top statistical groups. KNN followed closely, reaching R² = 0.99 (group J) at parameter 10. In contrast, Decision Tree exhibited notably lower R² values, especially at lower parameter levels (0.52 at parameter 1, group A), improving to a maximum of 0.94 (group J) at parameter 10.

These results indicate better model fit and generalization by Random Forest and KNN, especially at higher parameter levels.

Mean Bias Error (MBE) analysis (Fig. 14) revealed systematic prediction biases. KNN and Random Forest predominantly exhibited negative MBE values, suggesting a tendency to underpredict, with MBE values ranging from -0.04 to -0.01. Conversely, Decision Tree showed a positive bias, with a peak MBE of 0.077 (group D) at parameter 4.

This indicates a structural tendency of Decision Trees toward overestimation, particularly at mid-range parameter values.

Figure 15 shows that model performance, as measured by KGE, improved consistently with increasing parameter values. At the lowest parameter setting (1), KNN and Random Forest performed moderately (KGE ≈ 0.85), while Decision Tree lagged significantly (KGE ≈ 0.65, group A). As the parameter value increased, all models improved steadily. By parameter value 6, KGE exceeded 0.98 across models, with Random Forest slightly ahead and statistical differences becoming negligible (group F and beyond). From parameter value 7 onward, all three models achieved near-perfect performance (KGE > 0.99) and shared the same statistical groupings (G-J), indicating no significant differences. Overall, Random Forest maintained the most consistent top-tier performance, while Decision Tree showed the greatest relative improvement as parameter values increased.

Figure 16 shows that model performance, measured by the Nash-Sutcliffe Efficiency (NSE), followed a trend similar to that observed for KGE. At parameter value 1, performance was lowest, particularly for the Decision Tree model, which achieved an NSE of approximately 0.55 (group A), while KNN and Random Forest scored around 0.85. As the parameter value increased, NSE improved for all models. From parameter value 6 onward, NSE exceeded 0.95 for all models, with Random Forest showing slightly higher stability. By parameter values 9 and 10, all models achieved NSE values above 0.98, and the statistical groupings converged to group J, indicating no significant performance differences. Overall, Random Forest again delivered the most consistently high performance across all parameter settings, while Decision Tree showed the largest improvement range.
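For reference, the standard forms of these two efficiency scores (assuming the study uses the usual definitions of Nash and Sutcliffe, 1970, and Gupta et al., 2009) are:

$$
NSE = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2},
\qquad
KGE = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}
$$

where O and P are observed and predicted values, r is the linear correlation between them, α = σP/σO is the variability ratio, and β = μP/μO is the bias ratio; both scores equal 1 for a perfect model.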

The performance of the three machine learning models -- K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF) -- was then evaluated in terms of RMSE, MAE, R², MBE, KGE, and NSE across 13 input feature scenarios. Error bars indicate the variability of each metric across repeated runs, and group letters denote statistically significant differences (p < 0.05) among the scenarios for each model.
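A sketch of how such feature scenarios can be enumerated and scored (the 13 combinations actually used are those listed in the figures; this example simply generates candidate subsets, with `df` and the column names assumed as before):

```python
# Enumerate feature subsets and score each scenario with a Random Forest (RMSE).
# `df` is assumed to hold the predictors and the ETrs target.
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

features = ["Tx", "Tn", "HRx", "HRn", "Rs", "U2"]

for k in (1, 2, 3, 6):  # single-, two-, three-feature and all-feature scenarios
    for subset in combinations(features, k):
        X_tr, X_te, y_tr, y_te = train_test_split(
            df[list(subset)], df["ETrs"], test_size=0.3, random_state=0)
        model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
        rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
        print("+".join(subset), f"RMSE = {rmse:.2f}")
```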

Model accuracy across these scenarios was first compared using Root Mean Square Error (RMSE; Fig. 17). Overall, Random Forest consistently outperformed the other models, especially in scenarios combining multiple features. For instance, in the most comprehensive scenario involving all variables (Tx + Tn + HRx + HRn + Rs + U2), both Random Forest and KNN achieved the lowest RMSE of approximately 0.5 (group O), indicating superior predictive accuracy.

In simpler scenarios such as Rs, KNN slightly outperformed the others with an RMSE around 1.6 (group A). However, as additional features were incorporated, Random Forest showed clear advantages. For example, in scenarios like Tx + Tn + U2 and HRx + HRn + Rs, Random Forest achieved RMSE values as low as 0.9 and 1.2, outperforming KNN and Decision Tree, which recorded higher errors.

Decision Tree generally exhibited the highest RMSE across scenarios, particularly in complex combinations involving Tx and U2, with RMSE often exceeding 1.5. This suggests it is less effective in capturing complex interactions between features compared to the other models.

Error bars representing the standard deviation across multiple runs reveal that Random Forest predictions were not only more accurate but also more stable, showing smaller variance than both KNN and Decision Tree. KNN demonstrated competitive performance in several scenarios but with slightly higher variability, while Decision Tree showed the greatest inconsistency.

Statistical significance indicated by group letters (ranging from A to O) confirms that the differences in RMSE among models are meaningful. Random Forest dominated the higher-ranked groups (M through O) in scenarios with many combined features, while KNN excelled in simpler cases. Decision Tree's grouping often overlapped with KNN but generally fell behind Random Forest.

These findings highlight Random Forest's robustness and ability to leverage multiple feature interactions effectively, resulting in both more accurate and consistent predictions compared to KNN and Decision Tree.

Figure 18 shows that evaluating the models with Mean Absolute Error (MAE) across the various scenarios reveals trends consistent with the RMSE findings. Random Forest consistently delivers the lowest MAE values in most scenarios, particularly those involving multiple combined features. For example, in the most comprehensive scenario (Tx + Tn + HRx + HRn + Rs + U2), Random Forest achieves an MAE of approximately 0.3 (group O), reflecting highly accurate predictions. KNN shows competitive performance in simpler scenarios such as Rs, with an MAE around 1.2 (group A), but generally lags behind Random Forest as more features are introduced. Decision Tree tends to have the highest MAE values across scenarios, especially as the number of features increases, indicating less precise predictions.

Error bars representing variability further emphasize Random Forest's superior stability, with smaller error margins compared to the more variable performances of KNN and Decision Tree. Statistical groupings reinforce these observations, with Random Forest dominating the top-performing groups in complex scenarios, while KNN and Decision Tree tend to share overlapping groups with lower performance rankings.

Figure 19 shows the coefficient of determination (R²) used to evaluate the goodness of fit of the models across the various scenarios. Consistent with the RMSE and MAE results, Random Forest generally achieved the highest R² values, indicating superior explanatory power and prediction accuracy. For example, in complex scenarios such as Tx + Tn + HRx + HRn + Rs + U2, Random Forest attained an R² close to 0.98 (group O), reflecting near-perfect model fit. KNN followed closely, particularly in simpler scenarios like Rs with R² ≈ 0.7 (group A), but showed more variability in more complex combinations. Decision Tree consistently exhibited the lowest R² values and larger error bars, highlighting its limited capacity to capture complex relationships within the data.

Error bars denoting standard deviation across multiple runs again confirm Random Forest's stability and reliability compared to the more inconsistent performances of KNN and Decision Tree. The group letters corroborate that Random Forest holds significant statistical advantage in the majority of the scenarios, especially as the number of features increases.

Figure 20 depicts the MBE analysis across the different scenarios, revealing the direction and magnitude of bias in the models' predictions. Random Forest generally shows minimal bias, with MBE values close to zero in most scenarios, especially in complex feature combinations such as Tx + Tn + HRx + HRn + Rs + U2 (MBE close to 0, group O). KNN tends to exhibit a slight positive bias in simpler scenarios like Rs and Tx + Tn, while Decision Tree shows more variable bias, sometimes overestimating and other times underestimating, especially in scenarios involving U2 and Tx + Tn + HRx + HRn.

Error bars indicate that Random Forest maintains a consistent and stable bias across runs, reinforcing its reliability. Conversely, Decision Tree displays larger variability in bias, indicating less dependable predictions.

Figure 21 presents the NSE metric, which measures the predictive skill of the models relative to the observed mean; the results further support the previous findings. Across all scenarios, Random Forest consistently achieves the highest NSE values, often exceeding 0.9 in complex feature combinations like Tx + Tn + HRx + HRn + Rs + U2 (group O), indicating excellent model performance. KNN closely follows, particularly in simpler scenarios such as Rs with NSE around 0.7 (group A). Decision Tree generally displays the lowest NSE values and higher variability, indicating weaker predictive skill.

Error bars confirm Random Forest's superior stability, showing lower variance compared to KNN and Decision Tree. The group letters further emphasize the statistical significance of Random Forest's better performance across most scenarios.

The results in Fig. 22 show that model performance improved with the inclusion of more meteorological variables. The highest KGE was achieved by the Random Forest model (0.97) using the full input set (Tx + Tn + HRx + HRn + Rs + U2), followed closely by KNN (0.96) and Decision Tree (0.91). In contrast, the lowest performance was observed in the U2-only scenario, where Random Forest and KNN scored 0.50 and Decision Tree 0.45. Moderate results were seen in configurations like Tx + Tn + Rs (Random Forest: 0.88) and Rs + U2 (0.91). Overall, Random Forest consistently outperformed the other models across all scenarios, with KNN close behind in more complex input combinations. Decision Tree had the weakest performance, particularly in limited-input scenarios.

The predictive performance of the three machine learning models -- K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF) -- was then evaluated for estimating reference evapotranspiration (ETo) under five distinct scenarios (S1 to S5). Key performance metrics, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), the coefficient of determination (R²), Mean Bias Error (MBE), Nash-Sutcliffe Efficiency (NSE), and KGE, were used to assess model accuracy by comparing predicted against observed ETo values.
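The scatter comparisons in Figs. 23, 24, 25, 26 and 27 can be reproduced with a sketch like the following (hypothetical arrays `y_obs` and `y_pred`; mm/day units assumed):

```python
# Predicted vs. observed ETo with the 1:1 perfect-agreement line.
import matplotlib.pyplot as plt
import numpy as np

y_obs = np.asarray(y_obs, dtype=float)    # observed ETo (assumed mm/day)
y_pred = np.asarray(y_pred, dtype=float)  # model predictions for one scenario

lims = [min(y_obs.min(), y_pred.min()), max(y_obs.max(), y_pred.max())]
plt.scatter(y_obs, y_pred, s=10, alpha=0.5)
plt.plot(lims, lims, "k--", label="1:1 line")  # points on this line are exact predictions
plt.xlabel("Observed ETo (mm/day)")
plt.ylabel("Predicted ETo (mm/day)")
plt.legend()
plt.show()
```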

In Scenario 1 (Fig. 23), the Random Forest (RF) model outperforms the K-Nearest Neighbors (KNN) and Decision Tree (DT) models, with the lowest RMSE of 0.516 and MAE of 0.312, indicating better prediction accuracy. The RF model has a high R² of 0.964, showing a very strong correlation between predicted and observed ETo values. The MBE for RF is -0.024, indicating minimal bias, and the model achieves an NSE of 0.964, further confirming its reliability. The KNN model shows good performance with an RMSE of 0.595 and MAE of 0.374, while the DT model performs slightly worse with an RMSE of 0.698 and MAE of 0.457. Both KNN and DT have slightly lower R² values (0.953 and 0.93, respectively) compared to RF. Overall, the RF model delivers the most accurate and consistent ETo predictions for Scenario 1.

In Scenario 2 (Fig. 24), the RF model demonstrates superior predictive accuracy with an RMSE of 0.926, an MAE of 0.625, and an R² of 0.885, indicating a strong fit between predicted and observed ETo values. The RF model also maintains a low MBE of -0.027, suggesting minimal systematic bias, and achieves NSE of 0.885, underscoring its reliability. The KNN model exhibits slightly lower performance with an RMSE of 1.064, MAE of 0.682, and R² of 0.848, but maintains negligible bias (MBE = 0.002) and a respectable NSE of 0.860. The Decision Tree (DT) model performs moderately with an RMSE of 1.093, MAE of 0.714, and R² of 0.80, while showing a near-zero bias (MBE = 0.002) and NSE of 0.840. Visual inspection confirms that RF predictions cluster more tightly around the 1:1 line compared to KNN and DT, reflecting its enhanced precision and consistency for this scenario. Overall, Random Forest outperforms the other models in accurately estimating ETo under the conditions of Scenario 2.

In Scenario 3 (Fig. 25), the RF model again demonstrates the best predictive performance, with an RMSE of 1.063 and an MAE of 0.739, indicating lower overall prediction errors compared to the other models. The RF model achieves a solid R² value of 0.848, which reflects a strong correlation between predicted and observed ETo values. Additionally, it maintains a minimal bias, with an MBE of -0.005, and an NSE of 0.848, signifying good model reliability. The KNN model shows slightly poorer performance, with a higher RMSE of 1.183 and MAE of 0.755, but with comparable R² (0.813) and a small negative bias (MBE = -0.012). The Decision Tree (DT) model performs moderately, with an RMSE of 1.144 and MAE of 0.796, accompanied by an R² of 0.825 and a slightly larger negative bias (MBE = -0.041). The RF model's predictions align more closely with the 1:1 reference line, indicating more consistent and accurate ETo estimation for this scenario. Overall, Random Forest continues to be the most robust model in predicting ETo under Scenario 3 conditions.

For Scenario 4 (Fig. 26), the RF model again exhibits superior performance with an RMSE of 0.624 and an MAE of 0.402, indicating relatively low prediction errors. It achieves the highest R² value of 0.948, reflecting strong agreement between predicted and observed ETo values. The RF model shows minimal bias with an MBE of -0.028 and maintains an NSE of 0.948, which suggests robust predictive accuracy and reliability. The DT model records the highest errors of the three, with an RMSE of 0.781 and MAE of 0.522, a slightly lower R² of 0.918, and a small negative bias (MBE = -0.025). The KNN model falls in between, with an RMSE of 0.691, an MAE of 0.477, and an R² of 0.936, making it slightly less accurate than RF but ahead of DT. The RF model's predictions align most closely with the 1:1 reference line, confirming its reliability and accuracy in predicting ETo for this scenario.

In Scenario 5 (Fig. 27), the RF model again demonstrates the best performance among the three algorithms, achieving an RMSE of 0.852 and an MAE of 0.613, both lower than those of the KNN and DT models. The RF model has a high coefficient of determination (R²) of 0.930, indicating a strong correlation between predicted and observed ETo values. Its bias is minimal, with an MBE of -0.005, and it maintains a high NSE of 0.903, highlighting its accuracy and reliability. The DT model shows moderate performance with RMSE and MAE values of 1.046 and 0.736, respectively, and an R² of 0.853. The KNN model shows an RMSE of 0.929 and an MAE of 0.640, with an R² of 0.884, performing slightly better than DT but worse than RF. Overall, the RF model consistently provides the most accurate and precise predictions for ETo across this scenario.
