Influence of exogenous factors on water demand forecasting models during the COVID-19 period

Elsevier, Engineering Applications of Artificial Intelligence, Volume 117, Part A, 2023, 105617
Authors: Manar Abu Talib, Mohamed Abdallah, Abdulrahman Abdeljaber, Omnia Abu Waraga

Water scarcity has heightened the need for adequate water demand forecasting to facilitate efficient planning of municipal infrastructure. However, the development of water consumption models is challenged by rapid environmental and socio-economic changes, particularly during unforeseen events like the COVID-19 pandemic. This study investigated the impact of COVID-19 on the efficiency of water demand prediction models, considering the lockdown measures and various exogenous features, such as previous consumption (PC) and socio-demographic (SDF), seasonal (SF), and climatic (CF) factors. Multiple ensemble models, namely gradient-boosting machines (GBM), extreme gradient boosting (XGB), light gradient boosting, random forest (RF), and a stack regressor (STK), were examined and compared with other machine-learning techniques: multiple linear regression (MLR), decision trees, and neural networks. The models were tested using 3-year metering records for 128,000 consumers in Dubai. The feature importance analysis indicated that PC and SDF had a significant impact on consumption rates, with correlation coefficients of 0.95 and 0.74, respectively, whereas SF and CF had a negligible effect. The results showed that, before COVID, RF and STK outperformed the other models with a coefficient of determination (R²) of 0.928 and a root-mean-squared error (RMSE) of 0.039, followed by XGB at 0.923 and 0.041, respectively. Amid COVID, however, MLR achieved the highest prediction accuracy with an R² of 0.90 and an RMSE of 0.05, followed by GBM and XGB, which tied at 0.83 and 0.06, respectively. An ensemble-based error prediction model was then applied, resulting in up to 9.2% improvement in predictions. Overall, this research emphasized the efficiency of ensemble models in handling fluctuating data with a high degree of nonlinearity.
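For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how R² and RMSE are conventionally computed. It is not the authors' code. The function names and the sample values are hypothetical and chosen only for illustration, and the abstract does not state how consumption was scaled.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical consumption values (illustration only, not data from the study)
observed = [0.20, 0.35, 0.50, 0.65, 0.80]
predicted = [0.25, 0.30, 0.50, 0.70, 0.75]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R2   = {r2_score(observed, predicted):.3f}")
```

A lower RMSE and an R² closer to 1 both indicate a better fit, which is the sense in which the models above are ranked.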