Abstract
As urban populations grow, environmental pollution increases drastically, and air pollution is one of the most significant issues facing smart cities. High PM2.5 concentrations can cause various health problems, such as respiratory disease, heart attack, lung disease, and fatigue. Predicting PM2.5 can help administrations warn people at risk and take scientifically grounded measures to reduce pollution. Existing work has used various regression models to predict air pollution; however, combining different feature selection techniques with regression algorithms has not yet been explored. This paper implements five feature selection techniques (Recursive Feature Elimination, Analysis of Variance, Random Forest, Variance Threshold, and Light Gradient Boosting (LightGBM)) to select the best features. Six regression algorithms and ensemble models (Extra Tree, Decision Tree, XGBoost, Random Forest, LightGBM, and AdaBoost) are then applied in Python to predict PM2.5 on datasets from five cities in China. The models are compared using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R². We observed that the AdaBoost algorithm with LightGBM feature selection gives the best performance across all five datasets, with the best values (MAE 0.07, RMSE 0.14, and R² 0.94) obtained on the Chengdu dataset. The computed feature importance shows that humidity, cbwd (combined wind direction), dew point, and pressure play an essential role in air pollution.
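To make the evaluation criteria concrete, the following is a minimal sketch in plain Python of the three comparison metrics (MAE, RMSE, and R²), together with the simplest of the named feature selection ideas, a variance threshold. It is an illustration under stated assumptions, not the implementation used in this paper: the function names, the toy values, and the threshold of 0.0 are all hypothetical, and a real study would more likely rely on an established library.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def variance_threshold(columns, threshold=0.0):
    """Return the names of columns whose population variance exceeds threshold.

    `columns` maps a feature name to its list of values (hypothetical layout).
    """
    kept = []
    for name, values in columns.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance > threshold:
            kept.append(name)
    return kept


# Toy values for illustration only; they are not taken from the paper's data.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
features = {
    "humidity": [30.0, 55.0, 80.0, 65.0],
    "constant_flag": [1.0, 1.0, 1.0, 1.0],  # zero variance, so it is dropped
}

scores = {
    "MAE": mae(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
selected = variance_threshold(features, threshold=0.0)
```

Lower MAE and RMSE and higher R² indicate a better fit, which is the sense in which the models are ranked above. The variance threshold is a filter that discards near-constant features before any model is trained, whereas the tree-based selectors named above rank features by importance scores from a fitted model.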