Endang Fitria Rahmawati

- Thesis topic: Air Quality Prediction by Machine Learning Models: Case Study in Jakarta and Surabaya, Indonesia

- DOI:

- Abstract:

Air pollution remains a pressing global environmental challenge, particularly in developing nations characterized by high population densities, rapid urbanization, substantial energy consumption, and expanding industrial activities. Vehicular emissions are among the dominant contributors and severely affect public health by increasing the risks of respiratory, cardiovascular, reproductive, and neurological disorders, as well as cancer. Indonesian metropolitan areas such as Jakarta and Surabaya are especially vulnerable because traffic congestion, industrial emissions, and population growth all worsen air quality.
To address this, air quality monitoring systems are widely used to assess real-time pollutant concentrations and generate the Air Quality Index (AQI). However, most systems in Indonesia are limited to real-time assessment and lack predictive capabilities. In recent years, machine learning (ML) techniques have emerged as powerful tools for forecasting air pollution, offering new avenues for proactive environmental management and policy planning.
This study aims to develop, evaluate, and optimize machine learning models for AQI prediction in several key urban regions of Indonesia. Four ML algorithms—Light Gradient Boosting Machine (LightGBM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost)—were employed. Hyperparameter tuning was conducted using GridSearchCV to enhance model performance and interpretability. The predictive models were trained and validated using air quality datasets from Central, North, and East Jakarta, as well as Surabaya’s Wonorejo and Kebonsari districts.
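The tuning step described above can be illustrated with a minimal sketch. It assumes scikit-learn is available, uses only Random Forest for brevity, and runs on synthetic stand-in data. The feature matrix, the target, the parameter grid, and the train/test split are all hypothetical and are not taken from the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): five pollutant-like features
# and an AQI-like target driven by the largest of them, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(300, 5))
y = X.max(axis=1) + rng.normal(0.0, 5.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative grid only; the grid used in the thesis is not stated.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8]}

# Exhaustive search over the grid with 3-fold cross-validation,
# selecting the combination with the lowest cross-validated RMSE.
search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_
# RMSE of the refitted best model on the held-out test split.
test_rmse = float(np.sqrt(mean_squared_error(y_test, search.predict(X_test))))
print(best_params, round(test_rmse, 2))
```

The same pattern would apply to the LightGBM, XGBoost, and CatBoost regressors by swapping the estimator and its grid. For time-ordered monitoring data, a chronological split such as scikit-learn's `TimeSeriesSplit` may be a better fit than the random split assumed here.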
Results indicate that for Central Jakarta, CatBoost and XGBoost achieved the lowest root mean square error (RMSE) values, approximately 13.21 and 13.65, respectively. In North Jakarta, CatBoost and XGBoost achieved RMSEs of approximately 51.12 and 49.69, while in East Jakarta, CatBoost and Random Forest performed best with RMSEs of approximately 25.18 and 11.63. For Wonorejo, XGBoost and Random Forest achieved RMSEs of approximately 11.63 and 11.30, and for Kebonsari, XGBoost and CatBoost produced the most accurate results, with RMSEs of approximately 8.23 and 8.82, respectively. These findings demonstrate the potential of advanced ML models to improve air quality forecasting accuracy and to support data-driven environmental decision-making in urban Indonesia.
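For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between observed and predicted values, so it is always non-negative and is expressed in the same units as the AQI. The sketch below uses only the Python standard library. The `rmse` helper and the sample values are hypothetical illustrations, not data from the thesis.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up AQI readings and forecasts, for illustration only.
observed_aqi = [50.0, 80.0, 120.0, 95.0]
predicted_aqi = [55.0, 70.0, 125.0, 95.0]

example_rmse = rmse(observed_aqi, predicted_aqi)
print(round(example_rmse, 2))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, which is worth keeping in mind when comparing stations with different pollution ranges.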

Keywords: Air Quality Index, Modelling, LightGBM, Random Forest, XGBoost, CatBoost
