Beyond the Threshold: How Categorizing Air Quality Metrics Alters Statistical Inference and Model Performance
Publication Date : Sep-01-2025
Author(s) :
Volume/Issue :
Abstract :
Air pollution remains a critical environmental and public health issue: pollutants such as nitrogen oxides, sulfur dioxide, carbon monoxide, and volatile organic compounds pose significant risks to human health and ecosystems. This study investigates key factors influencing air pollution and evaluates how converting continuous pollutant measurements into categorical Air Quality Index (AQI) labels affects statistical inference and model performance. Using a global dataset of air quality measurements, the analysis applies statistical techniques, including correlation analysis and multiple linear regression, to examine pollutant sources and the effects of this data transformation. The results show that continuous data produced stronger correlations, higher R² values, and better prediction accuracy than categorical data. However, categorical models were easier to interpret and may still be useful where data are limited or where results must be communicated to the public. The findings clarify the trade-off between data accessibility and analytical precision, and can inform policymakers developing targeted mitigation strategies and sustainable air quality management practices.
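
The comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are synthetic, the category breakpoints (50, 100, 150, 200) are hypothetical rather than official AQI cut-offs, the categories are encoded as simple ordinal codes, and a one-predictor least-squares fit stands in for the multiple linear regression the study reports. The names `r_squared` and `to_category` are likewise hypothetical helpers.

```python
import random


def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit of y on x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def to_category(value, breakpoints=(50, 100, 150, 200)):
    """Map a continuous reading to an ordinal code 0..len(breakpoints).

    The breakpoints are hypothetical, not official AQI thresholds.
    """
    return sum(value > b for b in breakpoints)


# Synthetic data (assumption): an outcome that depends linearly on a
# continuous pollutant concentration, plus Gaussian noise.
rng = random.Random(0)
concentration = [rng.uniform(0, 300) for _ in range(2000)]
outcome = [2.0 + 0.05 * c + rng.gauss(0, 2.0) for c in concentration]

# Discard within-category detail by replacing each reading with its code.
category = [to_category(c) for c in concentration]

r2_continuous = r_squared(concentration, outcome)
r2_categorical = r_squared(category, outcome)

print(f"R^2 with continuous predictor:  {r2_continuous:.3f}")
print(f"R^2 with categorical predictor: {r2_categorical:.3f}")
```

In this synthetic setting the categorical predictor explains less variance because every reading inside a category is treated as identical. How large the gap is on real measurements depends on the breakpoints, the encoding, and the distribution of the data.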
