Wavelet-Integrated Machine Learning Models for Predicting Marine Chlorophyll-a Concentration along the California Coast
Publication Date : May-18-2026
Author(s) :
Volume/Issue :
Abstract :
In recent years, algal blooms have occurred with greater frequency and intensity along the southern and central California coast. Accurate bloom forecasting is challenging because numerous environmental factors influence algal growth. Marine chlorophyll-a concentration is a key indicator for monitoring and predicting algal blooms. Previous studies that used machine learning models to predict chlorophyll-a concentration in the southern to central California coastal region mostly targeted individual locations and used datasets covering fewer than eight years before 2019. In this study, wavelet analysis (WA) was used to pre-process long-duration marine chlorophyll-a time-series data, increasing its suitability for machine learning by removing noise while retaining short-term spikes. Support vector regression (SVR), Random Forest, XGBoost, artificial neural network (ANN), and long short-term memory (LSTM) models were then applied to the WA-integrated data pipeline, along with water quality and meteorological inputs, to predict chlorophyll-a concentration at three locations along the southern to central California coast. Datasets spanning 2008 to 2025 were used to address the shorter durations of the previous studies. The WA-ANN model achieved the best overall performance across the three locations (Scripps Pier R^2 = 0.88, Cal Poly Pier R^2 = 0.79, Stearns Wharf R^2 = 0.75), accurately capturing the spikes indicative of algal blooms.
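
The wavelet pre-processing step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the wavelet family, decomposition depth, or thresholding rule, so everything below is an assumption chosen for illustration: a Haar wavelet, two decomposition levels, soft thresholding of the detail coefficients, and a hypothetical threshold of 0.5. The function names and the sample series are likewise made up and are not the study's data or code.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar level, interleaving the reconstructed pairs."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero; small (noise-like) ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def wavelet_denoise(signal, levels=2, thr=0.5):
    """Decompose, soft-threshold the detail coefficients, then reconstruct.

    Assumes len(signal) is divisible by 2**levels (illustrative restriction).
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(soft_threshold(d, thr))
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx

# Hypothetical chlorophyll-a readings with one bloom-like spike at index 5.
series = [1.0, 1.2, 0.9, 1.1, 1.0, 9.0, 1.1, 0.9]
denoised = wavelet_denoise(series, levels=2, thr=0.5)
# Small fluctuations are flattened while the large spike remains the maximum.
```

In a pipeline like the one the abstract outlines, a denoised series of this kind would be combined with water quality and meteorological inputs before being passed to a regression model. Soft thresholding is used here because large detail coefficients, which carry abrupt spikes, are only slightly reduced, while small coefficients, which mostly carry noise, are set to zero.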
