Abstract: The Northwest Pacific has one of the highest sea fog frequencies in the world and is a major shipping route, yet no dedicated sea fog prediction product currently exists for the region. Studying the characteristics and prediction of sea fog there is therefore important. Using data from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) and ERA5 for 2013 to 2023, this study analyzes the distribution of sea fog over the Northwest Pacific and develops a sea fog prediction model based on machine learning. By calculating mutual information (MI) values, we identify 12 key factors closely related to sea fog occurrence, including sea surface temperature (SST), relative humidity, the difference between SST and dew point temperature (tSST-td), and geographical coordinates. To address the class imbalance between fog and non-fog samples, we apply resampling techniques and assess how different sampling strategies affect model performance. The results indicate that adding geographical information as predictors and applying oversampling significantly improve model performance, and that the eXtreme Gradient Boosting (XGBoost) model attains the highest threat score. Feature importance analysis shows that tSST-td and relative humidity are the core factors in the prediction model. Among the models compared, XGBoost achieves the best overall performance, followed by the convolutional neural network (CNN) and the support vector machine (SVM), both of which reach a threat score above 0.3. Case studies further confirm that XGBoost performs best, showing the highest agreement with the observed fog coverage. This study reveals the complexity of sea fog formation over the Northwest Pacific and provides a scientific basis for sea fog prediction over open ocean areas.
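The abstract ranks models by threat score without stating its formula. As a minimal sketch, assuming the conventional definition used in forecast verification (hits divided by the sum of hits, misses, and false alarms), the metric could be computed as follows. The function name `threat_score` and the sample labels are hypothetical and are not taken from the study.

```python
def threat_score(observed, predicted):
    """Threat score for binary fog (1) / no-fog (0) labels.

    Assumes the conventional definition:
        TS = hits / (hits + misses + false_alarms)
    Correct negatives (no fog observed, no fog predicted) are ignored.
    """
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            hits += 1            # fog observed and predicted
        elif obs and not pred:
            misses += 1          # fog observed but not predicted
        elif pred and not obs:
            false_alarms += 1    # fog predicted but not observed
    denom = hits + misses + false_alarms
    # Undefined when there is no observed or predicted fog at all.
    return hits / denom if denom else float("nan")


# Hypothetical labels for illustration only: 1 = fog, 0 = no fog.
obs = [1, 1, 0, 0, 1, 0, 0, 0]
pred = [1, 0, 1, 0, 1, 0, 0, 0]
print(threat_score(obs, pred))  # 2 hits, 1 miss, 1 false alarm -> 0.5
```

Because correct negatives do not enter the score, it is less inflated than plain accuracy when non-fog samples dominate, which is the class imbalance situation the abstract describes.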