Abstract: The Northwest Pacific (NWP) is one of the areas with the highest sea fog frequency globally and serves as a major shipping route. Currently, there are no dedicated sea fog prediction products for this region, so studying the characteristics and prediction of sea fog there is crucial. Based on 2013–2023 data from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) and ECMWF Reanalysis v5 (ERA5), this study analyzes the distribution characteristics of sea fog in the NWP and develops a sea fog prediction model. By calculating Mutual Information (MI) values, we identify 12 key factors closely related to sea fog occurrence, including the sea surface temperature-dew point difference, relative humidity, sea surface temperature, and geographical coordinates. To address the imbalance in the sea fog data, we apply resampling techniques and compare the effects of different sampling methods and feature sets on model performance. The results indicate that adding geographical information as predictors and applying oversampling significantly improve model performance, with the eXtreme Gradient Boosting (XGBoost) model showing the most notable improvement. The feature importance analysis indicates that the sea surface temperature-dew point difference and relative humidity are the core predictors in the sea fog prediction model. Among the machine learning models we compare, XGBoost achieves the best overall performance, followed by the Convolutional Neural Network (CNN) and the Support Vector Machine (SVM), both of which achieve a threat score (TS) above 0.3. In case studies, the XGBoost model also performs best, showing the highest agreement with observed fog coverage. This study reveals the complexity of sea fog formation in the NWP and provides a scientific basis for sea fog prediction over open ocean areas.
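As a reading aid for the TS values quoted above, the following is a minimal sketch of how such a score can be computed, assuming TS denotes the threat score (critical success index) commonly used in forecast verification: hits divided by the sum of hits, misses, and false alarms. The function name `threat_score` and the sample labels are hypothetical and are not taken from the study's data or code.

```python
def threat_score(observed, predicted):
    """Threat score: hits / (hits + misses + false alarms).

    `observed` and `predicted` are equal-length sequences of 0/1 labels,
    where 1 marks a fog event. Correct negatives do not enter the score,
    which is why it suits rare events such as sea fog.
    """
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            hits += 1            # fog observed and forecast
        elif obs and not pred:
            misses += 1          # fog observed but not forecast
        elif pred and not obs:
            false_alarms += 1    # fog forecast but not observed
    denominator = hits + misses + false_alarms
    # Undefined when there are no fog events and no fog forecasts.
    return hits / denominator if denominator else float("nan")


# Hypothetical labels for illustration only (not data from the study).
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
print(f"TS = {threat_score(observed, predicted):.2f}")
```

Because correct "no fog" forecasts are ignored, a model cannot raise this score simply by predicting the majority class, which is one reason it is a natural fit for imbalanced fog data.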