On the use of machine learning to correct NWP model sea surface wind forecasts with scatterometer data input

  • In the context of the OSI SAF Visiting Scientist Program, Evgeniia Makarova from the Institut de Ciències del Mar (ICM - CSIC) in Barcelona worked on the implementation and validation of a number of Machine Learning (ML) algorithms that predict ERA5 sea surface wind biases from several input parameters related to ocean and atmospheric processes, most of which are available from NWP output. This work took place in 2022 and was supervised by Marcos Portabella (ICM) and Ad Stoffelen (KNMI).

  • Objectives and framework of the study

    If the ERA5 corrections can be obtained without scatterometer observations, they can also be applied to, for example, seasonal or climate forecasts, and not only to past situations. Moreover, ML may reveal which processes need amendment to provide more accurate forecasts, which is useful for parameterization studies. Several preliminary ML setups are implemented and evaluated; they look for the functional relationship between several oceanic and atmospheric variables and the persistent NWP biases observed in the NWP-scatterometer differences. Such variables include ECMWF model parameters, such as stress-equivalent winds and their derivatives (curl and divergence) and atmospheric-stability-related parameters, i.e., sea-surface temperature (SST), air temperature (Ta), relative humidity (rh) and surface pressure (sp), as well as SST gradients and ocean currents. This work evaluates the feasibility of such an approach and provides an overview of possible implementations of this model bias regression.
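The wind derivatives listed among the input variables (curl and divergence) can be computed from gridded wind components by finite differences. The following is a minimal sketch, assuming a regular latitude-longitude grid away from the poles and a spherical Earth; the function and array names (`curl_and_divergence`, `u`, `v`, `lat_deg`, `lon_deg`) are hypothetical and this is not the code used in the study.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean Earth radius; spherical Earth is an assumption


def curl_and_divergence(u, v, lat_deg, lon_deg):
    """Return (curl, divergence) in s^-1 of a horizontal wind field.

    u, v    : 2-D arrays (n_lat, n_lon), eastward/northward wind in m/s
    lat_deg : 1-D array of latitudes in degrees (length n_lat)
    lon_deg : 1-D array of longitudes in degrees (length n_lon)
    """
    lat = np.deg2rad(lat_deg)
    lon = np.deg2rad(lon_deg)
    cos_lat = np.cos(lat)[:, np.newaxis]

    # Finite-difference derivatives along longitude (axis 1) and latitude (axis 0).
    du_dlon = np.gradient(u, lon, axis=1)
    dv_dlon = np.gradient(v, lon, axis=1)
    d_ucos_dlat = np.gradient(u * cos_lat, lat, axis=0)
    d_vcos_dlat = np.gradient(v * cos_lat, lat, axis=0)

    # Spherical-coordinate expressions for horizontal divergence and vertical curl.
    scale = 1.0 / (EARTH_RADIUS_M * cos_lat)
    divergence = scale * (du_dlon + d_vcos_dlat)
    curl = scale * (dv_dlon - d_ucos_dlat)
    return curl, divergence
```

The same routine could in principle be applied to the SST field to obtain gradient-type predictors, although the report may compute these differently.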


  • Report conclusions

    The evaluated algorithms include two libraries based on Gradient Boosting Decision Trees (GBDT), namely XGBoost and LightGBM, and feed-forward neural networks (FNNs), implemented with the sklearn library (MLP Regressor) and with the TensorFlow and Keras API.

    Globally, the best-performing model is a TensorFlow-based neural network with 4 hidden layers of 256, 128, 64 and 32 neurons, with dropout used for regularization. Compared to ERA5 over the test period, it achieves an error variance reduction of 5.54% globally (see Figure 1) and of up to 7.66% in the extra-tropics. In the tropics and at high latitudes, the error variance reduction is 3.67% and 5.47%, respectively.
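A network of the shape described above could be sketched with the Keras API as follows. Only the layer sizes and the use of dropout come from the text; the number of input features, the two outputs (assumed here to be the u and v bias components), the ReLU activation, the dropout rate, the optimizer and the loss are all assumptions made for illustration, and `build_bias_network` is a hypothetical name.

```python
import tensorflow as tf


def build_bias_network(n_features, n_outputs=2, dropout_rate=0.1):
    """Feed-forward regressor with hidden layers of 256, 128, 64 and 32 neurons."""
    layers = [tf.keras.Input(shape=(n_features,))]
    for units in (256, 128, 64, 32):
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
        # Dropout after each hidden layer for regularization (rate is assumed).
        layers.append(tf.keras.layers.Dropout(dropout_rate))
    # Linear output layer: one value per predicted bias component (assumed u, v).
    layers.append(tf.keras.layers.Dense(n_outputs))

    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse")  # assumed training setup
    return model
```

Calling `build_bias_network(n_features=12)` would return a compiled model ready for `model.fit(X, y)`, where `X` holds the NWP-derived predictors and `y` the NWP-scatterometer differences.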


    Figure 1. Mean error variance reduction of the different ML models for ERA scatterometer correction (SC) with respect to ERA5 for the test period, validated against independent scatterometer HSCAT-B over the different ocean regions. ERA*N15 is a SC reference not based on ML.
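The error variance reduction shown in Figure 1 can be illustrated with a short calculation. The sketch below assumes that the error variance is the variance of the vector difference between a wind field and collocated independent scatterometer winds, and that the reduction is expressed as a percentage of the ERA5 error variance; the exact definition used in the report may differ, and all function and argument names are hypothetical.

```python
import numpy as np


def vector_error_variance(u_model, v_model, u_obs, v_obs):
    """Variance of the model-minus-observation vector difference (m^2/s^2)."""
    du = np.asarray(u_model, dtype=float) - np.asarray(u_obs, dtype=float)
    dv = np.asarray(v_model, dtype=float) - np.asarray(v_obs, dtype=float)
    return np.var(du) + np.var(dv)


def error_variance_reduction(u_ref, v_ref, u_corr, v_corr, u_obs, v_obs):
    """Percentage by which the corrected field lowers the reference error variance."""
    var_ref = vector_error_variance(u_ref, v_ref, u_obs, v_obs)
    var_corr = vector_error_variance(u_corr, v_corr, u_obs, v_obs)
    return 100.0 * (1.0 - var_corr / var_ref)
```

Here the reference would be ERA5, the corrected field would be ERA5 plus the ML-predicted correction, and the observations would be the independent scatterometer winds collocated with both.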

    The XGBoost and MLP Regressor libraries show only slightly lower performance when validated against the independent scatterometer HSCAT-B. Nevertheless, these two libraries will not be considered for future work: XGBoost is much more computationally expensive than the feed-forward neural networks, while the sklearn implementation of FNN is not scalable enough for training over large datasets.

  • The figures in the gallery below show an example of the corrections generated by a feed-forward network with 4 hidden layers (second figure of the gallery) for the ERA5 stress-equivalent wind field (first figure of the gallery) forecast for 15/02/2022 at 11:00 (i.e., the +5h forecast from the 06:00 analysis).

    • ERA5 stress-equivalent wind field for 15/02/2020 11:00 UTC, the +5h forecast from the 06:00 analysis
    • Predicted scatterometer differences
  • Figure 2. ERA5 stress-equivalent wind field for 15/02/2020 11:00 UTC, the +5h forecast from the 06:00 analysis (first figure of the gallery) and predicted scatterometer differences (second figure of the gallery). Background shows wind intensity (first figure of the gallery) and predicted difference in wind intensity (second figure of the gallery). Arrows show ERA5 wind field (first figure of the gallery) and vector difference between corrected field and ERA5 (second figure of the gallery).
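The two quantities plotted in the second panel of Figure 2 (the vector difference and the difference in wind intensity) can be related to the corrected field as sketched below. This assumes the predicted difference is defined as scatterometer minus ERA5, so that it is added to the ERA5 winds; the sign convention in the report may be the opposite, and the names used are hypothetical.

```python
import numpy as np


def apply_predicted_difference(u_era5, v_era5, du_pred, dv_pred):
    """Return corrected wind components and the resulting change in wind speed."""
    u_era5 = np.asarray(u_era5, dtype=float)
    v_era5 = np.asarray(v_era5, dtype=float)
    # Corrected field = ERA5 + predicted difference (assumed sign convention).
    u_corr = u_era5 + np.asarray(du_pred, dtype=float)
    v_corr = v_era5 + np.asarray(dv_pred, dtype=float)
    # Difference in wind intensity between the corrected field and ERA5.
    speed_change = np.hypot(u_corr, v_corr) - np.hypot(u_era5, v_era5)
    return u_corr, v_corr, speed_change
```

Note that the change in wind speed is generally not equal to the magnitude of the vector difference, which is why the figure shows both as separate layers.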

  • Benefits for the SAF

    • The work demonstrates the feasibility to predict ERA5 local biases, mainly using information based only on other NWP variables.
    • This can be used in the operational setup for correction of the ECMWF ocean forcing forecasts in line with scatterometer-based bias adjustments applied in data assimilation and retrospective ocean forcing experiments.
    • In addition, the model trained on the ERA5 SC can also be used to correct the biases in the reanalysis dataset before scatterometers existed.
  • Report on this study

    On the use of machine learning to correct NWP model sea surface wind forecasts with scatterometer data input

  • Authors

     

    • Evgeniia Makarova from the Institut de Ciències del Mar (ICM - CSIC) in Barcelona

    • Marcos Portabella (ICM)

    • Ad Stoffelen (KNMI)
