Workshop on Ensemble Methods in Meteorology and Oceanography

This workshop is intended to assess the state of the art of ensemble methods, in both the forecasting and the data assimilation domains, and to identify their applications, their possibilities and limitations, and future research directions. The scope of the workshop is very wide, covering all applications linked to meteorology and oceanography, such as atmospheric chemistry or hydrology.

Place and access

The workshop will take place at Jussieu University:
    Bâtiment Esclangon
       * on 15th May, from 2.00 PM to 6.00 PM: amphithéâtre Durand
       * on 16th May, from 9.00 AM to 1.00 PM: amphithéâtre Astier
    4 place de Jussieu
    75006 PARIS
Access is through the main entrance (see map below) or the secondary entrance, in front of tower 46.
Metro line 10 (M10), Jussieu station



THURSDAY 15th MAY afternoon
2.00-2.25 G. Evensen
(Norsk Hydro, NO)
Using the EnKF for combined state and parameter estimation
2.25-2.30 Questions
2.30-2.55 M. Leutbecher
The ECMWF EPS: Current status and future plans
2.55-3.00 Questions
3.00-3.25 F. Le Gland
Large sample asymptotics for the ensemble Kalman filter
3.25-3.30 Questions
3.30-4.00 Coffee break
4.00-4.25 O. Talagrand
Validation of ensembles
4.25-4.30 Questions
4.30-4.55 R. Vautard
Ensemble methods for air quality simulation
4.55-5.00 Questions
5.00-5.25 L. Descamps, L. Berre
Ensemble assimilation and prediction at Météo-France
5.25-5.30 Questions
5.30-6.00 DISCUSSION

FRIDAY 16th MAY morning
9.00-9.25 C. Snyder
Obstacles to particle filtering in high-dimensional systems
9.25-9.30 Questions
9.30-9.55 O. Pannekoucke
Diagnosis, estimation and modelling of forecast error covariances using an ensemble of perturbed forecasts
9.55-10.00 Questions
10.00-10.25 L. Raynaud
Optimal filtering of background error variances estimated from a finite-size ensemble of assimilations
10.25-10.30 Questions
10.30-11.00 Coffee break
11.00-11.25 V. Mallet
Ensemble forecasting with statistical learning and application to air quality
11.25-11.30 Questions
11.30-11.55 S. Rémy
Ensemble Kalman Filter in a boundary layer 1D numerical model
11.55-12.00 Questions
12.00-12.25 J. Jumelet
Statistical estimation of stratospheric cloud size distribution by combining microphysical/optical modelling and lidar measurements
12.25-12.30 Questions



Geir EVENSEN, Using the EnKF for combined state and parameter estimation

Traditional parameter estimation relies on minimization of a cost function with respect to variation of a set of poorly known model parameters. The parameter estimation problem is associated with strong nonlinearities, even using simple and linear dynamical models, and the problem is hard to solve.
In a Bayesian framework, it is possible to formulate the "combined state and parameter estimation problem", which can be solved using ensemble methods. It turns out that the ensemble formulation leads to a well-posed problem, and it is possible to invert with respect to large parameter and state spaces, even using a limited ensemble size. The EnKF is currently being used for parameter and state estimation in oil reservoirs, and provides for operational systems where the model parameters and state are updated sequentially in time. These systems now allow for real-time forecasting of oil production with uncertainty, in addition to a better characterization of the reservoir. This presentation will provide a brief review of parameter estimation and data assimilation using the EnKF in reservoir models.
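
As an illustration only (not taken from the presentation), the combined state and parameter problem can be sketched with a scalar toy model. The model x_{k+1} = a x_k + 1, the prior values and the function names below are all assumptions made for this example; the update is a perturbed-observation EnKF applied to the augmented vector (state, parameter), where only the state is observed and the parameter is corrected through its sample covariance with the state.

```python
import random


def enkf_augmented_update(states, params, y_obs, obs_var, rng):
    """One EnKF analysis step on the augmented vector (state, parameter).

    Only the state is observed. The parameter is updated through its
    sample covariance with the state (perturbed-observation variant).
    """
    n = len(states)
    mean_x = sum(states) / n
    mean_a = sum(params) / n
    var_x = sum((x - mean_x) ** 2 for x in states) / (n - 1)
    cov_ax = sum((a - mean_a) * (x - mean_x)
                 for a, x in zip(params, states)) / (n - 1)
    gain_x = var_x / (var_x + obs_var)
    gain_a = cov_ax / (var_x + obs_var)
    new_states, new_params = [], []
    for x, a in zip(states, params):
        # Each member sees the observation plus its own noise draw.
        innovation = y_obs + rng.gauss(0.0, obs_var ** 0.5) - x
        new_states.append(x + gain_x * innovation)
        new_params.append(a + gain_a * innovation)
    return new_states, new_params


def run_demo(n_members=200, n_cycles=30, seed=0):
    """Hypothetical toy model x_{k+1} = a * x_k + 1 with unknown a."""
    rng = random.Random(seed)
    true_a, obs_std = 0.9, 0.1
    x_true = 5.0
    states = [rng.gauss(5.0, 0.5) for _ in range(n_members)]
    params = [rng.gauss(0.7, 0.1) for _ in range(n_members)]  # biased prior
    prior_mean_a = sum(params) / n_members
    for _ in range(n_cycles):
        x_true = true_a * x_true + 1.0
        # Each member is propagated with its own parameter value.
        states = [a * x + 1.0 for a, x in zip(params, states)]
        y_obs = x_true + rng.gauss(0.0, obs_std)
        states, params = enkf_augmented_update(
            states, params, y_obs, obs_std ** 2, rng)
    return prior_mean_a, sum(params) / n_members


if __name__ == "__main__":
    prior_a, posterior_a = run_demo()
    print("prior mean of a:", prior_a)
    print("posterior mean of a:", posterior_a)
```

In this sketch the parameter is never observed directly; the ensemble covariance between parameter and state is what carries the observational information to it.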

Link to presentation: ppt file

Martin LEUTBECHER, Judith BERNER, Roberto BUIZZA, Renate HAGEDORN, Lars ISAKSEN, Thomas JUNG, Tim PALMER, Glenn SHUTTS & Frederic VITART, The ECMWF EPS: Current status and future plans

The ECMWF EPS is a global Ensemble Prediction System which provides weather forecasts twice daily up to a forecast range of 15 days and once weekly up to a forecast range of 32 days. Seamless predictions into the longer forecast ranges use higher resolution (~50 km) in the first 10 days and lower resolution (~80 km) thereafter. A brief overview of the EPS is given; it will cover the methods used to perturb initial conditions and model tendencies, the configuration of the forecast model, and the configuration of the re-forecast suite for calibrating probabilistic forecasts.
Next, an overview is given of current research to improve the EPS. This comprises (i) a scheme representing model uncertainty associated with the backscatter of kinetic energy dissipated at the subgrid scale, (ii) an improved representation of initial uncertainty in the tropics using singular vectors, and (iii) the use of ensembles of 4D-Var using perturbed observations to represent initial uncertainty.
Finally, the benefit for ensemble forecasts of using flow-dependent estimates of initial uncertainty will be discussed using an idealized framework based on the Lorenz-95 system. Accurate flow-dependent estimates of initial uncertainty are obtained with an extended Kalman Filter. Ensemble forecasts using these initial uncertainties provide a benchmark system. This system will be compared with ensemble forecasts using poorer estimates of initial uncertainty that are either time-invariant or have an error in the day-to-day variations of initial uncertainty.

Link to presentation: pdf file

François LE GLAND, Valérie MONBET & Vu-Duc TRAN, Large sample asymptotics for the ensemble Kalman filter

The ensemble Kalman filter (EnKF) has been proposed in sequential data assimilation, where state vectors of huge dimension (e.g. resulting from the discretization of pressure and velocity fields over a continent, as considered in meteorology) should be estimated from noisy measurements (e.g. collected at sparse in-situ stations).
Even if the state and measurement equations are linear with additive Gaussian white noise, computing and storing the error covariance matrices involved in the Kalman filter is practically impossible, and it has been proposed to represent the filtering distribution with a sample (ensemble) of a few elements and to think of the corresponding empirical covariance matrix as an approximation of the intractable error covariance matrix. Extensions to nonlinear state equations have also been proposed.
Surprisingly, very little is known about the asymptotic behaviour of the EnKF, whereas on the other hand, the asymptotic behaviour of many different classes of particle filters is well understood, as the number of particles goes to infinity. Interpreting the ensemble elements as a population of particles with mean-field interactions (and not merely as an instrumental device producing the ensemble mean value as an estimate of the hidden state), we prove the convergence of the EnKF, with the classical rate 1/sqrt(N), as the number N of ensemble elements increases to infinity. In the linear case, the limit of the empirical distribution of the ensemble elements is the usual (Gaussian distribution associated with the) Kalman filter, as expected, but in the more general case of a nonlinear state equation with linear observations, this limit differs from the usual Bayesian filter, and still has to be characterized. To get the correct limit in this case, the mechanism that generates the elements in the EnKF should be interpreted as a proposal importance distribution, and appropriate importance weights should be assigned to the ensemble elements.

Link to presentation: pdf file

Olivier TALAGRAND, Guillem CANDILLE, Validation of ensembles

Ensembles, whether they are produced as part of an assimilation or a prediction process, are meant to define a probability distribution, which is itself meant to describe our uncertainty on the state of the system under consideration. In that sense, ensembles are intended to describe an object that is not observable, and does not even have an objective existence. As a consequence, verification of ensembles against independent observations is of a totally different nature than verification of "deterministic" estimates.
It is argued that, except in extreme cases, it makes no sense to speak of the quality of an individual ensemble. Validation can only be statistical. It is further argued that the quality of ensemble estimation (and, more generally, of probabilistic estimation) lies in the conjunction of two properties. These are reliability on the one hand (i.e., statistical consistency between estimated probabilities and observed frequencies of occurrence), and resolution on the other (i.e., the property that reliably estimated probability elements are distinctly different from climatology). A number of scores are described and discussed, which objectively and quantitatively measure the degree to which those two properties are present in an ensemble estimation process. It is stressed that those various scores, which measure in particular the statistical quality of the spread of the ensembles, and are mostly used for evaluation of ensemble prediction, can be just as useful for the evaluation of ensemble assimilation.
An important aspect, which has a major impact on the cost of ensemble methods, is the size of the ensembles. As regards ensemble prediction, evidence is presented that no practical gain can be achieved from ensemble sizes beyond a few tens of members. The situation seems to be different for ensemble assimilation, where large sizes (in the hundreds) may be necessary for the numerical stability of the assimilation process.
The question of the size of ensembles is closely related to the question of the size of the verifying sample (the larger the ensembles, the larger the verifying sample required for validating the ensembles). It is argued that, in geophysical applications at least, the (relatively) small size of the verifying sample will always impose strong limits on what can be obtained from ensemble methods. Clear identification of those limits is highly desirable.
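
As an illustration only, one widely used reliability diagnostic is the rank histogram: for each case, count how many ensemble members fall below the verifying observation, and accumulate these ranks over the verifying sample. The abstract does not say which scores are discussed in the talk, so this choice, the synthetic data generator and the function names are assumptions made for the sketch. For a statistically reliable ensemble the histogram is flat on average; an under-dispersive ensemble piles observations into the two extreme bins.

```python
import random


def rank_histogram(ensembles, observations):
    """Count, for each case, the rank of the observation in the ensemble.

    Returns a list of n_members + 1 bin counts.
    """
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts


def synthetic_cases(n_cases, n_members, ensemble_std, seed=0):
    """Hypothetical test data: the observation is drawn with unit spread
    around a case-dependent centre; members use `ensemble_std`."""
    rng = random.Random(seed)
    ensembles, observations = [], []
    for _ in range(n_cases):
        centre = rng.gauss(0.0, 1.0)
        observations.append(centre + rng.gauss(0.0, 1.0))
        ensembles.append([centre + rng.gauss(0.0, ensemble_std)
                          for _ in range(n_members)])
    return ensembles, observations


if __name__ == "__main__":
    for label, std in (("reliable", 1.0), ("under-dispersive", 0.3)):
        ens, obs = synthetic_cases(2000, 9, std)
        print(label, rank_histogram(ens, obs))
```

The number of bins grows with the ensemble size, so a larger ensemble needs a larger verifying sample before the histogram shape can be judged, which is the constraint the last paragraph of the abstract points to.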

Link to presentation: ppt file

Robert VAUTARD, Ensemble methods for air quality simulation

Air quality simulation, at the regional as well as the urban scale, has become a powerful tool for monitoring, forecasting and assessing emission reduction policies. Yet the models simulate a large number of processes that are largely under-constrained by observations. In recent years, several authors have addressed the question of how to represent uncertainty in these simulations. An overview of these studies will be given, with particular emphasis on the work carried out at IPSL and with its partners.

Link to presentation: ppt file

Laurent DESCAMPS, Loïk BERRE, Ensemble assimilation and prediction at Météo-France

An ensemble variational assimilation is running in real-time pre-operational mode at Météo-France. Its principle and design, based on an ensemble of perturbed assimilations with the global Arpege model, will first be described. One of its original features is the use of local spatial averaging techniques (as an alternative to the Schur filter). This enables the sampling noise to be filtered out, and the flow-dependent covariance information to be efficiently extracted from a small ensemble (e.g. with six members). Efforts to validate ensemble flow-dependent variances against innovation-based estimates will be shown, together with impact studies. Applications to high-resolution limited-area models such as Aladin and Arome will also be described.
A short-range ensemble prediction system (the PEARP system) is also running operationally, once a day at 18 UTC. The ensemble (11 members) is initialized by combining 'blending breeding' and singular vector techniques. The singular vectors are computed over a 12 h optimisation period and for four targeted areas (the northern and southern hemispheres, the tropical zone, and an area that includes western Europe and the northern part of the Atlantic Ocean). PEARP uses a TL358c2.4 L55 version of the global spectral model ARPEGE. Several upgrades of the PEARP system are planned during the next two years including:

By 2009, the new PEARP system will have the same characteristics as most of the existing global EPS but a grid resolution over Europe close to most of the existing LAMEPS.

Link to Loïk BERRE's presentation: ppt file
Link to Laurent DESCAMPS's second presentation: pdf file

Chris SNYDER, Obstacles to particle filtering in high-dimensional systems

Particle filters are ensemble-based assimilation schemes that, unlike the ensemble Kalman filter, employ a fully nonlinear and non-Gaussian analysis step. Simulations for a simple example indicate that the ensemble size required for a successful particle filter scales exponentially as the state dimension increases. Asymptotic results, following the work of Bengtsson, Bickel and collaborators, are possible in two cases: one in which each prior state component is independent and identically distributed, and one in which both the prior pdf and the observation errors are Gaussian, but with general covariances. The asymptotic results assume the use of the prior as proposal distribution and depend on the fact that the observation log-likelihood has an approximately Gaussian distribution as the number of observations and state dimension increase. The results show that the required ensemble size increases exponentially with the variance of the observation log-likelihood, rather than the state dimension per se.
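
As an illustration only, the weight collapse described above can be reproduced with a small Monte Carlo experiment. The setup below is an assumption made for the sketch, not the example used in the talk: every state component has an independent standard Gaussian prior, every component is observed with unit-variance Gaussian error, and the prior is used as the proposal. The effective sample size 1 / sum(w_i^2) is used here as a simple summary of how evenly the weights are spread.

```python
import math
import random


def normalized_weights(log_likelihoods):
    """Turn log-likelihoods into normalized importance weights.

    The maximum is subtracted before exponentiating to avoid underflow.
    """
    peak = max(log_likelihoods)
    unnormalized = [math.exp(v - peak) for v in log_likelihoods]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def effective_sample_size(weights):
    """Equals N for uniform weights, 1 when one particle has all the weight."""
    return 1.0 / sum(w * w for w in weights)


def prior_proposal_weights(dim, n_particles, rng):
    """Weights of particles drawn from an i.i.d. N(0, 1) prior, with every
    component observed with N(0, 1) error (hypothetical setup)."""
    truth = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    obs = [t + rng.gauss(0.0, 1.0) for t in truth]
    log_liks = []
    for _ in range(n_particles):
        particle = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        log_liks.append(
            -0.5 * sum((y - x) ** 2 for y, x in zip(obs, particle)))
    return normalized_weights(log_liks)


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (1, 10, 100):
        weights = prior_proposal_weights(dim, 200, rng)
        print(dim, max(weights), effective_sample_size(weights))
```

With the ensemble size held fixed, the largest weight moves towards 1 as the dimension (and hence the variance of the observation log-likelihood) grows.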

Link to presentation: ppt file

Olivier PANNEKOUCKE, Loïk BERRE, Gérald DESROZIERS & Sébastien MASSART, Diagnosis, estimation and modelling of forecast error covariances using an ensemble of perturbed forecasts

This presentation is devoted to the use of an ensemble of perturbed forecasts in covariance modelling. By reproducing the dynamics of the information, such an ensemble is used to estimate and model either climatological correlation functions or those "of the day".
In the first part, the diagnosis of geographical variations of the correlation functions is introduced through the notion of local length-scale. This diagnostic is illustrated for the atmosphere (ARPEGE) and the ocean (OPA-Var).
In the second part, modelling based on the diagonal assumption in wavelet space is presented. This formulation makes it possible to represent the geographical variations of the local correlation functions while reducing sampling noise through a local spatial average of the raw correlation functions. This approach is illustrated with an atmospheric model (ARPEGE).
In the last part, modelling based on the diffusion equation is addressed. In particular, it is shown that the local diffusion tensor can be estimated under an assumption of local homogeneity of the tensor field. This approach is illustrated with an atmospheric chemistry model (ozone in MOCAGE-PALM).

Link to presentation: pdf file

Laure RAYNAUD, Optimal filtering of background error variances estimated from a finite-size ensemble of assimilations

Background error variances are key elements of any assimilation scheme, since they determine in particular the respective weights of the background and of the observations in the analysis. An ensemble of assimilations provides a favourable framework for computing climatological or flow-dependent variances ("variances of the day"), whose use should then allow the observations to be better taken into account in the assimilation process. The major drawback of an ensemble of assimilations is its high computational cost, which imposes the use of small ensembles. This induces sampling noise that is detrimental to the estimation of the variances.
To take full advantage of the ensemble estimation of variances, filtering tools must therefore be put in place. To this end, the spatial properties of the sampling noise observed in the estimated variances are examined in an idealized 1D framework. A close link between the spatial structures of the sampling noise and of the background error is thus demonstrated experimentally. Analytical derivations confirm this result and provide a simple relation between the covariance matrix of the sampling noise and the covariance matrix of the background error.
An objective filter for ensemble variances, initially proposed by Berre et al. (2007), is then studied. It is applied to the spectral variance field and is expressed as a simple function of the signal-to-noise ratio. A method for estimating this ratio is proposed, based on the analytical formulation of the sampling noise covariance. It allows the filter to be computed easily at each assimilation cycle, for each model parameter and at each vertical level. This objective filter has been implemented in an ensemble of assimilations provided by Météo-France's operational global model Arpège, made up of six independent 3D-Fgat assimilations. The first results suggest that the implementation strategy allows a robust computation of the filter at each analysis and is able to reveal relevant dependencies on the parameter and on the vertical level. The resulting filtered variance maps are accurate and consistent with the meteorological situation, the largest variance values being located in troughs. Moreover, this combination of a small ensemble of assimilations and objective filtering makes it possible to represent the temporal dynamics of these variance maps.

Link to presentation: pdf file

Vivien MALLET, Gilles STOLTZ, Ensemble forecasting with statistical learning and application to air quality

A major limitation of air quality forecasts stems from the large uncertainties in reactive transport models and in their input data. To take these uncertainties into account, we generate an ensemble of simulations that rely on competing physical formulations and on input data that are perturbed or drawn from competing sources.
The ensemble forecasts are then combined linearly, each simulation being given a weight that depends on past forecasts and observations. The operation is repeated at each forecast step and is called sequential aggregation. The simplest aggregation consists of producing an ensemble mean. Aggregating with the best weights in the least-squares sense gives a combination known as a "superensemble" in climatology.
Other methods, drawn from statistical learning, produce combinations that perform better in practice. These methods come with a mathematical framework which guarantees, for any sequence of observations and over sufficiently long periods, performance similar to that of the best constant linear combination, which is borne out in practice. We will compare the approach and its results for ozone forecasting (air quality) with those of classical data assimilation (sequential and variational).
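
As an illustration only, sequential aggregation can be sketched with the exponentially weighted average forecaster, a standard algorithm from the statistical learning literature. The abstract does not say which algorithms the authors use, so this choice, the learning rate and the function names are assumptions made for the example. At each step every ensemble member receives a weight that decreases exponentially with its cumulative squared error on past observations, and the aggregated forecast is the weighted mean. This variant produces convex weights, a special case of the linear combinations described above.

```python
import math


def weights_from_losses(cumulative_losses, learning_rate):
    """Convex weights decreasing exponentially with past cumulative loss."""
    lowest = min(cumulative_losses)  # shift for numerical stability
    raw = [math.exp(-learning_rate * (loss - lowest))
           for loss in cumulative_losses]
    total = sum(raw)
    return [r / total for r in raw]


def exponentially_weighted_aggregation(forecasts, observations,
                                       learning_rate=0.5):
    """Aggregate member forecasts sequentially.

    forecasts[t][i] is the forecast of member i at step t, and
    observations[t] is revealed only after the aggregated forecast for
    step t has been issued. Returns the aggregated forecasts and the
    weights that would be used at the next step.
    """
    n_models = len(forecasts[0])
    cumulative_losses = [0.0] * n_models
    aggregated = []
    for member_forecasts, obs in zip(forecasts, observations):
        weights = weights_from_losses(cumulative_losses, learning_rate)
        aggregated.append(
            sum(w * f for w, f in zip(weights, member_forecasts)))
        # The observation is now available: update each member's loss.
        for i, f in enumerate(member_forecasts):
            cumulative_losses[i] += (f - obs) ** 2
    return aggregated, weights_from_losses(cumulative_losses, learning_rate)


if __name__ == "__main__":
    # Two hypothetical members: one always right, one always off by 2.
    member_forecasts = [[1.0, 3.0]] * 5
    observed = [1.0] * 5
    combined, final_weights = exponentially_weighted_aggregation(
        member_forecasts, observed)
    print(combined)
    print(final_weights)
```

With no past information the first aggregated forecast is the plain ensemble mean; the weights then shift towards the members that have performed best so far.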

Link to presentation: pdf file

Samuel RÉMY, T. BERGOT, Ensemble Kalman Filter in a boundary layer 1D numerical model

COBEL-ISBA is a 1D boundary layer numerical model dedicated to the forecasting of low-visibility events, currently in operational use at Paris-Charles de Gaulle international airport. A site-specific local observation system (instrumented meteorological mast, radiative fluxes, soil temperature and moisture) has been installed at Paris-CDG airport and is used in a one-dimensional variational data assimilation (1DVAR) scheme to provide initial conditions. Background errors have been diagnosed using two methods, and were shown to follow a strong diurnal cycle. This conclusion led to the development of an Ensemble Kalman Filter (EnKF), together with an adaptive covariance inflation algorithm, which was tested within the framework of an Observing System Simulation Experiment (OSSE). A hybrid Ensemble-Variational scheme has also been developed, and the three assimilation schemes (1DVAR, EnKF, Hybrid) have been compared over clear-sky and foggy situations, using more or less developed local observation systems. The influence of liquid water on the two other control variables (temperature and humidity), and the opportunity to use it as a third control variable to improve the initialisation of fog and low clouds, have also been investigated.

Link to presentation: ppt file

Julien JUMELET, Slimane BEKKI, Christine DAVID & Philippe KECKHUT, Statistical estimation of stratospheric cloud size distribution by combining microphysical/optical modelling and lidar measurements

We will describe a retrieval method for stratospheric cloud size distributions. The information is provided by a microphysical and optical model (assuming that the particles are spherical) and multiwavelength lidar backscatter coefficients. The errors on the lidar backscatter coefficients are explicitly taken into account in the statistical estimation. In order to discard model-simulated outliers resulting from the strong nonlinearity of the model, a 1-sigma filter is applied to the solution cluster. Within the filtered solution cluster, the retrieval algorithm minimizes a cost function of the misfit between measurements and model simulations. Two validation cases are presented, for two polar stratospheric clouds (PSCs) detected above ALOMAR (69 degrees North, Norway). The clouds were also observed with a balloon-borne optical particle counter. In nondepolarizing regions of the clouds (i.e. spherical particles), the parameters of the size distribution are successfully retrieved, especially the mode radius and the geometrical standard deviation. Other results highlight the importance of taking into account the nonlinearity of the model together with the lidar errors when estimating the size distribution parameters from lidar measurements. The retrieval algorithm is then applied to another PSC event at ALOMAR that lasted about 5 hours. The results show that multiwavelength lidar data integrated over short time intervals and coupled to both Rotational Raman Technique (RRT) temperatures and the size distribution retrieval method described above can provide very useful information for the identification of PSC types and on the temporal evolution of the size distribution parameters.

Link to presentation: pdf file TBA
