A new altimetry data validation approach based on Data Mining and Machine learning techniques

Romain Bergougnoux (NOVELTIS, France)


Sophie Le Gac (CNES, France); Charlotte Garcia (CNES, France); Eric Jeansou (NOVELTIS, France); Mathilde Cancet (NOVELTIS, France); Florence Toublanc (NOVELTIS, France); Sylvain Brunato (NOVELTIS, France)

Event: 2018 Ocean Surface Topography Science Team Meeting

Session: Regional and Global CAL/VAL for Assembling a Climate Data Record

Presentation type: Poster

The advent of “Big Data” and of modern data processing techniques allows scientists to efficiently extract and evaluate trends from large databases. In this context, the purpose of this study is to explore the potential of Data Mining and Machine Learning methods to assess the validity of altimetry measurements over the ocean, and to compare their performance with that of the historical editing criteria.
Currently, the detection of spurious data in radar altimetry measurements relies on a legacy data editing method that checks whether the values of selected altimetric parameters fall outside a validity domain defined by minimum and maximum thresholds. This historical editing method is described in the data user manuals and in the CALVAL reports of altimetric missions. It has been developed and used by the expert community over the last 20 years.
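As an illustration of the legacy editing principle described above, the following minimal Python sketch flags a measurement as valid only when every checked parameter lies inside its validity domain. The parameter names (`sla_m`, `swh_m`, `sigma0_db`) and the threshold values are hypothetical placeholders chosen for this example; they are not the thresholds given in the data user manuals or CALVAL reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Validity domain of one parameter, defined by a minimum and a maximum."""
    minimum: float
    maximum: float


# Hypothetical validity domains, for illustration only (not the official values).
THRESHOLDS = {
    "sla_m": Threshold(-2.0, 2.0),
    "swh_m": Threshold(0.0, 11.0),
    "sigma0_db": Threshold(7.0, 30.0),
}


def is_valid(measurement, thresholds=THRESHOLDS):
    """Return True when every checked parameter lies inside its validity domain."""
    for name, limit in thresholds.items():
        value = measurement.get(name)
        # A missing value fails the check; so does NaN, because any
        # comparison involving NaN evaluates to False.
        if value is None or not (limit.minimum <= value <= limit.maximum):
            return False
    return True


def edit(measurements, thresholds=THRESHOLDS):
    """Apply the threshold check to a sequence of measurements."""
    return [is_valid(m, thresholds) for m in measurements]


if __name__ == "__main__":
    samples = [
        {"sla_m": 0.12, "swh_m": 2.5, "sigma0_db": 13.4},
        {"sla_m": 3.50, "swh_m": 2.1, "sigma0_db": 13.0},  # SLA outside its domain
    ]
    print(edit(samples))
```

Keeping the thresholds in a single table makes it straightforward to swap in the validity domains of a given mission without touching the checking logic.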
Our study mainly considers clustering and classification techniques to assess the validity of 1 Hz SLA (Sea Level Anomalies) from one cycle of standard JASON-3 GDR data. Unsupervised and supervised learning techniques have been applied in order to evaluate the capability of such methods in altimetry.
The data parameterization and preparation stages are crucial before the learning and validation processes, and have to be performed according to the intended target. Filtering, standardization, principal component analysis and segmentation are applied to select the decisive parameters and to build reliable classifiers. The validity of each measurement is then determined from the group to which it is assigned. Confusion matrices and ROC curves are produced for validation purposes, in order to compare our results with the current “editing” criterion. The first conclusions of our work indicate that unsupervised learning yields a correct classification. Supervised model implementations will also be applied to the same dataset.
A Python prototype developed for this project provides a way to test and compare different Data Mining and Machine Learning techniques on altimetry data. This is promising for the possible implementation of such methods in future altimetry data validation processes.
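The comparison step mentioned above can be sketched in plain Python as follows. This is only an assumed illustration, not the project's prototype: `reference` stands for the flags given by the legacy editing criterion (True meaning "spurious"), `predicted` for flags derived from group membership, and `scores` for a hypothetical per-measurement anomaly score (for instance a distance to the centre of the "valid" group). All sample values are made up.

```python
def confusion_matrix(reference, predicted):
    """Count (tp, fp, fn, tn), treating True as 'flagged as spurious'."""
    tp = fp = fn = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1
        elif pred:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_points(reference, scores):
    """Return (false positive rate, true positive rate) pairs.

    One point is produced per distinct score used as a decision threshold,
    preceded by the origin. Both classes must be present in `reference`.
    """
    positives = sum(1 for ref in reference if ref)
    negatives = len(reference) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [score >= threshold for score in scores]
        tp, fp, _, _ = confusion_matrix(reference, predicted)
        points.append((fp / negatives, tp / positives))
    return points


if __name__ == "__main__":
    # Made-up flags and scores, for illustration only.
    reference = [True, True, False, False, False]
    scores = [0.9, 0.4, 0.6, 0.2, 0.1]
    predicted = [score >= 0.5 for score in scores]
    print(confusion_matrix(reference, predicted))
    print(roc_points(reference, scores))
```

Sweeping the threshold over every distinct score traces the ROC curve from (0, 0) to (1, 1); a single threshold gives one confusion matrix, which is the form in which a binary group assignment can be compared directly with the legacy editing flag.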

Poster show times:

Room Start Date End Date
Foyer, Salao Nobre & tent Thu, Sep 27 2018, 18:00 Thu, Sep 27 2018, 20:00
Foyer, Salao Nobre & tent Fri, Sep 28 2018, 14:00 Fri, Sep 28 2018, 15:00