Concepts underlying the Data management shell
The aim of the data management shell (DMS) is to provide a common library
for the interaction between the geophysical model and the user or
other models. This library will be common to all models of the PRISM
system. It will also manage the meta-data linked to the flow of
information in and out of the geophysical models. In effect, this shell
is the layer which separates the numerical world, in which all fields
of the model are defined, from the physical world in which the user
wants to use them.
The data shell will belong to the model and will enable it to
read user input, read from and write to disk, and communicate with other
models. Thus, models which use the data shell will be able to run
stand-alone and will not require any other components of the PRISM
system. This close link between the individual geophysical models and
the data shell is also needed for performance reasons.
The use of a common data management shell will standardize the
interfaces of the models used in PRISM. This is an essential step to
enable the coupling of the various components and produce a usable
earth-system model.
This document details the concepts behind the data management
shell and will serve as the basis for the detailed design.
1) The task of the data management shell
Metadata is the information that provides a definitive description of
what the data in each variable represents, and of the spatial and
temporal properties of the data. This enables applications and users
of data to decide which quantities are needed and how they should be used.
In a geophysical model variables are defined on space/time grids which
are internal to the model. Unless one knows the model it is not
possible to reconstruct the meaning of the data without the
metadata. Thus an essential step in facilitating the coupling of
models is to ensure that all data exchanged is properly described.
It will be the role of the DMS to ensure that all variables which
leave the model are accompanied by a set of meta-data. This is
obviously essential for the coupling of models, but also for the
variables written to disk, as within PRISM it is expected that users
will run models which were not developed in their own institution.
Most of the configuration of a model performed by a user deals with
the input and output of the model. The user decides what ancillary data
should be read, with which other models the model should interact, and
prescribes a list of fields and frequencies to be written to disk for
diagnostic purposes. It is thus logical for the DMS to deal with the
configuration file and pass on to the model all the internal
configuration parameters.
It is thus an essential step in the coupling of models and the usage
of coupled models to keep and manage the data and its metadata as one
entity.
2) Interactions with the user
The user will specify to the data management shell, through an ASCII
configuration file, what should be done with the data the model
outputs. This configuration file is the result of a decision by the
user based on information about which variables can be provided by the
model and which ones are needed. It is thus essential to keep a tight
feedback loop between the variables the model can generate and the
variables required outside of the model. At this point the user
needs visibility of the other components of the coupled system so
that requests can be matched to corresponding output information.
As the DMS will also communicate the universal constants and
parameters to the model, they will need either to be provided in the
configuration file or a mechanism for obtaining them needs to be
described. This will ensure that all models using the DMS use a
consistent set of universal constants.
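As a purely illustrative sketch of how such an ASCII configuration
file could look and be read by the DMS: the real format is still to be
defined, and all keywords and variable names below are hypothetical.

```python
# Illustrative sketch only: the configuration file format for the DMS
# has not yet been defined.  All keywords below are hypothetical.

SAMPLE_CONFIG = """
# Variables the user wants written to disk, with output frequency
output  temperature  1d  ave
output  zonal_wind   6h  inst
# Universal constants passed to the model through the DMS
constant  gravity        9.80665
constant  earth_radius   6371000.0
"""

def parse_config(text):
    """Parse the hypothetical ASCII configuration into output requests
    and universal constants."""
    outputs, constants = [], {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if fields[0] == "output":
            name, freq, op = fields[1:4]
            outputs.append({"name": name, "frequency": freq, "operation": op})
        elif fields[0] == "constant":
            constants[fields[1]] = float(fields[2])
    return outputs, constants

outputs, constants = parse_config(SAMPLE_CONFIG)
```

Such a file covers in one place both the user's output requests and
the constants the DMS passes on to the model.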
3) Interactions with other models
The DMS will perform temporal operations on the fly and scale the
variables as needed before they are sent to the coupler. It is then
the coupler's task to perform spatial interpolations on the fields
before passing them on to the listening model.
The coupler will also obtain all the meta-data for the grids and
fields directly from the DMS. It is thus the DMS's task to ensure that
a complete description is available for the model.
4) Input/output from files
The DMS will handle the input and output to files for the diagnostics,
the restart capability and the ancillary variables of the model. This
exchange will allow for a number of file formats. The format for which
the original developments will be made will be netCDF with the CF
convention, and input will be requested from the community to add
support for other formats.
In order to be able to write to file the variables as requested by
the user, the fields provided by the model will need to be
transformed. The same is true when input to the model is
required. These transformations will operate on spatial grid points,
thus they will not include spatial interpolations. The DMS will,
however, be able to perform temporal operations.
5) Meta-data which will be managed by the data shell
The data management shell will receive the description of the
geophysical model's grid and the variables it can output directly from
the model. It will also receive further information directly from
the user through the configuration file. This information will be stored
internally so that all data which leaves the model can be described
according to the standard of the CF convention. It is also needed to
identify and treat input data properly. This meta-data is key for
performing the translation of the data
from the numerical space of the model to the physical space the user
wishes to see and vice-versa.
The meta-data managed by the data shell will cover two major topics:
5.1) Description of grids
The different ways to describe grids need to be chosen, and it
must be ensured that they are coherent with the CF
convention.
For simple grids, the possibility will be available to provide
only a minimum of information. It will then be completed by
the data management shell.
5.2) Description of variables
It remains to be established what the CF convention requires to
describe a variable, and whether this will be sufficient.
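For reference, the CF convention describes a variable through
attributes such as units, standard_name and long_name. The following
sketch holds such a description in a plain dictionary rather than in an
actual netCDF file; the variable and its values are invented examples.

```python
# Minimal sketch of the CF attributes which describe one variable.
# The attribute names (units, standard_name, ...) come from the CF
# convention; the variable "tas" and its values are an invented example.
cf_description = {
    "tas": {
        "units": "K",
        "standard_name": "air_temperature",
        "long_name": "Near-surface air temperature",
        "coordinates": "lon lat",
        "_FillValue": 1.0e20,
    }
}

def is_complete(var_attrs, required=("units", "standard_name")):
    """Check that the minimal CF attributes are present."""
    return all(key in var_attrs for key in required)
```

A check of this kind would let the DMS refuse variables whose
description is too incomplete to be written in a CF-compliant way.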
6) Implementation aspects of the exchanges
6.1) Exchanges with the geophysical model
During the initialization phase of the model, it will need
to provide all the meta-data to the DMS and will receive in
return handles for the various elements. During the time
loop, only very simple "query", "put" and "get" functions
will then be needed.
6.2) Exchanges with the user
The model will offer a large choice of variables it can
output, and it will be for the user to select the ones they
wish to save in files or pass on to other models. But
typical configurations will be provided, and the user will
then be able to adapt them to their scientific needs.
The list of possible outputs from the model will be
generated by a parser which will rely on comments (in a
format to be specified) just before the calls to the DMS
subroutines. A similar procedure will be applied to the list
of required variables but in this case special care has to
be taken with data only required in certain conditions.
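As a purely illustrative sketch of the parser idea, assuming an
invented "!DMS" comment format placed just before each call to a DMS
subroutine (the real comment format is still to be specified):

```python
# Purely illustrative sketch of the parser idea: scan the model source
# for a structured comment placed just before each DMS call and collect
# the variables the model can output.  The "!DMS" comment syntax and
# the sample source are invented placeholders.
import re

SAMPLE_SOURCE = """
!DMS output: tas units=K
CALL dms_put(h_tas, tas)
!DMS output: pr units=kg.m-2.s-1
CALL dms_put(h_pr, pr)
"""

def possible_outputs(source):
    """Return the list of variables the model declares it can output."""
    pattern = re.compile(r"!DMS output:\s*(\w+)\s+units=(\S+)")
    return [{"name": m.group(1), "units": m.group(2)}
            for m in pattern.finditer(source)]
```

The same scan, with a second comment keyword, could produce the list
of required input variables, though conditionally required data would
need extra care as noted above.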
The process which allows the user to choose should be
implemented with a GUI and should be the responsibility of
work-package 4b. But as the user/DMS interactions will be
based on ASCII files, in a first phase these files can be
hand-generated.
A format needs to be defined for the configuration files
which will be read by the DMS.
6.3) Exchange with other models
(The important points which need to be respected for the
exchange with other models remain to be specified.)
6.4) Writing and reading from files
A number of formats will be allowed, as long as they
respect a set of criteria on the associated meta-data.
7) Internal implementation issues
7.1) Internal structure of the DMS
7.2) The transformation library
The transformation library included in the DMS will cover
operations which do not change the spatial grid as well as
various temporal operations. The following list of
operations is proposed:
- Time operations which need to be supported:
- ave : time average of the field
- inst : instantaneous values of the field
- t_min : the minimum value over the time period is produced
- t_max : the maximum value over the time period is produced
- l_min : the minimum value over the entire simulation is produced
(without time dimension)
- l_max : the maximum value over the entire simulation is produced
(without time dimension)
- t_sum : sums the variable over time
- once : the field is processed only once and thus has no time axis
- never : the field is never treated
- Point operations:
- sin : sine
- cos : cosine
- tan : tangent
- asin : arc-sine
- acos : arc-cosine
- atan : arc-tangent
- exp : exponential
- log : logarithm
- sqrt : square root
- chs : change sign
- abs : absolute value
- cels : transforms the field into degrees Celsius
- kelv : transforms the field into Kelvin
- deg : puts a field into degrees (from radians for instance)
- rad : puts a field into radians
- ident : identity, does nothing
- Field-scalar operations:
- + : addition
- - : subtraction
- * : multiplication
- / : division
- ^ : exponentiation
- min : (min(x,scal)) Minimum between field x and scal
- max : (max(x,scal)) Maximum between field x and scal
- Allowed indexing operations:
- gather : gathers from the model data all the
points listed in index. Other points on the grid are
going to be labeled as missing.
- scatter : scatters the model data onto the
points which are listed in the index. Other points are
going to be labeled as missing. This is used for instance
to put a variable only available over oceans points onto
a global grid.
- coll : The same function as gather but the
points which are not indexed are not altered.
- fill : Same function as scatter except that
the points which are not indexed are left untouched
- only : Only the points listed in index have
meaningful data and the others are changed to
missing. This is useful for masking.
- undef : The points listed in index are set to
undefined
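The indexing operations above can be sketched as follows, assuming
fields are plain lists and missing points are marked with a
conventional missing value. This is an illustration of the semantics,
not the actual DMS implementation.

```python
# Sketch of the indexing operations of the transformation library.
# Fields are plain Python lists; MISSING is an assumed missing-value
# convention, not something fixed by the DMS design.
MISSING = 1.0e20

def gather(field, index):
    """Keep only the indexed points, in index order."""
    return [field[i] for i in index]

def scatter(values, index, size):
    """Place values at the indexed points of a grid of `size` points;
    all other points are labeled as missing (e.g. ocean-only data put
    onto a global grid)."""
    out = [MISSING] * size
    for val, i in zip(values, index):
        out[i] = val
    return out

def fill(field, values, index):
    """Same as scatter, but points which are not indexed are left
    untouched."""
    out = list(field)
    for val, i in zip(values, index):
        out[i] = val
    return out

def only(field, index):
    """Indexed points keep their data, all others become missing.
    Useful for masking."""
    keep = set(index)
    return [v if i in keep else MISSING for i, v in enumerate(field)]
```

The pairs gather/coll and scatter/fill differ only in whether the
non-indexed points are set to missing or left alone, which the sketch
makes explicit.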
7.3) Dealing with sub-domain decomposition
Jan Polcher
First version: Sat Jan 12 23:56:25 CET 2002
The text becomes readable : Sun Jan 20 23:29:29 CET 2002
Last modified: Sun Jan 20 23:32:46 CET 2002