Concepts underlying the Data management shell
The aim of the data management shell (DMS) is to provide a common library
for the interaction between the geophysical model and the user or
other models. This library will be common to all models of the PRISM
system. It will also manage the meta-data linked to the flow of
information in and out of the geophysical models. In effect, this shell
is the layer which separates the numerical world, in which all fields
of the model are defined, from the physical world in which the user
wants to use them.
The data shell will belong to the model and will enable it to
read user input, read from and write to disk, and communicate with other
models. Thus, models which use the data shell will be able to run
stand-alone and will not require any other components of the PRISM
system. This close link between the individual geophysical models and
the data shell is also needed for performance reasons.
The use of a common data management shell will standardize the
interfaces of the models used in PRISM. This is an essential step to
enable the coupling of the various components and produce a usable
earth-system model.
This document details the concepts behind the data management
shell and will serve as the basis for the detailed design.
1) The task of the data management shell
Metadata is the information that provides a definitive description of
what the data in each variable represents, and of the spatial and
temporal properties of the data. This enables applications and users
of data to decide which quantities are needed and how they should be used.
In a geophysical model variables are defined on space/time grids which
are internal to the model. Unless one knows the model it is not
possible to reconstruct the meaning of the data without the
metadata. Thus an essential step in facilitating the coupling of
models is to ensure that all data exchanged is properly described.
It will be the role of the DMS to ensure that all variables which
leave the model are accompanied by a set of meta-data. This is
obviously essential for the coupling of models, but also for the
variables written to disk, as within PRISM it is expected that users
will run models which were not developed in their own institution.
Most of the configuration of a model performed by a user deals with
the input and output of the model. The user decides what ancillary data
should be read, with which other models the model should interact, and
prescribes a list of fields and frequencies to be written to disk for
diagnostic purposes. It is thus logical for the DMS to deal with the
configuration file and pass on to the model all the internal
configuration parameters.
It is thus an essential step in the coupling of models and the usage
of coupled models to keep and manage the data and its metadata as one
entity.
2) Interactions with the user
The user will specify to the data management shell, through an ASCII
configuration file, what should be done with the data the model
outputs. This configuration file is the result of a decision by the
user based on information about which variables can be provided by the
model and which ones are needed. It is thus essential to keep a tight
feedback loop between the variables the model can generate and the
variables required outside of the model. At this point the user
needs visibility of the other components of the coupled system so
that requests can be matched to corresponding output information.
As the DMS will also communicate the universal constants and
parameters to the model, they will need either to be provided in the
configuration file or a mechanism for obtaining them needs to be
described. This will ensure that all models using the DMS use a
consistent set of universal constants.
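As a purely illustrative sketch of how such an ASCII configuration
file could look and be read by the DMS: the real format is still to be
defined, and all keywords and variable names below are hypothetical.

```python
# Illustrative sketch only: the configuration file format for the DMS
# has not yet been defined.  All keywords below are hypothetical.

SAMPLE_CONFIG = """
# Variables the user wants written to disk, with output frequency
output  temperature  1d  ave
output  zonal_wind   6h  inst
# Universal constants passed to the model through the DMS
constant  gravity        9.80665
constant  earth_radius   6371000.0
"""

def parse_config(text):
    """Parse the hypothetical ASCII configuration into output requests
    and universal constants."""
    outputs, constants = [], {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if fields[0] == "output":
            name, freq, op = fields[1:4]
            outputs.append({"name": name, "frequency": freq, "operation": op})
        elif fields[0] == "constant":
            constants[fields[1]] = float(fields[2])
    return outputs, constants

outputs, constants = parse_config(SAMPLE_CONFIG)
```

Such a file covers in one place both the user's output requests and
the constants the DMS passes on to the model.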
3) Interactions with other models
The DMS will perform temporal operations on the fly and scale the
variables as needed before they are sent to the coupler. It is then
the coupler's task to perform spatial interpolations on the fields
before passing them on to the listening model.
The coupler will also obtain all the meta-data for the grids and
fields directly from the DMS. It is thus the DMS's task to ensure that
a complete description is available for the model.
4) Input/output from files
The DMS will handle the input and output to files for the diagnostics,
the restart capability and the ancillary variables of the model. This
exchange will allow for a number of file formats. The format for which
the original developments will be made will be netCDF with the CF
convention, and input will be requested from the community to add
support for other formats.
In order to be able to write to file the variables as requested by
the user, the fields provided by the model will need to be
transformed. The same is true when input to the model is
required. These transformations will operate on spatial grid points,
thus they will not include spatial interpolations. The DMS will,
however, be able to perform temporal operations.
5) Meta-data which will be managed by the data shell
The data management shell will receive the description of the
geophysical model's grid and the variables it can output directly from
the model. It will also receive further information directly from
the user through the configuration file. This information will be stored
internally so that all data which leaves the model can be described
according to the standard of the CF convention. It is also needed to
identify and treat input data properly. This meta-data is key for
performing the translation of the data
from the numerical space of the model to the physical space the user
wishes to see and vice-versa.
The meta-data managed by the data shell will cover two major topics:
5.1) Description of grids
The different ways to describe grids need to be chosen, and it
must be ensured that they are coherent with the CF
convention.
For simple grids, the possibility will be available to provide
only a minimum of information. It will then be completed by
the data management shell.
5.2) Description of variables
It remains to be established what the CF convention requires to
describe a variable, and whether this will be sufficient.
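For reference, the CF convention describes a variable through
attributes such as units, standard_name and long_name. The following
sketch holds such a description in a plain dictionary rather than in an
actual netCDF file; the variable and its values are invented examples.

```python
# Minimal sketch of the CF attributes which describe one variable.
# The attribute names (units, standard_name, ...) come from the CF
# convention; the variable "tas" and its values are an invented example.
cf_description = {
    "tas": {
        "units": "K",
        "standard_name": "air_temperature",
        "long_name": "Near-surface air temperature",
        "coordinates": "lon lat",
        "_FillValue": 1.0e20,
    }
}

def is_complete(var_attrs, required=("units", "standard_name")):
    """Check that the minimal CF attributes are present."""
    return all(key in var_attrs for key in required)
```

A check of this kind would let the DMS refuse variables whose
description is too incomplete to be written in a CF-compliant way.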
6) Implementation aspects of the exchanges
6.1) Exchanges with the geophysical model
During the initialization phase of the model, it will need
to provide all the meta-data to the DMS and will receive in
return handles for the various elements. During the time
loop, only very simple "query", "put" and "get" functions
will then be needed.
6.2) Exchanges with the user
The model will offer a large choice of variables it can
output, and it will be for the user to select the ones they
wish to save in files or pass on to other models. But
typical configurations will be provided, and the user will
then be able to adapt them to their scientific needs.
The list of possible outputs from the model will be
generated by a parser which will rely on comments (in a
format to be specified) just before the calls to the DMS
subroutines. A similar procedure will be applied to the list
of required variables but in this case special care has to
be taken with data only required in certain conditions.
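As a purely illustrative sketch of the parser idea, assuming an
invented "!DMS" comment format placed just before each call to a DMS
subroutine (the real comment format is still to be specified):

```python
# Purely illustrative sketch of the parser idea: scan the model source
# for a structured comment placed just before each DMS call and collect
# the variables the model can output.  The "!DMS" comment syntax and
# the sample source are invented placeholders.
import re

SAMPLE_SOURCE = """
!DMS output: tas units=K
CALL dms_put(h_tas, tas)
!DMS output: pr units=kg.m-2.s-1
CALL dms_put(h_pr, pr)
"""

def possible_outputs(source):
    """Return the list of variables the model declares it can output."""
    pattern = re.compile(r"!DMS output:\s*(\w+)\s+units=(\S+)")
    return [{"name": m.group(1), "units": m.group(2)}
            for m in pattern.finditer(source)]
```

The same scan, with a second comment keyword, could produce the list
of required input variables, though conditionally required data would
need extra care as noted above.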
The process which allows the user to choose should be
implemented with a GUI and should be the responsibility of
work-package 4b. But as the user/DMS interactions will be
based on ASCII files, in a first phase these files can be
hand-generated.
A format needs to be defined for the configuration files
which will be read by the DMS.
6.3) Exchange with other models
(The important points which need to be respected for the
exchange with other models remain to be specified.)
6.4) Writing and reading from files
A number of formats will be allowed, as long as they
respect a set of criteria on the associated meta-data.
7) Internal implementation issues
7.1) Internal structure of the DMS
7.2) The transformation library
The transformation library included in the DMS will cover
operations which do not change the spatial grid as well as
various temporal operations. The following list of
operations is proposed:
- Time operations which need to be supported:
- ave : time average of the field
- inst : instantaneous values of the field
- t_min : the minimum value over the time period is produced
- t_max : the maximum value over the time period is produced
- l_min : the minimum value over the entire simulation is produced
(without time dimension)
- l_max : the maximum value over the entire simulation is produced
(without time dimension)
- t_sum : sums the variable over time
- once : the field is processed only once and thus has no time axis
- never : the field is never treated
- Point operations:
- sin : sine
- cos : cosine
- tan : tangent
- asin : arc-sine
- acos : arc-cosine
- atan : arc-tangent
- exp : exponential
- log : logarithm
- sqrt : square root
- chs : change sign
- abs : absolute value
- cels : transforms the field into degrees Celsius
- kelv : transforms the field into Kelvin
- deg : puts a field into degrees (from radians for instance)
- rad : puts a field into radians
- ident : identity, does nothing
- Field-scalar operations:
- + : addition
- - : subtraction
- * : multiplication
- / : division
- ^ : exponentiation
- min : (min(x,scal)) Minimum between field x and scal
- max : (max(x,scal)) Maximum between field x and scal
- Allowed indexing operations:
- gather : gathers from the model data all the
points listed in index. Other points on the grid are
going to be labeled as missing.
- scatter : scatters the model data onto the
points which are listed in the index. Other points are
going to be labeled as missing. This is used for instance
to put a variable only available over oceans points onto
a global grid.
- coll : The same function as gather but the
points which are not indexed are not altered.
- fill : Same function as scatter except that
the points which are not indexed are left untouched
- only : Only the points listed in index have
meaningful data and the others are changed to
missing. This is useful for masking.
- undef : The points listed in index are set to
undefined
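The indexing operations above can be sketched as follows, assuming
fields are plain lists and missing points are marked with a
conventional missing value. This is an illustration of the semantics,
not the actual DMS implementation.

```python
# Sketch of the indexing operations of the transformation library.
# Fields are plain Python lists; MISSING is an assumed missing-value
# convention, not something fixed by the DMS design.
MISSING = 1.0e20

def gather(field, index):
    """Keep only the indexed points, in index order."""
    return [field[i] for i in index]

def scatter(values, index, size):
    """Place values at the indexed points of a grid of `size` points;
    all other points are labeled as missing (e.g. ocean-only data put
    onto a global grid)."""
    out = [MISSING] * size
    for val, i in zip(values, index):
        out[i] = val
    return out

def fill(field, values, index):
    """Same as scatter, but points which are not indexed are left
    untouched."""
    out = list(field)
    for val, i in zip(values, index):
        out[i] = val
    return out

def only(field, index):
    """Indexed points keep their data, all others become missing.
    Useful for masking."""
    keep = set(index)
    return [v if i in keep else MISSING for i, v in enumerate(field)]
```

The pairs gather/coll and scatter/fill differ only in whether the
non-indexed points are set to missing or left alone, which the sketch
makes explicit.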
7.3) Dealing with sub-domain decomposition
Jan Polcher
First version: Sat Jan 12 23:56:25 CET 2002
The text becomes readable : Sun Jan 20 23:29:29 CET 2002
Last modified: Sun Jan 20 23:32:46 CET 2002