Storing data and meta-data
The data format choices described here for ALMA are very similar to
those made by AMIP for collecting data from GCMs. This ensures that the
proposed data format is applicable to off-line land-surface simulations
as well as to coupled experiments, and will thus be suited to all
activities of the GLASS project.
Information needed to describe the data.
To ensure that data can be exchanged easily, the files should contain
not only the values but also a full description of the variables. This
description is called the meta-data. Such self-descriptive files can be
exchanged and used with little extra information. To achieve this, a
thorough analysis has to be made of the information required to fully
describe geophysical data. Such work was performed for modelling
applications by groups at the Hadley Centre and at NCAR, and resulted
in two meta-data conventions which are in the process of being merged.
For ALMA it is proposed to adopt this new convention, which will build
on the one developed by the Hadley Centre and known as GDT. For
land-surface schemes, the two most important features it offers are
the ability to store non-rectilinear grids and to compress data by
gathering. These make it possible to use irregular grids and to avoid
storing ocean points when a longitude-latitude grid is used. Neither
feature was present in older conventions such as the COARDS
convention.
Once the meta-data convention is chosen, an appropriate numerical
format needs to be found in which all of it can be stored in a file.
The only condition such a format has to fulfil is that it can hold all
of the meta-data unambiguously. If this is the case, the numerical
format can be changed as computer technology evolves, since programs
can be written which transfer all the information from the old files
into the new ones without the intervention of an operator. Thus, if the
right specification for the meta-data is chosen, the choice of data
format can be guided by convenience for the users.
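To illustrate why an unambiguous meta-data specification makes the storage format interchangeable, the Python sketch below holds the meta-data of a file in plain data structures and converts it to JSON and back without loss. JSON merely stands in for "some other format" and is not a proposal for ALMA; the classes `Variable` and `Dataset`, the variable `Tair` and all attribute values are hypothetical examples.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Variable:
    # Hypothetical in-memory description of one variable.
    dims: list[str]
    attrs: dict[str, object] = field(default_factory=dict)


@dataclass
class Dataset:
    # Hypothetical in-memory description of a whole file.
    dims: dict[str, int]
    variables: dict[str, Variable]
    global_attrs: dict[str, object] = field(default_factory=dict)


def to_json(ds: Dataset) -> str:
    """Write every piece of meta-data to the stand-in 'new' format."""
    return json.dumps(asdict(ds), sort_keys=True)


def from_json(text: str) -> Dataset:
    """Read it back; no operator input is needed because nothing is implicit."""
    raw = json.loads(text)
    variables = {name: Variable(**v) for name, v in raw["variables"].items()}
    return Dataset(raw["dims"], variables, raw["global_attrs"])


old = Dataset(
    dims={"time": 24, "y": 10, "x": 20},
    variables={
        "Tair": Variable(["time", "y", "x"],
                         {"units": "K", "long_name": "Near surface air temperature"}),
    },
    global_attrs={"Conventions": "GDT"},  # placeholder value
)
new = from_json(to_json(old))
assert new == old  # nothing was lost in the conversion
```

The same check (convert, read back, compare) is what an automatic converter between two real formats would have to pass.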
Format used to store the data.
netCDF is a data format which can store all the meta-information in
the same file as the data. It is a binary format which is machine
independent, and the software needed to store or retrieve data runs on
a wide range of platforms. In other words, the data is stored in a
compact form and a file generated on one machine can be read on any
other computer. The netCDF format is in the public domain and is
maintained by UNIDATA. It is therefore widely distributed in the
geophysical community and its maintenance is assured. Furthermore, a
large fraction of the data analysis and graphics software used in the
geophysical community can read or write netCDF files.
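Machine independence comes from fixing the byte layout in the file rather than following the layout of the machine that writes it. The sketch below shows two facts about the classic netCDF format: a file starts with the bytes `CDF` followed by a version byte (1 for classic, 2 for the 64-bit-offset variant), and numbers are stored in big-endian (XDR) order on every platform. The helper names are illustrative only; real programs should use the netCDF library itself.

```python
import struct
from typing import Optional

# Magic numbers at the start of a netCDF file: b"CDF" + version byte.
CLASSIC_MAGIC = {b"CDF\x01": "classic", b"CDF\x02": "64-bit offset"}


def netcdf_variant(header: bytes) -> Optional[str]:
    """Return the netCDF variant named by the first four bytes, or None."""
    return CLASSIC_MAGIC.get(header[:4])


def encode_floats(values: list[float]) -> bytes:
    """Pack 32-bit floats in big-endian (XDR) order, independent of the host."""
    return struct.pack(f">{len(values)}f", *values)


def decode_floats(raw: bytes) -> list[float]:
    """Unpack big-endian 32-bit floats; the result is the same on any machine."""
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


raw = encode_floats([280.5])
print(raw.hex())  # 438c4000
```

Because the byte order is part of the format, a reader on a little-endian machine decodes `438c4000` to 280.5 exactly as a big-endian machine does.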
The following information can be stored along with the data in a netCDF file:
- The dimensions of the various axes used in the file;
- The spatial and temporal coordinates which locate the stored data;
- The convention used for the meta-data, the sign convention, the time
when the file was generated and the model which produced the data.
The following information can be stored for each variable:
- Size and rank of the variable;
- Name of the variable;
- A title which better describes the variable;
- Units of the variable;
- The flag used to signal missing values;
- The coordinates corresponding to each of the axes.
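As a sketch of how the file-level and variable-level items listed above fit together, the Python code below renders them in a layout resembling the output of `ncdump -h`, and derives the rank and size of a variable from its dimensions. The attribute names `long_name`, `units` and `missing_value` follow common netCDF usage; the variable `Tair`, the dimension sizes and the global attributes are hypothetical, and the number formatting only approximates what `ncdump` prints.

```python
from math import prod

# Hypothetical meta-data for one forcing variable.
DIMS = {"time": 24, "y": 10, "x": 20}
VARIABLES = {
    "Tair": ("float", ["time", "y", "x"], {
        "long_name": "Near surface air temperature",  # the descriptive title
        "units": "K",
        "missing_value": 1.0e20,                      # flag for missing values
    }),
}
GLOBAL_ATTRS = {"Conventions": "GDT", "model": "hypothetical LSM"}


def fmt(value) -> str:
    """Quote text attributes and write numbers unchanged."""
    return f'"{value}"' if isinstance(value, str) else repr(value)


def rank_and_size(var_dims, dims):
    """Rank = number of axes; size = product of their lengths."""
    return len(var_dims), prod(dims[d] for d in var_dims)


def cdl_header(name, dims, variables, global_attrs) -> str:
    """Render the meta-data in a layout resembling `ncdump -h` output."""
    lines = [f"netcdf {name} {{", "dimensions:"]
    lines += [f"\t{d} = {n} ;" for d, n in dims.items()]
    lines.append("variables:")
    for var, (vtype, var_dims, attrs) in variables.items():
        lines.append(f"\t{vtype} {var}({', '.join(var_dims)}) ;")
        lines += [f"\t\t{var}:{k} = {fmt(v)} ;" for k, v in attrs.items()]
    lines += ["", "// global attributes:"]
    lines += [f"\t\t:{k} = {fmt(v)} ;" for k, v in global_attrs.items()]
    lines.append("}")
    return "\n".join(lines)


print(cdl_header("forcing", DIMS, VARIABLES, GLOBAL_ATTRS))
print(rank_and_size(VARIABLES["Tair"][1], DIMS))  # (3, 4800)
```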
Header information extracted with the "ncdump -h" command from some
netCDF files containing off-line forcing data for land-surface schemes
is presented for the following cases:
- A data set of forcing variables on a
rectilinear grid. In this case the geographical coordinates of
data points can be stored in vectors. Thus, the dimensions and
coordinates of the variables have the same rank and can be given the
same names.
- A file containing data on a more
general grid. This time the dimensions and the coordinates of the
variables have different ranks and therefore different names. In this
case each data variable needs an attribute (associate) which specifies
the variable to be used to project the dimensions onto the
geographical coordinates.
- An example of an ALMA forcing file
for data compressed by gathering. For each variable the spatial
dimensions are reduced to a vector of land points. The file contains
an extra variable, of type integer, which gives the position of each
land point on the full grid. This variable (land) is characterized by
an attribute describing the dimensions along which the compression was
performed: compress = "y x". Each variable indicates the compressed
coordinates by putting them in brackets in its "associate" attribute.
Thus, the "compress" attribute makes it possible to reconstruct the
rank of the full variable, and the "associate" attribute leads to the
projection needed to put the data in their geographical context.
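Compression by gathering can be sketched in a few lines of Python. The sketch assumes that each value of the land variable is the position of a point in the full grid flattened with the last dimension named in `compress` varying fastest. The text above does not say whether positions start at 0 or 1 (a file written from Fortran may count from 1), so `index_base` is a parameter; the default missing value of 1.0e20 is also an assumption. The function names are hypothetical.

```python
from math import prod


def gather(full_flat, is_land, index_base=0):
    """Keep only land points; return their values and their grid positions.

    Assumption: positions count cells of the flattened grid, starting at index_base.
    """
    land = [k + index_base for k, keep in enumerate(is_land) if keep]
    values = [full_flat[k - index_base] for k in land]
    return values, land


def nest(flat, shape):
    """Turn a flat list into nested lists, last dimension varying fastest."""
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])
    return [nest(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def scatter(values, land, compress, dim_sizes, missing=1.0e20, index_base=0):
    """Rebuild the full field from gathered values using the `compress` attribute."""
    shape = [dim_sizes[name] for name in compress.split()]  # "y x" -> [ny, nx]
    full = [missing] * prod(shape)  # points not listed (e.g. ocean) stay missing
    for value, k in zip(values, land):
        full[k - index_base] = value
    return nest(full, shape)


# A 2 x 3 grid in which three points are ocean.
full_field = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
mask = [False, True, True, False, False, True]
vals, land = gather(full_field, mask)
print(vals, land)  # [1.0, 2.0, 5.0] [1, 2, 5]
print(scatter(vals, land, "y x", {"y": 2, "x": 3}, missing=-1.0))
# [[-1.0, 1.0, 2.0], [-1.0, -1.0, 5.0]]
```

Only the land values and one integer per land point are stored; the `compress` attribute supplies the shape needed to restore the full grid.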
Software freely available to write and analyze data in the netCDF format.
A long list of software packages which can read or write netCDF files
is available on the UNIDATA web site. Below we highlight the
public-domain packages which will be particularly useful for working
with the data exchanged within GLASS (additions to this list are
welcome).
- Data management software:
- PCMDI developed a Python-based software package for managing the
AMIP data set (which is stored in netCDF). The CDAT and CDMS software
allows the user to read and write the data and meta-data of
geophysical variables and to manipulate the variables in Python. The
content of the netCDF files is made accessible as Python objects.
- NCAR provides the NCO package, a set of commands which manipulate
netCDF files and the data within them. These tools are very useful for
renaming variables and for extracting variables, or sections of
variables, from files. They can even be used to interpolate the data.
- Graphics:
- FERRET is a graphics program which works well with netCDF files. It
can perform some basic operations on the files and correctly display
fields on irregular grids as well as time series.
- GRADS also works, but its support for self-describing files such as
netCDF is rather poor: the netCDF files need to be simplified before
they can be used with it.
- I/O libraries:
- The IOIPSL library provides simple interfaces for writing and reading
data in the netCDF format from a running model. It is used at IPSL for
the history and restart files of the atmosphere, ocean and
land-surface models.
- EZGET
has been developed at PCMDI to facilitate retrieval of modeled and
observed climate data stored in netCDF.
Last modified: Mon May 15 22:32:20 WEST 2000