PMIP 2 Database Structure


 

Choosing the access type

The database home or root (referred to as DB_ROOT below) depends on how you access the data (locally at LSCE, or remotely), but the file hierarchy is the same in both cases.

File hierarchy

The data files of the PMIP 2 database can be found in the following straightforward directory hierarchy:

DB_ROOT/experiment/vartype/frequency/variable/
  variable_table_experiment_modelorlab[_time-range].nc
  CTRL/variable_table_experiment_modelorlab[_time-range].ctrl


Note that, unlike in the IPCC database, data files from different models can sit in the same directory, because there are no ensemble runs and the data file names are unique.

experiment is the standard experiment identifier:

Experiment                                                         Identifier
Pre-industrial Ocean-Atmosphere PMIP2 experiment                   pmip2_0k_oa
Pre-industrial Ocean-Atmosphere-Vegetation PMIP2 experiment        pmip2_0k_oav
Mid-Holocene 6k Ocean-Atmosphere PMIP2 experiment                  pmip2_6k_oa
Mid-Holocene 6k Ocean-Atmosphere-Vegetation PMIP2 experiment       pmip2_6k_oav
Last Glacial Maximum Ocean-Atmosphere PMIP2 experiment             pmip2_21k_oa
Last Glacial Maximum Ocean-Atmosphere-Vegetation PMIP2 experiment  pmip2_21k_oav

vartype is the variable type (or submodel) identifier:

Variable type                Identifier
Atmosphere                   atm
Ocean                        ocn
Land surface and vegetation  land
Ice                          ice

frequency is a requested frequency identifier:

Frequency                   Identifier
Fixed (no time dependency)  fixed
Mean seasonal cycle (SE)    se
Yearly average              an
Monthly                     mo
Daily                       da

variable is a standard output variable name.
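
The hierarchy and naming scheme above can be sketched in a few lines of Python. This is only an illustration, not the code used to manage the database: the function names are hypothetical, and the "table" value O_MO used in the usage lines is inferred from the dbfile entry of the example control file at the end of this page, since the table component is not described in this section.

```python
import posixpath

# Identifiers copied from the tables above.
EXPERIMENTS = {"pmip2_0k_oa", "pmip2_0k_oav", "pmip2_6k_oa",
               "pmip2_6k_oav", "pmip2_21k_oa", "pmip2_21k_oav"}
VARTYPES = {"atm", "ocn", "land", "ice"}
FREQUENCIES = {"fixed", "se", "an", "mo", "da"}


def data_file_name(variable, table, experiment, model, time_range=None):
    """Build variable_table_experiment_modelorlab[_time-range].nc (hypothetical helper)."""
    parts = [variable, table, experiment, model]
    if time_range:
        parts.append(time_range)  # the time range is optional
    return "_".join(parts) + ".nc"


def data_file_paths(db_root, experiment, vartype, frequency, variable,
                    table, model, time_range=None):
    """Return (data file path, control file path) below db_root."""
    if experiment not in EXPERIMENTS:
        raise ValueError("unknown experiment: %r" % experiment)
    if vartype not in VARTYPES:
        raise ValueError("unknown variable type: %r" % vartype)
    if frequency not in FREQUENCIES:
        raise ValueError("unknown frequency: %r" % frequency)
    leaf = posixpath.join(db_root, experiment, vartype, frequency, variable)
    name = data_file_name(variable, table, experiment, model, time_range)
    ctrl_name = name[:-len(".nc")] + ".ctrl"  # same base name, .ctrl suffix
    return posixpath.join(leaf, name), posixpath.join(leaf, "CTRL", ctrl_name)


if __name__ == "__main__":
    # Values taken from the example control file on this page;
    # "O_MO" as the table component is an assumption.
    data, ctrl = data_file_paths("/home/motifdb/pmip2db", "pmip2_0k_oa",
                                 "ocn", "mo", "tos", "O_MO", "CCSM",
                                 "0390-0399")
    print(data)
    print(ctrl)
```

Replace the first argument with your own DB_ROOT if you access the database remotely.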

The control files

The PMIP 2 database is stored in a directory hierarchy where the leaves are directories holding the data files of all the models available for a given experiment/vartype/frequency/variable. Each leaf (variable) directory has a control subdirectory, named CTRL, with a control (or information) file associated with each netCDF data file in the parent directory.

A control file stores summary information and extra metadata about its associated data file. This makes it possible to get basic information about the data file without having to download it and then (nc)dump or process it. The lines of a control file have the following simple format, which makes them easy to update or to read from other applications:

   timestamp   keyword   =   value

The timestamp is in the form YYYYmmDD_HHmm (year, month, day, then hour and minute) and gives the date when the line was added to the control file; the earliest timestamp therefore gives the time when the variable was inserted in the database. The following keywords are (or will be) available. Get in touch with the database administrator if you have ideas for other useful keywords:

Keyword    Meaning
#          Comment
inpath     Input path: location of the file before being inserted in the database
infile     Input file name: original name of the input file as it was sent to LSCE (before the file is inserted in the database)
sizebytes  Size of the file in bytes
date       Date of the input file
datefull   Full date of the input file (as returned by the Python call os.path.getmtime(input_file))
md5sum     MD5 message digest or checksum: for practical purposes, a unique fingerprint of the file's contents.

           If you have downloaded a file from the database and want to know whether the transfer went well, or if you have a file with the same name as a file in the database and want to be sure it is the very same file, all you have to do is type:

             md5sum your_file

           and compare the result with the md5sum value in the control file. There is no need to download the file a second time.
dbpath     Database path: location of the file in the database
dbfile     Database file name: (possibly new) name of the file after being inserted in the database.

           This name is likely to differ from infile if the input file name did not exactly follow the naming scheme required for the database.

           The files are automatically renamed (if need be) when they are inserted in the database, to ensure that the file names are unique and consistent across the whole database.
vshape
shape      Shape of the variable (as returned by Python/CDAT variable.shape). This is the size of the variable in Python/C array dimension order:

             ([time, [extra dimensions,]] latitude, longitude)
trange     Time range: date/time of the first and last time steps in the file
error      Error message.

           This keyword should normally appear only in control files associated with variables that are not in the database (any more).
fname      New file name if the file has to be renamed (because it is retired or removed, see retirep and removep below).
retirep    Retire path: path where a data file has been moved (before being erased later) because it was replaced by a newer (and hopefully better) version.

           When this happens, we add a timestamp to the names of the old data file and its associated control file, and move both files to the retire directory (/home/motifdb/pmip2db_retire if you have local access to the database).

           We also keep a copy of the old control file in the CTRL directory, to preserve the history.
removep    Remove path: path where a data file has been moved (before being erased later) after being explicitly removed from the database.

           When this happens, we add a timestamp to the names of the removed data file and its associated control file, and move both files to the remove directory (/home/motifdb/pmip2db_remove if you have local access to the database).

           We also keep a copy of the old control file in the CTRL directory, to preserve the history.
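
If the md5sum command is not available on your system, the same check can be done from Python. The following is a minimal sketch using the standard hashlib module; the function names file_md5 and matches_control_md5 are hypothetical and are not part of any PMIP 2 tool.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_control_md5(path, control_md5):
    """Compare a local file with the md5sum value copied from a control file."""
    return file_md5(path) == control_md5.strip().lower()
```

For example, matches_control_md5("your_file", "7901de53a29a3e7b8d3cf9337a49de00") returns True only if your_file has exactly that checksum.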

Example:

20050323_1942      inpath = /home/motifwork/ftp.cea.fr/pmip2_submit/CCSM/20050310/PMIP2_0K_OA/O1
20050323_1942      infile = tos_O1_CCSM_PMIP2_0K_OA_0390-0399.mo.nc
20050323_1942   sizebytes = 60696052
20050323_1942        date = Mar 10 2005 20:35:42 CET
20050323_1942    datefull = (2005, 3, 10, 20, 35, 42, 3, 69, 0)
20050323_1942      md5sum = 7901de53a29a3e7b8d3cf9337a49de00
20050323_1942      dbpath = /home/motifdb/pmip2db/pmip2_0k_oa/ocn/mo/tos
20050323_1942      dbfile = tos_O_MO_pmip2_0k_oa_CCSM_0390-0399.nc
20050323_1942       shape = (120, 395, 320)
20050323_1942      trange = (390-1-16 12:0:0.0, 399-12-16 12:0:0.0)
20050323_1958           # = Control file moved out of the way
20050323_1958     retirep = /home/motifdb/pmip2db_retire/CCSM/pmip2_0k_oa
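
Because every line follows the "timestamp keyword = value" format, a control file like the one above is easy to read from another application. The sketch below shows one possible way to do it in Python; the function names are hypothetical, and the assumption that a later line for a keyword supersedes an earlier one is ours and is not stated by the database documentation.

```python
import ast
import re
from datetime import datetime

# timestamp (YYYYmmDD_HHmm), keyword, "=", value
LINE_RE = re.compile(r"^\s*(\d{8}_\d{4})\s+(\S+)\s*=\s*(.*?)\s*$")

# A few lines copied from the example above.
SAMPLE = """\
20050323_1942      md5sum = 7901de53a29a3e7b8d3cf9337a49de00
20050323_1942      dbfile = tos_O_MO_pmip2_0k_oa_CCSM_0390-0399.nc
20050323_1942       shape = (120, 395, 320)
20050323_1958           # = Control file moved out of the way
20050323_1958     retirep = /home/motifdb/pmip2db_retire/CCSM/pmip2_0k_oa
"""


def parse_control_text(text):
    """Return a list of (datetime, keyword, value) tuples, in file order."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unrecognized lines
        stamp, keyword, value = match.groups()
        entries.append((datetime.strptime(stamp, "%Y%m%d_%H%M"), keyword, value))
    return entries


def insertion_time(entries):
    """The earliest timestamp: when the variable was inserted in the database."""
    return min(stamp for stamp, _, _ in entries)


def latest_values(entries):
    """Map each keyword (comments excluded) to its most recent value (our assumption)."""
    latest = {}
    for _, keyword, value in sorted(entries, key=lambda entry: entry[0]):
        if keyword != "#":
            latest[keyword] = value
    return latest


def parse_shape(value):
    """Turn a shape value such as '(120, 395, 320)' into a tuple of ints."""
    return tuple(int(size) for size in ast.literal_eval(value))


if __name__ == "__main__":
    entries = parse_control_text(SAMPLE)
    values = latest_values(entries)
    print(insertion_time(entries), values["dbfile"], parse_shape(values["shape"]))
```

With the sample lines, the parsed shape has three dimensions, which in the order given for the shape keyword correspond to time, latitude and longitude.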


Last updated 2007/08/10 15:01:46