Submitting Data to the PMIP 2 Database

Preparing the data

Before submitting data to the PMIP 2 database, you should follow the steps outlined in the Experimental Design section:

Decide what standard experiments you want to run.

Note: ONLY the standard experiments and standard variables (saved at their required frequencies and on their required levels) will be stored in the database (at least in the first stages of database feeding)! We still suggest that you closely follow the project's standards for other experiments...
Run your model with boundary conditions following as closely as possible the official boundary conditions for the given experiments.

Note: please document clearly any deviation from the standard boundary conditions!
Process your data files to create data files following strictly the standard output format.

The PMIP 2 database follows closely the standards set up for the IPCC project to make it easier for you to prepare the data files (provided your group is already involved in IPCC).

Do not forget to check...

Make sure your time axis is defined correctly. We suggest you use the following units for the time axis: 'days since 0001-01-01'.

Make sure this is correctly defined: e.g. 0001-01-00 is not correct!
...
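
The time axis check above can be automated before submission. The following is only a minimal sketch: the function name check_time_units is hypothetical, and it assumes the units string has the form 'days since YYYY-MM-DD', optionally followed by a time of day. It only looks at the units string, not at the time values stored in the file.

```python
import re
from typing import Optional

# Assumed form: 'days since YYYY-MM-DD', optionally followed by ' hh:mm:ss'.
_UNITS_RE = re.compile(r"days since (\d{4})-(\d{2})-(\d{2})(?: \d{2}:\d{2}:\d{2})?")


def check_time_units(units: str) -> Optional[str]:
    """Return None if the units string looks valid, else a short problem description."""
    match = _UNITS_RE.fullmatch(units.strip())
    if match is None:
        return "expected the form 'days since YYYY-MM-DD'"
    _year, month, day = (int(group) for group in match.groups())
    if not 1 <= month <= 12:
        return f"month {month:02d} is out of range"
    if not 1 <= day <= 31:
        return f"day {day:02d} is out of range"
    return None


print(check_time_units("days since 0001-01-01"))  # None
print(check_time_units("days since 0001-01-00"))  # day 00 is out of range
```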

Data file names

Please note that UNLIKE IPCC we request that THE DATA FILES HAVE UNIQUE NAMES across the WHOLE database, and use the following data file name template:

    varname_table_experiment_modelorlab[_time-range].nc

table being one of the PMIP 2 CMOR table short names: A_FX, A_SE, A_MO, A_DA, O_SE, O_AN, O_MO, S_SE, S_AN, S_MO, I_SE, I_DA, I_MO

experiment being one of: pmip2_0k_oa, pmip2_0k_oav, pmip2_6k_oa, pmip2_6k_oav, pmip2_21k_oa, pmip2_21k_oav

modelorlab should be composed of uppercase and/or lowercase characters, numbers and/or the '-' (dash) and/or '.' (dot) characters.

If a data file has to be split into several sub-files along the time axis, use either
    _yyyy-YYYY
if the file covers the whole years yyyy to YYYY, or
    _yyyymmdd-YYYYMMDD

There is no convention for year dates to use for the paleo experiments, so you might as well start all the paleo experiments on January 1st, 0001 (i.e. 0001 or 00010101, and the units of the time axis should be 'days since 0001-01-01').
You are of course free to follow whatever dating scheme you want in your actual runs, but unified dates in the database may make some comparison/processing steps a bit easier...
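
As an illustration of the template, a file name can be assembled from its fields. This is a sketch, not the database's own tooling: build_file_name and year_range are hypothetical helper names, and the table and experiment lists are simply copied from the text above.

```python
import re
from typing import Optional

# Short names copied from the lists given in this section.
TABLES = {"A_FX", "A_SE", "A_MO", "A_DA", "O_SE", "O_AN", "O_MO",
          "S_SE", "S_AN", "S_MO", "I_SE", "I_DA", "I_MO"}
EXPERIMENTS = {"pmip2_0k_oa", "pmip2_0k_oav", "pmip2_6k_oa",
               "pmip2_6k_oav", "pmip2_21k_oa", "pmip2_21k_oav"}
# modelorlab: letters, digits, '-' and '.' only.
_MODELORLAB_RE = re.compile(r"[A-Za-z0-9.-]+")


def year_range(first_year: int, last_year: int) -> str:
    """Format a whole-years time range as yyyy-YYYY."""
    return f"{first_year:04d}-{last_year:04d}"


def build_file_name(varname: str, table: str, experiment: str,
                    modelorlab: str, time_range: Optional[str] = None) -> str:
    """Assemble varname_table_experiment_modelorlab[_time-range].nc."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    if _MODELORLAB_RE.fullmatch(modelorlab) is None:
        raise ValueError(f"invalid characters in modelorlab: {modelorlab!r}")
    fields = [varname, table, experiment, modelorlab]
    if time_range is not None:
        fields.append(time_range)
    return "_".join(fields) + ".nc"


print(build_file_name("ts", "A_MO", "pmip2_0k_oa", "HadCM3-MOSES2", year_range(1, 30)))
# ts_A_MO_pmip2_0k_oa_HadCM3-MOSES2_0001-0030.nc
```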

The logic behind the file names

There is some logic in the file naming scheme that may be useful for programmers! The major fields of a file name are separated by '_' characters, which makes it easy to extract the different fields of a file name by just splitting it where there are '_' characters...

Shell script users:

> fname="ts_A_MO_pmip2_0k_oa_HadCM3-MOSES2_0000-0029"
> echo "$fname" | sed 's/_/  /g'
ts  A  MO  pmip2  0k  oa  HadCM3-MOSES2  0000-0029
> model=$(echo "$fname" | cut -d_ -f7)
> echo "$model"
HadCM3-MOSES2

Python script users:

>>> fname = "ts_A_MO_pmip2_0k_oa_HadCM3-MOSES2_0000-0029"
>>> ffields = fname.split('_')
>>> ffields
['ts', 'A', 'MO', 'pmip2', '0k', 'oa', 'HadCM3-MOSES2', '0000-0029']
>>> model = ffields[6]
>>> model
'HadCM3-MOSES2'
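
Because the table field contains one '_' and the experiment field contains two, a plain split yields 7 or 8 pieces rather than the 4 or 5 logical fields. A possible way to regroup them is sketched below; FileNameFields and parse_file_name are hypothetical names, and the sketch assumes the name otherwise follows the template exactly.

```python
from typing import NamedTuple, Optional


class FileNameFields(NamedTuple):
    varname: str
    table: str
    experiment: str
    modelorlab: str
    time_range: Optional[str]


def parse_file_name(name: str) -> FileNameFields:
    """Split a file name on '_' and regroup the pieces into logical fields."""
    stem = name[:-3] if name.endswith(".nc") else name
    parts = stem.split("_")
    # 1 (varname) + 2 (table) + 3 (experiment) + 1 (modelorlab) [+ 1 (time range)]
    if len(parts) not in (7, 8):
        raise ValueError(f"expected 7 or 8 '_'-separated pieces, got {len(parts)}: {name!r}")
    return FileNameFields(
        varname=parts[0],
        table="_".join(parts[1:3]),
        experiment="_".join(parts[3:6]),
        modelorlab=parts[6],
        time_range=parts[7] if len(parts) == 8 else None,
    )


fields = parse_file_name("ts_A_MO_pmip2_0k_oa_HadCM3-MOSES2_0000-0029.nc")
print(fields.table, fields.experiment, fields.modelorlab)
# A_MO pmip2_0k_oa HadCM3-MOSES2
```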

Do not forget to check...

Try to use correct file names (even if the database file insertion script can handle the most common errors and will correct the file names on the fly)!

Also, make sure that the codes/scripts generating your data files can handle a new naming scheme, in case we have to make some changes (minor changes, hopefully)...

Based on the data that has already been submitted to the database, you should pay attention to the following details:

Use long file names, not IPCC-like short names!
varname_table_experiment_modelorlab[_time-range].nc
and not
varname_table.nc
Try to use uppercase and lowercase characters at the correct places!
    Uppercase: table
    Lowercase: varname, experiment
    Any or mixed: modelorlab
Make sure that you use the correct starting and ending dates, if you specify the optional time range! If you have ten years of simulation in a file, time-range should look like YYY1-YY10 or YYY0-YYY9, and not YYY0-YY10...

The dates implied by time-range should of course also be consistent with the time axis defined in the file (time axis' units and values).
The start and end date of time-range have to be separated by a '-' and not a '_'.
Make sure all the files you submit in the same directory have exactly the same value for the modelorlab field.
Make sure that all the fields that make a file name are separated by a '_' (underscore). Otherwise, the scripts that are used to insert your files in the database will have a very hard time determining what the file is supposed to be!

Also, make sure that there are NO '_' characters inside a field (except the experiment field)! If you need to separate a field into subfields, or indicate a range, use '-' characters...
...
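
Several of the points above (long names, upper/lowercase placement, the '-' separator in the time range, the number of years covered) can be checked mechanically. The sketch below is one possible check and not the database insertion script: check_file_name and its n_years parameter are hypothetical, and the patterns assumed for varname (lowercase letters and digits) and experiment (pmip2_<N>k_<suffix>) are guesses based on the examples in this page.

```python
import re
from typing import List, Optional

NAME_RE = re.compile(
    r"(?P<varname>[a-z0-9]+)"                 # lowercase (assumed: letters and digits)
    r"_(?P<table>[A-Z]_[A-Z]{2})"             # uppercase, e.g. A_MO
    r"_(?P<experiment>pmip2_[0-9]+k_[a-z]+)"  # lowercase, e.g. pmip2_0k_oa
    r"_(?P<modelorlab>[A-Za-z0-9.-]+)"        # any case, no '_'
    r"(?:_(?P<time_range>[0-9]{4}-[0-9]{4}|[0-9]{8}-[0-9]{8}))?"
    r"\.nc"
)


def check_file_name(name: str, n_years: Optional[int] = None) -> List[str]:
    """Return a list of problems found in a file name (empty if none)."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        return ["name does not match varname_table_experiment_modelorlab[_time-range].nc"]
    problems = []
    time_range = match.group("time_range")
    if time_range is not None:
        start, end = time_range.split("-")
        # Both dates have the same width, so string comparison orders them correctly.
        if start > end:
            problems.append("time-range ends before it starts")
        if n_years is not None and len(start) == 4:
            covered = int(end) - int(start) + 1
            if covered != n_years:
                problems.append(f"time-range covers {covered} years, expected {n_years}")
    return problems


print(check_file_name("ts_A_MO_pmip2_0k_oa_HadCM3-MOSES2_0000-0010.nc", n_years=10))
# ['time-range covers 11 years, expected 10']
```

This check does not open the file, so the consistency between the time range and the time axis stored in the file still has to be verified separately.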

Sending the data

The data files will have to be sent to LSCE by anonymous ftp and/or scp (if you already have an account at LSCE).

Data transfer by tape should only be used as a last resort (if transfer over the network appears to be extremely slow), after agreeing with the database administrator on what tape format should be used.

Remember that we are not requesting as many vertical levels for the PMIP 2 project as for IPCC, so the amount of data to transfer is (much) smaller.

The data files should be stored in the following directory hierarchy

    modelorlab/yyyymmdd/experiment/table
        where yyyymmdd is the submission date

and sent either as is (the hierarchy is replicated directly in the directory where the data is uploaded) or in a compressed tar file with the name

    modelorlab_yyyymmdd_experiment_table.tar.gz

        where yyyymmdd is the submission date, followed by a letter (a to z) if several tar files are sent

whichever is more convenient.
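
For the tar file option, the directory path and the archive name can be derived from the same four fields. The sketch below uses Python's standard tarfile module; the helper names are hypothetical, and storing the full modelorlab/yyyymmdd/experiment/table hierarchy inside the archive is an assumption, since the text does not say what the archive should contain.

```python
import tarfile
from pathlib import Path


def submission_dir(modelorlab: str, date: str, experiment: str, table: str) -> Path:
    """Relative directory modelorlab/yyyymmdd/experiment/table."""
    return Path(modelorlab) / date / experiment / table


def archive_name(modelorlab: str, date: str, experiment: str, table: str,
                 letter: str = "") -> str:
    """Archive name; 'letter' (a to z) is appended to the date if several tar files are sent."""
    return f"{modelorlab}_{date}{letter}_{experiment}_{table}.tar.gz"


def make_archive(root: Path, modelorlab: str, date: str, experiment: str,
                 table: str, letter: str = "") -> Path:
    """Pack root/modelorlab/date/experiment/table into a .tar.gz written under root."""
    rel = submission_dir(modelorlab, date, experiment, table)
    out = root / archive_name(modelorlab, date, experiment, table, letter)
    with tarfile.open(out, "w:gz") as tar:
        # Assumption: keep the relative hierarchy as the path inside the archive.
        tar.add(root / rel, arcname=rel.as_posix())
    return out


print(archive_name("MyModel", "20080619", "pmip2_6k_oa", "A_MO"))
# MyModel_20080619_pmip2_6k_oa_A_MO.tar.gz
```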

Whatever transfer method is used, always keep an exact copy of the data you send, in case it needs to be sent again or reprocessed from scratch (we are not expecting a major database crash, but it is always best to be prepared... :).

And it is of course a good idea to keep all the programs/scripts you used to prepare your data in case your files need to be processed again (either because somebody finds some errors in your files, or because the output standards need to be changed).

Sending the data by anonymous ftp

You should carefully follow these steps when you use this submission method:

Send a mail to the database administrator before actually sending the data, to make sure that your data can be transferred to LSCE (from the ftp site) before it is automatically removed from the ftp site (after 1 week)

The DB administrator will also have to give you the name of the HIDDEN_DIR parameter mentioned below.
Start an anonymous ftp session on ftp.cea.fr

ftp ftp.cea.fr
Go to the incoming/HIDDEN_DIR directory where HIDDEN_DIR should be replaced with the name you got from the DB administrator

cd incoming/HIDDEN_DIR
Go to the pmip2_submit directory if it exists. If not, create the directory!

mkdir pmip2_submit (if need be)
cd pmip2_submit
Create a subdirectory with your lab's name and put the directory hierarchies or the tar files in there

  mkdir mylab
  cd mylab
  pwd   (make sure you are at the right place, below the pmip2_submit directory, before sending the data!)
  mput myfiles

Note: if there is not enough space when you transfer your data (in which case you may only get a cryptic error message...), try the incoming/HIDDEN_DIR/pmip2_submit directory instead of incoming, and notify the DB administrator that your data is there.
When you are done, send a mail to the DB administrator.
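
The ftp steps above could also be scripted with Python's standard ftplib module. This is a sketch under assumptions: upload and cwd_or_create are hypothetical helper names, the server is assumed to answer a failed cd with a permanent (5xx) error, and only plain files such as tar archives are sent (a directory hierarchy would have to be recreated directory by directory).

```python
from ftplib import FTP, error_perm
from pathlib import Path
from typing import Iterable


def cwd_or_create(ftp: FTP, name: str) -> None:
    """Change to a directory, creating it first if it does not exist."""
    try:
        ftp.cwd(name)
    except error_perm:
        ftp.mkd(name)
        ftp.cwd(name)


def upload(ftp: FTP, hidden_dir: str, lab: str, files: Iterable[str]) -> None:
    """Upload files to incoming/<hidden_dir>/pmip2_submit/<lab>."""
    ftp.cwd(f"incoming/{hidden_dir}")
    cwd_or_create(ftp, "pmip2_submit")
    cwd_or_create(ftp, lab)
    for path in files:
        with open(path, "rb") as handle:
            ftp.storbinary(f"STOR {Path(path).name}", handle)


# Usage (not executed here; HIDDEN_DIR comes from the DB administrator):
# with FTP("ftp.cea.fr") as ftp:
#     ftp.login()  # no arguments: anonymous login
#     upload(ftp, "HIDDEN_DIR", "mylab", ["mylab_20080619_pmip2_6k_oa_A_MO.tar.gz"])
```

The mails to the DB administrator before and after the transfer are still needed; the script only covers the transfer itself.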

Sending the data by scp

You should carefully follow these steps when you use this submission method:

Use scp to transfer your data to

    /home/motifdb/incoming_pmip2/modelorlab/yyyymmdd

where modelorlab is the same identifier used in the data files' names (varname_table_experiment_modelorlab[_time-range].nc) and yyyymmdd is the submission date.
Send a mail to the DB administrator when you have finished sending your data.
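
The scp destination can be built from the same modelorlab and date values used elsewhere. In the sketch below, scp_command is a hypothetical helper, and the user and host values are placeholders, since this page does not name the LSCE machine to connect to. Whether the destination directory must exist beforehand is also not stated here.

```python
import shlex
from typing import List, Sequence


def scp_command(files: Sequence[str], modelorlab: str, date: str,
                user: str, host: str) -> List[str]:
    """Build (but do not run) an scp command line for a submission."""
    dest = f"{user}@{host}:/home/motifdb/incoming_pmip2/{modelorlab}/{date}"
    # -r copies directories recursively, so a whole hierarchy can be sent.
    return ["scp", "-r", *files, dest]


# 'myuser' and 'lsce-host.example' are placeholders, not real account or host names.
cmd = scp_command(["pmip2_6k_oa"], "MyModel", "20080619", "myuser", "lsce-host.example")
print(shlex.join(cmd))
# scp -r pmip2_6k_oa myuser@lsce-host.example:/home/motifdb/incoming_pmip2/MyModel/20080619
```

The resulting list can be passed to subprocess.run once the real account and host are known.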

Checking the processed data

...


Last updated 2008/06/19 09:16:10