Re: [netcdfgroup] NetCDF4 for Fusion Data

John Storrs wrote:
We are evaluating HDF5 and NetCDF4 as archive file formats for fusion
research data. We would like to use the same format for experimental
(shot-based) data and modelling code data, to get the benefits of
standardisation (one API to learn, one interface module to write for
visualization tool access, etc.). A number of fusion modelling codes use
NetCDF. NetCDF for experimental data will be new though, so far as I know.
I've found some problems in shot data archiving tests which need to be
resolved for it to be considered further.

MAST (Mega-Amp Spherical Tokamak) shot data (from magnetic sensors etc.) is
mostly digitized in the range 1 kHz to 2 MHz. MAST shots are currently less
than 1 second in duration, but 5 second shots are foreseen (some other
experiments have much longer shot times). We use up to 96-channel
digitizers. Acquisition start time and sample period are common to a
digitizer, but the number of samples per channel sometimes varies; that is,
some channels may be sampled for a longer time than others. Channel naming
is hierarchical.
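
To put rough numbers on these rates, here is a minimal Python sketch (standard library only) of one digitizer's record for a shot: a shared start time and sample period plus a per-channel sample count. The class `DigitizerRecord`, the channel names, and the 16-bit (2-byte) sample size are illustrative assumptions, not details from this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DigitizerRecord:
    """Hypothetical model of one digitizer's data for a single shot."""
    start_time: float       # seconds; shared by all channels of the digitizer
    sample_period: float    # seconds; shared by all channels of the digitizer
    # Hierarchical channel name -> number of samples actually acquired.
    sample_counts: dict[str, int] = field(default_factory=dict)

    def end_time(self, channel: str) -> float:
        """Time just after the last sample of `channel`."""
        return self.start_time + self.sample_counts[channel] * self.sample_period

    def raw_bytes(self, bytes_per_sample: int = 2) -> int:
        """Uncompressed data size; 16-bit samples are an assumption."""
        return sum(self.sample_counts.values()) * bytes_per_sample


# A 96-channel digitizer at 2 MHz recording a full 5 s shot on every channel.
rate_hz = 2_000_000
shot_s = 5
rec = DigitizerRecord(
    start_time=0.0,
    sample_period=1 / rate_hz,
    sample_counts={f"magnetics/coil_{i:02d}": rate_hz * shot_s for i in range(96)},
)
print(rec.raw_bytes())  # 1920000000
```

At 2 MHz a 5 s shot is 10 million samples per channel, so under these assumptions a fully used 96-channel digitizer produces about 1.9 GB of raw data per shot, which is why compression and avoiding padding matter.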

There are two NetCDF-related issues here. The first is how to store the
channel data, the second how to store time, both efficiently of course. We
want per-variable compression. We don't want uninitialised value padding in
variable data, even if it would be efficiently compressed. In the normal case,
where acquisition start time and sample period are common to all channels in
a dataset, we would prefer to define just one dimension rather than many,
even when channel data array sizes vary.

NetCDF4 tests with a single fixed dimension, writing varying amounts of data
to uncompressed channel variables, show that the variables are written to
the archive file with padding, even in no_fill mode. The file size is
independent of the amount of data written.
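
The file-size observation follows from how a fixed dimension works: every variable defined on it gets room for the full dimension length, however much is actually written. A minimal stdlib sketch of the arithmetic, assuming uncompressed storage, 2-byte samples, and made-up channel lengths (the function names are hypothetical):

```python
def fixed_dim_bytes(sample_counts, dim_len, itemsize=2):
    """Space reserved when every channel variable uses one fixed dimension.

    Each variable occupies dim_len * itemsize bytes whether or not it is
    fully written, so the actual sample counts only matter via len().
    """
    return len(sample_counts) * dim_len * itemsize


def written_bytes(sample_counts, itemsize=2):
    """Bytes of real data actually acquired."""
    return sum(sample_counts) * itemsize


# Hypothetical channel lengths; the shared dimension must fit the longest one.
counts = [4_000_000, 4_000_000, 1_000_000, 250_000]
dim_len = max(counts)

reserved = fixed_dim_bytes(counts, dim_len)
used = written_bytes(counts)
print(reserved, used, reserved - used)  # 32000000 18500000 13500000
```

The padding term grows with the spread of channel lengths, not with the amount of data written, which matches the constant file size seen in the tests.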

John:  I'm sure the netCDF developers will have much more to say, but
here's my $0.02 as an early netCDF adopter (I wrote the Python interface,
http://code.google.com/p/netcdf4-python). The Python interface is an
easy way to play with netCDF-4, by the way, without having to write C code.

When you create a variable with a fixed dimension (not unlimited), it
will be filled with data, even before you write anything to it.  There
is no way around that.

NetCDF4 tests with a single unlimited dimension work for very small dimension
sizes, but take forever to write even a single 4 MSample channel variable (we
are using HDF5 1.8.2, if that's relevant to this problem). That looks like
the right way to go if the processing time and memory overhead are small,
but we can't test it.

Seems like this should work - I haven't seen the slowness you report.
If you could post a sample file, perhaps we could see what's happening.
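
One thing that may be worth checking (it is not established anywhere in this thread) is chunking. netCDF-4 stores a variable that uses an unlimited dimension in HDF5 chunked storage, and the chunk length along that dimension sets how many chunks a long channel is split into; each chunk adds indexing and I/O overhead. The stdlib arithmetic below, for a 4 MSample channel (read here as 4,000,000 samples) and a few arbitrary chunk lengths, shows how fast the count grows when chunks are small. In netCDF4-python, chunk sizes can be set with the `chunksizes` argument to `createVariable`.

```python
import math


def chunk_count(n_samples: int, chunk_len: int) -> int:
    """Number of chunks needed to hold a 1-D variable of n_samples."""
    return math.ceil(n_samples / chunk_len)


n_samples = 4_000_000  # "4 MSample", read as 4 million samples
for chunk_len in (1, 1024, 65536, 1_048_576):
    print(f"chunk length {chunk_len:>9}: {chunk_count(n_samples, chunk_len):>9} chunks")
```

Whether this explains the reported slowness would need the sample file asked for above; the sketch only shows how the chunk count scales.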

Coming to the storage of the time coordinate variable: if we actually store
the data, it will need to be in a double array to avoid loss of precision.
Alternatively, we could define the variable as an integer with a double scale
and offset. Both of these sound inefficient. Traditionally we store this
type of data as a (sequence of) triple(s): start time, time increment, count.
Clearly we can do that within a convention, expanding it in reader code.
How should we handle this?
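
Expanding the (start time, time increment, count) convention takes only a few lines of reader code. A minimal stdlib sketch, assuming times in seconds and a hypothetical list of triples per channel; each value is computed as start + i * increment rather than by repeated addition, so rounding error does not accumulate over millions of samples:

```python
def expand_time_segments(segments):
    """Expand (start, increment, count) triples into a flat list of times.

    Each segment contributes `count` times: start, start + dt, ...,
    start + (count - 1) * dt, each computed directly in double precision.
    """
    times = []
    for start, dt, count in segments:
        times.extend(start + i * dt for i in range(count))
    return times


# Hypothetical channel: 4 samples at 1 kHz starting at t = -0.1 s,
# then 3 samples at 2 kHz starting at t = 0.0 s.
t = expand_time_segments([(-0.1, 1e-3, 4), (0.0, 5e-4, 3)])
print(len(t))  # 7
```

An integer index with a double scale factor and offset decodes to the same start + i * increment expression, so the alternatives above differ mainly in whether the index is stored or implied.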


The standard way is to create a time variable with units "<time
increments> since <start time>".
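
A minimal stdlib sketch of what reader code does with such a units string: split it at " since ", parse the reference time, and scale the stored values. It accepts only three unit names and one timestamp format, both assumptions for illustration; full CF-style parsing (as in netCDF4-python's `num2date`) covers many more cases. Offsets are returned as float seconds because Python's `datetime` resolves only microseconds, coarser than a 2 MHz sample period.

```python
from datetime import datetime

# Seconds per unit for the unit names this sketch accepts (an assumption;
# real CF-style parsers accept many more spellings).
_SECONDS_PER_UNIT = {
    "seconds": 1.0,
    "milliseconds": 1e-3,
    "microseconds": 1e-6,
}


def parse_time_units(units: str):
    """Split '<unit> since <YYYY-MM-DD HH:MM:SS>' into (seconds per unit, reference)."""
    unit, sep, ref = units.partition(" since ")
    if not sep:
        raise ValueError(f"no ' since ' in units: {units!r}")
    reference = datetime.strptime(ref.strip(), "%Y-%m-%d %H:%M:%S")
    return _SECONDS_PER_UNIT[unit.strip()], reference


def decode_offsets(values, units: str):
    """Convert stored time values to float seconds after the reference time."""
    scale, reference = parse_time_units(units)
    return reference, [v * scale for v in values]


# Hypothetical 2 MHz channel: sample times stored in microseconds.
ref, secs = decode_offsets([0, 0.5, 1.0, 1.5], "microseconds since 2009-01-01 00:00:00")
print(ref, secs)
```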

HTH,

-Jeff

Your comments would be appreciated.

Regards
John Storrs

--
John Storrs, Experiments Dept      e-mail: john.storrs@xxxxxxxxxxxx
Building D3, UKAEA Fusion                               tel: 01235 466338
Culham Science Centre                                    fax: 01235 466379
Abingdon, Oxfordshire OX14 3DB              http://www.fusion.org.uk




--
Jeffrey S. Whitaker         Phone  : (303)497-6313
Meteorologist               FAX    : (303)497-6449
NOAA/OAR/PSD  R/PSD1        Email  : Jeffrey.S.Whitaker@xxxxxxxx
325 Broadway                Office : Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328 Web    : http://tinyurl.com/5telg


