The formatting of yesterday's message from Steve Hankin to the netcdfgroup
mailing list was inadvertently corrupted here at Unidata, so I've appended a
reposting of the message in a more readable form. Let me also take this
opportunity to remind members of the mailing list that administrative
requests for addition to or deletion from the list should be sent to
netcdfgroup-adm@xxxxxxxxxxxxxxxx rather than the full mailing list. Thanks.
Hello Rich, Tim, Ken, et al.,
I've been following your discussion about netCDF styles with interest as my
own group - a numerical modeling group - shares similar concerns: how to
use netCDF to achieve compatible representations of our model data
(gridded, multi-gigabyte, multiple variables on staggered grids) as well as
PMEL's EPIC data (down the hall) and data from outside institutions, too.
This business of time axis representations is leading us all to similar
solutions. Rich has described a global variable called "base_date" which
"specifies the Gregorian start date". Similarly, the file
"conventions.info" available from unidata.ucar.edu suggests e.g.
    variables:
        double time(nobs);
        time:units = "milliseconds since (1992-9-16 10:09:55.3 -600)";
Our own software, FERRET, uses a solution e.g.:
    float TIME(TIME) ;
        TIME:units = "seconds" ;
        TIME:time_origin = "14-JAN-1976 14:00:00" ;
and accepts int, long, float, or double data types.
While all of these are very similar solutions they are also incompatible.
How are time-date strings formatted? Where should the time origin be
placed: in the units string? in a global attribute? in a variable
attribute? If in an attribute, what is the attribute name? Is the data
type mandated? Does the axis have to be a "coordinate variable" (dimension
name=variable name)? etc. Similar issues arise over whether and how to map
gridded data onto 4-dimensional grids. Mandatory ordering of axes?
Mandatory axis names? Mandatory units choices? What to do with missing
axes (e.g. Z axis of vertically averaged flow)?
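
To make the time-axis incompatibility concrete, here is a minimal Python
sketch that reads both styles quoted above into one common form (a scale
factor in seconds plus an origin). It is only an illustration, not code from
FERRET or from "conventions.info". The function names are hypothetical, and
two things are assumed rather than known: that "-600" means 6 hours west of
UTC, and that a time_origin with no zone given is UTC.

```python
import re
from datetime import datetime, timedelta, timezone

SECONDS_PER_UNIT = {"milliseconds": 1e-3, "seconds": 1.0, "minutes": 60.0,
                    "hours": 3600.0, "days": 86400.0}
MONTHS = {m: i + 1 for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split())}

# Matches e.g. "milliseconds since (1992-9-16 10:09:55.3 -600)".
SINCE_RE = re.compile(
    r"^\s*(\w+)\s+since\s+\(?\s*(\d+)-(\d+)-(\d+)\s+"
    r"(\d+):(\d+):(\d+(?:\.\d+)?)\s*([+-]\d{3,4})?\s*\)?\s*$")


def origin_from_units(units):
    """Style 1: the origin is embedded in the units string."""
    m = SINCE_RE.match(units)
    if m is None:
        raise ValueError("unrecognized units string: %r" % units)
    unit, y, mo, d, h, mi, sec, off = m.groups()
    tz = timezone.utc
    if off is not None:
        # ASSUMPTION: the offset is written [+-]hmm, so -600 is UTC-6:00.
        sign = -1 if off[0] == "-" else 1
        hhmm = int(off[1:])
        tz = timezone(sign * timedelta(hours=hhmm // 100, minutes=hhmm % 100))
    origin = (datetime(int(y), int(mo), int(d), int(h), int(mi), tzinfo=tz)
              + timedelta(seconds=float(sec)))
    return SECONDS_PER_UNIT[unit.lower()], origin


def origin_from_attribute(units, time_origin):
    """Style 2: plain units plus a separate "dd-MON-yyyy hh:mm:ss" attribute."""
    date_part, clock_part = time_origin.split()
    d, mon, y = date_part.split("-")
    h, mi, s = (int(x) for x in clock_part.split(":"))
    # ASSUMPTION: no zone is stated, so the origin is treated as UTC.
    origin = datetime(int(y), MONTHS[mon.upper()], int(d), h, mi, s,
                      tzinfo=timezone.utc)
    return SECONDS_PER_UNIT[units.lower()], origin


def decode(value, seconds_per_unit, origin):
    """Convert one axis value to an absolute, zone-aware datetime."""
    return origin + timedelta(seconds=value * seconds_per_unit)
```

Either parser feeds the same decode() call, which is the point: the
information content of the two styles is the same, and only the spelling
differs. A reader has to know in advance which spelling a file uses.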
It seems to me that if we want to adopt conventions for these issues, now is
the time to do it. NetCDF can fail to be a "standard" in any meaningful way
if these issues are not addressed somewhat formally by "users" (us) acting
as a community. I have some personal experiences with this type of
standards-failure as a member of the ANSI committee that creates CGM (the
Computer Graphics Metafile). CGM, a broadly conceived standard, has
expected user communities to develop "profiles" that dictate their
particular style choices and ensure interoperability. The user communities
have mostly failed to get organized and there is chaos in the CGM world -
enough to endanger its success as a standard.
I spoke to Russ Rew and he agreed that a "straw man" proposal on these
conventions for oceanographers was in order. I will try to pull one
together in the next few days - using "conventions.info" as a starting point
but going into much greater detail. My main goal will be to enumerate the
open issues. The list I generate will be VERY incomplete - I hope we can
pass it around and add to it. When we have a moderately exhaustive list
then we can begin discussing solutions that encompass our issues.
If you see a problem with this process please fire away!
cheers - steve
>From owner-netcdfgroup@xxxxxxxxxxxxxxxx 29 Tue, Sep
Date: Tue, 29 Sep 1992 10:03:22 -0700 (PDT)
From: HANKIN@xxxxxxxxxxxx
To: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: RE: sizes of netCDF objects...
<Date: Mon, 28 Sep 1992 14:44:05 PDT
<From: 28-Sep-1992 1734 <lysakowski@xxxxxxxxxxxxxxxxxxx>
<Subject: sizes of netCDF objects...
<To: netcdfgroup@xxxxxxxxxxxxxxxx
<Cc: lysakowski@xxxxxxxxxxxxxxxxxxx
<Message-id: <9209282143.AA26482@xxxxxxxxxxxxxxxxxx>
<Organization: .
<Apparently-To: netcdfgroup@xxxxxxxxxxxxxxxx
<Keywords: 199209282144.AA10080
<
<
<Please respond to this message only if you are using netCDF for
<large (over a megabyte of data) to Huge (100's of megabytes to gigabytes
<of data).
Our model outputs are typically about 2 Gbytes in size. We have an in-house
direct access format that permits us to break this up into multiple files and
a strategy that allows a "data set" (an associated group of files) to be the
equivalent of a netCDF hyperslab such that the data set still shares the grid
coordinates and indices of the full model output. This permits us in most
cases to avoid working with the full multi-gigabyte data set.
At present we have adapted the hyperslab strategy to netCDF files (using a
handful of netCDF attributes) but we have not yet implemented the ability to
split the netCDF data set into multiple files. Because of this we haven't
been using the netCDF format for our HUGE files yet - only files of order
10-20 Mbytes so far. But we will likely be facing performance issues similar
to your own in the future.
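
The idea of a file that holds a hyperslab yet keeps the indices of the full
model grid can be sketched in a few lines of Python. This is an illustration
only and not the in-house format or the actual attributes used: the Slab
class, its start/shape fields, and the file names are all hypothetical. Each
file is assumed to record where its first point sits in the full grid, which
is enough to translate between local and full-grid indices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slab:
    """One file of a data set, described relative to the full model grid."""
    name: str      # hypothetical file name
    start: tuple   # zero-based full-grid index of the slab's first point, per axis
    shape: tuple   # number of points the slab holds, per axis

    def to_global(self, local):
        """Translate an index within this file to a full-grid index."""
        if any(not 0 <= i < n for i, n in zip(local, self.shape)):
            raise IndexError("local index %r is outside the slab" % (local,))
        return tuple(s + i for s, i in zip(self.start, local))

    def to_local(self, global_index):
        """Translate a full-grid index to this file, or None if not held here."""
        local = tuple(g - s for g, s in zip(global_index, self.start))
        if any(not 0 <= i < n for i, n in zip(local, self.shape)):
            return None
        return local


def locate(slabs, global_index):
    """Find which file of the data set holds a given full-grid point."""
    for slab in slabs:
        local = slab.to_local(global_index)
        if local is not None:
            return slab.name, local
    raise KeyError("no slab holds index %r" % (global_index,))


# Example: a (time, z, y, x) run split in two along the time axis.
slabs = [Slab("run_a.cdf", (0, 0, 0, 0), (10, 5, 4, 3)),
         Slab("run_b.cdf", (10, 0, 0, 0), (10, 5, 4, 3))]
```

With this bookkeeping a request for full-grid point (12, 1, 2, 0) resolves to
local index (2, 1, 2, 0) in the second file, so only that file needs to be
opened.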
<I need to do a short survey of netCDF usage for large to HUGE datasets.
<
<We are thinking about using netCDF for Nuclear Magnetic Resonance data for
<analytical laboratories and for Magnetic Resonance Imaging data.
<
<1) What are the largest datasets that you are using with netCDF now?
- see above
<2) For what applications?
- ocean GCM outputs
<3) What limitations are you experiencing for performance? (If you are
<experiencing limitations, please state what kind of hardware and software
<you are using so we know how to interpret your results.)
- Significant performance limitations on WRITEing files - excellent
performance READing in all (very informal) tests to date. In WRITE operations
the use of the RECORD (unlimited) dimension seems to impose a quadratic
falling off in performance as the length of the record axis increases ... a
potential gotcha for long time series saved incrementally ...
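
Why a per-record slowdown matters for incremental writes can be shown with a
small cost model in Python. This is arithmetic only, not a measurement and not
a diagnosis of the netCDF library: it simply assumes, for illustration, that
appending record k costs work proportional to k, and compares that with a
constant cost per record. The function names are hypothetical.

```python
def total_cost(n_records, per_record_cost):
    """Sum the cost of appending records 1..n_records one at a time."""
    return sum(per_record_cost(k) for k in range(1, n_records + 1))


def constant_cost(k):
    # Each append costs the same regardless of how long the file already is.
    return 1


def growing_cost(k):
    # ASSUMPTION for illustration: appending record k costs k units of work.
    return k


for n in (1000, 2000, 4000):
    print(n, total_cost(n, constant_cost), total_cost(n, growing_cost))
```

Under the growing-cost assumption the total is n(n+1)/2, so doubling the
length of the record axis roughly quadruples the total write cost, whereas
with a constant cost it only doubles.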
<4) What are your plans for larger datasets in the future? How far do you
<envision netCDF going before it breaks down, if at all?
- as above: we're still on the leading edge of our learning curve, too
<Thanks in advance.
<
<Rich Lysakowski
<ADISS Project Director
                |  NOAA/PMEL               |  ph. (206) 526-6080
Steve Hankin    |  7600 Sand Point Way NE  |  FAX (206) 526-6744
                |  Seattle, WA 98115-0070  |  hankin@xxxxxxxxxxxx