Broader Requirements for netCDF and standards - response to Lloyd's memo

Rich:

Thanks for your comments.

Data Explorer (formally, IBM Visualization Data Explorer; informally, DX) is
a commercial software package developed by the group that I am in.  The
specification of the DX data model and the operations that DX supports are
available publicly.  Obviously, the code is not.  Although DX has a lot of
capabilities, the data model can support a greater variety of data than the
currently available version of the software can visualize.  For a simple
example, consider a rank 2 tensor in 3-space on an irregular grid.  The
model can handle it without difficulty.  DX can do mathematical operations
on such data, but realization (generating renderable geometry) is not so
straightforward (i.e., a research problem).  Today, DX would, for example,
allow you to treat a 3-tuple from the tensor as a 3-vector, or a single
element as a scalar, and do the appropriate things with it.  There are also a
few visual things
that can be done if the tensor is symmetric.  Documentation on DX and the
data model ranges from marketing literature with some level of detail about
data types, etc., to papers in the literature (cf. R. Haber et al., "A Data
Model for Scientific Visualization with Provisions for Regular and Irregular
Grids", Proceedings of IEEE Visualization '91, pp. 298-305, October 1991;
B. Lucas et al., "An Architecture for a Scientific Visualization System",
Proceedings of IEEE Visualization '92, pp. 107-113, October 1992), to an
internal report that I have written on data management methods for
visualization, to the software documentation (the DX user's guide and
programmer's guide).  All of this is available publicly.  We are also
interested in the use of the external
representation of the model independent of DX.  Right now, it is a multiple
file representation without an API independent of DX.  This could be something
to discuss further, if there is interest.
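
For concreteness, here is a purely illustrative sketch in C of the shape of
the data in that tensor example -- this is not DX code, and every name in it
is hypothetical:

    /* Illustrative only, not DX code: a rank-2 tensor field sampled on an
     * irregular grid in 3-space.  Because the grid is irregular, the point
     * positions must be carried explicitly alongside the data. */
    #define NPOINTS 1000                     /* hypothetical sample count */

    float positions[NPOINTS][3];             /* explicit x,y,z per point  */
    float tensor[NPOINTS][3][3];             /* rank-2 tensor per point   */

    /* The fallbacks mentioned above, at some sample point i: */
    void fallbacks(int i)
    {
        float *as_vector = tensor[i][0];     /* a 3-tuple as a 3-vector   */
        float  as_scalar = tensor[i][0][0];  /* one component as a scalar */
        /* ... hand as_vector or as_scalar to the existing vector and
         * scalar visualization paths ... */
        (void)as_vector; (void)as_scalar; (void)positions;
    }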

The issue of public domain vs. commercial is certainly one that I faced at
NASA over the years.  The first data system that used CDF was the NASA Climate
Data System (NCDS).  NCDS uses some commercial software (e.g., RDBMS, UI,
graphics).  This was widely criticized at design time (a decade ago) because the
NASA approach at that time was to build everything from scratch and ignore the
outside world.  Commercial software was chosen to reduce costs, especially
with a finite budget.  Custom software was used to develop things unavailable
commercially or in the public domain and to integrate.  CDF was one such piece
of software.  Later in the 80s, the development of CDF was criticized because
the NASA view then was that we should not be developing stuff to put out in
the public domain, but should adopt what is already available.  It did not
matter that the appropriate tools did not exist.  C'est la vie.

Anyway, I do agree with Rich's assessment that the public domain is the proper
place for standards (and benchmarks -- another subject we should discuss at
some point).  There is plenty of precedent for this view in other arenas of
computing.  Commercial systems may use, enhance, etc. such a standard, of
course, which then provides value that a potential customer is willing to buy.
If you will, that's our view about importing data in things like netCDF or
CDF or generating images in TIFF, PS, etc.  Given that we wanted to support
data and analysis thereof for problems that were beyond what current systems
like CDF, netCDF, HDF et al. could handle, we developed something ourselves.
We did look at everything out there first -- no sense reinventing the wheel.
Our extensions/conventions for netCDF were an early attempt to provide an
importation mechanism on one public domain "standard".  However, given the
limits inherent in the netCDF data model and its vocabulary, the result was a
subset of what the DX data model supports.  We do have some interest in making
the data model and an external format available publicly, independent of DX.  I
would have interest in discussing this further with anyone so inclined.
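
To give a flavor of what such a convention can look like, here is a minimal
sketch using the netCDF C interface.  The attribute names ("field",
"positions") and all other names are only meant to illustrate the approach;
they are not a statement of the exact DX convention:

    /* A 3-vector field on an irregular grid: an explicit positions
     * variable, plus attributes that tie the data variable to it by
     * convention.  Names are illustrative; error checks omitted. */
    #include <netcdf.h>
    #include <string.h>

    int main(void)
    {
        int ncid, np_dim, rank_dim, pos_id, vel_id, dims[2];
        const char *field = "velocity, vector";
        const char *posn  = "locations";

        nc_create("irregular.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "npoints", 1000, &np_dim);  /* grid samples */
        nc_def_dim(ncid, "rank", 3, &rank_dim);      /* 3-space      */

        dims[0] = np_dim;  dims[1] = rank_dim;

        /* An irregular grid implies no coordinates, so positions are
         * stored explicitly, one x,y,z triple per sample point.      */
        nc_def_var(ncid, "locations", NC_FLOAT, 2, dims, &pos_id);

        /* The data variable, linked to its positions by convention.  */
        nc_def_var(ncid, "velocity", NC_FLOAT, 2, dims, &vel_id);
        nc_put_att_text(ncid, vel_id, "field", strlen(field), field);
        nc_put_att_text(ncid, vel_id, "positions", strlen(posn), posn);

        nc_enddef(ncid);
        /* ... fill locations and velocity with nc_put_var_float() ... */
        nc_close(ncid);
        return 0;
    }

Anything much richer than this (e.g., the tensor case above, or irregular
connectivity) is where the limits of that vocabulary begin to show.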

There are other issues that I wish to discuss at some other time in two
arenas.  One relates to data set scaling, both width (complexity) and depth
(bulk size).  Some of what we have developed addresses both of these: the
complexity in terms of the model vocabulary and the size in terms of support
for parallel computation and use of high-performance I/O systems (h/w).
NSSDC, for example, has addressed some of the size scaling issues in CDF for
disk access in conventional file systems with direct access to the disk,
subsampling from disk, etc.  This has implications when dealing with more than
a few tens of MB of data.  The other area relates to semantics -- issues of
higher-level information embedded as metadata, and of such metadata driving
applications.
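
In netCDF terms, the closest analogue to that kind of disk subsampling is
strided hyperslab access, which reads every k-th value of a variable from
disk instead of reading the whole array.  A minimal sketch (file and variable
names hypothetical; error checks omitted):

    /* Subsample a 1-D variable from disk: read every 4th value rather
     * than the whole array.  Names are hypothetical. */
    #include <stddef.h>
    #include <netcdf.h>

    #define NOUT 250

    int main(void)
    {
        int       ncid, varid;
        float     sample[NOUT];
        size_t    start[1]  = { 0 };
        size_t    count[1]  = { NOUT };  /* values to deliver    */
        ptrdiff_t stride[1] = { 4 };     /* take every 4th value */

        nc_open("bulk.nc", NC_NOWRITE, &ncid);
        nc_inq_varid(ncid, "temperature", &varid);
        nc_get_vars_float(ncid, varid, start, count, stride, sample);
        nc_close(ncid);
        return 0;
    }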

Lloyd
------------------------------- Referenced Note ---------------------------
Date: Wed, 25 Nov 1992 23:01:38 PST
From: 26-Nov-1992 0153 <lysakowski@xxxxxxxxxxxxxxxxxxx>
To: netcdfgroup@xxxxxxxxxxxxxxxx
Cc: lysakowski@xxxxxxxxxxxxxxxxxxx
Subject: Broader Requirements for netCDF and standards - response to Lloyd
Treinish's memo

Lloyd Treinish has done an excellent job of thinking about how to use netCDF
"as is" to represent complex data types that are not inherently supported in
netCDF now.  We in the analytical instrument community have requirements that
go far beyond where netCDF is currently.  We need to support more complex
data models sooner rather than later.  Many kudos to Lloyd for taking the
next major step -- again!!

From my quick reading of Lloyd's comments, the conventions used in Data
Explorer (included below) detail a way to implement some parts of a more
extensive data model using conventions in CDL.  I think the description is
very useful in showing how one might use netCDF and CDL to store more complex
data types in netCDF files.

I'd like to see the technical requirements, and the scope of those
requirements, that the Data Explorer data model addresses now.  Lloyd, is the
Data Explorer
data model specification a public-domain document?

Data Explorer sounds like a great package.  It appears to address many
requirements for several different domains of science.

--------------------------------------------------------------------------

We need to advance to other issues not addressed by Lloyd's input.  I feel
that we still haven't fully addressed the question of standards.

There are very important business, organizational, and people constraints
on any solution that will be WIDELY accepted, i.e., become a standard.
The analytical instrument vendors, universities, government agencies, and
end user companies that I've been working with on analytical data standards
over the past 4 years have said, "If we have to buy it from 'company X', then
it's not an open standard, and we don't want it."

A major problem with Data Explorer (and other commercial systems "more
advanced" than netCDF) is that it is proprietary, and requires paid royalties
to a for-profit company.  I've been hit down hard for proposing proprietary
technologies to standards groups and other researchers that, for whatever
reason, feel they must base their work on public-domain standards.

Until a public-domain version is made available that is free of charge,
available over the Internet, and supported by a vendor-independent software
engineering support group like Unidata or NASA, Data Explorer (or any other
commercial package) doesn't serve the major needs of universities, standards
communities, and even many sections of industry for scientific data
interchange and storage.  I've run up against this hard "reality" many
times.

If such a public-domain version of a generic package (Data Explorer or any
other package) for scientific data interchange and storage is made available
to the scientific community, it must not be a "scaled-down" version that
requires someone to buy the commercial version to get the full functionality.
Unidata doesn't use such "hooks", because
they don't serve Unidata's clientele.

We must not lose sight of the fact that technical solutions by themselves
are not complete solutions or business solutions, whether your "business"
is university research, industrial R&D, or government R&D.  Too many technical
solutions fail to make it "to market" because they are technical solutions
only, and fail to satisfy all the other requirements, particularly the
business, organizational, and people requirements.

This is not a soapbox conversation.  I've had to take long hard looks at
what is making the analytical data standards effort successful.  The
technical part of it (the netCDF software) is an important, yet small, part
of the solution.
This is not always easy for technical people (including myself) to accept.

The vendor-independent software support center (Unidata) is an organizational
factor that is crucial to the success of netCDF.  However, to be successful,
the full range of requirements must be included in the solution.  Unidata has
done a better job of addressing the full range of requirements than most other
organizations I've seen.

Unidata does an enormous amount of work to make sure their codes are fully
available on all the major platforms, with no particular bias toward any group
of users or vendors.  They should be commended for all their great work.

I hope that this discussion leads to a broader discussion of requirements
for systems and solutions in the future.  This may be controversial, but it
is meant to move the scientific community at large forward.

NetCDF has broad applicability, and it needs to be extended to meet some
of the requirements beyond those that Lloyd and others have begun to address
in various memos.  This is a good time to start discussing the broader
requirements for the future versions.

Your feedback on this note will be much appreciated.

Rich Lysakowski
Director, Analytical Data Interchange and Storage Standards Project

