NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
I have many thoughts about some of the ongoing discussions on this and related issues in the netCDF mail groups, but have been too busy to give a coherent and useful response. However, after seeing Hank Griffioen's comment after a recent posting, I must put in my two cents to try to clarify things. After all, I have some passing familiarity with the subject. I'm also cc'ing this to Greg Goucher at NSSDC, who is responsible for CDF, since I don't know if he subscribes to this mailgroup. I would encourage you to contact him directly for the latest information.

Hence, on soapbox...

Both netCDF and CDF support the same conceptual data model -- the idea of a data abstraction for supporting multidimensional blocks of data -- since netCDF is a separate and more recent implementation of the ideas that were developed in the original VAX/VMS FORTRAN version of CDF many years ago at the NSSDC at NASA/GSFC. Although the model is the same, the interfaces and the physical formats are quite different. The current (major) release of CDF is much newer than that of netCDF.

NetCDF has only one physical form -- a single XDR file with the multidimensional arrays written by C convention (row major -- last dimension varies fastest). CDF supports multiple physical forms: XDR or native, single or multiple file (one header file and one file for each variable), row major (i.e., by C convention) or column major (i.e., by FORTRAN convention -- first dimension varies fastest) organization, and the ability to interoperate between them. At last check, I think CDF supported a few additional data type primitives, but that's relatively unimportant. Although not relevant to this discussion, it also supports the original VMS format of CDF V1 (so-called CDFobsolete).

Both netCDF and CDF have a similar number of officially supported ports on more or less the same operating systems.
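
The row-major and column-major conventions mentioned above differ only in how a multidimensional index maps to a position in the stored array. The following is a minimal sketch of that mapping, not code from netCDF or CDF; the function name `flat_offset` and its `order` parameter are made up for illustration.

```python
def flat_offset(index, shape, order="C"):
    """Return the linear offset of a multidimensional index.

    order="C": row major, last dimension varies fastest (C convention).
    order="F": column major, first dimension varies fastest (FORTRAN convention).
    Hypothetical helper for illustration only.
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")

    dims = list(zip(index, shape))
    if order == "F":
        # Column major is row major applied to the reversed dimension list.
        dims.reverse()

    offset = 0
    for i, n in dims:
        if not 0 <= i < n:
            raise IndexError("index out of range")
        offset = offset * n + i
    return offset


# Element (0, 1) of a 2 x 3 array lands at different offsets:
row_major = flat_offset((0, 1), (2, 3), order="C")
col_major = flat_offset((0, 1), (2, 3), order="F")
```

Under these assumptions, a reader that knows which convention a file uses can address the same logical element in either layout, which is the kind of interoperation between physical forms described above.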
NetCDF has additional ports done by the user community compared to CDF, primarily because the implementation has been completed for some time -- certainly an important point, since it applies to the HDF implementation as well.

As Hank alludes to, CDF has a large and growing collection of both utilities and sophisticated general-purpose applications (some portable and some VMS-specific from the old days). Some of this functionality overlaps the proposed or in-development CDF operators and the Y0 tools that Unidata will supply. There is some overlap between CDF's CXIT tool and NCSA Image, for example.

The GEDEX CD-ROM that Hank cites is a collection of climatological data sets that support an ongoing Greenhouse Effect Detection EXperiment. Of course, data in CDF are also supported for a much wider range of earth science data via the NASA Climate Data System at NASA/GSFC. This data system is evolving to support the eventual Earth Observing System. CDF is also the standard for a NASA flight program in space plasma physics called the International Solar Terrestrial Physics Project, which involves a suite of international spacecraft.

There is a key issue that needs to be raised, about which I have seen too little discussion. It relates to the notion of implementation scaling. The problem is that an abstraction like netCDF/CDF, or the multiple abstractions, if you will, that HDF supports, must be able to scale to large, complex data sets. One aspect of that was the reason for supporting interoperability among multiple physical forms in CDF, given limits in most file systems (e.g., file size) coupled with the way that many scientists utilize data.

A second issue is data structure residency and how it is supported. For scaling to any reasonably interesting data set by size, structure and breadth (i.e., number of parameters/variables/fields), data structures must be disk-resident and have a built-in caching mechanism appropriate for those structures.
Both CDF and netCDF attempt to do this. In addition, transaction-like operations on data must be supported. In other words, the ability to query, update/modify, and delete data in place is required. If a substantial investment in building a large data set is made, it is too expensive to make updates via copying. If I am current in my knowledge of the netCDF and CDF implementations, then this is supported in CDF and not in netCDF. In the HDF case none of these ideas apply because the data structures are memory resident. (Russ, Greg and Mike, please correct me if I am wrong and discuss your current thinking on the subject.) None of these notions are new -- just ask anyone in the DBMS community. The difference is the data model.

A third area of scaling relates to ease of access by the end user. The CDF/netCDF approach provides a uniform access mechanism via a well-defined model to arbitrary data that fits within that model. I believe that this is a simplifying approach for data access. Unfortunately, the data model is too limited for many kinds of data. This has been a focus of some of the work in the group that I am in (the Scientific Visualization Systems Group at IBM T. J. Watson Research Center -- we developed the IBM Data Explorer visualization software and the IBM POWER Visualization System, a coarse-grain shared memory parallel computational server). The problem is how to make access to data "objects" uniform, independent of their underlying mesh/grid structure, level of aggregation or hierarchical nature. HDF Vset is one attempt to do so for a class of such objects.

Generalization of CDF/netCDF arrays to non-rectilinear meshes can be accomplished by conventions for attribute and variable specifications. I did this myself for the original CDF implementation eons ago and extended it to include simple irregular and sparse meshes. However, the underlying semantics of the netCDF/CDF data model severely limit how far this can go.
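
As a rough illustration of what "conventions for attribute and variable specifications" can look like, here is a minimal sketch of scattered (irregular) points carried in ordinary one-dimensional variables, with an attribute naming the variables that hold their positions. Everything in it is an assumption for illustration: the dictionary layout, the attribute name `mesh_coordinates`, the sample values, and the function `located_values` are hypothetical and are not taken from CDF, netCDF, or the convention described above.

```python
# Hypothetical in-memory stand-in for a netCDF/CDF-style dataset:
# named dimensions, plus variables that each carry data and attributes.
dataset = {
    "dimensions": {"point": 4},
    "variables": {
        "x": {"dims": ("point",), "data": [0.0, 1.5, 0.2, 2.0], "attrs": {}},
        "y": {"dims": ("point",), "data": [0.0, 0.3, 1.1, 1.9], "attrs": {}},
        "temperature": {
            "dims": ("point",),
            "data": [280.0, 281.0, 279.5, 282.25],
            # The convention: this attribute lists the variables that
            # give the location of each value (assumed name).
            "attrs": {"mesh_coordinates": "x y"},
        },
    },
}


def located_values(ds, name):
    """Pair each value of variable `name` with its mesh position.

    The position is looked up through the (assumed) `mesh_coordinates`
    attribute rather than implied by the array index.
    """
    var = ds["variables"][name]
    coord_names = var["attrs"]["mesh_coordinates"].split()
    coords = [ds["variables"][c]["data"] for c in coord_names]
    return [
        (tuple(c[k] for c in coords), value)
        for k, value in enumerate(var["data"])
    ]


points = located_values(dataset, "temperature")
```

The array model itself is unchanged here; only the reader's interpretation of the attribute gives the data its irregular geometry, which is also why such conventions only stretch so far.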
Our approach has been to define a more comprehensive data model than is used in netCDF/CDF. To date, the results have shown promise.

Words with regard to scaling are insufficient. Therefore, let me conclude my ramblings by resurrecting ideas discussed at the SIGGRAPH '90 workshop on data structures and access software for visualization that I chaired, where Greg, Russ and Mike, among others, were active participants. We need to quantify these notions of scaling with "benchmark" structures/data sets and operations. I would be very happy to discuss such metrics with anyone interested.

Off soapbox...

Thanks for any comments that anyone may have.

Lloyd Treinish

From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: tagged netcdf-4 in cvs - netcdf-4_0_75
Date: 19 Dec 2003 06:15:30 -0700
Organization: UCAR/Unidata

For anyone who might care, I've just tagged the netcdf-4 cvs archive with tag netcdf-4_0_75. This stands for version 0.75 of netcdf-4.

The tagged version passes all of nc_test, as has been noted before, and has at least reasonable performance for reads and writes.

Ed