A data model specifies data components, relationships, and operations, independent of any particular programming language. The components of a netCDF data set are its variables, dimensions, and attributes. Each variable has a name, a shape determined by its dimensions, a type, some attributes, and values. Variable attributes represent ancillary information, such as units and special values used for missing data. Operations on netCDF components include creation, renaming, inquiring, writing, and reading.
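To make the relationships among these components concrete, here is a minimal in-memory sketch of the data model in C. It shows dimensions, attributes, and a variable whose shape comes from its dimensions, plus two of the listed operations (inquiring and renaming). All type and function names (dm_dim, dm_att, dm_var, dm_var_size, and so on) are hypothetical illustrations and are not part of any netCDF interface. Attributes are simplified to text values, and variable values to doubles.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only; not part of any netCDF interface. */
typedef enum { DM_BYTE, DM_CHAR, DM_SHORT, DM_INT, DM_FLOAT, DM_DOUBLE } dm_type;

typedef struct {
    const char *name;
    size_t length;
} dm_dim;

typedef struct {
    const char *name;   /* e.g. "units" */
    const char *value;  /* simplified: text-valued attributes only */
} dm_att;

#define DM_MAX_DIMS 4
#define DM_MAX_ATTS 4

typedef struct {
    const char *name;
    dm_type type;                      /* external type of the values */
    size_t ndims;
    const dm_dim *dims[DM_MAX_DIMS];   /* the shape comes from these dimensions */
    size_t natts;
    dm_att atts[DM_MAX_ATTS];          /* ancillary information such as units */
    double *values;                    /* simplified: values held as doubles */
} dm_var;

/* Number of values in a variable: the product of its dimension lengths. */
size_t dm_var_size(const dm_var *v) {
    size_t n = 1;
    for (size_t i = 0; i < v->ndims; i++)
        n *= v->dims[i]->length;
    return n;
}

/* Inquire a variable attribute by name; returns NULL if it is absent. */
const char *dm_var_att(const dm_var *v, const char *name) {
    for (size_t i = 0; i < v->natts; i++)
        if (strcmp(v->atts[i].name, name) == 0)
            return v->atts[i].value;
    return NULL;
}

/* Renaming, another operation of the model: only the name changes. */
void dm_var_rename(dm_var *v, const char *new_name) {
    v->name = new_name;
}
```

For example, a variable "temp" with dimensions time (length 3) and lat (length 2) has 6 values, and inquiring its "units" attribute returns the stored text.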
The netCDF software includes interfaces for C, Fortran, C++, Perl, and Java. Utilities are available for displaying the structure and contents of a netCDF data set (ncdump), as well as for generating a netCDF data set from a simple text representation (ncgen).
The netCDF format provides a platform-independent binary representation for self-describing data in a form that permits efficient access to a small subset of a large data set, without first reading through all the preceding data. The format also allows appending data along one dimension without copying the data set or redefining its structure.
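Efficient access to a small subset is possible because the position of any value can be computed rather than searched for. The sketch below assumes a fixed-size (non-record) variable whose values are stored contiguously in row-major order, with the last dimension varying fastest, starting at a byte offset `begin`. Record variables, which grow along the unlimited dimension, are interleaved record by record and are not covered here. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of one element of a fixed-size variable stored contiguously
 * in row-major order (last dimension varies fastest).
 *   begin     : offset at which the variable's data starts
 *   elem_size : external size of one value, e.g. 4 for a 32-bit float
 *   shape     : length of each dimension
 *   index     : position of the wanted element */
uint64_t element_offset(uint64_t begin, size_t elem_size,
                        size_t ndims, const size_t *shape, const size_t *index) {
    uint64_t linear = 0;
    for (size_t d = 0; d < ndims; d++)
        linear = linear * shape[d] + index[d];   /* Horner-style row-major index */
    return begin + linear * elem_size;
}
```

For a 10 × 20 × 30 variable of 4-byte values starting at offset 100, element (1, 2, 3) has linear index (1·20 + 2)·30 + 3 = 663, so it lies at byte 100 + 663·4 = 2752. A reader can seek directly there without reading any preceding data.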
Since Unidata developed netCDF, other groups and projects in the geosciences have adopted the netCDF interfaces and format, and its use has also spread to other disciplines. Below, we summarize the growth in the use of netCDF, describe the current status of the software including the recent addition of new interfaces, present the benefits of using netCDF for platform-independent data representation, list some current limitations of the netCDF model and format, and discuss how some of these limitations are addressed by new features that are under development for the next version.
As a measure of recent usage, during April and May 1996, over 900 distinct hosts downloaded version 2.4 of netCDF software, and over 1600 distinct hosts in more than 50 countries accessed information on netCDF from the Web site. NetCDF data may now be accessed from over 20 packages of freely available software, including DDI, DODS, EPIC, FAN, FERRET, GMT, GrADS, HDF interface, LinkWinds, SciAn, and Zebra. Access to netCDF data is also available from commercial or licensed packages for data analysis and visualization, including IBM Data Explorer, IDL, GEMPAK, MATLAB, PPLUS, PV-Wave, PolyPaint+, and NCAR Graphics. For more information on these and other packages for manipulating and displaying netCDF data, see (NetCDF Software Web Site, September 1996).
Using the netCDF library interfaces to access data makes knowledge of the format unnecessary, but the lack of a published format specification had proved an obstacle to the adoption of netCDF in some cases. This obstacle was recently removed with the publication of detailed documentation for the netCDF format (Rew et al. 1996).
The unexpectedly widespread use of netCDF means that any future changes to the data model, interfaces, or format must be planned and implemented with great care. Backward compatibility with existing software and data archives is very important to netCDF users and must be part of future development plans.
The recently released netCDF-3 includes a complete rewrite of the netCDF library. The netCDF-3 file format is unchanged, so files written with the new version can be read with previous versions and vice versa.
Starting with netCDF-3, the library is no longer dependent on a vendor-supplied XDR library for external data representation, making it easier to build applications that use netCDF. Replacement of the XDR layer also made the library about twice as fast as the previous version.
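Without XDR, the library itself must produce the external representation, which uses big-endian byte order and IEEE 754 floating point regardless of the host. The following is a minimal sketch of that conversion for 32-bit values, not the netCDF-3 implementation. The function names are hypothetical, and the float routines assume that the host's `float` is a 32-bit IEEE 754 type (the size is checked at compile time).

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

/* Write a 32-bit value most significant byte first (big-endian). */
void put_be_uint32(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value, independent of host byte order. */
uint32_t get_be_uint32(const unsigned char in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Copy the float's bit pattern, then store it in big-endian order. */
void put_be_float(unsigned char out[4], float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    put_be_uint32(out, bits);
}

/* Reverse of put_be_float. */
float get_be_float(const unsigned char in[4]) {
    uint32_t bits = get_be_uint32(in);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because these routines use shifts on byte values, they produce the same bytes on little-endian and big-endian hosts. For example, 1.0f is always written as the bytes 3F 80 00 00.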
The netCDF-3 library is now written in ANSI C. The conversion to ANSI C offered an opportunity to implement a completely new C interface that provides significant benefits to C programs that use netCDF: type safety, automatic type conversions, improved readability, and more standard error behavior (each function returns an error status that the caller can check). The new interface also removes some obstacles to future enhancements, such as packed data and enhanced concurrency.
NetCDF-3 also includes a new Fortran interface that provides analogous benefits: enhanced type safety, automatic type conversion, clean separation of external and language-native types, and a new function-naming scheme for improved readability in applications.
The netCDF-3 library includes support for all netCDF-2 function interfaces, global variables, and behavior. The benefits of the new C and Fortran interfaces will be an incentive to use them in future applications, but current applications that use the netCDF-2 interfaces will continue to work. Programs may be converted to the new interfaces incrementally, since a mixture of netCDF-2 and netCDF-3 calls is permitted.
The facilities for automatic type conversion in the new C and Fortran interfaces permit accessing numeric data using any convenient numeric type, independent of the external type of the data. For example, a user may access a variable as an array of double-precision floating-point numbers, even if the data is stored externally as 8-, 16-, or 32-bit integers or 32-bit floating-point numbers. Application programs can be simpler, since they don't have to deal with multiple external types, and can be more robust, since they continue to work even after a change to the external type of the data. This capability will be required in netCDF-4, when data may be represented externally in a packed form (for example, a packed array of 10-bit data) for which there is no natural corresponding native type.
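The idea behind automatic type conversion can be sketched as follows. The application always asks for doubles, and a switch on the external type does the rest. Converting in the narrowing direction, for example writing doubles into an 8-bit variable, can fail, so that path reports a range error instead of silently wrapping. The type tags, status codes, and function names are hypothetical and are not the netCDF-3 API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical external type tags and status codes for this sketch only. */
typedef enum { EXT_BYTE, EXT_SHORT, EXT_INT, EXT_FLOAT } ext_type;
enum { CONV_OK = 0, CONV_RANGE = 1, CONV_BADTYPE = 2 };

/* Read n values of any supported external type as doubles. Each of these
 * types converts to double exactly, so no range error is possible. */
int get_as_double(ext_type t, const void *ext, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        switch (t) {
        case EXT_BYTE:  out[i] = ((const int8_t *)ext)[i];  break;
        case EXT_SHORT: out[i] = ((const int16_t *)ext)[i]; break;
        case EXT_INT:   out[i] = ((const int32_t *)ext)[i]; break;
        case EXT_FLOAT: out[i] = ((const float *)ext)[i];   break;
        default:        return CONV_BADTYPE;
        }
    }
    return CONV_OK;
}

/* Narrowing direction (double -> signed 8-bit). In-range values are
 * truncated toward zero; out-of-range or NaN values are left unmodified
 * in "out" and the call reports CONV_RANGE. */
int put_double_as_byte(const double *in, int8_t *out, size_t n) {
    int status = CONV_OK;
    for (size_t i = 0; i < n; i++) {
        if (in[i] >= INT8_MIN && in[i] <= INT8_MAX)
            out[i] = (int8_t)in[i];
        else
            status = CONV_RANGE;
    }
    return status;
}
```

An application written against `get_as_double` keeps working unchanged if the external type of a variable later changes from 16-bit integers to 32-bit floats, which is the robustness argument made above.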
Other new features of netCDF-3 include the ability to easily suppress buffering to facilitate sharing data among concurrent programs, the ability to specify whether 8-bit data is treated as signed or unsigned, improved support for 64-bit platforms, and new simple inquiry functions.
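The choice of signed or unsigned treatment matters because a single stored byte has two valid readings. The sketch below shows both interpretations of one raw 8-bit value. The function names are hypothetical, and the sketch makes no claim about how netCDF-3 exposes this choice.

```c
#include <stdint.h>

/* Interpret a raw byte as two's-complement signed, range -128..127. */
int byte_as_signed(uint8_t raw) {
    return raw < 128 ? (int)raw : (int)raw - 256;
}

/* Interpret the same raw byte as unsigned, range 0..255. */
int byte_as_unsigned(uint8_t raw) {
    return (int)raw;
}
```

For example, the byte 0xFF reads as -1 when treated as signed and as 255 when treated as unsigned. Both readings agree only for values 0 through 127.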
FAN (File Array Notation), a new package of utilities for netCDF, was recently made available (Davies 1995). The capabilities of the FAN utilities include extracting and manipulating array data from netCDF files, printing selected data from netCDF arrays, copying ASCII data into netCDF arrays, and performing various operations (sum, mean, max, min, product, ...) on netCDF arrays.
With the current netCDF file format, no more than 2 gigabytes of data can be stored in a single netCDF file. This limitation is a result of 32-bit offsets currently used for storing positions within a file.
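The arithmetic behind the limit is short. A signed 32-bit offset can address bytes 0 through 2^31 − 1, so data must end by byte 2^31 = 2,147,483,648. The check below is a simplified model that ignores the finer points of the format's layout. The function and macro names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* 2 gigabytes = 2^31 bytes: one past the largest offset a signed
 * 32-bit field can hold. */
#define LIMIT_32BIT_OFFSETS ((uint64_t)1 << 31)

/* Simplified check: can data occupying bytes [begin, begin + nbytes) be
 * addressed with signed 32-bit offsets? Written so that begin + nbytes
 * is never computed and therefore cannot overflow. */
bool fits_32bit_offsets(uint64_t begin, uint64_t nbytes) {
    return nbytes <= LIMIT_32BIT_OFFSETS && begin <= LIMIT_32BIT_OFFSETS - nbytes;
}
```

For example, a 1000 × 1000 × 600 grid of 4-byte floats occupies 2,400,000,000 bytes. That exceeds 2,147,483,648, so this grid cannot fit in a single file under the current format, wherever its data begins.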
If it were possible to use a link variable to point to a specified cross-section of data in one or more other files, data could be shared by reference, without copying it. For example, an image loop could be represented by a small file containing a link variable pointing to image data in other files. To an application reading the link variable, it would appear as if the image data were in the file.
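A link variable of the kind described here is only a proposal, so the following is a purely hypothetical sketch. It represents a "file" as an in-memory array and a link as a reference to a contiguous cross-section of it. A read through the link is translated into a read of the target data, so the caller cannot tell the data is stored elsewhere. None of these names (sketch_file, link_var, link_read) come from netCDF.

```c
#include <stddef.h>

/* Hypothetical stand-in for a variable in another file: a named 1-D array. */
typedef struct {
    const char *name;
    const double *data;
    size_t len;
} sketch_file;

/* Hypothetical link variable: holds no data, only a reference to the
 * cross-section [start, start + count) of a variable in another file. */
typedef struct {
    const sketch_file *target;
    size_t start;
    size_t count;
} link_var;

/* Read element i of the link variable as if the data were local.
 * Returns 0 on success, -1 if i lies outside the linked cross-section. */
int link_read(const link_var *lv, size_t i, double *out) {
    if (i >= lv->count || lv->start + i >= lv->target->len)
        return -1;
    *out = lv->target->data[lv->start + i];
    return 0;
}
```

In the image-loop example, the small file would hold one such link per image sequence. Reading element 0 of a link with start 2 returns element 2 of the target, and no image data is copied.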
Specific additions to the netCDF data model might make some of these conventions unnecessary or allow some forms of metadata to be represented in a uniform and compact way. For example, adding explicit georeferencing to the netCDF data model would simplify elaborate georeferencing conventions at the cost of complicating the model. The problem is finding an appropriate trade-off between the richness of the model and its generality (i.e., its ability to encompass many kinds of data). A data model tailored to capture the shared context among researchers within one discipline may not be appropriate for sharing or combining data from multiple disciplines.
Another limitation of the current model is that only one unlimited (changeable) dimension is permitted for each netCDF data set. Multiple variables can share an unlimited dimension, but then they must all grow together. Hence the netCDF model does not permit variables with several unlimited dimensions or the use of multiple unlimited dimensions in different variables within the same file. As a result, variables that have non-rectangular shapes (for example, ragged arrays) cannot be represented conveniently.
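One workaround available within the current model is to store all rows of a ragged array end to end in a single 1-D variable and to record each row's length in a second variable. Every access then requires index bookkeeping, which illustrates the inconvenience described above. The sketch below is one such convention, not a netCDF feature. The type and function names are hypothetical.

```c
#include <stddef.h>

/* A ragged array flattened into two 1-D arrays. */
typedef struct {
    const double *values;   /* all rows concatenated, first row first */
    const size_t *row_len;  /* length of each row */
    size_t nrows;
} ragged;

/* Offset of row r within "values": the sum of the lengths of earlier rows. */
size_t ragged_row_start(const ragged *ra, size_t r) {
    size_t start = 0;
    for (size_t k = 0; k < r; k++)
        start += ra->row_len[k];
    return start;
}

/* Pointer to the first element of row r; its length is stored in *len. */
const double *ragged_row(const ragged *ra, size_t r, size_t *len) {
    *len = ra->row_len[r];
    return ra->values + ragged_row_start(ra, r);
}
```

With row lengths {1, 3, 2} over the values 1 to 6, row 1 is {2, 3, 4} and row 2 starts at offset 4.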
Both predefined and adaptive scaling will be supported. Parameters (scales and offsets) for packing will be permitted to vary along one or more variable dimensions. One or more exact and extreme values may be specified that will be preserved in packing and unpacking. Whether data is packed or not will be transparent to data readers, since the unpacking will be handled by the library. It will be possible to suppress unpacking and read the raw packed data, if desired.
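Adaptive scaling with a preserved special value can be sketched as follows. Scale and offset are chosen from the minimum and maximum of the valid data, so that the data range maps onto the 16-bit codes −32767 to 32767. The code −32768 is reserved so that one special value survives packing and unpacking exactly. The sketch uses the relation unpacked = packed × scale + offset, the same relation as the existing scale_factor and add_offset attribute conventions. Everything else is an assumption for illustration: the 16-bit packed type, the reserved code, and the function names. This is not the planned netCDF-4 design. Parameters that vary along a dimension would amount to computing a separate pack_params for each slice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved code for one special value that must be preserved exactly. */
#define PACKED_SPECIAL INT16_MIN

typedef struct { double scale; double offset; } pack_params;

/* Adaptive scaling: map [min, max] of the non-special data onto
 * [-32767, 32767], leaving -32768 free for the special value. */
pack_params adaptive_params(const double *x, size_t n, double special) {
    pack_params p = { 1.0, 0.0 };   /* fallback when there is no valid data */
    int found = 0;
    double lo = 0.0, hi = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] == special) continue;   /* special values do not affect scaling */
        if (!found || x[i] < lo) lo = x[i];
        if (!found || x[i] > hi) hi = x[i];
        found = 1;
    }
    if (found) {
        p.offset = (hi + lo) / 2.0;
        if (hi > lo) p.scale = (hi - lo) / (2.0 * 32767.0);
    }
    return p;
}

int16_t pack_value(double v, double special, pack_params p) {
    if (v == special) return PACKED_SPECIAL;        /* preserved exactly */
    double q = (v - p.offset) / p.scale;
    if (q > 32767.0) q = 32767.0;                   /* clamp, keeping -32768 reserved */
    if (q < -32767.0) q = -32767.0;
    return (int16_t)(q < 0 ? q - 0.5 : q + 0.5);    /* round half away from zero */
}

/* Unpacking, which the library would do transparently for readers. */
double unpack_value(int16_t c, double special, pack_params p) {
    if (c == PACKED_SPECIAL) return special;
    return c * p.scale + p.offset;                  /* unpacked = packed * scale + offset */
}
```

For the data {0, 10, −999, 5} with −999 as the special value, the offset is 5 and the extremes 0 and 10 pack to −32767 and 32767. The value −999 round-trips exactly, and other values are recovered to within half of one scale step.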
The current 2 Gbyte file size limitation will also be eliminated with the use of 64-bit offsets in netCDF-4.
To support packed data and larger file sizes, netCDF-4 will require a new format, the first format change for netCDF. For backward compatibility with programs and data archives that use the current netCDF format, the netCDF-4 software must support access to data in both the old and new formats. Fortunately, netCDF already includes a format version number in the file format, so the library can determine which format a file uses, and users and programs need not know whether they are accessing data in the old or new format. It will not be possible to add packed data to old-format files, but otherwise the change should be relatively transparent.
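The version number is cheap to check because it sits at the very start of the file. A netCDF file begins with the bytes 'C', 'D', 'F' followed by a version byte, which is 1 for the current format. A library that supports several formats can read these four bytes and dispatch to the matching reader. The function below is a hypothetical sketch of that first step.

```c
#include <stddef.h>

/* Inspect the first bytes of a file. Returns the format version byte
 * (1 for the current netCDF format), or -1 if the header does not start
 * with the netCDF magic characters 'C', 'D', 'F'. */
int netcdf_format_version(const unsigned char *hdr, size_t len) {
    if (len < 4 || hdr[0] != 'C' || hdr[1] != 'D' || hdr[2] != 'F')
        return -1;
    return hdr[3];
}
```

A reader would call this on the first four bytes of a file, then select the decoding logic for version 1 or for any newer version it understands. Files in unknown formats can be rejected cleanly instead of being misread.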
It may be possible to add link variables for indirect data access to netCDF-4 as well. Our plans for this addition tentatively include the use of a combination of URL and FAN notation for specifying references to cross-sections of data in other files or on other hosts. This has the potential to make the usefulness of data independent of its location, permitting all the members of a virtual community to view and make use of their data holdings as a common resource.
Finally, we were surprised at how easy it was to provide a Java interface for netCDF data, based on an initial read-only Java interface from Joe Sirott (Java Climate Atlas Web Site, July 1996). The use of a Java-based approach to the design and implementation of distributed data access systems appears very promising. Systems based on Java's Remote Method Invocation package (RMI Web Site, September 1996) with portable data in forms such as netCDF may be able to provide powerful new capabilities that will be important for future applications, including independence from data location; executable content that is part of the metadata (for example, for georeferencing data); platform-independence for applications; the ability to write data clients and servers using simple abstract interfaces for data access; and a hierarchy of rich object models that make it easy to customize data access for particular applications.
Davies, H. L., 1995. "FAN - An array-oriented query language," Second Workshop on Database Issues for Data Visualization (Visualization '95), Atlanta, Georgia, IEEE.
DODS Web Site
http://dods.gso.uri.edu/DODS/
Fulker, D. W., 1991. "Unidata strawman for storing earth-referencing data," Seventh International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, New Orleans, Am. Meteor. Soc.
Kuehn, J. A., 1996. "Faster libraries for creating network-portable self-describing datasets," Proceedings of the 37th Cray User Group Meeting, Barcelona, Spain, Cray User Group.
NetCDF Web Site
http://www.unidata.ucar.edu/software/netcdf/
NetCDF Software Web Site
http://www.unidata.ucar.edu/software/netcdf/software.html
NetCDF Conventions Web Site
http://www.unidata.ucar.edu/software/netcdf/conventions.html
RMI Web Site
http://chatsubo.javasoft.com/current/
Rew, R. K. and G. P. Davis, 1990. "The Unidata netCDF: software for scientific data access," Sixth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Anaheim, California, Am. Meteor. Soc., 33-40.
Rew, R. K., G. P. Davis, S. Emmerson, and H. Davies, 1996. NetCDF User's Guide, An Interface for Data Access. Available as PostScript or on the Web at http://www.unidata.ucar.edu/software/netcdf/docs.html.
Sirott Web Site
http://cosmo.atmos.washington.edu/