
[netCDF #KSJ-559079]: Netcdf 4 file format's general



Hi Jin,

> > 1. If we use the netCDF-4 file format, does it store a larger
> > amount of data than netCDF-3?
> It is not easy to answer that question. I can think of
> two factors that will affect the answer.
> 
> First, if fill is not used in either case, then netcdf-3 files
> should be smaller than a corresponding
> netcdf-4 file. If fill is enabled, then it may be the case that
> the netcdf-4 file is smaller because, if memory serves, it can
> avoid actually allocating space for the fill data until the data is
> actually read.
> 
> Second, the record dimension (UNLIMITED) can affect the space used.
> In netcdf-3, if many variables have a first unlimited dimension, and
> the number of records has to grow for only a single variable, then
> significant space can be allocated for the other variables as well.
> The netcdf-4 format can avoid this.

Right, and those are important for large datasets.  But for small datasets or
large datasets represented in many small files, HDF5 will generally require
more space due to larger fixed overhead per variable, and use of B-trees
internally for indexing chunks.  There are some pathological examples,
e.g. a large number of record variables, each with a small number of values 
per record, where netCDF-3 can store the data very compactly compared
with HDF5.
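
To make the record-dimension effect quoted above concrete, here is a rough
sketch of the arithmetic for the netCDF-3 record section. It assumes the
classic layout, in which records are interleaved and every record holds one
slab for each record variable. The variable counts and sizes are made up for
illustration, and the helper names are hypothetical, not part of any netCDF
API.

```python
def padded(nbytes, align=4):
    """Round nbytes up to a multiple of align (classic format pads slabs to 4 bytes).

    Simplification: the special case of a file with exactly one record
    variable, where no padding is applied, is ignored here.
    """
    return (nbytes + align - 1) // align * align


def classic_record_bytes(per_record_sizes, numrecs):
    """Bytes used by the record section: numrecs interleaved records,
    each holding one padded slab for every record variable."""
    record_size = sum(padded(n) for n in per_record_sizes)
    return numrecs * record_size


# Hypothetical file: 10 record variables, each 100 floats (400 bytes) per record.
sizes = [400] * 10

# One variable needs 1000 records; the other nine only need 10 each.
allocated = classic_record_bytes(sizes, numrecs=1000)
needed = 400 * 1000 + 9 * 400 * 10

print(allocated, needed)  # 4000000 436000
```

Because the number of records is shared by all record variables, growing one
variable to 1000 records reserves space for 1000 records of the other nine as
well. The sketch says nothing about the HDF5 side, where the per-variable
overhead mentioned above would have to be measured rather than computed this
simply.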

> > 2. If we use netCDF-4, is it faster than netCDF-3 to write and read?
> I am not sure of the answer. Perhaps other people here can comment. [Russ?].

It depends.  The use of chunking and compression can make accessing
subsets of multidimensional data in netCDF-4 significantly faster than
netCDF-3 in some cases.  However, netCDF-4 access can be slower if the
chunk shapes and sizes aren't appropriate for common data access
patterns, especially if large chunks need to be uncompressed to access
small amounts of data, or if chunks must be repeatedly compressed or
uncompressed due to an inadequate chunk cache.
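
As a rough illustration of how chunk shape interacts with the access pattern,
the sketch below counts how many chunks a subset read touches and how many
values would have to be uncompressed per value requested. It assumes that
every touched chunk is read and uncompressed whole and that nothing is served
from the chunk cache. The array shape, chunk shapes, and function names are
hypothetical, chosen only for the example.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Number of chunks intersecting the subset that begins at `start`
    and spans `count` values along each dimension."""
    n = 1
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch              # index of first chunk hit on this axis
        last = (s + c - 1) // ch     # index of last chunk hit on this axis
        n *= last - first + 1
    return n


def read_amplification(start, count, chunk_shape):
    """Values uncompressed divided by values actually requested."""
    values_read = chunks_touched(start, count, chunk_shape) * prod(chunk_shape)
    return values_read / prod(count)


# Hypothetical variable of shape (time=1000, lat=180, lon=360);
# request a full time series at a single grid point.
start, count = (0, 90, 180), (1000, 1, 1)

for shape in [(1, 180, 360), (100, 18, 36), (1000, 1, 1)]:
    print(shape, chunks_touched(start, count, shape),
          read_amplification(start, count, shape))
```

With one time step per chunk, the time-series read touches 1000 chunks and
uncompresses 64800 values for every value wanted. The intermediate shape
touches 10 chunks, and a chunk shaped like the request touches one. The same
chunk shapes would rank in the opposite order for a read of one full
horizontal slice, which is why the common access pattern matters.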

If you are just reading data in the same order in which it was written
and data is not compressed, the two formats are approximately
equivalent.  NetCDF-4 access is faster when there is a very large
number of variables or attributes, as it indexes those for O(log N)
access, whereas netCDF-3 just locates variables and attributes by name
in a file with a simple O(N) search, where N is the number of
attributes or variables.
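
The difference between the two lookup costs can be sketched by counting
comparisons. This is only an illustration: a binary search over a sorted list
stands in for an O(log N) index, and is not a description of how either
library is actually implemented. The variable names are made up.

```python
def linear_lookup(names, target):
    """Scan names in order; return (index, comparisons made)."""
    comparisons = 0
    for i, name in enumerate(names):
        comparisons += 1
        if name == target:
            return i, comparisons
    return -1, comparisons


def binary_lookup(sorted_names, target):
    """Binary search over a sorted list; return (index, comparisons made
    while narrowing the range)."""
    comparisons = 0
    lo, hi = 0, len(sorted_names)
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_names) and sorted_names[lo] == target
    return (lo if found else -1), comparisons


# 10000 hypothetical variable names; zero-padding keeps them in sorted order.
names = [f"var{i:05d}" for i in range(10000)]
target = "var09999"

print(linear_lookup(names, target))  # worst case: every name is compared
print(binary_lookup(names, target))  # roughly log2(N) comparisons
```

For the last of 10000 names, the linear scan makes 10000 comparisons while
the binary search needs at most 14, so the gap only matters once the number
of variables or attributes becomes very large.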

For parallel I/O, netCDF-3 has a performance advantage, due to the
simpler data layout.

For adding new metadata to an existing file, netCDF-4 is superior,
because it never has to move data to make space for large amounts of
new metadata in a file header; metadata is simply appended.

--Russ

> > 3. Does the netCDF Java library use the netCDF C library to read files
> > in netCDF-4 format? Is it a faster way to read the files?
> For reading, the Java library does not actually need to use the C library.
> However Java is likely to be slower to some degree than the C library.
> I should note that in the newest Java library, netCDF-4 file writing
> is possible, and it uses the C library to do that writing.
> 
> 
> 
> 
> =Dennis Heimbigner
> Unidata
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: KSJ-559079
Department: Support netCDF
Priority: Normal
Status: Closed