
[netCDF #AIQ-275071]: [netcdf-hdf] Unexpected overall file size jump



Hi James,

> We recently began working on a transition from netcdf 3.6.2 to 4.1.1.
> 
> The process was trouble free and things seem to be working, but we have
> been surprised to find the HDF variant producing extremely large files
> relative to the old netcdf native form.  Our measurement files are already
> enormous, and further growth would be deadly.
> 
> Has anyone else encountered this?

There is a larger fixed-size overhead for metadata (names and
properties of variables, dimensions, and attributes) in the HDF5-based
netCDF-4 format, but in our experience, it's not significant for files
with lots of data and only a moderate amount of metadata.  And use of
compression can make equivalent netCDF-4 files significantly smaller
than netCDF-3 classic format files.

As an example from our netCDF training workshop, consider a file with
only one dimension of size 2 and one variable that uses that
dimension.  It is very small in the netCDF classic or 64-bit offset
formats, while the fixed overhead dominates in the netCDF-4 formats
(sizes in bytes):

    88  test.nc1   # classic format
    92  test.nc2   # 64-bit offset format
  5072  test.nc3   # netCDF-4 format
  5108  test.nc4   # netCDF-4 classic model format

However, if you change the dimension size to 10000, the sizes are much
closer: 

 40080  test.nc1   # classic format
 40084  test.nc2   # 64-bit offset format
 45064  test.nc3   # netCDF-4 format
 45101  test.nc4   # netCDF-4 classic model format
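
To make the overhead comparison concrete, here is a minimal Python
sketch that subtracts the raw data bytes from the sizes listed above.
It assumes the variable holds 4-byte values, which is only inferred
from the classic sizes ((40080 - 88) / (10000 - 2) = 4); the function
names are hypothetical.

```python
# Sizes in bytes, copied from the two listings above:
# (dimension size 2, dimension size 10000).
SIZES = {
    "classic": (88, 40080),
    "64-bit offset": (92, 40084),
    "netCDF-4": (5072, 45064),
    "netCDF-4 classic model": (5108, 45101),
}

# Assumption: 4 bytes per value, inferred from (40080 - 88) / (10000 - 2).
BYTES_PER_VALUE = 4


def overhead(file_size, n_values, bytes_per_value=BYTES_PER_VALUE):
    """Bytes in the file that are not raw variable data."""
    return file_size - n_values * bytes_per_value


def overhead_percent(file_size, n_values):
    """Overhead as a percentage of the whole file."""
    return 100.0 * overhead(file_size, n_values) / file_size


if __name__ == "__main__":
    for name, (small, large) in SIZES.items():
        print(f"{name:24s} overhead: {overhead(small, 2):5d} B at size 2, "
              f"{overhead(large, 10000):5d} B at size 10000 "
              f"({overhead_percent(large, 10000):.1f}% of the file)")
```

Under that assumption the overhead stays at 80 bytes for classic and
5064 bytes for netCDF-4 at both dimension sizes, so its share of the
file shrinks as the data grows.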

And if you apply level-1 compression to the variable in the netCDF-4
format, the netCDF-4 files are roughly half the size of the classic
file for this (artificial) data:

 40080  test.nc1   # classic format
 40084  test.nc2   # 64-bit offset format
 21055  test.nc3   # netCDF-4 format
 21092  test.nc4   # netCDF-4 classic model format

Finally, if you apply the shuffle filter along with compression for
this test file, the netCDF-4 files shrink to less than a fifth of the
classic size:

 40080  test.nc1   # classic format
 40084  test.nc2   # 64-bit offset format
  7777  test.nc3   # netCDF-4 format
  7814  test.nc4   # netCDF-4 classic model format
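
For intuition about why shuffling helps, here is a rough Python sketch
of a byte shuffle followed by level-1 deflate, using only the standard
library.  It is not the HDF5 filter itself: the shuffle() and
unshuffle() helpers are hypothetical, and the data (10000 consecutive
4-byte integers) is an assumption, since the contents of the test
file are not given here.  The resulting sizes will not match the
listings above.

```python
import struct
import zlib


def shuffle(data: bytes, item_size: int) -> bytes:
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    if len(data) % item_size:
        raise ValueError("data length must be a multiple of item_size")
    return b"".join(data[i::item_size] for i in range(item_size))


def unshuffle(data: bytes, item_size: int) -> bytes:
    """Inverse of shuffle(): interleave the byte planes back into items."""
    if len(data) % item_size:
        raise ValueError("data length must be a multiple of item_size")
    n_items = len(data) // item_size
    out = bytearray(len(data))
    for i in range(item_size):
        out[i::item_size] = data[i * n_items:(i + 1) * n_items]
    return bytes(out)


# Assumed test data: 10000 consecutive little-endian 4-byte integers.
N = 10000
raw = struct.pack(f"<{N}i", *range(N))

plain = zlib.compress(raw, 1)                  # deflate level 1 only
shuffled = zlib.compress(shuffle(raw, 4), 1)   # shuffle, then deflate level 1

if __name__ == "__main__":
    print("raw bytes:        ", len(raw))
    print("deflate only:     ", len(plain))
    print("shuffle + deflate:", len(shuffled))
```

The shuffle puts the slowly varying high-order bytes next to each
other, which gives the compressor long runs of repeated bytes to work
with.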

It's easy to run little experiments like this with the "nccopy"
utility in the latest netCDF snapshot release (soon to be in version
4.1.2), as you can specify conversions and compression on the command
line: 

  
http://www.unidata.ucar.edu/netcdf/workshops/2010/utilities/NccopyExamples.html
  
This is a very artificial example, and it's unlikely you'll get
results as good with your real data, but experimenting with nccopy's
compression options on some real data should show what you can
expect from netCDF-4 for your data.
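
As a starting point for such experiments, here is a small Python
sketch that assembles nccopy command lines for the conversions
described above.  The nccopy_command() helper and the file names are
hypothetical; the -k (output kind), -d (deflate level), and -s
(shuffle) options are nccopy's own, but check them against the man
page for your installed version.  The sketch only prints the commands
and does not run them.

```python
def nccopy_command(src, dst, kind=None, deflate=None, shuffle=False):
    """Build an nccopy argument list (hypothetical helper, runs nothing)."""
    cmd = ["nccopy"]
    if kind is not None:
        # Numeric kinds: 1 classic, 2 64-bit offset,
        # 3 netCDF-4, 4 netCDF-4 classic model.
        cmd += ["-k", str(kind)]
    if deflate is not None:
        if not 0 <= deflate <= 9:
            raise ValueError("deflate level must be between 0 and 9")
        cmd += ["-d", str(deflate)]
    if shuffle:
        cmd.append("-s")
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    experiments = [
        # Plain conversion to netCDF-4, no compression.
        nccopy_command("test.nc1", "test.nc3", kind=3),
        # Level-1 compression.
        nccopy_command("test.nc1", "test.nc3", kind=3, deflate=1),
        # Level-1 compression plus the shuffle filter.
        nccopy_command("test.nc1", "test.nc3", kind=3, deflate=1, shuffle=True),
    ]
    for cmd in experiments:
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run() once nccopy is on your
PATH, and comparing the output file sizes then gives the same kind of
table as above for your own data.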

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: AIQ-275071
Department: Support netCDF
Priority: Normal
Status: Closed