
[netCDF #AIQ-275071]: [netcdf-hdf] Unexpected overall file size jump



James,

> I am advised that you should be able to get the following via
> anonymous ftp:
> 
>   ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.fnc
>   ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.fnc-nccopy-k3
>   ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.snc
>   ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.snc-nccopy-k3

Thanks, I see what you mean!  We'll have to investigate why the 
netCDF-4 copies of these netCDF classic format files are so much 
larger than expected (for example, a 42 MB classic file becomes a 96 MB 
netCDF-4 file, even though ncdump shows only a small amount of metadata).  
I don't currently have an explanation, but it could be a bug.

--Russ

> >> Thanks for the reply.  If the difference were metadata, wouldn't we
> >> expect to see the greatest difference between the netcdf-3 format
> >> and HDF with smaller data files?  In fact, we're finding the
> >> opposite.
> >
> > Yes: with only a moderate amount of metadata, HDF5 files would be much
> > larger than classic files when there is little data, but similar in
> > size when there is a large amount of data.
> >
> > If, however, you had lots of metadata (for example 5000 variables and
> > 5000 dimensions), then the HDF5 files might appear significantly larger
> > even with lots of data.
> >
> >> We would like to share some larger data files with you guys in
> >> order to better understand the situation.  Would you be willing to
> >> pick some data up from our ftp site?
> >
> > Yes, that would be useful.
> >
> > --Russ
> >
> >> > Hi James,
> >> >
> >> >> We recently began working on a transition from netcdf 3.6.2 to 4.1.1.
> >> >>
> >> >> The process was trouble free and things seem to be working, but we have
> >> >> been surprised to find the HDF variant producing extremely large files
> >> >> relative to the old netcdf native form.  Our measurement files are
> >> >> already enormous, and further growth would be deadly.
> >> >>
> >> >> Has anyone else encountered this?
> >> >
> >> > There is a larger fixed-size overhead for metadata (names and
> >> > properties of variables, dimensions, and attributes) in the HDF5-based
> >> > netCDF-4 format, but in our experience, it's not significant for files
> >> > with lots of data and only a moderate amount of metadata.  And use of
> >> > compression can make equivalent netCDF-4 files significantly smaller
> >> > than netCDF-3 classic format files.
> >> >
> >> > As an example we use in our netCDF training workshop, a small netCDF
> >> > classic format file with only one dimension of size 2 and one variable
> >> > that uses that dimension is very small using netCDF classic or 64-bit
> >> > offset formats:
> >> >
> >> >     88  test.nc1   # classic format
> >> >     92  test.nc2   # 64-bit offset format
> >> >   5072  test.nc3   # netCDF-4 format
> >> >   5108  test.nc4   # netCDF-4 classic model format
> >> >
> >> > However, if you change the dimension size to 10000, the sizes are much
> >> > closer:
> >> >
> >> >  40080  test.nc1   # classic format
> >> >  40084  test.nc2   # 64-bit offset format
> >> >  45064  test.nc3   # netCDF-4 format
> >> >  45101  test.nc4   # netCDF-4 classic model format
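The two tables above fit a simple fixed-overhead model, file_size = overhead + n * elem_size. The Python sketch below is only an illustration of that arithmetic: it solves for the element size from the two classic-format sizes (the message does not say what type the test variable uses; the numbers imply 4 bytes) and then subtracts the raw data from each file. The result is 80 bytes of overhead for classic at both dimension sizes and 5064 bytes for netCDF-4 at both sizes, so the netCDF-4 penalty is a constant that shrinks in relative terms as the data grows.

```python
# Fixed-overhead model fitted to the file sizes quoted in the tables above:
#     file_size = overhead + n * elem_size
# where n is the dimension length (2 or 10000). The test variable's type is
# not stated in the message, so elem_size is solved for from the data.

SIZES = {  # format -> (bytes with n=2, bytes with n=10000)
    "classic": (88, 40080),
    "64-bit offset": (92, 40084),
    "netCDF-4": (5072, 45064),
    "netCDF-4 classic model": (5108, 45101),
}
N_SMALL, N_LARGE = 2, 10000


def element_size(small: int, large: int) -> float:
    """Bytes per value implied by the growth between the two file sizes."""
    return (large - small) / (N_LARGE - N_SMALL)


def overhead(total: int, n: int, elem_size: int) -> int:
    """Bytes in the file that are not raw variable data."""
    return total - n * elem_size


# The classic sizes imply 4-byte values (e.g. int or float).
ELEM = round(element_size(*SIZES["classic"]))

if __name__ == "__main__":
    print(f"implied element size: {ELEM} bytes")
    for fmt, (small, large) in SIZES.items():
        print(f"{fmt:24s} overhead n=2: {overhead(small, N_SMALL, ELEM):5d}"
              f"  n=10000: {overhead(large, N_LARGE, ELEM):5d}")
    for n, i in ((N_SMALL, 0), (N_LARGE, 1)):
        ratio = SIZES["netCDF-4"][i] / SIZES["classic"][i]
        print(f"n={n}: netCDF-4 file is {ratio:.2f}x the classic file")
```

For the 2-element file the netCDF-4 version is about 57.6 times the classic size, but for the 10000-element file it is only about 1.12 times. This models only the single-variable test file; it does not account for the Fluid_Meas size jump.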
> >> >
> >> > And if you apply level-1 compression to the variable in the netCDF-4
> >> > format, the netCDF-4 file is significantly smaller for this
> >> > (artificial) data:
> >> >
> >> >  40080  test.nc1   # classic format
> >> >  40084  test.nc2   # 64-bit offset format
> >> >  21055  test.nc3   # netCDF-4 format
> >> >  21092  test.nc4   # netCDF-4 classic model format
> >> >
> >> > Finally, if you apply the shuffle filter along with compression for
> >> > this test file, the result is significantly better compression:
> >> >
> >> >  40080  test.nc1   # classic format
> >> >  40084  test.nc2   # 64-bit offset format
> >> >   7777  test.nc3   # netCDF-4 format
> >> >   7814  test.nc4   # netCDF-4 classic model format
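To show why shuffle helps, here is a small stdlib-only Python sketch of the byte-shuffle idea followed by deflate (zlib) at level 1. It is not the HDF5 filter code. The data is a hypothetical stand-in, because the message does not say what values the artificial test variable held: it uses 10000 consecutive little-endian 4-byte integers. Shuffling stores byte 0 of every element, then byte 1, and so on. This puts the mostly-constant high-order bytes into long runs that deflate compresses far better than the interleaved original.

```python
import struct
import zlib


def shuffle(data: bytes, elem_size: int) -> bytes:
    """Byte shuffle: emit byte 0 of every element, then byte 1, and so on.
    Assumes len(data) is a multiple of elem_size."""
    return b"".join(data[b::elem_size] for b in range(elem_size))


def unshuffle(data: bytes, elem_size: int) -> bytes:
    """Inverse of shuffle(): interleave the byte planes back into elements."""
    n = len(data) // elem_size
    out = bytearray(len(data))
    for b in range(elem_size):
        out[b::elem_size] = data[b * n:(b + 1) * n]
    return bytes(out)


# Hypothetical stand-in for the artificial test variable:
# 10000 little-endian 4-byte integers 0, 1, ..., 9999.
N = 10000
raw = struct.pack(f"<{N}i", *range(N))
shuffled = shuffle(raw, 4)

deflated = zlib.compress(raw, 1)                # deflate level 1 only
shuffled_deflated = zlib.compress(shuffled, 1)  # shuffle, then deflate level 1

if __name__ == "__main__":
    print("uncompressed:       ", len(raw))
    print("deflate level 1:    ", len(deflated))
    print("shuffle + deflate 1:", len(shuffled_deflated))
```

Shuffle is lossless and adds no bytes. It only reorders them, so the data is restored exactly after decompression and unshuffling.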
> >> >
> >> > It's easy to run little experiments like this with the "nccopy"
> >> > utility in the latest netCDF snapshot release (soon to be in version
> >> > 4.1.2), as you can specify conversions and compression on the command
> >> > line:
> >> >
> >> >   http://www.unidata.ucar.edu/netcdf/workshops/2010/utilities/NccopyExamples.html
> >> >
> >> > This is a very artificial example and it's unlikely you'll get
> >> > results as good with your real data, but experimenting with nccopy's
> >> > compression options on some real data should show what you can
> >> > expect from using netCDF-4 for your data.
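One way to script such experiments is the minimal Python sketch below. It builds nccopy argument lists for the conversions discussed above. It assumes the options documented for current nccopy releases: `-k` with 1 through 4 for classic, 64-bit offset, netCDF-4, and netCDF-4 classic model; `-d` for deflate level 0 through 9; and `-s` for shuffle. The 4.1.2 snapshot may not accept all of these, so check the usage message of the nccopy you have. The file names are hypothetical.

```python
import shlex
from typing import List, Optional


def nccopy_cmd(src: str, dst: str, kind: Optional[int] = None,
               deflate: Optional[int] = None, shuffle: bool = False) -> List[str]:
    """Build an nccopy argument list (assumed flags: -k, -d, -s).

    kind:    output format for -k (1 classic, 2 64-bit offset,
             3 netCDF-4, 4 netCDF-4 classic model).
    deflate: compression level 0-9 for -d (netCDF-4 output only).
    shuffle: add -s; only useful together with -d.
    """
    cmd = ["nccopy"]
    if kind is not None:
        cmd += ["-k", str(kind)]
    if deflate is not None:
        if not 0 <= deflate <= 9:
            raise ValueError("deflate level must be 0-9")
        cmd += ["-d", str(deflate)]
    if shuffle:
        cmd.append("-s")
    return cmd + [src, dst]


if __name__ == "__main__":
    # Hypothetical output names for the three netCDF-4 variants in the tables.
    variants = [
        ("test-k3.nc", {"kind": 3}),
        ("test-k3-d1.nc", {"kind": 3, "deflate": 1}),
        ("test-k3-d1-s.nc", {"kind": 3, "deflate": 1, "shuffle": True}),
    ]
    for dst, opts in variants:
        print(shlex.join(nccopy_cmd("test.nc1", dst, **opts)))
```

The printed commands can be run with subprocess.run(..., check=True) where nccopy is installed. Comparing the output file sizes then repeats the experiment above on real data.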
> >> >
> >> > --Russ
> >> >
> >> > Russ Rew                                         UCAR Unidata Program
> >> > address@hidden                      http://www.unidata.ucar.edu
> >> >
> >> >
> >> >
> >> > Ticket Details
> >> > ===================
> >> > Ticket ID: AIQ-275071
> >> > Department: Support netCDF
> >> > Priority: Normal
> >> > Status: Closed
> >> >
> >>
> >>
> >>
> >
> >
> 
> 
> 
