
[netCDF #AMR-714212]: netcdf file size for limited vs unlimited



Hi Oscar,

> Writing multiple 1-dimensional variables (a time-series) to a
> netcdf-file formatted as nc_type "short", I noticed that the file
> becomes twice as large when using an unlimited versus a limited
> dimension definition.
> 
> However:
> 1. Writing one variable of nc_type 'short' only, both the limited and
> unlimited dimension files are of the same size...
> 2. Writing all data as floats the limited and unlimited dimension
> nc-files are of equal size (double the size of the limited dimension
> file of type short; as expected). It seems that using multiple variables
> of unlimited dimension means that the data is always written as a
> float?, or am I doing something wrong?

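Dennis's answer was close, in that you need to know something about the
underlying netCDF-classic format to explain this.  The reason is that the
space for each variable's data in a record is padded up to the next multiple
of 4 bytes.  This makes sure each variable's data starts on a 4-byte
boundary, which is an optimization for disk seeks on some platforms.  With
two short record variables, each 2-byte value therefore occupies 4 bytes in
every record, which doubles the file size.  Floats are already 4 bytes wide,
so they need no padding and the limited and unlimited files come out equal.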

There is a special case if there is only one record variable, in which case
no padding is used for character, byte, or short variables.  That is why
your single-variable test showed no size difference.  These padding rules are
documented in the format specification:

  http://www.unidata.ucar.edu/netcdf/docs/netcdf.html#NetCDF-Classic-Format

and specifically in the description of the "varslab", which is a record's worth
of data for a single variable, along with the special note at the end of the
specification on padding:

  Note on padding: In the special case of only a single record variable of 
character, byte, or short type, no padding is used between data values. 
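
As a rough check of the padding rules, the data sizes can be worked out by hand. The following is only a sketch in Python, not part of the netCDF library: the helper names `padded` and `data_bytes` are hypothetical, and the file header (typically small) is ignored.

```python
def padded(nbytes, boundary=4):
    """Round nbytes up to the next multiple of boundary."""
    return -(-nbytes // boundary) * boundary


def data_bytes(n_values, type_size, n_vars, unlimited):
    """Approximate data size (header excluded) for n_vars 1-D variables."""
    if not unlimited:
        # Fixed-size variable: the whole array is padded once at its end.
        return n_vars * padded(n_values * type_size)
    if n_vars == 1:
        # Special case: a single record variable is stored without padding.
        return n_values * type_size
    # Several record variables: each variable's slab in every record is padded.
    return n_values * n_vars * padded(type_size)


N = 80000
cases = [("2 x short", 2, 2), ("1 x short", 2, 1), ("2 x float", 4, 2)]
for label, type_size, n_vars in cases:
    lim = data_bytes(N, type_size, n_vars, unlimited=False)
    unlim = data_bytes(N, type_size, n_vars, unlimited=True)
    print(label, lim, unlim)
# 2 x short 320000 640000
# 1 x short 160000 160000
# 2 x float 640000 640000
```

320000 bytes is 312.5 KiB and 640000 bytes is 625 KiB, which agrees with the 312kB and 625kB file sizes reported in the question.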

As for a way to get around this problem, all I can think of is to use an
extra artificial dimension to make the short variables 2-dimensional, such as:

netcdf unlim2 {
dimensions:
  time = unlimited;
  two = 2;
variables:
  short var1(time, two);
  short var2(time, two);
data:
  var1 = 
 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19;
  var2 = 
 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19;
}

You can still read these values all at once in a contiguous block, and a
small layer of software would let you write the values two at a time: a
function you call for each value would hold on to the first value of a pair
and write to the file once it has both.
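
One possible shape for that small layer, sketched in Python rather than Matlab. The class `PairBuffer` and its `write_row` callback are hypothetical names; in real use the callback would write one (time, two) record with whatever netCDF interface you have. Padding a trailing odd value with a fill value is an assumption added here, not something the format requires.

```python
class PairBuffer:
    """Collects single values and hands them on two at a time."""

    def __init__(self, write_row):
        self._write_row = write_row  # called with a (first, second) tuple
        self._pending = None
        self._has_pending = False

    def put(self, value):
        """Hold the first value of a pair; write when the second arrives."""
        if not self._has_pending:
            self._pending = value
            self._has_pending = True
        else:
            self._write_row((self._pending, value))
            self._has_pending = False

    def flush(self, fill_value):
        """Write a leftover odd value, padded with fill_value (assumption)."""
        if self._has_pending:
            self._write_row((self._pending, fill_value))
            self._has_pending = False


# Stand-in for the file: collect the rows in a list.
rows = []
buf = PairBuffer(rows.append)
for v in range(5):
    buf.put(v)
buf.flush(fill_value=-32767)
print(rows)  # [(0, 1), (2, 3), (4, -32767)]
```

With this layout each record holds 4 bytes per variable, so no padding is added: 80000 values in each of two variables take 40000 records of 8 bytes, or 320000 bytes, the same as the limited-dimension file.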

--Russ


> The files I write are quite large and I need to use an unlimited
> dimension as I don't know the record length in advance (I join multiple
> files into one netcdf file) but I don't like to waste double the
> disk-space for my nc-files.
 
> I use Matlab to write nc-files and I tried the Matlab-native netcdf
> commands (example below), but also the mexcdf-toolbox and snctools. All
> give the same result. This seems to be more a netcdf than a Matlab
> issue. Any help is much appreciated though.
> 
> 
> EXAMPLE1 to illustrate this issue (Matlab native commands):
> %%%%%%%%%%%%%%%%%%%%%%%%
> N=80000;
> 
> % LIMITED dimension
> % creating a netcdf file
> nc = netcdf.create('testfile_lim.nc', 'NC_CLOBBER');
> % define dimension
> time_dim = netcdf.defDim(nc, 'time', N);
> % define variables
> var1_id = netcdf.defVar(nc, 'var1', 'short', time_dim);
> var2_id = netcdf.defVar(nc, 'var2', 'short', time_dim);
> netcdf.endDef(nc);
> % write data
> netcdf.putVar(nc, var1_id,int16([1:N]));
> netcdf.putVar(nc, var2_id,int16([1:N]));
> % close nc-file
> netcdf.close(nc)
> 
> % UNLIMITED dimension
> % creating a netcdf file
> nc = netcdf.create('testfile_unlim.nc', 'NC_CLOBBER');
> % define dimension
> time_dim = netcdf.defDim(nc, 'time', netcdf.getConstant('NC_UNLIMITED'));
> % define variables
> var1_id = netcdf.defVar(nc, 'var1', 'short', time_dim);
> var2_id = netcdf.defVar(nc, 'var2', 'short', time_dim);
> netcdf.endDef(nc);
> % write data
> netcdf.putVar(nc, var1_id,0,N,int16([1:N]));
> netcdf.putVar(nc, var2_id,0,N,int16([1:N]));
> % close nc-file
> netcdf.close(nc)
> %%%%%%%%%%%%%%%%%%%%%%%%
> 
> testfile_lim.nc => 312kB
> testfile_unlim.nc => 625kB
> 
> 
> 
> EXAMPLE2 to illustrate this issue (mexcdf commands):
> %%%%%%%%%%%%%%%%%%%%%%%%
> N=80000;
> 
> nc_lim = netcdf( 'test_lim.nc' , 'clobber');
> nc_unlim = netcdf( 'test_unlim.nc' , 'clobber');
> 
> nc_lim('time') = N;
> nc_unlim('time') = 0;
> 
> nc_lim{'var1'} = ncshort('time');
> nc_lim{'var2'} = ncshort('time');
> nc_unlim{'var1'} = ncshort('time');
> nc_unlim{'var2'} = ncshort('time');
> 
> 
> nc_unlim{'var1'}([1:N]) = int16([1:N]);  % Store data
> nc_unlim{'var2'}([1:N]) = int16([1:N]);  % Store data
> 
> nc_lim{'var1'}(:) = int16([1:N]);  % Store data
> nc_lim{'var2'}(:) = int16([1:N]);  % Store data
> 
> close(nc_lim);
> close(nc_unlim);%%%%%%%%%%%%%%%%%%%%%%%%
> 
> test_lim.nc => 312kB
> test_unlim.nc => 625kB
> 
> 
> thanks,............................Oscar Hartogensis
> 
> 
> ---------------------------------------------------------
> Oscar K Hartogensis
> Meteorology and Air Quality Group
> Wageningen University
> mail: PO Box 47, 6700 AA Wageningen, the Netherlands
> visit: Atlas, building 104, Droevendaalsesteeg 4,
> 6708 PB Wageningen, the Netherlands
> tel: +31 (0)317 482109
> fax: +31 (0)317 419000
> email: address@hidden
> url: www.met.wau.nl
> ---------------------------------------------------------
> 
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: AMR-714212
Department: Support netCDF
Priority: Normal
Status: Closed