Re: NC_SHORT alignment, unlimited dimension

Alexey,

> When playing with NetCDF, I found that files usually take twice as
> much space as I would expect. A close examination with od -x and
> less demonstrates that half of the space is not used.
> 
> I realize that every record should be aligned at 4-byte boundary, but it 
> looks like every member of record structure is aligned at 4-byte
> boundary as well.

There is a special case that might be of some help.  If there is only
one record variable, the format drops the restriction that each record
is 4-byte aligned, so there is no record padding and no wasted space,
even for shorts or bytes.  For example, this file wastes no space:

  netcdf t1 {
  dimensions:
              time = UNLIMITED ;
  variables:
              short array1(time) ;
  data:

  array1 = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1;
  }
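
The padding arithmetic can be sketched in a few lines of Python.  This
is only an illustration of the rule as described above, not code from
the netCDF library; the helper names pad4 and record_size are made up
for the example.

```python
SHORT = 2  # size of an NC_SHORT in bytes


def pad4(nbytes):
    """Round a byte count up to the next multiple of 4."""
    return (nbytes + 3) // 4 * 4


def record_size(var_bytes_per_record):
    """Bytes occupied by one record, given each record variable's
    unpadded size per record (hypothetical helper, not a library call)."""
    if len(var_bytes_per_record) == 1:
        # Special case: a lone record variable is not padded.
        return var_bytes_per_record[0]
    # Otherwise every variable's slice of the record is padded to 4 bytes.
    return sum(pad4(n) for n in var_bytes_per_record)


one_var = record_size([SHORT])          # t1: short array1(time)
two_vars = record_size([SHORT, SHORT])  # two short record variables
print(one_var, two_vars)                # prints: 2 8
```

With one short record variable each record takes 2 bytes; with two,
each record takes 8 bytes, of which only 4 hold data.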

Brian Eaton's suggestion of adding an extra dimension of even length, such as

  netcdf t2 {
  dimensions:
           d2 = 2 ;
           time = UNLIMITED ; // (100 currently)
  variables:
           short array1(time,d2) ;
           short array2(time,d2) ;
 ...
  }

is another way to avoid wasting space when you store shorts in
multiple record variables, but it is not necessary if there is only
one record variable.
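
To see why the extra dimension needs an even length, here is a small
sketch of the per-record padding for a variable declared as
short var(time, d2) in a file with more than one record variable.  The
function padding_per_record is hypothetical, written just for this
illustration.

```python
SHORT = 2  # size of an NC_SHORT in bytes


def padding_per_record(d2):
    """Pad bytes added to each record of a short var(time, d2)
    when the file holds more than one record variable."""
    used = d2 * SHORT
    # Bytes needed to reach the next multiple of 4 (0 if already aligned).
    return -used % 4


for d2 in (1, 2, 3, 4):
    used = d2 * SHORT
    pad = padding_per_record(d2)
    print(f"d2={d2}: {used} data bytes + {pad} pad bytes per record")
```

An odd d2 leaves 2 pad bytes in every record of every such variable,
while an even d2 leaves none.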

Also, if you don't need the UNLIMITED dimension because you won't be
appending data to the files, using a fixed size dimension eliminates
the waste of space when storing multiple arrays of shorts, as in:

  netcdf t4 {
  dimensions:
          time = 100 ;
  variables:
          short array1(time) ;
          short array2(time) ;
  data:

   array1 = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1;

   array2 = 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
      2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
      2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
      2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
      2, 2, 2, 2, 2, 2;
  }
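
As a rough comparison of the data sizes involved (headers ignored),
the sketch below assumes that a fixed-size variable is padded once as
a whole, while a record variable is padded in every record.  The
names are illustrative only.

```python
SHORT = 2    # size of an NC_SHORT in bytes
N = 100      # length of the time dimension
NVARS = 2    # array1 and array2


def pad4(nbytes):
    """Round a byte count up to the next multiple of 4."""
    return (nbytes + 3) // 4 * 4


# Fixed-size dimension (t4): each variable is padded once as a whole.
fixed_bytes = NVARS * pad4(N * SHORT)

# UNLIMITED dimension with the same two variables: each variable's
# slice is padded in every record.
record_bytes = N * NVARS * pad4(SHORT)

print(fixed_bytes, record_bytes)  # prints: 400 800
```

For these two arrays of 100 shorts, the fixed-size layout needs 400
bytes of data space and the record layout needs 800.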

I think the alignment restriction for record variables was deemed
necessary for efficient access to data on platforms that require
32-bit alignment for disk seeks.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
russ@xxxxxxxxxxxxxxxx                     http://www.unidata.ucar.edu

