Aligning NC_SHORT, unlimited dimension

Sorry if this message is repeated --- I had trouble with majordomo.

I have a question about packing 1-d NC_SHORT arrays with an unlimited
dimension.

We have gigabytes of telescope data, stored as 2-byte integer time
traces. I am trying to move our data acquisition and archival system
from a homegrown format to NetCDF.

When playing with NetCDF, I found that files usually take twice as
much space as I would expect. A close examination with od -x and less
demonstrates that half of the space is not used.

I realize that every record should be aligned on a 4-byte boundary, but
it looks like every member of the record structure is aligned on a
4-byte boundary as well.
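
The suspected layout can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming (as the dump further down suggests) that each variable's slice within a record is rounded up to a multiple of 4 bytes; the function names are made up for this example and are not part of any NetCDF API.

```python
def padded(nbytes, align=4):
    """Round nbytes up to the next multiple of align."""
    return (nbytes + align - 1) // align * align


def record_size(var_bytes_per_record, pad=True):
    """Bytes used by one record, given each variable's bytes per record.

    pad=True models the behaviour suspected here (every variable padded
    to 4 bytes); pad=False models tight packing.
    """
    return sum(padded(n) if pad else n for n in var_bytes_per_record)


# Two NC_SHORT record variables, one 2-byte value each per record.
two_shorts = [2, 2]
print(record_size(two_shorts, pad=True))   # 8
print(record_size(two_shorts, pad=False))  # 4
```

With per-variable padding, a record holding two shorts takes 8 bytes instead of 4, which would explain files twice the expected size.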

Here is a small file demonstrating the problem:

netcdf t2 {
dimensions:
      time = UNLIMITED ; // (100 currently)
variables:
      short array1(time) ;
      short array2(time) ;
data:

array1 = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1 ;

array2 = 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2,
  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2,
  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2,
  2, 2, 2, 2, 2, 2 ;
}


And here is a hex dump of this 924-byte file (I'd expect it to be
524 bytes):

0000000 4443 0146 0000 6400 0000 0a00 0000 0100
0000020 0000 0400 6974 656d 0000 0000 0000 0000
0000040 0000 0000 0000 0b00 0000 0200 0000 0600
0000060 7261 6172 3179 0000 0000 0100 0000 0000
0000100 0000 0000 0000 0000 0000 0300 0000 0400
0000120 0000 7c00 0000 0600 7261 6172 3279 0000
0000140 0000 0100 0000 0000 0000 0000 0000 0000
0000160 0000 0300 0000 0400 0000 8000 0100 0180
0000200 0200 0180 0100 0180 0200 0180 0100 0180
0000240 0200 0180 0100 0180 0200 0180 0100 0180
0000280 0200 0180 0100 0180 0200 0180 0100 0180
*
0001620 0200 0180 0100 0180 0200 0180
0001634
           ^^^^      ^^^^      ^^^^      ^^^^


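As you can see, half of the space is filled with 0x0180 (od -x on this
little-endian machine swaps the bytes of each word, so the stored value
is 0x8001, i.e. -32767, the standard NC_SHORT fill value).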
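
The numbers above can be cross-checked with a short Python sketch. It assumes the dump was produced by od -x on a little-endian host (so each printed 16-bit word has its two bytes swapped relative to file order) and reads the header size and total size from the octal offsets in the dump; the helper name is hypothetical.

```python
import struct


def od_word_to_short(word_hex):
    """Decode one word printed by `od -x` on a little-endian host into the
    signed big-endian 16-bit value actually stored in the file."""
    host_value = int(word_hex, 16)
    file_bytes = struct.pack("<H", host_value)  # recover file byte order
    return struct.unpack(">h", file_bytes)[0]


print(od_word_to_short("0100"))  # 1      (array1 value)
print(od_word_to_short("0200"))  # 2      (array2 value)
print(od_word_to_short("0180"))  # -32767 (NC_SHORT fill value)

# Offsets taken from the dump (od prints them in octal).
header_bytes = 0o174   # first data word: six words into the 0000160 line
actual_bytes = 0o1634  # final offset printed by od
records = 100

bytes_per_record = (actual_bytes - header_bytes) // records
expected_bytes = header_bytes + records * 2 * 2  # two 2-byte shorts per record

print(header_bytes, actual_bytes, bytes_per_record, expected_bytes)
# 124 924 8 524
```

So each record occupies 8 bytes on disk where 4 would hold the data, matching the 924 versus 524 byte figures.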


Is it possible to do something about it as wasting half of disk space
is not really an option?

Software: netcdf-3.5b3 on Intel Redhat-6.2

Thanks a lot for your attention!

From owner-netcdfgroup@xxxxxxxxxxxxxxxx 08 2001 Apr -0700 07:06:38
Message-ID: <m3y9tbqydt.fsf@xxxxxxxxxxxxxxxxxxx>
Date: 08 Apr 2001 07:06:38 -0700
From: Alexey Goldin <Alexey.Goldin@xxxxxxxxxxxx>
In-Reply-To: "Craig A. Mattocks"'s message of "Sun, 8 Apr 2001 01:42:24 -0400"
To: "Craig A. Mattocks" <morfz@xxxxxxxxxxxxxx>
Subject: Re: NC_SHORT alignment, unlimited dimension
Cc: netcdfgroup@xxxxxxxxxxxxxxxx
References: <3ACE4BEC.7020006@xxxxxxxxxxxx>
        <p05010407b6f5a9b98361@[216.192.203.22]>



"Craig A. Mattocks" <morfz@xxxxxxxxxxxxxx> writes:

> At 4:06 PM -0700 4/6/01, Alexey Goldin wrote:
>
> >Is it possible to do something about it as wasting half of disk space
> >is not really an option?
>
> Have you seen this site:
>
> http://snow.cit.cornell.edu/noon/z_netcdf.html

I'd like to avoid this option. One of the main attractions of the NetCDF
format for us is the possibility of reading it directly from IDL and
lots of other programs like grace, Data Explorer, and so on. If we need
to recompile all of them, we could just as well modify them to use our
existing format. We already have an interface to IDL.



> Also, the bzip2 file compressor (which works like gzip):
>
> http://sources.redhat.com/bzip2/
>
> ftp://sourceware.cygnus.com/pub/bzip2/v100/bzip2-1.0.1.tar.gz
>
> seems to do the best job on NetCDF files. You can always
> compress/uncompress files on the fly using a Fortran or C SYSTEM call
> to bzip2.


But often times that means uncompressing a 1 Meg file to get one
record of data.


I'd prefer to find a way to use only 2 bytes (rather than 4) for each
NC_SHORT in an uncompressed file. Is that possible when using an
UNLIMITED dimension?

> Hope these ideas are helpful,
> Craig

Thanks, but is this the only way to handle the problem? That was not
even obvious to me from the documentation.
