Re: Unexpected File Growth

Ken Prada wrote:

>We record raw data on both ships and buoys in netCDF form.  The use of netCDF
>throughout the acquisition/processing/analysis sequence has been instrumental
>in improving our ability to handle and share data and tools.

Since that was our intent, we're glad to hear it.

>In a buoy scenario power and storage space are the predominant
>controls.  Most of our variables are stored NC_FLOAT.  To accommodate an
>additional, low resolution, variable I chose NC_BYTE.  In a file with
>more than 300,000 records I expected a growth of 300,000 bytes.  I was
>surprised when the growth was four times that.  Consequently, the
>storage considerations will need re-thinking.

What can I tell you?  In our implementation of netCDF we chose to
round up contiguous portions of a variable to the next multiple of
32 bits to speed up access.  This means that a 1-octet variable whose
one-dimensional shape is along the record dimension will be stored in
4 octets in each record.  Other implementations are possible.
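
To make the arithmetic concrete, here is a rough sketch of the rounding
described above, applied to the 300,000-record case.  The helper name
and the type table are made up for illustration and are not part of the
netCDF library; the only thing taken from our implementation is the
round-up-to-32-bits rule.

```python
# Sizes in bytes of the external types mentioned in this thread.
TYPE_SIZE = {"NC_BYTE": 1, "NC_LONG": 4, "NC_FLOAT": 4}


def padded_bytes_per_record(nc_type, values_per_record=1):
    """Bytes one record's worth of a variable occupies after padding.

    Hypothetical helper: rounds the raw size up to a multiple of 4 bytes,
    as described in the message above.
    """
    raw = TYPE_SIZE[nc_type] * values_per_record
    return (raw + 3) // 4 * 4


records = 300_000
expected_growth = records * TYPE_SIZE["NC_BYTE"]            # what Ken expected
actual_growth = records * padded_bytes_per_record("NC_BYTE")  # what he observed

print(expected_growth, actual_growth, actual_growth // expected_growth)
```

Under that rule a single NC_BYTE value per record costs the same
4 octets as an NC_LONG or NC_FLOAT, which accounts for the fourfold
growth.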

In a way, it's kind of funny.  When we first came out with netCDF,
a lot of people were concerned about our decision to have all data pass
through an XDR layer.  They feared the effect that decision would have
on access times.  So we rounded all contiguous data up to 32 bits for
increased speed.

Now we're getting hit for wasting space.  ;-)

Oh well.  I guess when you're the only netCDF implementation on the block
you're the focus of *all* commentary (both good and bad).

>Yes, I have considered a non-netCDF format for raw recording.  However,
>the integrity of netCDF and its convenience to raw data distribution
>are too valuable.

I must admit, we never considered power-starved data-recording devices
when we designed our netCDF implementation (then again, we never
considered half the things netCDF is getting used for ;-).  Sorry about
that.

>If there are few advantages to the use of lower resolution (non-float)
>variables, why not declare all integer storage to be NC_LONG?

If it's for a one-dimensional vector which is broken across record
boundaries (contact me if you don't understand what that means) then
there is no benefit in declaring it as anything other than NC_LONG or
NC_FLOAT (in our netCDF implementation).

Variables having two or more non-record dimensions, however, do benefit
(in our implementation) from using the smallest, capacious-enough variable
type.
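
As a rough illustration of the difference, the sketch below compares
per-record storage for a variable with no non-record dimensions against
one with a 10 x 10 block of values in each record.  It assumes the
contiguous portion that gets rounded up is the whole per-record slab;
the function name, the type table, and the 10 x 10 shape are
hypothetical and chosen only for the example.

```python
from math import prod

# Sizes in bytes of the external types mentioned in this thread.
TYPE_SIZE = {"NC_BYTE": 1, "NC_LONG": 4, "NC_FLOAT": 4}


def record_slab_bytes(nc_type, non_record_shape=()):
    """Per-record bytes for a record variable, padded to a multiple of 4.

    Hypothetical helper.  Assumes the whole per-record slab is the
    contiguous portion that is rounded up to 32 bits.
    """
    count = prod(non_record_shape)  # prod(()) is 1: one value per record
    raw = TYPE_SIZE[nc_type] * count
    return (raw + 3) // 4 * 4


# One value per record: NC_BYTE saves nothing over NC_LONG.
print(record_slab_bytes("NC_BYTE"), record_slab_bytes("NC_LONG"))

# A 10 x 10 block per record: NC_BYTE uses a quarter of the space.
print(record_slab_bytes("NC_BYTE", (10, 10)),
      record_slab_bytes("NC_LONG", (10, 10)))
```

In the second case the padding amounts to at most 3 octets per record,
so the smaller type pays off.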

>Cheers,
>   ken

Regards,
Steve

