Re: [netcdfgroup] File size, unlimited dimensions, compression and chunks

Hi Ted,

Thanks - I guess I'm under the impression that netcdf4 should always
be used over netcdf3 (i.e. that it is better).  One test I did was to
replace all the unlimited dimensions with fixed dimensions of size 900,
and the file size is then basically what one expects (350K vs 5MB), so
it really is the unlimited dimensions that are causing the large file
size.  I've pasted the header of the netcdf file below in case anyone
is interested - I would really like to keep the unlimited dimensions
option available for data logging.
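
For what it's worth, here is a rough back-of-envelope sketch of those
numbers.  It assumes about 100 variables of 4-byte values, 900 records,
and one chunk per record per variable (chunk length 1 along the
unlimited dimension, which is my guess at the default).  It also
attributes all of the extra file size to per-chunk overhead, which is
only an assumption and not something I have measured.

```python
# Rough size check using the figures from this thread.
# Assumptions (not measured): about 100 variables, 4-byte values,
# 900 records, one chunk per record per variable (chunk length 1),
# and 1 MB taken as 10**6 bytes.
n_records = 900
n_variables = 100
bytes_per_value = 4
observed_bytes = 5_000_000  # the ~5MB file seen with unlimited dimensions

raw_bytes = n_records * n_variables * bytes_per_value  # data alone
n_chunks = n_records * n_variables                     # one chunk per value
extra_bytes = observed_bytes - raw_bytes               # everything that is not data
extra_per_chunk = extra_bytes / n_chunks               # if it were all per-chunk cost

print(f"raw data:        {raw_bytes} bytes")
print(f"chunks:          {n_chunks}")
print(f"extra per chunk: {extra_per_chunk:.1f} bytes")
```

If that reading is right, the extra works out to roughly 50 bytes per
chunk, so the overhead would scale with the number of chunks rather
than with the amount of data.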

I do use quite a few 2D variables and also two unlimited dimensions
(fast and slow), where the fast has 100 samples for each slow. Once
fully implemented I expect to be dumping about 2Mb/s to the netCDF
file.
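
To make the fast/slow relationship concrete, here is a small sketch of
how many chunks a single 1-D variable would need for different chunk
lengths along the unlimited dimension.  The chunk lengths tried (1, 100,
900) and the function name are just illustrative choices of mine, and
it assumes the chunk length along an unlimited dimension can be set
larger than 1.

```python
import math


def chunks_per_variable(n_samples: int, chunk_len: int) -> int:
    """Number of chunks a 1-D variable needs to hold n_samples values."""
    return math.ceil(n_samples / chunk_len)


slow_samples = 900                 # records written so far on slow_reg
fast_samples = slow_samples * 100  # 100 fast samples for each slow sample

# Hypothetical candidate chunk lengths along the unlimited dimension.
for chunk_len in (1, 100, 900):
    slow_chunks = chunks_per_variable(slow_samples, chunk_len)
    fast_chunks = chunks_per_variable(fast_samples, chunk_len)
    print(f"chunk_len={chunk_len}: slow={slow_chunks}, fast={fast_chunks}")
```

With a chunk length of 1 every sample gets its own chunk, so the fast
variables would need 100 times as many chunks as the slow ones.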

Any advice much appreciated.

Ross

dimensions:
        slow_reg = UNLIMITED ; // (900 currently)
        fast_reg = UNLIMITED ; // (0 currently)

group: array {

  group: frame {
    variables:
        uint status(slow_reg) ;
        ubyte received(slow_reg) ;
        uint nsnap(slow_reg) ;
        uint record(slow_reg) ;
        double utc(slow_reg) ;
        uint features(slow_reg) ;
        uint markSeq(slow_reg) ;
    } // group frame

  group: pt415 {
    variables:
        uint status(slow_reg) ;
        uint record(slow_reg) ;
        ... (quite a few more in here)
        float error_code(slow_reg) ;
    } // group pt415

  group: sim900 {
    dimensions:
        dvm_volts_dim2 = 4 ;
        dvm_gnd_dim2 = 4 ;
        dvm_ref_dim2 = 4 ;
        therm_volts_dim2 = 4 ;
        therm_temperature_dim2 = 4 ;
    variables:
        uint status(slow_reg) ;
        uint record(slow_reg) ;
        double utc(slow_reg) ;
        float main_volt_monitor(slow_reg) ;
        float main_current_monitor(slow_reg) ;
        float main_power_monitor(slow_reg) ;
        float main_undervoltage(slow_reg) ;
        uint main_tick(slow_reg) ;
        float dvm_volts(slow_reg, dvm_volts_dim2) ;
        float dvm_gnd(slow_reg, dvm_gnd_dim2) ;
        float dvm_ref(slow_reg, dvm_ref_dim2) ;
        float therm_volts(slow_reg, therm_volts_dim2) ;
        float therm_temperature(slow_reg, therm_temperature_dim2) ;
        ... (Few more in here)
        float bridge_output_value(slow_reg) ;
    } // group sim900
  } // group array

group: antenna0 {

  group: frame {
    variables:
        uint status(slow_reg) ;
        ubyte received(slow_reg) ;
        uint nsnap(slow_reg) ;
        uint record(slow_reg) ;
        double utc(slow_reg) ;
        uint features(slow_reg) ;
        uint markSeq(slow_reg) ;
    } // group frame

  group: acu {
    variables:
        uint status(slow_reg) ;
        uint new_mode(slow_reg) ;
        ...
        uint px_checksum_error_count(slow_reg) ;
        uint px_resyncing(slow_reg) ;
    } // group acu

  group: gpsTime {
    variables:
        uint status(slow_reg) ;
        ...
        uint serialNumber(slow_reg) ;
    } // group gpsTime
  } // group antenna0

group: receiver {

  group: frame {
    variables:
        uint status(slow_reg) ;
        ubyte received(slow_reg) ;
        uint nsnap(slow_reg) ;
        uint record(slow_reg) ;
        double utc(slow_reg) ;
        uint features(slow_reg) ;
        uint markSeq(slow_reg) ;
    } // group frame

  group: bolometers {
    variables:
        uint status(slow_reg) ;
    } // group bolometers
  } // group receiver
}

On Mon, Jan 9, 2012 at 4:56 PM, Ted Mansell <ted.mansell@xxxxxxxx> wrote:
> I don't think you can chunk an unlimited dimension by more than 1.  What are 
> the variable dimensions?  Your formula makes it sound like they are 1-D and 
> only sized by the unlimited dimension.  If that is the case, compression 
> won't help.  You might be better off with a netcdf-3 file?
>
> -- Ted
>
> On Jan 9, 2012, at 8:15 AM, Ross Williamson wrote:
>
>> I'm trying to get my head around the filesize of my netcdf-4 file -
>> Some background.
>>
>> 1) I'm using the netcdf_c++4 API
>> 2) I have an unlimited dimension which I write data to about every second
>> 3) There are a set of nested groups
>> 4) I'm using compression on each variable
>> 5) I'm using the default chunk size which I think is 1 for the
>> unlimited dimensions and sizeof(type) for other dimensions
>> 6) I take data for 900 samples - There are about 100 variables so I
>> would expect (given 4-byte values) a file size of 900x100x4 = 360K. Now I
>> fully expect some level of overhead but my file sizes are 5MB which is
>> incredibly large.
>>
>> Now compression doesn't make much difference (5MB vs 5.3MB).  I'm
>> assuming here the thing that is screwing me over is that I haven't got
>> my chunking set right. The issue is that I'm rather confused.  It
>> appears that you set the chunk size for each variable rather than the
>> whole file which doesn't make sense to me.  Would I just say multiply
>> each chunk size by say 100 so have 100 for the unlimited dimension and
>> sizeof(type)*100 for other dimensions?
>>
>> I'd really like to fix this as netcdf-4 seems ideal for my project but
>> I can't deal with a size overhead of an order of magnitude.
>>
>> I can attach the header of the netcdf file if it helps.
>>
>> Ross
>>
>> --
>> Ross Williamson
>> Associate Research Scientist
>> Columbia Astrophysics Laboratory
>> 212-851-9379 (office)
>> 212-854-4653 (Lab)
>> 312-504-3051 (Cell)
>>
>> _______________________________________________
>> netcdfgroup mailing list
>> netcdfgroup@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe,  visit: 
>> http://www.unidata.ucar.edu/mailing_lists/
>



-- 
Ross Williamson
Associate Research Scientist
Columbia Astrophysics Laboratory
212-851-9379 (office)
212-854-4653 (Lab)
312-504-3051 (Cell)


