Re: [netcdfgroup] File size, unlimited dimensions, compression and chunks

Hi John,

Thanks for the reply - I've set the chunk size for the unlimited
dimensions to 512 and left every other dimension at its default.
This has indeed reduced the file size to 350K, which is just about
where it should be.

I'm currently not using the fast register but will be later.

Thanks for everyone's help - it looks like just beefing up the chunk
size on the unlimited dimension solves my problem.
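
For the archives, the change amounts to something like the following
(shown with the C API for clarity, though I'm actually going through the
netcdf_c++4 wrapper; grpid and varid stand for whatever group and
variable are being defined):

    /* must be done while still in define mode, before the first write */
    size_t chunks[1] = {512};   /* 512 records per chunk along slow_reg */
    nc_def_var_chunking(grpid, varid, NC_CHUNKED, chunks);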

Cheers,

Ross

On Mon, Jan 9, 2012 at 7:50 PM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
> Hi Ross:
>
> from
>
> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c/nc_005fdef_005fvar_005fchunking.html#nc_005fdef_005fvar_005fchunking
>
> "Variables that make use of one or more unlimited dimensions, compression,
> or checksums must use chunking. Such variables are created with default
> chunk sizes of 1 for each unlimited dimension and the dimension length for
> other dimensions, except that if the resulting chunks are too large, the
> default chunk sizes for non-record dimensions are reduced."
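>
> (For your file that means, e.g., uint status(slow_reg) gets chunks of
> shape {1}, and float dvm_volts(slow_reg, dvm_volts_dim2) gets chunks of
> shape {1, 4}: one record per chunk.)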
>
> So you are putting each value of each variable in its own chunk. That's got a
> lot of overhead.
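>
> Back-of-the-envelope (assuming something like 50 bytes of HDF5
> chunk-index overhead per chunk, a plausible ballpark): 900 records x
> ~100 variables is ~90,000 single-value chunks, i.e. roughly 4.5 MB of
> pure bookkeeping, which is about the 5 MB you are seeing.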
>
> You want to set the chunk size explicitly, using the call:
>
>      int nc_def_var_chunking(int ncid, int varid, int storage, size_t *chunksizesp);
>
>
> (Note that you must make that call for each variable.) Experiment with the
> size: 100 might solve the problem, but 900 might be better.
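>
> For example, a minimal sketch in C (here grpid is the ncid of the
> enclosing group, e.g. from nc_inq_grp_ncid(), and 512 is just a
> starting point):
>
>     int varid;
>
>     size_t chunks1[1] = {512};    /* 1-D: uint status(slow_reg) */
>     nc_inq_varid(grpid, "status", &varid);
>     nc_def_var_chunking(grpid, varid, NC_CHUNKED, chunks1);
>
>     size_t chunks2[2] = {512, 4}; /* 2-D: float dvm_volts(slow_reg, dvm_volts_dim2) */
>     nc_inq_varid(grpid, "dvm_volts", &varid);
>     nc_def_var_chunking(grpid, varid, NC_CHUNKED, chunks2);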
>
> Also, you might consider using Compound types instead of Groups; a rough
> sketch of that is below.
> Also, I don't see the fast_reg dimension used below.
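>
> The compound-type version would look something like this in C (field
> layout mirroring your "frame" group; struct and variable names are
> illustrative):
>
>     #include <stddef.h>   /* offsetof */
>
>     typedef struct {
>         unsigned int  status;
>         unsigned char received;
>         unsigned int  nsnap;
>         unsigned int  record;
>         double        utc;
>         unsigned int  features;
>         unsigned int  markSeq;
>     } frame_t;
>
>     nc_type frame_type;
>     nc_def_compound(ncid, sizeof(frame_t), "frame_t", &frame_type);
>     nc_insert_compound(ncid, frame_type, "status",
>                        offsetof(frame_t, status), NC_UINT);
>     nc_insert_compound(ncid, frame_type, "utc",
>                        offsetof(frame_t, utc), NC_DOUBLE);
>     /* ...one nc_insert_compound() per remaining field... */
>
>     int frame_var;
>     nc_def_var(ncid, "frame", frame_type, 1, &slow_reg_dimid, &frame_var);
>
> Each record then becomes one struct write instead of seven separate
> variable writes.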
>
> John
>
>
>
> On 1/9/2012 10:15 AM, Ross Williamson wrote:
>
> Hi Ted,
>
> Thanks - I was under the impression that netcdf4 is better and should
> always be used over netcdf3.  One test I did run was to remove the
> unlimited dimensions and fix them at a size of 900, and the file size
> was then basically what one expects (350K vs 5MB), so it really is the
> unlimited dimensions that are causing the large file size.  I've pasted
> the header of the netcdf file below in case anyone is interested - I
> would really like to keep the unlimited dimensions available for data
> logging.
>
> I do use quite a few 2-D variables, and also two unlimited dimensions
> (fast and slow), where fast has 100 samples for each slow sample.  Once
> fully implemented I expect to be dumping about 2Mb/s to the netCDF
> file.
>
> Any advice much appreciated.
>
> Ross
>
> dimensions:
>       slow_reg = UNLIMITED ; // (900 currently)
>       fast_reg = UNLIMITED ; // (0 currently)
>
> group: array {
>
>   group: frame {
>     variables:
>       uint status(slow_reg) ;
>       ubyte received(slow_reg) ;
>       uint nsnap(slow_reg) ;
>       uint record(slow_reg) ;
>       double utc(slow_reg) ;
>       uint features(slow_reg) ;
>       uint markSeq(slow_reg) ;
>     } // group frame
>
>   group: pt415 {
>     variables:
>       uint status(slow_reg) ;
>       uint record(slow_reg) ;
>       ... (quite a few more in here)
>       float error_code(slow_reg) ;
>     } // group pt415
>
>   group: sim900 {
>     dimensions:
>       dvm_volts_dim2 = 4 ;
>       dvm_gnd_dim2 = 4 ;
>       dvm_ref_dim2 = 4 ;
>       therm_volts_dim2 = 4 ;
>       therm_temperature_dim2 = 4 ;
>     variables:
>       uint status(slow_reg) ;
>       uint record(slow_reg) ;
>       double utc(slow_reg) ;
>       float main_volt_monitor(slow_reg) ;
>       float main_current_monitor(slow_reg) ;
>       float main_power_monitor(slow_reg) ;
>       float main_undervoltage(slow_reg) ;
>       uint main_tick(slow_reg) ;
>       float dvm_volts(slow_reg, dvm_volts_dim2) ;
>       float dvm_gnd(slow_reg, dvm_gnd_dim2) ;
>       float dvm_ref(slow_reg, dvm_ref_dim2) ;
>       float therm_volts(slow_reg, therm_volts_dim2) ;
>       float therm_temperature(slow_reg, therm_temperature_dim2) ;
>       ... (Few more in here)
>       float bridge_output_value(slow_reg) ;
>     } // group sim900
>   } // group array
>
> group: antenna0 {
>
>   group: frame {
>     variables:
>       uint status(slow_reg) ;
>       ubyte received(slow_reg) ;
>       uint nsnap(slow_reg) ;
>       uint record(slow_reg) ;
>       double utc(slow_reg) ;
>       uint features(slow_reg) ;
>       uint markSeq(slow_reg) ;
>     } // group frame
>
>   group: acu {
>     variables:
>       uint status(slow_reg) ;
>       uint new_mode(slow_reg) ;
>       ...
>       uint px_checksum_error_count(slow_reg) ;
>       uint px_resyncing(slow_reg) ;
>     } // group acu
>
>   group: gpsTime {
>     variables:
>       uint status(slow_reg) ;
>       ...
>       uint serialNumber(slow_reg) ;
>     } // group gpsTime
>   } // group antenna0
>
> group: receiver {
>
>   group: frame {
>     variables:
>       uint status(slow_reg) ;
>       ubyte received(slow_reg) ;
>       uint nsnap(slow_reg) ;
>       uint record(slow_reg) ;
>       double utc(slow_reg) ;
>       uint features(slow_reg) ;
>       uint markSeq(slow_reg) ;
>     } // group frame
>
>   group: bolometers {
>     variables:
>       uint status(slow_reg) ;
>     } // group bolometers
>   } // group receiver
> }
>
> On Mon, Jan 9, 2012 at 4:56 PM, Ted Mansell <ted.mansell@xxxxxxxx> wrote:
>
> I don't think you can chunk an unlimited dimension by more than 1.  What are
> the variable dimensions?  Your formula makes it sound like they are 1-D and
> only sized by the unlimited dimension.  If that is the case, compression
> won't help.  You might be better off with a netcdf-3 file?
>
> -- Ted
>
> On Jan 9, 2012, at 8:15 AM, Ross Williamson wrote:
>
> I'm trying to get my head around the file size of my netcdf-4 file.
> Some background:
>
> 1) I'm using the netcdf_c++4 API
> 2) I have unlimited dimensions which I write data to about every second
> 3) There are a set of nested groups
> 4) I'm using compression on each variable
> 5) I'm using the default chunk size which I think is 1 for the
> unlimited dimensions and sizeof(type) for other dimensions
> 6) I take data for 900 samples - there are about 100 variables, mostly
> 4-byte types, so I would expect a file size of about 900x100x4 = 360K.
> I fully expect some level of overhead, but my file sizes are 5MB, which
> is incredibly large.
>
> Now compression doesn't make much difference (5MB vs 5.3MB).  I'm
> assuming the thing that is screwing me over is that I haven't got my
> chunking set right, but I'm rather confused: it appears that you set
> the chunk size for each variable rather than for the whole file, which
> doesn't make sense to me.  Would I just multiply each chunk size by,
> say, 100 - i.e. 100 for the unlimited dimension and sizeof(type)*100
> for the other dimensions?
>
> I'd really like to fix this as netcdf-4 seems ideal for my project but
> I can't deal with a size overhead of an order of magnitude.
>
> I can attach the header of the netcdf file if it helps.
>
> Ross
>



-- 
Ross Williamson
Associate Research Scientist
Columbia Astrophysics Laboratory
212-851-9379 (office)
212-854-4653 (Lab)
312-504-3051 (Cell)


