[netcdfgroup] File size, unlimited dimensions, compression and chunks

I'm trying to get my head around the size of my netCDF-4 file.
Some background:

1) I'm using the netcdf_c++4 API
2) I have an unlimited dimension which I write data to about every second
3) There are a set of nested groups
4) I'm using compression on each variable
5) I'm using the default chunk sizes, which I think are 1 for the
unlimited dimension and sizeof(type) for the other dimensions
6) I take data for 900 samples. There are about 100 variables, so with
8-byte doubles I would expect a raw data size of roughly 900 x 100 x 8
= 720 KB. I fully expect some level of overhead, but my files come out
at 5 MB, which seems incredibly large. (A rough sketch of this setup
is below.)
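
For concreteness, the sketch below is roughly what I'm doing (names like
"telemetry.nc", "housekeeping" and "temperature" are just placeholders,
and this is a trimmed illustration rather than my actual code):

#include <netcdf>
#include <vector>
using namespace netCDF;

int main() {
    NcFile file("telemetry.nc", NcFile::replace);     // placeholder filename

    // One of the nested groups holding some of the ~100 variables
    NcGroup grp = file.addGroup("housekeeping");      // placeholder group name

    // Record dimension with no length given, i.e. unlimited
    NcDim time = file.addDim("time");

    NcVar v = grp.addVar("temperature", ncDouble, std::vector<NcDim>{time});
    v.setCompression(true, true, 2);                  // shuffle + deflate level 2
    // Note: no setChunking() call, so the library picks the default chunk sizes

    // One sample appended per record; in the real code this happens every second
    for (size_t i = 0; i < 900; ++i) {
        double sample = 0.0;                          // dummy value
        v.putVar(std::vector<size_t>{i}, std::vector<size_t>{1}, &sample);
    }
    return 0;
}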

Compression doesn't make much difference (5 MB vs 5.3 MB). I'm assuming
the thing hurting me is that I haven't got my chunking set right, but
I'm rather confused: it appears that you set the chunk sizes for each
variable rather than for the whole file, which doesn't make sense to me.
Would I just multiply each chunk size by, say, 100, so 100 for the
unlimited dimension and sizeof(type)*100 for the other dimensions?
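
To make the question concrete, is something like the following what I
should be doing for each variable (again placeholder names, and I'm
guessing at a chunk length of 100 along the record dimension)?

#include <netcdf>
#include <vector>
using namespace netCDF;

int main() {
    NcFile file("telemetry.nc", NcFile::replace);     // placeholder filename
    NcDim time = file.addDim("time");                 // unlimited dimension

    NcVar v = file.addVar("temperature", ncDouble, std::vector<NcDim>{time});

    // Chunk sizes are given per variable, one entry per dimension, and must
    // be set before the first write.  Here each chunk spans 100 records
    // along the unlimited dimension instead of the default of 1.
    std::vector<size_t> chunkSizes{100};
    v.setChunking(NcVar::nc_CHUNKED, chunkSizes);
    v.setCompression(true, true, 2);                  // shuffle + deflate level 2
    return 0;
}

That is, do I need a setChunking() call like that on every one of the
~100 variables, or is there a way to set it once for the whole file?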

I'd really like to fix this, as netCDF-4 seems ideal for my project,
but I can't live with a size overhead of close to an order of magnitude.

I can attach the header of the netcdf file if it helps.

Ross

-- 
Ross Williamson
Associate Research Scientist
Columbia Astrophysics Laboratory
212-851-9379 (office)
212-854-4653 (Lab)
312-504-3051 (Cell)


