Hi Leon,
> Thanks for mentioning chunk sizing; that's not something I had thought
> about. I've got one unlimited dimension, and it sounds like that means an
> inefficient default chunk size
> <http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Default-Chunking.html#Default-Chunking>.
> ("For unlimited dimensions, a chunk size of one is always used." What's the
> unit? One DEFAULT_CHUNK_SIZE? Maybe it'll become clear as I read more.)
>
It means that if you have a variable with an unlimited dimension, such
as

    float var(time, lon, lat)

where time is unlimited, then the default chunks will be of shape

    1 x clon x clat

values (not bytes), for integers clon and clat computed to be smaller
than but proportional to the sizes of the lon and lat dimensions,
resulting in a default chunk size close to but less than 4 MB (so in
this case each chunk holds about 1 million 4-byte values). These
default chunks are not necessarily good for some kinds of access. A
good chunk size and shape may depend on anticipated access patterns as
well as the disk block size of the file system on which the data is
stored.
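If you'd rather choose the chunking yourself than accept the default,
you can set it when the variable is created. Here's a minimal sketch
using the netCDF4-python package (assuming that interface is convenient
for you; the dimension lengths and chunk shape are just made-up numbers
for illustration):

    # Sketch: create a variable with explicit chunk sizes instead of
    # accepting the 1 x clon x clat default. Assumes netCDF4-python;
    # the dimension lengths and chunk shape are hypothetical.
    from netCDF4 import Dataset

    ds = Dataset("example.nc", "w", format="NETCDF4")
    ds.createDimension("time", None)   # unlimited dimension
    ds.createDimension("lon", 3600)
    ds.createDimension("lat", 1800)

    # chunksizes is given in values per dimension, not bytes. Each
    # chunk here holds 10 * 360 * 180 = 648,000 floats (about 2.5 MB)
    # and spans 10 time steps, so reading a long time series touches
    # far fewer chunks than the default shape with 1 along time.
    var = ds.createVariable("var", "f4", ("time", "lon", "lat"),
                            chunksizes=(10, 360, 180))
    ds.close()

A chunk shape like that trades some efficiency in purely spatial access
for much faster access along the time dimension; which trade is right
depends on how the data will mostly be read.
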
I've started a series of blog postings about chunk shapes and sizes,
but so far only posted the first part:
http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters
Eventually, with feedback on these, better guidance and software
defaults for chunking may result. I'll try to post the second
installment next week.
> I guess I've got some reading ahead of me. For resources, I see the
> PowerPoint presentation
> <http://hdfeos.org/workshops/ws13/presentations/day1/HDF5-EOSXIII-Advanced-Chunking.ppt>
> that's linked to and the HDF5 page on chunking
> <http://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/>. Do you have
> any other recommendations?
I liked these papers, though they get a bit technical:
Efficient Organization of Large Multidimensional Arrays
http://cs.brown.edu/courses/cs227/archives/2008/Papers/FileSystems/sarawagi94efficient.pdf
Optimal Chunking of Large Multidimensional Arrays for Data Warehousing
http://www.escholarship.org/uc/item/35201092
--Russ
> Thanks.
> -Leon
>
> On Wed, Feb 20, 2013 at 4:31 PM, Russ Rew <russ@xxxxxxxxxxxxxxxx> wrote:
> >
> > Large chunk sizes might mean a lot of extra I/O, as well as extra CPU
> > for uncompressing the same data chunks repeatedly. You might see if
> > lowering your chunk size significantly improves network usage ...
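
P.S. If you want to experiment with different chunkings on an existing
file, the nccopy utility that comes with netCDF can rewrite a file with
new chunk sizes via its -c option, which takes dimension-name/chunk-length
pairs (the names and numbers below are just an example):

    nccopy -c time/100,lon/360,lat/180 original.nc rechunked.nc

Timing your typical reads against a few such copies is a quick way to
see whether a different chunk shape actually helps.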