Re: [netcdfgroup] nccopy -c does not rechunk properly (4.3.1.1)

Hi Simon,

In relation to a problem you noticed with using nccopy to rechunk data,
you asked:
> Is there an obvious mistake on my side or might there be a problem with
> variables in groups?
> 
> I am using netcdf library version 4.3.1.1 of Feb 26 2014 12:06:45

You encountered undocumented behavior in nccopy, but it wasn't related
to groups.

The new chunk size you chose for 4-byte float data, 820 by 1, results in
chunks of 3280 bytes, which is less than the (undocumented) threshold
for minimum chunk sizes, set in nccopy.c:

  #define CHUNK_THRESHOLD (8192)   /* variables with fewer bytes don't get chunked */

Instead of using the smaller chunk size you requested, nccopy used
default chunking for your variable, resulting in the weird 55 by 17856
chunks (approximately proportional to the shape of your original
variable, 820 by 249984).
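For what it's worth, the arithmetic behind those default chunks is easy to check (a sketch; the roughly 4 MiB default chunk-size target is my assumption about this release):

```shell
# Default chunking produced 55 x 17856 chunks of 4-byte floats,
# which works out to just under 4 MiB per chunk.
echo $(( 55 * 17856 * 4 ))    # bytes per default chunk
```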

The intent of the CHUNK_THRESHOLD minimum is to avoid creating chunks
smaller than a physical disk block, as an I/O optimization: reading a
smaller chunk still causes a whole disk block to be read.  As a
workaround, you could specify 820 by 3 chunks instead and get the same
efficiency as 820 by 1 chunks, assuming your physical disk blocks are
8192 bytes.  However, I think for the next release we should lower the
default threshold to 512 bytes and document the behavior.
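The threshold comparison and the workaround can be sketched as shell arithmetic (the nccopy invocation is commented out since it needs the original file; filenames are taken from your message):

```shell
# CHUNK_THRESHOLD is 8192 bytes in nccopy.c; smaller requests fall back
# to default chunking.
echo $(( 820 * 1 * 4 ))   # 3280 bytes: below the threshold, request ignored
echo $(( 820 * 3 * 4 ))   # 9840 bytes: above the threshold, request honored
# Workaround invocation (requires the netCDF tools and original.nc):
# nccopy -c "snapshots/820,gllpoints_all/3" original.nc new.nc
```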

Thanks for reporting the problem!

--Russ

Simon Stähler wrote:
> I want to use the nccopy utility to change the chunking of a large 2D
> dataset (first dimension: time ("snapshots"); second: point index
> ("gllpoints_all")). The original file has this structure:
> 
> $ ncdump -sch original.nc
> netcdf original {
> dimensions:
>       snapshots = 820 ;
>       gllpoints_all = 249984 ;
> variables:
> 
> // global attributes:
>               :npoints = 249984 ;
> 
> group: Snapshots {
>   variables:
>       float strain_dsus(snapshots, gllpoints_all) ;
>               strain_dsus:_Storage = "chunked" ;
>               strain_dsus:_ChunkSizes = 1, 249984 ;
>   } // group Snapshots
> }
> 
> 
> For further processing of the file, I want to change the chunks so that
> each contains all the time steps at one point.
> I do this with
> 
> $ nccopy -c "snapshots/820,gllpoints_all/1" original.nc new.nc
> 
> However, the resulting chunk sizes are somewhat weird:
> {55, 17856} instead of {820, 1}:
> 
> $ ncdump -sch new.nc
> netcdf axisem_output_3 {
> dimensions:
>       snapshots = 820 ;
>       gllpoints_all = 249984 ;
> 
> // global attributes:
>               :npoints = 249984 ;
> 
> group: Snapshots {
>   variables:
>       float strain_dsus(snapshots, gllpoints_all) ;
>               strain_dsus:_Storage = "chunked" ;
>               strain_dsus:_ChunkSizes = 55, 17856 ;
>   } // group Snapshots
> }
> 
> Is there an obvious mistake on my side or might there be a problem with
> variables in groups?
> 
> I am using netcdf library version 4.3.1.1 of Feb 26 2014 12:06:45
> 
> cheers,
> 
> Simon Stähler
