[netcdfgroup] chunking and concatenating in NCO version 4.0.3

To: netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx>
Subject: [netcdfgroup] chunking and concatenating in NCO version 4.0.3
From: Jennifer Adams <jma@xxxxxxxxxxxxx>
Date: Tue, 7 Sep 2010 19:56:14 -0400

Dear Experts,

My question is about the netCDF Operators (NCO). I have copied this post to the NCO forum on sourceforge.net, but I am also writing to this group because it increases my chances of getting feedback. Someone on this list who doesn't read that forum regularly may be able to help.

I am using nco-4.0.3 linked with netcdf 4.1-rc1, udunits 2.1.11, and hdf5-1.8.4 on an Intel Mac running OS X 10.5.8.

Using GrADS, I have created a trio of 4-dimensional, chunked netCDF-4 files, each with 8 time steps (one file per day) and no record dimension. The (edited) ncdump output for the first file looks like this:


dimensions:
        lon = 720 ;
        lat = 361 ;
        lev = 3 ;
        time = 8 ;
variables:
        double time(time) ;
                time:units = "minutes since 2009-02-19 00:00" ;
        float t2(time, lev, lat, lon) ;
                t2:_ChunkSizes = 1, 1, 361, 720 ;
data:
    time = 0, 180, 360, 540, 720, 900, 1080, 1260 ;

The other files in the trio are the same except for the origin of the time axis: "minutes since 2009-02-20 00:00", etc.
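Because each file counts minutes from its own midnight origin, putting all 24 steps on the first file's axis ("minutes since 2009-02-19 00:00") only requires adding 1440 minutes (24 × 60) for each day the origin is shifted. The sketch below shows that arithmetic; the `rebase_times` function is hypothetical, not part of NCO, and it assumes the origins differ by whole days, as they do here.

```shell
#!/bin/sh
# Hypothetical helper (not an NCO command): rebase time values stored as
# "minutes since <day N> 00:00" onto the origin of day 0 by adding
# N * 1440 minutes. Assumes the origins differ by whole days.
# Usage: rebase_times DAY_OFFSET T1 T2 ...
rebase_times() {
  day_offset=$1
  shift
  out=""
  for t in "$@"; do
    # ${out:+ } inserts a separating space only after the first value
    out="$out${out:+ }$(( t + day_offset * 1440 ))"
  done
  echo "$out"
}

# Time values stored in each file of the trio
times="0 180 360 540 720 900 1080 1260"

rebase_times 0 $times   # 2009-02-19 file: unchanged
rebase_times 1 $times   # 2009-02-20 file: 1440 ... 2700
rebase_times 2 $times   # 2009-02-21 file: 2880 ... 4140
```

With this correction, the 24 steps of the concatenated file would run from 0 to 4140 minutes in 180-minute increments.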

I would like to concatenate my three files into one file with 24 time steps and maintain the chunk sizes. Following the documentation, I am executing the following commands (saving intermediate files out*.nc4 for debugging purposes):


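# add a degenerate record dimension named "record"
ncecat -O -h t2m.19feb2009.nc4 out1.nc4
# swap time and record so that time becomes the record dimension
ncpdq  -O -h -a time,record out1.nc4 out2.nc4
# remove the (now degenerate) record dimension
ncwa   -O -h -a record out2.nc4 out3.nc4
# concatenate all three days along the time record dimension
ncrcat -O -h out3.nc4 t2m.20feb2009.nc4 t2m.21feb2009.nc4 out4.nc4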

The (edited) ncdump output from the final file (out4.nc4) looks like this:


dimensions:
        lon = 720 ;
        lat = 361 ;
        lev = 3 ;
        time = UNLIMITED ; // (24 currently)
variables:
        double time(time) ;
                time:units = "minutes since 2009-02-19 00:00" ;
        float t2(time, lev, lat, lon) ;
                t2:_ChunkSizes = 1, 3, 361, 720 ;
data:
    time = 0, 180, 360, 540, 720, 900, 1080, 1260, 0, 180, 360, 540, 720, 900,
        1080, 1260, 0, 180, 360, 540, 720, 900, 1080, 1260 ;
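The change in _ChunkSizes from 1, 1, 361, 720 to 1, 3, 361, 720 triples the amount of data in each chunk. For a float variable (4 bytes per value), the bytes per chunk are the product of the chunk sizes times 4; the hypothetical `chunk_bytes` helper below does that multiplication for the two layouts shown in the dumps.

```shell
#!/bin/sh
# Hypothetical helper: bytes per chunk = bytes per value * product of chunk sizes.
# Usage: chunk_bytes BYTES_PER_VALUE SIZE1 SIZE2 ...
chunk_bytes() {
  n=$1
  shift
  for s in "$@"; do
    n=$(( n * s ))
  done
  echo "$n"
}

# t2 is a 4-byte float
echo "input  t2 chunk (1,1,361,720): $(chunk_bytes 4 1 1 361 720) bytes"
echo "output t2 chunk (1,3,361,720): $(chunk_bytes 4 1 3 361 720) bytes"
```

Since HDF5 reads and writes whole chunks, reading a single level from out4.nc4 touches about three times as much data as reading it from the input files.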

Problems:

1. The time axis values are wrong for time steps 9-24 in the final output. Is there a way to make ncrcat notice the different origins of the time axes in the files it is concatenating and adjust accordingly?
2. The chunk sizes are changed so that the lev dimension chunk size is > 1.
3. The size of the output is much too large. Changing the time axis to a record dimension more than doubles the file size!


>  ls -l *nc4
-rw-r--r--  1 jma  jma  24974581 Sep  7 18:09 out1.nc4
-rw-r--r--  1 jma  jma  24975534 Sep  7 16:29 out2.nc4
-rw-r--r--  1 jma  jma  24975534 Sep  7 16:29 out3.nc4
-rw-r--r--  1 jma  jma  74880174 Sep  7 15:47 out4.nc4
-rw-r--r--  1 jma  jma  10488230 Sep  7 11:20 t2m.19feb2009.nc4
-rw-r--r--  1 jma  jma  11031824 Sep  7 11:21 t2m.20feb2009.nc4
-rw-r--r--  1 jma  jma   9975740 Sep  7 11:21 t2m.21feb2009.nc4
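Comparing these sizes with the uncompressed size of t2 gives a clue to problem 3. The sketch below computes the raw size (time × lev × lat × lon × 4 bytes) for 8 and 24 steps, using the dimension sizes from the ncdump output; the `raw_bytes` function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: uncompressed size of float t2(time, lev, lat, lon)
# with lev=3, lat=361, lon=720 and 4 bytes per value.
# Usage: raw_bytes NUMBER_OF_TIME_STEPS
raw_bytes() {
  nt=$1
  echo $(( nt * 3 * 361 * 720 * 4 ))
}

echo "8 steps:  $(raw_bytes 8) bytes"    # compare out1.nc4 (24974581)
echo "24 steps: $(raw_bytes 24) bytes"   # compare out4.nc4 (74880174)
```

Each out*.nc4 file is within about 25 KB of the matching raw size, while each input file is roughly 10 MB, well below the 24,952,320 raw bytes it holds. That pattern suggests the inputs were written with compression and the NCO outputs were not; this is an inference from the sizes alone, not something the listing confirms.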


Solutions I have tried:

1. I modified each of the three input files so that each has a record dimension, but I got the same result.
2. I tried to use "--cnk_dmn lev,1" as an additional argument to ncecat (this worked), but that is an unrecognized option in ncpdq, ncwa, and ncrcat.

3. I also tried the --cnk_scl option, like this:
        ncecat --cnk_scl=1,1,1,361,720 -O -h t2m.19feb2009.nc4 out1.nc4

but this brought my laptop (4 GB of memory) to its knees for 10 minutes or so, and then I got:


ncecat(28450) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
ncecat(28450) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Bus error (core dumped)


Suggestion:

The default chunking policy (Chunksize Equals Dimension Size) doesn't scale well: as dimension sizes increase, it produces chunks so big that the data file becomes unusable. Can another policy be implemented that tries to maintain the current chunk sizes? I realize this can't always be achieved elegantly, especially when the user is messing with the dimensions, but if only the inner two dimensions are chunked, can that property be preserved in all the output files?
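The policy suggested above can be stated as a simple rule: chunk size 1 on every dimension except the two innermost, which keep their full size. The hypothetical `inner2_chunks` function below sketches that rule for a list of dimension sizes (outermost first); it is an illustration of the proposal, not NCO code.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed policy (not NCO code):
# chunk size 1 on every dimension except the two innermost,
# which keep their full size. Arguments are dimension sizes,
# outermost first; output is a comma-separated chunk-size list.
inner2_chunks() {
  ndim=$#
  i=0
  out=""
  for size in "$@"; do
    i=$(( i + 1 ))
    if [ "$i" -gt $(( ndim - 2 )) ]; then
      c=$size        # one of the two innermost dimensions: keep whole
    else
      c=1            # outer dimension: one slice per chunk
    fi
    out="$out${out:+,}$c"
  done
  echo "$out"
}

inner2_chunks 8 3 361 720      # input file t2(time, lev, lat, lon)
inner2_chunks 24 3 361 720     # concatenated file: same chunk shape
inner2_chunks 1 8 3 361 720    # ncecat output t2(record, time, lev, lat, lon)
```

Because the rule depends only on dimension order and not on the outer dimension sizes, the chunk shape would survive both the record-dimension reshuffle and the concatenation unchanged; the 5-dimensional case gives 1,1,1,361,720, the same shape attempted with --cnk_scl. A real policy would still need a fallback when the two innermost dimensions are themselves very large.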


Respectfully submitted,
Jennifer

--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
jma@xxxxxxxxxxxxx

Follow-Ups:
- Re: [netcdfgroup] chunking and concatenating in NCO version 4.0.3
  - From: Denis Nadeau

References:
- [netcdfgroup] netCDF operators NCO version 4.0.3 are ready
  - From: Charlie Zender
