[netcdfgroup] chunking and concatenating in NCO version 4.0.3

Dear Experts,
My question is about the NetCDF Operators -- I have copied this post to the NCO forum on sourceforge.net, but I am also writing to this group because it increases my chances of getting feedback. Someone on this list who doesn't read that forum regularly may be able to help.

I am using nco-4.0.3 linked with netcdf 4.1-rc1, udunits-2.1.11, and hdf5-1.8.4 on an Intel Mac running OS X 10.5.8.

Using GrADS I have created a trio of 4-dimensional chunked netcdf-4 files, each with 8 time steps (one file per day), and no record dimension. The (edited) ncdump output for the first file looks like this:

dimensions:
        lon = 720 ;
        lat = 361 ;
        lev = 3 ;
        time = 8 ;
variables:
        double time(time) ;
                time:units = "minutes since 2009-02-19 00:00" ;
        float t2(time, lev, lat, lon) ;
                t2:_ChunkSizes = 1, 1, 361, 720 ;
data:
    time = 0, 180, 360, 540, 720, 900, 1080, 1260 ;

Other files in the trio are the same except for the origin of the time axis: "minutes since 2009-02-20 00:00", etc.

I would like to concatenate my three files into one file with 24 time steps and maintain the chunk sizes. Following the documentation, I am executing the following commands (saving intermediate files out*.nc4 for debugging purposes):

ncecat -O -h t2m.19feb2009.nc4 out1.nc4
ncpdq  -O -h -a time,record out1.nc4 out2.nc4
ncwa   -O -h -a record out2.nc4 out3.nc4
ncrcat -O -h out3.nc4 t2m.20feb2009.nc4 t2m.21feb2009.nc4 out4.nc4

The (edited) ncdump output from the final file (out4.nc4) looks like this:

dimensions:
        lon = 720 ;
        lat = 361 ;
        lev = 3 ;
        time = UNLIMITED ; // (24 currently)
variables:
        double time(time) ;
                time:units = "minutes since 2009-02-19 00:00" ;
        float t2(time, lev, lat, lon) ;
                t2:_ChunkSizes = 1, 3, 361, 720 ;
data:
    time = 0, 180, 360, 540, 720, 900, 1080, 1260, 0, 180, 360, 540, 720, 900,
        1080, 1260, 0, 180, 360, 540, 720, 900, 1080, 1260 ;

Problems:
1. The time axis values are wrong for timesteps 9-24 in the final output. Is there a way to make ncrcat notice the different origins of the time axes in the files it is concatenating and adjust accordingly?
2. The chunk sizes are changed so that the lev dimension chunk size is > 1.
3. The size of the output is much too large. Changing the time axis to a record dimension more than doubles the file size!
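
To make the first problem concrete, here is a minimal Python sketch of the time axis the concatenated file would be expected to carry if every input were re-expressed relative to the first file's origin. It is only an illustration of the arithmetic, not something NCO is known to do in this version. The function name rebase_minutes is hypothetical, and the third file's units string is assumed from the file name t2m.21feb2009.nc4, since the post only says "etc."

```python
from datetime import datetime, timedelta

# Format of the units attribute shown in the ncdump output.
UNITS_FMT = "minutes since %Y-%m-%d %H:%M"

def rebase_minutes(values, units, target_units):
    """Re-express 'minutes since <origin>' values relative to target_units.

    Hypothetical helper for illustration; not part of NCO.
    """
    origin = datetime.strptime(units, UNITS_FMT)
    target = datetime.strptime(target_units, UNITS_FMT)
    shift = (origin - target) // timedelta(minutes=1)  # whole minutes
    return [v + shift for v in values]

steps = [0, 180, 360, 540, 720, 900, 1080, 1260]  # from the ncdump output
target = "minutes since 2009-02-19 00:00"
file_units = [
    "minutes since 2009-02-19 00:00",
    "minutes since 2009-02-20 00:00",
    "minutes since 2009-02-21 00:00",  # assumed from the file name
]

merged = []
for units in file_units:
    merged.extend(rebase_minutes(steps, units, target))

print(merged)
```

With these inputs the second file's values start at 1440 and the third file's at 2880, so the merged axis increases monotonically instead of restarting at 0 every 8 steps.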

>  ls -l *nc4
-rw-r--r--  1 jma  jma  24974581 Sep  7 18:09 out1.nc4
-rw-r--r--  1 jma  jma  24975534 Sep  7 16:29 out2.nc4
-rw-r--r--  1 jma  jma  24975534 Sep  7 16:29 out3.nc4
-rw-r--r--  1 jma  jma  74880174 Sep  7 15:47 out4.nc4
-rw-r--r--  1 jma  jma  10488230 Sep  7 11:20 t2m.19feb2009.nc4
-rw-r--r--  1 jma  jma  11031824 Sep  7 11:21 t2m.20feb2009.nc4
-rw-r--r--  1 jma  jma   9975740 Sep  7 11:21 t2m.21feb2009.nc4
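
A quick arithmetic check on the listing above, sketched in Python: it computes the raw (uncompressed) size of the t2 variable from the dimension sizes in the ncdump output and compares it with the reported file sizes. The 4-byte element size follows from t2 being declared float. The interpretation is an inference, not something the post confirms: the intermediate and final files are within a few tens of kilobytes of the raw size, while the GrADS inputs are much smaller, which would be consistent with the inputs being compressed and the NCO outputs not.

```python
# Dimension sizes taken from the ncdump output in the post.
LON, LAT, LEV = 720, 361, 3
BYTES_PER_FLOAT = 4  # t2 is a 32-bit float

def raw_bytes(ntime):
    """Uncompressed size in bytes of t2(time, lev, lat, lon) for ntime steps."""
    return ntime * LEV * LAT * LON * BYTES_PER_FLOAT

one_day = raw_bytes(8)      # a single input file (8 time steps)
three_days = raw_bytes(24)  # the concatenated file (24 time steps)

# File sizes copied from the ls -l listing.
reported = {
    "out1.nc4": 24974581,
    "out4.nc4": 74880174,
    "t2m.19feb2009.nc4": 10488230,
}

print("raw size, 8 steps :", one_day)
print("raw size, 24 steps:", three_days)
print("out1.nc4 overhead :", reported["out1.nc4"] - one_day)
print("out4.nc4 overhead :", reported["out4.nc4"] - three_days)
print("input / raw ratio :", round(reported["t2m.19feb2009.nc4"] / one_day, 2))
```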


Solutions I have tried:
1. I modified each of the three input files so that each has a record dimension, but I got the same result.
2. I tried to use "--cnk_dmn lev,1" as an additional argument to ncecat (this worked) but that is an unrecognized option in ncpdq, ncwa, and ncrcat.
3. I also tried the --cnk_scl option, like this:
        ncecat --cnk_scl=1,1,1,361,720 -O -h t2m.19feb2009.nc4 out1.nc4
but this brought my laptop (4 GB of memory) to its knees for 10 minutes or so, and then I got:

ncecat(28450) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
ncecat(28450) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Bus error (core dumped)


Suggestion:
The default chunking policy (Chunksize Equals Dimension Size) doesn't scale well -- as dimension sizes increase, this leads to chunks that are so big the data file becomes unusable. Can another policy be implemented that tries to maintain current chunksize? I realize this can't always be achieved elegantly, especially when the user is messing with the dimensions, but if only the inner 2 dims are chunked, can that property be preserved in all the output files?
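
The kind of policy suggested above could look roughly like the following Python sketch. It is purely hypothetical and does not describe any existing NCO option: preserve_chunks carries each dimension's input chunk length over to the output by dimension name, gives any newly added dimension a chunk length of 1, and never lets a chunk exceed the dimension length. The example uses the dimensions from the post, with the extra "record" dimension that ncecat adds.

```python
def preserve_chunks(in_dims, in_chunks, out_dims, out_sizes):
    """Hypothetical policy: keep input chunk lengths, matched by dimension name.

    Dimensions that are new in the output get a chunk length of 1.
    """
    by_name = dict(zip(in_dims, in_chunks))
    out = []
    for name, size in zip(out_dims, out_sizes):
        chunk = by_name.get(name, 1)   # new dimension: chunk length 1
        out.append(min(chunk, size))   # never exceed the dimension length
    return out

def chunk_bytes(chunks, bytes_per_value=4):
    """Size in bytes of one chunk of 4-byte floats."""
    total = bytes_per_value
    for c in chunks:
        total *= c
    return total

# Input layout from the ncdump output.
in_dims = ["time", "lev", "lat", "lon"]
in_chunks = [1, 1, 361, 720]

# Layout after a leading record dimension of length 1 is added.
out_dims = ["record", "time", "lev", "lat", "lon"]
out_sizes = [1, 8, 3, 361, 720]

chunks = preserve_chunks(in_dims, in_chunks, out_dims, out_sizes)
print(chunks, chunk_bytes(chunks))

# For comparison: chunk length equal to dimension length in every dimension.
print(out_sizes, chunk_bytes(out_sizes))
```

Under this sketch only the inner two dimensions stay fully chunked, so one chunk holds a single lat-lon slab of about 1 MB, whereas a chunk spanning every dimension holds the whole variable.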

Respectfully submitted,
Jennifer

--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
jma@xxxxxxxxxxxxx


