Re: [netcdfgroup] chunking and concatenating in NCO version 4.0.3

Hi Jennifer,

 

I had a similar problem a week ago where I needed to concatenate many files.
Instead of using "ncecat", which creates a new record dimension, I used ncrcat
directly on all my files. The program recognized the time axis correctly and
concatenated all the files. (All my files had the TIME dimension declared as
UNLIMITED.)

 

Can you try:

 

ncrcat -O -h t2m.19feb2009.nc4 t2m.20feb2009.nc4 t2m.21feb2009.nc4 out4.nc4

 

If this does not work, you might want to create new t2m files with an
UNLIMITED time dimension, as you did for out3.nc4, for each t2m file (see the
sketch below).
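
For example, you could apply the same ncecat/ncpdq/ncwa recipe you used for
out3.nc4 to each daily file and then concatenate the results. This is only a
sketch; the loop and the tmp.nc4 / *.rec.nc4 names are placeholders, not
something I have run on your data:

for f in t2m.19feb2009 t2m.20feb2009 t2m.21feb2009; do
    ncecat -O -h ${f}.nc4 tmp.nc4                # add a degenerate "record" dimension
    ncpdq  -O -h -a time,record tmp.nc4 tmp.nc4  # swap "time" and "record" so time leads
    ncwa   -O -h -a record tmp.nc4 ${f}.rec.nc4  # remove the degenerate "record"; time is now UNLIMITED
done
ncrcat -O -h t2m.19feb2009.rec.nc4 t2m.20feb2009.rec.nc4 t2m.21feb2009.rec.nc4 out4.nc4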

 

Finally, to get the chunk sizes right, I used ncks with the chunking policy
for variables of at least two dimensions: "--cnk_plc g2d --cnk_dmn lev,1
--cnk_dmn lat,361 --cnk_dmn lon,720 --cnk_dmn time,1". I chunk all the
dimensions manually, as you can see. The new netCDF-4/HDF5 library picks a
default chunk size which is usually not right for our specific science. A
full command is sketched below.
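
Written out, the ncks invocation would look roughly like this (a sketch; the
input and output file names are placeholders, and your build of ncks must
accept the chunking options):

ncks -O -h --cnk_plc g2d --cnk_dmn time,1 --cnk_dmn lev,1 --cnk_dmn lat,361 --cnk_dmn lon,720 out4.nc4 out4_rechunked.nc4

This should restore the 1 x 1 x 361 x 720 chunks your daily files already have.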

 

ncrcat also seems to accept chunking arguments, but I have never tried them.

 

Let me know if this helps.

Denis

 

 

From: netcdfgroup-bounces@xxxxxxxxxxxxxxxx
[mailto:netcdfgroup-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Jennifer Adams
Sent: Tuesday, September 07, 2010 7:56 PM
To: netCDF Mail List
Subject: [netcdfgroup] chunking and concatenating in NCO version 4.0.3

 

Dear Experts, 

My question is about the NetCDF Operators -- I have copied this post to the
NCO forum on sourceforge.net, but I am also writing to this group because it
increases my chances of getting feedback. Someone on this list who doesn't
read that forum regularly may be able to help. 

 

I am using nco-4.0.3 linked with netcdf-4.1-rc1, udunits-2.1.11, and
hdf5-1.8.4 on an Intel Mac running OS X 10.5.8.

 

Using GrADS I have created a trio of 4-dimensional chunked netcdf-4 files,
each with 8 time steps (one file per day), and no record dimension. The
(edited) ncdump output for the first file looks like this:

 

dimensions:
        lon = 720 ;
        lat = 361 ;
        lev = 3 ;
        time = 8 ;
variables:
        double time(time) ;
                time:units = "minutes since 2009-02-19 00:00" ;
        float t2(time, lev, lat, lon) ;
                t2:_ChunkSizes = 1, 1, 361, 720 ;
data:
    time = 0, 180, 360, 540, 720, 900, 1080, 1260 ;

 

Other files in the trio are the same except for the origin of the time axis:
"minutes since 2009-02-20 00:00", etc. 

 

I would like to concatenate my three files into one file with 24 time steps
and maintain the chunk sizes. Following the documentation, I am executing
the following commands (saving intermediate files out*.nc4 for debugging
purposes): 

 

ncecat -O -h t2m.19feb2009.nc4 out1.nc4
ncpdq  -O -h -a time,record out1.nc4 out2.nc4
ncwa   -O -h -a record out2.nc4 out3.nc4
ncrcat -O -h out3.nc4 t2m.20feb2009.nc4 t2m.21feb2009.nc4 out4.nc4

 

The (edited) ncdump output from the final file (out4.nc4) looks like this: 

 

dimensions:
        lon = 720 ;
        lat = 361 ;
        lev = 3 ;
        time = UNLIMITED ; // (24 currently)
variables:
        double time(time) ;
                time:units = "minutes since 2009-02-19 00:00" ;
        float t2(time, lev, lat, lon) ;
                t2:_ChunkSizes = 1, 3, 361, 720 ;
data:
    time = 0, 180, 360, 540, 720, 900, 1080, 1260, 0, 180, 360, 540, 720, 900,
           1080, 1260, 0, 180, 360, 540, 720, 900, 1080, 1260 ;

 

Problems: 

1. The time axis values are wrong for timesteps 9-24 in the final output. Is
there a way to make ncrcat notice the different origins of the time axes in
the files it is concatenating and adjust accordingly? (An untested manual
workaround is sketched after the file listing below.)

2. The chunk sizes are changed so that the lev dimension chunk size is > 1. 

3. The size of the output is much too large. Changing the time axis to a
record dimension more than doubles the file size! 

 

>  ls -l *nc4
-rw-r--r--  1 jma  jma  24974581 Sep  7 18:09 out1.nc4
-rw-r--r--  1 jma  jma  24975534 Sep  7 16:29 out2.nc4
-rw-r--r--  1 jma  jma  24975534 Sep  7 16:29 out3.nc4
-rw-r--r--  1 jma  jma  74880174 Sep  7 15:47 out4.nc4
-rw-r--r--  1 jma  jma  10488230 Sep  7 11:20 t2m.19feb2009.nc4
-rw-r--r--  1 jma  jma  11031824 Sep  7 11:21 t2m.20feb2009.nc4
-rw-r--r--  1 jma  jma   9975740 Sep  7 11:21 t2m.21feb2009.nc4
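
The manual workaround mentioned in problem 1 would be to rebase the later
files onto the first file's time origin before concatenating. This is only an
untested sketch (one day is 1440 minutes, and the *.rebased.nc4 names are
placeholders):

ncap2   -O -h -s 'time=time+1440' t2m.20feb2009.nc4 t2m.20feb2009.rebased.nc4
ncatted -O -h -a units,time,o,c,"minutes since 2009-02-19 00:00" t2m.20feb2009.rebased.nc4
ncap2   -O -h -s 'time=time+2880' t2m.21feb2009.nc4 t2m.21feb2009.rebased.nc4
ncatted -O -h -a units,time,o,c,"minutes since 2009-02-19 00:00" t2m.21feb2009.rebased.nc4

Running ncrcat on t2m.19feb2009.nc4 and the two rebased files should then give
time values from 0 to 4140, but I would still prefer ncrcat to handle the
offset itself.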

 

 

Solutions I have tried: 

1. I modified each of the three input files so that each has a record
dimension, but I got the same result. 

2. I tried to use "--cnk_dmn lev,1" as an additional argument to ncecat
(this worked) but that is an unrecognized option in ncpdq, ncwa, and ncrcat.

3. I also tried the --cnk_scl option, like this: 

        ncecat --cnk_scl=1,1,1,361,720 -O -h t2m.19feb2009.nc4 out1.nc4

but this brought my laptop (4 GB of memory) to its knees for 10 minutes or
so, and then I got:

 

ncecat(28450) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
ncecat(28450) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Bus error (core dumped)

 

 

Suggestion: 

The default chunking policy (Chunksize Equals Dimension Size) doesn't scale
well: as dimension sizes increase, it leads to chunks so big that the data
file becomes unusable. Could another policy be implemented that tries to
maintain the current chunk sizes? I realize this can't always be achieved
elegantly, especially when the user is messing with the dimensions, but if
only the inner two dimensions are chunked, can that property be preserved in
all the output files?

 

Respectfully submitted,

Jennifer

 

--

Jennifer M. Adams

IGES/COLA

4041 Powder Mill Road, Suite 302

Calverton, MD 20705

jma@xxxxxxxxxxxxx

 

 

 
