Dear Experts,
My question is about the NetCDF Operators -- I have copied this post
to the NCO forum on sourceforge.net, but I am also writing to this
group because it increases my chances of getting feedback. Someone on
this list who doesn't read that forum regularly may be able to help.
I am using nco-4.0.3 linked with netcdf 4.1-rc1, udunits-2.1.11, and
hdf5-1.8.4 on an Intel Mac running OS X 10.5.8.
Using GrADS I have created a trio of 4-dimensional chunked netcdf-4
files, each with 8 time steps (one file per day), and no record
dimension. The (edited) ncdump output for the first file looks like
this:
dimensions:
        lon = 720 ;
        lat = 361 ;
        lev = 3 ;
        time = 8 ;
variables:
        double time(time) ;
                time:units = "minutes since 2009-02-19 00:00" ;
        float t2(time, lev, lat, lon) ;
                t2:_ChunkSizes = 1, 1, 361, 720 ;
data:
 time = 0, 180, 360, 540, 720, 900, 1080, 1260 ;
Other files in the trio are the same except for the origin of the time
axis: "minutes since 2009-02-20 00:00", etc.
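To make the relationship between the three time axes concrete, here is a minimal sketch of the values one would expect on a single axis measured from the first file's origin. It assumes the third file's origin is 2009-02-21 00:00 (implied by the "etc." above and by the file names); the helper name `rebase` is hypothetical and is not part of NCO.

```python
from datetime import datetime

# Origins taken from the time:units attributes of the three files
# (the third is assumed from the file name t2m.21feb2009.nc4).
origins = [
    datetime(2009, 2, 19),
    datetime(2009, 2, 20),
    datetime(2009, 2, 21),
]

# Time values stored in each file, in minutes since that file's own origin.
local_minutes = [0, 180, 360, 540, 720, 900, 1080, 1260]


def rebase(origin, values, reference):
    """Express values (minutes since origin) as minutes since reference."""
    shift = int((origin - reference).total_seconds() // 60)
    return [v + shift for v in values]


reference = origins[0]
combined = []
for origin in origins:
    combined.extend(rebase(origin, local_minutes, reference))

print(combined)
```

With these inputs the second file's values start at 1440 and the third file's at 2880, so the 3-hourly spacing continues across the file boundaries.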
I would like to concatenate my three files into one file with 24 time
steps and maintain the chunk sizes. Following the documentation, I am
executing the following commands (saving intermediate files out*.nc4
for debugging purposes):
ncecat -O -h t2m.19feb2009.nc4 out1.nc4
ncpdq -O -h -a time,record out1.nc4 out2.nc4
ncwa -O -h -a record out2.nc4 out3.nc4
ncrcat -O -h out3.nc4 t2m.20feb2009.nc4 t2m.21feb2009.nc4 out4.nc4
The (edited) ncdump output from the final file (out4.nc4) looks like
this:
dimensions:
        lon = 720 ;
        lat = 361 ;
        lev = 3 ;
        time = UNLIMITED ; // (24 currently)
variables:
        double time(time) ;
                time:units = "minutes since 2009-02-19 00:00" ;
        float t2(time, lev, lat, lon) ;
                t2:_ChunkSizes = 1, 3, 361, 720 ;
data:
 time = 0, 180, 360, 540, 720, 900, 1080, 1260, 0, 180, 360, 540, 720, 900,
    1080, 1260, 0, 180, 360, 540, 720, 900, 1080, 1260 ;
Problems:
1. The time axis values are wrong for timesteps 9-24 in the final
output. Is there a way to make ncrcat notice the different origins of
the time axes in the files it is concatenating and adjust accordingly?
2. The chunk sizes are changed so that the lev dimension chunk size is
greater than 1 (3 instead of 1).
3. The size of the output is much too large. Changing the time axis to
a record dimension more than doubles the file size!
> ls -l *nc4
-rw-r--r-- 1 jma jma 24974581 Sep 7 18:09 out1.nc4
-rw-r--r-- 1 jma jma 24975534 Sep 7 16:29 out2.nc4
-rw-r--r-- 1 jma jma 24975534 Sep 7 16:29 out3.nc4
-rw-r--r-- 1 jma jma 74880174 Sep 7 15:47 out4.nc4
-rw-r--r-- 1 jma jma 10488230 Sep 7 11:20 t2m.19feb2009.nc4
-rw-r--r-- 1 jma jma 11031824 Sep 7 11:21 t2m.20feb2009.nc4
-rw-r--r-- 1 jma jma 9975740 Sep 7 11:21 t2m.21feb2009.nc4
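As a rough check on these sizes, the sketch below computes how many bytes the t2 variable would occupy if stored as plain 4-byte floats with no compression, using the dimension sizes from the ncdump header. The interpretation is an inference from the arithmetic only: the post does not state whether the GrADS-written input files are compressed.

```python
# Dimension sizes from the ncdump header; t2 is a 4-byte float.
lon, lat, lev, time_steps = 720, 361, 3, 8
bytes_per_float = 4

# Raw (uncompressed) size of t2 in one daily file and in three files combined.
one_file_raw = time_steps * lev * lat * lon * bytes_per_float
three_files_raw = 3 * one_file_raw

# File sizes copied from the ls -l listing above.
sizes = {
    "out1.nc4": 24974581,
    "out4.nc4": 74880174,
    "t2m.19feb2009.nc4": 10488230,
}

print(one_file_raw)                          # 24952320
print(three_files_raw)                       # 74856960
print(sizes["out1.nc4"] - one_file_raw)      # 22261
print(sizes["out4.nc4"] - three_files_raw)   # 23214
```

out1.nc4 and out4.nc4 are each within about 25 KB of the raw data size, while the input files are less than half of it. That would be consistent with the inputs being stored compressed and the NCO outputs not, in which case the growth would already be present after the ncecat step rather than being caused by the record dimension itself.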
Solutions I have tried:
1. I modified each of the three input files so that each has a record
dimension, but I got the same result.
2. I tried to use "--cnk_dmn lev,1" as an additional argument to
ncecat (this worked) but that is an unrecognized option in ncpdq,
ncwa, and ncrcat.
3. I also tried the --cnk_scl option, like this:
ncecat --cnk_scl=1,1,1,361,720 -O -h t2m.19feb2009.nc4 out1.nc4
but this brought my laptop (4 GB of memory) to its knees for 10
minutes or so, and then I got:
ncecat(28450) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
ncecat(28450) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Bus error (core dumped)
Suggestion:
The default chunking policy (Chunksize Equals Dimension Size) doesn't
scale well -- as dimension sizes increase, this leads to chunks that
are so big the data file becomes unusable. Could another policy be
implemented that tries to maintain the current chunk sizes? I realize
this can't always be achieved elegantly, especially when the user is
messing with the dimensions, but if each chunk spans only the inner 2
dims, can that property be preserved in all the output files?
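To illustrate the kind of policy meant here, the following is a minimal sketch, not NCO code: the function `preserve_chunks` and its arguments are hypothetical. It reuses each dimension's input chunk size by name, gives any dimension that did not exist in the input a chunk size of 1, and clamps the result to the output dimension length.

```python
def preserve_chunks(in_dims, in_chunks, out_dims, out_sizes):
    """Hypothetical policy: carry each dimension's chunk size over by name.

    Dimensions absent from the input get a chunk size of 1, and every chunk
    size is clamped to the length of the output dimension.
    """
    by_name = dict(zip(in_dims, in_chunks))
    chunks = []
    for name, size in zip(out_dims, out_sizes):
        chunk = by_name.get(name, 1)
        chunks.append(max(1, min(chunk, size)))
    return chunks


in_dims = ["time", "lev", "lat", "lon"]
in_chunks = [1, 1, 361, 720]

# After ncecat: a new leading "record" dimension of length 1.
after_ncecat = preserve_chunks(
    in_dims, in_chunks,
    ["record", "time", "lev", "lat", "lon"], [1, 8, 3, 361, 720],
)

# After concatenation: same dimensions, time now has 24 steps.
after_ncrcat = preserve_chunks(in_dims, in_chunks, in_dims, [24, 3, 361, 720])

print(after_ncecat)   # [1, 1, 1, 361, 720]
print(after_ncrcat)   # [1, 1, 361, 720]
```

Under this sketch, each chunk stays one lat-lon slab through every intermediate file, which is the property asked about above.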
Respectfully submitted,
Jennifer
--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
jma@xxxxxxxxxxxxx