
[netCDF #AWT-862217]: nccopy chunking argument



Mark,

> Thanks for the reply. I've attached a script and a cdl file here
> that can (hopefully) reproduce the problem on your machine.

Thanks for doing that, it made it possible to reproduce the problems
pretty quickly here. 

> The script takes the cdl file and uses ncgen to generate a netcdf
> file from it called regen.nc. The file is initially tiny (16K or so)
> but specifies storage worth about 40GB. The variables in this file
> are chunked as date/1,lat/720,lon/1440.
> 
> Then there are two nccopy commands. The first command attempts to do
> the rechunking where all three dimensions are specified (date, lon,
> lat), i.e.
> 
> ./nccopy -m 2G -h 2G -e 12000 -c date/1,lon/30,lat/30 regen.nc specify_date_rechunking.nc
> 
> This runs to completion on my machine (8GB RAM) in about 10 minutes
> and produces a 40GB file called specify_date_rechunking.nc, where
> the variables are chunked as date/1,lon/30,lat/30.
> 
> The second command is identical apart from the fact that the date
> rechunking is not explicitly specified, i.e.
> 
> ./nccopy -m 2G -h 2G -e 12000 -c lon/30,lat/30 regen.nc unspecified_date_rechunking.nc
> 
> This command fails to run on my machine - the memory usage of the
> nccopy process explodes and at some point the OS kills it. However,
> the file that is produced, unspecified_date_rechunking.nc, is
> readable with ncdump - when you do this, you can see that the
> variables are chunked as date/5186,lon/30,lat/30 i.e. nccopy has set
> the date dimension chunking to occupy the full variable size, rather
> than sticking to the current chunk size (which may explain the
> memory requirements).
> 
> So, the question is, is it desired behaviour that nccopy changes the
> chunking of dimensions that are not specified in the re-chunking
> argument?

No, that's neither the desired nor the documented behavior; it was a
bug.  I've fixed it and verified that the fix works for your example,
which now finishes in about 16 minutes on my machine:

  $ /usr/bin/time nccopy -m 2G -h 2G -e 12000 -c lon/30,lat/30 regen.nc unspecified_date_rechunking.nc
  243.79user 138.55system 16:00.66elapsed 39%CPU (0avgtext+0avgdata 2076912maxresident)k
  8840inputs+86178344outputs (52major+519408minor)pagefaults 0swaps
  $ ncdump -s -h unspecified_date_rechunking.nc | grep _ChunkSizes
                date:_ChunkSizes = 1 ;
                CHL1_mean:_ChunkSizes = 1, 30, 30 ;
                CHL1_flags:_ChunkSizes = 1, 30, 30 ;
                CHL1_error:_ChunkSizes = 1, 30, 30 ;
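
The intended rule, which the output above is consistent with, is that a
dimension named in the -c option gets the requested chunk length and
every other dimension keeps its chunk length from the input file.  A
minimal sketch of that rule follows.  It is not the code in nccopy.c:
the struct chunk_request type and the resolve_chunks() function are
hypothetical names invented for this illustration, and the dimension
order (date, lat, lon) is assumed from the chunking you described.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: one "name/size" pair from a -c option such as lon/30. */
struct chunk_request {
    const char *dim_name;
    size_t chunk_len;
};

/*
 * Hypothetical helper: fill out_chunks[] for a variable with ndims
 * dimensions.  A dimension named in requests[] gets the requested chunk
 * length; any other dimension keeps its chunk length from the input file.
 */
void
resolve_chunks(size_t ndims, const char *const dim_names[],
               const size_t in_chunks[],
               const struct chunk_request requests[], size_t nrequests,
               size_t out_chunks[])
{
    for (size_t i = 0; i < ndims; i++) {
        out_chunks[i] = in_chunks[i];   /* default: keep the input chunking */
        for (size_t r = 0; r < nrequests; r++) {
            if (strcmp(dim_names[i], requests[r].dim_name) == 0) {
                out_chunks[i] = requests[r].chunk_len;
                break;
            }
        }
    }
}
```

With input chunk lengths {1, 720, 1440} for (date, lat, lon) and the
requests lon/30 and lat/30, this yields {1, 30, 30}, the same chunk
sizes ncdump reports for the CHL1 variables above.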

The fix was pretty small, but I may not get it into the snapshot
today: I still have to write a test case that verifies the fix and
add the necessary test files and Makefile changes, and I'm out of the
office for a week starting tomorrow.  So here's a patch you can apply
locally to ncdump/nccopy.c to fix the bug:

$ svn diff 
Index: nccopy.c
===================================================================
--- nccopy.c    (revision 1747)
+++ nccopy.c    (working copy)
@@ -774,9 +774,10 @@
                /* Copy all netCDF-4 specific variable properties such as
                 * chunking, endianness, deflation, checksumming, fill, etc. */
                NC_CHECK(copy_var_specials(igrp, varid, ogrp, o_varid));
+           } else {
+               /* Set chunking if specified in command line option */
+               NC_CHECK(set_var_chunked(ogrp, o_varid));
            }
-           /* Set chunking if specified in command line option */
-           NC_CHECK(set_var_chunked(ogrp, o_varid));
            /* Set compression if specified in command line option */
            NC_CHECK(set_var_compressed(ogrp, o_varid));
        }

> That's the first issue. The second issue is why does command two not
> run to completion? However, maybe we should take the first issue
> first, so as to avoid potential confusion....

I don't know the answer to that.  Before the bug fix, it took over my
machine too, and it became so unresponsive that attaching to the
process to debug it was impractical.  My guess is that it was just
thrashing with huge chunk sizes and too little memory, and that it
would have eventually finished (maybe after a week or a month :-) ).
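
Some rough arithmetic supports that guess.  The sketch below computes
the size of one chunk and the number of chunks per variable.  The
function names are made up for this illustration, and two inputs are
assumptions rather than facts from the file: that lat and lon have
lengths 720 and 1440 (the original chunk lengths), and that each
element is 4 bytes.

```c
#include <stdint.h>

/* Bytes in one chunk: product of the chunk lengths times the element size. */
uint64_t
chunk_bytes(const uint64_t chunk_lens[], int ndims, uint64_t elem_size)
{
    uint64_t n = elem_size;
    for (int i = 0; i < ndims; i++)
        n *= chunk_lens[i];
    return n;
}

/* Chunks needed to tile a variable; a partial edge chunk counts as a whole one. */
uint64_t
chunk_count(const uint64_t dim_lens[], const uint64_t chunk_lens[], int ndims)
{
    uint64_t n = 1;
    for (int i = 0; i < ndims; i++)
        n *= (dim_lens[i] + chunk_lens[i] - 1) / chunk_lens[i];
    return n;
}
```

Under those assumptions, a 5186 x 30 x 30 chunk is 18,669,600 bytes
(about 18.7 MB) and each variable has 1152 of them, about 21.5 GB in
all.  Every input chunk is a single date covering the whole lat/lon
grid, so if the copy proceeds one date at a time, each date touches all
1152 output chunks, far more than a 2G chunk cache or 8GB of RAM can
hold at once.  With 1 x 30 x 30 chunks, each chunk is only 3600 bytes
and the 1152 chunks for one date come to about 4 MB.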

Anyway, thanks for the bug report that helped get this fixed!

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: AWT-862217
Department: Support netCDF
Priority: Normal
Status: Closed