
[netCDF #TSI-527912]: nccopy advice - rechunking very large files



> On the subject of compression:
> The compression has finished for 3 different rates -d 0/5/9,
> and here are the results:

You may already be aware of this, but just to make sure: the
compression level corresponding to -d0 is *no* compression.  So
it might be useful to compare -d1, the lowest and supposedly
fastest level of compression, with -d5 and -d9.  In my experience,
for a lot of large floating-point data, -d1 is a little faster
than the higher levels, and the higher levels give only slightly
better compression.  So I usually just use -d1 for compression, as
the time it saves is usually worth the small amount of extra data
volume.
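Here's a minimal sketch of that comparison, assuming nccopy is on your
PATH and reusing the chunking and cache options from the runs in this
message.  The function name compare_levels and the output file names
are just illustrative.  It times each run with date +%s, so elapsed
times have only whole-second resolution, and each run writes a full
output file, so you need disk space for all of them.

```shell
#!/bin/sh
# compare_levels INPUT LEVEL...
# Rechunk INPUT once per deflate level and print one line per run:
#   d<level> <elapsed wall-clock seconds> <output size in bytes>
# Assumes nccopy is on PATH; output goes to rechunked-d<level>.nc4
# in the current directory (hypothetical names, pick your own).
compare_levels() {
    input=$1
    shift
    for level in "$@"; do
        out="rechunked-d$level.nc4"
        start=$(date +%s)    # %s works with GNU and BSD date
        nccopy -ctime/98128,x/8,y/6 -e 102000 -m 40M -h 40G \
            -d"$level" "$input" "$out" || return 1
        end=$(date +%s)
        size=$(wc -c < "$out" | tr -d ' ')   # strip BSD wc padding
        echo "d$level $((end - start)) $size"
    done
}

# Example: compare_levels tmp.nc4 1 5 9
```

Running all levels with identical chunking keeps the comparison fair, so
any size difference reflects the deflate level alone.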

I used -d0 in the example I ran to explicitly specify that the
output was to be uncompressed.  I thought that would be somewhat
faster than compressing the output chunks as they were written
to disk, and it was significantly faster:

Writing uncompressed output took 35:24.38 (minutes:seconds) elapsed:

  $ nccopy -ctime/98128,x/8,y/6 -e 102000 -m 40M -h 40G -d0 \
      tmp.nc4 tmp-rechunked.nc4
  $ ls -l tmp-rechunked.nc4
  -rw-rw-r-- 1 russ ustaff 38970737448 Oct  7 12:36 tmp-rechunked.nc4
  
whereas compressing the output using level 1 (nccopy's default is
to compress the output at the same level as the input) took
52:29.25 (minutes:seconds) elapsed:

  $ nccopy -w -ctime/98128,x/8,y/6 -e 102000 -m 40M -h 40G \
      tmp.nc4 tmp-rechunked.nc4
  $ ls -l tmp-rechunked.nc4
  -rw-rw-r-- 1 russ ustaff 10951640022 Oct  7 18:55 tmp-rechunked.nc4
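To put the two elapsed times on the same footing, here's a small sketch
that converts them to plain seconds and takes their ratio.  It assumes
the times are in the [hours:]minutes:seconds form printed by the time
command's elapsed field; to_seconds is a hypothetical helper name.

```shell
#!/bin/sh
# to_seconds: convert an elapsed time written as [hours:]minutes:seconds
# (e.g. 35:24.38) into plain seconds, printed with two decimals.
to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%.2f\n", s
    }'
}

t_d0=$(to_seconds 35:24.38)   # uncompressed (-d0) run
t_d1=$(to_seconds 52:29.25)   # level-1 (default) run

echo "uncompressed: $t_d0 s"   # 2124.38 s
echo "level 1:      $t_d1 s"   # 3149.25 s
awk -v a="$t_d1" -v b="$t_d0" \
    'BEGIN { printf "level 1 / uncompressed: %.2f\n", a / b }'   # 1.48
```

So the compressed write took roughly 1.48 times as long as the
uncompressed one, about 17 minutes more.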

So in this case it looks like -d1 did pretty well, because the size of the
original compressed file (which used -d1 level compression) was only

  $ ls -l tmp.nc4
  -rw-rw-r-- 1 russ ustaff 10143354510 Oct  4 16:45 tmp.nc4
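To put numbers on "did pretty well", here's a quick sketch computing the
size ratios from the byte counts in the three ls -l listings above; the
ratio helper is just an illustrative name.

```shell
#!/bin/sh
# Byte counts copied from the ls -l listings above.
original=10143354510      # tmp.nc4 (input, deflate level 1)
uncompressed=38970737448  # tmp-rechunked.nc4 written with -d0
deflated=10951640022      # tmp-rechunked.nc4 written at level 1

# ratio A B: print A / B to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "uncompressed / level-1 rechunked: $(ratio "$uncompressed" "$deflated")"  # 3.56
echo "level-1 rechunked / original:     $(ratio "$deflated" "$original")"      # 1.08
```

That is, level 1 shrank the rechunked data by a factor of about 3.6,
and the rechunked compressed file came out only about 8% larger than
the original compressed file.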

So I'm puzzled why the -d5 and -d9 results were so much larger than
the -d1 result.  If anything, I'd expect them to be a little smaller.
But maybe your -d5 and -d9 runs used output chunks 1/4 the size,
with only 98128/4 along the time dimension?
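For scale, here's a sketch of what the two chunk shapes mean in bytes
per chunk.  ASSUMPTION: the variable holds 4-byte floats; the data type
isn't stated in this thread, so scale the last argument to match.
chunk_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# chunk_bytes TIME X Y BYTES_PER_VALUE: uncompressed size of one chunk.
chunk_bytes() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# ASSUMPTION: 4 bytes per value (float); not stated in the thread.
full=$(chunk_bytes 98128 8 6 4)               # time/98128,x/8,y/6
quarter=$(chunk_bytes $((98128 / 4)) 8 6 4)   # time/24532,x/8,y/6
echo "full chunk:    $full bytes"      # 18840576
echo "quarter chunk: $quarter bytes"   # 4710144
```

Under that assumption, a quarter-length time extent means four times as
many chunks, each about 4.7 MB uncompressed instead of about 18.8 MB.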

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: TSI-527912
Department: Support netCDF
Priority: Normal
Status: Closed