
Re: Fwd: netcdf 4.1.2+ issue



Hi Benno,

> We have been looking at our netcdf read performance again, particularly with 
> hdf4/hdf5 files.
> 
> We do not have a clear story for the most part, but there seems to be a clear 
> problem with compression in hdf5-based
> netcdf files.
> 
> We would appreciate any insight.

You are seeing artifacts of 

  - Chunking with a chunk cache that's too small for the chunk shapes
    used for compression
  - Poor default chunk shapes in early netCDF-4 versions (such as 4.1.2)
  - Measuring I/O performance with the ncdump utility, which is not
    designed for high performance

A chunk (or tile) is the smallest unit for HDF5 data compression and
access.  The ncdump utility just uses the default chunk cache size,
which in netCDF version 4.1.2 was small (4194304 bytes).  The
temperature variable in your test file has 9 chunks, each of size 1 x
1196 x 1196 shorts, so each chunk is 2860832 bytes.  That means only 1
uncompressed chunk will fit in the default chunk cache.  Reading each
row of 2500 values requires reading and uncompressing 3 chunks, and
since the chunk cache only holds one of those chunks, the same chunks
are re-read and uncompressed repeatedly until all the data is read!
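
Just to make the arithmetic concrete (a quick sanity check; the factor
of 2 is the size of a short in bytes):

  $ echo $((1196 * 1196 * 2))        # bytes in one uncompressed chunk
  2860832
  $ echo $((3 * 1196 * 1196 * 2))    # cache needed for a full row of chunks
  8582496

So the default 4194304-byte cache falls well short of the 8582496 bytes
needed to keep a whole row of chunks in memory at once.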

I don't think ncdump is a very good program for testing read
performance.  It was not designed to be high-performance: it spends
much of its time comparing each value with the fill value before
converting it to ASCII and formatting the output a row at a time.  The
ncdump utility also doesn't have an option for specifying the size of
the chunk cache to use for compressed files.

The nccopy utility is more appropriate for timing I/O with compression
and chunking, as it's designed to be efficient.  It uses only the netCDF
library to read and write, so it's testing the efficiency of the netCDF
software.  However, nccopy was not available for early versions of
netCDF-4, such as 4.0.1.  Here's the current man page:

  http://www.unidata.ucar.edu/netcdf/docs/nccopy-man-1.html
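
As a side note, recent versions of nccopy also accept -m (copy buffer
size in bytes) and -h (chunk cache size in bytes) options, which are
useful tuning knobs for this kind of timing experiment.  A sketch,
assuming those options are available in your build (tmp.nc is just a
scratch output name):

  $ time nccopy -m 16000000 -h 9000000 -d1 spv.nc tmp.nc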

Later versions of netCDF, such as 4.2.x and 4.3.x, have better default
chunking strategies, so they perform better on your file.  For example,
netCDF 4.3.0 uses better default chunk sizes (1 x 1250 x 1250), so
there are only 4 chunks rather than 9, and compression works better,
even with the same deflation level:

  $ nccopy -d1 spv.nc spv-d1.nc
  $ ls -l spv-d1.nc
  -rw-rw-r-- 1 russ ustaff 2832831 Nov 26 14:44 spv-d1.nc

which is smaller than the 3538143 bytes of the compressed file you
sent.  The time for the above compression was about 0.8 seconds on my
Linux desktop.
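
If you're stuck with a library that still has the old chunking
defaults, you can get the same effect by rechunking explicitly with
nccopy's -c option.  A sketch (the dimension names time, y, and x here
are just placeholders for whatever your file actually uses):

  $ nccopy -c time/1,y/1250,x/1250 -d1 spv.nc spv-d1.nc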

A pretty good timing test for reading is to read, uncompress, and copy
the compressed file, using nccopy.  Before running any such test, you
should make sure you aren't just reading a cached copy of the input file
in system memory.  See "A note about timings" at the end of my blog
"Chunking Data: Why it Matters" for how to do this:

  http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters

That blog also has some advice about choosing chunk shapes and sizes for
good performance.  My follow-up blog, "Chunking Data: Choosing Shapes",
has more specific advice:

  http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes
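
For reference, the clear_cache.sh script used below is roughly the
following on Linux (a sketch of the technique described in the blog's
timing note; dropping the page cache needs root privileges):

  #!/bin/sh
  # Flush dirty pages, then drop the OS page cache so the next
  # read actually comes from disk rather than from memory.
  sync
  sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'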

Anyway, here's how much time it takes to copy and uncompress the two
versions of your compressed file, one using the 1 x 1196 x 1196 chunks
from the old defaults in netCDF 4.1.2, and the other using the
1 x 1250 x 1250 chunks from the current netCDF release:

  $ clear_cache.sh; time nccopy -d0 -k1 spv-199901011900_compressed.nc tmp.nc
  real  0m1.98s
  user  0m0.27s
  sys   0m0.06s

  $ clear_cache.sh; time nccopy -d0 -k1 spv-d1.nc tmp.nc
  real  0m1.83s
  user  0m0.19s
  sys   0m0.07s

In each case, the output is a netCDF-3 classic format file matching the
uncompressed file you sent.
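
If you want to double-check the output format, ncdump's -k option
reports the format kind; for a netCDF-3 classic file it should print
something like:

  $ ncdump -k tmp.nc
  classic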

And just FYI, here are the times for running ncdump on the two
versions of the compressed data:

  $ clear_cache.sh; time ncdump spv-199901011900_compressed.nc > /dev/null
  real  4m5.91s
  user  3m58.97s
  sys   0m4.43s

  $ clear_cache.sh; time ncdump spv-d1.nc > /dev/null
  real  3m29.15s
  user  3m25.79s
  sys   0m0.86s

Both of those would be much faster if ncdump reserved enough chunk cache
in memory to hold all the chunks in a row of a variable when dumping it.
I could add that optimization option, if you really need ncdump to be
faster, but it would use a lot more memory than it does now.
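
In the meantime, given the numbers above, a practical workaround is to
uncompress to a temporary classic-format file with nccopy first, and
then run ncdump on that:

  $ nccopy -d0 -k1 spv-199901011900_compressed.nc tmp.nc
  $ ncdump tmp.nc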

--Russ

> Benno
> 
> ---------- Forwarded message ----------
> From: Igor Khomyakov <address@hidden>
> Date: Thu, Nov 14, 2013 at 4:53 PM
> Subject: netcdf 4.1.2+ issue
> To: Benno Blumenthal <address@hidden>
> Cc: John del Corral <address@hidden>
> 
> Benno, here's the test case for netcdf developers. Please let me know if you 
> need more information. Attached, please
> find the sample data files and the strace log. 
> 
> Igor 
> 
> THE DATA FILES: The compressed version of netcdf file was produced using 
> nccopy (option -d). The uncompressed file is
> 12.5MB, the compressed file is 3.5MB. Attached, you may find datafiles.tgz 
> that contains both data files.
> 
> THE PROBLEM: ncdump 4.1.2+ of the compressed file takes 50 times more time 
> than ncdump of the original netcdf file.
> Ncdump 4.0.1 doesn't appear to have this issue. 
> 
> $ time ncdump spv-199901011900.nc >/dev/null
> 
> real 0m1.652s
> user 0m1.605s
> sys 0m0.017s
> 
> $ time ncdump spv-199901011900_compressed.nc >/dev/null
> 
> real 1m28.273s
> user 1m11.460s
> sys 0m16.681s
> 
> THE STRACE LOG: we straced ncdump 4.1.2 of compressed file and found that it 
> calls 'read' function 7,526 times, and
> reads 3,384,680,557 bytes!  This is 1000 times more than the size of the 
> file. Attached, please find the strace log. 
> 
> --
> Dr. M. Benno Blumenthal          address@hidden
> International Research Institute for climate and society
> The Earth Institute at Columbia University
> Lamont Campus, Palisades NY 10964-8000   (845) 680-4450