Hey Ed,
thanks for replying! We double-checked, and we are currently not using
compression. We are keen on providing some data to play with, but the data in
use is protected by an NDA, so we need some time to prepare a dataset.
Regarding your tip about using the benchmark tool: it seems we are not able to
compile it on Windows, because it uses some Linux-specific header files like
<sys/types.h>, and we are bound to Windows by our IT department (yep,
I'm serious).
Meanwhile we have tested different configurations for the chunk cache. Keeping
the chunk size the same (64x64x64) for the dataset with dimensions of
6000x6000x3000, we clearly made a mistake when setting the chunk cache size
before. We now calculate the number of bytes required to hold at least
dimX/64 * dimY/64 chunks in the cache and set the cache size using
nc_set_var_chunk_cache. The performance of reading slices from the cache
improved by a factor of 5: we now need around 300 ms to fetch a slice from the
cache instead of 1.5 seconds. This still seems slow to me, but we are making
progress! ☺ I'd still expect to get the data from the cache much faster,
because it's already in memory, right?
Also, it seems that if we make the chunk cache larger than required, accessing
data with nc_get_vara gets slower again.
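For reference, this is roughly how we size and set the cache now (a minimal
sketch; the helper name and the rounding are illustrative, only
nc_set_var_chunk_cache is the actual netCDF call):

#include <netcdf.h>
#include <stddef.h>

/* Sketch: size the per-variable chunk cache so one slice worth of
 * chunks fits. For 6000x6000 in X/Y and 64^3 uint16 chunks this is
 * 94 * 94 = 8836 chunks of 512 KiB each, roughly 4.3 GiB. */
static int set_slice_cache(int ncid, int varid)
{
    const size_t dimX = 6000, dimY = 6000, chunk = 64;
    const size_t chunk_bytes = chunk * chunk * chunk * sizeof(unsigned short);
    size_t nchunks = ((dimX + chunk - 1) / chunk)
                   * ((dimY + chunk - 1) / chunk);

    /* Arguments: cache size in bytes, number of hash table slots
     * (the HDF5 docs suggest a prime number well above the chunk
     * count to reduce collisions; we simply pass nchunks here),
     * and the preemption policy in [0, 1]. */
    return nc_set_var_chunk_cache(ncid, varid,
                                  nchunks * chunk_bytes,
                                  nchunks, 0.75f);
}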
Maybe you can shed some light on how to set the chunk cache size and how to
pick the chunk size? We went by this article, which suggests that the chunk
size should be small to get good average performance across all dimensions:
https://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters
Greetings
From: Ed Hartnett <edwardjameshartnett@xxxxxxxxx>
Sent: Monday, December 2, 2019 16:26
To: Amr, Mahmoud <mahmoud.amr@xxxxxxxxxxxxxxxxx>
Cc: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: Re: [netcdfgroup] Performance Question using nc_get_vara
Performance with netCDF-4/HDF5 is better than performance with binary files,
when settings are correct.
The biggest problem is usually compression. Are you compressing your data
(i.e., did you use nc_def_var_deflate())? If so, turn that off for a while and
get your performance sorted out without compression.
Then you can turn compression back on and decide whether the performance hit
is worth it.
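For example, something like this (a sketch; ncid and varid stand for whatever
IDs you already have, and the helper name is made up):

#include <netcdf.h>

/* Sketch: while the file is in define mode, disable compression to
 * establish an uncompressed baseline. */
static void set_compression(int ncid, int varid)
{
    /* shuffle = 0, deflate = 0, level = 0: no compression */
    nc_def_var_deflate(ncid, varid, 0, 0, 0);

    /* Later, to try compression again (deflate level 1 is often a
     * reasonable starting point for large volumes):
     * nc_def_var_deflate(ncid, varid, 1, 1, 1); */
}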
Your chunksizes sound small. Try much bigger ones. Also, if you build the C
library with --enable-benchmarks, there is a program, nc_perf/bm_file.c, which
will rewrite any data file into one with different chunksizes, compression,
and other settings, and output a line of CSV timings. By running bm_file from
a script, you can try a variety of chunksizes and other settings and get a CSV
output file that you can load into Excel for easy graphing of the results.
If none of this helps, send me a copy of the file and I'll take a look...
Keep on netCDFing!
Ed Hartnett
On Mon, Dec 2, 2019 at 8:02 AM Amr, Mahmoud
<mahmoud.amr@xxxxxxxxxxxxxxxxx> wrote:
Dear netcdf community,
recently we switched from our "own" file format (data saved linearly in the
"primary" direction) to netCDF for storing 3D CT voxel data, in the hope of
improving performance when accessing the data from the other dimensions, for
example getting slices in the YZ view instead of XY. The data is far too large
for memory, so we load it slice by slice using nc_get_vara.
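In code, reading one YZ slice looks roughly like this (a minimal sketch,
assuming the variable is defined with its dimensions in X, Y, Z order; the
helper name is illustrative, nc_get_vara_ushort is the actual API call):

#include <netcdf.h>
#include <stddef.h>

/* Sketch: read the single YZ slice at position x from a 3D uint16
 * variable with dimensions (X, Y, Z). */
static int read_yz_slice(int ncid, int varid, size_t x,
                         size_t dimY, size_t dimZ, unsigned short *slice)
{
    size_t start[3] = { x, 0, 0 };        /* fix X, take all of Y and Z */
    size_t count[3] = { 1, dimY, dimZ };
    return nc_get_vara_ushort(ncid, varid, start, count, slice);
}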
In our recent attempts, using uint16 voxel data with example dimensions of
6000x6000x3000 and chunk sizes of 64x64x64, loading one slice into the chunk
cache took 5 seconds, and loading slices from the chunk cache (until the next
set of chunks had to be read) took 1 second per slice. The chunk cache is
parameterized to be large enough to hold "at least" enough chunks for one
slice. We are using Win10 systems with NVMe SSDs (~3200 MB/s read).
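We measure the per-slice times with a simple loop, roughly like this sketch
(the timing helper is illustrative):

#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sketch: time consecutive YZ slice reads going through the chunk
 * cache; clock() is coarse but fine for second-scale timings. */
static void time_slices(int ncid, int varid,
                        size_t dimX, size_t dimY, size_t dimZ)
{
    unsigned short *slice = malloc(dimY * dimZ * sizeof *slice);
    for (size_t x = 0; x < dimX; x++) {
        size_t start[3] = { x, 0, 0 };
        size_t count[3] = { 1, dimY, dimZ };
        clock_t t0 = clock();
        nc_get_vara_ushort(ncid, varid, start, count, slice);
        printf("slice %zu: %.3f s\n", x,
               (double)(clock() - t0) / CLOCKS_PER_SEC);
    }
    free(slice);
}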
This seems incredibly slow to me, especially when the data is already in the
chunk cache. It seems like the CPU utilization is not very good, and the disk
does nothing as long as the chunk cache is filled.
Is this the expected performance in your experience, or are we doing something
really wrong? We already tried different chunk sizes, and all of them gave us
even worse speeds. We are using the precompiled C library.
Thanks in advance