I'm no expert (particularly not with HPC), but you could be seeing effects
from chunking. netCDF4/HDF5 stores arrays in chunks, which is what allows
multiple unlimited dimensions, compression, etc.
One result is that data generally comes off disk a whole chunk at a time,
so a read smaller than one chunk can still cost a full chunk. Performance
tends to peak when each read lines up with whole chunks.
Also, you want the chunking configuration to match your access patterns.
The default may not be well suited to your use case.
Search a bit for discussions of how to choose good chunk shapes.
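
To make the access-pattern point concrete, here is a rough sketch (plain
Python, no netCDF calls) that counts how many chunks a read overlaps. It
assumes every overlapped chunk is fetched in full and ignores the chunk
cache, so treat the numbers as a worst-case estimate rather than a model of
what the library really does. The function names and the (1, 256, 256)
chunk shape are made up for illustration. In netCDF4-python the real chunk
shape of a variable should be available from Variable.chunking().

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Count the chunks overlapped by a hyperslab read.

    start, count and chunk_shape are per-dimension tuples, in the same
    dimension order as the variable.
    """
    n = 1
    for s, c, k in zip(start, count, chunk_shape):
        first = s // k              # index of the first chunk along this dim
        last = (s + c - 1) // k     # index of the last chunk along this dim
        n *= last - first + 1
    return n


def read_amplification(start, count, chunk_shape):
    """Ratio of values fetched to values requested.

    Assumption: each overlapped chunk is read in full, with no caching.
    """
    requested = prod(count)
    fetched = chunks_touched(start, count, chunk_shape) * prod(chunk_shape)
    return fetched / requested


# Hypothetical variable with dimensions (time, y, x), chunked one map at a time.
chunk_shape = (1, 256, 256)

# One full map at a single time step: lines up exactly with one chunk.
print(read_amplification((0, 0, 0), (1, 256, 256), chunk_shape))

# A 100-step time series at a single grid point: one chunk per time step.
print(chunks_touched((0, 10, 10), (100, 1, 1), chunk_shape))
print(read_amplification((0, 10, 10), (100, 1, 1), chunk_shape))
```

With that layout the map read fetches only what it asked for, while the
time-series read pulls in a whole chunk for every value it wants. Chunking
the time dimension more deeply would reverse the trade-off.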
-CHB
On Oct 28, 2015, at 10:57 AM, Matthew Jones <M.Jones3@xxxxxxxxxxxxxxxxx>
wrote:
Hi
I am running some tests on an HPC cluster, altering the size of reads to
test the performance of the file system.
I am using Python, and for sequential reads that do not use netCDF4 the
read rate is pretty constant across different read sizes. However, when I
introduce the netCDF4 library, the smaller and larger reads see a dip in
performance, with a peak on the medium-sized reads (creating a hill-like
profile). The
peak in the netCDF4 performance is at about the same read rate as the
non-netCDF4 reads. The peak is at reads of about 1MB.
We think this could be to do with buffering somewhere in the NetCDF
library. Does anyone know of such buffering that we should be aware of?
Many thanks
Matt
----------------------------------------
Matthew Jones
PhD Student
Atmosphere, Oceans and Climate
Department of Meteorology,
University of Reading
Room 288, ESSC, Harry Pitt Building,
3 Earley Gate, Reading, RG6 6AL, UK
Ext: 5214
https://www.linkedin.com/pub/matthew-jones/8b/b81/25a
http://www.met.reading.ac.uk/users/users/1887
_______________________________________________
netcdfgroup mailing list
netcdfgroup@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/