On 05/07/2014 09:39 AM, Kent Yang wrote:
> There should be a paper that lists the FLASH benchmark comparison between
> parallel netCDF from Northwestern (i.e. parallel netCDF-3) and parallel HDF5.
> However, it is an unfair comparison: it used collective I/O for parallel
> netCDF-3 but independent I/O for parallel HDF5. You can find more details
> about a fair comparison, using collective I/O for both packages, at
> http://www.spscicomp.org/ScicomP12/Presentations/User/Yang.pdf
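To make the collective-versus-independent point concrete, here is a deliberately simplified Python model of the aggregation idea behind collective I/O (ROMIO's two-phase I/O is the classic example). It is not how MPI-IO, parallel-netcdf, or HDF5 actually schedule requests; the function names and the interleaved block layout are hypothetical. Each rank owns every `nprocs`-th block of a 1-D array, so independent I/O issues one small request per block, while pooling and coalescing the extents across ranks yields a few large contiguous requests.

```python
# Hypothetical model: rank r owns blocks r, r + nprocs, r + 2*nprocs, ...
# so its pieces are interleaved with every other rank's pieces in the file.

def rank_extents(rank, nprocs, nblocks, block_bytes):
    """File extents (offset, length) that one rank writes."""
    return [(b * block_bytes, block_bytes)
            for b in range(rank, nblocks, nprocs)]


def independent_requests(nprocs, nblocks, block_bytes):
    """Independent I/O: every rank issues one request per extent it owns."""
    reqs = []
    for rank in range(nprocs):
        reqs.extend(rank_extents(rank, nprocs, nblocks, block_bytes))
    return reqs


def collective_requests(nprocs, nblocks, block_bytes):
    """Collective I/O (simplified): pool all ranks' extents, sort them by
    offset, and coalesce adjacent extents into contiguous requests."""
    merged = []
    for off, length in sorted(independent_requests(nprocs, nblocks, block_bytes)):
        if merged and merged[-1][0] + merged[-1][1] == off:
            start, prev_len = merged[-1]
            merged[-1] = (start, prev_len + length)
        else:
            merged.append((off, length))
    return merged


print(len(independent_requests(4, 16, 1024)))  # 16 separate requests
print(collective_requests(4, 16, 1024))        # [(0, 16384)]
```

With 4 ranks and 16 blocks of 1024 bytes, the independent path issues 16 small requests while the coalesced path issues one. That gap in request count is why benchmarking one library with collective I/O and the other with independent I/O skews the comparison.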

netCDF keeps define mode and data mode separate. This restricts what the
user can do, but it also means that once you are out of define mode, the
metadata will not change.
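The define/data mode split can be sketched as a small state machine. `ToyDataset` below is hypothetical and is not the netCDF or parallel-netcdf API: the real calls are `nc_def_dim`, `nc_def_var` and `nc_enddef` in netCDF (`ncmpi_def_dim`, `ncmpi_def_var` and `ncmpi_enddef` in parallel-netcdf), they return error codes rather than raising exceptions, and both libraries also offer a redef call to re-enter define mode, which the toy omits. It only illustrates the rule: definitions are accepted before `enddef()`, data writes only after it.

```python
class ToyDataset:
    """Hypothetical netCDF-style handle that mimics only the define/data mode rules."""

    def __init__(self):
        self.define_mode = True  # a newly created file starts in define mode
        self.dims = {}           # dimension name -> length
        self.vars = {}           # variable name -> tuple of dimension names
        self.data = {}           # variable name -> written values

    def _require_define_mode(self):
        if not self.define_mode:
            raise RuntimeError("operation not allowed in data mode")

    def def_dim(self, name, length):
        self._require_define_mode()
        self.dims[name] = length

    def def_var(self, name, dim_names):
        self._require_define_mode()
        for d in dim_names:
            if d not in self.dims:
                raise KeyError(f"unknown dimension {d!r}")
        self.vars[name] = tuple(dim_names)

    def enddef(self):
        # Leave define mode: from here on the metadata is frozen.
        self._require_define_mode()
        self.define_mode = False

    def put(self, name, values):
        if self.define_mode:
            raise RuntimeError("operation not allowed in define mode")
        if name not in self.vars:
            raise KeyError(f"unknown variable {name!r}")
        self.data[name] = list(values)


ds = ToyDataset()
ds.def_dim("x", 3)
ds.def_var("v", ["x"])
ds.enddef()
ds.put("v", [1, 2, 3])  # allowed: we are in data mode now
try:
    ds.def_dim("y", 2)  # rejected: metadata is frozen after enddef()
except RuntimeError as err:
    print(err)  # prints "operation not allowed in data mode"
```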

HDF5's metadata bookkeeping means a write requires not only a bulk
data update but also an update to a bit of metadata. That's not a huge deal
if you are moving tons of data, but if you are working with many small
datasets, it can be a factor.
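A back-of-the-envelope model shows why a fixed per-write metadata cost matters for small datasets but hardly at all for bulk data. This is a toy model with made-up parameters, not a measurement of HDF5 (whose metadata cache and flushing behavior are more involved): it assumes each write costs a constant `metadata_s` for the metadata update plus `payload / bandwidth` for the data itself.

```python
def metadata_fraction(payload_bytes, bandwidth_bytes_per_s, metadata_s):
    """Share of one write's time spent on a fixed metadata update.

    Toy model (assumption): time = metadata_s + payload_bytes / bandwidth.
    """
    data_s = payload_bytes / bandwidth_bytes_per_s
    return metadata_s / (metadata_s + data_s)


# Hypothetical parameters: 1 GB/s bandwidth, 1 ms of metadata work per write.
bw = 1e9
meta = 1e-3
print(metadata_fraction(1e9, bw, meta))  # one 1 GB write: about 0.001
print(metadata_fraction(1e3, bw, meta))  # one 1 KB write: about 0.999
```

Under these assumed numbers the metadata update is noise for a gigabyte write but dominates a kilobyte write, which is the "many small datasets" case.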

> Be aware that this comparison is also a bit old. I don't know the current
> status of these two packages.

For most people and most workloads, the simple fact that either pnetcdf
or HDF5 is being used is great. I think both libraries have pain points
(like when parallel-netcdf reads or writes one of several record
variables) where performance can suffer.
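One likely reason record variables hurt: in the netCDF classic format, record variables are interleaved record by record (record 0 of every record variable, then record 1, and so on), so a single variable's data is strided through the file. The helper below computes those offsets under simplifying assumptions: `record_offsets` is a hypothetical name, the start of the record section (`begin_rec`) is a made-up value, and the per-record sizes are assumed to be already padded as the format requires.

```python
def record_offsets(begin_rec, per_record_sizes, var_index, nrecs):
    """File offsets of each record of one record variable, assuming a
    simplified netCDF classic layout.

    begin_rec: offset where the record section starts (hypothetical value)
    per_record_sizes: bytes per record for each record variable, in
        definition order (assumed already padded)
    var_index: which record variable to locate
    nrecs: number of records to list
    """
    recsize = sum(per_record_sizes)  # one full record spans all record variables
    var_start = begin_rec + sum(per_record_sizes[:var_index])
    return [var_start + r * recsize for r in range(nrecs)]


offsets = record_offsets(begin_rec=1024, per_record_sizes=[400, 400, 400],
                         var_index=1, nrecs=3)
print(offsets)  # [1424, 2624, 3824]
```

Reading variable 1 alone touches three 400-byte pieces separated by 800-byte gaps, a noncontiguous pattern that is generally harder to make fast than one large contiguous access.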
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA