On Tue, 2016-02-23 at 17:13 +0000, Sean Byland wrote:
> Hello,
> I’m not particularly knowledgeable about NetCDF, but I know that it can do
> parallel I/O via either parallel HDF5 or ANL’s/NU's pNetCDF. What would be
> the pros and cons of each configuration?
>
The HDF5 backend ("new netcdf") allows for some nice features: VLEN
arrays, compression, and multiple NC_UNLIMITED dimensions. Those
features come at the cost of some additional metadata overhead.
ANL/Northwestern (thank you for mentioning both institutions!) pnetcdf
implements the much simpler classic NetCDF format (CDF-1, CDF-2 and
CDF-5), and takes advantage of that format's older, more restrictive constraints.
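Since pnetcdf only handles the classic formats (CDF-1, CDF-2, CDF-5) while HDF5-backed "new netcdf" files are HDF5 files underneath, it can help to check which kind of file you actually have. Below is a minimal sketch that guesses the format from a file's leading bytes. It assumes the classic magic number is the bytes "CDF" followed by a version byte of 1, 2, or 5, and that any HDF5 signature sits at offset 0 (HDF5 also permits a user block that shifts the signature to offset 512, 1024, ..., which this sketch does not handle). The function names are hypothetical helpers, not part of either library.

```python
import sys

# 8-byte HDF5 format signature. Assumes no HDF5 user block, so the
# signature is expected at offset 0.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Classic NetCDF version byte that follows the b"CDF" magic.
CLASSIC_VERSIONS = {
    1: "CDF-1 (classic)",
    2: "CDF-2 (64-bit offset)",
    5: "CDF-5 (64-bit data)",
}


def sniff_netcdf_format(header: bytes) -> str:
    """Guess the on-disk format from the first bytes of a file."""
    if header.startswith(HDF5_SIGNATURE):
        # HDF5-backed netCDF-4 file: pnetcdf cannot open it.
        return "HDF5 (netCDF-4)"
    if len(header) >= 4 and header[:3] == b"CDF":
        version = header[3]  # indexing bytes yields an int
        if version in CLASSIC_VERSIONS:
            return CLASSIC_VERSIONS[version]
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_netcdf_format(f.read(8))


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, sniff_file(p))
```

Running it as `python sniff.py a.nc b.nc` prints one guess per file. Any "CDF-n" result is a file that either library should be able to read (given a build recent enough to support CDF-5); an "HDF5 (netCDF-4)" result needs the HDF5 backend.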
If you have very large datasets, you're unlikely to see much difference
between the two approaches, as data movement costs will dominate.
One could construct datasets impossible to implement in ANL/NU pnetcdf,
and one could likewise construct pathological datasets (e.g. a thousand
datasets, each with 4k of data in them) that would perform exceptionally
poorly under Unidata NetCDF.
Here's a fun game you can play: let's say you've got a representative
benchmark that shows Unidata NetCDF outperforming ANL/Northwestern
pnetcdf. Wei-keng and I will defend our professional pride and tune the
heck out of pnetcdf to meet or beat our good-natured competitor.
Likewise, Ward and team would do the same if the results were reversed.
You get decades' worth of experience looking at your workload for
free!
=rob
> Thanks,
> Sean
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/