Hi,
I thought I'd give netcdf 4.7.4 a try for the compression in parallel IO (using
hdf5 1.10.7, pnetcdf 1.9.0, netcdf-fortran 4.5.3) on a NOAA cluster. I've been
using Intel 19 with mvapich2.3, which worked fine with earlier netcdf versions
(4.3.something). The problem is that it works fine on a single node, but I
get various failures when trying to run a job that uses 2 or more nodes. It
also fails if the IO is not parallel (standard netcdf-4 where each process
writes its data in turn).
I have also compiled everything (including my cloud model code) using Intel MPI,
which fails promptly with a segfault when it tries to run on 2 nodes. (Here, I
am comparing 4 or 9 MPI processes on a single node with 16 processes split
across 2 nodes. If I force the 16-process version to run on a single node, it
runs fine.)
The problem seems to be reproducible with a simple write/read test adapted from
ftst_parallel.F, so it does not seem to be specific to my model code. It fails
with both pnetcdf and mpiio.
Any ideas what could be the issue here? I am stumped.
-- Ted