There used to be a problem with netcdf4 and openmpi (1.4.x) where netcdf4
assumed mpich's behavior in setting MPI_ERR_COMM (or some other error value
to which mpich, improperly, assigned a fixed value but openmpi did not).
That got fixed, but perhaps the problem has come back? Did you try openmpi
1.4.x?
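
If memory serves, the trap was along these lines (a hypothetical sketch,
not the actual netCDF code): comparing the raw error code returned by an
MPI call directly against MPI_ERR_COMM only works if the implementation
happens to hand back the class value itself; the standard only guarantees
the comparison after the code has been run through MPI_Error_class.
Something like:

/* Sketch only: illustrates the error-class pitfall, not netCDF internals. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm dup_comm;
    int ret, errclass;

    MPI_Init(&argc, &argv);
    /* Return error codes instead of aborting so they can be inspected. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    ret = MPI_Comm_dup(MPI_COMM_WORLD, &dup_comm);
    if (ret != MPI_SUCCESS) {
        /* Non-portable: "if (ret == MPI_ERR_COMM)" assumes the returned
         * code is the class value itself, which the standard does not
         * guarantee.  Portable: map the code to its class first. */
        MPI_Error_class(ret, &errclass);
        if (errclass == MPI_ERR_COMM)
            fprintf(stderr, "invalid communicator\n");
    } else {
        MPI_Comm_free(&dup_comm);
    }

    MPI_Finalize();
    return 0;
}

Running that against both MPI stacks would at least show whether a bare
MPI_Comm_dup misbehaves independently of netCDF and HDF5.
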
-- Ted
On Feb 8, 2012, at 6:03 PM, Orion Poplawski wrote:
> I'm trying to build parallel-enabled netcdf 4.1.3 on Fedora 16 with hdf5
> 1.8.7 and with both mpich2 1.4.1p1 and openmpi 1.5.4. When running make
> check with the openmpi build I get:
>
> $ mpiexec -n 4 ./f90tst_parallel
> [orca.cora.nwra.com:32630] *** An error occurred in MPI_Comm_dup
> [orca.cora.nwra.com:32630] *** on communicator MPI_COMM_WORLD
> [orca.cora.nwra.com:32630] *** MPI_ERR_COMM: invalid communicator
> [orca.cora.nwra.com:32630] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> HDF5: infinite loop closing library
> D,T,AC,FD,P,FD,P,FD,P,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FDFD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,D,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,F,FD,FD,FD,FD,FD,FD,FD,FD,FD
>
> *** Testing netCDF-4 parallel I/O from Fortran 90.
> HDF5: infinite loop closing library
> D,T,AC,FD,P,FD,P,FD,P,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FDFD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,D,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,F,FD,FD,FD,FD,FD,FD,FD,FD,FD
> HDF5: infinite loop closing library
> D,T,AC,FD,P,FD,P,FD,P,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FDFD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,D,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,F,FD,FD,FD,FD,FD,FD,FD,FD,FD
> ------------------------------------------------------------------------
> mpiexec has exited due to process rank 2 with PID 32631 on
> node orca.cora.nwra.com exiting improperly. There are two reasons this could
> occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> ------------------------------------------------------------------------
> [orca.cora.nwra.com:32628] 3 more processes have sent help message
> help-mpi-errors.txt / mpi_errors_are_fatal
> [orca.cora.nwra.com:32628] Set MCA parameter "orte_base_help_aggregate" to 0
> to see all help / error messages
>
>
> It appears to work fine with mpich2. Has anyone else come across this?
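>
> For what it's worth, a minimal standalone C test along the following lines
> (my own sketch using nc_create_par directly, not part of the netCDF test
> suite) might help show whether the failure is already in the initial
> parallel create or only in the Fortran wrapper:
>
> /* Minimal parallel netCDF-4 create/close; the file name is arbitrary. */
> #include <mpi.h>
> #include <netcdf.h>
> #include <netcdf_par.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>     int ncid, ret;
>
>     MPI_Init(&argc, &argv);
>     /* nc_create_par is only available when netCDF was built against a
>      * parallel HDF5 with the MPI compilers. */
>     ret = nc_create_par("tst_par.nc", NC_NETCDF4 | NC_MPIIO,
>                         MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);
>     if (ret != NC_NOERR)
>         fprintf(stderr, "nc_create_par: %s\n", nc_strerror(ret));
>     else
>         nc_close(ncid);
>     MPI_Finalize();
>     return 0;
> }
>
> (compiled with mpicc and linked against the parallel netcdf and hdf5
> libraries)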
>
> Thanks,
>
> Orion
>
> --
> Orion Poplawski
> Technical Manager 303-415-9701 x222
> NWRA, Boulder Office FAX: 303-415-9702
> 3380 Mitchell Lane orion@xxxxxxxxxxxxx
> Boulder, CO 80301 http://www.cora.nwra.com
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/