"John Urbanic" <urbanic@xxxxxxx> writes:
> NetCDF gurus:
>
>
>
> After successfully prototyping our parallel netcdf code, we have rolled it
> into a large community app (MFIX) and are now getting sporadic "NetCDF:
> HDF error" errors during runs. This, unsurprisingly, coincides with
> failure to write portions of related variable fields.
>
>
>
> These happen during put_vars() and occur across all PEs at that random
> time, and also during only one associated PE's subsequent close(). In
> one of the smallest cases, we are writing ~100, 600K files. This problem
> will strike every 15 or 20 files, and will vary both in the file and the
> fields that are affected. With larger files it occurs more frequently -
> almost every other file with the 300MB files we need for production.
> Again, it occurs in different fields and files within runs and from run to
> run. We are using netcdf 4.1.3 and hdf 1.8.7.
>
>
>
> My question is, how can I possibly drill further into this problem? I am
> at a loss as to how to proceed. It would be nice to force HDF to be more
> specific, of course, but all debugging suggestions are most welcome.
If you build netCDF with --enable-logging, then put the following in
your code:
nc_set_log_level(3);
(There is also a Fortran version.)
You will then get a ton of output. Try changing the "3" to a "1" to
get less output, or to a "5" to get more.
If this doesn't work, fire up the parallel debugger and see where HDF5
and netCDF are failing to get along...
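Since the failure moves between files, fields, and PEs, it may also help to record exactly which call returned the error on which rank, next to the log output. The following is only a rough sketch: format_failure() is a hypothetical helper, not part of netCDF, and the usage comment assumes your code already has the rank from MPI_Comm_rank() and passes nc_strerror(status) as the reason string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a netCDF function): builds a one-line report
 * for a failed call so each rank can say which call failed and why.
 * Returns 0 and an empty string when status is 0 (NC_NOERR), else 1. */
static int format_failure(char *buf, size_t len, int rank, const char *call,
                          int status, const char *reason)
{
    assert(buf != NULL && len > 0);
    if (status == 0) {
        buf[0] = '\0';
        return 0;
    }
    snprintf(buf, len, "rank %d: %s failed with status %d (%s)",
             rank, call, status, reason);
    return 1;
}

/* Usage sketch inside the parallel program (assumes netcdf.h and mpi.h,
 * and variables ncid, varid, start, count, stride, data, rank):
 *
 *   char msg[256];
 *   nc_set_log_level(3);
 *   int status = nc_put_vars_double(ncid, varid, start, count, stride, data);
 *   if (format_failure(msg, sizeof msg, rank, "nc_put_vars_double",
 *                      status, nc_strerror(status)))
 *       fprintf(stderr, "%s\n", msg);
 */
```

Tagging each message with the rank and the call name makes it easier to line up the failing put_vars() with the corresponding part of the log.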
Good luck,
Ed
--
Ed Hartnett -- ed@xxxxxxxxxxxxxxxx