On 08/19/2015 03:55 PM, Gerry Creager - NOAA Affiliate wrote:
I'll open a case to determine if Cray's MPI-IO library has this problem.
OK. Might not be any need to do so: David Knaak told me (via off-list
correspondence) that it was fixed in Cray MPI-IO much the same way I
fixed it in ROMIO.
==rob
gerry
On Wed, Aug 19, 2015 at 7:47 PM, Rob Latham <robl@xxxxxxxxxxx> wrote:
On 08/18/2015 02:31 PM, Ward Fisher wrote:
Hello all,
I just wanted to jump in and comment that this issue, recently reported
to us by David Knaak at Cray, is now handled in the netCDF-C development
branch on GitHub. This fix will be in the upcoming release candidate and
eventual final release of netCDF-C 4.4.0.
Regarding the question of short reads providing more warning: netCDF
specifically was already checking for short reads when ‘paging in’ data
from a file, but it assumed an error whenever one occurred (because
|errno| was non-zero). The fix shouldn’t incur any performance penalty.
The new thing I learned about “short reads” is that it is possible for
one to occur /without/ being the result of an error, but rather as the
result of an interrupt.
I found these short reads would happen in ROMIO when trying to read
2 GiB of data in one shot. Linux would give me back (2GiB-4k) worth
of data.
Today, most MPI-IO libraries should detect and retry this case.
Cray's MPI-IO library is closed source, so I don't know what they do.
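For reference, the (2GiB-4k) figure matches the per-call cap documented
in the Linux read(2) man page, 0x7ffff000 bytes. A small sketch of the
arithmetic in C, assuming 4 KiB pages; LINUX_MAX_RW and shortfall() are
illustrative names, not kernel or ROMIO identifiers:

```c
#include <stddef.h>

/* Largest transfer a single Linux read() performs, per read(2).
 * Assumption: 4 KiB pages, which gives 0x7ffff000 = 2 GiB - 4096. */
#define LINUX_MAX_RW ((size_t)0x7ffff000)

/* Bytes still missing after one maximal read() of `requested` bytes. */
size_t shortfall(size_t requested)
{
    return requested > LINUX_MAX_RW ? requested - LINUX_MAX_RW : 0;
}
```

For a 2 GiB request, shortfall() gives 4096, the 4k that was left unread.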
In general, since they are technically allowed, I think developers are
going to have to accommodate the possibility of short reads in their
software, one way or another. Developers should already be checking the
return value of |read()|, and when it is short, the fix is essentially:
1. Check to see if errno is |EINTR|
2. If so, perform some calculations and resume the read.
While that's strictly correct, I worry about short reads that for
whatever reason don't set errno to EINTR. So I would check how much data
was read. If it is less than requested, continue the read to fetch the
missing data. If that continued read returns 0, then you are at EOF
and you are done.
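
A minimal sketch of that loop in C, combining the EINTR check with the
continue-until-complete approach. read_fully is a hypothetical helper
name, not a netCDF-C or ROMIO function, and the actual fixes in those
libraries may differ:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* read_fully: hypothetical helper for illustration only.
 * Calls read() repeatedly until `count` bytes have arrived, EOF is
 * reached, or a real error occurs. Returns the number of bytes read
 * (less than `count` only at EOF), or -1 on error with errno set. */
ssize_t read_fully(int fd, void *buf, size_t count)
{
    assert(buf != NULL || count == 0);

    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n > 0) {
            done += (size_t)n;   /* full or short read: keep the data */
        } else if (n == 0) {
            break;               /* EOF: nothing more to fetch */
        } else if (errno == EINTR) {
            continue;            /* interrupted before any data: retry */
        } else {
            return -1;           /* genuine error reported by read() */
        }
    }
    return (ssize_t)done;
}
```

A caller compares the return value with the requested count: a smaller
non-negative value means EOF came first, and -1 means read() failed.
The sketch assumes count fits in ssize_t.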
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
--
Gerry Creager
NSSL/CIMMS
405.325.6371
++++++++++++++++++++++
“Big whorls have little whorls,
That feed on their velocity;
And little whorls have lesser whorls,
And so on to viscosity.”
Lewis Fry Richardson (1881-1953)