
[netCDF #KZJ-320086]: Short read are not managed?



Hi David,

I’ve just merged a fix for this issue into the master branch of our git
repository at http://github.com/Unidata/netcdf-c. It will be included in an
upcoming release candidate and, in hopefully short order, the 4.4.0
release. Hopefully this will fix the issue, but if it does not, please let
us know. I’m still working on a test that we can integrate into our unit
and regression tests for this issue.

Have a great day,

-Ward

address@hidden> wrote:

New Client Reply: Short read are not managed?
>
> Dear Hans-Juergen,
>
> Thank you for your cooperation.  I will let UCAR know that our customer
> is willing to help test this and I will coordinate this with Michael and
> Stefan.
>
> Enjoy your vacation.
>
> David
>
> On Fri, Jul 17, 2015 at 08:38:01AM +0200, Panitz, Hans-Juergen  (IMK)
> wrote:
> > Dear David,
> >
> > As the customer you mentioned, I am of course willing to test whether the
> > problem is solved by the fix in a new release of the NetCDF library.
> > But I will be out of the office soon, until nearly the end of August.
> > However, some weeks ago I provided your Cray colleague Michael Neff at
> > HLRS, Stuttgart, with a test suite.
> > So if the new NetCDF release is published in the meantime, he
> > could start with the test.
> > In any case, I will contact him after my return.
> >
> > Best regards
> >
> > Hans-Juergen
> >
> >
> > On 16.07.2015 at 20:10, David Knaak wrote:
> > > Hi Ward,
> > >
> > > I agree that it will be difficult to test, since the condition is hard
> > > to reproduce.  I think that our customer would be more than willing to
> > > help test it.  I'll pursue that and let you know.
> > >
> > > David
> > >
> > > On Thu, Jul 16, 2015 at 11:59:30AM -0600, Unidata netCDF Support wrote:
> > >> Hello David,
> > >>
> > >> Thank you for the comprehensive description of the issue, and the
> > >> proposed solution!  After consulting with Russ, I have created a ticket
> > >> for this in our JIRA system,
> > >> https://bugtracking.unidata.ucar.edu/browse/NCF-337, and am going to try
> > >> to integrate the fix before the next netCDF release.  We're currently
> > >> preparing for our annual Python workshop, being held next week, but I
> > >> will be able to turn my attention to this shortly thereafter.
> > >>
> > >> The fix seems pretty straightforward; the only confounding issue will
> > >> be how to test for it, since the issue seems difficult to trigger; I'm
> > >> sure I can come up with something.  We also don't have access to Cray
> > >> hardware or a Lustre file system, but as you point out, this is not
> > >> limited to that environment.
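Since the condition is hard to trigger on a real file system, one way to exercise a retry loop deterministically is to substitute a reader that never returns more than a few bytes per call. The sketch below assumes the read path can be routed through a function pointer; `reader_fn`, `capped_read`, `read_all_with`, `mem_src`, and `MAX_CHUNK` are hypothetical names for illustration, not netCDF APIs.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical reader interface: same contract as read(), but with an
   opaque context so a test can substitute its own data source. */
typedef ssize_t (*reader_fn)(void *ctx, void *buf, size_t n);

/* Hypothetical in-memory source used only for testing. */
struct mem_src {
    const char *data;
    size_t len;
    size_t pos;
};

/* Test reader that never returns more than MAX_CHUNK bytes per call,
   so every larger request turns into a series of short reads. */
#define MAX_CHUNK 3

static ssize_t capped_read(void *ctx, void *buf, size_t n)
{
    struct mem_src *src = ctx;
    size_t left = src->len - src->pos;
    if (n > left)
        n = left;
    if (n > MAX_CHUNK)
        n = MAX_CHUNK;
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return (ssize_t) n;             /* 0 once the source is exhausted */
}

/* Retry loop under test: keeps calling `rd` until `count` bytes arrive,
   the source is exhausted (0), or a non-EINTR error occurs (-1). */
static ssize_t read_all_with(reader_fn rd, void *ctx, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;
    assert(rd != NULL);
    while (done < count) {
        ssize_t n = rd(ctx, p + done, count - done);
        if (n > 0)
            done += (size_t) n;
        else if (n == 0)
            break;                  /* end of data */
        else if (errno != EINTR)
            return -1;              /* real error */
    }
    return (ssize_t) done;
}
```

With MAX_CHUNK set to 3, a 10-byte request takes four calls, so a loop that does not retry would stop after the first 3 bytes.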
> > >>
> > >> Thanks again for the comprehensive information!  Have a great day,
> > >>
> > >> -Ward
> > >>
> > >>
> > >>> Full Name: David Knaak
> > >>> Email Address: address@hidden
> > >>> Organization: Cray Inc.
> > >>> Package Version: 4.3.3.1
> > >>> Operating System:
> > >>> Hardware:
> > >>> Description of problem: This ticket is directly related to these
> > >>> tickets:
> > >>>
> > >>> 08 Apr 2015
> > >>> [netCDF #KZJ-320086]: Short read are not managed?
> > >>>
> > >>> http://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg13072.html
> > >>>
> > >>> 23 Mar 2015
> > >>> [netCDF #PDZ-683250]: Short read are not managed?
> > >>>
> > >>> http://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg13053.html
> > >>>
> > >>> Cray has analyzed the short read situation.  We believe we understand
> > >>> the problem and have a proposed fix for NetCDF.
> > >>>
> > >>> Due to a combination of factors, this short read issue has shown up on
> > >>> Cray systems with Lustre file systems.  But this issue is not limited to
> > >>> Cray systems, nor is it limited to Lustre file systems.  I will start
> > >>> with the specifics for Cray and Lustre and then generalize.
> > >>>
> > >>> The first factor is that a major change introduced with Lustre 2.5 has
> > >>> caused behavior that is legal by POSIX standards but is not the intended
> > >>> Lustre behavior.  A race condition can occur in Lustre that sometimes
> > >>> causes a read request to be only partially satisfied by a single read
> > >>> call.  This race condition is more likely to occur on large and very
> > >>> busy file systems, but it could occur on any Lustre 2.5 file system.
> > >>> Technically speaking, this is not a bug, because POSIX semantics allow
> > >>> it (see below).  But it is not the intended behavior of Lustre, and
> > >>> Lustre will be modified in a future release so that this does not
> > >>> happen.
> > >>>
> > >>> The second factor is that not all programs and libraries properly handle
> > >>> the case of a short POSIX read or POSIX write.  This is the case with
> > >>> UCAR NetCDF when the creation mode is NC_CLASSIC_MODEL.  It may also be
> > >>> the case in other libraries and many user programs that are not properly
> > >>> coded.
> > >>>
> > >>> In general, if a program calls read or write without checking the
> > >>> number of bytes actually transferred and retrying if necessary, then
> > >>> the program is exposed to the issue.  POSIX does not guarantee that a
> > >>> single read call will read all of the bytes requested or that a single
> > >>> write call will write all of the bytes requested.  Quoting from
> > >>> opengroup.org:
> > >>>
> > >>> http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
> > >>>
> > >>> Upon successful completion, where nbyte is greater than 0, read() shall
> > >>> mark for update the st_atime field of the file, and shall return the
> > >>> number of bytes read. This number shall never be greater than nbyte. The
> > >>> value returned may be less than nbyte if the number of bytes left in
> > >>> the file is less than nbyte, if the read() request was interrupted by a
> > >>> signal, or if the file is a pipe or FIFO or special file and has fewer
> > >>> than nbyte bytes immediately available for reading. For example, a
> > >>> read() from a file associated with a terminal may return one typed line
> > >>> of data.
> > >>>
> > >>> If a read() is interrupted by a signal before it reads any data, it
> > >>> shall return -1 with errno set to [EINTR].
> > >>>
> > >>> If a read() is interrupted by a signal after it has successfully read
> > >>> some data, it shall return the number of bytes read.
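The pipe case mentioned in the quoted text is easy to observe directly. A minimal sketch, assuming a POSIX system: five bytes are written into a pipe and read() is then asked for more; it returns what is already available rather than waiting for the full request, even though the write end is still open. `short_read_demo` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: writes `msg` into a fresh pipe, then asks read()
   for `want` bytes. Returns what read() returned, or -1 if the pipe
   could not be set up. */
static ssize_t short_read_demo(const char *msg, size_t want, char *out)
{
    int fds[2];
    size_t len = strlen(msg);
    assert(want > len);             /* deliberately ask for more than is there */
    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, len) != (ssize_t) len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    /* The write end is still open, so this is not end of file; read()
       simply returns the bytes already in the pipe: a short read. */
    ssize_t n = read(fds[0], out, want);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

A caller that assumed `n == want` here would treat the rest of its buffer as valid data, which is the same mistake the quoted posixio.c excerpt makes for regular files.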
> > >>>
> > >>> The issue for POSIX write is essentially the same.  See:
> > >>>
> > >>> http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
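The same retry pattern applies on the write side. A minimal sketch of a write loop, assuming POSIX write(); `write_full` is a hypothetical name, and treating a zero return as EIO is a design choice of this sketch, not something POSIX prescribes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `count` bytes
   are written. Returns 0 on success or an errno value on failure. */
static int write_full(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    assert(buf != NULL || count == 0);
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n > 0) {
            p += n;                 /* advance past the bytes written */
            count -= (size_t) n;
        } else if (n == -1 && errno == EINTR) {
            continue;               /* interrupted before writing anything */
        } else {
            /* n == 0 means no progress; this sketch reports it as EIO */
            return n == -1 ? errno : EIO;
        }
    }
    return 0;
}
```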
> > >>>
> > >>> So if a read returns some but not all bytes, read should be called
> > >>> again.  The code in libsrc/posixio.c shows that for the NC_CLASSIC_MODEL
> > >>> path, read is not called again if there is a short read:
> > >>>
> > >>>     errno = 0;
> > >>>     nread = read(nciop->fd, vp, extent);
> > >>>     if(nread != (ssize_t) extent)
> > >>>     {
> > >>>         status = errno;
> > >>>         if(nread == -1 || status != ENOERR)
> > >>>             return status;
> > >>>         /* else it's okay we read less than asked for */
> > >>>         (void) memset((char *)vp + nread, 0, (ssize_t)extent - nread);
> > >>>     }
> > >>>     *nreadp = nread;
> > >>>     *posp += nread;
> > >>>
> > >>>     return ENOERR;
> > >>>
> > >>> With this code, if the POSIX read does not read the full number of
> > >>> bytes, the read is not retried; instead, "memset" zeroes out the rest
> > >>> of the user's buffer even though there may still be more bytes in the
> > >>> file to read.  This is exactly the behavior that some of our users have
> > >>> experienced when using NetCDF.
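For comparison, here is one way the logic in the excerpt could retry before zero-filling, so that zeros land only past a genuine end of file. This is a sketch under stated assumptions, not necessarily the change that was merged into netCDF: `fill_extent` is a hypothetical name, it reads from the descriptor's current position as the excerpt does, and it returns 0 where the excerpt returns ENOERR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variant of the excerpt's logic: read up to `extent` bytes
   from the current position of `fd`, retrying short reads. Only when
   read() reports end of file (returns 0) is the rest of the buffer
   zero-filled. Returns 0 or an errno value; *nreadp gets the byte count. */
static int fill_extent(int fd, void *vp, size_t extent, size_t *nreadp)
{
    char *p = vp;
    size_t got = 0;
    assert(nreadp != NULL);
    while (got < extent) {
        ssize_t n = read(fd, p + got, extent - got);
        if (n > 0)
            got += (size_t) n;      /* short read: loop and read again */
        else if (n == 0)
            break;                  /* true end of file */
        else if (errno != EINTR)
            return errno;           /* real I/O error */
    }
    if (got < extent)
        memset(p + got, 0, extent - got);  /* zero only what lies past EOF */
    *nreadp = got;
    return 0;
}
```

The key difference from the excerpt is that a positive but short return value leads to another read() call; only a return of 0 ends the loop early.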
> > >>>
> > >>> With some local modifications to the NetCDF library and some test
> > >>> cases, Cray verified that the NC_CLASSIC_MODEL path does in fact pass
> > >>> through the above code.  But for creation mode NC_NETCDF4 it does not.
> > >>> For this mode, HDF5 I/O is called, and HDF5 I/O properly handles short
> > >>> reads.
> > >>>
> > >>> Since a short read can potentially happen on any POSIX-compliant file
> > >>> system, code calling read should handle this possibility with something
> > >>> like the following:
> > >>>
> > >>> #include <errno.h>
> > >>> #include <unistd.h>
> > >>>
> > >>> /* fd is the file descriptor */
> > >>> /* buf is the initial address of the user buffer */
> > >>> /* request_count is the initial number of bytes requested */
> > >>> char *p = buf;
> > >>> size_t read_count;
> > >>> ssize_t nread;            /* signed: read() returns -1 on error */
> > >>> size_t bytes_xfered = 0;
> > >>>
> > >>> do {
> > >>>     read_count = request_count - bytes_xfered;
> > >>>     nread = read(fd, p, read_count);
> > >>>     if (nread > 0) {
> > >>>         bytes_xfered += (size_t) nread;
> > >>>         p += nread;
> > >>>     }
> > >>> } while ((nread > 0 && bytes_xfered < request_count) ||
> > >>>          (nread == -1 && errno == EINTR));
> > >>> /* stops when complete, at end of file (0), or on a non-EINTR error */
> > >>>
> > >>> Other examples of this method of reading again can be seen in the HDF5
> > >>> source (for HDF5 I/O) and in the ANL MPICH2 source (for MPI I/O).
> > >>>
> > >>> After analyzing the issues, we provided one of our users who was seeing
> > >>> the issue with a wrapper routine for the POSIX read call.  This wrapper
> > >>> reads again when necessary, as shown above.  With the wrapper, the user
> > >>> no longer had any failures, verifying both the path and the fix.
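The wrapper itself was not posted, but a wrapper of that kind might look like the following sketch. It assumes a POSIX system and keeps read()'s signature so it can replace calls one for one; the name `read_full` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical drop-in wrapper with the same signature as read():
   it keeps reading until `count` bytes arrive, end of file is reached,
   or a non-EINTR error occurs. Like read(), it returns the number of
   bytes transferred, or -1 with errno set if nothing could be read. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;
    assert(buf != NULL || count == 0);
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n > 0) {
            done += (size_t) n;
        } else if (n == 0) {
            break;                  /* end of file */
        } else if (errno != EINTR) {
            /* Report the error only if no data was read yet;
               otherwise return the partial count, as read() would. */
            return done > 0 ? (ssize_t) done : -1;
        }
    }
    return (ssize_t) done;
}
```

Keeping read()'s return convention means existing callers that compare the result against the requested count keep working; they simply stop seeing short counts except at end of file or after an error that follows partial data.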
> > >>>
> > >>> As stated at the beginning, this issue is not unique to Cray systems or
> > >>> to Lustre file systems.  Lustre will eventually be modified so that it
> > >>> behaves as intended; that is, Lustre will do the additional reads so
> > >>> that a POSIX read or POSIX write of a Lustre file never comes up short.
> > >>> But that doesn't remove the responsibility of program developers and
> > >>> library developers to handle the short read and short write cases.
> > >>> Other file systems may exhibit the short read or short write behavior.
> > >>>
> > >>> We are informing our customers of the issue and encouraging them to
> > >>> correct their own calls to POSIX read and write if necessary.  Cray does
> > >>> not intend to provide our customers with a locally modified NetCDF
> > >>> library.  We leave it to UCAR to provide the appropriate fixes for
> > >>> NetCDF.  When UCAR applies an appropriate fix and releases the new
> > >>> version, Cray will build it for our systems and release it to our
> > >>> customers.
> > >>>
> > >>> Please contact me with any questions, comments, or concerns.
> > >>>
> > >>> David Knaak
> > >>>
> > >>>
> > >>>
> > >> Ticket Details
> > >> ===================
> > >> Ticket ID: KZJ-320086
> > >> Department: Support netCDF
> > >> Priority: High
> > >> Status: Closed
> > >>
> >
> > --
> > Karlsruher Institut für Technologie (KIT)
> > Institut für Meteorologie und Klimaforschung
> > Bereich Troposphäre (IMK-TRO)
> >
> > Dr. Hans-Jürgen Panitz
> >
> > Hermann-von-Helmholtz-Platz 1
> > D-76344 Eggenstein-Leopoldshafen
> >
> > Phone: xx49-(0)721-608 22802
> > Fax  : xx49-(0)721-608 24377
> > E-Mail: address@hidden
> >
> > www.kit.edu
> > www-fzk.imk.uni-karlsruhe.de
> >
> > KIT - University of the State of Baden-Württemberg and
> > National Research Center of the Helmholtz Association
> >
>
> --
>
>
>
> Ticket Details
> ===================
> Ticket ID: KZJ-320086
> Department: Support netCDF
> Priority: High
> Status: Open
> Link:
> https://andy.unidata.ucar.edu/esupport/staff/index.php?_m=tickets&_a=viewticket&ticketid=25408
>


