
[netCDF #PDZ-683250]: Short read are not managed?



Hi Laurent,

> Organization: CEA
> Package Version: 3.6.3 - 4.3.1.1
> Operating System: Linux
> Hardware: Lustre filesystem

> We are currently facing problems with short POSIX reads and writes on
> our Lustre filesystem. We had difficulty linking these problems to
> netCDF. We have users who get netCDF files where arrays are filled
> with zeroes. We finally found out that when netCDF gets a short read,
> it fills the rest of the buffer with zeroes without warning the user
> (posixio.c in pg_gin at line 319). When we look at the function pg_get
> (at line 494), we can see that the short read is not managed (or users
> are not warned that they get an incomplete array).
> 
> Is it possible to do something about that case?

A fix would be to just delete that memset call. It is not
considered an error to just read values remaining before
an end-of-file, in the case that the user asks for more 
data than is in the file being read. The OpenGroup IEEE 
Std 1003.1 says:

  If a read() is interrupted by a signal after it has 
  successfully read some data, it shall return the number 
  of bytes read. ...
  http://pubs.opengroup.org/onlinepubs/9699919799/

But why would one provide a memory buffer to a read call,
expecting values at the end of the buffer to be preserved 
if fewer than N values were read?

Does Open Group, ISO, or some other UNIX standard specify 
that behavior for either read(2) or fread(3), for example? 
If so, I can't find it ...
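
To make the question concrete, here is a minimal sketch of one way a
caller could handle short reads explicitly. It is not the code in
posixio.c; the helper name read_fully is hypothetical. It assumes that
a short read before end-of-file should simply be retried, that a read
interrupted by a signal (EINTR) should be retried as well, and that
the unread tail of the buffer is zero-filled only once end-of-file is
actually reached:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of netCDF.
 * Reads up to `count` bytes from `fd` into `buf`, retrying after
 * short reads and after EINTR. Returns the number of bytes actually
 * read (less than `count` only if end-of-file was reached), or -1 on
 * an I/O error with errno set by read().
 */
ssize_t read_fully(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t total = 0;

    assert(buf != NULL || count == 0);

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n > 0) {
            total += (size_t)n;   /* short read: keep asking for the rest */
        } else if (n == 0) {
            break;                /* end-of-file: no more data will come */
        } else if (errno != EINTR) {
            return -1;            /* real I/O error */
        }
        /* n < 0 with errno == EINTR: loop and retry */
    }

    /* Zero the unread tail so the caller never sees uninitialised bytes. */
    if (total < count)
        memset(p + total, 0, count - total);

    return (ssize_t)total;
}
```

A caller that compares the return value with the requested count can
tell a complete read from one that hit end-of-file early, and can warn
the user instead of silently handing back zeroes.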

Nevertheless, I just tested the current developers' snapshot
with the memset call commented out. A "make check" passes
all tests.

However, "make check" fails fairly early in the tests when 
"--enable-valgrind-tests" is specified to the configure 
script:

  valgrind -q --error-exitcode=2 --leak-check=full ./tst_h_compounds:

  *** Checking HDF5 compound types.
  *** Checking simple HDF5 compound types...ok.
  *** Checking HDF5 compound types and groups...ok.
  *** Checking HDF5 compound type which contains an array...ok.
  *** Checking HDF5 compound type 6 different types...ok.
  *** Checking HDF5 compound variable which contains a compound type...ok.
  ==26894== Syscall param write(buf) points to uninitialised byte(s)
  ==26894==    at 0x313A6D34F0: __write_nocancel (in /lib64/libc-2.13.so)
   ...
  *** Checking HDF5 variable which contains a compound type with different string handling...ok.
  ==26894== Syscall param write(buf) points to uninitialised byte(s)
  ==26894==    at 0x313A6D34F0: __write_nocancel (in /lib64/libc-2.13.so)
   ...
  *** Tests successful!
  FAIL: run_valgrind_tests.sh
  ================================================
  1 of 27 tests failed

It's not clear to me that the behavior you have pointed to 
is a bug. Can you describe a use case where a user would 
need the values in a read buffer to be preserved, in case 
of a short read?

If you don't care about valgrind memory-usage warnings, just
commenting out that memset call would be a workaround ...

In any case, thanks for pointing out that behavior. It looks like 
we should document what happens when an EOF is encountered during
a netCDF read call, but I don't think that should happen unless 
the netCDF file has been truncated or corrupted by other software.
If you have a small program that demonstrates otherwise, please
send it to us!

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: PDZ-683250
Department: Support netCDF
Priority: Normal
Status: Closed