
[netCDF #KLB-596506]: apparent bug in netcdf-4.2



Jim,

> I have created a test program that appears to demonstrate a serious bug in
> netcdf-4.2.   The program opens an existing netcdf file, reads a variable
> and closes the file.
> It prints the min and max val and sum of the variable.
> 
> Then the program opens the file to update, goes into define mode using
> nf_redef, leaves define mode
> and closes the file.
> 
> Then I reopen the file and read and print the same variable as above, the
> values of min, max and sum should be the same as
> before, but they are not - indicating that the file has been corrupted.
> 
> The input file used in the test is rather large 1.4GB so I have put the
> testcase on yellowstone in directory
> /glade/p/work/jedwards/nfbug
> I've made a few attempts to reduce the size of the file, but this causes
> the error to go away.
> 
> I have also put the test in the Makefile so you only need to run gmake to
> execute it.

I created a version of your code that just uses the C API and the bug
still occurs, so it's not in the Fortran API.

I converted your original 64-bit-offset format file to a classic model
format file, and the bug does not occur, so apparently it's in the code
implementing the 64-bit offset format, first introduced in December 2004,
with version 3.6.0.
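
To tell which on-disk format a given file uses before testing it, the
first four bytes are enough: classic files begin with 'C' 'D' 'F' 0x01
and 64-bit offset files with 'C' 'D' 'F' 0x02, while netCDF-4 files
normally begin with the HDF5 signature. The sketch below is only an
illustration of that check; nc_magic_kind and nc_file_kind are
hypothetical helper names, not functions from the netCDF library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a netCDF library call): classify a file by
 * its first four bytes and return a short description. */
const char *nc_magic_kind(const unsigned char magic[4])
{
    if (memcmp(magic, "CDF", 3) == 0) {
        switch (magic[3]) {
        case 1:  return "classic";
        case 2:  return "64-bit offset";
        default: return "unknown CDF variant";
        }
    }
    /* HDF5 signature at offset 0 starts with 0x89 'H' 'D' 'F'. */
    if (memcmp(magic, "\211HDF", 4) == 0)
        return "netCDF-4/HDF5";
    return "not netCDF";
}

/* Hypothetical helper: read the first four bytes of the file at path
 * and classify them. Returns NULL if the file cannot be read. */
const char *nc_file_kind(const char *path)
{
    unsigned char magic[4];
    size_t nread;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;
    nread = fread(magic, 1, sizeof magic, fp);
    fclose(fp);
    if (nread != sizeof magic)
        return NULL;
    return nc_magic_kind(magic);
}
```

Calling nc_file_kind on the original file and on the converted copy
should report "64-bit offset" and "classic" respectively, assuming the
conversion produced a classic-format file.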

Surprisingly, the bug occurs in netCDF version 3.6.0 and every subsequent
version, up to and including the current 4.3 release candidate, so it's
been there for over 8 years.  You're the first to report it, but I hope it
hasn't silently corrupted 64-bit offset files for other users who didn't
notice.  Perhaps the sequence of open, redef, enddef, close calls with no
changes to the header between the redef and enddef calls is an uncommon
enough pattern that the problem is rare, but it certainly is a major bug,
since it occurs with no indication that the file is corrupted until the
wrong values are later read.

Unfortunately, the valgrind tool is of no help, as it doesn't indicate any
detectable memory access problems.  I'm digging into it with the gdb
debugger to try to understand the bug and fix it.  I've created a Jira
ticket if you want to follow progress:

  http://bugtracking.unidata.ucar.edu/browse/NCF-234

If you see any other conditions under which the bug occurs, I'd be very
interested.

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: KLB-596506
Department: Support netCDF
Priority: Normal
Status: Closed