[netcdfgroup] Intel 15 and netCDF 4.3.2: Issues with Optimization?

To: <netcdfgroup@xxxxxxxxxxxxxxxx>
Subject: [netcdfgroup] Intel 15 and netCDF 4.3.2: Issues with Optimization?
From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]" <matthew.thompson@xxxxxxxx>
Date: Tue, 2 Sep 2014 13:42:17 -0400

All,

I was wondering if anyone out there has encountered issues with netCDF 4.3.2 and Intel 15.0.0.090 (just released) because I seem to have encountered one in our application where we throw an FPE writing a file.

To wit, I work on the GEOS-5 GCM and our Baselibs build things like HDF4, HDF5, netCDF, etc. for use with our code. Our normal build of netcdf in the Baselibs is usually just a simple one configured as:

netcdf.config : netcdf/configure
   @echo "Configuring netcdf $*"
   @(cd netcdf; \
          export PATH="$(prefix)/bin:$(PATH)" ;\
          export CPPFLAGS="$(CPPFLAGS) $(INC_SUPP)";\
          export LIBS="-L$(prefix)/lib -lmfhdf -ldf -lsz -ljpeg $(LINK_GPFS) $(LIB_CURL) -lm" ;\
          ./configure --prefix=$(prefix) \
                      --includedir=$(prefix)/include/netcdf \
                      --enable-hdf4 \
                      --enable-dap \
                      $(NC_PAR_TESTS) \
                      --disable-shared \
                      --enable-netcdf-4 \
                      CC=$(NC_CC) FC=$(NC_FC) CXX=$(NC_CXX) F77=$(NC_F77) )

In this case, since we build for Parallel HDF5, that means our CC=mpicc, FC=mpif90, etc. I built two versions, both with Intel 15.0.0.090, using MVAPICH2 2.0 and Intel MPI 5.0.1.035, and both show the issue.

I did a "make check" with my two netcdf builds and they both passed most of the tests (some dap tests fail, I think, because I'm on a compute node with no outside internet access) so it must not be a simple failure.

So, my first thought was, well, let's add '-g -O0' and rebuild the library and get to the bottom of this, and, of course, the code runs just fine now! So, my guess is that it has something to do with the optimizer.

Then, I built the library explicitly with "-g -O" and I get the same FPE as before, so it seems as if the optimizer has done...something. Totalview shows that when we go to write an output NC4 file we get an FPE and the stack trace leads to var_create_dataset[1]:

var_create_dataset, FP=7fff42867cb0
write_var,       FP=7fff42867d70
nc4_rec_write_metadata, FP=7fff42867de0
nc4_enddef_netcdf4_file, FP=7fff42867e00
NC4__enddef,     FP=7fff42867e20
nc_enddef,       FP=7fff42867e40
ncendef,         FP=7fff42867e50
ncendf_,         FP=7fff42867e60
cfio_create_,    FP=7fff4286a900
esmf_cfiosdffilecreate, FP=7fff4286b470
esmf_cfiofilecreate, FP=7fff4286b4c0


and points to lines 1453-1454 of libsrc4/nc4hdf.c:

1449                         /* Unlimited dim always gets chunksize of 1. */
1450                         if (dim->unlimited)
1451                            chunksize[d] = 1;
1452                         else
1453                            chunksize[d] = pow((double)DEFAULT_CHUNK_SIZE/type_size,
1454                                               1/(double)(var->ndims - unlimdim));
1455

In Totalview, I see that "type_size" is said to be "0" which, of course, will do bad things and might be causing the FPE. Since type_size is determined from things within var, who knows if a struct is clobbered or what.

Has anyone else seen this? I suppose for now I can just point to the debug-netcdf build so I can continue developing/testing with Intel 15, though I don't know what the performance cost of running netCDF at -O0 is.


Thanks,
Matt

[1] Yes, that does indeed say ncendf because this code has been around a while in our model and no one has wanted to translate all the ancient netcdf calls to actual modern ones for fear of breaking something crucial. But, in the end, it's still making the call it needs to.



--
Matt Thompson          SSAI, Sr Software Test Engr
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712              Fax: 301-614-6246
