
[netCDF #GQG-203630]: Problem in saving netcdf files



> Thanks a lot for your reply. I will ask the Hector help desk to check
> this. Actually, my model was running fine before the Hector system was
> updated recently. My problem is very similar to the one shown in the link
> you provided. By the way, where can I find "globaldefs.h" or "def_var.F"?

These are not files in the netCDF source distribution, so I suspect they are
part of the ROMS code that calls netCDF.  They probably aren't directly
relevant to your CESM run (unless CESM uses ROMS), but there may be a similar
problem in the CESM code.
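
As a rough illustration of the overflow described in the earlier reply quoted
below, here is a minimal sketch in C of a range check that could be run on a
buffer of doubles before it is written to a variable declared as a 32-bit
float.  The function names fits_in_float and count_float_overflows are
hypothetical and are not part of netCDF or CESM, and treating NaN as "does not
fit" is an assumption of this sketch, not necessarily what the library does.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper (not a netCDF function): returns 1 if v lies within
 * the finite range of a 32-bit float, 0 otherwise.  NaN fails both
 * comparisons, so this sketch reports NaN as not fitting (an assumption). */
static int fits_in_float(double v)
{
    return v >= -(double)FLT_MAX && v <= (double)FLT_MAX;
}

/* Hypothetical helper: counts the elements of values[0..n-1] that would
 * overflow if converted from double to float. */
static size_t count_float_overflows(const double *values, size_t n)
{
    size_t bad = 0;

    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!fits_in_float(values[i])) {
            bad++;
        }
    }
    return bad;
}
```

Calling count_float_overflows on a field just before it is handed to the
write routine, and printing the result when it is nonzero, may help narrow
down which field holds the out-of-range values.  Note that
9.9692099683868690e+36 itself is within float range, so only larger
magnitudes would be flagged.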

--Russ

> ________________________________________
> From: Unidata netCDF Support [address@hidden]
> Sent: 13 November 2011 19:29
> To: Wuhu Feng
> Cc: address@hidden; Wuhu Feng
> Subject: [netCDF #GQG-203630]: Problem in saving netcdf files
> 
> Hi Wuhu,
> 
> > Recently I have had a problem running the NCAR CESM (cesm1_0_3) model.
> > It seems that the problem only happens when the model is saving the
> > restart files.
> >
> > The modules I am using on the UK Hector supercomputer
> > (http://www.hector.ac.uk/) are:
> >
> > 1) modules/3.2.6.6
> > 2) nodestat/2.2-1.0400.29866.4.3.gem
> > 3) sdb/1.0-1.0400.30000.6.18.gem
> > 4) MySQL/5.0.64-1.0000.4667.20.1
> > 5) lustre-cray_gem_s/1.8.4_2.6.32.45_0.3.2_1.0400.6221.1.1-1.0400.30252.1.29
> > 6) udreg/2.3.1-1.0400.3911.5.6.gem
> > 7) ugni/2.3-1.0400.3912.4.29.gem
> > 8) gni-headers/2.1-1.0400.3906.5.1.gem
> > 9) dmapp/3.2.1-1.0400.3965.10.12.gem
> > 10) xpmem/0.1-2.0400.29883.4.6.gem
> > 11) hss-llm/6.0.0
> > 12) Base-opts/1.0.2-1.0400.29823.8.1.gem
> > 13) xtpe-network-gemini
> > 14) pbs/10.4.0.101257
> > 15) packages-phase2b
> > 16) usertools/1.0
> > 17) budgets/1.0
> > 18) pgi/11.6.0
> > 19) totalview-support/1.1.2
> > 20) xt-totalview/8.9.1
> > 21) xt-libsci/11.0.00
> > 22) pmi/2.1.2-1.0000.8396.13.5.gem
> > 23) xt-asyncpe/5.00
> > 24) atp/1.2.1
> > 25) PrgEnv-pgi/4.0.30
> > 26) xt-mpich2/5.3.1
> > 27) xtpe-mc12
> > 28) svn/1.6.2
> > 29) hdf5/1.8.5.0
> > 30) netcdf/4.1.1.0
> >
> >
> > The debug information is shown below:
> >
> > "
> > Thread 1.1 received a signal (Floating Point Exception)
> > d1.<> dwhere
> > > 0 ncx_put_float_double PC=0x015e2ec8, FP=0x7fffffff1a90
> > [/ptmp/ulib/netcdf/4.1.1.0/source/libsrc/ncx.c#1386]
> > 1 ncx_putn_float_double PC=0x015e6488, FP=0x7fffffff1ad0
> > [/ptmp/ulib/netcdf/4.1.1.0/source/libsrc/ncx.c#5731]
> > 2 putNCvx_float_double PC=0x015f3d15, FP=0x7fffffff1b60
> > [/ptmp/ulib/netcdf/4.1.1.0/source/libsrc/putget.c#2047]
> > 3 putNCv_double PC=0x015f500a, FP=0x7fffffff1ba0
> > [/ptmp/ulib/netcdf/4.1.1.0/source/libsrc/putget.c#2594]
> > 4 nc3_put_vara_double PC=0x015fbda5, FP=0x7fffffff1c10
> > [/ptmp/ulib/netcdf/4.1.1.0/source/libsrc/putget.c#5825]
> > 5 nc4_put_vara_tc PC=0x015cabb6, FP=0x7fffffff1c50
> > [/ptmp/ulib/netcdf/4.1.1.0/source/libsrc4/nc4var.c#1839]
> > 6 nc_put_vara_double PC=0x015cb2eb, FP=0x7fffffff1c90
> > [/ptmp/ulib/netcdf/4.1.1.0/source/libsrc4/nc4var.c#2106]
> > 7 nf_put_vara_double_ PC=0x01615b1f, FP=0x7fffffff5d10
> > [/ptmp/ulib/netcdf/4.1.1.0/source/fortran/fort-varaio.c#151]
> > 8 netcdf`nf90_put_var_1d_eightbytereal PC=0x01631f48, FP=0x7fffffff5de0
> > [/ptmp/ulib/netcdf/4.1.1.0/source/f90/netcdf_expanded.f90#1189]
> > 9 pionfwrite_mod`write_nfdarray_double PC=0x00d6d769, FP=0x7fffffff61c0
> > [/esfs1/n02/n02/elfengwh/CESM1.0/ftn/ftn_fsdwfepmc/pio/pionfwrite_mod.F90#580]
> > 10 piodarray`write_darray_nf_double PC=0x00db0ef9, FP=0x7fffffff65d0
> > [/esfs1/n02/n02/elfengwh/CESM1.0/ftn/ftn_fsdwfepmc/pio/piodarray.F90#2305]
> > 11 piodarray`write_darray_1d_double PC=0x00da30a9, FP=0x7fffffff6650
> > [/esfs1/n02/n02/elfengwh/CESM1.0/ftn/ftn_fsdwfepmc/pio/piodarray.F90#392]
> > 12 piodarray`write_darray_3d_double PC=0x00da5482, FP=0x7fffffff67b0
> > [/esfs1/n02/n02/elfengwh/CESM1.0/ftn/ftn_fsdwfepmc/pio/piodarray.F90#865]
> > 13 cam_history`dump_field PC=0x004a7d15, FP=0x7fffffff6a80
> > [/work/n02/n02/elfengwh/CESM1.0/cesm1_0_3/models/atm/cam/src/control/cam_history.F90#4310]
> > 14 cam_history`wshist PC=0x004aa16c, FP=0x7fffffff73c0
> > [/work/n02/n02/elfengwh/CESM1.0/cesm1_0_3/models/atm/cam/src/control/cam_history.F90#4609]
> > 15 cam_history`write_restart_history PC=0x00485d34, FP=0x7fffffff8170
> > [/work/n02/n02/elfengwh/CESM1.0/cesm1_0_3/models/atm/cam/src/control/cam_history.F90#866]
> > 16 cam_restart`cam_write_restart PC=0x004b2efd, FP=0x7fffffff8640
> > [/work/n02/n02/elfengwh/CESM1.0/cesm1_0_3/models/atm/cam/src/control/cam_restart.F90#251]
> > 17 cam_comp`cam_run4 PC=0x00471b96, FP=0x7fffffff8680
> > [/work/n02/n02/elfengwh/CESM1.0/cesm1_0_3/models/atm/cam/src/control/cam_comp.F90#390]
> > 18 atm_comp_mct`atm_run_mct PC=0x0046afb5, FP=0x7fffffff8760
> > [/work/n02/n02/elfengwh/CESM1.0/cesm1_0_3/models/atm/cam/src/cpl_mct/atm_comp_mct.F90#523]
> > 19 ccsm_comp_mod`ccsm_run PC=0x00412d84, FP=0x7fffffff9a60
> > [/work/n02/n02/elfengwh/CESM1.0/cesm1_0_3/models/drv/driver/ccsm_comp_mod.F90#2165]
> > 20 ccsm_driver PC=0x004164e9, FP=0x7fffffff9a70
> > [/work/n02/n02/elfengwh/CESM1.0/cesm1_0_3/models/drv/driver/ccsm_driver.F90#47]
> > 21 main PC=0x0040050b, FP=0x7fffffff9a90 [ccsm.exe]
> > 22 __libc_start_main PC=0x01a256a0, FP=0x7fffffff9b50
> > [/usr/src/packages/BUILD/glibc-2.11.1/csu/libc-start.c#226]
> > 23 _start PC=0x004003e0, FP=0x7fffffff9b60 [ccsm.exe]
> 
> It looks like a floating-point overflow is occurring when converting a
> double-precision value to a single-precision floating-point value just
> before converting it to the portable form (XDR) for writing to disk.
> 
> That might happen if you are writing out an array of doubles to a
> netCDF variable that was declared type NF_FLOAT (a 32-bit float), and
> one of the double values was too large to fit in a float, for example
> it might be a fill value larger than 9.9692099683868690e+36.
> 
> I see you are using a Cray, so another possibility might be something
> similar to this problem:
> 
> https://www.myroms.org/projects/src/ticket/217
> 
> We don't have a Cray to test on, so we can't duplicate the problem
> here.  I'd be curious if "make check" ran successfully for netCDF
> 4.1.1 for the Cray installation you're using, as it has some tests for
> extreme floating-point values to make sure the netCDF library handles
> them according to the documentation.  The current netCDF library is
> version 4.1.3, but I believe there are no fixes in the current release
> for floating-point bugs that would have any effect on the problem you
> are seeing.
> 
> --Russ
> 
> Russ Rew                                         UCAR Unidata Program
> address@hidden                      http://www.unidata.ucar.edu
> 
> 
> 
> Ticket Details
> ===================
> Ticket ID: GQG-203630
> Department: Support netCDF
> Priority: Normal
> Status: Closed
> 
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu


