
Re: 20000911: ldm 5.0.8 quitting



On Mon, 11 Sep 2000, Unidata Support wrote:

> >To: address@hidden
> >From: "Arthur A. Person" <address@hidden>
> >Subject: ldm 5.0.8 quitting
> >Organization: UCAR/Unidata
> >Keywords: 200009111353.e8BDrSb02895
> 
> Hi...
> 
> I've had a couple cases recently where the ldm has spontaneously quit,
> yesterday being another.  Here's the tail of my ldmd log:
> 
> Sep 10 21:33:00 navier pqexpire[981]:  del
> feb0eace3782d1da4b0e63da77a41239    16146 20000910195956.846    NMC2 423
> /u/ftp/gateway/n
> Sep 10 21:33:00 navier pqexpire[981]:  del
> 5c89904ce6593096e170f9f7eaeb8c51    17156 20000910200154.622    NMC2 207
> /u/ftp/gateway/n
> Sep 10 21:33:00 navier pqexpire[981]:  del
> b94729db17cdd79b3ee0a2f11fbd4ecd    20976 20000910201535.317    NMC2 028
> /u/ftp/gateway/n
> Sep 10 21:33:00 navier pqexpire[981]:  del
> c241f1adf115fd1a44847509aac52306    17156 20000910200154.692    NMC2 211
> /u/ftp/gateway/n
> Sep 10 21:33:00 navier pqexpire[981]:  del
> 930cbcaeda9c1502fa769ea8a9d1ab3d    16148 20000910195956.881    NMC2 426
> /u/ftp/gateway/n
> Sep 10 21:33:00 navier pqexpire[981]:  del
> 77a249c13323585282144485eaa53646    16148 20000910195956.981    NMC2 430
> /u/ftp/gateway/n
> Sep 10 21:33:00 navier pqexpire[981]: > Queue usage (bytes):629145600
> Sep 10 21:33:00 navier pqexpire[981]: >          (nregions):   52239
> Sep 10 21:33:00 navier pqexpire[981]: > Recycled  10618.192 kb/hr (
> 22605.934 prods per hour)
> Sep 10 21:33:16 navier pqact[984]: pbuf_flush 26: time elapsed   3.559374
> Sep 10 21:33:20 navier pqact[984]: pbuf_flush 26: time elapsed   3.648466
> Sep 10 21:33:20 navier pqact[984]: pbuf_flush (26) write: Broken pipe
> Sep 10 21:33:20 navier pqact[984]: pipe_dbufput:
> /opt1/gempak/NAWIPS/bin/sol/dcshef-v1-b36-m96-ddata/gempak/logs/dcshef.log-p/opt1/ge
> Sep 10 21:33:20 navier pqact[984]: pipe_prodput: trying again
> Sep 10 21:33:29 navier rossby(feed)[6509]: pnga2area Q1 U1 192 GOES-9_IMG
> UNKBAND 20km 20000910 1800: RPC: Timed out
> Sep 10 21:33:29 navier rossby(feed)[6509]: pq_sequence failed: I/O error
> (errno = 5)
> Sep 10 21:33:29 navier rossby(feed)[6509]: Exiting
> Sep 10 21:33:30 navier pqact[984]: pbuf_flush 30: time elapsed   7.645828
> Sep 10 21:33:35 navier rpc.ldmd[980]: child 6509 exited with status 1
> Sep 10 21:33:40 navier pqact[984]: pbuf_flush 30: time elapsed  10.112282
> Sep 10 21:34:03 navier motherlode[987]: Connection reset by peer
> Sep 10 21:34:03 navier motherlode[987]: Disconnect
> Sep 10 21:34:17 navier rossby[7464]: Connection from rossby.wcupa.edu
> Sep 10 21:34:17 navier rossby(feed)[7464]: Starting Up: 20000910210600.712
> TS_ENDT {{DDPLUS,  ".*"},{HDS,  "(^H)|(^[YZ].[QRA])|(^Y.[A
> Sep 10 21:34:17 navier rossby(feed)[7464]: topo:  rossby.wcupa.edu
> MCIDAS|HDS|DDPLUS
> Sep 10 21:34:24 navier unidata[988]: Connection reset by peer
> Sep 10 21:34:24 navier unidata[988]: Disconnect
> Sep 10 21:34:33 navier motherlode[987]: run_requester: 20000910210806.696
> TS_ENDT {{FSL2|WMO,  ".*"},{NMC2,  ".*"}}
> Sep 10 21:34:37 navier motherlode[987]: FEEDME(motherlode.ucar.edu):
> reclass: 20000910210806.696 TS_ENDT {{FSL2|IDS|HDS|DDPLUS,  ".*"
> Sep 10 21:34:54 navier unidata[988]: run_requester: 20000910213110.382
> TS_ENDT {{MCIDAS,  ".*"}}
> Sep 10 21:34:55 navier unidata[988]: FEEDME(unidata.ssec.wisc.edu): OK
> Sep 10 21:34:55 navier unidata[988]: 4626507a7812279c3fc53e1ef3a89be3:
> never completed
> Sep 10 21:35:01 navier cirrus(feed)[25135]: RECLASS: 20000910203501.167
> TS_ENDT {{DDPLUS,  ".*"},{HDS,  ".*"},{MCIDAS,  ".*"}}
> Sep 10 21:35:18 navier rossby(feed)[7464]: pnga2area Q1 U3 202 GRAPHICS
> UNKBAND 5km 20000910 2059: RPC: Timed out (5)
> Sep 10 21:35:18 navier rossby(feed)[7464]: pq_sequence failed: I/O error
> (errno = 5)
> Sep 10 21:35:18 navier rossby(feed)[7464]: Exiting
> Sep 10 21:35:24 navier rpc.ldmd[980]: child 7464 exited with status 1
> Sep 10 21:35:25 navier pqexpire[982]:  del
> 11d70be8c7d45a27bf9fc2b45fd1c10c       73 20000910210806.118 IDS|DDPLUS
> 355001  metar KLVS
> Sep 10 21:35:25 navier pqexpire[982]:  del
> 1800c5209138a5127c5fc20d1521157b       72 20000910210806.666 IDS|DDPLUS
> 356001  metar KOEO
> Sep 10 21:35:25 navier pqexpire[982]: > Queue usage (bytes): 1528416
> Sep 10 21:35:25 navier pqexpire[982]: >          (nregions):    8543
> Sep 10 21:35:25 navier pqexpire[982]: > Recycled   1186.102 kb/hr (
> 6870.642 prods per hour)
> Sep 10 21:35:35 navier ganges(feed)[25529]: h_clnt_call:
> ganges.Princeton.EDU: BLKDATA: time elapsed  22.086796
> Sep 10 21:35:37 navier motherlode[987]: FEEDME(motherlode.ucar.edu): RPC:
> Timed out
> Sep 10 21:35:48 navier rossby[9014]: Connection from rossby.wcupa.edu
> Sep 10 21:35:49 navier rossby(feed)[9014]: Starting Up: 20000910210600.712
> TS_ENDT {{DDPLUS,  ".*"},{HDS,  "(^H)|(^[YZ].[QRA])|(^Y.[A
> Sep 10 21:35:49 navier rossby(feed)[9014]: topo:  rossby.wcupa.edu
> MCIDAS|HDS|DDPLUS
> Sep 10 21:35:56 navier ganges(feed)[25529]: h_clnt_call:
> ganges.Princeton.EDU: BLKDATA: time elapsed  21.223542
> Sep 10 21:36:01 navier nora-f(feed)[28276]: pq_sequence failed: I/O error
> (errno = 5)
> Sep 10 21:36:01 navier nora-f(feed)[28276]: Exiting
> Sep 10 21:36:07 navier rpc.ldmd[980]: child 28276 exited with status 1
> Sep 10 21:36:18 navier motherlode[987]: FEEDME(motherlode.ucar.edu):
> reclass: 20000910210806.696 TS_ENDT {{FSL2|IDS|HDS|DDPLUS,  ".*"


Art,

The actual problem is that the following assertion failed: the string
being converted to XDR format (the network transfer format) was either
null or had a length of 0.  At this time I can't really say what caused
this to happen, but it might be because navier is having trouble
transferring data to the downstream hosts rossby, ganges, and nora-f.  At
this point I don't want to spend too much time on this because it's the
ldm-5.0.8 version.  Can you upgrade to ldm-5.1.2?  If you still have the
problem, I'll take a more detailed look at it.  ldm-5.1.2 has quicker
product interaction with the LDM queue, which might solve the problem.
Both binary and source code releases are available.
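
For illustration, here is a minimal C sketch of what that assertion
checks.  It is not the code from ldm_xdr.c: the names demo_xdr_op,
DEMO_PIF, demo_string_precondition and demo_check are made up for this
example, and it assumes pIf(a, b) means logical implication ("if a
then b"), which is how the condition in the log message reads.  When a
standard assert() fails it calls abort(), which raises SIGABRT; that is
signal 6 on most systems and would be consistent with the "child 987
terminated by signal 6" line in your log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the XDR operation type in <rpc/xdr.h>. */
enum demo_xdr_op { DEMO_XDR_ENCODE, DEMO_XDR_DECODE, DEMO_XDR_FREE };

/* Logical implication "if a then b" (assumed meaning of pIf). */
#define DEMO_PIF(a, b) (!(a) || (b))

/*
 * Returns 1 when the precondition holds, 0 when it does not.
 * When encoding, the string pointer must be non-NULL and the string
 * must be non-empty.  For any other operation nothing is required.
 */
int demo_string_precondition(enum demo_xdr_op op, char **cpp)
{
    return DEMO_PIF(op == DEMO_XDR_ENCODE, *cpp != NULL && **cpp != 0);
}

/*
 * Aborts the process (SIGABRT) if the precondition is violated,
 * unless the code was compiled with NDEBUG defined.
 */
void demo_check(enum demo_xdr_op op, char **cpp)
{
    assert(demo_string_precondition(op, cpp));
}
```

Calling demo_check(DEMO_XDR_ENCODE, &s) with s pointing at an empty
string, or with s set to NULL, aborts the process.  The same call with
DEMO_XDR_DECODE passes, because the condition only constrains the
encoding direction.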

Robb...



> Sep 10 21:36:18 navier motherlode[987]: assertion "pIf(xdrs->x_op ==
> XDR_ENCODE, *cpp != NULL && **cpp != 0)" failed: file "ldm_xdr.c

> Sep 10 21:36:24 navier rpc.ldmd[980]: child 987 terminated by signal 6
> Sep 10 21:36:24 navier rpc.ldmd[980]: Killing (SIGINT) process group
> Sep 10 21:36:24 navier rpc.ldmd[980]: Interrupt
> Sep 10 21:36:24 navier rpc.ldmd[980]: Exiting
> Sep 10 21:36:24 navier rossby(feed)[9014]: Interrupt
> Sep 10 21:36:24 navier rossby(feed)[9014]: Exiting
> Sep 10 21:36:24 navier sgi2(feed)[1034]: Interrupt
> Sep 10 21:36:24 navier pqact[984]: Interrupt
> Sep 10 21:36:24 navier pqact[984]: Exiting
> Sep 10 21:36:24 navier sysu1[25590]: Interrupt
> Sep 10 21:36:25 navier unidata[988]: Interrupt
> Sep 10 21:36:24 navier pqbinstats[983]: Interrupt
> Sep 10 21:36:27 navier sysu1[25590]: Exiting
> Sep 10 21:36:24 navier cirrus(feed)[25135]: Interrupt
> Sep 10 21:36:24 navier cirrus(feed)[25307]: Interrupt
> Sep 10 21:36:24 navier windfall(feed)[28595]: Interrupt
> Sep 10 21:36:24 navier catena(feed)[1041]: Interrupt
> Sep 10 21:37:05 navier cirrus(feed)[25135]: Exiting
> Sep 10 21:37:05 navier sgi2(feed)[1034]: Exiting
> Sep 10 21:37:05 navier rossby(feed)[9014]: pnga2area Q1 U3 202 GRAPHICS
> UNKBAND 5km 20000910 2059: RPC: Timed out (5)
> Sep 10 21:37:48 navier cirrus(feed)[25307]: Exiting
> Sep 10 21:36:24 navier shemp[989]: Interrupt
> Sep 10 21:37:48 navier windfall(feed)[28595]: Exiting
> Sep 10 21:36:24 navier striker[990]: Interrupt
> Sep 10 21:37:48 navier unidata[988]: Exiting
> Sep 10 21:37:48 navier striker[990]: Exiting
> Sep 10 21:37:49 navier catena(feed)[1041]: Exiting
> Sep 10 21:36:24 navier pqact[993]: Interrupt
> Sep 10 21:36:25 navier pqexpire[982]: Interrupt
> Sep 10 21:36:24 navier pqact[985]: Interrupt
> Sep 10 21:37:49 navier pqact[993]: Exiting
> Sep 10 21:37:49 navier shemp[989]: Exiting
> Sep 10 21:37:49 navier ganges(feed)[25529]: pq_sequence failed: I/O error
> (errno = 5)
> Sep 10 21:37:49 navier ganges(feed)[25529]: Exiting
> Sep 10 21:37:49 navier pqact[985]: Exiting
> Sep 10 21:37:48 navier pqexpire[981]: Interrupt
> Sep 10 21:37:50 navier pqbinstats[983]: Exiting
> Sep 10 21:37:50 navier pqexpire[982]: Exiting
> Sep 10 21:37:50 navier pqexpire[982]: > Up since:      20000907065745.615
> Sep 10 21:37:50 navier pqexpire[981]: Exiting
> Sep 10 21:37:50 navier pqexpire[981]: > Up since:      20000907065745.620
> Sep 10 21:37:51 navier pqexpire[981]: > Queue usage (bytes):629145600
> Sep 10 21:37:51 navier pqexpire[982]: > Queue usage (bytes): 1528416
> Sep 10 21:37:51 navier pqexpire[982]: >          (nregions):    8543
> Sep 10 21:37:51 navier pqexpire[981]: >          (nregions):   52239
> Sep 10 21:37:51 navier pqexpire[982]: > nbytes recycle:    105577008 (
> 1186.102 kb/hr)
> Sep 10 21:37:51 navier pqexpire[981]: > nbytes recycle:    959319840 (
> 10618.192 kb/hr)
> Sep 10 21:37:51 navier pqexpire[982]: > nprods deleted:       597234 (
> 6870.642 per hour)
> Sep 10 21:37:51 navier pqexpire[981]: > nprods deleted:      1994506 (
> 22605.934 per hour)
> Sep 10 21:37:51 navier pqexpire[982]: > First deleted: 20000907061234.877
> Sep 10 21:37:51 navier pqexpire[981]: > First deleted: 20000907041345.211
> Sep 10 21:37:51 navier pqexpire[982]: > Last  deleted: 20000910210806.666
> Sep 10 21:37:51 navier pqexpire[981]: > Last  deleted: 20000910202730.733
> Sep 10 21:37:51 navier pqsurf[986]: Exiting
> Sep 10 21:37:51 navier pqsurf[986]:   Queue usage (bytes): 1528416
> Sep 10 21:37:51 navier pqsurf[986]:            (nregions):    8543
> Sep 10 21:37:51 navier rpc.ldmd[980]: Terminating process group
> Sep 10 21:37:51 navier rpc.ldmd[980]: child 25529 exited with status 1
> Sep 10 21:37:51 navier rpc.ldmd[980]: child 9014 terminated by signal 11
> Sep 10 21:37:51 navier rpc.ldmd[980]: Killing (SIGINT) process group
> Sep 10 21:37:51 navier DCHRLY[7390]: Terminate Signal
> Sep 10 21:37:51 navier DCSHEF[7396]: Terminate Signal
> Sep 10 21:37:52 navier pqsurf[986]: Number of products 186071
> Sep 10 21:37:52 navier pqsurf[986]: Number of observations 714904
> Sep 10 21:37:52 navier pqsurf[986]: Number of dups 76726
> 
> 
> Anything you can point to that might be the cause/remedy of the problem?
> 
>                                          Thanks.
> 
>                                            Art.
> 
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email:  address@hidden, phone:  814-863-1563
> 

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================