
20051208: LDM death on multiple platforms/multiple versions on 2005120



Art,

Amazing!

The product-queue module contains a bad assert() that's activated
if the first four bytes of the MD5 checksum of a data-product (the
data-product's signature) are all zeros.  In order for this to occur,
the LDM package must also have been compiled with assertions enabled
(which is not the default).
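
As a rough illustration of this failure mode (a sketch only, not the
actual pq.c code; the function names and the use of the prefix as a
slot key are assumptions made for the example), consider a routine that
folds the first four signature bytes into an integer and then asserts
that the result is positive:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold the first four bytes of a 16-byte MD5
 * signature into an unsigned integer (big-endian order). */
unsigned long sig_prefix(const unsigned char sig[16])
{
    return ((unsigned long)sig[0] << 24) |
           ((unsigned long)sig[1] << 16) |
           ((unsigned long)sig[2] << 8)  |
            (unsigned long)sig[3];
}

/* Hypothetical lookup with the bad assumption baked in.  Unless the
 * code is compiled with -DNDEBUG, assert() calls abort() when the
 * prefix is zero, even though such a signature is perfectly valid. */
size_t sig_slot_unsafe(const unsigned char sig[16], size_t nslots)
{
    unsigned long n = sig_prefix(sig);
    assert(n > 0);
    return (size_t)(n % nslots);
}

/* The same lookup with the assertion removed: a zero prefix is
 * treated like any other value. */
size_t sig_slot(const unsigned char sig[16], size_t nslots)
{
    return (size_t)(sig_prefix(sig) % nslots);
}
```

With a signature whose first four bytes are zero, sig_slot() returns
slot 0, while sig_slot_unsafe() aborts the process unless NDEBUG was
defined at compile time.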

Apparently, this is the first such occurrence in eleven years.
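
For a sense of how rare this is: assuming MD5 output bytes are
uniformly distributed and independent (an idealization), the chance
that any given product's signature starts with four zero bytes, and the
expected number of products before the first such signature, work out
to:

```latex
P(\text{first four bytes are zero}) = \left(\tfrac{1}{256}\right)^{4} = 2^{-32} \approx 2.3 \times 10^{-10},
\qquad
E[\text{products until first occurrence}] = \frac{1}{2^{-32}} = 2^{32} \approx 4.3 \times 10^{9}
```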

I've removed the offending assertion and will make a new release when I
return.

Regards,
Steve Emmerson

------- Original Message


Date:    Thu, 08 Dec 2005 09:18:50 MST
To:      "Arthur A. Person" <address@hidden>
cc:      address@hidden, address@hidden
From:    Unidata Support <address@hidden>
Subject: 20051208: LDM death on multiple platforms/multiple versions on 2005120
     ***7

Delivery-Date: Thu Dec  8 09:19:02 2005
Organization: UCAR/Unidata
In-reply-to: Your message of "Thu, 08 Dec 2005 10:16:45 EST."
         <address@hidden> 

>From: "Arthur A. Person" <address@hidden>
>Organization: PSU
>Keywords: 200512081516.jB8FGp7s001100 LDM assertion failure

Hi Art,

>I saw an email in a glimpse search regarding an LDM death last night and
>we had a similar occurrence at nearly the same time.

I saw the same thing running LDM-6.4.2 on a dual Xeon EM64T system in
my office.  I thought that it was coincidence until your email!

>We're running LDM 
>6.0.15.  I thought you might like to know this occurred at more than one 
>site.

Yes, I am very happy that you sent in this report!  The failure of your
6.0.15 LDM installation was exactly the same as the 6.0.14 installation
in Hong Kong and my 6.4.2 installation here at the UPC.  Interestingly,
the LDMs on our cluster backends and other machines did not show any
problems. I am sure that Steve will be diving into this mystery when he
returns from the AGU meeting.

Regardless of why the LDMs exited, I would like to encourage you to
upgrade your 6.0.15 installation to a current LDM release.

By the way, is there any update on your efforts to put together a
cluster as a top-level East coast IDD relay?  I don't know if you saw
the note from NOAA/GDS (formerly FSL); they announced to MADIS users
that their relay machine has been replaced by a "server farm"
(cluster).  We (Steve E.) were talking right along to the GDS folks
about our cluster setup as they developed theirs.  I will be contacting
them to see if they implemented anything new that we should be aware
of.

Cheers,

Tom

>Here's the tail of our ldmd.log file:
>
>Dec 08 02:15:10 ldm usgodae3[8874]: ERROR: requester6.c:231: Upstream LDM 
>died
>Dec 08 02:15:10 ldm usgodae3[8874]: Desired product class: 
>20051208005421.625 TS_ENDT {{FNMOC,  ".*"}}
>Dec 08 02:15:11 ldm usgodae3[8874]: Connected to upstream LDM-6
>Dec 08 02:15:11 ldm usgodae3[8874]: Upstream LDM is willing to feed
>Dec 08 04:08:37 ldm usgodae3[8874]: ERROR: requester6.c:231: Upstream LDM 
>died
>Dec 08 04:08:37 ldm usgodae3[8874]: Desired product class: 
>20051208025404.321 TS_ENDT {{FNMOC,  ".*"}}
>Dec 08 04:08:37 ldm usgodae3[8874]: Connected to upstream LDM-6
>Dec 08 04:08:38 ldm usgodae3[8874]: Upstream LDM is willing to feed
>Dec 08 04:17:28 ldm ldm[8879]: ERROR: requester6.c:231: Upstream LDM died
>Dec 08 04:17:28 ldm ldm[8879]: Desired product class: 20051207160017.634 
>TS_ENDT {{GEM,  ".*"}}
>Dec 08 04:17:28 ldm ldm[8879]: Connected to upstream LDM-6
>Dec 08 04:17:28 ldm ldm[8879]: Upstream LDM is willing to feed
>Dec 08 04:48:29 ldm vortex(feed)[18170]: up6.c:210: COMINGSOON: RPC: 
>Unable to receive; errno = Connection reset by peer
>Dec 08 04:48:29 ldm vortex(feed)[18170]: up6.c:427: Product send failure: 
>Input/output error
>Dec 08 04:48:30 ldm rpc.ldmd[8866]: child 18170 exited with status 6
>Dec 08 04:48:30 ldm vortex[20370]: ldm6_server.c:140: Restricting request: 
>20051208044729.563 TS_ENDT {{NNEXRAD, 
>"(/p...BUF|/pN0R(BGM|ENX|OKX|TYX))"},{NONE, 
>"SIG=2223fd5d89edf9245667d2893443fc35"}} -> 20051208044729.563 TS_ENDT 
>{{NNEXRAD,  "(/p...BUF|/pN0R(BGM|ENX|OKX|TYX))"}}
>Dec 08 04:48:30 ldm vortex(feed)[20370]: up6.c:339: Starting Up(6.0.15/6): 
>20051208044729.563 TS_ENDT {{NNEXRAD, 
>"(/p...BUF|/pN0R(BGM|ENX|OKX|TYX))"}}
>Dec 08 04:48:30 ldm vortex(feed)[20370]: topo:  vortex.esc.brockport.edu 
>NNEXRAD
>Dec 08 04:50:41 ldm vortex(feed)[20370]: up6.c:292: nullproc_6() failure 
>to vortex.esc.brockport.edu: RPC: Unable to receive; errno = Connection 
>reset by peer
>Dec 08 04:50:41 ldm rpc.ldmd[8866]: child 20370 exited with status 5
>Dec 08 04:50:41 ldm vortex[20397]: ldm6_server.c:140: Restricting request: 
>20051208044940.957 TS_ENDT {{NNEXRAD, 
>"(/p...BUF|/pN0R(BGM|ENX|OKX|TYX))"},{NONE, 
>"SIG=e24adfc7a8646982a91f432c85255985"}} -> 20051208044940.957 TS_ENDT 
>{{NNEXRAD,  "(/p...BUF|/pN0R(BGM|ENX|OKX|TYX))"}}
>Dec 08 04:50:41 ldm vortex(feed)[20397]: up6.c:339: Starting Up(6.0.15/6): 
>20051208044940.957 TS_ENDT {{NNEXRAD, 
>"(/p...BUF|/pN0R(BGM|ENX|OKX|TYX))"}}
>Dec 08 04:50:41 ldm vortex(feed)[20397]: topo:  vortex.esc.brockport.edu 
>NNEXRAD
>Dec 08 05:07:49 ldm usgodae3[8874]: ERROR: requester6.c:206: Connection to 
>upstream LDM closed
>Dec 08 05:07:49 ldm usgodae3[8874]: Desired product class: 
>20051208050607.927 TS_ENDT {{FNMOC,  ".*"}}
>Dec 08 05:07:49 ldm usgodae3[8874]: Connected to upstream LDM-6
>Dec 08 05:07:49 ldm usgodae3[8874]: Upstream LDM is willing to feed
>Dec 08 05:24:22 ldm thelma[8869]: assertion "n > 0" failed: file "pq.c", 
>line 2187
>Dec 08 05:24:28 ldm rpc.ldmd[8866]: child 8869 terminated by signal 6
>Dec 08 05:24:28 ldm rpc.ldmd[8866]: Killing (SIGINT) process group
>Dec 08 05:24:28 ldm rpc.ldmd[8866]: SIGINT
>Dec 08 05:24:28 ldm rtstats[8868]: Interrupt
>Dec 08 05:24:28 ldm thelma[8870]: SIGINT
>Dec 08 05:24:28 ldm thelma[8871]: SIGINT
>Dec 08 05:24:28 ldm atm[8872]: SIGINT
>Dec 08 05:24:28 ldm idd[8873]: SIGINT
>Dec 08 05:24:28 ldm usgodae3[8874]: SIGINT
>Dec 08 05:24:28 ldm unidata2[8875]: SIGINT
>Dec 08 05:24:28 ldm unidata2[8876]: SIGINT
>Dec 08 05:24:28 ldm striker2[8877]: SIGINT
>Dec 08 05:24:28 ldm atm[8878]: SIGINT
>Dec 08 05:24:28 ldm ldm[8879]: SIGINT
>Dec 08 05:24:28 ldm atm[8880]: SIGINT
>Dec 08 05:24:28 ldm bob(feed)[8886]: SIGINT
>Dec 08 05:24:28 ldm ls2(feed)[27559]: SIGINT
>Dec 08 05:24:28 ldm omega(feed)[16950]: SIGINT
>Dec 08 05:24:28 ldm windfall(feed)[16964]: SIGINT
>Dec 08 05:24:28 ldm windfall(feed)[16965]: SIGINT
>Dec 08 05:24:28 ldm windfall(feed)[16966]: SIGINT
>Dec 08 05:24:28 ldm gusher(feed)[16982]: SIGINT
>Dec 08 05:24:28 ldm gusher(feed)[16983]: SIGINT
>Dec 08 05:24:28 ldm gusher(feed)[16984]: SIGINT
>Dec 08 05:24:28 ldm gusher(feed)[16985]: SIGINT
>Dec 08 05:24:28 ldm gusher(feed)[16986]: SIGINT
>Dec 08 05:24:28 ldm gusher(feed)[16987]: SIGINT
>Dec 08 05:24:28 ldm gusher(feed)[16988]: SIGINT
>Dec 08 05:24:28 ldm gusher(feed)[16989]: SIGINT
>Dec 08 05:24:28 ldm aeolus(feed)[16997]: SIGINT
>Dec 08 05:24:28 ldm aeolus(feed)[17000]: SIGINT
>Dec 08 05:24:28 ldm aeolus(feed)[17001]: SIGINT
>Dec 08 05:24:28 ldm 24.115.98.159.res-cmts.ha(feed)[17006]: SIGINT
>Dec 08 05:24:28 ldm 24.115.98.159.res-cmts.ha(feed)[17007]: SIGINT
>Dec 08 05:24:28 ldm 24.115.98.159.res-cmts.ha(feed)[17008]: SIGINT
>Dec 08 05:24:28 ldm 24.115.98.159.res-cmts.ha(feed)[17009]: SIGINT
>Dec 08 05:24:28 ldm vortex(feed)[17021]: SIGINT
>Dec 08 05:24:28 ldm gusher(feed)[17024]: SIGINT
>Dec 08 05:24:28 ldm thunder(feed)[17031]: SIGINT
>Dec 08 05:24:28 ldm thunder(feed)[17044]: SIGINT
>Dec 08 05:24:28 ldm sysu1[17059]: SIGINT
>Dec 08 05:24:28 ldm wxmcidas(feed)[17062]: SIGINT
>Dec 08 05:24:28 ldm measol(feed)[17133]: SIGINT
>Dec 08 05:24:28 ldm newpsn(feed)[17147]: SIGINT
>Dec 08 05:24:28 ldm vortex(feed)[20397]: SIGINT
>Dec 08 05:24:28 ldm rtstats[8868]: Exiting
>Dec 08 05:24:28 ldm sysu1[17059]: down6.c:511: Discarding incomplete 
>product:     9222 20051208052426.829     WSI 8803 
>NEX/BUF/BREF2/200512080520
>Dec 08 05:24:28 ldm rpc.ldmd[8866]: Terminating process group
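
The sequence in the log tail above (a failed assertion in one child,
followed by "terminated by signal 6") matches how assert() behaves: on
failure it calls abort(), which raises SIGABRT, and SIGABRT is signal 6
on Linux and most other Unix-like systems.  A minimal POSIX sketch of a
parent observing this in a child follows; the function name is
hypothetical and this is not LDM code:

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls abort(), which is what a failed assert()
 * does.  Returns the number of the signal that terminated the child,
 * 0 if the child exited normally, or -1 on fork/wait failure. */
int child_abort_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: make sure SIGABRT has its default action, then abort. */
        signal(SIGABRT, SIG_DFL);
        abort();
    }

    /* Parent: collect the child's status, as a supervising process would. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling child_abort_signal() should return SIGABRT, the same condition
the parent process logs above before it shuts down its process group.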

Cheers,

Tom
--
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.

------- End of Original Message