
Re: Problems with difax at Cornell



>
> Date: Mon, 6 Mar 2000 13:46:53 -0500
> From: Collin Daly <address@hidden>
> To: Robb Kambic <address@hidden>
> Subject: Fw: Problems with difax at Cornell
>
> Robb,
> Here is a forward of William Noon's email.  Can you see where a problem may
> be occurring?  If not, what logs should I look into or steps should I take to
> diagnose this?
> Thanks,
> Collin
>
> > Collin -- The basic problem is that the connection to lightning is lost
> > but doesn't reconnect.  I thought it needed an ldm restart on my end to
> > get things going, but the logs indicate that it did reconnect briefly after
> > a couple of days' outage.  The reconnection only lasted a few hours.
> >
> > LDM version 5.0.9.
> > Running on intel linux kernel 2.0.38.
> >
> > Appended is the ldmd.log with annotations.  Can you send the lightning
> > logs for these times?
> >
> > Thanks -- Bill Noon
> > Northeast Regional Climate Center
> > Cornell University
> >
> > This is where we lose the connection....
> >
> >
> > Mar 03 13:06:42 nrcc2 pqexpire[8235]: > Recycled  19025.790 kb/hr
>  9440.811 prods per hour)
> > Mar 03 13:11:50 nrcc2 pqexpire[8235]: > Recycled  19022.379 kb/hr
>  9441.873 prods per hour)
> > Mar 03 13:16:55 nrcc2 lightning[8239]: Timed out after 720 seconds
> inactivity
> > Mar 03 13:16:55 nrcc2 lightning[8239]: Disconnect
> > Mar 03 13:17:00 nrcc2 pqexpire[8235]: > Recycled  19018.394 kb/hr
>  9441.141 prods per hour)
> > Mar 03 13:17:25 nrcc2 lightning[8239]: run_requester: 20000303130447.889
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 03 13:18:25 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): 13:
> gethostbyname(lightning.alden.com): lookup Timed out
> > Mar 03 13:18:55 nrcc2 lightning[8239]: run_requester: 20000303130447.889
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 03 13:19:55 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): 13:
> gethostbyname(lightning.alden.com): lookup Timed out
> >
> > This continues for some time.....
> >
> > Mar 03 13:48:04 nrcc2 lightning[8239]: run_requester: 20000303130447.889
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 03 13:48:34 nrcc2 lightning[8239]: h_clnt_call: lightning.alden.com:
> FEEDME: time elapsed  30.441091
> > Mar 03 13:48:34 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 03 13:48:34 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 03 13:48:34 nrcc2 lightning[8239]: Disconnect
> > Mar 03 13:49:05 nrcc2 lightning[8239]: run_requester: 20000303130447.889
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 03 13:49:06 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 03 13:49:06 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 03 13:49:06 nrcc2 lightning[8239]: Disconnect
> > Mar 03 13:49:37 nrcc2 lightning[8239]: run_requester: 20000303130447.889
> TS_ENDT {{DIFAX,  ".*"}}
> >
> > A couple of days pass....
> >
> > Mar 06 06:33:24 nrcc2 lightning[8239]: run_requester: 20000306053319.602
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 06:33:24 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 06:33:24 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 06 06:33:24 nrcc2 lightning[8239]: Disconnect
> > Mar 06 06:33:56 nrcc2 lightning[8239]: run_requester: 20000306053354.766
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 06:33:56 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 06:33:56 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 06 06:33:56 nrcc2 lightning[8239]: Disconnect
> > Mar 06 06:34:28 nrcc2 lightning[8239]: run_requester: 20000306053426.521
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 06:35:12 nrcc2 lightning[8239]: h_clnt_call: lightning.alden.com:
> FEEDME: time elapsed  44.020853
> > Mar 06 06:35:12 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 06:35:12 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 06 06:35:12 nrcc2 lightning[8239]: Disconnect
> > Mar 06 06:35:43 nrcc2 lightning[8239]: run_requester: 20000306053542.200
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 06:35:46 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 06:35:46 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 06 06:35:46 nrcc2 lightning[8239]: Disconnect
> > Mar 06 06:36:18 nrcc2 lightning[8239]: run_requester: 20000306053616.944
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 06:37:18 nrcc2 lightning[8239]: FEEDME(lightning.alden.com):
> h_clnt_create(lightning.alden.com): Timed out while creating connection
> > Mar 06 06:38:05 nrcc2 pqexpire[8235]: > Recycled   2035.557 kb/hr
>  9484.482 prods per hour)
> > Mar 06 06:38:18 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 06:38:18 nrcc2 lightning[8239]: RECLASS: 20000306053819.005 TS_ENDT
> {{DIFAX,  ".*"}}
> > Mar 06 06:38:18 nrcc2 lightning[8239]: skipped: 20000306053650.469 (88.536
> seconds)
> > Mar 06 06:39:03 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 06 06:39:03 nrcc2 lightning[8239]: Disconnect
> > Mar 06 06:39:33 nrcc2 lightning[8239]: run_requester: 20000306055450.875
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 06:39:54 nrcc2 lightning[8239]: h_clnt_call: lightning.alden.com:
> FEEDME: time elapsed  21.057007
> > Mar 06 06:39:54 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 06:39:55 nrcc2 lightning[8239]: afca79b2dd3d64470e9d46a1dba237be:
> never completed
> >
> > I get difax for a few hours.....
> >
> >
> > Mar 06 09:46:40 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 06 09:46:40 nrcc2 lightning[8239]: Disconnect
> > Mar 06 09:47:10 nrcc2 lightning[8239]: run_requester: 20000306093623.724
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 09:48:10 nrcc2 lightning[8239]: FEEDME(lightning.alden.com):
> h_clnt_create(lightning.alden.com): Timed out while creating connection
> > Mar 06 09:49:10 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 09:49:10 nrcc2 lightning[8239]: 713800e7077984913fb3cb7281a160eb:
> never completed
> > Mar 06 09:50:59 nrcc2 pqexpire[8235]: > Recycled   2137.992 kb/hr
>  9446.082 prods per hour)
> > Mar 06 09:56:10 nrcc2 pqexpire[8235]: > Recycled   2152.150 kb/hr
>  9447.103 prods per hour)
> > Mar 06 10:01:20 nrcc2 pqexpire[8235]: > Recycled   2173.376 kb/hr
>  9450.347 prods per hour)
> > Mar 06 10:06:25 nrcc2 pqexpire[8235]: > Recycled   2177.566 kb/hr
>  9449.671 prods per hour)
> >
> > Lost connection again....
> >
> > Mar 06 10:07:54 nrcc2 lightning[8239]: Timed out after 720 seconds
> inactivity
> > Mar 06 10:07:54 nrcc2 lightning[8239]: Disconnect
> > Mar 06 10:08:25 nrcc2 lightning[8239]: run_requester: 20000306095532.072
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 10:08:25 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 10:08:25 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 06 10:08:25 nrcc2 lightning[8239]: Disconnect
> > Mar 06 10:08:55 nrcc2 lightning[8239]: run_requester: 20000306095532.072
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 10:08:56 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 10:08:56 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 06 10:08:56 nrcc2 lightning[8239]: Disconnect
> > Mar 06 10:09:26 nrcc2 lightning[8239]: run_requester: 20000306095532.072
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 10:09:26 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): OK
> > Mar 06 10:09:26 nrcc2 lightning[8239]: Connection reset by peer
> > Mar 06 10:09:26 nrcc2 lightning[8239]: Disconnect
> > Mar 06 10:09:57 nrcc2 lightning[8239]: run_requester: 20000306095532.072
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 10:10:57 nrcc2 lightning[8239]: FEEDME(lightning.alden.com):
> h_clnt_create(lightning.alden.com): Timed out while creating connection
> >
> > This is where I restarted the ldm.....
> >
> >
> > Mar 06 13:30:08 nrcc2 rpc.ldmd[8234]: Exiting
> > Mar 06 13:30:08 nrcc2 rpc.ldmd[8234]: Terminating process group
> > Mar 06 13:30:08 nrcc2 lightning[8239]: Exiting
> > Mar 06 13:30:08 nrcc2 snow[8238]: Exiting
> > Mar 06 13:30:08 nrcc2 pqexpire[8235]: Exiting
> > Mar 06 13:30:08 nrcc2 pqexpire[8235]: > Up since:      20000224220314.068
> > Mar 06 13:30:08 nrcc2 pqexpire[8235]: > Queue usage (bytes):75001856
> > Mar 06 13:30:08 nrcc2 pqexpire[8235]: >          (nregions):   26299
> > Mar 06 13:30:08 nrcc2 pqexpire[8235]: > nbytes recycle:    593335680
>  2268.236 kb/hr)
> > Mar 06 13:30:08 nrcc2 pqexpire[8235]: > nprods deleted:      2405839
>  9417.904 per hour)
> > Mar 06 13:30:08 nrcc2 pqexpire[8235]: > First deleted: 20000224205734.052
> > Mar 06 13:30:08 nrcc2 pqexpire[8235]: > Last  deleted: 20000306122447.554
> > Mar 06 13:30:08 nrcc2 pqbinstats[8236]: Exiting
> > Mar 06 13:30:08 nrcc2 pqact[8237]: Exiting
> > Mar 06 13:30:26 nrcc2 rpc.ldmd[17639]: Starting Up (built: Dec 30 1999
> 14:15:56)
> > Mar 06 13:30:26 nrcc2 snow[17643]: run_requester: Starting Up:
> snow.cit.cornell.edu
> > Mar 06 13:30:26 nrcc2 snow[17643]: run_requester: 20000306133008.263
> TS_ENDT {{WMO,  "(^[A-OQ-X])|(^[YZ].[^AHIJRU])"}}
> > Mar 06 13:30:26 nrcc2 lightning[17644]: run_requester: Starting Up:
> lightning.alden.com
> > Mar 06 13:30:26 nrcc2 pqexpire[17640]: Starting Up
> > Mar 06 13:30:26 nrcc2 pqact[17642]: Starting Up
> > Mar 06 13:30:26 nrcc2 pqbinstats[17641]: Starting Up (17639)
> > Mar 06 13:30:26 nrcc2 snow[17643]: FEEDME(snow.cit.cornell.edu): OK
> > Mar 06 13:30:27 nrcc2 pqexpire[17640]: > Recycled   6370.283 kb/hr
>  4578.485 prods per hour)
> > Mar 06 13:30:28 nrcc2 localhost[17653]: Connection from localhost
> > Mar 06 13:30:28 nrcc2 localhost[17653]: Connection reset by peer
> > Mar 06 13:30:28 nrcc2 localhost[17653]: Exiting
> > Mar 06 13:30:29 nrcc2 lightning[17644]: run_requester: 20000306123026.259
> TS_ENDT {{DIFAX,  ".*"}}
> > Mar 06 13:30:29 nrcc2 lightning[17644]: FEEDME(lightning.alden.com): OK
> > Mar 06 13:35:33 nrcc2 pqexpire[17640]: > Recycled  11481.058 kb/hr
>  8811.615 prods per hour)
> >

Hi Collin,

There may be more than one problem here.  First, we (Robb and I) suggest that
you address a possible problem close to you, 'networkologically' speaking.  The
log entries

> Mar 03 13:18:25 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): 13: gethostbyname(lightning.alden.com): lookup Timed out
> Mar 03 13:18:55 nrcc2 lightning[8239]: run_requester: 20000303130447.889 TS_ENDT {{DIFAX,  ".*"}}
> Mar 03 13:19:55 nrcc2 lightning[8239]: FEEDME(lightning.alden.com): 13: gethostbyname(lightning.alden.com): lookup Timed out

indicate that your name server has at least an occasional problem mapping
lightning.alden.com to its IP address.  You should check this with your
system administrator.  Depending on your setup, adding an entry for
lightning.alden.com to /etc/hosts might fix the problem.
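
One way to see whether the lookup fails intermittently is to repeat it a few
times from the LDM host and note how long each attempt takes.  The following
is only a rough sketch in Python, not part of the LDM: the function name
check_resolution and its parameters are made up for illustration, and it
assumes Python 3 is available on the machine.  It calls the system resolver
through socket.gethostbyname, which is the same kind of lookup that is timing
out in the log above.

```python
import socket
import time


def check_resolution(host, attempts=5, pause=1.0, resolver=socket.gethostbyname):
    """Resolve `host` several times.

    Returns a list of (ok, detail, seconds) tuples, where `detail` is the
    IP address on success or the error text on failure.  `resolver` is
    injectable only so the function can be exercised without a network.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            detail = resolver(host)
            ok = True
        except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
            detail = str(exc)
            ok = False
        results.append((ok, detail, time.monotonic() - start))
        if i < attempts - 1:
            time.sleep(pause)  # space the attempts out a little
    return results
```

Calling check_resolution("lightning.alden.com") and printing the result would
show each attempt as a success with an address or a failure with the resolver's
error text.  A mix of successes and failures, or lookups that take many
seconds, would point toward the name server rather than the LDM.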

Please check this out.  After that, if the problem persists, we can investigate
whether there are other network problems.  The log entries do suggest that the
network connection is flaky at times.  For more on network problems, take a
look at:
http://www.unidata.ucar.edu/packages/ldm/troubleshooting/networkTrouble.html
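
To get a feel for how often each kind of failure shows up, the messages in
ldmd.log can be tallied.  This is a minimal sketch, assuming the log lines are
unwrapped (one message per line, as in the file on disk rather than in this
email) and that the requester's lines are tagged "lightning[" as in the excerpt
above.  The function name and the symptom labels are ours; the message strings
are copied from the log.

```python
from collections import Counter

# Label -> substring taken verbatim from the ldmd.log messages quoted above.
SYMPTOMS = {
    "dns lookup timeout": "lookup Timed out",
    "connect timeout": "Timed out while creating connection",
    "reset by peer": "Connection reset by peer",
    "inactivity timeout": "Timed out after",
}


def tally_symptoms(lines, process="lightning["):
    """Count failure messages logged by one requester process."""
    counts = Counter()
    for line in lines:
        if process not in line:
            continue  # ignore pqexpire, pqact, other requesters, etc.
        for label, text in SYMPTOMS.items():
            if text in line:
                counts[label] += 1
                break  # at most one symptom per log line
    return counts
```

Running it over the lines of the log file, for example
tally_symptoms(open("ldmd.log")), with the path adjusted to wherever your LDM
writes its log, returns a Counter keyed by the labels above.  Many DNS lookup
timeouts point at the name server, while resets and connect timeouts point
further out on the network.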

Anne

--
***************************************************
Anne Wilson                     UCAR Unidata Program
address@hidden                  P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************