
Re: problems with motherlode and sunshine



William Noon wrote:
> 
> Is anyone else having a problem keeping a connection with motherlode and
> sunshine?  Overnight we lost about a half hour's worth of data and
> at 13Z or so we started getting disconnects to both sunshine and motherlode.
> 
> Some data has been trickling in but we are lagging by about an hour now...
> 
> Traceroutes and pings to both sites seem fine.
> 
> --Bill Noon
> Northeast Regional Climate Center
> Cornell University

Hi Bill,

I looked at the logs on motherlode.  The logs were loaded with messages
about your host, snow, like the following:

Jul 01 23:00:46 motherlode.ucar.edu snow[17156]: Connection from
snow.cit.cornell.edu
Jul 01 23:00:46 motherlode.ucar.edu snow(feed)[17156]: Starting Up:
20010701215240.249 TS_ENDT {{FSL2,
".*"},{WMO,  ".*"}}
Jul 01 23:00:47 motherlode.ucar.edu snow(feed)[17156]: topo: 
snow.cit.cornell.edu FSL2|WMO
Jul 01 23:00:47 motherlode.ucar.edu snow(feed)[17156]: RECLASS:
20010701220048.328 TS_ENDT {{FSL2,  ".*"},{WMO,  ".*"}}
Jul 01 23:10:25 motherlode.ucar.edu snow(feed)[17156]: h_clnt_call:
snow.cit.cornell.edu: BLKDATA: time
elapsed  30.416736
Jul 01 23:13:40 motherlode.ucar.edu snow(feed)[17156]: HVJI85 ECMF
011200 /mECMWF_199: RPC: Server can't decode arguments (11)
Jul 01 23:13:40 motherlode.ucar.edu snow(feed)[17156]: pq_sequence
failed: I/O error (errno = 5)
Jul 01 23:13:40 motherlode.ucar.edu snow(feed)[17156]: Exiting

There was only one other site for which these messages occur.  That
site is in Costa Rica, and we have been working with them regarding bad
connectivity.  This, coupled with the fact that nobody else reported
having such problems, makes me think the problem was at your site.

Looking at the logs, there were also some connectivity problems on 6/29
around 2:30Z.

Jun 29 02:30:29 motherlode.ucar.edu snow(feed)[16108]: RECLASS:
20010629013050.608 TS_ENDT {{FSL2,  ".*"},{WMO,  ".*"}}
Jun 29 03:17:06 motherlode.ucar.edu snow(feed)[16108]: pq_sequence
failed: I/O error (errno = 5)
Jun 29 03:17:06 motherlode.ucar.edu snow(feed)[16108]: Exiting

[A seven hour disconnect???]
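
A gap like this can be spotted mechanically by pairing each "Exiting"
line with the next "Connection from" line.  The following is only a
minimal sketch, not part of the LDM: the function name disconnect_gaps
is hypothetical, it assumes each log entry has been rejoined onto a
single line (the excerpts here are wrapped by the mailer), and it
assumes the year, which the syslog timestamps do not carry.

```python
import re
from datetime import datetime

# Matches the leading syslog timestamp, e.g. "Jun 29 03:17:06".
STAMP = re.compile(r"^([A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}) ")


def disconnect_gaps(lines, year=2001):
    """Yield (exit_time, reconnect_time, gap) for each Exiting/Connection pair.

    The year is an assumption supplied by the caller, because syslog
    timestamps omit it.
    """
    last_exit = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip blank lines and commentary
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if line.rstrip().endswith("Exiting"):
            last_exit = when
        elif "Connection from" in line and last_exit is not None:
            yield last_exit, when, when - last_exit
            last_exit = None


# Two entries taken from the excerpts in this message, rejoined onto one line each.
sample = [
    "Jun 29 03:17:06 motherlode.ucar.edu snow(feed)[16108]: Exiting",
    "Jun 29 11:27:11 motherlode.ucar.edu snow[17531]: Connection from snow.cit.cornell.edu",
]

for left, back, gap in disconnect_gaps(sample):
    print(f"down from {left} to {back} ({gap})")
```

Run over a whole log, the short gaps are ordinary reconnects and the
long ones stand out.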

Jun 29 11:27:11 motherlode.ucar.edu snow[17531]: Connection from
snow.cit.cornell.edu
Jun 29 11:27:11 motherlode.ucar.edu snow(feed)[17531]: Starting Up:
20010629102104.040 TS_ENDT {{FSL2,
".*"},{WMO,  ".*"}}
Jun 29 11:27:12 motherlode.ucar.edu snow(feed)[17531]: topo: 
snow.cit.cornell.edu FSL2|WMO
Jun 29 11:27:12 motherlode.ucar.edu snow(feed)[17531]: FZPN26 KWBC
291020 /pOFFPZ6: RPC: Unable to receive (4)
Jun 29 11:27:12 motherlode.ucar.edu snow(feed)[17531]: pq_sequence
failed: I/O error (errno = 5)
Jun 29 11:27:12 motherlode.ucar.edu snow(feed)[17531]: Exiting

Jun 29 11:33:09 motherlode.ucar.edu snow[18019]: Connection from
snow.cit.cornell.edu
Jun 29 11:33:09 motherlode.ucar.edu snow(feed)[18019]: Starting Up:
20010629102746.556 TS_ENDT {{FSL2,
".*"},{WMO,  ".*"}}
Jun 29 11:33:09 motherlode.ucar.edu snow(feed)[18019]: topo: 
snow.cit.cornell.edu FSL2|WMO
Jun 29 11:33:10 motherlode.ucar.edu snow(feed)[18019]: RECLASS:
20010629103331.848 TS_ENDT {{FSL2,  ".*"},{WMO,  ".*"}}

Then, around 14Z things started failing:

Jun 29 14:02:23 motherlode.ucar.edu snow(feed)[18019]: h_clnt_call:
snow.cit.cornell.edu: BLKDATA: time
elapsed  33.885993
Jun 29 14:06:01 motherlode.ucar.edu snow(feed)[18019]: pq_sequence
failed: I/O error (errno = 5)
Jun 29 14:06:01 motherlode.ucar.edu snow(feed)[18019]: Exiting

Jun 29 14:06:31 motherlode.ucar.edu snow[2814]: Connection from
snow.cit.cornell.edu
Jun 29 14:06:31 motherlode.ucar.edu snow(feed)[2814]: Starting Up:
20010629140403.619 TS_ENDT {{FSL2,  ".*"},{WMO,  ".*"}}
Jun 29 14:06:32 motherlode.ucar.edu snow(feed)[2814]: topo: 
snow.cit.cornell.edu FSL2|WMO
Jun 29 14:07:32 motherlode.ucar.edu snow(feed)[2814]: pq_sequence
failed: I/O error (errno = 5)
Jun 29 14:07:32 motherlode.ucar.edu snow(feed)[2814]: Exiting

Jun 29 14:08:10 motherlode.ucar.edu snow[3002]: Connection from
snow.cit.cornell.edu
Jun 29 14:08:10 motherlode.ucar.edu snow(feed)[3002]: Starting Up:
20010629140403.619 TS_ENDT {{FSL2,  ".*"},{WMO,  ".*"}}
Jun 29 14:08:11 motherlode.ucar.edu snow(feed)[3002]: topo: 
snow.cit.cornell.edu FSL2|WMO
Jun 29 14:08:44 motherlode.ucar.edu snow(feed)[3002]: h_clnt_call:
snow.cit.cornell.edu: BLKDATA: time elapsed  33.328610
Jun 29 14:12:02 motherlode.ucar.edu snow(feed)[3002]: h_clnt_call:
snow.cit.cornell.edu: BLKDATA: time elapsed  30.584037
Jun 29 14:14:32 motherlode.ucar.edu snow(feed)[3002]: pq_sequence
failed: I/O error (errno = 5)
Jun 29 14:14:32 motherlode.ucar.edu snow(feed)[3002]: Exiting

Jun 29 14:18:36 motherlode.ucar.edu snow[4038]: Connection from
snow.cit.cornell.edu
Jun 29 14:18:36 motherlode.ucar.edu snow(feed)[4038]: Starting Up:
20010629140741.491 TS_ENDT {{FSL2,  ".*"},{WMO,  ".*"}}
Jun 29 14:18:37 motherlode.ucar.edu snow(feed)[4038]: topo: 
snow.cit.cornell.edu FSL2|WMO
Jun 29 14:19:16 motherlode.ucar.edu snow(feed)[4038]: ZEGZ98 KRHA 291303
/mNWS_151: RPC: Server can't decode arguments (11)

After this the "can't decode arguments" problem continues up through
about 17:50Z today.  After that there are just a RECLASS and a "time
elapsed" message, but otherwise it looks as though the problem has gone
away.
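
To see when one kind of failure starts and stops, the RPC errors can be
tallied by message text.  This is a rough sketch under the same
assumptions as any ad hoc log scan: entries have been rejoined onto one
line each, the error text follows "RPC: " and ends with a numeric code
in parentheses as in the excerpts above, and tally_rpc_errors is a
hypothetical helper name.

```python
import re
from collections import Counter

# Captures the RPC error text and its numeric code at the end of a line,
# e.g. "RPC: Unable to receive (4)".
RPC_ERR = re.compile(r"RPC: (.+?) \((\d+)\)\s*$")


def tally_rpc_errors(lines):
    """Count log lines by RPC error message."""
    counts = Counter()
    for line in lines:
        m = RPC_ERR.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Entries taken from the excerpts in this message, rejoined onto one line each.
sample_errors = [
    "Jul 01 23:13:40 motherlode.ucar.edu snow(feed)[17156]: HVJI85 ECMF 011200 /mECMWF_199: RPC: Server can't decode arguments (11)",
    "Jun 29 11:27:12 motherlode.ucar.edu snow(feed)[17531]: FZPN26 KWBC 291020 /pOFFPZ6: RPC: Unable to receive (4)",
    "Jun 29 14:19:16 motherlode.ucar.edu snow(feed)[4038]: ZEGZ98 KRHA 291303 /mNWS_151: RPC: Server can't decode arguments (11)",
    "Jun 29 14:14:32 motherlode.ucar.edu snow(feed)[3002]: Exiting",
]

for message, count in tally_rpc_errors(sample_errors).most_common():
    print(count, message)
```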

Did something happen on your campus?

Anne

>From address@hidden Tue Jul  3 05:56:13 2001
>cc: Anne Wilson <address@hidden>, <address@hidden>,
>   <address@hidden>, <address@hidden>
>Subject: Re: problems with motherlode and sunshine 

On Mon, 2 Jul 2001, William Noon wrote:

> Anne -- I think the networking folks found the problem.  It was a bad
> cross connect wire in one of the closets.  I hope.  We should be back
> to normal now.

I sort of doubt it.  At Brockport we started losing data on the HDS feed
from Cornell about 335Z this AM, and lost connectivity to Cornell entirely
at 745Z.

Tom
------------------------------------------------------------------------------
Tom McDermott                           Email: address@hidden
Systems Administrator                   Phone: (716) 395-5718
Earth Sciences Dept.                    Fax: (716) 395-2416
SUNY College at Brockport

>From address@hidden Tue Jul  3 09:53:19 2001
>To: Tom McDermott <address@hidden>
>cc: William Noon <address@hidden>,
>   Anne Wilson <address@hidden>, address@hidden,
>   address@hidden, address@hidden,
>   address@hidden
>Subject: Re: problems with motherlode and sunshine 

Tom -- I was too optimistic.  We fell off the net again and only now
did Network Resources get us back on line.  I will not say that this
is a final fix.  Just that we are on for now.

--Bill Noon
Northeast Regional Climate Center
Cornell University
