
Re: 20000117: LDM Data Problems/Aqua Outage?



On Mon, 17 Jan 2000, Unidata Support wrote:

> 
> ------- Forwarded Message
> 
> >To: address@hidden
> >From: "Karli Lopez (McIDAS)" <address@hidden>
> >Subject: LDM Data Problems/Aqua Outage?
> >Organization: .
> >Keywords: 200001160730.AAA12839
> 
> I've got LDM back up and running, but I'm only currently getting NLDN
> and WSI feeds.  For some reason I'm not able to properly log on to
> aqua.atmos.uah.edu (now our primary server), as you can see (Is anybody
> having the same problems with the server?):
> ---------------------------------------------------------------------
> breeze 27% cat ldmd.log
> Jan 16 04:34:36 5Q:breeze rpc.ldmd[9727]: Starting Up (built: Jan  9 2000 02:16:05)
> Jan 16 04:34:36 5Q:breeze pqexpire[9583]: Starting Up
> Jan 16 04:34:36 5Q:breeze pqexpire[9583]: > Recycled   2423.362 kb/hr (   256.381 prods per hour)
> Jan 16 04:34:36 5Q:breeze pqact[9772]: Starting Up
> Jan 16 04:34:36 5Q:breeze pqbinstats[9629]: Starting Up (9727)
> Jan 16 04:34:36 5Q:breeze aqua[9693]: run_requester: Starting Up: aqua.atmos.uah.edu
> Jan 16 04:34:36 5Q:breeze striker[9681]: run_requester: Starting Up: striker.atmos.albany.edu
> Jan 16 04:34:36 5Q:breeze aqua[9693]: run_requester: 20000116033436.535 TS_ENDT {{UNIDATA,  ".*"},{FSL2|MCIDAS,  ".*"}}
> Jan 16 04:34:36 5Q:breeze striker[9681]: run_requester: 20000116033436.579 TS_ENDT {{NLDN,  ".*"}}
> Jan 16 04:34:38 5Q:breeze localhost[9715]: Connection from localhost
> Jan 16 04:34:38 5Q:breeze localhost[9715]: Connection reset by peer
> Jan 16 04:34:38 5Q:breeze localhost[9715]: Exiting
> Jan 16 04:35:11 5Q:breeze sysu1[9779]: Connection from sysu1.uni.wsicorp.com
> Jan 16 04:35:11 5Q:breeze sysu1[9779]: hiya: 20000116042811.562 TS_ENDT {{WSI,  ".*"}}
> Jan 16 04:35:11 3Q:breeze aqua[9693]: FEEDME(aqua.atmos.uah.edu): can't contact portmapper: Timed out
> Jan 16 04:35:36 3Q:breeze striker[9681]: FEEDME(striker.atmos.albany.edu): h_clnt_create(striker.atmos.albany.edu): Timed out while creating connection


Karli,

It seems your network connection to uah/striker is not good enough to get
the packets through. First it was the hostname lookup, but now it appears
to be the network connection itself.  I would contact your sysadmin and your
ISP about this problem for a short-term solution.  For the long term, let's
try to get a connection from sapodilla.rsmas.miami.edu or
pluto.met.fsu.edu.  Do traceroutes to both and then send the results back
to me.
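
A minimal sketch of collecting both traceroutes in one pass, written for
a POSIX sh rather than csh. The helper name trace_hosts and the
TRACEROUTE override variable are made up for illustration; only the two
hostnames come from the paragraph above, and it assumes a traceroute
command is on your PATH.

```shell
# Hosts named in the message above.
HOSTS="sapodilla.rsmas.miami.edu pluto.met.fsu.edu"

# Hypothetical helper: run traceroute against each host given as an
# argument, printing a header line before each trace so the results can
# be told apart. Set TRACEROUTE to use a different command or path.
trace_hosts() {
    for host in "$@"; do
        echo "== traceroute to $host =="
        # 2>&1 keeps error messages (e.g. unknown host) with the output.
        "${TRACEROUTE:-traceroute}" "$host" 2>&1
    done
}
```

Something like the following would then leave everything in one file
that can be mailed back:

    $ trace_hosts $HOSTS > traceroutes.txt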


> 
> ---------------------------------------------------------------------
> So I tried going to what was supposed to be our backup server but it
> seems we have no access:
> 
> ---------------------------------------------------------------------
> 
> Jan 16 04:28:24 5Q:breeze rpc.ldmd[9518]: Starting Up (built: Jan  9 2000 02:16:05)
> Jan 16 04:28:24 5Q:breeze pqbinstats[9560]: Starting Up (9518)
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: Starting Up
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: > Recycled   3658.179 kb/hr (   251.259 prods per hour)
> Jan 16 04:28:24 5Q:breeze pluto[7735]: run_requester: Starting Up: pluto.met.fsu.edu
> Jan 16 04:28:24 5Q:breeze striker[9542]: run_requester: Starting Up: striker.atmos.albany.edu
> Jan 16 04:28:24 3Q:breeze rpc.ldmd[9518]: bind: 388: Address already in


This is caused by an abnormal ldm shutdown: not all of the ldm processes
have been killed off. I would do:

% ps -eaf | grep ldm

Make sure all the ldm processes are gone. Also do a:

% rpcinfo -p

and make sure port 388 is not in use before restarting the ldm.
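
The two checks above could be wrapped up roughly as follows, again as a
POSIX sh sketch. The function names are hypothetical, and the port test
assumes the usual "rpcinfo -p" layout in which the port number is the
fourth column.

```shell
# Hypothetical helper: list any processes whose ps line still mentions
# ldm. The [l]dm pattern keeps grep from matching its own command line.
leftover_ldm_procs() {
    ps -eaf | grep '[l]dm'
}

# Hypothetical helper: read "rpcinfo -p"-style output on stdin and
# succeed (exit status 0) only if some line has 388 in the port column.
port_388_registered() {
    awk '$4 == 388 { found = 1 } END { exit !found }'
}
```

For example:

    $ leftover_ldm_procs
    $ rpcinfo -p | port_388_registered && echo "port 388 still in use"

If either command reports anything, the old ldm has not gone away
completely yet.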

Robb...



> use
> Jan 16 04:28:24 5Q:breeze rpc.ldmd[9518]: Exiting
> Jan 16 04:28:24 5Q:breeze rpc.ldmd[9518]: Terminating process group
> Jan 16 04:28:24 5Q:breeze pqbinstats[9560]: Exiting
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: Exiting
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: > Up since: 20000116042824.791
> Jan 16 04:28:24 5Q:breeze rpc.ldmd[9518]: child 9504 terminated by signal 15
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: > Queue usage (bytes): 3047624
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: >          (nregions):     333
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: > nbytes recycle:        74544 (  3658.179 kb/hr)
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: > nprods deleted:            5 (   251.259 per hour)
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: > First deleted: 20000116032209.576
> Jan 16 04:28:24 5Q:breeze pqexpire[5958]: > Last  deleted: 20000116032321.215
> Jan 16 04:28:24 5Q:breeze rpc.ldmd[9518]: child 9354 terminated by signal 15
> Jan 16 04:28:24 5Q:breeze pluto[7735]: run_requester: 20000116032824.812 TS_ENDT {{UNIDATA,  ".*"},{FSL2|MCIDAS,  ".*"}}
> Jan 16 04:28:24 5Q:breeze striker[9542]: run_requester: 20000116032824.830 TS_ENDT {{NLDN,  ".*"}}
> Jan 16 04:28:26 3Q:breeze pluto[7735]: FEEDME(pluto.met.fsu.edu): 7: Access denied by remote server
> Jan 16 04:28:46 5Q:breeze aqua[29748]: Exiting
> Jan 16 04:28:56 5Q:breeze pluto[7735]: Exiting
> Jan 16 04:29:21 5Q:breeze striker[29118]: Exiting
> Jan 16 04:29:24 3Q:breeze striker[9542]: FEEDME(striker.atmos.albany.edu): h_clnt_create(striker.atmos.albany.edu): Timed out while creating connection
> Jan 16 04:29:54 5Q:breeze striker[9542]: Exiting
> 
> ---------------------------------------------------------------------
> 
> (Also striker.atmos.albany.edu, our NLDN server, is down for the time
> being).
> Any help in getting the feeds back up will be greatly appreciated.
> 
> Karli Lopez
> 
> 
> 
> ------- End of Forwarded Message
> 

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================