
Re: Ldmping does not work but data flows fine



Chirag,

>Date: Wed, 23 Nov 2005 15:03:48 -0600
>From: "Shukla, Chirag" <address@hidden>
>Organization: San Diego State University
>To: "Steve Emmerson" <address@hidden>
>Subject: Ldmping does not work but data flows fine

The above message contained the following:

> We have a machine called 'unidata.jacks.local' that feeds the
> 'ae206-06.jacks.local' and 'ae206-03.jacks.local' machines. For a few
> minutes, the ae206-06 machine did not receive updated data from the
> 'unidata' machine. I tried to `ldmping ae206-06` and saw that the LDM on
> ae206-06 was not responding!
> 
> `ldmping ae206-06.jacks.local` from 'unidata' resulted in the following:
> unidata /data> ldmping ae206-06.jacks.local
> Nov 23 20:29:02 INFO:      State    Elapsed Port   Remote_Host rpc_stat
> Nov 23 20:29:12 ERROR: SVC_UNAVAIL  10.002665    0   ae206-06.jacks.local  h_clnt_create(ae206-06.jacks.local): Timed out while creating connection
> Nov 23 20:29:37 ERROR:  ADDRESSED   0.000002    0   ae206-06.jacks.local  h_clnt_create(ae206-06.jacks.local): Timed out while creating connection
> Nov 23 20:30:12 ERROR:      NAMED   9.998855    0   ae206-06.jacks.local  can't contact portmapper: RPC: Timed out

The above indicates that ldmping(1) on host unidata couldn't connect to
the LDM on host ae206-06; even the portmapper on that host timed out.
The reason is unclear.
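If you want to check output like the above mechanically (say, from a
cron job), here is a minimal Python sketch that splits one ldmping line
into the columns named in its header: State, Elapsed, Port, Remote_Host,
and rpc_stat. The column layout is inferred from the output quoted in
this message, not from any LDM specification, and the names `PingResult`
and `parse_ldmping_line` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout inferred from the ldmping output quoted in this message:
#   <Mon> <DD> <HH:MM:SS> <LEVEL>: <State> <Elapsed> <Port> <Remote_Host> [rpc_stat]
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) +(?P<level>[A-Z]+): +"
    r"(?P<state>[A-Z_]+) +(?P<elapsed>\d+\.\d+) +(?P<port>\d+) +"
    r"(?P<host>\S+)(?: +(?P<rpc_stat>.*))?$"
)


@dataclass
class PingResult:
    timestamp: str
    level: str
    state: str      # e.g. RESPONDING, SVC_UNAVAIL, ADDRESSED, NAMED
    elapsed: float  # seconds spent on this attempt
    port: int       # 0 in the failed attempts above
    host: str
    rpc_stat: str   # RPC error text; empty when the LDM responded

    @property
    def ok(self) -> bool:
        # In the output above, success is an INFO line with state RESPONDING.
        return self.level == "INFO" and self.state == "RESPONDING"


def parse_ldmping_line(line: str) -> Optional[PingResult]:
    """Parse one ldmping output line; return None for the header or other text."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return PingResult(
        timestamp=m.group("ts"),
        level=m.group("level"),
        state=m.group("state"),
        elapsed=float(m.group("elapsed")),
        port=int(m.group("port")),
        host=m.group("host"),
        rpc_stat=m.group("rpc_stat") or "",
    )
```

The header line ("State Elapsed Port ...") deliberately fails to match
because its State column isn't an upper-case state name.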

Executing this command

    rpcinfo -n 388 -t ae206-06.jacks.local 300029 6

(or something similar) on host unidata should reveal the problem.
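For the curious, the following minimal Python sketch approximates what
that rpcinfo command checks. It connects over TCP directly to port 388
(bypassing the portmapper, as `-n 388` does) and calls procedure 0 of RPC
program 300029, version 6, encoding the messages per the ONC RPC format
of RFC 5531. The function names are hypothetical; treat it as an
illustration of the protocol exchange, not a replacement for rpcinfo(1).

```python
import os
import socket
import struct

LDM_PROG = 300029  # RPC program number used in the rpcinfo command above
LDM_VERS = 6       # LDM protocol version 6
LDM_PORT = 388     # port given to rpcinfo via -n

# accept_stat values from RFC 5531
ACCEPT_STATS = {0: "SUCCESS", 1: "PROG_UNAVAIL", 2: "PROG_MISMATCH",
                3: "PROC_UNAVAIL", 4: "GARBAGE_ARGS", 5: "SYSTEM_ERR"}


def build_null_call(xid: int, prog: int = LDM_PROG, vers: int = LDM_VERS) -> bytes:
    """Encode a call to procedure 0 with AUTH_NONE, prefixed by a TCP record mark."""
    body = struct.pack(
        ">10I",
        xid, 0,         # transaction id, msg_type = CALL
        2,              # RPC protocol version
        prog, vers, 0,  # program, version, procedure 0 (the null procedure)
        0, 0,           # credential: AUTH_NONE flavor, zero-length body
        0, 0,           # verifier:   AUTH_NONE flavor, zero-length body
    )
    # Record marking: high bit = last fragment, low 31 bits = fragment length.
    return struct.pack(">I", 0x80000000 | len(body)) + body


def parse_reply(body: bytes, xid: int) -> str:
    """Decode a reply record (record mark already stripped) into a status name."""
    rxid, mtype, reply_stat = struct.unpack_from(">3I", body, 0)
    if rxid != xid or mtype != 1:  # 1 = REPLY
        raise ValueError("not a reply to our call")
    if reply_stat != 0:            # 0 = MSG_ACCEPTED
        return "MSG_DENIED"
    _flavor, vlen = struct.unpack_from(">2I", body, 12)
    offset = 20 + (vlen + 3) // 4 * 4  # skip verifier body, padded to 4 bytes
    (accept_stat,) = struct.unpack_from(">I", body, offset)
    return ACCEPT_STATS.get(accept_stat, f"UNKNOWN({accept_stat})")


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-record")
        buf += chunk
    return buf


def rpc_null_ping(host: str, port: int = LDM_PORT, timeout: float = 10.0) -> str:
    """Send one null call; socket errors and timeouts propagate to the caller."""
    xid = struct.unpack(">I", os.urandom(4))[0]
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_null_call(xid))
        body, last = b"", False
        while not last:  # reassemble possibly fragmented reply
            (mark,) = struct.unpack(">I", _recv_exact(sock, 4))
            last = bool(mark & 0x80000000)
            body += _recv_exact(sock, mark & 0x7FFFFFFF)
    return parse_reply(body, xid)
```

On host unidata, `rpc_null_ping("ae206-06.jacks.local")` should return
"SUCCESS" if an LDM is listening and answering on ae206-06; a timeout or
refused connection shows up as a socket exception instead, which would
point at the network path (e.g. a firewall) rather than the LDM itself.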

> unidata raws/data> ldmping ae206-03.jacks.local
> Nov 23 20:42:32 INFO:      State    Elapsed Port   Remote_Host rpc_stat
> Nov 23 20:42:32 INFO: RESPONDING   0.015185  388   ae206-03.jacks.local
> 
> 
> From ae206-03:
> [ldm@ae206-03 /]$ ldmping unidata.jacks.local
> Nov 23 20:32:57 INFO:      State    Elapsed Port   Remote_Host rpc_stat
> Nov 23 20:32:57 INFO: RESPONDING   0.010462  388   unidata.jacks.local
> 
> [ldm@ae206-03 raws]$ ldmping ae206-06.jacks.local
> Nov 23 20:43:10 INFO:      State    Elapsed Port   Remote_Host rpc_stat
> Nov 23 20:43:10 INFO: RESPONDING   0.002772  388   ae206-06.jacks.local
> 
> 
> From ae206-06:
> [ldm@ae206-06 raws]$ ldmping unidata.jacks.local
> Nov 23 20:44:15 INFO:      State    Elapsed Port   Remote_Host rpc_stat
> Nov 23 20:44:15 INFO: RESPONDING   0.012841  388   unidata.jacks.local
> 
> [ldm@ae206-06 raws]$ ldmping ae206-03.jacks.local
> Nov 23 20:44:07 INFO:      State    Elapsed Port   Remote_Host rpc_stat
> Nov 23 20:44:07 INFO: RESPONDING   0.004253  388   ae206-03.jacks.local
> 
> 
> 
> 
> I can `ping` and `host` or `nslookup` one another just fine:
> unidata /home/ldm> ping ae206-06
> PING ae206-06.jacks.local (137.216.177.37) 56(84) bytes of data.
> 64 bytes from ae206-06.jacks.local (137.216.177.37): icmp_seq=1 ttl=63 time=3.96 ms
> 64 bytes from ae206-06.jacks.local (137.216.177.37): icmp_seq=2 ttl=63 time=4.33 ms
> 
> [ldm@ae206-06 raws]$ ping unidata.jacks.local
> PING unidata.jacks.local (137.216.132.176) 56(84) bytes of data.
> 64 bytes from unidata.jacks.local (137.216.132.176): icmp_seq=0 ttl=63 time=4.14 ms
> 64 bytes from unidata.jacks.local (137.216.132.176): icmp_seq=1 ttl=63 time=4.85 ms
> 
> These are the logs:
> unidata /data> cat ~/logs/ldmd.log | grep ae206-06
> Nov 23 20:15:43 unidata ae206-06[3026] NOTE: Data-product with signature df17b19bdbab14359eb205a7c5ec4f8e wasn't found in product-queue
> Nov 23 20:15:43 unidata ae206-06(feed)[3026] NOTE: Starting Up(6.4.2/6): 20051123201034.078 TS_ENDT {{ANY,  ".*"}}, Primary
> Nov 23 20:15:43 unidata ae206-06(feed)[3026] NOTE: topo: ae206-06.jacks.local {{ANY, (.*)}}
> Nov 23 20:15:44 unidata ae206-06[3027] NOTE: Data-product with signature 1e0c309abba55a19832b53bdce52901e wasn't found in product-queue
> Nov 23 20:15:44 unidata ae206-06(feed)[3027] NOTE: Starting Up(6.4.2/6): 20051123193310.368 TS_ENDT {{CONDUIT,  "MT.(eta|nam)"}}, Primary
> Nov 23 20:15:44 unidata ae206-06(feed)[3027] NOTE: topo: ae206-06.jacks.local {{CONDUIT, (.*)}}

Because a downstream LDM on host ae206-06 requested data-products of
type ANY/".*", it's unnecessary for another downstream LDM on that host
to also request data-products of type CONDUIT/"MT.(eta|nam)": the ANY
feedtype already includes CONDUIT.  Doing so will merely increase your
bandwidth utilization without any benefit.
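The redundancy follows from the fact that the feedtype in a request acts
as a set of feedtypes, and ANY is the set of all of them. The minimal
Python sketch below models that with bit masks. The numeric value given
to CONDUIT is a made-up placeholder, not the LDM's actual constant, and
`Request` and `is_redundant` are hypothetical names. It only recognizes
the case where the broader request's pattern is ".*", because deciding
whether one arbitrary regular expression covers another is much harder.

```python
from dataclasses import dataclass

# Placeholder bit value (hypothetical); the LDM defines its own constants.
CONDUIT = 1 << 12
ANY = 0xFFFFFFFF  # every feedtype bit set


@dataclass(frozen=True)
class Request:
    feedtype: int  # bit mask: one bit per feedtype
    pattern: str   # regular expression on product identifiers


def is_redundant(new: Request, existing: Request) -> bool:
    """True if every product `new` could match is already requested by `existing`.

    Only the match-everything pattern ".*" is treated as covering another
    pattern; anything else is conservatively reported as not redundant.
    """
    feedtypes_covered = (new.feedtype & ~existing.feedtype) == 0
    return feedtypes_covered and existing.pattern == ".*"
```

With the two requests from the log above, the CONDUIT request is covered
by the ANY/".*" request, but not the other way around.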

> But seems to be something going on here on ae206-06
> [ldm@ae206-06 raws]$ cat ~/logs/ldmd.log | grep unidata.jacks.local
> Nov 23 20:05:50 ae206-06 unidata[12500] NOTE: Starting Up(6.4.2): unidata.jacks.local:388 20051123190550.479 TS_ENDT {{ANY,  ".*"}}
> Nov 23 20:05:50 ae206-06 unidata[12501] NOTE: Starting Up(6.4.2): unidata.jacks.local:388 20051123190550.482 TS_ENDT {{CONDUIT,  "MT.(eta|nam)"}}
> Nov 23 20:05:50 ae206-06 unidata[12500] NOTE: Upstream LDM-6 on unidata.jacks.local is willing to be a primary feeder
> Nov 23 20:05:50 ae206-06 unidata[12501] NOTE: Upstream LDM-6 on unidata.jacks.local is willing to be a primary feeder
> Nov 23 20:15:36 ae206-06 unidata[12500] ERROR: Terminating due to LDM failure; nullproc_6 failure to unidata.jacks.local; RPC: Unable to receive; errno = Connection reset by peer
> Nov 23 20:15:37 ae206-06 unidata[12500] ERROR: Terminating due to LDM failure; Couldn't connect to LDM on unidata.jacks.local using either port 388 or portmapper; : RPC: Program not registered
> Nov 23 20:15:37 ae206-06 unidata[12501] ERROR: Terminating due to LDM failure; Couldn't connect to LDM on unidata.jacks.local using either port 388 or portmapper; : RPC: Program not registered
> Nov 23 20:15:39 ae206-06 unidata[12500] ERROR: Terminating due to LDM failure; Couldn't connect to LDM on unidata.jacks.local using either port 388 or portmapper; : RPC: Program not registered
> Nov 23 20:15:39 ae206-06 unidata[12501] ERROR: Terminating due to LDM failure; Couldn't connect to LDM on unidata.jacks.local using either port 388 or portmapper; : RPC: Program not registered
> Nov 23 20:15:42 ae206-06 unidata[12500] NOTE: Upstream LDM-6 on unidata.jacks.local is willing to be a primary feeder
> Nov 23 20:15:43 ae206-06 unidata[12501] NOTE: Upstream LDM-6 on unidata.jacks.local is willing to be a primary feeder
> 
> Why am I not able to `ldmping ae206-06.jacks.local`? There has been no
> change made to any firewalls, hardware or software...except that FC4 was
> updated. Despite ldmping not working, ae206-06 now gets data just fine.

Execute that rpcinfo(1) command on host ae206-06.  What does it output?

> Unidata uses: 
> unidata /home/ldm> uname -a
> Linux unidata 2.4.21-99-smp4G #1 SMP Wed Sep 24 14:13:20 UTC 2003 i686 i686 i386 GNU/Linux
> 
> Ae206-06 uses:
> [gempak@ae206-06 raws]$ uname -a
> Linux ae206-06.jacks.local 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 i386 GNU/Linux
> 
> 
> From unidata:
> Running traceroute to ae206-06 or ae206-03 doesn't return anything.
> This could be a firewall issue at our end.
> 
> Is there a red flag somewhere?
> 
> Thanks.
> 
> Sincerely,
> Chirag Shukla
> South Dakota State University

Regards,
Steve Emmerson