
Re: LDM Status Report Question



> Date: Wed, 10 Jan 2001 09:17:49 -0500
> From: "Allan Darling" <address@hidden>
> Organization: DOC/NOAA/NWS - National Weather Service
> To: address@hidden, James Fenix <address@hidden>,
>    William Brockman <address@hidden>
> Subject: LDM Status Report Question

Hi Allan,

> I've experienced a problem using LDM and was wondering if you might have
> any insight or comments about how to resolve, or identify the problem.
> I'm using LDM (v 5.0.8) to distribute files to three receivers within
> the NWS WAN.  We recently experienced a problem where one of the three
> systems stopped receiving files while the other two systems continued to
> receive files.  The affected system could ping, but not ldmping, our
> system.  We could ping and ldmping their system.  This condition
> persisted even after they restarted their LDM. After we stopped and
> started LDM at our end, the problem went away.  I ran an ldmadmin check,
> the results of which are below.
 ...
> 'NULLPROC error' message occurred 28 time(s).
>         Last one at:  Jan 10 06:02:49
>         For 205.165.7.125 it happened 4 time(s).
>         For maul.wrh.noaa.gov it happened 24 time(s).

First, I don't think the "ldmadmin check" output is very helpful in
this case.  ldmping sends a NULLPROC remote procedure call, and the
message above merely indicates that something went wrong while
returning a result acknowledging that NULLPROC call.

This sounds like a DNS (domain name service) problem, but there could
be other causes.  It would help to see the actual ldmping output from
maul.wrh.noaa.gov to see how far it got up the protocol stack.  That
is, was the "State" it reported "NAMED" (in which case the DNS lookup
failed) or was it "SVC_UNAVAIL" (in which case it was contacting port
388 but the LDM was running on a different port, possibly due to
starting it up as some user other than "ldm" or not having run "make
install_setuids" as root).  Do you remember what the ldmping "State"
was when it failed?
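Since ping works but ldmping doesn't, it can help to check the layers one at a time from the downstream host.  Here is a small Python sketch (the helper name and result labels are made up for illustration) that first resolves the name and then tries a plain TCP connection to port 388.  It only roughly parallels the "NAMED" vs. "SVC_UNAVAIL" distinction; it is not ldmping's own logic and speaks no RPC, so "port open" shows only that something is listening, not that the LDM will answer.

```python
import socket
import sys

LDM_PORT = 388  # the port the LDM is expected to listen on


def probe_ldm_port(host: str, port: int = LDM_PORT, timeout: float = 5.0) -> str:
    """Report where a plain TCP probe of host:port stops (hypothetical helper)."""
    try:
        # Step 1: name lookup.  A failure here is the DNS case.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "name lookup failed"
    try:
        # Step 2: TCP connect.  Refused or timed out means nothing reachable
        # is listening on that port (or a firewall is in the way).
        with socket.create_connection((host, port), timeout=timeout):
            return "port open"
    except OSError:
        return "port not reachable"


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"{name}:{LDM_PORT} -> {probe_ldm_port(name)}")
```

Run against the upstream host's name, "name lookup failed" points at DNS, while "port not reachable" with a working ping suggests the LDM isn't on port 388 or something is filtering that port.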

> I'd like to know if you have seen this problem before and if so do you
> have a resolution.  I'm hoping an upgrade to v 5.1.2 will address this.

I can't recall seeing this specific symptom, but DNS problems are
fairly common (and there's nothing the LDM can do about them).
Another possible problem would be the upstream host tgsv not having an
"ALLOW" entry in its ldmd.conf to allow the downstream node
maul.wrh.noaa.gov to ldmping it.  Or it might have such an ALLOW entry
while DNS does not resolve that name to the same IP address the
ldmping request came from.  If someone recently changed the IP address
of either host and the old address was still cached in the DNS server,
that would also cause this symptom.
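To test the DNS explanation, one could compare the forward lookup of a host name with the reverse lookup of the IP address a request actually arrives from.  The Python sketch below does that; the names `check_dns` and `DnsCheck` are hypothetical, and it assumes the upstream LDM matches ALLOW entries against the name it obtains by looking up the connecting IP address, which is worth confirming in the documentation for your LDM version.  The resolver functions are parameters so the logic can be exercised without a network.

```python
import socket
from typing import Callable, NamedTuple


class DnsCheck(NamedTuple):
    forward_ok: bool  # hostname resolves to a list that contains peer_ip
    reverse_ok: bool  # peer_ip maps back to hostname (or one of its aliases)


def _norm(name: str) -> str:
    # DNS names are case-insensitive; ignore a trailing root dot.
    return name.lower().rstrip(".")


def check_dns(hostname: str, peer_ip: str,
              forward: Callable = socket.gethostbyname_ex,
              reverse: Callable = socket.gethostbyaddr) -> DnsCheck:
    """Compare forward and reverse lookups for one host/address pair."""
    try:
        _, _, addrs = forward(hostname)  # (name, aliases, addresses)
        forward_ok = peer_ip in addrs
    except OSError:  # gaierror and herror are OSError subclasses
        forward_ok = False
    try:
        primary, aliases, _ = reverse(peer_ip)
        names = {_norm(n) for n in [primary, *aliases]}
        reverse_ok = _norm(hostname) in names
    except OSError:
        reverse_ok = False
    return DnsCheck(forward_ok, reverse_ok)
```

Called with the downstream name and the address the upstream host sees the request coming from, a False in either field points at a mismatch like the one described above.  A local check only sees what your own resolver returns, though, not what another DNS server may still have cached.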

I'm CC:ing Anne Wilson also, in case she has a better idea about what
might cause this problem.  In the future, you might want to send
questions like this to address@hidden instead of me
specifically, in case I'm away from my email.

> I'm also very interested in how we might monitor, from our end, the
> successful transfer of files to the remote systems. Any assistance you
> can provide would be very much appreciated.

You can monitor the successful transfer of files with the "notifyme"
command running on the upstream host asking the downstream host to
send notifications of each product.  Or you can set up a cron job to
periodically run notifyme for a little while and send you mail if it
doesn't produce any output.  The typical invocation is something
like:

  notifyme -v -l- -h <downstream_host>

where sometimes you also add a "-o xxx" argument to look back xxx
seconds in the downstream host's queue, in case it has fallen behind
the data feed.
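The cron-job approach could be sketched in Python along these lines.  Only the notifyme flags come from the invocation above; the helper names, the 120-second watch window, and the 300-second look-back are assumptions for illustration.  The script runs notifyme against each downstream host named on its command line, stops it after the window, and prints a warning for any host that produced no output.

```python
import subprocess
import sys
from typing import List, Optional, Sequence

WATCH_SECONDS = 120  # how long to let notifyme run (assumed value)
LOOKBACK = 300       # passed as "-o": seconds to look back in the queue (assumed)


def notifyme_command(host: str, offset: Optional[int] = None) -> List[str]:
    """Build the notifyme invocation shown above for one downstream host."""
    cmd = ["notifyme", "-v", "-l-", "-h", host]
    if offset is not None:
        cmd += ["-o", str(offset)]
    return cmd


def count_output_lines(cmd: Sequence[str], seconds: float) -> int:
    """Run cmd for at most `seconds`; return how many non-blank lines it printed."""
    # stderr is merged into stdout because we are not certain which stream
    # notifyme writes its log lines to when given "-l-".
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True, errors="replace")
    try:
        out, _ = proc.communicate(timeout=seconds)
    except subprocess.TimeoutExpired:
        # notifyme keeps running until stopped.  Send SIGTERM first so a
        # program that exits cleanly on it can flush buffered output.
        proc.terminate()
        try:
            out, _ = proc.communicate(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
            out, _ = proc.communicate()
    return sum(1 for line in out.splitlines() if line.strip())


if __name__ == "__main__":
    quiet = [host for host in sys.argv[1:]
             if count_output_lines(notifyme_command(host, LOOKBACK), WATCH_SECONDS) == 0]
    for host in quiet:
        # cron normally mails any job output to the crontab owner,
        # so printing is enough to raise an alert.
        print(f"WARNING: no output from notifyme -h {host} in {WATCH_SECONDS} s")
    if quiet:
        sys.exit(1)
```

A crontab entry such as the following (the script path and host names are hypothetical) would run it every 30 minutes:

  */30 * * * * /usr/bin/python3 /path/to/check_feed.py host1 host2 host3

Two caveats: if notifyme prints startup or error messages even when no products arrive, those lines are counted too, so compare against its output on a stalled feed and filter or raise the threshold accordingly; and a look-back much longer than the cron interval can report old products from a feed that has since stopped.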

Please let us know whether this helps resolve the problem, and if you
see it again, what the ldmping output looks like.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu