
Re: RSA Comm Test (Weather - JSC) LDM Troubleshooting



Jackie,

>Date: Fri, 27 Jan 2006 17:07:01 -0500
>From: "Petit, Jackie" <address@hidden>
>Organization: UCAR/Unidata
>To: "Steve Emmerson (E-mail)" <address@hidden>
>Subject: RE: RSA Comm Test (Weather - JSC) LDM Troubleshooting

The above message contained the following:

> Brice, Tim and I got together and did some troubleshooting and I was
> able to see some files that they sent.  For some reason I could only see
> type SA files (SA04 and SA11).

I'm afraid that I don't know what "SA04" and "SA11" files are.

> When he tried to send GP type files, nothing came through.

I'm afraid I don't know what "GP" files are, either.  Sorry.

In general, for a downstream LDM to be able to receive certain
data-products from an upstream LDM, the following must be true:

    1.  The upstream LDM must receive the data-products.  This can be
        verified by executing, on the upstream LDM's host, the command

            pqcat -vl- -f <<feedtype>> -p <<pattern>> -o <<offset>>

        where
            <<feedtype>>        is the feedtype of the data-products (e.g.,
                                EXP)

            <<pattern>>         is the extended regular expression for
                                the product-identifier of the data-products.

            <<offset>>          is the time-offset, in seconds, to go
                                back in the product-queue to find
                                matching data-products (e.g., 300 for 5
                                minutes).

    2.  The downstream LDM must be able to connect to the upstream LDM.
        This can be verified by executing, on the downstream host, the
        command

            ldmping -i 0 <<upstream host>>

        where <<upstream host>> is the identifier for the upstream host
        (either hostname, fully-qualified hostname, or IP address).

    3.  The downstream LDM must be allowed to receive the requested
        class of data-products from the upstream LDM (i.e., the LDM
        configuration-file on the upstream LDM must have appropriate
        entries).

These three items can be combined into executing, on the downstream
host, the single command

    notifyme -h <<upstream host>> -f <<feedtype>> -p <<pattern>> -o <<offset>>
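
As a minimal sketch, the three commands above could be assembled by a small
Python helper so that the same feedtype, pattern, and offset are used in each
check.  The function names and the host "upstream.example.com" are
hypothetical; only the command names and options shown above are used.

```python
import shlex


def pqcat_cmd(feedtype, pattern, offset):
    """Argument list for the check run on the upstream LDM's host."""
    return ["pqcat", "-vl-", "-f", feedtype, "-p", pattern, "-o", str(offset)]


def ldmping_cmd(upstream_host):
    """Argument list for the connectivity check run on the downstream host."""
    return ["ldmping", "-i", "0", upstream_host]


def notifyme_cmd(upstream_host, feedtype, pattern, offset):
    """Argument list for the combined check run on the downstream host."""
    return ["notifyme", "-h", upstream_host,
            "-f", feedtype, "-p", pattern, "-o", str(offset)]


def as_shell(argv):
    """Render an argument list as a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Hypothetical host; EXP and 300 are the examples given above.
    print(as_shell(notifyme_cmd("upstream.example.com", "EXP", ".*", 300)))
```

The argument lists can be passed directly to subprocess.run() on a host where
the LDM utilities are installed; as_shell() is only for displaying or logging
the command.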

If a downstream LDM process is unable to connect to the upstream LDM
server, then the following command can be useful in diagnosing problems:

    rpcinfo -n 388 -t <<upstream host>> 300029 6

This command attempts to contact version 6 of program 300029 (the LDM)
via a TCP connection to port 388 on host <<upstream host>>.  Because
the options of this command are not standardized across systems, it
might be necessary to adapt it to your system by using different options.
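
If rpcinfo isn't available, a plain TCP connection attempt to port 388 is a
rough substitute.  The following Python sketch is an illustration only (the
function name is hypothetical): it shows whether anything accepts connections
on the port, but, unlike rpcinfo, it does not verify that the listener is
version 6 of RPC program 300029.

```python
import socket

LDM_PORT = 388  # the LDM port mentioned above


def tcp_port_open(host, port=LDM_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port can be established.

    This checks reachability only; it does not speak RPC, so it cannot
    confirm that the listener is actually an LDM server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    import sys
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    state = "open" if tcp_port_open(host) else "closed or unreachable"
    print(f"{host}:{LDM_PORT} is {state}")
```

A False result from the downstream host together with a True result on the
upstream host itself would point towards something in between (e.g., a
firewall) rather than the LDM.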

> They were never able to see files from us but did
> get notified of our files on the notifier.  (Brice, please elaborate.) 

Notifier?

> They had to get ready for a power outage so Brice asked if I would send
> you an E-mail to find out if having two ethernet ports could cause a
> problem with ldm.

By default, the LDM server will listen for incoming connections on all
available interfaces.  This is usually not a problem.  We're running
the LDM on several multi-homed computers here.

This default can be overridden via the $ip_addr variable in the file
"etc/ldmadmin-pl.conf".

> They use the workstation as a bridge/firewall between
> their LAN and ours.  He thinks it may be getting confused since he sees
> mention of an ldm5 and ldm6.  We only see ldm6 referenced in our log
> (see attached) and only have one ethernet port.  

Looking at just one downstream LDM process on host "rsaintrf", I see the
following at the beginning of the log file:

    Jan 27 21:10:40 ftpsvr rsaintrf[20183] NOTE: Starting Up(6.4.2): rsaintrf.midds.jsc.nasa.gov:388 20060127201040.869 TS_ENDT {{ANY,  ".*"}}
    Jan 27 21:10:40 ftpsvr rsaintrf[20183] NOTE: LDM-6 desired product-class: 20060127210938.347 TS_ENDT {{ANY,  ".*"},{NONE,  "SIG=9b7a056982e167351f69140376671e58"}}
    Jan 27 21:10:41 ftpsvr rsaintrf[20183] NOTE: Upstream LDM-6 on rsaintrf.midds.jsc.nasa.gov is willing to be a primary feeder
    Jan 27 21:21:01 ftpsvr rsaintrf[20183] ERROR: Terminating due to LDM failure; Connection to upstream LDM closed
    Jan 27 21:21:01 ftpsvr rsaintrf[20183] NOTE: LDM-6 desired product-class: 20060127211936.115 TS_ENDT {{ANY,  ".*"},{NONE,  "SIG=4330b0a1b311f387d2038d03dd7faa67"}}
    Jan 27 21:21:01 ftpsvr rsaintrf[20183] ERROR: Terminating due to LDM failure; Couldn't connect to LDM on rsaintrf.midds.jsc.nasa.gov using either port 388 or portmapper; : RPC: Program not registered
    ...

The above indicates that, after an initial, successful connection to the
upstream LDM on host "ftpsvr" (from 21:10:41 to 21:21:01) the downstream
LDM on "rsaintrf" lost the connection and was unable to reconnect
because the upstream LDM wasn't available: it was unable to create
a TCP connection to port 388 on the upstream host (because nothing
was listening on that port) and the LDM wasn't registered with the
portmapper on any other port on the upstream host.

The log file also contains the following:

    Jan 27 21:11:03 ftpsvr rsaintrf[20348] NOTE: Data-product with signature 60f97e00a793afb4f67c7dd94fe46e41 wasn't found in product-queue
    Jan 27 21:11:03 ftpsvr rsaintrf(feed)[20348] NOTE: Starting Up(6.4.2/6): 20060127205131.054 TS_ENDT {{ANY,  ".*"}}, Primary
    Jan 27 21:11:03 ftpsvr rsaintrf(feed)[20348] NOTE: topo:  rsaintrf.midds.jsc.nasa.gov {{ANY, (.*)}}
    Jan 27 21:27:11 ftpsvr rsaintrf(feed)[20348] ERROR: feed or notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe
    Jan 27 21:27:11 ftpsvr rpc.ldmd[20180] NOTE: child 20348 exited with status 7

The above indicates that an upstream LDM process was started on host 
"rsaintrf" feeding data-products of feedtype/pattern ANY/.* to a
downstream LDM on host "ftpsvr" using primary exchange mode.  This
process lasted from 21:11:03 to 21:27:11, at which time the upstream LDM
was unable to send a data-product to the downstream LDM because the
connection was broken for some reason (the reason might be found in the
LDM log file on host "ftpsvr").  At this time, the upstream LDM on host
"rsaintrf" exited.
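
When reading a long LDM log, it can help to pull out only the ERROR entries
together with their process IDs.  The following Python sketch assumes the
syslog-style layout seen in the excerpts above (timestamp, logging host,
process name, PID in brackets, level, message) and that each entry is on a
single line; the regular expression and function names are hypothetical and
may need adjusting for other log formats.

```python
import re

# Layout assumed from the excerpts above:
#   "Jan 27 21:27:11 ftpsvr rsaintrf(feed)[20348] ERROR: message..."
LOG_LINE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<ident>\S+?)\[(?P<pid>\d+)\]\s+"
    r"(?P<level>\w+):\s*(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["pid"] = int(entry["pid"])
    return entry


def errors(lines):
    """Yield the parsed entries whose level is ERROR."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["level"] == "ERROR":
            yield entry


if __name__ == "__main__":
    import sys
    for e in errors(sys.stdin):
        print(e["time"], e["ident"], e["pid"], e["message"])
```

Grouping the ERROR entries by PID makes it easier to match each failure with
the "Starting Up" entry of the same process.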

I hope this helps.  Feel free to contact me with any questions.  Also,
if I can log onto the systems in question as the LDM user, then I should
be able to more easily diagnose any problems.

Incidentally, Tom Yoksas will also be at the AMS meeting in Atlanta,
where he will be presenting several papers on the LDM and Internet data
distribution.  He has considerable experience diagnosing connectivity
problems in LDM networks.  You might tell Brian Hoeth to look him up.
He'll have a laptop and the two of them might be able to solve all your
problems while at the convention.

Regards,
Steve Emmerson
LDM Developer