
[LDM #EZZ-721081]: Weird issue observed on running LDM system



Hi Rob,

re:
> We have two upstream nodes (ingester and ingester_alt)  feeding a
> downstream node (chef_a also called genproc2). The problem is that
> occasionally a file can enter the queue of the upstream nodes, but does
> not make it to the downstream node. Here is an example:
> 
> This file was inserted into "ingester" :
> 
> Nov 13 21:25:35 pqutil INFO:    10016 20141113212523.352     EXP 000  
> /mnt/data/outgoing/weathertap/wrf/noram_rr/20141113_2100/postwrf_d01_20141113_2100_f00215.gr2
> 
> 
> The file should have flowed to "genproc2", however, it did not. But if
> we look in the log of "ingester", we see these entries:
> 
> 
> Nov 13 21:25:35 ingester chef_a(feed)[17556] NOTE: feed or notify failure; 
> COMINGSOON: RPC: Unable to receive; errno = Connection reset by peer
> Nov 13 21:25:35 ingester rpc.ldmd[9257] NOTE: child 17556 exited with status 6
> Nov 13 21:25:36 ingester chef_a(feed)[18862] NOTE: Starting Up(6.8.1/6): 
> 20141113212535.192 TS_ENDT {{EXP,  "weathertap/wrf/noram_rr/.*"}}, 
> SIG=25aa506ae04df822715fefd4d2c98e24, Primary
> Nov 13 21:25:36 ingester chef_a(feed)[18862] NOTE: topo:  chef_a {{EXP, (.*)}}
> 
> 
> And if we look in the log of genproc2 we see this:
> 
> 
> Nov 13 21:25:35 genproc2 ingester_alt[28119] NOTE: Switching data-product 
> transfer-mode to primary
> Nov 13 21:25:35 genproc2 ingester_alt[28119] NOTE: LDM-6 desired 
> product-class: 20141113212535.192 TS_ENDT {{EXP,  
> "weathertap/wrf/noram_rr/.*"},{NONE,  "SIG=25aa506ae04df822715fefd4d2c98e24"}}
> Nov 13 21:25:35 genproc2 ingester_alt[28119] NOTE: Upstream LDM-6 on 
> ingester_alt is willing to be a primary feeder
> Nov 13 21:25:35 genproc2 ingester[28118] NOTE: Switching data-product 
> transfer-mode to alternate
> Nov 13 21:25:35 genproc2 ingester[28118] NOTE: LDM-6 desired product-class: 
> 20141113212535.230 TS_ENDT {{EXP,  "weathertap/wrf/noram_rr/.*"},{NONE,  
> "SIG=25aa506ae04df822715fefd4d2c98e24"}}
> Nov 13 21:25:35 genproc2 ingester[28118] NOTE: Upstream LDM-6 on ingester is 
> willing to be an alternate feeder
> 

re:
> I'm not quite sure what this means. It looks like the downstream node
> switched from pulling data from ingester to ingester_alt.

Correct.

re:
> Why would this occur?

If a downstream LDM is REQUESTing the same set of products from a feed
from two or more upstream LDMs, then the downstream will keep track
of which upstream is providing data products the fastest.  In your case,
the downstream determined that products were coming faster from ingester_alt
than from ingester, so it shut down its REQUESTs and restarted them
so that the one to ingester_alt was the PRIMARY and the one to ingester
was the ALTERNATE.  That is the switch your genproc2 log shows.

re:
> The upstream node appeared to be fine. Is the upstream process
> crashing and THAT caused the downstream node to switch to alternate?

No, the auto switching is a design feature of newer LDMs.

re:
> Thanks for any insight you can provide.
> 
> REQUEST lines in ldmd.conf on genproc2
> 
> request EXP    "weathertap/wrf/noram_rr/.*"    ingester
> request EXP    "weathertap/wrf/noram_rr/.*"    ingester_alt

One important comment:

If the set of products on ingester is different from the set
of products on ingester_alt, then using the exact same extended
regular expression in redundant REQUESTs is _very_ dangerous, because
products that should be sent could well be skipped.  The safest thing
to do if the sets of products on the two upstream LDMs are not identical
is to force each REQUEST to act in PRIMARY mode.  Here is an
example:

request EXP    "weathertap/wrf/noram_rr/.*"    ingester
request EXP    "(weathertap/wrf/noram_rr/.*)"    ingester_alt

The set of products REQUESTed is the same, but the extended regular
expressions are different, and this will defeat the auto switching
that is built into the LDM.

One other comment:

The trailing '.*' in each extended regular expression is not needed.
The following is a better/more compact way to do the REQUESTs:

request EXP    "weathertap/wrf/noram_rr/"    ingester
request EXP    "(weathertap/wrf/noram_rr/)"    ingester_alt
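
If you want to convince yourself that all four forms of the pattern
select the same products, here is a rough sketch in Python.  It is not
part of the LDM: it uses Python's "re" module as a stand-in for the
extended regular expressions the LDM uses, and it assumes the pattern
may match anywhere in the product identifier (which is why the trailing
'.*' adds nothing).  The helper name matched_ids() and all but the
first product identifier are made up for illustration.

```python
import re

# The four pattern variants discussed above.
PATTERNS = [
    "weathertap/wrf/noram_rr/.*",
    "(weathertap/wrf/noram_rr/.*)",
    "weathertap/wrf/noram_rr/",
    "(weathertap/wrf/noram_rr/)",
]

# The first identifier is the one from the pqutil log entry in the
# original report; the other three are hypothetical.
PRODUCT_IDS = [
    "/mnt/data/outgoing/weathertap/wrf/noram_rr/20141113_2100/"
    "postwrf_d01_20141113_2100_f00215.gr2",
    "weathertap/wrf/noram_rr/20141113_2100/example.gr2",
    "weathertap/wrf/other_domain/20141113_2100/example.gr2",
    "weathertap/wrf/noram_rr",
]


def matched_ids(pattern, product_ids):
    """Return the identifiers in which the pattern matches anywhere."""
    regex = re.compile(pattern)
    return [pid for pid in product_ids if regex.search(pid)]


results = {pattern: matched_ids(pattern, PRODUCT_IDS) for pattern in PATTERNS}

# True if every pattern selects exactly the same identifiers.
all_same = len({tuple(ids) for ids in results.values()}) == 1

# True if no two pattern strings are textually identical.
all_distinct = len(set(PATTERNS)) == len(PATTERNS)

print("same selection:", all_same)
print("distinct pattern strings:", all_distinct)
```

The patterns select the same identifiers even though no two of the
pattern strings are identical, and it is the difference in the strings
that keeps the two REQUESTs from being treated as redundant.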

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: EZZ-721081
Department: Support LDM
Priority: Normal
Status: Closed