
[LDM #HEQ-649192]: LDM fault tolerance



Geoffry,

> It looks like LDM is entirely a push system, with no way to re-request
> notifications of products from an upstream node that were missed while a
> node was down, is that correct?

If an upstream node is down for less than the minimum residency time of its 
product-queue (typically one hour), then a downstream node requesting from it 
won't miss anything.
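To illustrate why a short outage loses nothing, here is a minimal Python sketch of a product-queue that keeps each product for a fixed residency window. A reconnecting downstream node asks for everything that arrived after its last received product. The class `ResidencyQueue`, the `since()` method, and the fixed one-hour window are hypothetical illustrations, not LDM's API. The real product-queue has a fixed size, so its actual residency time depends on the incoming data volume.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Product:
    arrival: float  # arrival time in seconds (hypothetical clock)
    ident: str      # hypothetical product identifier


class ResidencyQueue:
    """Toy product-queue that retains products for `residency` seconds (hypothetical)."""

    def __init__(self, residency: float = 3600.0):
        self.residency = residency
        self.products = deque()

    def insert(self, product: Product) -> None:
        self.products.append(product)
        self._expire(product.arrival)

    def _expire(self, now: float) -> None:
        # Delete products that have been in the queue longer than the residency window.
        while self.products and now - self.products[0].arrival > self.residency:
            self.products.popleft()

    def since(self, t: float) -> list:
        # What a reconnecting downstream node receives when it asks for
        # every product that arrived after time t.
        return [p for p in self.products if p.arrival > t]
```

In this model, a downstream node that reconnects within the residency window gets every later product from `since(t)`. After a longer outage, the oldest products have already been deleted, so the node misses them.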

> Then the only way to eliminate a single
> point of failure is to have multiple nodes receive these push notifications
> of new products (then deduping requests before potentially multiple retries
> to download the actual data)?

The source of the data is always a single point of failure.

When possible, we recommend that a downstream site make identical requests to 
two distinct upstream sites. One of those upstream sites will then transfer 
products in primary mode and the other will transfer products in alternate 
mode. The primary mode transfer will be as fast as the network allows. The 
alternate mode transfer will use very little bandwidth because the products 
will likely have already arrived on the primary mode connection. If and when 
the primary mode connection slows down or breaks, the downstream site will 
switch the alternate mode connection to primary mode.
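This Python sketch is a toy model of the downstream side of that arrangement. Each product is accepted from whichever connection delivers it first. If the primary connection stays silent for longer than a timeout, the two connections swap roles. The `Downstream` class, the `timeout` parameter, and this switching rule are assumptions made for illustration. They are not LDM's actual switching algorithm, and the sketch does not model how little bandwidth the alternate connection uses.

```python
class Downstream:
    """Toy downstream site fed by two upstream connections (hypothetical model)."""

    def __init__(self, primary: str, alternate: str, timeout: float = 60.0):
        self.primary = primary
        self.alternate = alternate
        self.timeout = timeout    # hypothetical silence threshold, in seconds
        self.last_primary = 0.0   # time data was last seen on the primary connection
        self.have = set()         # identifiers of products already received

    def receive(self, conn: str, product_id: str, now: float) -> bool:
        """Return True if the product is new, False if it is a duplicate."""
        if conn == self.primary:
            self.last_primary = now
        if product_id in self.have:
            return False          # it already arrived on the other connection
        self.have.add(product_id)
        return True

    def check(self, now: float) -> None:
        # If the primary connection has been silent too long, promote the alternate.
        if now - self.last_primary > self.timeout:
            self.primary, self.alternate = self.alternate, self.primary
            self.last_primary = now
```

In this toy, a broken primary connection only delays products. The alternate connection keeps delivering them, and after the timeout it becomes the primary.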

The product-queue automatically discards duplicate data-products, so having 
two upstream feeds is safe as well as efficient.
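Here is a minimal Python sketch of duplicate rejection. It assumes, as LDM does, that a product's identity is the MD5 signature of its data, so the second copy to arrive from the other upstream feed is simply not inserted. The `DedupQueue` class and its `insert` method are hypothetical and only illustrate the idea.

```python
import hashlib


class DedupQueue:
    """Toy product-queue that refuses products whose signature is already present."""

    def __init__(self):
        self.by_signature = {}

    def insert(self, data: bytes) -> bool:
        sig = hashlib.md5(data).digest()  # 16-byte signature of the product data
        if sig in self.by_signature:
            return False                  # duplicate: this copy is discarded
        self.by_signature[sig] = data
        return True
```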

> This model assumes that you can request a download multiple times after you
> receive a notification that a product is available, is that true as well?

Not quite. See the description above.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: HEQ-649192
Department: Support LDM
Priority: Normal
Status: Closed