
Re: 20010515: Slow Downstream Node Problem



>To: address@hidden
>From: Paul Hamer <address@hidden>
>Subject: Slow Downstream Node Problem
>Organization: NOAA/FSL
>Keywords: 200105152257.f4FMvRp04890

Paul,

I wrote:

> ... I was surprised to discover that instead of jumping to the
> newest end of the product queue, the downstream client just jumps
> ahead one minute, so the problem quickly recurs. 

Steve Chiswell reminded me why the send process for a slow downstream
site doesn't just jump to the newest end of the queue when it falls
too far behind.  

"Spurty" data streams such as the CONDUIT feed have the characteristic
that a great many products are injected into the source queue over a
small interval of time, e.g. all the model output grids from a
high-resolution model run, so the injection times of all the products
are similar.  It's the injection time into the IDD that's used to
determine how old a product is and whether a RECLASS message should be
sent to jump ahead in the queue.
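To illustrate the age check described above, here is a minimal sketch; the function name, threshold, and return convention are assumptions for illustration, not the actual LDM implementation:

```python
from time import time

# Hypothetical threshold: how old (in seconds) a product's IDD
# injection time may be before we decide to jump ahead in the queue.
MAX_LATENCY = 3600

def should_reclass(injection_time, now=None):
    """Return True if the product's injection time into the IDD makes
    it old enough that a RECLASS message should be sent."""
    now = time() if now is None else now
    return (now - injection_time) > MAX_LATENCY

# A whole model run injected within a minute shares nearly the same
# injection time, so all its products cross the threshold together.
print(should_reclass(injection_time=0, now=5000))     # True (old)
print(should_reclass(injection_time=4500, now=5000))  # False (recent)
```

The point is that the decision keys on injection time, not on how far behind the downstream reader is in the queue.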

Products in such streams typically can't all be drained out of the
source queue anywhere near as fast as they were injected (for example,
a T1 line to a downstream site would permit only about 500 Mbytes/hour
sent), so there can be a large inherent latency in such data streams.
Jumping ahead an hour might lose much more data than necessary,
because there are typically large time intervals between the spurts of
data dumped into the queue and hence between their injection times.
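A back-of-the-envelope calculation makes the inherent latency concrete; the spurt size here is a made-up assumption, and the throughput figure is the rough T1 rate mentioned above:

```python
# Hypothetical spurt: one high-resolution model run's worth of grids,
# injected into the source queue over a few minutes.
spurt_mbytes = 1000

# Rough T1 throughput to the downstream site, from the text above.
t1_rate_mbytes_per_hour = 500

# Time to drain the spurt -- the last grid is inherently this "late"
# even though nothing is wrong with the downstream node.
drain_hours = spurt_mbytes / t1_rate_mbytes_per_hour
print(drain_hours)  # 2.0
```

With a two-hour inherent latency, jumping ahead an hour would discard products the downstream site could still have received.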

But independent of this, we still have two alternatives for how to
deal with the situation where the oldest product is locked and space
is needed in the queue for a new product:

 1. Delete the oldest non-locked product to make space, instead of
    just giving up when the oldest product is found to be locked.

 2. Signal the sender to release the lock and disconnect, so the
    product can be deleted, letting the downstream node reconnect
    asking for later products.

The first alternative can be handled entirely within the process that
detects the lock: it can keep deleting the oldest non-locked products
in a loop until enough space is available for the new product.
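That deletion loop can be sketched as follows; the queue representation and names are illustrative assumptions, not LDM code:

```python
from collections import deque

def make_space(queue, free_space, needed):
    """Minimal sketch of alternative 1: delete the oldest non-locked
    products until `needed` bytes are free, instead of giving up when
    the oldest product is found to be locked.

    `queue` holds (size, locked) pairs ordered oldest-first.
    """
    survivors = deque()
    while free_space < needed and queue:
        size, locked = queue.popleft()
        if locked:
            survivors.append((size, locked))  # skip it, don't give up
        else:
            free_space += size                # reclaim its space
    queue.extendleft(reversed(survivors))     # put locked ones back
    return free_space

q = deque([(100, True), (200, False), (300, False)])
print(make_space(q, 50, 400))  # 550: both unlocked products deleted
```

The locked oldest product simply stays in place while younger unlocked products are reclaimed around it.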

With the second alternative, there is some delay while the process
that detects the lock signals the other queue-scanning processes and
the process holding the lock releases it.

It may be premature optimization to assume the time difference is
significant, but the first alternative seems sufficiently simpler that
I think we should try it first.

--Russ