
Re: 20010515: Slow Downstream Node Problem



Russ,

Interesting... I can see that large injections would cause a problem.
I wonder about the deletion of the next oldest, though.  Where do you
stop in your search for the next oldest?  I've always liked the pq
concept of maintaining a buffer of data in the order received, and the
idea of stepping back through the product queue looking for an unlocked
resource to delete is counterintuitive.

This also doesn't solve the problem of the slow downstream host causing
the holdup, as it will just end up jumping to the next oldest product
and locking it.  As you point out, downstream nodes connected through a
slow line can be a big problem: if one is receiving a satellite image,
for example, and you want to inject a few text messages, you might end
up deleting quite a lot that other nodes haven't got yet, just so this
slow downstream node can hold onto its resource.

Still, if it's the easiest solution to try, who am I to argue? :-)
Thanks for keeping me in the loop.

Paul.


Russ Rew wrote:
> 
> Paul,
> 
> I wrote:
> 
> > ... I was surprised to discover that instead of jumping to the
> > newest end of the product queue, the downstream client just jumps
> > ahead one minute, so the problem quickly recurs.
> 
> Steve Chiswell reminded me why the send process for a slow downstream
> site doesn't just jump to the newest end of the queue when it falls
> too far behind.
> 
> "Spurty" data streams such as the CONDUIT feed have the characteristic
> that a great many products are injected into the source queue over a
> small interval of time, e.g. all the model output grids from a
> high-resolution model run, so the injection times of all the products
> are similar.  It's the injection time into the IDD that's used to
> determine how old a product is and whether a RECLASS message should be
> sent to jump ahead in the queue.
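> 
> As a rough sketch, that age check amounts to something like the
> following (the names here are hypothetical, not the actual LDM code):
> 
>     #include <time.h>
> 
>     /* Decide whether the sender should issue a RECLASS and jump
>      * ahead, based on how long ago the product was injected into
>      * the IDD (its injection time, not its observation time). */
>     static int
>     should_reclass(time_t injection_time, time_t max_latency)
>     {
>         time_t age = time(NULL) - injection_time;
>         return age > max_latency;
>     }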
> 
> Products in such streams typically can't all be drained out of the
> source queue anywhere near as fast as they were injected (for example,
> a T1 line to a downstream site would only permit 500 Mbytes/hour to be
> sent), so there can be a large inherent latency in such data streams.
> Jumping ahead an hour might lose much more data than necessary,
> because there are typically large time intervals between the spurts of
> data dumped into the queue and hence between their injection times.
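> 
> To make the throughput figure concrete: a T1 runs at 1.544 Mbit/s,
> about 695 Mbytes/hour raw, so 500 Mbytes/hour is a plausible effective
> rate after overhead.  A back-of-the-envelope sketch (the spurt size
> below is a made-up example):
> 
>     #include <stdio.h>
> 
>     int main(void)
>     {
>         double spurt_mbytes = 2000.0;   /* hypothetical model-run spurt */
>         double rate_mbytes_hr = 500.0;  /* effective T1 throughput */
> 
>         /* Every product in the spurt shares nearly the same injection
>          * time, so the last one drained arrives this many hours "late". */
>         printf("drain time: %.1f hours\n", spurt_mbytes / rate_mbytes_hr);
>         return 0;
>     }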
> 
> But independent of this, we still have two alternatives for how to
> deal with the situation where the oldest product is locked and space
> is needed in the queue for a new product:
> 
>  1. Delete the oldest non-locked product to make space, instead of
>     just giving up when the oldest product is found to be locked.
> 
>  2. Signal the sender to release the lock and disconnect, so the
>     product can be deleted, letting the downstream node reconnect
>     asking for later products.
> 
> The first alternative can be done in the same process that detects the
> lock, and that process can keep deleting the oldest unlocked products
> in a loop until enough space is available for the new product.
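> 
> A minimal sketch of that loop, with hypothetical names standing in
> for the actual pq primitives:
> 
>     #include <stddef.h>
> 
>     typedef struct pqueue  pqueue;    /* hypothetical queue handle */
>     typedef struct product product;   /* hypothetical product record */
> 
>     size_t   pq_space_available(pqueue *pq);
>     product *pq_oldest_unlocked(pqueue *pq);  /* NULL if all locked */
>     void     pq_delete(pqueue *pq, product *p);
> 
>     /* In the process that detected the locked oldest product: keep
>      * deleting the oldest unlocked product until the new one fits. */
>     static int
>     make_space(pqueue *pq, size_t needed)
>     {
>         while (pq_space_available(pq) < needed) {
>             product *oldest = pq_oldest_unlocked(pq);
>             if (oldest == NULL)
>                 return -1;   /* every remaining product is locked */
>             pq_delete(pq, oldest);
>         }
>         return 0;            /* room for the new product */
>     }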
> 
> With the second alternative, there is some delay while the process
> that detects the lock signals all the other queue-scanning processes
> and waits for the lock to be released by the process that holds it.
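> 
> For comparison, the sender side of the second alternative might look
> roughly like this (hypothetical sketch; the actual signal choice and
> helper routines would differ):
> 
>     #include <signal.h>
> 
>     static volatile sig_atomic_t release_requested = 0;
> 
>     /* Installed once at sender startup, e.g. with
>      * signal(SIGUSR1, on_release_request). */
>     static void
>     on_release_request(int sig)
>     {
>         (void)sig;
>         release_requested = 1;   /* checked from the send loop */
>     }
> 
>     void release_product_lock(void);   /* hypothetical helper */
>     void disconnect_downstream(void);  /* hypothetical helper */
> 
>     /* Called from the send loop: release the region lock and drop
>      * the connection so the product can be deleted; the downstream
>      * node reconnects asking for later products. */
>     static void
>     check_release(void)
>     {
>         if (release_requested) {
>             release_product_lock();
>             disconnect_downstream();
>             release_requested = 0;
>         }
>     }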
> 
> It may be premature optimization to assume the time difference is
> significant, but the first alternative seems sufficiently simpler that
> I think we should try it first.
> 
> --Russ

-- 
Paul Hamer
phone: 303.497.6342