
Re: 20011008: IDD latencies at PSU (cont.)



"Arthur A. Person" wrote:
> Anne, Jeff,
> 
> Since switching yesterday from motherlode to the nutone relay, our DDPLUS
> and other motherlode data have been real tight, within a few seconds.  So,
> the problem appears to be the local link from the EMS College to the Penn
> State campus network.  I'll await repairs before switching back.
> 
> The only other mystery is why the feeds from SSEC and OU are working fine
> through the old path.  My only guess is that it's related to product size,
> maybe more interrupts/transmissions on the link mean more delays?  The
> average product size on the DDPLUS stuff seems to be around a few hundred
> bytes while the average size on, say, NEXRAD is ~15000 (guesses).
> Otherwise, why would one particular feed (motherlode) be the only one
> affected?
> 
> I'm starting another sort-of test...  nutone isn't feeding NMC2 data, so I've
> switched NMC2 back to motherlode.  It should have larger product sizes, so
> I'll see how it does alone.
> 
>                                        Art.
"Arthur A. Person" wrote:
> 
> Anne, et al.,
> 
> I guess I'm still confused about the problem.  I'm running NMC2 data
> through the old path and it's still bogged down.  Based on notifyme's from
> motherlode, it appears that all the current data for NMC2 is in (i.e. I
> don't get any reports of data arrival).  But ldm.meteo.psu.edu is pulling
> data, apparently from the motherlode queue, that is backlogging to almost
> an hour.  When I first started the feed, it jumped ahead and seemed to catch
> up rapidly by about a half-hour's time, but then bogged down.  The average
> product size for the NMC2 stuff appears to be around 15-20KB.  The NEXRAD
> from SSEC and NEXRD2 from OU are still running within seconds of
> transmission using the same local path.
> 
> Can someone offer any kind of explanation why motherlode behaves this way
> but SSEC and OU don't?  Is there just much more NMC2 data to send than
> NEXRAD or NEXRD2?  Are there many more products in one NMC2 stream than
> either of the NEXRAD or NEXRD2 streams? ...
> 
>                                       Thanks.
> 
>                                         Art.
> 
> Arthur A. Person
> Research Assistant, System Administrator
> Penn State Department of Meteorology
> email:  address@hidden, phone:  814-863-1563

Hi Art,


First, the NMC2 feed is notorious for being late, so it's not the best
candidate for testing latencies, although I understand that things have
improved recently.  As I've been composing this message, over the course
of an hour, it's been running between 7 minutes and half an hour late.
You'll have to let your notifyme run for a while - it can be spurty.
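
In case it's useful, here's the sort of invocation I have in mind - the
feed type and the one-hour offset are just examples, so adjust to taste:

  notifyme -v -h motherlode.ucar.edu -f NMC2 -o 3600

With -v it prints a line for each matching product as motherlode sees it,
so once it settles down you can compare those times against when the same
products actually arrive at your end.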

I too wonder about the timely feeds from OU and SSEC.  Regarding product
size, the motherlode feed contains relatively large products on the HDS
stream.  The level II data from OU is in the range of 200K - 700K bytes,
so those products are not small.  The motherlode stream does, however,
contain a relatively large number of small text products.

Here are some sizes from motherlode's stats for approximately the last
24 hours.  They give average product sizes only - the stats don't record
per-product minimums or maximums.  (Motherlode does not get the level II
data.)

--------------------------------- IDS_DDPLUS ------
num IDS_DDPLUS products = 154393
total IDS_DDPLUS bytes      =     148 Mb
avg IDS_DDPLUS product size =     960 bytes
min IDS_DDPLUS bytes/hour   =   12850 bytes
max IDS_DDPLUS bytes/hour   = 6690201 bytes

--------------------------------- HDS ------
num HDS products = 202766
total HDS bytes      =   2139 Mb
avg HDS product size =  10551 bytes
min HDS bytes/hour   = 152344 bytes
max HDS bytes/hour   =    169 Mb

------------------------ NMC2 (CONDUIT) ------
num NMC2 products = 241910
total NMC2 bytes      =   6234 Mb
avg NMC2 product size =  25772 bytes
min NMC2 bytes/hour   =     17 Mb
max NMC2 bytes/hour   =    497 Mb

--------------------------------- NEXRAD ------
num NEXRAD products = 150509
total NEXRAD bytes      =   1413 Mb
avg NEXRAD product size =   9393 bytes
min NEXRAD bytes/hour   =      1 Mb
max NEXRAD bytes/hour   =     66 Mb

So, from the IDS|DDPLUS, HDS, and NMC2 feeds, motherlode has presumably
transferred about 25,000 products/hour to your site.  This is about 7
products per second.
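
For reference, those figures are just the product counts above spread
over 24 hours:

  echo "scale=1; (154393 + 202766 + 241910) / 24" | bc       # ~24961 products/hour
  echo "scale=1; (154393 + 202766 + 241910) / 86400" | bc    # ~6.9 products/second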

I began to wonder if the number of products is part of the problem.  The
number of products site A can transfer to site B is fundamentally
constrained by the number of RPC calls per second that A can make to B.
Here's a piece of an earlier email from Russ on this subject:


> The main point I want to make is that the number of products per time
> interval may be more important than the volume of the products as a
> cause for delays.  Another way of saying this is that the network
> latency may be more important than the aggregate bandwidth for a
> network connection in determining IDD delays.
> 
> Sending each product always requires at least one remote procedure call
> (RPC, a round trip network transaction) from the upstream to the
> downstream site, so the rate at which even small products can be sent
> from one host to another is limited by the maximum number of RPCs per
> second over the network connection between the hosts.  The time for a
> single RPC call is what ldmping measures, and this is close to the
> time required to send a single small product.  So you can determine a
> maximum for how many products per second a downstream host can accept
> from your LDM by taking the reciprocal of the ldmping time to that
> host (ignoring the first few times ldmping reports, letting it settle
> down to a steady state).  Similarly, the reciprocal of the ldmping
> time from an upstream site is a good approximation for how many
> products per second you can get from that site.
> 
> During some hours the rate for FOS products can be as high as 5.4
> products/second, though the long-term average is about 3.1
> products/second.
> 
> If we had been feeding FOS products to Jim Koermer's LDM during one of
> the times when it was experiencing high latency, ldmping indicates it
> would only have been able to handle about 5 products/second:
> 
>  test$ ldmping -h mammatus.plymouth.edu -i 1
>  Nov 15 15:42:55      State    Elapsed Port   Remote_Host           rpc_stat
>  Nov 15 15:42:56 RESPONDING   1.204837 4677   mammatus.plymouth.edu  
>  Nov 15 15:42:57 RESPONDING   0.241447 4677   mammatus.plymouth.edu  
>  Nov 15 15:42:58 RESPONDING   0.222650 4677   mammatus.plymouth.edu  
>  Nov 15 15:42:59 RESPONDING   0.228247 4677   mammatus.plymouth.edu  
>  Nov 15 15:43:01 RESPONDING   0.212776 4677   mammatus.plymouth.edu  
>  Nov 15 15:43:02 RESPONDING   0.204985 4677   mammatus.plymouth.edu  
>  ...
> 
> This shows that each RPC call takes about 0.2 seconds, so only 1/.2 or
> about 5 products per second can be received.  Later in the same
> afternoon, the ldmping times climbed even higher, to about .35 seconds
> per RPC call, so at this point it could only keep up with about 3
> products per second.
> 
> When the RPC rate is less than the rate at which products are injected
> into the data stream, products back up at the upstream sender
> process, until it ultimately gets a RECLASS (a message from the
> downstream host indicating the offered products are too old) and jumps
> to the start of the queue to send current data, dropping the
> intervening products.
> 

Here's an ldmping from motherlode to your machine.  At about 0.037
seconds per RPC (ignoring the first report), motherlode should be able
to transfer roughly 27 products per second to ldm.meteo.

motherlode.ucar.edu% ldmping -i5 ldm.meteo.psu.edu
Oct 16 16:57:53      State    Elapsed Port   Remote_Host           rpc_stat
Oct 16 16:57:53 RESPONDING   0.082348  388   ldm.meteo.psu.edu  
Oct 16 16:57:58 RESPONDING   0.036529  388   ldm.meteo.psu.edu  
Oct 16 16:58:03 RESPONDING   0.038835  388   ldm.meteo.psu.edu  
Oct 16 16:58:08 RESPONDING   0.036564  388   ldm.meteo.psu.edu  
Oct 16 16:58:13 RESPONDING   0.036689  388   ldm.meteo.psu.edu  
Oct 16 16:58:18 RESPONDING   0.036617  388   ldm.meteo.psu.edu  
Oct 16 16:58:23 RESPONDING   0.037059  388   ldm.meteo.psu.edu  
Oct 16 16:58:28 RESPONDING   0.036583  388   ldm.meteo.psu.edu  
Oct 16 16:58:33 RESPONDING   0.036311  388   ldm.meteo.psu.edu  
Oct 16 16:58:38 RESPONDING   0.036691  388   ldm.meteo.psu.edu  
Oct 16 16:58:43 RESPONDING   0.042131  388   ldm.meteo.psu.edu  
Oct 16 16:58:48 RESPONDING   0.037000  388   ldm.meteo.psu.edu  
Oct 16 16:58:54 RESPONDING   0.039903  388   ldm.meteo.psu.edu  
Oct 16 16:58:59 RESPONDING   0.037044  388   ldm.meteo.psu.edu  
Oct 16 16:59:04 RESPONDING   0.036041  388   ldm.meteo.psu.edu  
Oct 16 16:59:09 RESPONDING   0.038302  388   ldm.meteo.psu.edu  
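
If you want to turn a run like that into a number yourself, a quick awk
over the captured output will do it.  (This assumes the ldmping lines
have been saved to a file, here called ldmping.out; ldmping may log
through the LDM logging routines, so you might need to redirect stderr
to capture them.)

  awk '$4 == "RESPONDING" { sum += $5; n++ }
       END { if (n) printf "avg %.3f s/RPC  =>  ~%.0f products/s max\n", sum/n, n/sum }' ldmping.out

Run over the listing above, minus the first pre-steady-state report, that
works out to about 0.037 seconds per RPC, or roughly 27 products per
second.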

And I looked for RECLASS messages in motherlode's log.  There are lots
from ldm.meteo.  However, all of today's, which start at about 14:30Z,
only about 3 hours ago, are for the NMC2 feed.  I think that's because
that's all ldm.meteo is requesting from motherlode now, right?  There
are about 50 of those.  From the previous days the RECLASSes cover your
entire feed, like this one:

ldmd.log.1:Oct 15 12:38:41 motherlode.ucar.edu ldm(feed)[12715]:
RECLASS: 20011015113841.684 TS_ENDT {{DIFAX|FSL2|WMO, 
".*"},{NMC2|SPARE,  ".*"}}

Didn't you increase the default latency value for your site?

I thought I'd try the same thing to nutone for comparison, but I still
can't connect there:

motherlode.ucar.edu% ldmping -i5 nutone.ems.psu.edu
Oct 16 16:59:22      State    Elapsed Port   Remote_Host           rpc_stat
Oct 16 16:59:35      NAMED  12.223594    0   nutone.ems.psu.edu  can't contact portmapper: RPC: Timed out
Oct 16 16:59:52      NAMED  12.040442    0   nutone.ems.psu.edu  can't contact portmapper: RPC: Timed out
Oct 16 17:00:09      NAMED  12.258855    0   nutone.ems.psu.edu  can't contact portmapper: RPC: Timed out
Oct 16 17:00:26      NAMED  12.419979    0   nutone.ems.psu.edu  can't contact portmapper: RPC: Timed out


Is nutone requesting everything from motherlode that ldm.meteo was?  Is
that all via a single request line, or are you using two request lines
to motherlode?  Sometimes people divide the requests in two, with one
request using an IP address instead of a name.
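
Just so we're talking about the same thing, a split like that would look
something like this in ldmd.conf.  The feed sets are only an example, and
the numeric address is a placeholder, not motherlode's real IP:

  # one connection by name, a second one by IP address
  request IDS|DDPLUS  ".*"  motherlode.ucar.edu
  request HDS|NMC2    ".*"  192.0.2.10

As I recall, requests to the same upstream host name get folded into a
single connection, which is why the name/IP trick buys you a second one.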

So, in theory you should be able to receive all the products that
motherlode is sending, but I don't know how the motherlode traffic
interacts on your network with the products you're receiving from other
sites.

So, this message is lengthy *and* inconclusive!  Would you please show
me ldmpings from ldm.meteo to all your upstream feeds?  And, please send
me ldm.meteo's request lines so I can see exactly what you're asking for
now and in what manner. 

Anne
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************