[CONDUIT #CEL-625898]: 20190423: CONDUIT feed latencies



Hi Carissa,

This is a quick note about two different topics related to the latencies
that Unidata top-level relays have experienced for the IDD CONDUIT
data feed:

- first, it would be most useful (and interesting!) to know what, if
  anything, was done at NCEP to drastically reduce the observed latencies
  in the CONDUIT feed

  FYI:

  - 20-way feed REQUEST splits by Penn State and Unidata did result in
    reduced latencies, but the 20-way split at Unidata did not return
    the latencies to earlier levels

  - I contacted Pete Pokrandt (UW/AOS) to see if he had changed, or
    would change, his 10-way REQUEST split for CONDUIT

    Before Pete could start changing his existing 10-way split to a 
    20-way split, the CONDUIT latencies plummeted and have remained
    as low as they have historically been during good periods.

    We would _love_ to know what was done to effect the drop in 
    latencies at NCEP (or at some point in the network managed by
    NOAA).  If nothing was done (really nothing that is), it would also
    be good to know as this might indicate that there was a problem in
    Internet2.  While it is _extremely_ rare for there to be problems
    in Internet2, we and UW/SSEC did experience a situation where the
    Internet2 gateway connection to AWS East was significantly
    underperforming, and the slowness affected our ability to uplink
    NEXRAD Level 2 data to the AWS S3 bucket that we have been populating
    for over 3 years as part of the NOAA Big Data project.

- secondly, CONDUIT latencies have always exhibited a variation from near
  zero to 30 seconds

  These latencies are caused by the process(es) that are inserting
  products into the LDM queue on the CONDUIT origination machine NOT
  sending a CONT (continue) signal to the LDM to inform it that new
  products are available in the LDM queue.  The default behavior of
  an LDM is to process (relay to downstreams, etc.) all products it
  finds in its queue; sleep for 30 seconds; and then wake up and
  check to see if there are any products to process.  The result of
  this default behavior is latencies that range from zero/near-zero
  to 30 seconds.  To eliminate this "artificial" 0-30 second
  latency, the process that is inserting the product(s) into the
  LDM queue needs to send a CONT signal to the negative of the
  process ID of the lead LDM process.  In practice, this looks
  like:

  /bin/kill -s CONT -`cat ~ldm/ldmd.pid`

  NB:

  - the system version of 'kill' must be used; 'kill' provided by,
    for instance, BASH does not work correctly

  - the file ~ldm/ldmd.pid contains the process ID of the lead
    LDM process for a running LDM

  - the CONT signal must be passed to the negative of the process ID
    for the lead LDM process
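
  As an illustration only, the command above could be wrapped in a
  small helper that an insertion script calls after each product is
  queued.  The function names (ldm_signal_target, notify_ldm) and the
  PID-file argument are hypothetical and not part of the LDM; the
  sketch assumes only what is stated above, plus the fact that 'kill'
  treats a negative PID as a process group ID.

```shell
#!/bin/sh
# Sketch only: function names and argument handling are hypothetical.

# Print the signal target (the negated PID) read from an LDM PID file.
# Returns non-zero if the file is unreadable or does not hold a PID.
ldm_signal_target() {
    pid_file=$1
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # Accept only a plain positive integer.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$pid" -gt 0 ] || return 1
    printf '%s\n' "-$pid"
}

# Send CONT to the negative of the lead LDM PID, using the system
# 'kill' rather than the shell builtin, as noted above.
notify_ldm() {
    target=$(ldm_signal_target "$1") || return 1
    /bin/kill -s CONT "$target"
}
```

  An insertion script would then call something like
  'notify_ldm ~ldm/ldmd.pid' immediately after each insertion, and
  could log a warning if the call returns non-zero.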

So, my question to you is how we can go about enhancing the product
insertion processes in use at NCEP to include sending the CONT
signal to the LDM so that it knows each time a new product is
inserted into its queue?

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************