
20030701: ULM feed problems from LSU (cont.)



>From: address@hidden
>Organization: ULM
>Keywords: 200306161954.h5GJs2Ld016710 LDM-6 IDD

Hi Adam,

>Just catching up on what you guys might have found.
>From what I see, tornado is feeding well with little to no latency.  Just let
>me know if you guys have found anything.

The list of tests that have been run, and the interactions with
contacts at LSU are longer than can be accurately recounted in a simple
email, but I can say that a number of things have been investigated and
several problems found and fixed.  You can review all of the email
transactions between Unidata and LSU by searching our inquiry
tracking system:

http://www.unidata.ucar.edu/glimpsedocs/ghiddsupport.html

Use 200306161954.h5GJs2Ld016710 as the search key and set the number
of matches up to at least 40 (the default is 20).  This will return
a list of all email sent between UPC support and the LSU LDM
contact in reverse chronological order.

A very quick overview of what you will find follows:

ULM:

- needed an upgrade to LDM-6 along with "tuning" of the requests in
  ~ldm/etc/ldmd.conf.  Tuning consisted of splitting feed requests
  and documenting the changes.

- LDM-6's ability to transfer data faster to receiving clients
  caused the ULM machine, tornado, to exhibit overheating-related
  system shutdowns.  Investigations revealed that one of the case fans
  in the dual 900 MHz PII Dell was not operating and was probably
  causing the overheating problem.

- after the overheating problem was overcome by opening tornado's
  case, real-time statistics reported to Unidata by tornado for the HDS
  feed from seistan revealed a problem with the Internet-2 connection
  from ULM.  Rerouting the connection to I1 (presumably the commodity
  Internet) improved the feeding of IDD data, but a problem still
  existed.  The HDS stream was the most affected, showing latencies of
  up to 6000 seconds, while the IDS|DDPLUS, UNIWISC, and FSL2 feeds
  showed acceptable, but still elevated, latencies.  Switching the HDS
  feed from seistan first to emo.unidata.ucar.edu and then to
  rainbow.al.noaa.gov showed that ULM could reliably get the HDS feed
  with very low latencies.  This conclusively demonstrated that the
  feed problems being experienced by ULM were caused by problems
  outside of the ULM domain.  This, of course, was the conclusion that
  ULM had reached before contacting Unidata.
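The latency figures quoted above (up to 6000 seconds for HDS) are per-product delays.  As a minimal sketch, assuming latency is taken as the time a product arrives at the receiving LDM minus the creation time stamped on it upstream, a set of observations could be summarized as follows.  The function names and the 300-second spike threshold are hypothetical, chosen only for illustration:

```python
from datetime import datetime
from statistics import median


def product_latency(created: datetime, received: datetime) -> float:
    """Seconds between a product's upstream creation time and its local arrival.

    Both timestamps should use the same convention (for example, both UTC).
    """
    return (received - created).total_seconds()


def summarize_latencies(pairs, spike_threshold_s: float = 300.0) -> dict:
    """Summarize (created, received) pairs as median/max latency and spike count.

    spike_threshold_s is a hypothetical cutoff used only for illustration.
    """
    latencies = [product_latency(c, r) for c, r in pairs]
    if not latencies:
        raise ValueError("no products to summarize")
    return {
        "median_s": median(latencies),
        "max_s": max(latencies),
        # Count products whose delay exceeds the chosen threshold.
        "spikes": sum(1 for lat in latencies if lat > spike_threshold_s),
    }
```

Reporting the median next to the maximum separates a feed that is slow overall from one that is usually fast but has occasional spikes.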

LSU:

- after verifying that the ULM feed problem was not related to
  anything at ULM, investigation moved to the srcc.lsu.edu, lsu.edu,
  and LANET domains.

- extensive tests feeding the HDS stream from a machine in the
  unidata.ucar.edu domain to the LSU IDD relay node,
  seistan.srcc.lsu.edu, and then back to a different machine in the
  unidata.ucar.edu domain demonstrated that data could be fed with
  little to no latency from Unidata to LSU, but that latencies for the
  same stream from LSU back to Unidata showed the same pattern seen
  at ULM.  This finding was confirmed by feed tests from LSU to
  the University of South Florida in Tampa.

- the system configuration of the LSU IDD relay node was examined
  in detail by Unidata support and systems staff, and, while the
  firewall rules in place were not efficiently organized, no
  showstoppers were detected.

- conversations with LSU/SRCC support staff indicated that there 
  was no "packet shaping" being done by the SRCC.  Inquiry as
  to whether LSU telecommunications was running "packet shaping"
  got a negative reply.

- LSU telecommunications was contacted to see if there were any
  known problems with networking equipment (e.g., routers) on
  campus.  Since there were none, LSU telecommunications
  personnel contacted LANET personnel to see if they knew of any
  problems in their operations.  LANET reported that there had
  been an open trouble ticket for "quite some time" in which
  CRC and retransmission errors were being seen.

- last Friday, June 27, we noticed a significant change in the
  latencies being seen in the HDS feed from LSU to Unidata.
  For extended periods of time, the latencies dropped to near zero,
  but spikes in the latencies were still being seen.  High HDS
  latencies returned on Sunday afternoon/evening.

- contacts with SRCC and LSU personnel revealed that no changes had
  been made to either the SRCC or LSU domains over the weekend.  In
  the early morning hours on Monday, the latencies for feeds emanating
  from LSU dropped to near-zero values and have remained there
  ever since.

- a conference call between LSU/SRCC, LSU telecommunications, and
  Unidata representatives on Tuesday afternoon resulted in general
  agreement that further troubleshooting was needed so that we will
  know exactly what was wrong and what was done to fix the problem.  We
  are currently stress testing the IDD node in the srcc.lsu.edu
  domain by having it feed a variety of IDD streams to multiple
  downstream machines.  The list of the machines and the streams they
  are ingesting from seistan.srcc.lsu.edu currently stands at:

Machine                        Feeds from seistan
------------------------------+---------------------------------------------
emo.unidata.ucar.edu           HDS
chevy.unidata.ucar.edu         HDS, IDS|DDPLUS, UNIWISC
newshemp.unidata.ucar.edu      HDS
zasu.unidata.ucar.edu          HDS, IDS|DDPLUS, UNIWISC
zero.unidata.ucar.edu          HDS, IDS|DDPLUS, UNIWISC
imogene.unidata.ucar.edu       HDS, IDS|DDPLUS, UNIWISC, FSL2
tornado.geos.ulm.edu           HDS, FSL2, IDS|DDPLUS, NNEXRAD, UNIWISC
hail.jsums.edu                 NNEXRAD, FNEXRAD, UNIWISC, IDS|DDPLUS
aqua.nsstc.uah.edu             UNIDATA (which is HDS, UNIWISC, IDS|DDPLUS)

- During the conference call, we found out that LSU _does_ do packet
  shaping, but typically for traffic emanating from the student side
  of their network (to limit things like MP3 downloads and uploads).
  Given this, LSU telecommunications set up a separate, unimpeded
  channel for LDM traffic out of srcc.lsu.edu.  The newly created pipe
  has a limit set at 20 Mbps (!), so it is _highly_ unlikely that any
  IDD traffic that srcc.lsu.edu could produce would put even
  a noticeable dent in the channel's capacity.
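In the table, aqua.nsstc.uah.edu makes a single UNIDATA request that stands in for HDS, UNIWISC, and IDS|DDPLUS.  The sketch below models that kind of feed-type union with Python's enum.Flag.  The bit values are arbitrary placeholders, not the LDM's actual feedtype constants, and parse_feeds() is a hypothetical helper rather than part of the LDM:

```python
from enum import Flag


class Feed(Flag):
    # Placeholder bit values for illustration only; these are NOT the
    # feedtype constants used by the LDM itself.
    HDS = 1
    IDS = 2
    DDPLUS = 4
    UNIWISC = 8
    FSL2 = 16
    NNEXRAD = 32
    FNEXRAD = 64
    # Composite described in the message: UNIDATA = HDS, UNIWISC, IDS|DDPLUS
    UNIDATA = HDS | UNIWISC | IDS | DDPLUS


def parse_feeds(expr: str) -> Feed:
    """Turn a feed expression such as "IDS|DDPLUS" into a Feed value.

    Raises KeyError for names that are not defined in Feed.
    """
    result = Feed(0)
    for name in expr.split("|"):
        result |= Feed[name.strip()]
    return result
```

With this model, a request for UNIDATA covers HDS but not FSL2, so a host that also wants FSL2 has to list it separately.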

At the moment, seistan is feeding 27 downstream LDM connections and
receiving 8 feeds from upstream LDMs, for a total of 35 LDM
connections.  During the peak volume times for the HDS datastream,
seistan should be relaying over 2.4 GB of data per hour.  Additional
downstream feed hosts will be added to the stress testbed if needed and
if seistan can carry the load.
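To put the 2.4 GB/hour figure alongside the 20 Mbps limit on the new LDM channel, it helps to convert the volume into an average bit rate.  This is a sketch that assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s) and treats the figure as an hourly average.  The function names are hypothetical:

```python
def gb_per_hour_to_mbps(gb_per_hour: float, bytes_per_gb: float = 1e9) -> float:
    """Convert a data volume rate in GB/hour to an average bit rate in Mbps.

    Assumes decimal units by default; pass bytes_per_gb=2**30 for binary GB.
    """
    bits_per_hour = gb_per_hour * bytes_per_gb * 8
    return bits_per_hour / 3600 / 1e6


def link_utilization(avg_mbps: float, link_mbps: float) -> float:
    """Fraction of a link's capacity used by an average bit rate."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return avg_mbps / link_mbps


if __name__ == "__main__":
    rate = gb_per_hour_to_mbps(2.4)        # relay volume quoted for seistan
    share = link_utilization(rate, 20.0)   # 20 Mbps LDM channel limit
    print(f"{rate:.2f} Mbps average, {share:.0%} of the channel")
```

Under these assumptions, 2.4 GB/hour averages out to roughly 5.3 Mbps.  Short-term peaks within the hour would run above that average.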

The LSU campus network, which has an OC-3 connection to LANET, is
showing little sign of the increased data volumes from seistan.  It
is entirely possible that the cause of, and fix for, the feed problems
that were being experienced by sites downstream of srcc.lsu.edu will
never be found; there are simply too many networking organizations
along the network path to be positive of where the failure lay.  Our
testing of srcc.lsu.edu is designed to stress the network as much as
we can, but we may not be able to have seistan, which is a dual
400 MHz PII Linux box, relay enough data to make a noticeable impact
on the network path.

In short, data being fed out of srcc.lsu.edu is now flowing with
little to no latency.  Whether latencies will stay low remains to be
seen.  If high latencies return, Unidata, SRCC, and LSU
telecommunications personnel will actively try to figure out where
the problems are occurring and what steps need to be taken to solve
them.

Well, there you have it.  Again, the full set of email transactions
can be found in our inquiry tracking system.

Cheers,

Tom

From address@hidden Sat Jul  5 20:21:33 2003
Subject: Thanks

Unidata,

From all of us here at ULM, thank you to everyone involved for the effort
put forth to try to fix this problem.

The passwords on tornado will stay the way they are for now; when someone
from Unidata tells me they are done, I will change the passwords back to
what they were.  If you ever need them again, just email me.

Oh, hehe, one more thing: "I1" is the commodity Internet.  I thought that I
had said that a while ago, but I guess I was mistaken.  Sorry for the confusion.

If you ever have any questions, feel free to email me.  Also, if you need me
to allow pinging on tornado, I will.

Thanks

Adam Taylor
Computing Center
University of Louisiana at Monroe