
[IDD #WFA-619667]: Request for CONDUIT feed



Hi Martha and Randy,

re:
> MySQL wasn't running because I forgot to turn it on at boot when I
> installed it during this latest effort and we hadn't rebooted till
> yesterday.  I have fixed that, and I started mysqld.

Very good.

> Right now we are running a notifyme to idd.cise-nsf.gov, an ldmwatch -f
> CONDUIT, and a pqcat on CONDUIT.  Do you want us to turn it off while
> you do your stuff?  If so let us know.

I don't think it is necessary to keep running these.  Our observations
are that:

- noaapxcd.it-protect.com _is_ able to receive data from idd.cise-nsf.gov

- it appears that the flow of data from idd.cise-nsf.gov to
  noaapxcd.it-protect.com is being choked down somehow.  We see this sort
  of behavior at sites that are running packet shaping software (imposed
  bandwidth limiting).  The "smoking gun" we use for identifying packet
  shaping is a comparison of the latencies for a high volume datafeed like
  CONDUIT and a low volume feed like IDS|DDPLUS.  This test is not easily
  made on noaapxcd.it-protect.com since it is not reporting realtime
  statistics back to us.  If you are game for reporting those statistics,
  you would need to:

  - have a hole in your firewall opened for LDM traffic (port 388) to
    rtstats.unidata.ucar.edu
  - uncomment the 'exec' line that reports realtime statistics back to us:

    change:

    #exec   "rtstats -h rtstats.unidata.ucar.edu"

    to:

    exec   "rtstats -h rtstats.unidata.ucar.edu"

    and then stop and restart your LDM.

  Please be aware that a full CONDUIT datastream peaks at just under
  5 GB/hour.  For reference, here is a snapshot of the volumes being
  received on the toplevel IDD relay host idd.unidata.ucar.edu:

Data Volume Summary for idd.unidata.ucar.edu

Maximum hourly volume   7936.673 M bytes/hour
Average hourly volume   4110.660 M bytes/hour

Average products per hour     187102 prods/hour

Feed                           Average             Maximum     Products
                     (M byte/hour)            (M byte/hour)   number/hour
CONDUIT                1791.048    [ 43.571%]     4883.769    52139.638
NEXRAD2                1013.858    [ 24.664%]     1372.404    46999.021
NGRID                   646.418    [ 15.725%]     1091.096    15493.447
HDS                     223.569    [  5.439%]      435.078    18950.234
NIMAGE                  160.116    [  3.895%]      319.440      193.213
NEXRAD3                 146.130    [  3.555%]      180.059    23449.043
FNEXRAD                  58.685    [  1.428%]       69.549       53.702
IDS|DDPLUS               32.622    [  0.794%]       39.069    29585.213
UNIWISC                  21.911    [  0.533%]       27.556       20.319
EXP                       9.856    [  0.240%]       20.129      179.447
DIFAX                     4.399    [  0.107%]       20.488        6.128
FSL2                      2.043    [  0.050%]        2.202       22.426
LIGHTNING                 0.006    [  0.000%]        0.031        9.745


  This listing shows that the CONDUIT feed has an average volume of
  ~1.8 GB/hr and a max of ~4.9 GB/hr.  noaapxcd.it-protect.com is receiving
  a tiny fraction of this volume currently.
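
  To put those volumes in network terms, here is a rough sketch that
  converts the CONDUIT figures from the listing into sustained line rates.
  It assumes that "M byte" in the listing means 10**6 bytes and that the
  data arrive evenly across the hour, which real traffic will not do, so
  treat the results as ballpark numbers only.  The function name is made
  up for this illustration.

```python
# Rough bandwidth estimate from the volume summary above.
# Assumption: "M byte" in the listing means 10**6 bytes, and the data
# are spread evenly over the hour (real traffic is burstier than this).

def mbytes_per_hour_to_mbits_per_sec(mbytes_per_hour):
    """Convert a volume in M bytes/hour to a sustained rate in Mbit/s."""
    return mbytes_per_hour * 8 / 3600

conduit_avg = 1791.048   # M bytes/hour, CONDUIT average from the listing
conduit_max = 4883.769   # M bytes/hour, CONDUIT maximum from the listing
total_avg = 4110.660     # M bytes/hour, average over all feeds

avg_rate = mbytes_per_hour_to_mbits_per_sec(conduit_avg)
peak_rate = mbytes_per_hour_to_mbits_per_sec(conduit_max)
share = 100 * conduit_avg / total_avg

print(f"CONDUIT average hour: {avg_rate:.2f} Mbit/s")
print(f"CONDUIT busiest hour: {peak_rate:.2f} Mbit/s")
print(f"CONDUIT share of total volume: {share:.3f}%")
```

  Under those assumptions, a full CONDUIT feed needs roughly 4 Mbit/s on
  average and about 11 Mbit/s during the busiest hour.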
  
> And I think you got me when ldm stopped Awhile ago, at least I assume it
> was you - I never had that happen before.

Yup, it was me.

re: McIDAS setup
> All the stuff with the redirect doesn't surprise me, as somehow when I
> installed mcidas2008 I lost what we had, and tried to patch it though I
> obviously missed some entries.  Sorry.

No worries.  If you two want, I can "regularize" your LDM and McIDAS setups
to better match Unidata recommendations.  This would entail:

- moving XCD-decoded data from /data to /data/ldm/mcidas
- reconfiguring file REDIRECTions in the ~mcidas account
- changing ~ldm/data from a real directory to a link to /data/ldm
- changing ~ldm/logs from a real directory to a link to /data/ldm/logs

Our thinking is that sites are most likely to back up file systems where software
is installed.  Data, on the other hand, is transitory, so it is better
saved through an archive process rather than a backup process.

re:
> I forwarded your earlier speculations to our network people and here
> was their response. Do you have any comments?
> 
> Between the 10.2.15.0/24 subnet and the Internet from a TRIAD
> perspective, the traffic goes through a couple of switches and  router
> there in Westfields, then hits the firewall.  There are no ACLs on the
> routers that could interfere.  I agree that the firewall is the logical
> suspect.  You mention that some data got through overnight, I assume
> that means data got "pulled" from the far end server?

Data gets pushed from the upstream (idd.cise-nsf.gov) to the downstream
(noaapxcd).  The sequence for a connection is:

- downstream registers a request for data to an upstream

- the upstream validates allowed access by the downstream

- the upstream turns around the connection and sends data to the
  downstream whenever data in the datastream(s) in question match the
  extended regular expression in place for the request.  In the current
  test, noaapxcd is requesting everything (".*") from idd.cise-nsf.gov
  in the single datastream CONDUIT
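
The pattern-matching step can be pictured with the small sketch below.
This is only an illustration, not how the LDM itself is implemented:
it uses Python's re module, whose syntax is close to but not identical
with POSIX extended regular expressions, and the function name and
product identifiers are made up for the example.

```python
import re

def select_products(pattern, product_ids):
    """Return the product IDs that the request pattern matches."""
    regex = re.compile(pattern)
    # search() matches anywhere in the ID unless the pattern is anchored
    return [pid for pid in product_ids if regex.search(pid)]

# Hypothetical product identifiers, invented for illustration only.
sample_ids = ["model-a/grid/001", "model-a/grid/002", "model-b/grid/001"]

# ".*" matches every identifier, so the whole datastream is selected.
everything = select_products(".*", sample_ids)

# A narrower pattern selects only a subset of the datastream.
only_model_a = select_products("^model-a/", sample_ids)

print(everything)
print(only_model_a)
```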

> From your
> observation, does it seem like the connection is getting cut off during
> the transfer of data, or is it more like the connection that is
> maintained between client and server between transfers is being
> dropped?

I would say that the throughput is being choked down.  If the connection
is being dropped, the downstream will recontact the upstream and request
the same feed.

> I've increased the timeout on the port 388/tcp "service" on
> the firewall from 30 mins to 4 hours.  The expected firewall behavior
> is that if no data is transferred over the established connection
> before the timeout expires, then it drops the connection.  Do you think
> 4 hours is long enough for a valid test?

Yes.  There is enough data in CONDUIT that there should never be a lull
lasting as long as 4 hours.
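
As a back-of-the-envelope check, the sketch below uses the average
CONDUIT product rate seen on idd.unidata.ucar.edu (from the listing
above) to estimate the mean gap between products.  It assumes the rate
offered to noaapxcd is similar, and an average says nothing about the
longest quiet period, so this is only a sanity check.

```python
# Average CONDUIT product rate on idd.unidata.ucar.edu, from the listing.
conduit_products_per_hour = 52139.638

# Firewall idle timeout quoted above: 4 hours, expressed in seconds.
timeout_seconds = 4 * 3600

# Mean spacing between products if they arrived evenly (an assumption).
mean_gap_seconds = 3600 / conduit_products_per_hour

# Products expected during one timeout window at the average rate.
products_per_timeout = conduit_products_per_hour * timeout_seconds / 3600

print(f"Mean gap between products: {mean_gap_seconds:.3f} s")
print(f"Products expected in one timeout window: {products_per_timeout:.0f}")
```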

One other possibility is that the network pipe out to the Internet from
noaapxcd/NGC is very limited.  I have not mentioned this before since I
considered it unlikely.  Another possibility is that a router in the
path of dataflow has some serious problems.

One other thing that could be done for testing is for your network group
to open firewall access to idd.unidata.ucar.edu.  This is the toplevel
IDD relay node that we operate here at UPC/UCAR.  We could then compare
the flow of data from idd.cise-nsf.gov to noaapxcd against the flow
of data from idd.unidata.ucar.edu to noaapxcd.  This one is your call...

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: WFA-619667
Department: Support IDD
Priority: Normal
Status: Closed