
20030623: HDS feed to/from seistan (cont.)



>From: Robert Leche <address@hidden>
>Organization: LSU
>Keywords: 200306161954.h5GJs2Ld016710 LDM-6 IDD

Hi Bob,

>>General question:  is it OK to give all srcc.lsu.edu machines total
>>access to seistan and sirocco?

>For testing I have no problem.

OK.  What we were looking at was the set of rules listed by
'ipchains -L' on seistan.  It seemed to us that the list is much
more complicated than it perhaps needs to be, and even has some
contradictory ALLOWs and DENYs.

We tried adding another rule for 'ldm' (port 388) that allowed all
traffic (so it would appear at the top of the list) to see if that
would improve the data transfer from seistan to zero, but it
essentially changed nothing.

To check another possibility, we temporarily added an entry to
/etc/hosts for zero.unidata.ucar.edu.  We wanted to know if DNS access
was slowing things down at all; it wasn't.

The next test was to repeat a test you tried a couple of days ago: ping
a downstream host using different packet sizes.  This test was
informative to say the least.  We ran a series of ping tests to
zero.unidata.ucar.edu while varying the packet size from the default
(64 ICMP data bytes) to the maximum allowed (65507 data bytes):


packet size [bytes] round-trip min/avg/max/mdev
-------------------+-------------------------
  default           25.093/29.574/45.186/4.434 ms
   1000             27.265/29.108/34.205/2.739 ms
   2000             26.212/41.732/79.915/17.509 ms
   4000             27.295/62.053/631.038/134.159 ms
   6000             28.032/48.721/128.113/35.800 ms
   8000             28.550/67.500/192.124/51.161 ms
  10000             29.861/43.767/233.212/44.220 ms
  12000             32.321/99.727/935.382/206.396 ms
  14000             31.093/151.241/401.941/117.003 ms
  16000             33.061/120.564/287.026/87.274 ms
  18000             35.566/99.640/289.394/91.020 ms
  20000             30.985/115.274/371.706/109.435 ms
  30000             272.666/533.249/1326.722/347.923 ms
  40000             88.554/733.970/998.542/194.308 ms
  50000             1520.142/1610.229/1756.897/66.452 ms
  65507             2248.710/4534.717/7079.395/1642.487 ms

While the average ('avg') round-trip times are not monotonic, they do
show that when packet sizes get large, the round-trip time grows
rapidly.  We believe that this is what is being experienced in the feed
of HDS data to any downstream site from either seistan or datoo
(our tests feeding from datoo earlier today showed exactly the same
pattern as feeding from seistan).
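To put the table in perspective, here is a quick back-of-the-envelope
calculation (a sketch only; it takes the 'avg' column from the table and
assumes a one-way trip is roughly half the round trip) of the effective
per-probe throughput at a few of the sizes:

```python
# Rough effective throughput per ping probe, using the 'avg' column above.
# Assumption: one-way time is about half the round-trip time.

# (ICMP data bytes, average round-trip time in ms) from the table
samples = [
    (1000, 29.108),
    (8000, 67.500),
    (20000, 115.274),
    (50000, 1610.229),
    (65507, 4534.717),
]

for size, avg_rtt_ms in samples:
    one_way_s = (avg_rtt_ms / 2) / 1000.0      # seconds, one direction
    eff_kb_s = (size / one_way_s) / 1024.0     # effective KB/s at this size
    print(f"{size:6d} bytes  ~{eff_kb_s:8.1f} KB/s")
```

The per-probe throughput actually rises through the mid-range sizes, but
at 50000 bytes and above it falls below even the 1000-byte figure, which
matches the rapid RTT growth for large packets.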

While playing around with the ping tests, we decided to do pings to
machines along the route from seistan to zero.  We found something
that was interesting, but have no idea if it means anything.

Here is the route from seistan to zero and traceroute times (output from
mtr):

 1. 130.39.188.1                        0%   19   19     4    1    4     15 
 2. lsubr1-118-6509-dsw-1.g2.lsu.edu    0%   19   19     0    0    9    142 
 3. laNoc-lsubr.LEARN.la.net            0%   19   19     2    1    4     25 
 4. abileneHou-laNoc.LEARN.la.net       0%   19   19     7    7    8     15 
 5. kscyng-hstnng.abilene.ucaid.edu     0%   19   19    23   22   23     33
 6. dnvrng-kscyng.abilene.ucaid.edu     0%   19   19    34   33   34     45
 7. 198.32.11.106                       0%   19   19    34   33   34     40
 8. gin.ucar.edu                        0%   18   18    39   34   35     39
 9. flrb.ucar.edu                       0%   18   18    35   34   35     41
10. zero.unidata.ucar.edu               0%   18   18    35   34   35     39

In our ping tests, we found that the largest ping packet that
laNoc-lsubr.LEARN.la.net would respond to carried 17997 data bytes
(18025 bytes once the ICMP/IP headers are added):

# ping -c 20 -s 17997 laNoc-lsubr.LEARN.la.net
PING laNoc-lsubr.LEARN.la.net (162.75.0.9) from 130.39.188.204 : 17997(18025) bytes of data.
18005 bytes from laNoc-lsubr.LEARN.la.net (162.75.0.9): icmp_seq=0 ttl=253 time=243.957 msec
 ...

[root@seistan bin]# ping -c 20 -s 17998 laNoc-lsubr.LEARN.la.net
PING laNoc-lsubr.LEARN.la.net (162.75.0.9) from 130.39.188.204 : 17998(18026) bytes of data.
-- no response --
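We found the cutoff by bracketing sizes by hand, but the same search can
be automated.  The sketch below binary-searches for the largest payload a
hop answers; the predicate is a stand-in that mimics the observed 17997-byte
cutoff (a real probe would shell out to 'ping -c 1 -s SIZE host' instead):

```python
# Binary search for the largest ICMP payload a hop will answer.
# 'responds' is a stand-in predicate; in practice it would run an
# actual ping probe against the hop under test.

CUTOFF = 17997  # largest payload laNoc-lsubr.LEARN.la.net answered

def responds(size: int) -> bool:
    """Stand-in for an actual ping probe."""
    return size <= CUTOFF

def max_payload(lo: int = 0, hi: int = 65507) -> int:
    # Invariant: the hop responds at 'lo'; search the upper half first.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if responds(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

print(max_payload())  # with this stand-in predicate: 17997
```

With a real probe, each apparent non-response should be retried once or
twice before being trusted, since a single lost packet would otherwise
skew the search.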

Next, I decided to run a data movement test using something other than
the LDM.  I used scp to move a 132 MB GOES-12 VIS image from zero to
seistan and then back again.  Here are the results:
scp test:

zero.unidata.ucar.edu -> seistan.srcc.lsu.edu
AREA1234             100% |********************************|   132 MB    03:06

seistan.srcc.lsu.edu -> zero.unidata.ucar.edu
AREA1234             100% |********************************|   132 MB    04:41

Both tests were "pull" tests: the scp was initiated on the receiving
system.
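In round numbers (a quick check of the timings above):

```python
# Throughput implied by the scp timings above.
size_mb = 132.0
zero_to_seistan_s = 3 * 60 + 6   # 3:06 pull from zero to seistan
seistan_to_zero_s = 4 * 60 + 41  # 4:41 pull from seistan to zero

for label, secs in [("zero -> seistan", zero_to_seistan_s),
                    ("seistan -> zero", seistan_to_zero_s)]:
    print(f"{label}: {size_mb / secs:.2f} MB/s")

# seistan -> zero took about 1.5x as long as the reverse direction
print(f"ratio: {seistan_to_zero_s / zero_to_seistan_s:.2f}")
```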

As you can see, it took about 50% more time to move the data from seistan
to zero than to move the same file from zero to seistan.  This parallels
the observation that we can move HDS data from zero to seistan with little
to no latency, but cannot do the same in the other direction.

Since the products in the HDS feed are considerably larger than those in
the IDS|DDPLUS feed (which seistan is relaying to ULM with no
significant latencies), and since the number of products in the HDS
feed is a couple of orders of magnitude higher than in the UNIWISC feed
(whose products are a lot larger than HDS products), it appears that
the HDS feed problem is a function of lots of large packets.  This
severely limits the value of LSU's being a top level IDD relay.

Would it be possible for you to take

- the results of the ping tests above
- the fact that we can send seistan large volumes of HDS data with
  virtually no latency, but cannot get the same data back out to
  a different machine in our network
- the results of using scp to copy data between the same systems

to the LSU networking support group and enlist their aid in finding out
what is limiting the traffic out of your network?  We can run a number
of tests from here, but we don't have the same facilities for tracing
down a problem that the network group there should have.

>my general firewall philosophy, is to allow that which needs allowing.
>And no more. As every hosts at LSU is connected the open internet. We
>do not have safe areas behind a firewall. The issue of network security
>is of the most importance. Not all SRCC hosts need LDM and that is why
>it is set the way it is.

The LDM has its own security facilities.  Adding a firewall rule for
each machine that wants to get a feed just makes the list of rules
longer and longer.  The net effect is to make packet processing
take longer and longer.  We feel that it would be more efficient --
while still being secure -- to allow open access to the LDM port 388
and remove all host-specific rules for LDM access.  Again, the 'allow'
lines in the ~ldm/etc/ldmd.conf file take care of which sites may access
the LDM server.

Additionally, you have multiple rules in place for the exact same
host(s).  Here are the duplicates from seistan:

ACCEPT     all  ------  aqua.nsstc.uah.edu   anywhere              n/a
ACCEPT     all  ------  aqua.nsstc.uah.edu   anywhere              n/a

ACCEPT     all  ------  atm.geo.nsf.gov      anywhere              n/a
ACCEPT     all  ------  atm.geo.nsf.gov      anywhere              n/a

ACCEPT     all  ------  betsy.jsums.edu      anywhere              n/a
DENY       all  ------  betsy.jsums.edu      anywhere              n/a

ACCEPT     all  ------  mistral.srcc.lsu.edu anywhere              n/a
ACCEPT     all  ------  mistral.srcc.lsu.edu anywhere              n/a
ACCEPT     all  ------  mistral.srcc.lsu.edu anywhere              n/a
ACCEPT     all  ------  mistral.srcc.lsu.edu anywhere              n/a
ACCEPT     all  ------  mistral.srcc.lsu.edu anywhere              n/a
ACCEPT     all  ------  mistral.srcc.lsu.edu anywhere              n/a
ACCEPT     all  ------  mistral.srcc.lsu.edu anywhere              n/a
ACCEPT     all  ------  mistral.srcc.lsu.edu anywhere              n/a

ACCEPT     all  ------  sirocco.srcc.lsu.edu anywhere              n/a
ACCEPT     all  ------  sirocco.srcc.lsu.edu anywhere              n/a

ACCEPT     all  ------  weather.admin.niu.edu anywhere              n/a
ACCEPT     all  ------  weather.admin.niu.edu anywhere              n/a

ACCEPT     all  ------  weather2.admin.niu.edu anywhere              n/a
ACCEPT     all  ------  weather2.admin.niu.edu anywhere              n/a

ACCEPT     all  ------  weather3.admin.niu.edu anywhere              n/a
ACCEPT     all  ------  weather3.admin.niu.edu anywhere              n/a
ACCEPT     all  ------  weather3.admin.niu.edu anywhere              n/a

ACCEPT     tcp  ------  anywhere             anywhere              any ->   https
ACCEPT     tcp  ------  anywhere             anywhere              any ->   https
ACCEPT     tcp  ------  anywhere             anywhere              any ->   smtp
ACCEPT     tcp  ------  anywhere             anywhere              any ->   smtp

ACCEPT     udp  ------  anywhere             anywhere              any ->   ntp
ACCEPT     udp  ------  anywhere             anywhere              any ->   ntp

ACCEPT     udp  ------  anywhere             anywhere              bootps:bootpc ->   bootps:bootpc
ACCEPT     udp  ------  anywhere             anywhere              bootps:bootpc ->   bootps:bootpc


Each rule is processed for each packet received until a match occurs,
regardless of whether the rule duplicates one already run.  Simply put:
the longer the list of rules, the longer it takes to process each
packet.
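Pruning the duplicates is mechanical: since the chain is evaluated
first-match-wins, any rule identical to an earlier one can never fire and
can simply be dropped.  A sketch of the pruning we have in mind (using
simplified rule strings for illustration, not real ipchains syntax):

```python
# First-match-wins dedup: keep only the first occurrence of each rule,
# preserving order; later identical copies can never match anything new.
def dedup_rules(rules):
    seen = set()
    out = []
    for rule in rules:
        if rule not in seen:
            seen.add(rule)
            out.append(rule)
    return out

rules = [
    "ACCEPT all aqua.nsstc.uah.edu anywhere",
    "ACCEPT all aqua.nsstc.uah.edu anywhere",
    "ACCEPT all betsy.jsums.edu anywhere",
    "DENY   all betsy.jsums.edu anywhere",
]
for rule in dedup_rules(rules):
    print(rule)
```

Note that the ACCEPT/DENY pair for betsy.jsums.edu is not a duplicate:
both lines survive, and which one fires depends on their order -- which
is exactly the kind of contradictory pair worth reviewing by hand.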

>The rules that are in place, allow the
>services we need. The guild lines presented by the SANS org have been
>followed in reguard to firewall setup. I am open to suggestions in this
>area, but just know any changes we make must be secure.

We agree that security is _the_ most important consideration.  Our
recommendations would do nothing to compromise your security.  Rather,
they would be aimed at making your setup more efficient and useful
to potential IDD sites.

>From address@hidden Mon Jun 23 14:50:05 2003

>Datoo should allow you to login. The pw is now the one you have. The SunOs
>does not use ipchains. Just tcp-wrappers.

Thanks for the access.  The fact that datoo only uses TCP wrappers
answers some questions we had.

re: ipchains rule on seistan

>Changing the order is fine.

re:
1) flush the IP chains rule set that is in place right now on seistan
2) install a new rule set that consolidates the restrictions you currently
   have in place

Tom, consolidating is fine.

3) return the HDS feed from seistan to zero.unidata.ucar.edu to see if
   the large latencies drop to zero

>Ok.

We did return the feed of HDS back to seistan after noting that the feed
from datoo had virtually the same latencies.

re: any reason to not run tests

>I am comfortable with the security as it is. But making improvements in access
>control efficiency is fine as long as security is not compromised..

OK.  Since the tests run since our previous email indicated that the
IP chains setup was not _the_ reason for the large HDS latencies, changes
to your setup are not needed to proceed.  We stand by our observation
that things would be more efficient if duplicate rules were eliminated
and the rules were ordered so that more general rules come first,
followed by more specific ones.

>The issue
>of access control on Seistan or Sirocco being a source of trouble surprises
>me. The system performance indicators have not pointed to this as a critical
>problem area or a bottleneck.

As you can see above, that is our observation also.

>If you would have told me your tests found
>Sirocco had problems, I could believe it has had loading issues. It is a
>single CPU is a 450MHZ system. Seistan on the other hand is a dual 400Mhz
>machine. The loading on Seistan is much lower then Sirocco and is the reason
>I moved the downstream users off of Sirocco.

OK.  The problem does not appear to depend on system loading, as we
can move HDS data to seistan with little to no latency.  The problem is
strictly one of moving high-volume data off of seistan to machines that
are not located in the srcc.lsu.edu and, presumably, lsu.edu domains.
Also, the fact that the latency appears on a per-connection basis
(latencies for HDS can be quite high while those for UNIWISC and
IDS|DDPLUS are quite low) would almost seem to point to something like
packet shaping being used to limit transfers.

Tom