
20050315: IDD top level relay atm.geo.nsf.gov PSU (cont.)



>From: "Arthur A. Person" <address@hidden>
>Organization: PSU
>Keywords:  200503102200.j2AM0Lq2027557 IDD toplevel relay

Hi Art,

>I'm getting a green light from folks here to go ahead with setting up a 
>relay that could replace atm.geo.nsf.gov based on what we've discussed 
>thus far.

Excellent!

>I believe our networks are ready now to handle the loads, 
>however, our server is not.  We're currently in the process of trying to 
>acquire funds to upgrade ldm.meteo.psu.edu and hope to have a replacement 
>in place by early summer. Our plan is to upgrade our server to an 
>enterprise class dual 64-bit system capable of larger product queues, with 
>hardware RAID1 mirrors for the product queue and gigabit ethernet. Does 
>this kind of system and time-frame sound reasonable?

Yes, but I will be interested in your reaction to what I include below.

>How should we proceed from here?

Perhaps it would be useful if I described the setup we have been moving
towards for our toplevel IDD relay nodes -- idd.unidata.ucar.edu and
thelma.ucar.edu.  Let me warn you that I am not the expert in what I am
about to say, but I think I can relate the essence of what we have been
working on.  The real brains behind what I describe below are:

John Stokes    - cluster design and implementation
Steve Emmerson - LDM development
Mike Schmidt   - system administration and cluster design
Steve Chiswell - IDD design and monitoring

I am sure that these guys will chime in when they see something I have
mis-stated :-)

As you know, in addition to atm.geo.nsf.gov we operate the top level
IDD relay nodes idd.unidata.ucar.edu and thelma.ucar.edu.  Instead of
idd.unidata and thelma.ucar being simple machines, they are part of a
cluster that is composed of 'directors' (machines that direct IDD feed
requests to other machines) and 'data servers' (machines that are fed
requests by the director(s) and service those requests).  We are using
the IP Virtual Server (IPVS) available in current versions of Linux to
forward feed requests from 'directors' to 'data servers'.
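
To make that concrete, the IPVS table on a 'director' is built with
the ipvsadm utility, roughly along these lines (the addresses are
made-up placeholders, and the scheduler and flags are subject to the
tuning I mention below):

    # define a virtual service on the LDM port (388) that schedules by
    # least connections, then register the three data-server backends
    ipvsadm -A -t 192.168.1.10:388 -s lc
    ipvsadm -a -t 192.168.1.10:388 -r 192.168.1.21:388 -g   # 'uni2'
    ipvsadm -a -t 192.168.1.10:388 -r 192.168.1.22:388 -g   # 'uni3'
    ipvsadm -a -t 192.168.1.10:388 -r 192.168.1.23:388 -g   # 'uni4'
    ipvsadm -L -n    # list the table and current connection counts

(The -g flag selects IPVS direct routing, which is why the data
servers carry the public address themselves but must not ARP for it --
more on that below.)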

In our cluster, we are using Fedora Core 3 64-bit Linux run on a set of
identically configured Sun SunFire V20Z 1U rackmount servers:  dual
Opterons; 4 GB RAM; 2x36 GB 10K RPM SCSI; dual GB Ethernet interfaces.
We got in on a Sun educational discount program and bought our 5 V20Zs
for about $3000 each.  These machines are stellar performers for IDD
work when running Fedora Core 3 64-bit Linux.  We tested three
operating systems side-by-side before settling on FC3; the others were
Sun Solaris x86 10 and FreeBSD 5.3, both of which are 64-bit.  FC3 was
the _clear_ winner; FreeBSD was second; and Solaris x86 10 was a
_distant_ third.  As I understand it, RedHat Enterprise WS 4 is FC3
with full RH support.

Here is a "picture" of what idd.unidata.ucar.edu and thelma.ucar.edu
currently look like (best viewed with fixed width fonts):

              |<----------- directors ------------>|

                  +-------+            +-------+
                  |       ^            |       ^
                  V       |            V       |
              +---------------+    +---------------+
idd.unidata   | LDM   | IPVS  |    | LDM   | IPVS  |  thelma.ucar
              +---------------+    +---------------+
                      / \    |               |   / \
                     /   \   |               |  /   \
                    /     \  +----+          | /     \
           +-------/-------\------|----------+/       \
           |      /         \     |          /         \
           |     /           \    +----------------+    \
           |    /             \            /       |     \
           V   /               \          /        V      \
        +---------------+   +---------------+   +---------------+
        |  'uni2' LDM   |   |  'uni3' LDM   |   |   'uni4' LDM  |
        +---------------+   +---------------+   +---------------+

        |<----------------- data servers ---------------------->|

The top level indicates two 'director' machines: idd.unidata.ucar.edu
and thelma.ucar.edu (thelma used to be a SunFire 480R SPARC III box).
Both of these machines are running IPVS and LDM 6.3.0 configured on a
second interface (IP).  The IPVS 'director' software forwards port 388
requests received on the interface configured as idd.unidata.ucar.edu
on one machine and as thelma.ucar.edu on the other.  The set of 'data
server' backends are the same for both directors (at present).

When an IDD feed request is received by idd.unidata.ucar.edu or
thelma.ucar.edu it is relayed by the IPVS software to one of the data
servers.  Those machines are configured to also be known internally as
idd.unidata.ucar.edu or thelma.ucar.edu, but they do not ARP, so they
are not seen by the outside world/routers.  The IPVS software keeps
track of how many connections are on each of the data servers and
forwards ("load levels") based on connection numbers (we will be
changing this metric as we learn more about the setup).  The data
servers are all configured identically: same RAM, same LDM queue size
(8 GB currently), same ldmd.conf contents, etc.
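
The "do not ARP" bit is the usual arrangement for IPVS direct routing:
each data server carries the public (virtual) address on its loopback
device and is told not to answer ARP queries for it.  On a 2.6 kernel
that looks roughly like this (placeholder address again; there are
other equivalent ways to hide the address from ARP):

    # on each data server: add the virtual IP to the loopback device
    ifconfig lo:0 192.168.1.10 netmask 255.255.255.255 up
    # and keep the box from answering or advertising ARP for it
    sysctl -w net.ipv4.conf.all.arp_ignore=1
    sysctl -w net.ipv4.conf.all.arp_announce=2
    sysctl -w net.ipv4.conf.lo.arp_ignore=1
    sysctl -w net.ipv4.conf.lo.arp_announce=2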

All connections from a downstream machine will always be sent to the
same data server as long as its last connection has not died more than
one minute ago.  This allows downstream LDMs to send an "are you alive"
query to a server that they have not received data from in a while.
Once there have been no IDD request connections by a downstream host
for one minute, a new request will be forwarded to the data server that
is least loaded.
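
That one-minute "stickiness" is just IPVS connection persistence set
on the virtual service, e.g. something like:

    # -p <seconds>: keep sending a given downstream host to the same
    # data server until it has been idle for the timeout (60 s here)
    ipvsadm -E -t 192.168.1.10:388 -s lc -p 60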

This design allows us to take down any of the data servers for whatever
maintenance is needed (hardware, software, etc.) whenever we feel like
it.  When a machine goes down, the IPVS server is informed that the
server is no longer available, and all downstream feed requests are
sent to the other data servers that remain up.  On top of that,
thelma.ucar.edu and idd.unidata.ucar.edu are on different LANs and may
soon be located in different parts of the UCAR campus.
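
Taking a data server out of the rotation is a one-liner on each
'director', something like:

    # stop sending new connections to 'uni3' (weight 0), then
    # remove it from the virtual service entirely
    ipvsadm -e -t 192.168.1.10:388 -r 192.168.1.22:388 -g -w 0
    ipvsadm -d -t 192.168.1.10:388 -r 192.168.1.22:388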

LDM 6.3.0 was developed to allow running the LDM on a particular
interface (IP).  We are using this feature to run an LDM on the same
box that is running the IPVS 'director'.  The IPVS listens on one
interface (IP) and the LDM runs on another.  The alternate interface
does not necessarily have to represent a different Ethernet device; it
can be a virtual interface configured in software.  The ability to run
LDMs on specific interfaces (IPs) allows us to run LDMs as either 'data
collectors' or as additional data servers on the same box running the
'director'.  By 'data collector', I mean that the LDMs on the
'director' machines have multiple ldmd.conf requests that bring data to
the cluster (e.g., CONDUIT from atm and/or UIUC, NEXRAD2 from Purdue,
HDS from here, IDS|DDPLUS from there, etc.).  The data server LDMs
request data redundantly from the 'director' LDMs.  We currently do not
have redundancy for the directors, but we will be adding that in the
future.
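
The "virtual interface configured in software" is nothing exotic --
just an ordinary IP alias on the box, for example:

    # add a second address to eth0 as the alias eth0:1; the director's
    # LDM is then pointed at this address while IPVS answers on the
    # primary one (address is a placeholder)
    ifconfig eth0:1 192.168.1.11 netmask 255.255.255.0 up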

We are just getting our feet wet with this cluster setup.  We will be
modifying configurations as we learn more about how well the system
works.  In stress tests run here at the UPC, we were able to
demonstrate that one V20Z was able to handle 50% more downstream
connections than the 480R thelma.ucar.edu without introducing latency.
With three data servers we believe that we could now field literally
every IDD feed request in the world if we had to (the ultimate failover
site).  If the load on the data servers ever becomes too high, all we
need do is add additional boxes to the mix.  The ultimate limiting
factor in this setup will be the routers and network bandwidth here in
UCAR.  Luckily, we have excellent networking!

All of the above may not seem like an answer to your question "How
should we proceed from here", but I felt that it was important for you
(PSU) to get a clearer picture of our IDD development.  We have talked
about upgrading atm to a cluster like that described above and have
also considered approaching GigaPops like the MAX (U Maryland) to see
if they would be interested in running a cluster there (we feel that it
is best to have top level relays as close to a GigaPop as possible).
Since you (PSU) are willing to play a leading role in the IDD relay
effort, I feel like we should come to an agreement on the class of
installation that would best handle current and future needs.

Please let us know of any questions you have on the above.  There
should be some since I have most likely not portrayed things well
enough.

Cheers,

Tom
--
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.