NLDM Progress Report
September 20, 2004
Anne Wilson
The NLDM network currently consists of the following eight machines:
Hostname                 | Location            | Function              | OS
imogene.unidata.ucar.edu | Boulder, CO         | ingest and relay      | Linux
atm.geo.nsf.gov          | Washington, D.C.    | ingest and relay      | Solaris
ldm.iihr.uiowa.edu       | Iowa City, IA       | relay                 | Linux
tempest.aos.wisc.edu     | Madison, WI         | relay                 | Linux
bigbird.tamu.edu         | College Station, TX | relay                 | Linux
methost24.met.sjsu.edu   | San Jose, CA        | relay                 | Linux
joey.unidata.ucar.edu    | Boulder, CO         | statistics processing | Linux
conan.unidata.ucar.edu   | Boulder, CO         | statistics display    | Solaris
The network is relaying the CONDUIT, CRAFT, HDS, NEXRAD, IDS|DDPLUS,
and UNIWISC data feeds. Ingest code for NIMAGE is under development.
(NIMAGE contains the largest data products in the IDD, with product
sizes just under 20 MB.)
The statistics page has been augmented to allow comparison of the same
statistics across feed types for a particular machine. Bringing up
multiple copies of the statistics pages allows comparison across
machines and provides a window into network performance as a whole.
Latencies and reception are good. A recent analysis showed that, for
the most part, 99% of all products arrived at all sites within
5 seconds. The exceptions were machines with sporadic network issues
or periods of high load.
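The kind of check behind a figure such as "99% within 5 seconds" can be sketched as follows. This is only an illustration, not the analysis code actually used; the function name `fraction_within` and the sample latencies are hypothetical.

```python
def fraction_within(latencies, threshold_seconds=5.0):
    """Return the fraction of latencies at or below the threshold."""
    if not latencies:
        raise ValueError("no latency samples")
    on_time = sum(1 for latency in latencies if latency <= threshold_seconds)
    return on_time / len(latencies)


# Hypothetical per-product latencies, in seconds, for one site.
sample = [0.4, 1.2, 0.8, 6.5, 2.1, 0.3, 0.9, 1.7, 0.6, 3.4]
print(f"{fraction_within(sample):.0%} of products arrived within 5 seconds")
```

Running the same calculation per site and per feed type would show which machines fall short of the threshold.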
An additional machine was added to the network to receive and process
the statistics in order to reduce the load on our web
server. This machine now processes the statistics and
relays the resulting binned statistics to the web server for display.
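As a rough sketch of what binning statistics might look like, the following groups latency samples into fixed-width time bins and summarizes each bin. The function name `bin_latencies`, the 60-second bin width, and the sample data are assumptions made for illustration; the actual processing code may differ.

```python
from collections import defaultdict


def bin_latencies(samples, bin_seconds=60):
    """Group (arrival_time, latency) samples into fixed-width time bins.

    Returns a dict mapping each bin's start time to (count, mean, max).
    """
    bins = defaultdict(list)
    for arrival_time, latency in samples:
        bin_start = int(arrival_time // bin_seconds) * bin_seconds
        bins[bin_start].append(latency)
    return {
        start: (len(values), sum(values) / len(values), max(values))
        for start, values in sorted(bins.items())
    }


# Hypothetical samples: (arrival time in seconds, latency in seconds).
samples = [(3, 0.5), (42, 1.5), (61, 2.0), (119, 4.0), (125, 1.0)]
for start, (count, mean, worst) in bin_latencies(samples).items():
    print(f"bin {start:>4}s: n={count} mean={mean:.2f}s max={worst:.2f}s")
```

Sending only the per-bin summaries to the display machine keeps the volume of data handled by the web server small.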
I have been working on a comprehensive white paper describing the
research results and features of INN and NNTP.
The final section of the paper, which remains to be written, is
"Recommendations". However, I will make the general case here.
INN relays data at least as well as LDM. With latencies for both
generally quite small and with statistics being calculated differently
for each, it is difficult to argue at this point that one is better in
this regard than the other.
However, INN has additional features that the LDM does not have, which
could be useful to us:
- The flooding algorithm has been shown to provide good
automated routing.
- INN dynamically creates and destroys connections between peers.
- INN can switch automatically between CHECK and
no-CHECK streaming transmission modes.
- Working with newsgroups is a big improvement over the
current limited number of LDM feed types.
- INN newsgroup lists support metacharacter expansion
similar to that used in shell file name expansion.
- INN supports "negative subscriptions", i.e., users can specify
what newsgroups they don't want to receive.
- With its backlog handling, INN can handle outages on
the order of weeks, which could be useful as we migrate into areas
beyond real time data distribution.
- For similar reasons, the pull based retrieval
provided by NNTP could also be beneficial.
- INN supports multiple methods of article storage yielding a range
of article retention times.
- The article storage method most similar to the LDM product queue
supports multiple buffers and the logical grouping of physical buffers
into higher level "meta" buffers.
- In addition to streaming, INN supports batch transmission.
- Arbitrary header information can be attached to a product.
This can serve as metadata.
- INN provides high level server administration tools, including
resource monitoring and email notification of problems.
- NNTP supports some network-level administration via control
messages, particularly notification of newsgroup changes.
- INN is IPv6 enabled.
- INN peers can be reconfigured dynamically without restarting the
server.
- Users can communicate with a remote INN server directly via
telnet. This has been useful in testing and can also be used for
retrieval. (Although one could telnet to an LDM port, the LDM
protocol doesn't support such interactive sessions.)
- Due to the lengthy history and popularity of news, there is a lot
of NNTP based software available.
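To illustrate the metacharacter expansion and negative subscriptions mentioned in the list above, here is a minimal sketch of how a comma-separated pattern list could select newsgroups. It approximates INN's pattern matching with Python's `fnmatch` module, which is not identical to INN's own matcher, and it assumes that the last matching pattern decides the outcome. The function name `subscribed` and the newsgroup names are hypothetical.

```python
from fnmatch import fnmatchcase


def subscribed(newsgroup, patterns):
    """Return True if the comma-separated pattern list selects newsgroup.

    A leading "!" marks a negative subscription. The last pattern that
    matches decides the outcome; a group matching nothing is not selected.
    """
    selected = False
    for pattern in patterns.split(","):
        pattern = pattern.strip()
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if pattern and fnmatchcase(newsgroup, pattern):
            selected = not negated
    return selected


# Hypothetical newsgroup names modeled on the feed types above.
subscription = "data.*,!data.nexrad.*"
for group in ("data.conduit.gfs", "data.nexrad.level2", "other.misc"):
    print(group, subscribed(group, subscription))
```

With this subscription, a site would receive everything under the hypothetical `data` hierarchy except the NEXRAD groups, something that takes a single short expression here.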
The costs of changing to INN are: transitioning the
community, administering a more complex package, and using open
source software.
Getting the community to change is a one-time cost
whereas the benefits of using INN would continue over time. It is
acknowledged, however, that the NWS use of LDM may be cast in
concrete. Independent of NWS usage, the NLDM network demonstrates
that it is possible for this change to evolve, phasing out the use of
one protocol while phasing in the use of the other.
Regarding increased administration costs, these occur mainly at
installation time and can be mitigated by a range of options from
scripts to GUIs that we create, coupled with the use of JNLDM for low
powered sites.
Finally, working within an open source community would be different for
us. However, I have found the community to be open and
responsive, especially with respect to projects that show off their
software.
We plan to meet in the very near future to discuss how to proceed with
this project.