Principles Underlying Internet Data Distribution
Adopted by the Unidata Policy Committee, 29 June 1994
Drafted by Dave Fulker
The Internet Data Distribution (IDD) system is a means by which Unidata universities
can build and keep current their holdings of environmental data, especially
those updated in near-real time. IDD is a "distributed application," with interacting
components (data sources, data relays and data sinks) at many locations nationwide.
Responsibilities for running and maintaining the IDD system also are distributed,
on the assumption that proper balances among cost, data needs, performance and
flexibility are best achieved through community effort--organized and guided
by the Unidata Program Center (UPC)--rather than a more centralized endeavor.
The following sections elaborate eight key principles that reflect this
purpose and underlie the IDD system design:
- Data Reception Implies Relay Responsibilities
- The UPC Acquires Data of Very High Interest
- The UPC Chooses Routes for High-Interest Data
- Routing Is Ad Hoc for Data of Lesser Interest
- The High-Interest Category Is Defined by Actual Use
- Incentives and Criteria Exist for High-Level Relays
- The LDM Design Facilitates a Community Endeavor
- The Internet Will Evolve to Simplify the IDD
1. Data Reception Implies Relay Responsibilities
Those who receive information via IDD relay data to other Unidata users where
practical and needed. Participants generally install components of the IDD system
in computer and network settings sufficient to relay all data received to at
least one other site. This implies that users constrain their data requests to fall
well within the capacities of their computers and networks.
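As a rough illustration of this constraint, the sketch below checks whether a requested data rate leaves room to relay everything received to at least one other site. The function name, the 50 percent headroom figure and the use of a single link capacity are assumptions made for the example; the principle itself states no numbers.

```python
def fits_relay_budget(requested_rate, link_capacity, min_downstream=1, headroom=0.5):
    """Return True if a site can receive the requested data and relay all of it.

    requested_rate and link_capacity share one unit (for example, kilobits per second).
    headroom is an assumed fraction of capacity that data traffic may use, standing
    in for "well within the capacities"; it is not a figure from the principles.
    """
    # One inbound copy plus one outbound copy per downstream site.
    required = requested_rate * (1 + min_downstream)
    return required <= link_capacity * headroom


# Illustrative values only.
ok_small = fits_relay_budget(requested_rate=10, link_capacity=100)
ok_large = fits_relay_budget(requested_rate=30, link_capacity=100)
```

With these illustrative values, the smaller request fits the budget and the larger one does not, which is the situation in which a site would narrow its data requests.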
2. The UPC Acquires Data of Very High Interest
To obtain rates possible only through "bulk purchases" or special agreements with
agencies and providers, the UPC acquires widely used data streams for universities.
In some cases, universities pay nothing to receive such data; in other cases (where
usage is less or bulk purchases impractical) they pay the data providers, but
costs are discounted through UPC-negotiated agreements. Unidata acquisition does
not imply that the data flow through the UPC; the UPC will not become a data center
or hub for IDD.
3. The UPC Chooses Routes for High-Interest Data
The UPC identifies effective routes for high-interest data from each source to
each recipient. Routes are established for both normal conditions and outages,
which may occur at any relay. In general, suggested routes are chosen to optimize
performance and minimize redundant network traffic; in time, routing may become
more dynamic, especially for data (such as from radars) for which demand is intermittent.
Routings may be distinct for each data source and even for subsets from a single
source. As the IDD system matures, the UPC will make fewer routing decisions and
move toward monitoring the system and providing information and automation to
simplify users' routing and management decisions.
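The idea of separate routes for normal and outage conditions can be sketched as an ordered list of upstream relays, from which a site uses the first one that is reachable. This is a minimal sketch, not a description of how the LDM software selects routes; the host names and the `is_up` check are hypothetical.

```python
def choose_upstream(routes, is_up):
    """Return the first reachable relay in priority order, or None if all are down.

    routes: upstream relays, normal route first, outage routes after it.
    is_up:  callable that reports whether a given relay is currently reachable.
    """
    for relay in routes:
        if is_up(relay):
            return relay
    return None


# Hypothetical relays: the first is the normal route, the second the outage route.
routes = ["relay-a.example.edu", "relay-b.example.edu"]
down = {"relay-a.example.edu"}  # pretend the normal route has failed

selected = choose_upstream(routes, lambda relay: relay not in down)
```

Because each data source, or subset of a source, may have its own routing, a site would keep one such ordered list per source or subset.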
4. Routing Is Ad Hoc for Data of Lesser Interest
For data of low to moderate interest, the UPC offers little guidance on routing
or other management functions. The exception to this principle is that the UPC
discourages sites from handling lesser-interest data in ways that interfere with
high-interest data flows. In other words, lesser-interest flows must not consume
system and network capacities to the point that significant degradations occur
for high-interest data. This principle implies that any organization with adequate
system and network capacities may become a provider simply by offering data, and
those data become part of the IDD system when one or more Unidata sites choose
to receive them.
5. The High-Interest Category Is Defined by Actual Use
The classification of data as high interest is determined by actual use as indicated
by the number of recipient universities. In consultation with the Unidata Policy
and Users Committees, the level at which data become high interest is set to
accord with resource constraints and other factors.
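A use-based classification of this kind might be sketched as follows. The stream names, recipient counts and threshold are invented for illustration, and treating a count equal to the threshold as high interest is also an assumption; the actual level is set in consultation with the committees.

```python
def classify_by_use(recipient_counts, threshold):
    """Label each data stream 'high' or 'lesser' by its number of recipient sites."""
    return {
        stream: "high" if count >= threshold else "lesser"
        for stream, count in recipient_counts.items()
    }


# Hypothetical stream names and recipient counts, for illustration only.
counts = {"stream_a": 42, "stream_b": 7, "stream_c": 15}
labels = classify_by_use(counts, threshold=15)
```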
6. Incentives and Criteria Exist for High-Level Relays
There are natural incentives to serve as high-level relays (being close to the
data sources increases reliability and reduces latency), but not every site can
do so effectively. To ensure satisfactory reception at dependent sites, the criteria
for high-level relays are quite stringent, especially regarding system and network
capacities: top-level relays must feed between five and ten second-level relays,
which means they must handle six to eleven times the data volume required to meet
local needs. So that there are enough such relays, well maintained and well positioned
with respect to the underlying network, Unidata encourages the National Science
Foundation to offer additional incentives, such as grants for equipment and other
resources targeted to meet this need.
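The six-to-eleven figure follows from counting one copy of the data for local use plus one copy for each second-level relay fed. A small sketch of that arithmetic, with a hypothetical function name:

```python
def relay_load_factor(downstream_relays):
    """Multiple of the local data volume a relay handles.

    One copy is received for local use, and one copy is sent to each
    downstream relay.
    """
    if downstream_relays < 0:
        raise ValueError("downstream_relays must be non-negative")
    return 1 + downstream_relays


# Load factors for a top-level relay feeding five through ten second-level relays.
factors = [relay_load_factor(n) for n in range(5, 11)]
```

The resulting factors run from 6 (five feeds) to 11 (ten feeds), matching the range stated above.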
7. The LDM Design Facilitates a Community Endeavor
The design of the Local Data Manager (LDM) software--on which IDD is based--facilitates
adherence to these principles. Beyond offering ease of use and support for local
control, LDM software collects statistics on performance and usage, and it eventually
will utilize UPC-selected routes by default. The statistics are used to identify
high-interest data and to diagnose problems at relay sites. As the LDM matures,
routing will become simpler and increasingly automated.
8. The Internet Will Evolve to Simplify the IDD
The IDD system gains capacity and economy by exploiting new Internet services
and capabilities as they materialize. Of particular note will be the provision
by the networks of point-to-multipoint transmission services. These services may
make relays and UPC-selected data routes unnecessary, in which case Unidata likely
will focus attention on data recovery and retrospective data-access methods. These
will be necessary because multicast services are unlikely to provide full reliability.