Principles Underlying Internet Data Distribution

Adopted by the Unidata Policy Committee, 29 June 1994
Drafted by Dave Fulker

The Internet Data Distribution (IDD) system is a means by which Unidata universities can build and keep current their holdings of environmental data, especially those updated in near-real time. IDD is a "distributed application," with interacting components (data sources, data relays and data sinks) at many locations nationwide. Responsibilities for running and maintaining the IDD system also are distributed, on the assumption that proper balances among cost, data needs, performance and flexibility are best achieved through community effort--organized and guided by the Unidata Program Center (UPC)--rather than a more centralized endeavor.

Elaborated in subsequent sections are eight key principles that reflect the above purpose and underlie the IDD system design:

  1. Data Reception Implies Relay Responsibilities
  2. The UPC Acquires Data of Very High Interest
  3. The UPC Chooses Routes for High-Interest Data
  4. Routing Is Ad Hoc for Data of Lesser Interest
  5. The High-Interest Category Is Defined by Actual Use
  6. Incentives and Criteria Exist for High-Level Relays
  7. The LDM Design Facilitates a Community Endeavor
  8. The Internet Will Evolve to Simplify the IDD

1. Data Reception Implies Relay Responsibilities

Those who receive information via IDD relay data to other Unidata users where practical and needed. Participants generally install components of the IDD system in computer and network settings sufficient to relay all data received to a minimum of one other site. This implies that users constrain their data requests to fall well within the capacities of their computers and networks.

2. The UPC Acquires Data of Very High Interest

To obtain rates possible only through "bulk purchases" or special agreements with agencies and providers, the UPC acquires widely used data streams for universities. In some cases, universities pay nothing to receive such data; in other cases (where usage is lower or bulk purchases are impractical) they pay the data providers, but costs are discounted through UPC-negotiated agreements. Unidata acquisition does not imply that the data flow through the UPC; the UPC will not become a data center or hub for IDD.

3. The UPC Chooses Routes for High-Interest Data

The UPC identifies effective routes for high-interest data from each source to each recipient. Routes are established both for normal conditions and for outages, which may occur at any relay. In general, suggested routes are chosen to optimize performance and minimize redundant network traffic; in time, routing may become more dynamic, especially for data (such as from radars) for which demand is intermittent. Routings may be distinct for each data source and even for subsets from a single source. As the IDD system matures, the UPC will make fewer routing decisions and move toward monitoring the system and providing information and automation to simplify users' routing and management decisions.

4. Routing Is Ad Hoc for Data of Lesser Interest

For data of low to moderate interest, the UPC offers little guidance on routing or other management functions. The exception to this principle is that the UPC discourages sites from handling lesser-interest data in ways that interfere with high-interest data flows. In other words, lesser-interest flows must not consume system and network capacities to the point that significant degradations occur for high-interest data. This principle implies that any organization with adequate system and network capacities may become a provider simply by offering data, and those data become part of the IDD system when one or more Unidata sites choose to receive them.

5. The High-Interest Category Is Defined by Actual Use

The classification of data as high interest is determined by actual use, as indicated by the number of recipient universities. In consultation with the Unidata Policy and Users Committees, the level at which data become high interest is set to accord with resource constraints and other factors.
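
As a rough illustration of classification by actual use, the sketch below counts recipient universities per data stream and flags the streams that meet a threshold. The stream names, university names, threshold value and function name are all hypothetical; the actual level is set in consultation with the committees and is not specified here.

```python
from typing import Dict, List, Set


def high_interest_streams(recipients: Dict[str, Set[str]], threshold: int) -> List[str]:
    """Return the streams received by at least `threshold` universities.

    `threshold` is an assumed parameter standing in for the level set in
    consultation with the Policy and Users Committees.
    """
    return sorted(
        stream for stream, sites in recipients.items() if len(sites) >= threshold
    )


# Hypothetical usage figures: stream name -> set of recipient universities.
usage = {
    "stream_a": {"univ1", "univ2", "univ3", "univ4"},
    "stream_b": {"univ1", "univ5"},
    "stream_c": {"univ2", "univ3", "univ6"},
}

print(high_interest_streams(usage, threshold=3))
```

With these made-up figures and a threshold of three, stream_a and stream_c would be classed as high interest, while stream_b would remain in the lesser-interest category.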

6. Incentives and Criteria Exist for High-Level Relays

There are natural incentives to serve as high-level relays (being close to the data sources increases reliability and reduces latency), but not every site can do so effectively. To ensure satisfactory reception at dependent sites, the criteria for high-level relays are quite stringent, especially regarding system and network capacities: top-level relays must feed between five and ten second-level relays, which means they must handle six to eleven times the data volume required to meet local needs. So that there are enough such relays, well maintained and well positioned with respect to the underlying network, Unidata encourages the National Science Foundation to offer additional incentives, such as grants for equipment and other resources targeted to meet this need.
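
The "six to eleven times" figure follows from counting one copy of the data for local use plus one copy for each second-level relay fed. A minimal sketch of that arithmetic, assuming (for illustration only) that every downstream relay is sent the same volume the site ingests for itself; the function name is hypothetical:

```python
def relay_load_factor(downstream_relays: int) -> int:
    """Data volume handled, as a multiple of the volume needed locally.

    Assumption: one copy is received for local use and one full copy is
    sent to each downstream relay.
    """
    if downstream_relays < 0:
        raise ValueError("downstream_relays must be non-negative")
    return 1 + downstream_relays


# The range stated for top-level relays: five to ten second-level relays.
for n in (5, 10):
    print(f"{n} downstream relays -> {relay_load_factor(n)}x local volume")
```

Under this assumption, feeding five relays gives a factor of six and feeding ten gives a factor of eleven, matching the range stated above.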

7. The LDM Design Facilitates a Community Endeavor

The design of the Local Data Manager (LDM) software--on which IDD is based--facilitates adherence to these principles. Beyond offering ease of use and support for local control, LDM software collects statistics on performance and usage, and it eventually will utilize UPC-selected routes by default. The statistics are used to identify high-interest data and to diagnose problems at relay sites. As the LDM matures, routing will become simpler and increasingly automated.

8. The Internet Will Evolve to Simplify the IDD

The IDD system gains capacity and economy by exploiting new Internet services and capabilities as they materialize. Of particular note will be the provision by the networks of point-to-multipoint transmission services. These services may make relays and UPC-selected data routes unnecessary, in which case Unidata likely will focus attention on data recovery and retrospective data-access methods. These will be necessary because multicast services are unlikely to provide full reliability.