Report of the Task Force on the Future of Unidata Analysis and Visualization Applications

Russ Rew, Steve Chiswell, Don Murray, and Tom Yoksas
Revised April 11, 2006

Introduction

The October 2005 Policy Committee asked the Unidata Program Center to report on its plans for analysis and visualization software, specifically providing a rationale for the current and future allocation of resources to the development and maintenance of its current mix of packages. This report is the response to that request.

Unidata currently supports three application packages for analysis and visualization: GEMPAK, IDV, and McIDAS. As a result, the Unidata Program Center (UPC) also supports three communities of users. This raises some questions:

This report will attempt to answer these questions. In formulating answers, we have become convinced that issues concerning decoder software are also crucial to planning the development, maintenance, or sun-setting of analysis and visualization applications.

Background

Since the founding of Unidata in the 1980s, one part of its mission has been to continue the development and support of analysis and visualization applications for the "core" Unidata community. Initially, the common discipline that defined the core community was weather analysis and forecasting (synoptic meteorology). Very little use has been made of Unidata applications in atmospheric chemistry, air pollution, or other environmental sciences, although more recently some Unidata tools and data have been employed to a lesser extent in operational meteorology, hydrology, climatology, oceanography, and other earth sciences.

From the outset, there was a realization that no single application package met all the requirements of the core Unidata community, which spanned a wide range of uses, from quick looks at current weather and visualizations in classrooms to sophisticated analyses for advanced research and high-quality plots for publications. The initial Unidata plans included both a PC-oriented application requiring only modest computing resources (PC-McIDAS, derived from SSEC's mainframe McIDAS software) and a workstation-oriented application for more advanced uses (GEMPAK from NASA Goddard). A third application suite (WXP from Purdue) was supported at the UPC for scripting and quick construction of campus weather displays of near real-time data and satellite images.

In 1992 Unidata (with its governing committees) reluctantly decided to "sunset" support for the WXP application suite, removing it from its core packages and leaving only GEMPAK and McIDAS as officially recommended applications. A lack of programmer resources to continue to support all three applications led to the decision to drop WXP. Long after UPC support ceased, the primary developer of WXP continued to maintain and support the package for remaining WXP users in the Unidata community.

We learned at least two lessons from sun-setting support for WXP:

  1. An application can survive without UPC support if it has both an active community and a heroic developer.
  2. Once users become proficient at accomplishing a set of tasks with a particular application, it is nearly impossible to get most of them to transition to using a different application.

In 1998, a User Committee MetApps task force confirmed a need for developing a new analysis and visualization application, with several goals in mind:

This led to a Java development project to prototype applications to evaluate the suitability of Java for platform-independent Unidata software, ultimately resulting in the IDV and useful platform-independent middleware. The original requirement for the IDV was to develop an integrated application with broad functionality similar to, but not replacing, GEMPAK and McIDAS. Governing committees agreed that extensibility to handle new data types, exploring new approaches to visualizing and interacting with Earth system data, and focusing on techniques that fuse data from multiple sources were more important goals than matching features in existing applications.

Strengths and limitations of the applications

McIDAS

McIDAS-X is in use at approximately 100 Unidata sites in this country and abroad. The costs of supporting McIDAS-X at the UPC are less than 0.5 FTE. Tom Yoksas handles maintaining, upgrading, testing, distributing, and supporting McIDAS-X and the associated McIDAS-XCD decoder software. He also teaches the McIDAS Workshop, serves as liaison to the McIDAS developers, supports the user community by developing and maintaining ADDE servers for both McIDAS and the IDV, and helps add servers for new data types when appropriate. Providing the McIDAS data streams to Unidata sites is an additional cost, but the data is also used by GEMPAK and IDV users, so that cost could not be recovered by dropping McIDAS support.

Unique capabilities provided by McIDAS include:

Limitations of McIDAS-X include difficulty of modifying it to handle data from new hyperspectral instruments, lack of multipanel displays, and various fixed size limits. These limitations have led SSEC to plan a transition to McIDAS-V, software based on IDV and VisAD that will include support for current capabilities of McIDAS-X. The SSEC transition will ensure that current users can continue to use their own McIDAS-X code.

GEMPAK

GEMPAK is in use at over 200 sites in this country and abroad, and its use is still growing. The costs of supporting GEMPAK are about 0.7 FTE. Steve Chiswell handles maintaining, upgrading, testing, distributing, and supporting GEMPAK. He also teaches the GEMPAK Workshop, serves as liaison to the GEMPAK developers, supports the user community, helps develop and maintain GEMPAK decoders, and helps provide support for new data types.

Unique capabilities provided by GEMPAK include:

GEMPAK has limitations that have kept its use from growing even faster. Unlike McIDAS and IDV, it is not supported natively on Windows platforms. Grid forecast times are presently constrained to HHHmm format, which limits use with long-duration climate simulations. It cannot handle Gaussian grids, Eulerian grids, or other irregular grids except by interpolating them onto a regular grid first. GEMPAK lacks the ability to directly access data from remote servers, which limits it to analysis and visualization of data stored locally. GEMPAK's grid diagnostics and algorithms are customized for use with weather and forecasting data, so it is less likely to be generally useful for other kinds of geoscience data.
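To make the forecast-time constraint concrete, the following Python sketch works out how far ahead an HHHmm field can reach. It assumes that HHH means three decimal digits of forecast hours and mm means two digits of minutes; that reading comes from the format name used above, not from GEMPAK source code, and the function names are hypothetical.

```python
HOUR_DIGITS = 3    # assumption: "HHH" is three decimal digits of hours
MINUTE_DIGITS = 2  # assumption: "mm" is two decimal digits of minutes


def max_span_days() -> float:
    """Longest forecast lead time (in days) that fits in an HHHmm field."""
    max_hours = 10 ** HOUR_DIGITS - 1  # 999 hours
    max_minutes = 59                   # minutes are capped at 59
    return (max_hours + max_minutes / 60) / 24


def encode_forecast_time(hours: int, minutes: int = 0) -> str:
    """Hypothetical encoder: format a lead time as HHHmm or raise ValueError."""
    if not 0 <= minutes < 60:
        raise ValueError("minutes must be in 0..59")
    if not 0 <= hours < 10 ** HOUR_DIGITS:
        raise ValueError(f"{hours} hours does not fit in {HOUR_DIGITS} digits")
    return f"{hours:0{HOUR_DIGITS}d}{minutes:0{MINUTE_DIGITS}d}"


ten_day_forecast = encode_forecast_time(240)  # a 10-day lead time fits
span = max_span_days()                        # a little under 42 days
```

Under these assumptions a lead time of one year (8760 hours) cannot be encoded at all, which illustrates why multi-year climate simulations are awkward to represent.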

IDV

The IDV is in use at over 80 sites, but its use is growing more rapidly than that of GEMPAK or McIDAS. Currently the direct costs of supporting the IDV are about 2.0 FTE. Don Murray and Jeff McWhirter handle developing, maintaining, testing, distributing, documenting, and supporting the IDV. In addition, the IDV depends on SSEC VisAD software, McIDAS ADDE servers, and Unidata-developed middleware such as THREDDS, Java netCDF, and Common Data Model interfaces for other kinds of data such as GRIB and radar data formats. Don and Jeff also teach the IDV workshops, participate with the IDV Steering Committee in planning future developments, help publicize the IDV, and add support for new data types.

Unique capabilities provided by IDV include:

Current IDV limitations include performance problems in some areas, inability to produce publication-quality plots, and problems with staggered grids and with derived quantities involving derivatives.

Is a single application suite desirable or practical?

With this background, it is now easier to understand why the UPC is supporting three application packages. In the Unidata 2008 proposal, the UPC committed to support and maintain both GEMPAK and McIDAS as long as they have a "substantial user base". The user base of the IDV has also become substantial and continues to grow. With three mostly independent user communities and the tendency of users to stick with what they know, there is little reason to give high priority to incorporating new features in an application just because they are supported by one of the other applications. Largely because each application has unique capabilities and its own dedicated user community with a natural resistance to change, the UPC has responded by supporting all three application packages.

From the above, it also seems clear that the UPC costs of supporting multiple applications are currently modest. Dropping support for either McIDAS or GEMPAK might save about 0.5 to 0.7 FTE, but would have a significant effect on that application's users, who would no longer benefit from the experienced, free, high-quality UPC support.

The UPC currently lacks the development resources of NCEP and SSEC for developing and maintaining GEMPAK and McIDAS. Even if we dropped GEMPAK and McIDAS and dedicated all the resulting UPC resources (1.2 FTE) to IDV development, we could not duplicate all the functionality of GEMPAK and McIDAS in the IDV while keeping up with new additions to the external packages. We could apply more development resources to applications by cutting back on our infrastructure and collections efforts (netCDF, THREDDS, OPeNDAP, ...), but that would distort Unidata's traditional balance between well-grounded applications and cyberinfrastructure. By leveraging external development efforts at NCEP and SSEC for the benefit of the Unidata community, we can stay current with the latest versions of GEMPAK and McIDAS and also port some of the best new concepts from GEMPAK and McIDAS into the IDV.

Decoder Issues

A potential for duplication of effort arises when a new data type of interest to all three user communities, for example Level 2 NEXRAD data, is added to Unidata's data streams. New decoders may be required to convert the data from the form in which it is transported to an alternate form for access by the application. For GEMPAK, the decoders necessary to deal with new data types are often developed at NCEP, and likewise SSEC develops some of the new McIDAS decoders. In such cases there may be duplication of effort, but the UPC resources needed to incorporate the new decoders are modest.

In other cases (e.g. GRIB2), the application may have already been modified to access data directly from the form in which the data is transported, in which case new decoders are not needed. However, accessing data on-the-fly from its transport form means it gets decoded every time it is accessed, rather than once into a decoded file. For some datasets, decoders are needed to ensure adequate performance of data access.
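The performance trade-off between decoding on every access and decoding once can be sketched in a few lines of Python. This is only an illustration under stated assumptions: toy_decode is a hypothetical stand-in for a real decoder, the sample message is invented, and an in-memory dictionary plays the role of the decoded file.

```python
from typing import Callable, Dict


class DecodeOnceCache:
    """Decode each transport-form message at most once and reuse the result."""

    def __init__(self, decode: Callable[[bytes], dict]) -> None:
        self._decode = decode
        self._decoded: Dict[str, dict] = {}
        self.decode_calls = 0  # counts how often the costly decode step runs

    def get(self, key: str, raw: bytes) -> dict:
        if key not in self._decoded:
            self.decode_calls += 1
            self._decoded[key] = self._decode(raw)
        return self._decoded[key]


def toy_decode(raw: bytes) -> dict:
    """Hypothetical decoder: parse 'name=value' pairs separated by ';'."""
    fields = raw.decode("ascii").split(";")
    return dict(field.split("=", 1) for field in fields if field)


cache = DecodeOnceCache(toy_decode)
message = b"station=KDEN;temp_c=12.5"  # invented sample message

first = cache.get("msg-001", message)   # decoded here
second = cache.get("msg-001", message)  # served from the cache
```

After both calls, cache.decode_calls is 1. Accessing the transport form directly corresponds to calling toy_decode on every access, so the decoding cost grows with the number of reads rather than the number of messages.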

Trends and External Events

External trends, events, and decisions over which Unidata has no control are likely to affect the future of Unidata applications in ways that may be difficult to anticipate. Nevertheless, we can discuss the effects of what we see as the most likely external developments.

The future of GEMPAK

It seems unlikely that NCEP will drop GEMPAK (NAWIPS) support soon, because they are starting to rewrite large portions of GEMPAK in C, to eliminate fixed-size grids and other limitations of the current Fortran implementation. However, plans change and the UPC must still be prepared to deal with contingencies.

If NCEP dropped support for GEMPAK due to budget cuts or in order to increase support for the operational AWIPS software, Unidata would lose the ability to leverage NCEP's decoder developments, and we would likely fall behind in being able to deal with new data as soon as it became available. We would then always be trying to catch up with changes in the data streams after they occurred, and might not be able to provide application access to data stream changes without additional resources. There is no significant independent external GEMPAK development community on which we could depend.

In that case, we should also plan to announce a transition away from GEMPAK. We could plan to support the current version as long as feasible, stop adding new capabilities, maintain GEMPAK in the face of data stream changes for up to a year, and use our development efforts to set up an open source repository for GEMPAK source to permit continuing community support. If such ongoing support failed to materialize, some of the GEMPAK user community might have difficulty finding an adequate alternative.

If we need to transition users away from GEMPAK or McIDAS, we must be prepared for an increase in support over the transition period. In this case we should also allocate resources for moving the sources to something like SourceForge or GForge (software development web sites that host open source repositories and tools for collaborative development of community software).

The future of McIDAS-X and McIDAS-XCD

The future of McIDAS-X and the McIDAS-XCD decoders will in large part be determined by SSEC's current efforts to move to McIDAS-V, the planned replacement that uses VisAD, the IDV, and additional development for backward compatibility. SSEC is also transitioning from ADDE data servers for McIDAS to OpenADDE. They will continue to support and upgrade McIDAS-XCD decoders for new data sources. McIDAS-X is now in maintenance mode, but SSEC will continue development of new data servers.

Unidata plans to follow SSEC's lead in transitioning McIDAS users to McIDAS-V. Since this will have Unidata's IDV as a significant subset, getting McIDAS users to try IDV even before the release of McIDAS-V could smooth the transition. SSEC is seeking resources to ensure backwards compatibility of McIDAS-V for current McIDAS-X users. If McIDAS-V development is delayed, the UPC should still plan to gradually move McIDAS users to the IDV, although the UPC will not have access to resources needed to ensure backwards compatibility.

Community support

With the notable exception of WXP, community support for Unidata software has not been very successful, and it would not be wise to depend on a community support model for complex analysis and visualization application packages. Open source software has proved very successful when there are thousands of users or dozens of developers for a package, but many open source projects languish without a critical mass of enthusiastic developers. Although the Unidata community is large, it is probably not large enough to sustain pure open source development and support of its major infrastructure and applications software packages.

On the other hand, UPC efforts to provide plug-in frameworks for applications and data access infrastructure can work in leveraging community support and development efforts.

A vision for future Unidata applications

We're racing towards a future in which common data services will support the creation, archiving, cataloging, discovery, access, analysis, visualization, and curation of scientific data. Publish-subscribe (push) systems for data collections may be as common as client-server (pull) systems. With a publish-subscribe architecture, the addition of data to a collection generates an event that causes automatic cataloging of the data with appropriate metadata and notification of all programs and services that subscribe to matching metadata patterns.
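The publish-subscribe behavior described above can be sketched as a small in-process Python example. It is a minimal illustration, not a description of any existing Unidata service: the DataCollection class, the metadata keys, and the sample values are all hypothetical, and shell-style wildcards stand in for "matching metadata patterns".

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, str]
Handler = Callable[[Metadata], None]


class DataCollection:
    """Hypothetical collection that catalogs data and notifies subscribers."""

    def __init__(self) -> None:
        self.catalog: List[Metadata] = []
        self._subscribers: List[Tuple[Metadata, Handler]] = []

    def subscribe(self, pattern: Metadata, handler: Handler) -> None:
        """Register a handler for data whose metadata matches the pattern."""
        self._subscribers.append((pattern, handler))

    def publish(self, metadata: Metadata) -> None:
        """Adding data is the event: catalog it, then notify matching subscribers."""
        self.catalog.append(dict(metadata))
        for pattern, handler in self._subscribers:
            if self._matches(pattern, metadata):
                handler(metadata)

    @staticmethod
    def _matches(pattern: Metadata, metadata: Metadata) -> bool:
        # Every key in the pattern must be present and match its wildcard.
        return all(
            key in metadata and fnmatchcase(metadata[key], glob)
            for key, glob in pattern.items()
        )


collection = DataCollection()
radar_events: List[Metadata] = []
collection.subscribe({"type": "radar", "station": "K*"}, radar_events.append)

collection.publish({"type": "radar", "station": "KFTG", "time": "2006-04-11T12:00Z"})
collection.publish({"type": "grid", "model": "GFS", "time": "2006-04-11T12:00Z"})
```

Both items end up in the catalog, but only the radar item reaches the subscriber. A real system would deliver notifications across processes and machines; the in-memory list here only shows the flow of events.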

This kind of service-oriented architecture makes it possible to build applications by invoking and dynamically composing reusable services. Unidata can be part of this future by turning some of its software into services and by building its software to take advantage of standard services offered by others. Delivering software that is built from distributed services rather than from components, objects, functions, or statements is a major change in the way software is developed and deployed. The potential benefits for users are significant, because software built from services can evolve to take advantage of advances without redeployment or recompilation of new versions. As an example, new location-based applications have been built quickly by combining various data access services with the service interfaces for Google Maps and Google Earth.

Exposing GEMPAK's powerful grid diagnostics as services would make them available to other applications, providing a simple integration of applications, since services are independent of implementation language. Eventually IDV and McIDAS-V might make use of such services to provide GEMPAK's grid diagnostics to users. Ultimately, monolithic application packages may fade away, replaced by custom-tailored software configured dynamically from services to provide just the functionality needed for a specific use.
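As a rough sketch of what a language-independent diagnostic service might look like, the Python example below accepts a JSON request and returns a JSON reply, so a caller needs no knowledge of the implementation language. Everything here is an assumption made for illustration: the request fields, the registry, and the simple finite-difference routine named ddx are invented and are not GEMPAK's actual interface or algorithms.

```python
import json
from typing import Callable, Dict, List

Grid = List[List[float]]


def ddx(grid: Grid, dx: float) -> Grid:
    """Finite-difference x-derivative: centered inside, one-sided at the edges.

    Each row needs at least two points.
    """
    result = []
    for row in grid:
        last = len(row) - 1
        out = []
        for i in range(len(row)):
            lo, hi = max(i - 1, 0), min(i + 1, last)
            out.append((row[hi] - row[lo]) / ((hi - lo) * dx))
        result.append(out)
    return result


# Hypothetical registry mapping service names to implementations.
DIAGNOSTICS: Dict[str, Callable[[Grid, float], Grid]] = {"ddx": ddx}


def handle_request(request_json: str) -> str:
    """Service entry point: JSON text in, JSON text out."""
    request = json.loads(request_json)
    name = request["diagnostic"]
    if name not in DIAGNOSTICS:
        return json.dumps({"error": f"unknown diagnostic: {name}"})
    result = DIAGNOSTICS[name](request["grid"], request["dx"])
    return json.dumps({"diagnostic": name, "result": result})


request = {"diagnostic": "ddx", "grid": [[0.0, 2.0, 4.0, 6.0]], "dx": 2.0}
reply = json.loads(handle_request(json.dumps(request)))
```

Because the exchange is plain JSON text, the same request could come from a Java client such as the IDV or from a script. Putting handle_request behind a network protocol is left out to keep the sketch short.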

Conclusions

We estimated that at least 5 developers at Unidata would be required to equal NCEP's development work on GEMPAK and decoders. For a 0.7 FTE investment in developer time, we are leveraging the NCEP work to serve the needs of a large Unidata GEMPAK community. So long as NCEP continues GEMPAK development and maintenance, the UPC should continue to integrate that work into a Unidata GEMPAK distribution that works well with other Unidata software. Redirecting the efforts we are devoting to GEMPAK to a different applications package would not be a wise reallocation of resources.

Similarly, it would be costly to Unidata's McIDAS users to redirect the less than 0.5 FTE investment in McIDAS development and support to a different application package. Our McIDAS plans should remain aligned with SSEC's plans to transition the current McIDAS-X based software into a VisAD-based system that builds on existing capabilities of the IDV.

The UPC could make good use of more Java development resources to implement various planned enhancements to the IDV, but not at the expense of the current GEMPAK and McIDAS user communities. The IDV has been very successful in integrating data from various sources, in providing useful visualizations of georeferenced data from other geoscience disciplines, in demonstrating the usefulness of innovative display technologies, and in providing an end-to-end application for testing and improving Unidata's data collections and data access infrastructure software.

These conclusions amount to a justification for and endorsement of the status quo. Rather than reallocating application development resources, the UPC should continue to try to acquire other resources for IDV development and maintenance, for example through cyberinfrastructure solicitations. The UPC should also attempt to identify ways to take advantage of user community experience for help in answering simple IDV support questions (for example, IDV user forums on the Unidata Web site). This would allow the current developers to spend less time on support and more time on development as the IDV user community grows.


Last modified: Thu Apr 13 13:10:55 MDT 2006