Drafted by Dave Fulker
Amended and adopted by the Unidata Policy Committee on 26 February 2002
This plan lays out goals, objectives, and strategies intended to provide high-level guidance for Unidata (specifically for the Unidata Program Center, operated as part of the UCAR Office of Programs in Boulder, Colorado) over the next 5-10 years. Consistent with Unidata’s core value of adaptability (see below), the plan will be revisited periodically. An underlying premise is that Unidata, since its inception in 1983, has gained strengths and successes that position the organization to engage in and contribute to a broader mission. As described herein, this broadening will embrace all sciences of the Earth system, including social aspects, helping to transform and enhance education and discovery:
The result is a strategy that moves Unidata—in an evolutionary fashion—toward the contribution of cyberinfrastructure [1] that is useful across all of the geosciences, including their interplay with society. (In this document we make no distinction between the geosciences and the Earth-system sciences.) This strategy is a roadmap to the future, laying out work to be done across the geosciences to utilize emerging technologies and strengthen the social fabric for doing so.
The revised mission for the Unidata Program is: Data, tools, and community leadership for enhanced Earth-system education and research.
Unidata has a well-established set of core values that drive all aspects of this strategic plan as well as the governance and structure of the organization. The changes implied in the plan will be approached so as to preserve the core values. Growth will be evolutionary and natural, rather than forced, and new efforts will be undertaken only when sufficient resources have been gained to assure progress and avoid degradation of services on which the present community depends. Unidata’s core values include the following:
The core community toward which Unidata targets its products and services is composed of post-secondary educators and researchers in Earth-system science and associated disciplines, especially the geosciences. This definition is deliberately broad, encompassing institutions that range from community colleges to research universities, and it ignores international boundaries. However, a degree of primacy is assigned to U.S. institutions in the following ways: 1) voting members in the governing bodies for Unidata are selected primarily from U.S. colleges and universities that offer courses in Earth-system science or associated disciplines; 2) English is the language employed in most Unidata transactions and documentation.
This plan recognizes and intends that certain Unidata products and services will benefit an even broader community:
However, the strategy articulated herein does not count these groups as part of the primary Unidata constituency. Specifically, support services and other staff-intensive activities within the Unidata Program Center are directed toward educators and researchers at post-secondary, academic institutions in the U.S. Others may benefit, however, as there are no constraints or fees on acquiring or using Unidata software and data, except where required to comply with the law, fulfill treaty obligations, meet contractual requirements, or protect intellectual property owned by other parties. In summary, the strategy is to focus Unidata efforts on a core constituency of post-secondary, Earth-system science academics, while intending that benefits accrue to a larger, international circle of educators, learners, researchers, and operational forecasters.
This plan deliberately positions Unidata to help the community prepare for a future that, as articulated in NSF Geosciences Beyond 2000: Understanding and Predicting Earth's Environment and Habitability, includes:
Building on a record of software provision, software support, and innovative community involvement, Unidata is poised for new contributions in an era of unprecedented data complexity, data accessibility, and cross-disciplinary data synthesis. In addition to the data provision that has always been a part of Unidata’s mission, this vision emphasizes providing a rich set of data services and tools, including powerful means for data exploration, software components that users can combine to create new capabilities, and technologies that enhance collaboration, all as part of a larger NSF cyberinfrastructure for the 21st century. Thus equipped, a broader and more diverse Unidata community may be better able to integrate education with research, while studying some of the most challenging scientific problems that our nation faces.
Understanding the environment we live in and how human activities and other natural changes affect it has always been regarded as one of the most important and challenging problems in science. Recent decades have seen a rapid increase in the amount of greenhouse gases in the atmosphere and sharp changes in the climate of the Earth. While there is a continuing debate on the human contribution to the changes, it is now widely accepted that these environmental problems transcend disciplinary as well as international boundaries and in fact require both inter- and multi-disciplinary approaches to solving them. Thus environmental research is high among the scientific priorities of this nation, as articulated, for example, in: Environmental Science and Engineering for the 21st Century (NSF); Grand Challenges in Environmental Science (NAS); Our Common Journey: A Transition Toward Sustainability (NAS), and the aforementioned Geosciences Beyond 2000 document (NSF/GEO).
Various High-Performance Computing and Communications (HPCC) initiatives by federal agencies also have identified weather prediction, ocean modeling, and hydrological modeling as grand challenge problems. To address these, the geoscience community, including researchers, educators, students, and policy makers, will need access to vast amounts of data from myriad sources. Typical among the "grand challenges" is the need to understand all phases of the hydrologic cycle. Clearly, education and research in this area require finding and integrating data from the oceans, the atmosphere, and the lithosphere, as well as from models that cross these traditional discipline boundaries. The Unidata of the future will have a powerfully enabling effect on the study of the hydrologic cycle (and similar multi-disciplinary subjects) by students and researchers alike. Huge volumes of pertinent data already are being generated, so Unidata—consistent with its mission—will focus on how these data are acquired and used. As elaborated in the following two sections, the discovery and use of pertinent data will be simplified, and new capabilities for remote collaboration will reduce the social and geographic barriers that tend to Balkanize researchers, educators, and learners at all levels.
Many of the NSF and NAS reports cited earlier have documented infrastructure needs, including comprehensive data collection and management systems. Because modern environmental studies rely on large numbers of diverse observational platforms and sophisticated models, they require reliable and timely access to data from all such sources. These studies also require special tools to aid in the understanding of basic processes and interactions among the various components of the earth system, i.e., to aid in using the data. In a sense, the data discovery process has become another dimension of the scientific method, complementing theory, experimentation, and simulation as the tools of the trade for researchers, instructors, and students. Future success for both students and researchers thus will depend in part on how well they are served by those software tools that pertain to data discovery and use.
The primitive state of access to most data (from operational data flows to experimental field projects and long-term archives) is similar in many ways to the state of document access before the advent of the Web, the world's largest and most successful distributed system. In pre-Web days, access to documents involved locating the documents, transferring the needed files, converting from one document format to another, and transforming document excerpts into common forms suitable for reading or merging into papers, proposals, presentations, or reports. Accessing most of the data needed for education and research involves finding out where the data are stored, determining what specific files are needed, discovering which of many diverse formats are used, transferring the data to a local site, understanding the meaning of the acquired data, converting the data formats into whatever is required by local visualization and analysis applications, and often re-gridding the data for combination with other data.
Emerging technologies, with which Unidata already is involved, are changing the data-access paradigm. In the not-too-distant future, access to data for analysis and visualization could be almost as simple as document access through a browser interface. Just as our networks and Web browsers now interoperate to support location transparency, so our programs will be able to access, analyze, visualize, and integrate data from either local or remote sources. In the same manner that HTML has become the lingua franca of the Web that enables anyone to publish documents, a common data architecture can provide the means by which our important datasets, and the metadata needed to use them, can be easily published for access by local and remote applications, catalogued by search-engine services, and found by Web browsers and other applications. Furthermore, the tools Unidata plans to support will give their users unprecedented power for visualizing complex phenomena in three dimensions, with animation, particle tracing, and other sophisticated graphics.
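The idea of location transparency can be illustrated with a minimal sketch, written in Python using only the standard library. It is not drawn from any Unidata software: the helper names `open_dataset` and `column_mean`, and the choice of CSV as the data format, are assumptions made purely for illustration. The point is that the analysis code does not change when the data move from a local disk to a remote server.

```python
import csv
import io
import urllib.request
from urllib.parse import urlparse


def open_dataset(location):
    """Return a text stream for a local path or an HTTP(S) URL (hypothetical helper)."""
    if urlparse(location).scheme in ("http", "https"):
        # Remote source: wrap the binary HTTP response so it reads as text.
        return io.TextIOWrapper(urllib.request.urlopen(location), encoding="utf-8")
    # Anything else is treated as a local file path.
    return open(location, encoding="utf-8", newline="")


def column_mean(location, column):
    """Average one numeric column of a CSV dataset, wherever it is stored."""
    with open_dataset(location) as stream:
        values = [float(row[column]) for row in csv.DictReader(stream)]
    return sum(values) / len(values)
```

A caller would pass either a file path or a URL to `column_mean` and receive the same kind of result; only the location string differs.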
Web browsers now support accessing text and images using multiple protocols (FTP, HTTP, NNTP, SMTP, and others) and formats (HTML, PDF, text, GIF, RealAudio, for example). Similarly, a remote data access infrastructure can be flexible enough to support multiple data access protocols (FTP, DODS, LAS, SQL, ADDE, and others) and data formats (netCDF, HDF, GRIB, GIS, Excel, among others). Just as legacy documents can be made available without converting them to new formats by dynamic generation of HTML, legacy data can be made available to applications by servers that convert data slices on-the-fly and dynamically generate derived data in forms requested by clients. Finally, given that it is practical today to find documents using search engines without a complete solution to the metadata discovery problem, useful search and data discovery may be practical without first finding a perfect representation for scientific metadata.
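As a rough illustration of a server converting legacy data on the fly, the following Python sketch keeps a small registry of decoders and returns only the slice and fields a client asks for. Everything here is hypothetical: the registry `DECODERS`, the function `serve_slice`, and the two toy text formats stand in for real formats such as netCDF or GRIB, which would need their own libraries.

```python
import csv
import io
import json


def decode_csv(raw):
    """Decode comma-separated text into a list of records."""
    return list(csv.DictReader(io.StringIO(raw)))


def decode_json_lines(raw):
    """Decode one-JSON-object-per-line text into a list of records."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


# Hypothetical registry: legacy format name -> decoder function.
DECODERS = {"csv": decode_csv, "jsonl": decode_json_lines}


def serve_slice(raw, source_format, start, stop, fields):
    """Decode legacy data and return only the requested records and fields."""
    records = DECODERS[source_format](raw)
    return [{name: rec[name] for name in fields} for rec in records[start:stop]]


legacy = "time,temp_c,station\n0,11.5,KDEN\n1,12.0,KDEN\n2,12.4,KDEN\n"
subset = serve_slice(legacy, "csv", 1, 3, ["time", "temp_c"])
print(json.dumps(subset))
```

The stored data are never rewritten; supporting another legacy format in this sketch means adding one decoder to the registry, while clients keep issuing the same request.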
The existence of browsers and the Web provided benefits that compounded as their presence became widespread, stimulating ever greater use of HTML documents and HTTP servers. Similarly, a "Data Web" constructed from existing component protocols and formats for remote data and metadata access can yield compound benefits as usage increases. In particular, it would simplify study of "grand challenge" problems as described in the preceding section. Such an environment will become a reality only when it is as easy to publish data (along with metadata and value-added services) as it is to publish documents on the Web. Unidata might even play a role analogous to that of CERN in catalyzing such a vision by building the prototype infrastructure that demonstrates its practicality.
Note: Some of the above material appeared earlier in another document, titled UCAR Data and Information Services (UDIS), that Unidata's Russ Rew helped develop as part of the recent NCAR Strategic Plan.
Direct community involvement has always been a primary feature of the program's success. Three exemplars may be cited:
In the Unidata of the future, the role of "community" will become even more prominent, as new tools for collaboration mature and are disseminated. As the Internet continues to transform the higher education environment, Unidata participants will gain new capabilities for student-student and student-researcher collaborations as a means for enhanced learning. Software designed specifically for collaborative data analysis, combined with more generic multi-media capabilities for remote conferencing, will increase the amount and effectiveness of scientific collaboration, including many more conversations that cross traditional discipline boundaries. Advanced techniques for information retrieval will help community members exploit the Unidata support archives, which contain answers to a great many questions about the usage of Unidata software and data flows. Additionally, members of the Unidata community will see enhanced means for learning about and contacting one another, based on common interests.
These capabilities will support a variety of changes that may be anticipated in the geoscience community, in pedagogy, and even in society, as predicted by Charlie Murphy in his role as Chair of the MetApps Task Force:
The growth and increasing sophistication of the Internet forces us to think in terms of a distributed environment for both learning and research. Users and data resources will be widely spread across the Internet in the near future and user tools must be network capable and intelligent. Additionally, some educational activities may take advantage of the distributed nature of computing, particularly collaborative learning. Students will find the most effective ways to maximize the use of personal devices and the network, as educators struggle to integrate and formalize online and collaborative learning for their courses. Students will no longer be restricted to being physically present in a class session. They may be able to take online courses or participate directly in a class session at home with high speed video and audio as well as software and data. Distance learning will be easier to implement and, at the same time, existing modes of distance learning such as interactive TV seem likely to disappear.
Universities and departments will continue to develop and distribute their own educational materials; however, the ease of accessing information over the net will promote a greater sharing of resources. Users will be able to focus more on the pedagogic or research aspects of an activity and much less on networking and other technological considerations. Further, it is quite likely that there will be an enhanced role for commercial products that support databases, search methodologies, developing personal data profiles, and the pre-delivery of data. Also, community-based cooperative projects such as DLESE will become a major source of educational materials and information.
During the past five or six years, there has been a great deal of attention given to the changing nature of earth science with terms such as "Earth System Science" emerging. Globalization, an increasing world population, and the continuing challenge of balancing environmental quality with economic growth, have produced a heightened awareness that the problems we face are more complex than originally supposed, and they require interdisciplinary expertise for their solution. This is true in many areas but especially so in the geosciences. Curricular changes will present a need for integrative tools and access to distributed data sources. The nature and extent of cooperation across the disciplines in the earth sciences remains uncertain, but it is clear there will be a need for tools that are flexible and adaptable in meeting the needs of a broader community.
Changes in society will have a significant impact on the future of application software. Certainly changes in the employment market, nationally and internationally, will be important. Curricula will reflect these shifts, and the training of students, and hence the applications that they use, will have to reflect these changes as well, working with different forms of data, integrative approaches, etc. Also, societal values and a desire of the populace to know more about geoscience and environmental issues, coupled with access to data via the Internet, point to a need for general-purpose, easy-to-use educational tools.
Despite a lack of experience in the implementation of instructional technology and limited examples of its educational efficacy, the use of technology is beginning to mature for pedagogic applications. The educational usage of technology will continue to grow because of younger faculty who are more experienced and comfortable with technology and institutions that recognize technology as a cost-effective method of instruction. Technology will assume a central role in collaborative learning, in efforts to build better modes of learning, in distance and online learning, and in the competition for students.
Clearly this document envisions a community that is strengthened in many respects. However, the Unidata role is focused on just a few of them, consistent with the program's mission and experience. The relationship between broad community transformation and the Unidata focus may be articulated as follows:
Within a decade, the Unidata community of geoscience researchers, educators, and students will be part of a transformation that includes the elements listed below, with italicized phrases reflecting areas in which Unidata will make significant contributions:
In this transformation, Unidata's catalytic role will fall within a niche that is defined by two basic principles, both of which are firmly grounded in Unidata's record of success but are expected to extend the benefits to a larger fraction of the geosciences community than Unidata presently serves. These principles define the overarching strategic framework for Unidata's future:
Principle 1: Unidata's primary focus is to provide software infrastructure and advocate positions that advance community capabilities for acquiring, organizing, and using geoscience and geographic data and for fruitful discourse on the use of data in Earth-system education and research.
Principle 2: Community members are engaged as owners of and contributors to Unidata, leveraging the power and scalability of decentralized, distributed computing. In essence, the Unidata Program Center performs no function that can be performed equally well by individual community members.
For the next 5-10 years, Unidata will pursue its mission and vision by focusing on five goals.
These goals are elaborated with lists of objectives in the following subsections.
Three objectives underlie Unidata's goal to foster and support the existence of real-time data flows that encompass a broad range of Earth-system phenomena, can be accessed with ease by all constituents, and are self-managing with respect to changing contents and user needs:
Five objectives underlie Unidata's goal to provide tools that facilitate representing, analyzing, and visualizing observed or synthesized data, specializing in large data sets referenced to the geoid:
Six objectives underlie Unidata's goal to foster and support the creation of digital holdings that contain well-described data on the Earth system and that are structured and catalogued for effective remote access:
Three objectives underlie Unidata's goal of employing technology to amplify human capabilities for communication, collaboration, and information discovery among the members of its community:
Unidata intends to engage a highly diverse population of educators and researchers as core constituents, by providing a targeted set of support services and fostering meaningful community participation. Objectives supporting this goal are:
The major strategies for achieving the vision and, more specifically, realizing the above goals and objectives are:
These strategies are elaborated with likely tactics in the following subsections. Asterisks indicate where supplemental (non-NSF or non-ATM) funding has been gained or may be sought. Items in italics are called out either because they are of especially high priority or because they represent significant departures from current Unidata practice.
To achieve the level of detail required to describe tactics, the following sections incorporate a large number of acronyms. The published version of this strategic plan will include a glossary.
Unidata has had a transforming effect on the culture of real-time data access, and a strategic priority is to extend this transformation into the arena of shared, well-structured data collections, i.e., retrospective data access, accompanied by a culture of individual and institutional open data sharing. Underlying technical advances needed for such collections are discussed in Section 5.3, and the tactics for realizing them include:
A relatively rich set of data types and data sources is available to present Unidata users (in near-real time), but it is rather meteorology-oriented. Hence, a key mechanism for reaching the goal of increased disciplinary breadth is to broaden these data resources, capitalizing in particular on enhanced LDM capabilities (see Section 5.3). However, consistent with the notion of incremental and evolutionary growth, our strategy is to work at the margins, gaining access to new types and sources of data that interest current users and new ones. Specific tactics include:
Unidata's reputation is most strongly linked to the high-caliber tools, and the associated user support, that are provided on a no-cost, open-source basis. Even the zero-cost data flows for which Unidata is well known ultimately depend on software, running reliably and in synchrony on the campuses of many users. Remaining among leading-edge software developers is a challenge and a priority, manifest in the following tactics:
Unidata’s success ultimately depends on a satisfied, participating community of users that manifests increasing diversity along disciplinary and other dimensions. The following tactics are likely to be employed, although some will require study and refinement, in part to ensure congruence with the NSF strategy in "People - A diverse, internationally competitive and globally-engaged workforce of scientists, engineers and well-prepared citizens."
Listed below are changes that colleges and universities appear likely to face in the 21st century. We believe the strategies articulated above will position Unidata to be a helpful partner in addressing these changes and a significant player in the NSF’s vision for a national cyberinfrastructure that serves all citizens.
However, all such lists are speculative, so we reiterate one of the Unidata values articulated in Section 2: adaptability to changes in technology, data availability, and user needs. Hence, this plan will be revisited periodically.
[1] The term cyberinfrastructure is drawn from an NSF document, http://www.nsf.gov/dir/index.jsp?org=OCI, referencing the advancing technologies on which contemporary research and education increasingly rely.