How you choose to manage your project data will depend on a variety of factors, including the size and type of project, volume of data to be collected, how you wish to make data available during and after the project, and the resources you have available for data management and storage. While we cannot anticipate your individual situation, there are common themes that apply to many types of project; these can be distilled into a set of “best practices” for data management in general.
DataONE maintains a useful and comprehensive Best Practices database, which we encourage you to peruse. In this section, we will cover some of the most important aspects, focusing on issues and tools of particular relevance to the Unidata community.
Thinking about how your project data will be gathered, stored, used, and shared can provide valuable insights that motivate the choices you make when managing your data. The process of collecting and using scientific data can be thought of as a lifecycle with some or all of the following stages:
Describing the data (and metadata) to be collected, how it will be stored, managed, and made accessible to project collaborators and others. Many funding agencies will require you to provide a synopsis of your plan along with a grant application.
Observing the phenomena to be studied, and storing the data in digital form.
Inspecting the collected data to ensure that it is of high quality and truly represents the quantities it claims to describe.
Providing a thorough description of the data (quantities measured and their units, collection methods, etc.) using the metadata standards appropriate for your scientific community.
Providing access to the digital data to others, whether they be project collaborators, peers, reviewers, or the scientific community at large. Project data may be shared with different groups at different times.
Bringing together data from different sources in a way that allows you to treat them as a single data set.
Using project data to “do science.” Analysis may generate products (statistical analyses, imagery, etc.) that you will want to Describe, Assure, and Share along with the original observational data.
Publishing & Preservation
Making your data widely available to the scientific community, and ensuring that it will be available — and easily discoverable — to those who wish to use or re-evaluate it in the future.
Data you collect from sensors or other automated mechanisms may originally be in a proprietary or nonstandard file format. Transforming nonstandard file formats to a widely used and supported scientific data file format has several advantages:
Choose a file format that is widely used in your scientific community. Some communities go so far as to suggest or require use of specific formats; for example, the Coupled Model Intercomparison Project Phase 5 (CMIP5) requires that data follow the standard NetCDF Climate and Forecast (CF) Metadata convention. In the absence of clear community standards, we strongly suggest using a self-documenting file format such as netCDF or HDF, and following the CF metadata conventions.
Including robust metadata with your data allows others to discover, understand, and use the data now and in the future. The previously mentioned Climate and Forecast (CF) Metadata convention is an excellent choice for many projects in the Unidata community.
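One way to keep metadata robust is to check for it before a file is shared. The sketch below is a minimal illustration, not a CF compliance checker: the attribute names (`Conventions`, `title`, `institution`, `source`, `history`, `units`, `standard_name`, `long_name`) come from the CF conventions, while the function `find_metadata_gaps` and the plain-dictionary description of a dataset are hypothetical stand-ins for attributes read from a real file.

```python
# Attribute names are taken from the CF conventions. The dictionaries that
# describe a dataset are a hypothetical stand-in for a real netCDF file.
RECOMMENDED_GLOBAL_ATTRS = ("Conventions", "title", "institution", "source", "history")
DESCRIPTIVE_VARIABLE_ATTRS = ("standard_name", "long_name")


def find_metadata_gaps(global_attrs, variables):
    """Return human-readable messages for metadata that appears to be missing.

    global_attrs: dict of file-level attributes.
    variables: dict mapping variable name -> dict of that variable's attributes.
    """
    problems = []

    # File-level attributes that describe where the data came from.
    for name in RECOMMENDED_GLOBAL_ATTRS:
        if not str(global_attrs.get(name, "")).strip():
            problems.append(f"global attribute '{name}' is missing")

    for var_name, attrs in variables.items():
        # Simplification: this sketch expects units on every variable.
        if not str(attrs.get("units", "")).strip():
            problems.append(f"variable '{var_name}' has no units")
        # At least one descriptive name should be present.
        if not any(attrs.get(name) for name in DESCRIPTIVE_VARIABLE_ATTRS):
            problems.append(
                f"variable '{var_name}' has neither standard_name nor long_name"
            )

    return problems


# Example data (invented for illustration only).
dataset_globals = {"Conventions": "CF-1.8", "title": "Example surface observations"}
dataset_variables = {
    "air_temperature": {"units": "K", "standard_name": "air_temperature"},
    "wind": {},
}

gaps = find_metadata_gaps(dataset_globals, dataset_variables)
for message in gaps:
    print(message)
```

A check like this is most useful early in a project, when adding a missing attribute is still a matter of editing a script, not of reprocessing an archive.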
Following a well-defined metadata convention not only helps you include the necessary metadata with your data, it often opens up additional analysis options, allowing you to use software packages written to take advantage of well-defined metadata schemes.
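As one illustration of how software can take advantage of a metadata convention, CF encodes time as a number plus a `units` string of the form "days since 2000-01-01 00:00:00". Because the form is standardized, generic code can turn the numbers into timestamps without knowing anything else about the file. The function below is a simplified sketch under stated assumptions: `decode_cf_time` is a hypothetical helper, it handles only the plural unit names seconds, minutes, hours, and days, and it assumes the standard calendar and UTC when no offset is given. Established libraries cover the many cases this sketch ignores.

```python
from datetime import datetime, timedelta, timezone

# Only these unit names are handled in this sketch.
_SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_cf_time(values, units):
    """Convert numeric time offsets to datetimes using a CF-style units string.

    Assumes the standard calendar, and UTC if the reference time has no offset.
    """
    unit_name, _, reference = units.partition(" since ")
    unit_name = unit_name.strip().lower()
    if unit_name not in _SECONDS_PER_UNIT or not reference.strip():
        raise ValueError(f"unsupported time units: {units!r}")

    origin = datetime.fromisoformat(reference.strip())
    if origin.tzinfo is None:
        origin = origin.replace(tzinfo=timezone.utc)

    scale = _SECONDS_PER_UNIT[unit_name]
    return [origin + timedelta(seconds=value * scale) for value in values]


# Example values (invented for illustration only).
timestamps = decode_cf_time([0, 1.5], "days since 2000-01-01 00:00:00")
for stamp in timestamps:
    print(stamp.isoformat())
```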
Providing access to your project data beyond the results that are published allows other researchers to investigate your results and methods. While current funding agency requirements do not specify how access to digital materials should be provided, we strongly recommend making your data available via a network-accessible data server such as RAMADDA or the TDS. Using a dedicated scientific data server (as opposed to a simple web site or FTP server) not only increases the discoverability of your data, it increases its utility by making it easier to interrogate remotely, subset, or convert to other data formats.
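To make remote subsetting concrete, the sketch below builds a request URL for a spatial and temporal subset. It is a sketch under stated assumptions, not a definitive client. The query parameter names (`var`, `north`, `south`, `east`, `west`, `time_start`, `time_end`, `accept`) follow the TDS NetCDF Subset Service and should be checked against the version your server runs. The server address, dataset path, and the helper `build_subset_url` are hypothetical. No network request is made.

```python
from urllib.parse import urlencode


def build_subset_url(base_url, variables, north, south, east, west,
                     time_start, time_end, accept="netcdf"):
    """Build a subset request URL for a gridded dataset.

    base_url is the subset-service endpoint for one dataset; the rest of the
    arguments select variables, a bounding box, a time range, and a format.
    """
    query = urlencode({
        "var": ",".join(variables),   # comma-separated variable names
        "north": north,
        "south": south,
        "east": east,
        "west": west,
        "time_start": time_start,     # ISO 8601 timestamps
        "time_end": time_end,
        "accept": accept,             # requested output format
    })
    return f"{base_url}?{query}"


# Hypothetical endpoint and variable names, for illustration only.
url = build_subset_url(
    "https://data.example.edu/thredds/ncss/grid/project/model_output.nc",
    ["air_temperature", "wind_speed"],
    north=45, south=35, east=-95, west=-110,
    time_start="2020-01-01T00:00:00Z",
    time_end="2020-01-02T00:00:00Z",
)
print(url)
```

The point of the example is that a collaborator can retrieve only the variables, region, and time range they need, which a plain web site or FTP server cannot offer.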
The group FORCE11 has a concise Declaration of Data Citation Principles.
TBA. Points to cover: