The Future of NetCDF
Russ Rew
NetCDF Annual Update
2012-10-26
Overview
- Short- and long-term plans for netCDF and other data access infrastructure development
- Tentative plans for netCDF-4.3 and beyond
- Speculations about the future of scientific data access ...
Goals for Unidata data access infrastructure
The next 5 years will be challenging for Unidata's data access infrastructure efforts.
Our efforts will be focused on incremental innovations to:
- Manage a graceful transition from a simple data model (netCDF-3) to the enhanced Common Data Model of netCDF-4
- Provide better support for remote access and server-side data analysis
- Respond to the need to faithfully represent observational data as well as gridded data
- Scale up to handle larger volumes of data efficiently
- Serve a larger user community wishing to integrate satellite products, geospatial data, observations, and model outputs from growing archives
Near-term plans for netCDF
We are constrained by backward compatibility commitments:
- Don't break archives: new versions must be able to access existing netCDF data
- Don't break programs: new libraries must support previous APIs
Plans for the next year are fairly fluid. Follow changing plans on our projects site.
Tentative plans:
C-4.3 plans:
- CMake support for Windows VS
- bug and documentation fixes
Fortran-4.3 plans:
- addition of a few missing functions
- Fortran-2003 C-interoperability support?
- CMake support for Windows VS?
- bug and documentation fixes
Longer Term Plans
- Finish documentation conversion to Doxygen
- "Lazy open" for data from many large files
- Improve compression to GRIB2 levels
- Client support for DAP4 protocol
- Automatic packing/unpacking in library
- Support array slice query notation
- Big test data collection for tool developers
- Support high-level chunking policies
- Provide guidance on chunking & compression
- Refactor into more & smaller utilities
- Support asynchronous I/O for remote access
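To make the "automatic packing/unpacking" item concrete, here is a minimal sketch of the arithmetic the library would perform on the user's behalf. It assumes the common netCDF attribute convention in which unpacked = packed * scale_factor + add_offset; the function names and the 16-bit default are illustrative assumptions, not part of any netCDF API.

```python
def compute_packing(vmin, vmax, nbits=16):
    """Return (scale_factor, add_offset) that map [vmin, vmax] onto
    signed nbits-wide integers. Hypothetical helper; assumes vmax > vmin."""
    # Use 2**nbits - 2 steps so one integer value stays free for a fill value.
    scale_factor = (vmax - vmin) / (2 ** nbits - 2)
    add_offset = (vmax + vmin) / 2
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Convert a floating-point value to its packed integer form."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Recover an approximate floating-point value from a packed integer."""
    return packed * scale_factor + add_offset


scale, offset = compute_packing(-50.0, 50.0)
stored = pack(12.34, scale, offset)
restored = unpack(stored, scale, offset)
```

Packing is lossy: a restored value can differ from the original by up to half of scale_factor, which is why the range and bit width should be chosen per variable.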
Even Longer Term Plans
Some of these may just be crazy talk ...
- Support data access by coordinates instead of indexes
- Make more netCDF-Java advanced functionality available from C
- Implement standard requests for server-side analysis
- Keep up with HDF5 advances for high-performance computing
- Develop and implement intelligent chunking & compression
- Space Filling Curves!
- Make library updates easy for users
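As an illustration of "data access by coordinates instead of indexes", the sketch below translates coordinate values into array indexes for a one-dimensional coordinate variable. It assumes the coordinate values are strictly increasing and held in memory as a list; the function names are hypothetical and do not correspond to any existing netCDF call.

```python
from bisect import bisect_left, bisect_right


def nearest_index(coords, value):
    """Return the index of the coordinate closest to value.
    coords must be a non-empty, strictly increasing sequence."""
    if not coords:
        raise ValueError("coords must not be empty")
    pos = bisect_left(coords, value)
    if pos == 0:
        return 0
    if pos == len(coords):
        return len(coords) - 1
    before, after = coords[pos - 1], coords[pos]
    # On a tie, prefer the lower index.
    return pos - 1 if value - before <= after - value else pos


def coord_slice(coords, lo, hi):
    """Return a half-open index range (start, stop) covering every
    coordinate c with lo <= c <= hi."""
    return bisect_left(coords, lo), bisect_right(coords, hi)


lats = [-90.0, -45.0, 0.0, 45.0, 90.0]
i = nearest_index(lats, 40.0)               # index of the latitude nearest 40 degrees
start, stop = coord_slice(lats, -45.0, 45.0)
subset = lats[start:stop]                   # coordinates within [-45, 45]
```

A real implementation would also need to handle decreasing, multidimensional, and irregular coordinates, which this sketch does not attempt.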
Speculations
- I/O bottlenecks for high-performance computing will worsen
- Use of massively parallel shared-nothing file systems will grow
- Data will be generated too fast to store, so filtering will become a priority
- Multi-resolution wavelet representations will get more popular
- Non-volatile memory technologies will replace most spinning disks and change programming
- Lack of organizational support will lead to losses of valuable data
- Format-independent conventions will continue to evolve too slowly
We appreciate feedback on netCDF plans!
- Other speculations?
- Questions?
- Feedback?