On May 1, 2007, at 10:25 PM, ncdigest wrote:
Date: Tue, 01 May 2007 17:12:45 -0700
From: Katie Antypas <kantypas@xxxxxxx>
Subject: 4 GiB variable size limit

Hi everyone,

I'm jumping into the discussion late here, but coming from the perspective of trying to find and develop an I/O strategy that will work at the petascale level, the 4 GiB variable size limitation is a major barrier. Already a 1000^3 grid variable cannot fit into a single netCDF variable. Users at NERSC and other supercomputing centers regularly run problems of this size or greater, and I/O demands are only going to get bigger.

We don't believe chopping up data structures into pieces is a good long-term solution or strategy. There isn't a natural way to break up the data, and chunking eliminates the elegance, ease, and purpose of a parallel I/O library. Besides the direct code changes, analytics and visualization tools become more complicated, as data files from the same simulation but of different sizes would not have the same number of variables. Restarting a simulation from a checkpoint file on a different number of processors would also become more convoluted.

The view from NERSC is that if Parallel-NetCDF is to be a viable option for users running large parallel simulations, this is a limitation that must be lifted...

Katie Antypas
NERSC User Services Group
Lawrence Berkeley National Lab
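[For concreteness: a 1000^3 grid of doubles is 8 x 10^9 bytes, roughly 7.5 GiB, which exceeds the ~4 GiB per-variable ceiling of the classic CDF-2 (64-bit offset) format. The sketch below, written against the standard PnetCDF C API, shows where such a code hits the limit; the file name and variable name are illustrative, and the exact point at which the error surfaces may vary by library version.]

/* Minimal sketch: defining a 1000^3 double variable with PnetCDF.
 * Assumes the standard PnetCDF C API; compile with mpicc -lpnetcdf.
 * Under the CDF-2 (NC_64BIT_OFFSET) format each fixed-size variable
 * is capped near 4 GiB, so this definition should be rejected. */
#include <stdio.h>
#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv) {
    int ncid, dimids[3], varid, err;
    MPI_Init(&argc, &argv);

    err = ncmpi_create(MPI_COMM_WORLD, "big.nc",
                       NC_CLOBBER | NC_64BIT_OFFSET, MPI_INFO_NULL, &ncid);
    if (err != NC_NOERR)
        fprintf(stderr, "create failed: %s\n", ncmpi_strerror(err));

    ncmpi_def_dim(ncid, "x", 1000, &dimids[0]);
    ncmpi_def_dim(ncid, "y", 1000, &dimids[1]);
    ncmpi_def_dim(ncid, "z", 1000, &dimids[2]);

    /* 1000*1000*1000 doubles = 8e9 bytes (~7.5 GiB) in one variable */
    ncmpi_def_var(ncid, "density", NC_DOUBLE, 3, dimids, &varid);

    /* Format constraints are typically checked when leaving define
     * mode; expect NC_EVARSIZE here for a CDF-2 file. */
    err = ncmpi_enddef(ncid);
    if (err != NC_NOERR)
        fprintf(stderr, "enddef failed: %s\n", ncmpi_strerror(err));

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}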
I'm certain that the netCDF team will speak up, but I think one of their goals in moving to HDF5 as the underlying format for the netCDF-4 release is to benefit from HDF5's parallel I/O features as well as the (essentially) unlimited dataset sizes that HDF5 provides. Have you considered using the netCDF-4 release instead of Parallel-netCDF?
Quincey Koziol
The HDF Group
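[As a sketch of the suggested alternative: the netCDF-4 parallel API goes through HDF5 with MPI-IO underneath, and HDF5 datasets are not subject to the 4 GiB per-variable cap. The fragment below assumes a netCDF-4 library built with parallel (HDF5 + MPI-IO) support; the file name, variable name, and slab decomposition are illustrative, not prescribed by either thread participant.]

/* Minimal sketch: each MPI rank writes its own slab of a 1000^3
 * double variable through the netCDF-4 parallel API (HDF5 storage,
 * no 4 GiB per-variable cap). Compile with mpicc -lnetcdf. */
#include <stdlib.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

#define N 1000

int main(int argc, char **argv) {
    int rank, nprocs, ncid, dimids[3], varid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Create an HDF5-backed file opened for parallel access. */
    nc_create_par("big4.nc", NC_NETCDF4 | NC_MPIIO,
                  MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);

    nc_def_dim(ncid, "x", N, &dimids[0]);
    nc_def_dim(ncid, "y", N, &dimids[1]);
    nc_def_dim(ncid, "z", N, &dimids[2]);
    nc_def_var(ncid, "density", NC_DOUBLE, 3, dimids, &varid);
    nc_enddef(ncid);

    /* Collective I/O: all ranks participate in each write call. */
    nc_var_par_access(ncid, varid, NC_COLLECTIVE);

    /* Decompose along x: each rank owns a contiguous slab of planes.
     * (Assumes nprocs divides N evenly, for brevity.) */
    size_t slab = N / nprocs;
    size_t start[3] = { rank * slab, 0, 0 };
    size_t count[3] = { slab, N, N };
    double *data = malloc(slab * N * N * sizeof(double));
    for (size_t i = 0; i < slab * N * N; i++)
        data[i] = (double)rank;

    nc_put_vara_double(ncid, varid, start, count, data);

    free(data);
    nc_close(ncid);
    MPI_Finalize();
    return 0;
}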