Hi John,
We'd like to push for both limits to be lifted. We definitely need
variable sizes greater than 4 GB, and we believe applications capable
of using 2^31 elements along a dimension are not far off. If I'm doing
my calculations correctly (see the quick check below), a 2^31 dimension
size would limit a double-precision one-dimensional variable to roughly
17 GB. (Please note I'm representing the "Towards Petascale Computing"
group.) Without a parallel IO library that can handle these huge
datasets, the HPC community will either return to plain old Fortran IO,
which is fast but not portable, or fall back to writing out thousands
of broken-up netCDF files or variables. Neither is a good solution. To
us, lifting the 4 GB variable size limit while keeping the 2^31
dimension size limit is just a temporary holdover. We think it is in
the best interest of the netCDF and pnetCDF community to prepare for
the coming increase in application data and IO needs.
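
For reference, a quick sanity check of that arithmetic (illustrative
C only, not netCDF API code):

    #include <stdio.h>

    int main(void) {
        long long n = 1LL << 31;  /* 2^31 elements along one dimension */
        long long bytes = n * (long long) sizeof(double); /* 8 bytes each */
        /* Prints: 17179869184 bytes = 17.2 GB */
        printf("%lld bytes = %.1f GB\n", bytes, bytes / 1e9);
        return 0;
    }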
Katie
John Caron wrote:
Hi Katie:
It sounds to me like you're talking about the 4 GB total size limit on
a variable. Allowing that limit to be 2^64 seems reasonable.
Allowing individual dimension lengths to be greater than 2^31 is a
bigger deal, since array indexes are limited to 32-bit signed ints (at
least in Java; see the small illustration below). I'm not sure if you
are requesting that. It sounds like unstructured meshes might push that
limit someday, but do you have another use case for that?
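
To make the 2^31 point concrete, a minimal C illustration (assuming an
LP64 platform, where size_t is 8 bytes):

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        /* A Java array index is a signed 32-bit int, so a single array
           tops out at INT_MAX = 2^31 - 1 elements. */
        printf("INT_MAX = %d\n", INT_MAX);        /* 2147483647 */
        /* The C interfaces index with size_t, which is 8 bytes on LP64
           platforms, so longer dimensions are at least addressable. */
        printf("sizeof(size_t) = %zu\n", sizeof(size_t));
        return 0;
    }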
Katie Antypas wrote:
Hi Everyone,
I'm jumping into the discussion late here, but coming from the
perspective of trying to find and develop an IO strategy that will
work at the petascale level, the 4 GB variable size limitation is a
major barrier. Already a 1000^3 grid variable (10^9 doubles, about
8 GB) cannot fit into a single netCDF variable; a short sketch below
shows what that looks like. Users at NERSC and other supercomputing
centers regularly run problems of this size or greater, and IO demands
are only going to get bigger. We don't believe chopping up data
structures into pieces is a good long-term solution or strategy. There
isn't a natural way to break up the data, and chunking eliminates the
elegance, ease, and purpose of a parallel IO library. Besides the
direct code changes, analytics and visualization tools become more
complicated, since data files from the same simulation but of different
sizes would not have the same number of variables. Restarting a
simulation from a checkpoint file on a different number of processors
would also become more convoluted.
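
As a rough sketch of what hits the wall (pnetcdf C API; the file and
variable names here are made up for illustration):

    #include <stdio.h>
    #include <mpi.h>
    #include <pnetcdf.h>

    int main(int argc, char **argv) {
        int ncid, dimids[3], rho, prs, err;
        MPI_Init(&argc, &argv);
        ncmpi_create(MPI_COMM_WORLD, "grid.nc",
                     NC_CLOBBER | NC_64BIT_OFFSET, MPI_INFO_NULL, &ncid);
        ncmpi_def_dim(ncid, "x", 1000, &dimids[0]);
        ncmpi_def_dim(ncid, "y", 1000, &dimids[1]);
        ncmpi_def_dim(ncid, "z", 1000, &dimids[2]);
        /* Each variable is 1000^3 doubles = 8,000,000,000 bytes (~8 GB).
           The CDF-2 (64-bit offset) format caps every fixed-size
           variable except the last at 4 GiB, so with two such variables
           enddef reports NC_EVARSIZE. */
        ncmpi_def_var(ncid, "density",  NC_DOUBLE, 3, dimids, &rho);
        ncmpi_def_var(ncid, "pressure", NC_DOUBLE, 3, dimids, &prs);
        err = ncmpi_enddef(ncid);
        if (err != NC_NOERR)
            printf("enddef: %s\n", ncmpi_strerror(err));
        ncmpi_close(ncid);
        MPI_Finalize();
        return 0;
    }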
The view from NERSC is that if Parallel-NetCDF is to be a viable
option for users running large parallel simulations, this is a
limitation that must be lifted...
Katie Antypas
NERSC User Services Group
Lawrence Berkeley National Lab