[netCDF #ZCE-849683]: clarification on large file support



Hi Fabio,

Sorry to have taken so long to respond to your question.

> I am Fabio Milani working in IDS (Ingegneria dei Sistemi), an Italian
> company involved, among others, in electromagnetic simulations.
> 
> We currently use the netCDF library for managing our simulator files
> and were interested in large file support. I read the FAQ section on
> your website and I need to clarify the following aspects, thanks to
> your precious support.
> 
> Is the following correctly understood?
> 
> 1. On 32 bit platforms the new netcdf IS ABLE to write files larger
> than 2GB (up to 2^64), BUT each variable contained in the file CANNOT
> exceed 2GB

That's not quite correct.

It's true that a program on 32-bit platforms linked to netCDF 3.6.x or
netCDF 4.x is able to write files larger than 2GB, assuming the
platform and file system are configured for Large File Support (this
is almost always the case), and that the program is compiled such that
the "off_t" type is 64 bits, such as a "long long" type.  The
configure script used to build netCDF correctly sets compile flags to
support a 64-bit off_t type, if possible.  You can check the output
from running the "configure" script to make sure it has the line

  checking size of off_t... 8

indicating a 64-bit (8-byte) off_t type.
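
If you want to confirm the same thing from your own build, here is a
minimal sketch in C. It is not part of the netCDF distribution, and the
function names off_t_bits and require_64bit_off_t are made up for
illustration. It assumes a POSIX system whose C library honors the
_FILE_OFFSET_BITS macro (glibc does); other platforms may need different
flags.

```c
/* Ask for a 64-bit off_t. On C libraries that honor this macro (an
   assumption; glibc does) it must come before any system header. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Width of off_t, in bits, as seen by this compilation unit. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Print the size in bytes, then abort if off_t is not 64 bits wide. */
void require_64bit_off_t(void)
{
    printf("size of off_t: %zu bytes\n", sizeof(off_t));
    assert(off_t_bits() == 64);
}
```

Calling require_64bit_off_t() early in a program compiled with the same
flags as your application shows whether file offsets beyond 2GB can be
represented at all.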

In netCDF versions after 3.6.1, including the current netCDF 4.1.1,
the limit on each variable in the file is 4GB (not 2GB).  The actual
maximum size of a variable on a 32-bit platform is (2^32 - 4) bytes.
Part of the confusion is a documentation error here:

  
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html#Classic-Limitations

which I just discovered hasn't been updated since the size limit on a
single variable was changed from 2GB (2^31 - 4) to 4GB (2^32 - 4) in
versions since netCDF 3.6.1.  It should say

  If you don't use the unlimited dimension, only one variable can
  exceed 4 GiB in size, but it can be as large as the underlying file
  system permits. It must be the last variable in the dataset, and the
  offset to the beginning of this variable must be less than about 2
  GiB. 

  The limit is really 2^32 - 4. If you were to specify a variable size
  of 2^32 - 3, for example, it would be rounded up to the nearest
  multiple of 4 bytes, which would be 2^32, which is larger than the
  largest unsigned 32-bit integer.  
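
The rounding argument can be checked with a few lines of C. This is only
an illustration of the arithmetic, not code from the netCDF library; the
names round_up_to_4, fits_in_32_bits and check_limit are hypothetical.
It does the padding in 64-bit arithmetic so the overflow is visible
instead of silently wrapping.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 4, computed in
   64 bits so that the result cannot wrap around. */
uint64_t round_up_to_4(uint64_t nbytes)
{
    return (nbytes + 3u) & ~(uint64_t)3;
}

/* Return 1 if the padded size still fits in an unsigned 32-bit
   integer, 0 otherwise. */
int fits_in_32_bits(uint64_t nbytes)
{
    return round_up_to_4(nbytes) <= UINT32_MAX;
}

/* Self-check of the two cases discussed in the text. */
void check_limit(void)
{
    const uint64_t two_32 = (uint64_t)1 << 32;

    assert(fits_in_32_bits(two_32 - 4));           /* already a multiple of 4 */
    assert(round_up_to_4(two_32 - 3) == two_32);   /* pads up to 2^32 */
    assert(!fits_in_32_bits(two_32 - 3));          /* one past UINT32_MAX */
}
```

So 2^32 - 4 is the largest size that is still a multiple of 4 and fits in
32 bits; anything larger pads up to 2^32.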

It's also true that even on 32-bit platforms, one variable in the file
(the last) can exceed 4 GB in size, as explained in the FAQ, as long
as the system supports a 64-bit off_t type.

> 2. On 64 bit platforms the new netcdf is able to write files larger than
> 2GB (up to 2^64), AND each variable contained in the file CAN exceed
> 2GB

Yes, that's true.  But note that most 32-bit platforms support a
64-bit off_t type for file offsets, so the 64-bit offset variant of
the netCDF format is fully supported for reading or writing data on a
32-bit platform (except that you can only access at most 2GB at once,
due to the 32-bit size_t type on 32-bit platforms).

Note that with the netCDF-4 HDF5-based format, variables can also be
larger than 4GB.

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: ZCE-849683
Department: Support netCDF
Priority: Normal
Status: Closed