On Fri, 15 May 2020 at 05:13 Davide Sangalli <davide.sengalli@xxxxxx> wrote:
> Even after moving to the latest versions of the libraries, the problem remains.
>
> pkgname_netcdf=netcdf-c-4.7.4
> pkgname_netcdff=netcdf-fortran-4.5.2
> pkgname_pnetcdf=pnetcdf-1.12.1
> pkgname_hdf5=hdf5-1.12.0
>
> Moreover, I noticed differences between running in serial and running in
> parallel.
> (I interrupted the two runs, so it may be that the I/O was not finished.)
> Below, BS_K_linearized should just be a number (a dimension in netCDF).
>
> SERIAL:
> DATASET "BS_K_linearized1" {
>    DATATYPE H5T_IEEE_F32BE
>    DATASPACE SIMPLE { ( 2025000000 ) / ( 2025000000 ) }
>    STORAGE_LAYOUT {
>       CONTIGUOUS
>       SIZE 0
>       OFFSET 18446744073709551615
>    }
>    FILTERS {
>       NONE
>    }
>    FILLVALUE {
>       FILL_TIME H5D_FILL_TIME_IFSET
>       VALUE H5D_FILL_VALUE_DEFAULT
>    }
>    ALLOCATION_TIME {
>       H5D_ALLOC_TIME_LATE
>    }
>
> PARALLEL:
> DATASET "BS_K_linearized1" {
>    DATATYPE H5T_IEEE_F32BE
>    DATASPACE SIMPLE { ( 2025000000 ) / ( 2025000000 ) }
>    STORAGE_LAYOUT {
>       CONTIGUOUS
>       SIZE 8100000000
>       OFFSET 2387
>    }
>    FILTERS {
>       NONE
>    }
>    FILLVALUE {
>       FILL_TIME H5D_FILL_TIME_IFSET
>       VALUE H5D_FILL_VALUE_DEFAULT
>    }
>    ALLOCATION_TIME {
>       H5D_ALLOC_TIME_EARLY
>    }
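Dave Allured mentioned in an earlier message that hidden dimension scales
are still stored as arrays, but the arrays are left unpopulated. That's
consistent with your empty dataset in the serial case (SIZE 0 with
H5D_ALLOC_TIME_LATE, so no file space is reserved until data is written).
In the parallel case the dimension scale is still stored as a full-size
array: SIZE 8100000000 is exactly 2025000000 four-byte floats, and with
H5D_ALLOC_TIME_EARLY that space is reserved when the dataset is created,
whether or not anything is ever written to it.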
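(Side note: OFFSET 18446744073709551615 is 2^64 - 1, i.e. ULLONG_MAX. That
is HDF5's undefined-address value HADDR_UNDEF, defined as (haddr_t)(-1),
which is what you see when no storage has been allocated for a dataset, so
it goes together with SIZE 0.)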
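For what it's worth, the two STORAGE_LAYOUT blocks can be checked
mechanically. Below is a minimal C sketch, assuming a 64-bit haddr_t and
4-byte H5T_IEEE_F32BE elements; classify_contiguous, storage_state and
HDF5_ADDR_UNDEF are made-up names for illustration, not part of the HDF5
API. You read the numbers off h5dump yourself and pass them in.

```c
#include <stdint.h>

/* HDF5 defines HADDR_UNDEF as (haddr_t)(-1). Assuming a 64-bit haddr_t,
 * that is 2^64 - 1 = 18446744073709551615, the OFFSET in the serial dump.
 * HDF5_ADDR_UNDEF is a hypothetical local name for that value. */
#define HDF5_ADDR_UNDEF UINT64_MAX

typedef enum {
    STORAGE_NOT_ALLOCATED, /* no file space reserved yet */
    STORAGE_FULL,          /* SIZE covers every element of the dataspace */
    STORAGE_MISMATCH       /* SIZE disagrees with DATASPACE * element size */
} storage_state;

/* Interpret the STORAGE_LAYOUT fields h5dump prints for a CONTIGUOUS
 * dataset: n_elements comes from DATASPACE, elem_bytes from DATATYPE,
 * size and offset from STORAGE_LAYOUT. */
storage_state classify_contiguous(uint64_t n_elements, uint64_t elem_bytes,
                                  uint64_t size, uint64_t offset)
{
    if (offset == HDF5_ADDR_UNDEF && size == 0)
        return STORAGE_NOT_ALLOCATED;
    if (offset != HDF5_ADDR_UNDEF && size == n_elements * elem_bytes)
        return STORAGE_FULL;
    return STORAGE_MISMATCH;
}
```

With the serial values (SIZE 0, OFFSET 18446744073709551615) it returns
STORAGE_NOT_ALLOCATED; with the parallel values (SIZE 8100000000, OFFSET
2387) it returns STORAGE_FULL, since 2025000000 * 4 = 8100000000.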
Honestly, I think you should just be using HDF5 directly. netCDF-4 is an
abstraction layer over HDF5, but it doesn't expose everything HDF5 can do.
The main advantages of netCDF are its structured metadata conventions and
OPeNDAP access, neither of which is relevant if you're just dumping arrays
to a file. HDF5 also supports additional compression filters such as Blosc
(as a filter plugin), which you can apply transparently as an alternative
to designing your own sparse array storage representation.

The HDF5 API is not as friendly as netCDF's, but it will probably save you
time compared to debugging the abstraction layer, and it has a Fortran
interface. Many other good file formats would suit your application, but
you would most likely have to write your own Fortran wrapper for them with
`iso_c_binding`.
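To give a rough idea of what such a wrapper involves, here is a
hypothetical C entry point together with the Fortran interface block that
would bind to it. write_float_array is an invented name, and the raw
fwrite stands in for whatever write call a real format library provides;
the point is only that the signature sticks to C-interoperable types.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical C routine meant to be called from Fortran via
 * iso_c_binding. Only C-interoperable types appear in the signature:
 *   const char *  <-> character(kind=c_char), dimension(*)
 *   const float * <-> real(c_float), dimension(*)
 *   size_t        <-> integer(c_size_t), value
 *   int           <-> integer(c_int)
 * The body writes raw floats as a placeholder for a real format
 * library. Returns 0 on success, -1 on failure. */
int write_float_array(const char *path, const float *data, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, sizeof *data, n, f);
    int rc = (written == n) ? 0 : -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Matching Fortran interface block (Fortran 2003 or later):
 *
 *   interface
 *     function write_float_array(path, data, n) &
 *         bind(c, name="write_float_array") result(rc)
 *       use, intrinsic :: iso_c_binding
 *       character(kind=c_char), dimension(*), intent(in) :: path
 *       real(c_float), dimension(*), intent(in) :: data
 *       integer(c_size_t), value :: n
 *       integer(c_int) :: rc
 *     end function
 *   end interface
 *
 * Fortran strings are not NUL-terminated, so the caller appends one:
 *   rc = write_float_array(trim(fname)//c_null_char, buf, size(buf, kind=c_size_t))
 */
```

Each additional library routine you need gets the same treatment: a C
function with interoperable arguments plus a bind(c) interface on the
Fortran side.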
- John Buonagurio