For an easy workaround, you might try writing the original file in 64-bit
offset format, or CDF5 with a newer version of the netcdf library. This
would bypass any mysterious netcdf-4 behavior; there is nothing in your
current data scheme that needs the netcdf-4 format.
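
In the Fortran API that is just a different create-mode flag. A minimal
sketch, assuming the nf90_ interface your writer already uses (file name
illustrative; NF90_64BIT_DATA needs a netcdf library built with CDF5 support):

    ! classic 64-bit offset format instead of netCDF-4; note this format
    ! caps every fixed-size variable except the last at about 4 GiB, so
    ! the file holding the large float arrays would need CDF5 instead
    ierr = nf90_create("ndb_test.nc", ior(NF90_CLOBBER, NF90_64BIT_OFFSET), ncid)

    ! CDF5 (64-bit data), which does not have that per-variable limit
    ierr = nf90_create("ndb_test.nc", ior(NF90_CLOBBER, NF90_64BIT_DATA), ncid)
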
On Fri, May 1, 2020 at 6:30 PM Dave Allured - NOAA Affiliate <
dave.allured@xxxxxxxx> wrote:
> Everything looks good in ncdump -hs. The ncvalidator error is expected
> because the format is not in the netcdf-3 family.
>
> I am puzzled. This looks like the hdf5 layer lost a whole lot of file
> space, but I don't see how. One straightforward thing to try is upgrading
> to more recent versions of the netcdf and HDF5 libraries.
>
> If that doesn't help, then to get more information, try replicating the
> file with nccopy, h5copy, or h5repack.
>
> https://portal.hdfgroup.org/display/HDF5/HDF5+Command-line+Tools
>
> Use contiguous or chunked, but for testing purposes, do not enable any
> compression. The idea is that the writers in those tools should be
> correctly optimized to rewrite those large char arrays without wasted
> space, in case your own writer did something strange.
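>
> Something along these lines, for example (the chunk size below is only
> illustrative; adjust the option names to your tool versions if needed):
>
>     # plain copy, keeping contiguous storage, no compression
>     nccopy ndb.BS_COMPRESS0.005000_Q1 copy_nccopy.nc
>
>     # chunked copy, still uncompressed
>     nccopy -c BS_K_linearized1/100000000 ndb.BS_COMPRESS0.005000_Q1 copy_chunked.nc
>
>     # HDF5-level repack to contiguous layout, with all filters removed
>     h5repack -l CONTI -f NONE ndb.BS_COMPRESS0.005000_Q1 copy_h5repack.nc
>
> If one of these copies comes out near the expected ~4.5 GB, then the wasted
> space was introduced when the original file was written.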
>
> I suppose there could be a storage bug in the hdf5 or netcdf support
> libraries. Your char arrays are uncommonly large, so they might have
> triggered some sort of edge case.
>
> I am refraining from suggesting low-level debugging because I do not want
> to inflict pain. Otherwise, see if other readers have some ideas, or post
> the question to the HDF5 users forum.
>
>
> On Fri, May 1, 2020 at 5:40 PM Davide Sangalli <davide.sangalli@xxxxxx>
> wrote:
>
>> I also add the output of ncvalidator:
>>
>> ncvalidator ndb.BS_COMPRESS0.005000_Q1
>> Error: Unknow file signature
>> Expecting "CDF1", "CDF2", or "CDF5", but got "�HDF"
>> File "ndb.BS_COMPRESS0.005000_Q1" fails to conform with CDF file format
>> specifications
>>
>> Best,
>> D.
>>
>> On 02/05/20 01:26, Davide Sangalli wrote:
>>
>> Output of ncdump -hs
>>
>> D.
>>
>> ncdump -hs BSK_2-5B_X59RL-50B_SP_bse-io/ndb.BS_COMPRESS0.005000_Q1
>>
>> netcdf ndb.BS_COMPRESS0 {
>> dimensions:
>> BS_K_linearized1 = 2025000000 ;
>> BS_K_linearized2 = 781887360 ;
>> complex = 2 ;
>> BS_K_compressed1 = 24776792 ;
>> variables:
>> char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>> BSE_RESONANT_COMPRESSED1_DONE:_Storage = "contiguous" ;
>> char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>> BSE_RESONANT_COMPRESSED2_DONE:_Storage = "contiguous" ;
>> char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>> BSE_RESONANT_COMPRESSED3_DONE:_Storage = "contiguous" ;
>> float BSE_RESONANT_COMPRESSED1(BS_K_compressed1, complex) ;
>> BSE_RESONANT_COMPRESSED1:_Storage = "contiguous" ;
>> BSE_RESONANT_COMPRESSED1:_Endianness = "little" ;
>> // global attributes:
>> :_NCProperties =
>> "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
>> :_SuperblockVersion = 0 ;
>> :_IsNetcdf4 = 1 ;
>> :_Format = "netCDF-4" ;
>>
>>
>>
>> On Sat, May 2, 2020 at 12:24 AM +0200, "Dave Allured - NOAA Affiliate" <
>> dave.allured@xxxxxxxx> wrote:
>>
>>> I agree that you should expect the file size to be about 1 byte per
>>> stored character. IMO the most likely explanation is that you have a
>>> netcdf-4 file with inappropriately small chunk size. Another possibility
>>> is a 64-bit offset file with crazy huge padding between file sections.
>>> This is very unlikely, but I do not know what is inside your writer code.
>>>
>>> Please diagnose with ncdump -hs. If it is 64-bit offset, I think
>>> ncvalidator can display the hidden pad sizes.
>>>
>>>
>>> On Fri, May 1, 2020 at 3:37 PM Davide Sangalli <davide.sangalli@xxxxxx>
>>> wrote:
>>>
>>>> Dear all,
>>>> I'm a developer of a Fortran code which uses netcdf for I/O.
>>>>
>>>> In one of my runs I created a file with some huge array of characters.
>>>> The header of the file is the following:
>>>> netcdf ndb.BS_COMPRESS0 {
>>>> dimensions:
>>>> BS_K_linearized1 = 2025000000 ;
>>>> BS_K_linearized2 = 781887360 ;
>>>> variables:
>>>> char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>>>> char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>>>> char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>>>> }
>>>>
>>>> The variables are declared as nf90_char which, according to the
>>>> documentation, should be 1 byte per element.
>>>> Thus I would expect the total size of the file to be about
>>>> 1 byte * (2*2025000000 + 781887360) = 4831887360 bytes ~ 4.5 GB.
>>>> Instead the file size is 16059445323 bytes ~ 14.96 GB, i.e. 10.46 GB more,
>>>> a factor of ~3.3 bigger than expected.
>>>>
>>>> This happens consistently if I consider the file
>>>> netcdf ndb {
>>>> dimensions:
>>>> complex = 2 ;
>>>> BS_K_linearized1 = 2025000000 ;
>>>> BS_K_linearized2 = 781887360 ;
>>>> variables:
>>>> float BSE_RESONANT_LINEARIZED1(BS_K_linearized1, complex) ;
>>>> char BSE_RESONANT_LINEARIZED1_DONE(BS_K_linearized1) ;
>>>> float BSE_RESONANT_LINEARIZED2(BS_K_linearized1, complex) ;
>>>> char BSE_RESONANT_LINEARIZED2_DONE(BS_K_linearized1) ;
>>>> float BSE_RESONANT_LINEARIZED3(BS_K_linearized2, complex) ;
>>>> char BSE_RESONANT_LINEARIZED3_DONE(BS_K_linearized2) ;
>>>> }
>>>> The float component should weigh ~36 GB, while the char component
>>>> should be identical to before, i.e. 4.5 GB, for a total of ~40.5 GB.
>>>> The file is instead ~50.96 GB, i.e. again 10.46 GB bigger than
>>>> expected.
>>>>
>>>> *Why ?*
>>>>
>>>> My character variables are something like
>>>> "tnnnntnnnntnnnnnnnntnnnnnttnnnnnnnnnnnnnnnnt..."
>>>> but the file size is already like that just after the file creation,
>>>> i.e. before filling it.
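>>>>
>>>> For reference, a minimal creation-only sketch along the lines of what my
>>>> writer does (illustrative only, not my actual code) would be:
>>>>
>>>>     program repro
>>>>     use netcdf
>>>>     implicit none
>>>>     integer :: ncid, d1, d2, v1, v2, v3, ierr
>>>>     integer(kind=8) :: nbytes
>>>>     ! define-only: three large char variables, never written to
>>>>     ! (error checking omitted for brevity)
>>>>     ierr = nf90_create("ndb_test.nc", ior(NF90_CLOBBER, NF90_NETCDF4), ncid)
>>>>     ierr = nf90_def_dim(ncid, "BS_K_linearized1", 2025000000, d1)
>>>>     ierr = nf90_def_dim(ncid, "BS_K_linearized2", 781887360, d2)
>>>>     ierr = nf90_def_var(ncid, "BSE_RESONANT_COMPRESSED1_DONE", NF90_CHAR, d1, v1)
>>>>     ierr = nf90_def_var(ncid, "BSE_RESONANT_COMPRESSED2_DONE", NF90_CHAR, d1, v2)
>>>>     ierr = nf90_def_var(ncid, "BSE_RESONANT_COMPRESSED3_DONE", NF90_CHAR, d2, v3)
>>>>     ierr = nf90_enddef(ncid)
>>>>     ierr = nf90_close(ncid)
>>>>     ! expected ~4.8e9 bytes for 1 byte per char element
>>>>     inquire(file="ndb_test.nc", size=nbytes)
>>>>     print *, "file size on disk (bytes):", nbytes
>>>>     end program repro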
>>>>
>>>> A few details about the library, compiled and linked against HDF5
>>>> (hdf5-1.8.18), with parallel I/O support:
>>>> Name: netcdf
>>>> Description: NetCDF Client Library for C
>>>> URL: http://www.unidata.ucar.edu/netcdf
>>>> Version: 4.4.1.1
>>>> Libs: -L${libdir} -lnetcdf -ldl -lm
>>>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5hl_fortran.a
>>>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_fortran.a
>>>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_hl.a
>>>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5.a
>>>> -lz -lm -ldl -lcurl
>>>> Cflags: -I${includedir}
>>>>
>>>> Name: netcdf-fortran
>>>> Description: NetCDF Client Library for Fortran
>>>> URL: http://www.unidata.ucar.edu/netcdf
>>>> Version: 4.4.4
>>>> Requires.private: netcdf > 4.1.1
>>>> Libs: -L${libdir} -lnetcdff
>>>> Libs.private: -L${libdir} -lnetcdff -lnetcdf
>>>> Cflags: -I${includedir}
>>>>
>>>> Best,
>>>> D.
>>>> --
>>>> Davide Sangalli, PhD
>>>> CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX
>>>> Centre
>>>> Area della Ricerca di Roma 1, 00016 Monterotondo Scalo, Italy
>>>> http://www.ism.cnr.it/en/davide-sangalli-cv/
>>>> http://www.max-centre.eu/
>>>>
>>>