Re: [netcdfgroup] nf90_char size

To: Dave Allured - NOAA Affiliate <dave.allured@xxxxxxxx>
Subject: Re: [netcdfgroup] nf90_char size
From: Davide Sangalli <davide.sangalli@xxxxxx>
Date: Fri, 1 May 2020 23:26:00 +0000 (UTC)

Here is the output of ncdump -hs:
D.
ncdump -hs BSK_2-5B_X59RL-50B_SP_bse-io/ndb.BS_COMPRESS0.005000_Q1
netcdf ndb.BS_COMPRESS0 {
dimensions:
        BS_K_linearized1 = 2025000000 ;
        BS_K_linearized2 = 781887360 ;
        complex = 2 ;
        BS_K_compressed1 = 24776792 ;
variables:
        char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
                BSE_RESONANT_COMPRESSED1_DONE:_Storage = "contiguous" ;
        char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
                BSE_RESONANT_COMPRESSED2_DONE:_Storage = "contiguous" ;
        char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
                BSE_RESONANT_COMPRESSED3_DONE:_Storage = "contiguous" ;
        float BSE_RESONANT_COMPRESSED1(BS_K_compressed1, complex) ;
                BSE_RESONANT_COMPRESSED1:_Storage = "contiguous" ;
                BSE_RESONANT_COMPRESSED1:_Endianness = "little" ;
// global attributes:
                :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
                :_SuperblockVersion = 0 ;
                :_IsNetcdf4 = 1 ;
                :_Format = "netCDF-4" ;

On Sat, May 2, 2020 at 12:24 AM +0200, "Dave Allured - NOAA Affiliate" 
<dave.allured@xxxxxxxx> wrote:

I agree that you should expect the file size to be about 1 byte per stored 
character.  IMO the most likely explanation is that you have a netCDF-4 file 
with an inappropriately small chunk size.  Another possibility is a 64-bit 
offset file with crazy huge padding between file sections.  This is very 
unlikely, but I do not know what is inside your writer code.
Please diagnose with ncdump -hs.  If it is 64-bit offset, I think ncvalidator 
can display the hidden pad sizes.
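
The check requested above could be scripted. The following is only a sketch: it scans the text printed by ncdump -hs for the special `_Storage` and `_ChunkSizes` attributes. The function name `storage_report` and the sample header (variables `FLAGS` and `DATA`) are hypothetical and used for illustration only.

```python
import re

def storage_report(ncdump_hs_text):
    """Collect per-variable storage layout from `ncdump -hs` text output."""
    storage = dict(re.findall(
        r'^\s*(\w+):_Storage\s*=\s*"(\w+)"\s*;', ncdump_hs_text, re.MULTILINE))
    chunks = {
        name: [int(size) for size in sizes.split(",")]
        for name, sizes in re.findall(
            r'^\s*(\w+):_ChunkSizes\s*=\s*([\d, ]+);', ncdump_hs_text, re.MULTILINE)
    }
    return storage, chunks

# Hypothetical header fragment, not taken from the file discussed here.
sample = '''
        char FLAGS(n) ;
                FLAGS:_Storage = "contiguous" ;
        float DATA(m, complex) ;
                DATA:_Storage = "chunked" ;
                DATA:_ChunkSizes = 1, 2 ;
'''

storage, chunks = storage_report(sample)
for name, layout in storage.items():
    # Contiguous variables have no chunk sizes, so fall back to a dash.
    print(name, layout, chunks.get(name, "-"))
```

A variable reported as "chunked" with very small chunk sizes would point to the first explanation; "contiguous" storage would rule it out for that variable.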

On Fri, May 1, 2020 at 3:37 PM Davide Sangalli <davide.sangalli@xxxxxx> wrote:

    Dear all,

    I'm a developer of a Fortran code that uses netCDF for I/O.

    

    In one of my runs I created a file with some huge arrays of
    characters.

    The header of the file is the following:

    netcdf ndb.BS_COMPRESS0 {
      dimensions:
          BS_K_linearized1 = 2025000000 ;
          BS_K_linearized2 = 781887360 ;
      variables:
          char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
          char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
          char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
      }


    The variables are declared as nf90_char which, according to the
    documentation, should take 1 byte per element.

    Thus I would expect the total size of the file to be 1
    byte*(2*2025000000+781887360) ~ 4.5 GB.

    Instead the file size is 16059445323 bytes ~ 14.96 GB, i.e. 10.46 GB
    more, a factor of 3.32 bigger.
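
The arithmetic above can be reproduced with a short script. This is a minimal sketch: it assumes that "GB" in the message means 2**30 bytes, which matches the quoted figures, and the variable names are illustrative.

```python
# Dimension lengths from the header above.
BS_K_LINEARIZED1 = 2_025_000_000
BS_K_LINEARIZED2 = 781_887_360
GIB = 2**30  # assumption: "GB" in the message means 2**30 bytes

# Three char variables at 1 byte per element:
# two on BS_K_linearized1 and one on BS_K_linearized2.
expected_bytes = 2 * BS_K_LINEARIZED1 + BS_K_LINEARIZED2
actual_bytes = 16_059_445_323  # file size reported in the message

excess_bytes = actual_bytes - expected_bytes
ratio = actual_bytes / expected_bytes

print(f"expected: {expected_bytes / GIB:.2f}")
print(f"actual:   {actual_bytes / GIB:.2f}")
print(f"excess:   {excess_bytes / GIB:.2f}")
print(f"ratio:    {ratio:.2f}")
```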

    

    This happens consistently if I consider the file:

    netcdf ndb {
      dimensions:
          complex = 2 ;
          BS_K_linearized1 = 2025000000 ;
          BS_K_linearized2 = 781887360 ;
      variables:
          float BSE_RESONANT_LINEARIZED1(BS_K_linearized1, complex) ;
          char BSE_RESONANT_LINEARIZED1_DONE(BS_K_linearized1) ;
          float BSE_RESONANT_LINEARIZED2(BS_K_linearized1, complex) ;
          char BSE_RESONANT_LINEARIZED2_DONE(BS_K_linearized1) ;
          float BSE_RESONANT_LINEARIZED3(BS_K_linearized2, complex) ;
          char BSE_RESONANT_LINEARIZED3_DONE(BS_K_linearized2) ;
      }

      The float component should weigh ~36 GB while the
    char component should be identical to before, i.e. 4.5 GB, for a
    total of 40.5 GB.

    The file is instead ~ 50.96 GB, i.e. again 10.46 GB bigger
    than expected.
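
The same estimate for this second file, as a minimal sketch. It assumes 4-byte floats, that "GB" means 2**30 bytes, and uses the approximate file size quoted in the message (the exact byte count is not given); the variable names are illustrative.

```python
GIB = 2**30  # assumption: "GB" in the message means 2**30 bytes

n1 = 2_025_000_000  # BS_K_linearized1
n2 = 781_887_360    # BS_K_linearized2
elements = 2 * n1 + n2  # two variables on n1, one on n2, for each type

FLOAT_BYTES = 4  # a netCDF float is 32 bits
COMPLEX = 2      # length of the "complex" dimension
CHAR_BYTES = 1

float_bytes = elements * COMPLEX * FLOAT_BYTES
char_bytes = elements * CHAR_BYTES
expected_gib = (float_bytes + char_bytes) / GIB

reported_gib = 50.96  # approximate size quoted in the message
excess_gib = reported_gib - expected_gib

print(f"float part: {float_bytes / GIB:.2f}")
print(f"char part:  {char_bytes / GIB:.2f}")
print(f"expected:   {expected_gib:.2f}")
print(f"excess:     {excess_gib:.2f}")
```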

    

    Why?

    
    

    My character variables contain something like

    "tnnnntnnnntnnnnnnnntnnnnnttnnnnnnnnnnnnnnnnt..."

    but the file already has this size right after creation,
    i.e. before it is filled.

    

    Some information about the library, compiled and linked against HDF5
    (hdf5-1.8.18), with parallel I/O support:

    Name: netcdf
      Description: NetCDF Client Library for C
      URL: http://www.unidata.ucar.edu/netcdf
      Version: 4.4.1.1
      Libs: -L${libdir}  -lnetcdf -ldl -lm
        /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5hl_fortran.a
        /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_fortran.a
        /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_hl.a
        /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5.a
        -lz -lm -ldl -lcurl
      Cflags: -I${includedir}

    Name: netcdf-fortran
      Description: NetCDF Client Library for Fortran
      URL: http://www.unidata.ucar.edu/netcdf
      Version: 4.4.4
      Requires.private: netcdf > 4.1.1
      Libs: -L${libdir} -lnetcdff
      Libs.private: -L${libdir} -lnetcdff -lnetcdf
      Cflags: -I${includedir}


    Best,

    D.

    -- 

      Davide Sangalli, PhD

      CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit)
      and MaX Centre

      Area della Ricerca di Roma 1, 00016 Monterotondo Scalo, Italy

      http://www.ism.cnr.it/en/davide-sangalli-cv/

      http://www.max-centre.eu/
