There are actually three large phantom datasets taking up extra space in this file:

dataset name       start offset     size            end offset
BS_K_linearized1   2,379            8,100,000,000   8,100,002,379
BS_K_linearized2   12,150,006,475   3,127,549,440   15,277,555,915
BS_K_compressed1   16,059,447,379   99,107,168      16,158,554,547

These phantom datasets are actually HDF5 dimension scales. They are 32-bit floats by default. This is part of the mechanism that supports named and shared dimensions in netCDF-4. Dimension scales correspond to what are commonly known as netCDF coordinate variables. When there is no user-defined coordinate variable, the dimension scale must still exist in the file, but it is hidden from normal view in netCDF tools such as ncdump.

You will not be able to avoid dimension scales by moving to a newer netCDF library version. If you want large 1-D arrays stored efficiently, your choices are 64-bit offset format, CDF5, or perhaps HDF5 by direct access. This is the case for all data types, not just char.

On Sat, May 2, 2020 at 11:22 AM Aleksandar Jelenak <ajelenak@xxxxxxxxxxxx> wrote:
> Hi Davide,
>
> > On May 2, 2020, at 1:06 PM, Wei-Keng Liao <wkliao@xxxxxxxxxxxxxxxx> wrote:
> >
> > The dump information shows there are actually 8 datasets in the file.
> > Below are the start offsets, sizes, and end offsets of the individual datasets.
> > There is not much padding space between the datasets.
> > According to this, your file is expected to be about 16 GB in size.
> >
> > dataset name       start offset     size            end offset
> > BS_K_linearized1   2,379            8,100,000,000   8,100,002,379
> > BS_K_linearized2   12,150,006,475   3,127,549,440   15,277,555,915
>
> Thanks Wei-Keng for preparing this useful information.
>
> These two are netCDF dimensions, which in HDF5 (netCDF-4) files are stored
> as HDF5 datasets. The above information indicates that these HDF5 datasets
> are taking up file space although they should not have any actual data.
> The netCDF library only needs a specific value in the NAME attribute of
> these datasets. I, too, suggest creating your files with the latest version
> of the netCDF library, as it may improve the dataset creation information
> it passes to the HDF5 library.
>
> -Aleksandar
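The reported phantom-dataset sizes are consistent with hidden 32-bit float dimension scales: each size is a whole multiple of 4 bytes, so dividing by 4 gives the implied dimension length, and summing them gives the total file-space overhead. A quick sanity check in Python (the sizes are taken from the offsets quoted above; the implied dimension lengths are inferred here, not reported anywhere in the thread):

```python
# Sizes (bytes) of the three phantom datasets reported in this thread.
phantom_sizes = {
    "BS_K_linearized1": 8_100_000_000,
    "BS_K_linearized2": 3_127_549_440,
    "BS_K_compressed1": 99_107_168,
}

FLOAT32_BYTES = 4  # dimension scales are 32-bit floats by default

for name, size in phantom_sizes.items():
    # A phantom dimension scale should be exactly (dimension length) x 4 bytes.
    assert size % FLOAT32_BYTES == 0
    length = size // FLOAT32_BYTES
    print(f"{name}: implied dimension length = {length:,}")

# Total space consumed by data that "should not have any actual data":
overhead = sum(phantom_sizes.values())
print(f"total phantom overhead = {overhead:,} bytes")  # 11,326,656,608
```

That overhead (roughly 10.5 GiB) is what a format without dimension scales would save; for example, recent netCDF releases can convert an existing file with something like `nccopy -k cdf5 in.nc out.nc` (check your installed `nccopy` for the exact supported `-k` names).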