There are actually three large phantom datasets taking up extra space in this file:

dataset name       start offset     size            end offset
BS_K_linearized1   2,379            8,100,000,000   8,100,002,379
BS_K_linearized2   12,150,006,475   3,127,549,440   15,277,555,915
BS_K_compressed1   16,059,447,379   99,107,168      16,158,554,547

These phantom datasets are actually HDF5 dimension scales. They are 32-bit floats by default. This is part of the mechanism that supports named and shared dimensions in netCDF-4. Dimension scales correspond to what are commonly known as netCDF coordinate variables. When there is no user-defined coordinate variable, the dimension scale must still exist in the file, but it is hidden from normal view in netCDF tools such as ncdump.

You will not be able to avoid dimension scales by moving to a newer netCDF library version. If you want large 1-D arrays stored efficiently, your choices are 64-bit offset format, CDF5, or perhaps HDF5 by direct access. This is the case for all data types, not just char.

On Sat, May 2, 2020 at 11:22 AM Aleksandar Jelenak <ajelenak@xxxxxxxxxxxx> wrote:
> Hi Davide,
>
> > On May 2, 2020, at 1:06 PM, Wei-Keng Liao <wkliao@xxxxxxxxxxxxxxxx> wrote:
> >
> > The dump information shows there are actually 8 datasets in the file.
> > Below are the start offsets, sizes, and end offsets of the individual datasets.
> > There is not much padding space between the datasets.
> > According to this, your file is expected to be about 16 GB in size.
> >
> > dataset name       start offset     size            end offset
> > BS_K_linearized1   2,379            8,100,000,000   8,100,002,379
> > BS_K_linearized2   12,150,006,475   3,127,549,440   15,277,555,915
>
> Thanks Wei-Keng for preparing this useful information.
>
> These two are netCDF dimensions, which in HDF5 (netCDF-4) files are stored
> as HDF5 datasets. The above information indicates that these HDF5 datasets
> are taking up file space although they should not have any actual data.
> The netCDF library only needs a specific value in the NAME attribute of
> these datasets. I, too, suggest creating your files with the latest version
> of the netCDF library, as it may improve the dataset creation information
> it passes to the HDF5 library.
>
> -Aleksandar
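The reported phantom-dataset sizes are consistent with hidden 32-bit float dimension scales: each size is a whole multiple of 4 bytes, so dividing by 4 gives the implied dimension length, and summing them gives the total file-space overhead. A quick sanity check in Python (the sizes are taken from the offsets quoted above; the implied dimension lengths are inferred here, not reported anywhere in the thread):

```python
# Sizes (bytes) of the three phantom datasets reported in this thread.
phantom_sizes = {
    "BS_K_linearized1": 8_100_000_000,
    "BS_K_linearized2": 3_127_549_440,
    "BS_K_compressed1": 99_107_168,
}

FLOAT32_BYTES = 4  # dimension scales are 32-bit floats by default

for name, size in phantom_sizes.items():
    # A phantom dimension scale should be exactly (dimension length) x 4 bytes.
    assert size % FLOAT32_BYTES == 0
    length = size // FLOAT32_BYTES
    print(f"{name}: implied dimension length = {length:,}")

# Total space consumed by data that "should not have any actual data":
overhead = sum(phantom_sizes.values())
print(f"total phantom overhead = {overhead:,} bytes")  # 11,326,656,608
```

That overhead (roughly 10.5 GiB) is what a format without dimension scales would save; for example, recent netCDF releases can convert an existing file with something like `nccopy -k cdf5 in.nc out.nc` (check your installed `nccopy` for the exact supported `-k` names).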