Howdy Mark!
As you suggest, adding control over alignment would be great. Can you
submit a PR with changes to the library to support it?
Ed Hartnett
On Sat, Jan 8, 2022 at 5:30 AM Mark Harfouche <mark.harfouche@xxxxxxxxx>
wrote:
> Hello,
>
> My name is Mark Harfouche, I'm a researcher and engineer focusing on
> productizing new computational optics tools for biology.
>
> I had a question about data-alignment within netcdf4 files.
>
> Is it possible to specify the alignment boundary for each chunk of data?
> Let's say we had an array of bytes, and we wanted that array to be
> aligned to boundaries of 128, 512, or even 4096 bytes.
> Is this possible in netcdf4?
> It seemed like this might be possible through calls to nc__enddef,
> https://www.unidata.ucar.edu/software/netcdf/docs/group__datasets.html#ga5fe4a3fcd6db18d0583ac47f04f7ac60
> but I tried to adjust those and they didn't seem to have the desired effect.
>
> There seemed to be a post from 2014 discussing this, but I can't find the
> referenced issue
> https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg12328.html
>
> From my research, it seems like it should be possible to do through
> H5Pset/get_alignment
> https://support.hdfgroup.org/HDF5/doc/UG/FmSource/08_TheFile_favicon_test.html
>
> I typically use netcdf4 through xarray, which in turn uses the
> netcdf4-python backend.
>
> The Python code below illustrates a typical problem we face, where the data
> ends up aligned only to an offset of 6 bytes, which is far from ideal in the
> many circumstances where performance matters.
>
> Thank you very much for your help,
>
> Best,
>
> Mark
>
> ```
> import xarray as xr
> import numpy as np
> import netCDF4
> from pathlib import Path
>
> basic_filename = "basic_file_netcdf4.nc"
> if Path(basic_filename).exists():
>     Path(basic_filename).unlink()
>
> dataset = xr.DataArray(
>     np.zeros((3072, 3072), dtype='uint8'),
>     dims=("y", "x"),
>     coords={
>         "y": np.arange(3072, dtype=int),
>         "x": np.arange(3072, dtype=int),
>     },
>     name='images').to_dataset()
>
> dataset.to_netcdf(basic_filename, format="NETCDF4", engine="netcdf4")
>
> import h5py
>
> # Open the file read-only and get the byte offset of the dataset's raw data.
> with h5py.File(basic_filename, "r") as h5file:
>     offset = h5file["images"].id.get_offset()
> print(offset % 4096)
> print(offset % 2048)
> print(offset % 1024)
> print(offset % 512)
> print(offset % 128)
> print(offset % 64)
>
> """
> 3206
> 1158
> 134
> 134
> 6
> 6
> """
> ```
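
For readers following the numbers above, here is a minimal sketch of the arithmetic behind the check: how far an offset sits past the previous boundary, and how much padding would bring it to the next one. The helper names and the example offset of 3206 are hypothetical (the real offset in the file is only known modulo 4096 from the output above), and this is not part of netCDF or HDF5.

```python
def misalignment(offset: int, boundary: int) -> int:
    """Bytes by which `offset` lies past the previous multiple of `boundary`."""
    if boundary <= 0:
        raise ValueError("boundary must be a positive number of bytes")
    return offset % boundary


def padding_to_next_boundary(offset: int, boundary: int) -> int:
    """Bytes to skip so that `offset` lands on a multiple of `boundary`."""
    # The outer modulo makes the result 0 when the offset is already aligned.
    return (boundary - misalignment(offset, boundary)) % boundary


# Hypothetical offset, chosen only because it reproduces the remainders
# printed in the quoted message; the true file offset is not known here.
example_offset = 3206

for boundary in (4096, 2048, 1024, 512, 128, 64):
    print(
        boundary,
        misalignment(example_offset, boundary),
        padding_to_next_boundary(example_offset, boundary),
    )
```

A data block is aligned to a given boundary exactly when `misalignment` returns 0 for it, which is what an alignment setting in the library would need to guarantee.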