Re: [netcdfgroup] Hello and a question on data alignment

Thanks. Cross-referencing the newly created issue here for future reference:
https://github.com/Unidata/netcdf-c/issues/2177

On Sat, Jan 8, 2022 at 11:11 AM Ed Hartnett <edwardjameshartnett@xxxxxxxxx>
wrote:

> Mark,
>
> Yes, I think that is the correct place to make the change.
>
> Is this change always on? Or does the user turn it on and off?
>
> Yes, please open an issue on netcdf-c and continue the discussion there.
> That is the appropriate place, so that a record can be kept...
>
> Ed
>
> On Sat, Jan 8, 2022 at 8:12 AM Mark Harfouche <mark.harfouche@xxxxxxxxx>
> wrote:
>
>> Hi Ed,
>>
>> Thank you for the confirmation that the feature does not exist yet and
>> the quick reply.
>>
>> I was able to achieve the desired result using h5netcdf as a demo. I want
>> to keep feature parity between the Python netCDF4 backend and the h5netcdf
>> backend.
>> To do so, one needs to change the File Access Property List when the HDF5
>> file is opened.
>>
>> Can you confirm that this is the correct location to make a patch to?
>>
>> https://github.com/Unidata/netcdf-c/blob/988e771a9ed99619c2e3261aea81f127dd7fa3d8/libhdf5/hdf5open.c#L772
>>
>> If so, I might be able to make a pull request in the coming months.
>>
>> Would it be appropriate to open a GitHub issue with this info? Or is the
>> mailing list the appropriate location for this information?
>>
>> Best,
>>
>> Mark
>>
>>
>> On Sat, Jan 8, 2022 at 8:18 AM Ed Hartnett <edwardjameshartnett@xxxxxxxxx>
>> wrote:
>>
>>> Howdy Mark!
>>>
>>> As you suggest, adding control over alignment would be great. Can you
>>> submit a PR with changes to the library to support it?
>>>
>>> Ed Hartnett
>>>
>>> On Sat, Jan 8, 2022 at 5:30 AM Mark Harfouche <mark.harfouche@xxxxxxxxx>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> My name is Mark Harfouche. I'm a researcher and engineer focusing on
>>>> productizing new computational optics tools for biology.
>>>>
>>>> I have a question about data alignment within netcdf4 files.
>>>>
>>>> Is it possible to specify the alignment boundary for each chunk of
>>>> data? Let's say we have an array of bytes and want that array to be
>>>> aligned to a boundary of 128, 512, or even 4096 bytes.
>>>> Is this possible in netcdf4?
>>>> It seemed like this might be possible through calls to nc__enddef,
>>>> https://www.unidata.ucar.edu/software/netcdf/docs/group__datasets.html#ga5fe4a3fcd6db18d0583ac47f04f7ac60
>>>> but I tried to adjust those and they didn't seem to have the desired 
>>>> effect.
>>>>
>>>> There seemed to be a post from 2014 discussing this, but I can't find
>>>> the referenced issue
>>>> https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg12328.html
>>>>
>>>> From my research, it seems like it should be possible to do through
>>>> H5Pset/get_alignment
>>>> https://support.hdfgroup.org/HDF5/doc/UG/FmSource/08_TheFile_favicon_test.html
>>>>
>>>> I typically use netcdf4 through xarray, which in turn uses the
>>>> netcdf4-python backend.
>>>>
>>>> The Python code below illustrates a typical problem we face: the data
>>>> ends up 6 bytes past a 128-byte boundary, which is far from ideal in the
>>>> many circumstances where performance matters.
>>>>
>>>> Thank you very much for your help,
>>>>
>>>> Best,
>>>>
>>>> Mark
>>>>
>>>> ```
>>>> import xarray as xr
>>>> import numpy as np
>>>> import netCDF4
>>>> from pathlib import Path
>>>>
>>>> basic_filename = "basic_file_netcdf4.nc"
>>>> if Path(basic_filename).exists():
>>>>     Path(basic_filename).unlink()
>>>>
>>>> dataset = xr.DataArray(
>>>>     np.zeros((3072, 3072), dtype='uint8'),
>>>>     dims=("y", "x"),
>>>>     coords={
>>>>         "y": np.arange(3072, dtype=int),
>>>>         "x": np.arange(3072, dtype=int),
>>>>     },
>>>>     name='images').to_dataset()
>>>>
>>>> dataset.to_netcdf(basic_filename, format="NETCDF4", engine="netcdf4")
>>>>
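>>>> import h5py
>>>>
>>>> # Reopen the file read-only with h5py to see where the raw data starts.
>>>> with h5py.File(basic_filename, "r") as h5file:
>>>>     h5dataset = h5file["images"]
>>>>     # Byte offset of the dataset's raw data within the file
>>>>     offset = h5dataset.id.get_offset()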
>>>> print(offset % 4096)
>>>> print(offset % 2048)
>>>> print(offset % 1024)
>>>> print(offset % 512)
>>>> print(offset % 128)
>>>> print(offset % 64)
>>>>
>>>> """
>>>> 3206
>>>> 1158
>>>> 134
>>>> 134
>>>> 6
>>>> 6
>>>> """
>>>> ```
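
The remainders printed above can be cross-checked with a little arithmetic. Because 4096 is a multiple of every smaller boundary tested, the single value `offset % 4096 == 3206` determines all the others. The sketch below reproduces them and shows how many padding bytes would be needed to reach the next boundary. The helper name `padding_to_align` is hypothetical, and only the reported remainder is used, since the absolute offset is not given in the thread.

```python
def padding_to_align(offset: int, boundary: int) -> int:
    """Bytes to add to `offset` to reach the next multiple of `boundary`."""
    return (-offset) % boundary


# Remainder of the dataset offset modulo 4096, as reported in the output above.
reported_remainder = 3206
boundaries = (4096, 2048, 1024, 512, 128, 64)

# Every boundary divides 4096, so these match offset % boundary exactly.
remainders = [reported_remainder % b for b in boundaries]
paddings = [padding_to_align(reported_remainder, b) for b in boundaries]

for b, r, p in zip(boundaries, remainders, paddings):
    print(f"boundary={b:5d}  remainder={r:5d}  padding needed={p:5d}")
```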
>>>>
>>>>