NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Quincey Koziol wrote:
[John Caron] The motivation for me would be to use bit-packing as a storage format, not a data type. We would add an option to pack wider types (usually float/double) using a scale/offset. This can get you a factor of 2-4 or so, whereas compression may not get you anything. However, this would only work if it remains a valid HDF5 file. It would be most useful if we can do arbitrary bit widths, but still useful if we are limited to multiples of 8.

[Quincey] Hi Ed, Bitfields are a black sheep in the datatype family and aren't terribly well documented (which we're trying to work on). Say something if you think we've got a terrible gap about them somewhere.

[Ed] Well, I know of a terrible gap about them in my brain... :-) Is there an example somewhere about using bitfields in HDF5?

[Quincey] Hmm, you can look in test/dtypes.c for some examples of using them. Search for "H5T_STD_B"...

[Ed] OK, here's what I'm seeing about creating a bitfield:

    hid_t st = -1, dt = -1;
    st = H5Tcopy(H5T_STD_B16LE);
    H5Tset_precision(st, 12);
    H5Tset_offset(st, 2);

Does this pretty much sum it up? I H5Tcopy an integer type big enough to hold it, and then set precision and offset?

[Quincey] Yes, that's pretty much all.

[Ed] Or can you just tell me what functions would be used to create a bitfield?

[Quincey] The H5Tset_precision() routine determines the number of bits in a datatype that are significant within it.

[Ed] Limits on number of bits?

[Quincey] Up to the size of the datatype that contains it (which is defined for up to 64-bit datatypes currently).

[Ed] How are these stored, then? Any sort of padding, or what?

[Quincey] We currently don't pack them, so a 13-bit field in a 32-bit datatype still takes up 4 bytes of space. Frankly, I think this is a bit of a bug, but it's a fairly complicated problem to pack the bits on disk (in light of using bitfields in compound, array and variable-length datatypes, mostly), and no one has whined strongly about it, so it's been the status quo for a while now. :-/

[Ed] Ah ha! That sounds important. I think storage (and transmission) efficiency is what this whole feature is about for Russ... Russ, is that correct? The goal here is to store and move large amounts of bitfield data efficiently? Otherwise, what is the point of a bitfield in C/C++ or Fortran 77? I don't know about F90 - does it have a good way to deal with bitfields? Perhaps we should ask whether compression is a better thing to use to achieve storage efficiency.

[Quincey] It would be fairly straightforward to implement a pipeline filter that "compressed" data by packing out the unused bits for bitfield datatypes. (At least for non-compound/array/variable-length combinations. :-)

Quincey
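To make the recipe from the exchange above concrete, here is a minimal sketch of creating and writing a dataset with a bitfield datatype, following the H5Tcopy/H5Tset_precision/H5Tset_offset steps discussed. The file name, dataset name, and sample data are invented for illustration, and the 1.6-era five-argument H5Dcreate() signature is assumed:

    #include "hdf5.h"

    int main(void)
    {
        hsize_t        dims[1] = {100};
        unsigned short data[100];
        hid_t          file, space, st, dset;
        int            i;

        /* Hypothetical sample data: each value lives in bits 2..13. */
        for (i = 0; i < 100; i++)
            data[i] = (unsigned short)(i << 2);

        file  = H5Fcreate("bitfield.h5", H5F_ACC_TRUNC, H5P_DEFAULT,
                          H5P_DEFAULT);
        space = H5Screate_simple(1, dims, NULL);

        st = H5Tcopy(H5T_STD_B16LE);   /* 16-bit little-endian bitfield */
        H5Tset_precision(st, 12);      /* only 12 bits are significant  */
        H5Tset_offset(st, 2);          /* significant bits start at bit 2 */

        dset = H5Dcreate(file, "flags", st, space, H5P_DEFAULT);
        /* As Quincey notes above, the unused bits are not packed out on
           disk: each element still occupies the full 16 bits. */
        H5Dwrite(dset, H5T_NATIVE_B16, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

        H5Dclose(dset);
        H5Tclose(st);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }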
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
Date: Fri, 16 Jul 2004 10:22:24 -0600
Subject: Re: questions about compression...

"Robert E. McGrath" <mcgrath@xxxxxxxxxxxxx> writes:
Please check the User's Guide (chapter on 'datasets'): http://hdf.ncsa.uiuc.edu/HDF5/doc/UG/ Basically, there is a set/get pair for all the filters. The standard filters are: deflate (gzip), SZIP compression, shuffle, and the Fletcher32 error detection code. To enable one, you do an H5Pset_... on the dataset creation property list, then create the dataset with H5Dcreate.
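As a concrete illustration of the pattern Robert describes (H5Pset_... on the dataset creation property list, then H5Dcreate), here is a minimal sketch that enables the shuffle, deflate, and Fletcher32 filters. Filters require chunked storage; the file name, dataset name, and sizes are invented, and the 1.6-era H5Dcreate() signature is assumed:

    #include "hdf5.h"

    int main(void)
    {
        hsize_t dims[2]  = {1000, 1000};
        hsize_t chunk[2] = {100, 100};
        hid_t   file, space, dcpl, dset;

        file  = H5Fcreate("filters.h5", H5F_ACC_TRUNC, H5P_DEFAULT,
                          H5P_DEFAULT);
        space = H5Screate_simple(2, dims, NULL);

        dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);   /* filters require chunked storage */
        H5Pset_shuffle(dcpl);           /* byte shuffle; often helps deflate */
        H5Pset_deflate(dcpl, 6);        /* gzip at compression level 6 */
        H5Pset_fletcher32(dcpl);        /* Fletcher32 checksum filter */
        /* SZIP instead of gzip would be:
           H5Pset_szip(dcpl, H5_SZIP_NN_OPTION_MASK, 32); */

        dset = H5Dcreate(file, "var", H5T_NATIVE_FLOAT, space, dcpl);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }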
OK, then let me pose the following requirements question: is the requirement that we support one type of compression, both types of compression that currently exist in the library (gzip and szip), or all compression filters that may be introduced in the future? Or is the requirement that we support file filters, including all the ones listed above? If yes to the last question, is it also a requirement that we allow the user to register callbacks, and so add his own filters to netCDF-4, just as HDF5 does?

Ed
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
Date: Fri, 16 Jul 2004 10:33:12 -0600
Subject: Re: HDF5 bitfields...

John Caron <caron@xxxxxxxxxxxxxxxx> writes:
> The motivation for me would be to use bit-packing as a storage format,
> not a data type. We would add an option to pack wider types (usually
> float/double) using a scale/offset. This can get you a factor of 2-4 or
> so, whereas compression may not get you anything. However, this would
> only work if it remains a valid HDF5 file. It would be most useful if we
> can do arbitrary bit widths, but still useful if we are limited to
> multiples of 8.
Well, this could easily be done by netCDF-4, using attributes to store the needed info. It would still be a valid HDF5 file, but readers would be mighty confused about how to read it unless they understood the conventions we'd use to store the scale/offset numbers for a dataset... However, I don't think we would use the HDF5 bitfield for this.

Ed
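A rough sketch of what Ed describes might look like the following, using nothing beyond standard HDF5 attribute calls: floats are packed into 16-bit integers with a scale/offset, and the two numbers are stored as attributes under the netCDF-style names "scale_factor" and "add_offset". The data, names, and rounding step are illustrative, not a settled netCDF-4 convention; 1.6-era HDF5 signatures are assumed:

    #include "hdf5.h"

    #define N 100

    int main(void)
    {
        float   values[N];
        short   packed[N];
        float   scale = 0.01f, offset = 0.0f;
        hsize_t dims[1] = {N};
        hid_t   file, space, dset, aspace, attr;
        int     i;

        for (i = 0; i < N; i++)
            values[i] = 2.5f + 0.01f * i;              /* sample data */

        /* Pack 4-byte floats into 2-byte shorts: a factor-of-2 saving.
           Readers unpack with: value = packed * scale + offset. */
        for (i = 0; i < N; i++)
            packed[i] = (short)((values[i] - offset) / scale + 0.5f);

        file  = H5Fcreate("packed.h5", H5F_ACC_TRUNC, H5P_DEFAULT,
                          H5P_DEFAULT);
        space = H5Screate_simple(1, dims, NULL);
        dset  = H5Dcreate(file, "temperature", H5T_STD_I16LE, space,
                          H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_SHORT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                 packed);

        /* Record the packing parameters as attributes on the dataset,
           following the netCDF "scale_factor"/"add_offset" convention. */
        aspace = H5Screate(H5S_SCALAR);
        attr = H5Acreate(dset, "scale_factor", H5T_NATIVE_FLOAT, aspace,
                         H5P_DEFAULT);
        H5Awrite(attr, H5T_NATIVE_FLOAT, &scale);
        H5Aclose(attr);
        attr = H5Acreate(dset, "add_offset", H5T_NATIVE_FLOAT, aspace,
                         H5P_DEFAULT);
        H5Awrite(attr, H5T_NATIVE_FLOAT, &offset);
        H5Aclose(attr);

        H5Sclose(aspace);
        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }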