[netcdfgroup] Patch for netCDF4 file bit-for-bit reproducibility

Hello all from netCDF group,

I recently started working with netCDF in Fortran, mainly converting code
from the f77 interface to the more flexible f90 one.
And I love it! Fantastic API.

I am dealing with code that has a huge test suite for regression testing,
so I am trying to find a compromise between size and speed.
The code was designed to output lots of diagnostics (~1 GB) for every test.

The lack of an ncdiff tool made me write my own, but while trying to
optimize it for time
I learned that I spend half the time in my comparison loops and the
other half in swap8b...
netCDF-4 features like compression and native endianness are very appealing,
but the lack of bit-for-bit (BFB) reproducibility (even with nccopy), just
because of internal timestamping, is sad.
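
To illustrate the byte-swapping cost mentioned above: the classic netCDF format stores values big-endian, so a little-endian host has to reverse every 8-byte value it reads. The following is only a minimal sketch of that work. The names bswap64 and be_doubles_to_host are hypothetical and are not the library's swap8b.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the library's swap8b):
 * reverse the byte order of one 64-bit value. */
uint64_t bswap64(uint64_t v)
{
    return ((v & 0x00000000000000FFULL) << 56) |
           ((v & 0x000000000000FF00ULL) << 40) |
           ((v & 0x0000000000FF0000ULL) << 24) |
           ((v & 0x00000000FF000000ULL) <<  8) |
           ((v & 0x000000FF00000000ULL) >>  8) |
           ((v & 0x0000FF0000000000ULL) >> 24) |
           ((v & 0x00FF000000000000ULL) >> 40) |
           ((v & 0xFF00000000000000ULL) >> 56);
}

/* Hypothetical helper: convert `count` big-endian doubles stored in `src`
 * into host-order doubles in `dst`. The swap is only needed (and only
 * performed) on a little-endian host. */
void be_doubles_to_host(double *dst, const unsigned char *src, size_t count)
{
    const uint16_t probe = 1;
    const int host_is_little = (*(const unsigned char *)&probe == 1);

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        uint64_t bits;
        memcpy(&bits, src + 8 * i, sizeof bits);   /* raw file bytes */
        if (host_is_little)
            bits = bswap64(bits);                  /* the per-value cost */
        memcpy(&dst[i], &bits, sizeof bits);
    }
}
```

With data written in native byte order the swap branch is never taken, which is why native endianness looks attractive for a comparison tool.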

http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2008/msg00003.html
Is this still valid?

I am attaching a small patch that I made against netcdf-4.1.3.
With just this patch and these configure options I can successfully
reproduce identical files
using nccopy, without depending on the system time or having to rely on
hooks that intercept the Unix time calls.
CPPFLAGS="-I${hdf}/include"
CFLAGS="-DBFB_MODE"
LDFLAGS="-L${hdf}/lib -ldl"
./configure --prefix=${netcdf} --enable-netcdf-4 --disable-hdf4 \
    --disable-pnetcdf --enable-cdmremote=no --disable-dap --disable-v2 \
    --disable-shared --with-pic

The source code contains no other H5P_[A-Z]+_CREATE calls (except for
the ones in the tests).

Is it safe enough to be used for reproducibility checks, at least with
the netcdf3/netcdf4 classic format?
All I need is to be able to use md5sum on repeated runs to speed up
the process with the same netcdf/hdf lib.
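
For two files on the same machine, a direct byte comparison answers the same question as comparing md5sum output, and it can stop at the first difference. This is only a minimal sketch of such a check. The function name files_identical is hypothetical and is not part of netCDF or HDF5.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both files contain identical bytes,
 * 0 if they differ, and -1 if either file cannot be opened or read. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa, *fb;
    int result = 1;

    assert(path_a != NULL && path_b != NULL);
    fa = fopen(path_a, "rb");
    fb = fopen(path_b, "rb");

    if (fa == NULL || fb == NULL) {
        result = -1;
    } else {
        unsigned char buf_a[65536], buf_b[65536];
        size_t na, nb;
        do {
            na = fread(buf_a, 1, sizeof buf_a, fa);
            nb = fread(buf_b, 1, sizeof buf_b, fb);
            /* Different chunk lengths or contents: not bit-for-bit equal. */
            if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
                result = 0;
                break;
            }
        } while (na == sizeof buf_a);   /* a short read means end of file */
        if (ferror(fa) || ferror(fb))
            result = -1;
    }
    if (fa != NULL) fclose(fa);
    if (fb != NULL) fclose(fb);
    return result;
}
```

Such a check only passes on netCDF-4 output if nothing time-dependent ends up in the file, which is what the attached patch is meant to achieve.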

Best regards,
Rimvydas
diff -rupN a/libsrc4/nc4file.c b/libsrc4/nc4file.c
--- a/libsrc4/nc4file.c 2014-01-17 00:09:24.000000000 +0200
+++ b/libsrc4/nc4file.c 2014-02-04 17:37:41.676150939 +0200
@@ -347,6 +347,12 @@ nc4_create_file(const char *path, int cm
    num_plists++;
 #endif
 
+#ifdef BFB_MODE
+   /* RJ: this is supposed to be FALSE, which H5private.h defines as 0 */
+   if (H5Pset_obj_track_times(fcpl_id,0)<0)
+      BAIL(NC_EHDFERR);
+#endif
+
    /* Set latest_format in access propertly list and
     * H5P_CRT_ORDER_TRACKED in the creation property list. This turns
     * on HDF5 creation ordering. */
diff -rupN a/libsrc4/nc4hdf.c b/libsrc4/nc4hdf.c
--- a/libsrc4/nc4hdf.c  2014-01-17 00:09:24.000000000 +0200
+++ b/libsrc4/nc4hdf.c  2014-02-04 17:38:47.484105156 +0200
@@ -1252,6 +1252,12 @@ var_create_dataset(NC_GRP_INFO_T *grp, N
       num_plists++;
 #endif
 
+#ifdef BFB_MODE
+   /* RJ: this is supposed to be FALSE, which H5private.h defines as 0 */
+   if (H5Pset_obj_track_times(plistid,0)<0)
+      BAIL(NC_EHDFERR);
+#endif
+
    /* Find the HDF5 type of the dataset. */
    if ((retval = nc4_get_hdf_typeid(grp->nc4_info, var->xtype, &typeid, 
                                    var->type_info->endianness)))
@@ -1841,6 +1847,13 @@ create_group(NC_GRP_INFO_T *grp)
 #ifdef EXTRA_TESTS
       num_plists++;
 #endif
+
+#ifdef BFB_MODE
+      /* RJ: this is supposed to be FALSE, which H5private.h defines as 0 */
+      if (H5Pset_obj_track_times(gcpl_id,0)<0)
+         BAIL(NC_EHDFERR);
+#endif
+
       if (H5Pset_link_creation_order(gcpl_id, H5P_CRT_ORDER_TRACKED|H5P_CRT_ORDER_INDEXED) < 0)
          BAIL(NC_EHDFERR);
       if (H5Pset_attr_creation_order(gcpl_id, H5P_CRT_ORDER_TRACKED|H5P_CRT_ORDER_INDEXED) < 0)
@@ -2161,6 +2174,13 @@ write_dim(NC_DIM_INFO_T *dim, NC_GRP_INF
 #ifdef EXTRA_TESTS
       num_plists++;
 #endif
+
+#ifdef BFB_MODE
+      /* RJ: this is supposed to be FALSE, which H5private.h defines as 0 */
+      if (H5Pset_obj_track_times(create_propid,0)<0)
+         BAIL(NC_EHDFERR);
+#endif
+
       dims[0] = dim->len;
       max_dims[0] = dim->len;
       if (dim->unlimited) 