NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Hi Ed,

On Aug 18, 2007, at 7:52 AM, Ed Hartnett wrote:
Howdy all!

I am writing a test program which writes large files (well over 2 GB). I have some questions about HDF5 and very large files. I need to check whether netCDF-4 has been correctly implemented for best performance.

In the program below, I create 4 datasets, of type double. They are one-dimensional, with length 2147483644/4. (That is 17179869152 bytes of data.) Then I write the last value only in each dataset.

It took a really long time: minutes. Is this expected? What is HDF5 doing in the background here? Is there something I can do with chunking here to improve the speed of this program?

I am not setting a fill value, so what is being written here? I naively expected that HDF5 would not write all the data I am skipping, but would find a way to write data only around the value that I am actually writing...

The file that this program creates is 17179883735 bytes, which is 14583 bytes of HDF5 overhead. Is that about what is expected?

Any comments welcome...
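As a rough check of the sizes quoted in the message above, the arithmetic can be sketched in C. The helper names here are hypothetical and are not part of the test program; only the constants come from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Classic-format limit used by the test program: 2 GiB - 4 bytes. */
#define MAX_CLASSIC_BYTES 2147483644LL

/* Elements per variable: the dimension length used in the test. */
int64_t elems_per_var(void)
{
    return MAX_CLASSIC_BYTES / 4;
}

/* Raw data bytes for nvars variables of elem_size-byte elements. */
int64_t raw_data_bytes(int nvars, int64_t elem_size)
{
    assert(nvars > 0 && elem_size > 0);
    return (int64_t)nvars * elems_per_var() * elem_size;
}

/* Bytes in the file that are not accounted for by raw data. */
int64_t overhead_bytes(int64_t file_size, int64_t data_bytes)
{
    assert(file_size >= data_bytes);
    return file_size - data_bytes;
}
```

Called as raw_data_bytes(4, 8) for four variables of 8-byte doubles, and overhead_bytes() with the reported file size of 17179883735 bytes, this reproduces the two figures given in the message.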
The problem is in your computation of the chunk size for the dataset, in libsrc4/nc4hdf.c, around lines 1059-1084. The current computations end up with a chunk of size equal to the dimension size (2147483644/4 in the code below), i.e. a single 4GB chunk for the entire dataset. This is not going to work well, since HDF5 always reads an entire chunk into memory, updates it and then writes the entire chunk back out to disk. ;-)
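To make the cost concrete, here is a minimal sketch of the arithmetic, assuming the read-modify-write behaviour described above (a whole chunk is read, updated, and written back for a single-element write). The function names are hypothetical, and the cost model is a simplification, not a description of HDF5 internals.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one chunk holding chunk_len elements. */
int64_t chunk_bytes(int64_t chunk_len, int64_t elem_size)
{
    assert(chunk_len > 0 && elem_size > 0);
    return chunk_len * elem_size;
}

/* Assumed cost model: writing one element reads the whole chunk and
   writes it back, so twice the chunk size passes through the I/O layer. */
int64_t single_write_io_bytes(int64_t chunk_len, int64_t elem_size)
{
    return 2 * chunk_bytes(chunk_len, elem_size);
}
```

With the chunk length equal to the dimension length (2147483644/4 doubles), chunk_bytes() gives 4294967288 bytes, the single 4 GB chunk mentioned above, and each one-value write moves about twice that under this model. A 131072-element chunk (1 MiB of doubles, an arbitrary illustrative size) would move about 2 MiB instead.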
That section of code looks like it has the beginning of some heuristics for automatically tuning the chunk size, but it would probably be better to let the application set a particular chunk size, if possible.
Quincey
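One way an application (or a library heuristic) might choose a chunk length is to aim for a target chunk size in bytes and clamp the result to the dimension length. The sketch below is an assumption for illustration only: the helper names and the 1 MiB target used in the usage note are hypothetical and are not taken from nc4hdf.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: choose a chunk length (in elements) so that one
   chunk is at most about target_bytes, never longer than the dimension
   and never shorter than one element. */
uint64_t pick_chunk_len(uint64_t dim_len, uint64_t elem_size,
                        uint64_t target_bytes)
{
    uint64_t len;

    assert(dim_len > 0 && elem_size > 0);
    len = target_bytes / elem_size;
    if (len < 1)
        len = 1;
    if (len > dim_len)
        len = dim_len;
    return len;
}

/* Number of chunks needed to cover the dimension (rounded up). */
uint64_t chunk_count(uint64_t dim_len, uint64_t chunk_len)
{
    assert(chunk_len > 0);
    return (dim_len + chunk_len - 1) / chunk_len;
}
```

For the dimension in the test program (536870911 doubles) and a 1 MiB target, pick_chunk_len() returns 131072 elements, which chunk_count() covers with 4096 chunks. In current netCDF-4 releases a length like this would be passed to nc_def_var_chunking() with NC_CHUNKED storage; that call is not shown here, and the API available at the time of this thread may have differed.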
Thanks,
Ed

/* Copyright 2007, UCAR/Unidata See COPYRIGHT file for copying and
   redistribution conditions.

   This program (quickly, but not thoroughly) tests the large file
   features of netCDF-4.

   $Id: tst_large.c,v 1.3 2007/08/18 12:26:38 ed Exp $
*/
#include <config.h>
#include <nc_tests.h>
#include <netcdf.h>
#include <stdio.h>
#include <string.h>

/* This is the magic number for classic format limits: 2 GiB - 4
   bytes. */
#define MAX_CLASSIC_BYTES 2147483644

/* This is the magic number for 64-bit offset format limits: 4 GiB - 4
   bytes. */
#define MAX_64OFFSET_BYTES 4294967292

/* Handy for constructing tests. */
#define QTR_CLASSIC_MAX (MAX_CLASSIC_BYTES/4)

/* We will create this file. */
#define FILE_NAME "tst_large.nc"

int
main(int argc, char **argv)
{
   printf("\n*** Testing really large files in netCDF-4/HDF5 format, quickly.\n");
   printf("\n*** Testing create of simple, but large, file...");
   {
#define DIM_NAME "Time_in_nanoseconds"
#define NUMDIMS 1
#define NUMVARS 4
      int ncid, dimids[NUMDIMS], varid[NUMVARS];
      char var_name[NUMVARS][NC_MAX_NAME + 1] = {"England", "Scotland", "Ireland", "Wales"};
      size_t index[2] = {QTR_CLASSIC_MAX-1, 0};
      int ndims, nvars, natts, unlimdimid;
      nc_type xtype;
      char name_in[NC_MAX_NAME + 1];
      size_t len;
      double pi = 3.1459, pi_in;
      int i;

      /* Create a netCDF-4/HDF5 format file, with 4 vars. */
      if (nc_create(FILE_NAME, NC_NETCDF4, &ncid)) ERR;
      if (nc_set_fill(ncid, NC_NOFILL, NULL)) ERR;
      if (nc_def_dim(ncid, DIM_NAME, QTR_CLASSIC_MAX, dimids)) ERR;
      for (i = 0; i < NUMVARS; i++)
      {
         if (nc_def_var(ncid, var_name[i], NC_DOUBLE, NUMDIMS,
                        dimids, &varid[i])) ERR;
      }
      if (nc_enddef(ncid)) ERR;

      /* Write only the last value of each variable. */
      for (i = 0; i < NUMVARS; i++)
         if (nc_put_var1_double(ncid, i, index, &pi)) ERR;
      if (nc_close(ncid)) ERR;

      /* Reopen and check the file. */
      if (nc_open(FILE_NAME, 0, &ncid)) ERR;
      if (nc_inq(ncid, &ndims, &nvars, &natts, &unlimdimid)) ERR;
      if (ndims != NUMDIMS || nvars != NUMVARS || natts != 0 ||
          unlimdimid != -1) ERR;
      if (nc_inq_dimids(ncid, &ndims, dimids, 1)) ERR;
      if (ndims != 1 || dimids[0] != 0) ERR;
      if (nc_inq_dim(ncid, 0, name_in, &len)) ERR;
      if (strcmp(name_in, DIM_NAME) || len != QTR_CLASSIC_MAX) ERR;
      for (i = 0; i < NUMVARS; i++)
      {
         if (nc_inq_var(ncid, i, name_in, &xtype, &ndims, dimids, &natts)) ERR;
         if (strcmp(name_in, var_name[i]) || xtype != NC_DOUBLE || ndims != 1 ||
             dimids[0] != 0 || natts != 0) ERR;
         if (nc_get_var1_double(ncid, i, index, &pi_in)) ERR;
         if (pi_in != pi) ERR;
      }
      if (nc_close(ncid)) ERR;
   }
   SUMMARIZE_ERR;
   FINAL_RESULTS;
}

--
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx
_______________________________________________
netcdf-hdf mailing list
netcdf-hdf@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/