Setting the chunk cache to 0 solved the problem I had when defining the
file. Thanks a lot.
Unfortunately, I'm still not able to write efficiently to the file I
just created. It looks like every call to nc_put_vara takes memory
that is not released.
I attach a code snippet to illustrate this. It is very clear when
executing with num_var = 100 (makes the test faster),
num_elements_var=10000 and buffer_size=1.
If I increase buffer_size the problem is less obvious, but it's still
there (set buffer_size = 10 and increase num_elements_var to 100000).
This time it does not seem to be related to num_var, but to the number
of times nc_put_vara is called.
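
A minimal sketch of how the growth could be measured, assuming Linux and
its /proc/self/statm file. The helpers resident_kb() and report_rss() are
hypothetical names for illustration and are not part of netCDF or of the
attached snippet:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Resident set size of the current process in kB, or -1 on failure.
   Assumes Linux: the second field of /proc/self/statm is the number
   of resident pages. */
static long resident_kb(void)
{
    long total_pages = 0, resident_pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f) return -1;
    int n = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (n != 2) return -1;
    return resident_pages * sysconf(_SC_PAGESIZE) / 1024;
}

/* Hypothetical helper: print the current RSS and how much it changed
   since the previous call. Returns the change in kB (0 on first call). */
static long report_rss(const char *label)
{
    static long last_kb = -1;
    assert(label != NULL);
    long now_kb = resident_kb();
    long delta = (last_kb < 0 || now_kb < 0) ? 0 : now_kb - last_kb;
    printf("%s - rss %ld kB (delta %+ld kB)\n", label, now_kb, delta);
    last_kb = now_kb;
    return delta;
}
```

Calling report_rss(debug) next to tick(debug) in the write loop would show
whether resident memory keeps growing as the number of nc_put_vara calls
increases.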
Any ideas?
Thanks in advance,
Dani
On Fri, Apr 30, 2010 at 8:26 PM, Ed Hartnett <ed@xxxxxxxxxxxxxxxx> wrote:
> Dani <pressec@xxxxxxxxx> writes:
>
>> Hi,
>> I have to write and read data to/from a netcdf file that has 750
>> variables, all of them using unlimited dimensions (only one per
>> variable, some dimensions shared) and 10 fixed dimensions.
>>
>> I have used netCDF-4 (because of the multiple unlimited dimensions
>> requirement) and the C API.
>>
>> I'm doing some prototyping on my development machine (Linux, 2 GB RAM)
>> and found several performance issues that I hope someone can help me
>> fix/understand:
>>
>> (1) When I create a file and try to define 1000 variables (all int)
>> and a single shared unlimited dimension, the process takes all
>> available RAM (swap included) and fails with "Error (data:def closed)
>> -- HDF error" after a (long)while.
>>
>> If I do the same closing and opening the file again every 10 or 100
>> new definitions, it works fine. I can bypass this by creating the
>> file once (ncgen) and using a copy of it on every new file, but I
>> would prefer not to. Why does creating the variables take that much
>> memory?
>
> When you create a netCDF variable, HDF5 allocates a buffer for that
> variable. The default size of the buffer is 1 MB.
>
> I have reproduced your problem, but it can be solved by explicitly
> setting the buffer size for each variable to a lower value. I have
> checked in my tests in libsrc4/tst_vars3.c, but here's the part with the
> cache setting:
>
> for (v = 0; v < NUM_VARS; v++)
> {
>    sprintf(var_name, "var_%d", v);
>    if (nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)) ERR_RET;
>    if (nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)) ERR_RET;
> }
>
> Note the call to nc_set_var_chunk_cache(), right after the call to
> nc_def_var.
>
> When I take this line out, I get a serious slowdown around 4000
> variables. (I have more memory available than you do.)
>
> But when I add the call to nc_set_var_chunk_cache(), setting the chunk
> cache to zero, then there is no slowdown, even for 10,000 variables.
>
> Thanks,
>
> Ed
> --
> Ed Hartnett -- ed@xxxxxxxxxxxxxxxx
>
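#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <netcdf.h>

/* c1 and NCERR are not shown in the original snippet; the definitions
   below are assumed so that the code compiles on its own. */
static clock_t c1;
#define NCERR do { fprintf(stderr, "netCDF error at line %d\n", __LINE__); exit(1); } while (0)

/* Print the CPU time elapsed since the previous call. */
void tick(const char* data) {
    clock_t c2 = clock();
    /* Convert to double before dividing to keep sub-millisecond precision. */
    double millis = ((double)(c2 - c1) * 1000.0) / CLOCKS_PER_SEC;
    printf("%s - elapsed %f ms \n", data, millis);
    c1 = c2;
}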
void testNetCDFLimits() {
    int num_var = 100;
    int num_elements_var = 100000;
    size_t buffer_size = 1;
    int ncid, udim;
    int varids[num_var];
    size_t start;
    int buffer[buffer_size];
    char varname[10];
    char filename[100];
    sprintf(filename, "%d-test.nc4", num_var);
    // create the file //
    if ( nc_create(filename, NC_CLOBBER | NC_NETCDF4, &ncid) ) NCERR;
    tick("created");
    // both calls store the id in udim, so udim ends up referring to udim2 //
    if ( nc_def_dim(ncid, "udim1", 0, &udim) ) NCERR;
    if ( nc_def_dim(ncid, "udim2", 0, &udim) ) NCERR;
    tick("dimensions defined");
    for (int j = 0; j < num_var; j++) {
        sprintf(varname,"var-%d",j);
        if ( nc_def_var(ncid, varname, NC_INT, 1, &udim, &varids[j]) ) NCERR;
        if ( nc_set_var_chunk_cache(ncid, varids[j], 0, 0, 0.75) ) NCERR;
    }
    tick("variables defined");
    if( nc_enddef(ncid) ) NCERR;
    tick("endef");
    if ( nc_close(ncid) ) NCERR;
    tick("closed");
    // open for writing //
    if(nc_open(filename, NC_WRITE, &ncid)) NCERR;
    tick("opened");
    for (int j = 0; j < num_var; j++) {
        sprintf(varname,"var-%d",j);
        if ( nc_inq_varid(ncid, varname, &varids[j]) ) NCERR;
        if ( nc_set_var_chunk_cache(ncid, varids[j], 0, 0, 0.75) ) NCERR;
    }
    tick("inquired variables");
    // iterate to write on vars. On every loop buffer_size elements are written //
    // on all variables //
    char debug[100];
    for (int k = 0; k < num_elements_var; k = k + buffer_size) {
        for (int j = 0; j < num_var; j++) {
            for (int l = 0; l < buffer_size; l++) {
                buffer[l] = l * j;
            }
            start = k;
            if ( nc_put_vara(ncid, varids[j], &start, &buffer_size, buffer) ) NCERR;
        }
        sprintf(debug, "%d", k);
        tick( debug );
    }
    tick("variables written");
    if( nc_close(ncid) ) NCERR;
    tick("closed");
    // open for reading //
    if ( nc_open(filename, NC_NOWRITE, &ncid) ) NCERR;
    tick("open for reading");
    for (int j = 0; j < num_var; j++) {
        sprintf(varname,"var-%d",j);
        if( nc_inq_varid(ncid, varname, &varids[j]) ) NCERR;
        if ( nc_set_var_chunk_cache(ncid, varids[j], 0, 0, 0.75) ) NCERR;
    }
    tick("inquired variables");
    for (int k = 0; k < num_elements_var; k = k + buffer_size) {
        for (int j = 0; j < num_var; j++) {
            start = k;
            if ( nc_get_vara(ncid, varids[j], &start, &buffer_size, buffer) ) NCERR;
        }
        sprintf(debug, "%d", k);
        tick( debug );
    }
    tick("variables read");
    if( nc_close(ncid) ) NCERR;
    tick("closed");
}
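
Note that clock() reports processor time used by the program, so time
spent waiting on disk I/O or swapping may not show up in the numbers
printed by tick(). A wall-clock variant could be sketched as follows,
assuming a POSIX system that provides clock_gettime with CLOCK_MONOTONIC.
The names wall_start(), wall_tick() and wall_prev are hypothetical and
only mirror tick() and c1 from the snippet above:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical counterpart of c1: time of the previous measurement. */
static struct timespec wall_prev;

/* Record the starting point; call once before the first wall_tick(). */
static void wall_start(void)
{
    clock_gettime(CLOCK_MONOTONIC, &wall_prev);
}

/* Print and return the wall-clock milliseconds since the previous call. */
static double wall_tick(const char *label)
{
    struct timespec now;
    assert(label != NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    double millis = (now.tv_sec - wall_prev.tv_sec) * 1000.0
                  + (now.tv_nsec - wall_prev.tv_nsec) / 1.0e6;
    printf("%s - elapsed %f ms (wall clock)\n", label, millis);
    wall_prev = now;
    return millis;
}
```

Printing both values side by side would show whether a slow iteration is
spent computing or waiting.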