Re: [netcdfgroup] File with large number of variables

Setting the cache to 0 solved the problem with defining the file.
Thanks a lot.

Unfortunately, I'm still not able to write efficiently to the file I
just created. It looks like every call to nc_put_vara takes memory
that is not released.

I attach a code snippet to illustrate this. The effect is very clear
when running with num_var = 100 (which makes the test faster),
num_elements_var = 10000 and buffer_size = 1.
If I increase buffer_size the problem is less obvious, but it is still
there (set buffer_size = 10 and num_elements_var = 100000).
This time it does not seem to be related to num_var, but to the number
of times nc_put_vara is called.

Any ideas?

Thanks in advance,

Dani


On Fri, Apr 30, 2010 at 8:26 PM, Ed Hartnett <ed@xxxxxxxxxxxxxxxx> wrote:
> Dani <pressec@xxxxxxxxx> writes:
>
>> Hi,
>> I have to write and read data to/from a netcdf file that has 750
>> variables, all of them using unlimited dimensions (only one per
>> variable, some dimensions shared) and 10 fixed dimensions.
>>
>> I have to use netCDF-4 (because of the multiple unlimited dimensions
>> requirement) and the C API.
>>
>> I'm doing some prototyping on my development machine (Linux 2GB RAM)
>> and found several performance issues that I hope someone can help me
>> fix/understand:
>>
>> (1) When I create a file and try to define 1000 variables (all int)
>> and a single shared unlimited dimension, the process takes all
>> available RAM (swap included) and fails with "Error (data:def closed)
>> -- HDF error" after a (long) while.
>>
>> If I do the same closing and opening the file again every 10 or 100
>> new definitions, it works fine.  I can bypass this by creating the
>> file once (ncgen) and using a copy of it on every new file, but I
>> would prefer not to. Why does creating the variables take that much
>> memory?
>
> When you create a netCDF variable, HDF5 allocates a buffer for that
> variable. The default size of the buffer is 1 MB.
>
> I have reproduced your problem, but it can be solved by explicitly
> setting the buffer size for each variable to a lower value. I have
> checked in my tests in libsrc4/tst_vars3.c, but here's the part with the
> cache setting:
>
>      for (v = 0; v < NUM_VARS; v++)
>      {
>         sprintf(var_name, "var_%d", v);
>         if (nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)) ERR_RET;
>         if (nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)) ERR_RET;
>      }
>
> Note the call to nc_set_var_chunk_cache(), right after the call to
> nc_def_var.
>
> When I take this line out, I get a serious slowdown around 4000
> variables. (I have more memory available than you do.)
>
> But when I add the call to nc_set_var_chunk_cache(), setting the chunk
> cache to zero, then there is no slowdown, even for 10,000 variables.
>
> Thanks,
>
> Ed
> --
> Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx
>
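  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <netcdf.h>

  /* NCERR is not defined in the original snippet; this definition is an
     assumption so that the code compiles. */
  #define NCERR do { fprintf(stderr, "netCDF error at line %d\n", __LINE__); exit(1); } while (0)

  static clock_t c1;  /* clock value at the previous tick() call */

  /* Print the CPU time elapsed since the previous call, with a label. */
  void tick(const char* data) {
    clock_t c2 = clock();
    double millis = ((double)(c2 - c1) * 1000.0) / CLOCKS_PER_SEC;
    printf("%s - elapsed %f ms \n", data, millis);
    c1 = c2;
  }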

  void testNetCDFLimits() {

    int num_var = 100;
    int num_elements_var = 100000;
    size_t buffer_size = 1;

    int ncid, udim;
    int varids[num_var];
    size_t start;
    int buffer[buffer_size];
    char varname[10];
    char filename[100];

    sprintf(filename, "%d-test.nc4", num_var);

    // create the file //

    if ( nc_create(filename, NC_CLOBBER | NC_NETCDF4, &ncid) ) NCERR;
    tick("created");

    if ( nc_def_dim(ncid, "udim1", 0, &udim) ) NCERR;
    if ( nc_def_dim(ncid, "udim2", 0, &udim) ) NCERR;
    tick("dimensions defined");


    for (int j = 0; j < num_var; j++) {
        sprintf(varname,"var-%d",j);
        if ( nc_def_var(ncid, varname, NC_INT, 1, &udim, &varids[j]) ) NCERR;
        if ( nc_set_var_chunk_cache(ncid, varids[j], 0, 0, 0.75) ) NCERR;
    }
    tick("variables defined");

    if( nc_enddef(ncid) ) NCERR;
    tick("enddef");
    if ( nc_close(ncid) ) NCERR;
    tick("closed");


    // open for writing //
    if(nc_open(filename, NC_WRITE, &ncid)) NCERR;

    tick("opened");

    for (int j = 0; j < num_var; j++) {
      sprintf(varname,"var-%d",j);
      if ( nc_inq_varid(ncid, varname, &varids[j]) ) NCERR;
      if ( nc_set_var_chunk_cache(ncid, varids[j], 0, 0, 0.75) ) NCERR;
    }
    tick("inquired variables");

    // iterate to write on vars; on every loop, buffer_size elements are
    // written on all variables
    char debug[100];
    for (int k = 0; k < num_elements_var; k = k + buffer_size) {

      for (int j = 0; j < num_var; j++) {

        for (int l = 0; l < buffer_size; l++) {
          buffer[l] = l * j;
        }
        start = k;
        if ( nc_put_vara(ncid, varids[j], &start, &buffer_size, buffer) ) NCERR;
      }
      sprintf(debug, "%d", k);
      tick( debug );
    }
    tick("variables written");

    if( nc_close(ncid) ) NCERR;
    tick("closed");



    // open for reading //
    if ( nc_open(filename, NC_NOWRITE, &ncid) ) NCERR;
    tick("open for reading");
    
    for (int j = 0; j < num_var; j++) {
      sprintf(varname,"var-%d",j);
      if( nc_inq_varid(ncid, varname, &varids[j]) ) NCERR;
      if ( nc_set_var_chunk_cache(ncid, varids[j], 0, 0, 0.75) ) NCERR;
    }
    tick("inquired variables");

    for (int k = 0; k < num_elements_var; k = k + buffer_size) {
      for (int j = 0; j < num_var; j++) {
        start = k;
        if ( nc_get_vara(ncid, varids[j], &start, &buffer_size, buffer) ) NCERR;
      }
      sprintf(debug, "%d", k);
      tick( debug );
    }
    
    tick("variables read");
    if( nc_close(ncid) ) NCERR;
    tick("closed");

}