Dani <pressec@xxxxxxxxx> writes:
> Hi,
> I have to write and read data to/from a netcdf file that has 750
> variables, all of them using unlimited dimensions (only one per
> variable, some dimensions shared) and 10 fixed dimensions.
>
> I have to use netCDF-4 (because of the multiple-unlimited-dimensions
> requirement) and the C API.
>
> I'm doing some prototyping on my development machine (Linux, 2 GB RAM)
> and found several performance issues that I hope someone can help me
> fix/understand:
>
> (1) When I create a file and try to define 1000 variables (all int)
> with a single shared unlimited dimension, the process takes all
> available RAM (swap included) and fails with "Error (data:def closed)
> -- HDF error" after a (long) while.
>
> If I do the same but close and reopen the file after every 10 or 100
> new definitions, it works fine. I can bypass this by creating the
> file once (ncgen) and using a copy of it on every new file, but I
> would prefer not to. Why does creating the variables take that much
> memory?
When you create a netCDF-4 variable, HDF5 allocates a chunk cache buffer
for that variable. The default size of that buffer is 1 MB.
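The knob for this in the netCDF-4 C API is the per-variable chunk cache,
declared in netcdf.h:

   /* size       - cache size in bytes
      nelems     - number of chunk slots in the cache
      preemption - value in [0,1]: how readily fully read/written
                   chunks are evicted from the cache */
   int nc_set_var_chunk_cache(int ncid, int varid, size_t size,
                              size_t nelems, float preemption);

With the 1 MB default, 1000 variables work out to roughly 1 GB of cache
buffers alone, which would explain the memory exhaustion on a 2 GB machine.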
I have reproduced your problem; it can be solved by explicitly setting
the buffer size for each variable to a lower value. I have checked my
tests into libsrc4/tst_vars3.c, but here's the part with the cache
setting:
   for (v = 0; v < NUM_VARS; v++)
   {
      sprintf(var_name, "var_%d", v);

      /* Define the variable, then shrink its chunk cache to zero so
         HDF5 does not hold a 1 MB buffer for it. */
      if (nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)) ERR_RET;
      if (nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)) ERR_RET;
   }
Note the call to nc_set_var_chunk_cache(), right after the call to
nc_def_var(). When I take that line out, I get a serious slowdown at
around 4000 variables. (I have more memory available than you do.)
But when I add the call to nc_set_var_chunk_cache(), setting the chunk
cache to zero, there is no slowdown, even for 10,000 variables.
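For reference, here's a minimal self-contained sketch of the same
pattern (this is not the checked-in test; the file name, "rec"
dimension name, NVARS, and the CHECK macro are just illustrative
choices):

   #include <stdio.h>
   #include <stdlib.h>
   #include <netcdf.h>

   #define NVARS 1000
   #define CHECK(e) do { int s = (e); if (s) { \
      fprintf(stderr, "netCDF error: %s\n", nc_strerror(s)); exit(1); } } while (0)

   int main(void)
   {
      int ncid, dimid, varid, v;
      char var_name[NC_MAX_NAME + 1];

      /* Create a netCDF-4 file with one shared unlimited dimension. */
      CHECK(nc_create("many_vars.nc", NC_CLOBBER | NC_NETCDF4, &ncid));
      CHECK(nc_def_dim(ncid, "rec", NC_UNLIMITED, &dimid));

      /* Define many int variables, shrinking each variable's chunk
         cache to zero right after definition so memory use stays flat. */
      for (v = 0; v < NVARS; v++)
      {
         sprintf(var_name, "var_%d", v);
         CHECK(nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid));
         CHECK(nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75));
      }

      CHECK(nc_close(ncid));
      return 0;
   }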
Thanks,
Ed
--
Ed Hartnett -- ed@xxxxxxxxxxxxxxxx