Dani <pressec@xxxxxxxxx> writes:
> Hi,
> I have to write and read data to/from a netcdf file that has 750
> variables, all of them using unlimited dimensions (only one per
> variable, some dimensions shared) and 10 fixed dimensions.
>
> I have to use netCDF-4 (because of the multiple-unlimited-dimensions
> requirement) and the C API.
>
> I'm doing some prototyping on my development machine (Linux, 2 GB RAM)
> and found several performance issues that I hope someone can help me
> fix/understand:
>
> (1) When I create a file and try to define 1000 variables (all int)
> with a single shared unlimited dimension, the process takes all
> available RAM (swap included) and fails with "Error (data:def closed)
> -- HDF error" after a (long) while.
>
> If I do the same but close and reopen the file after every 10 or 100
> new definitions, it works fine. I can bypass this by creating the
> file once (ncgen) and using a copy of it on every new file, but I
> would prefer not to. Why does creating the variables take that much
> memory?
When you create a netCDF-4 variable, HDF5 allocates a chunk cache buffer
for that variable. The default size of that buffer is 1 MB.
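The knob for this in the netCDF-4 C API is the per-variable chunk cache,
declared in netcdf.h:

   /* size       - cache size in bytes
      nelems     - number of chunk slots in the cache
      preemption - value in [0,1]: how readily fully read/written
                   chunks are evicted from the cache */
   int nc_set_var_chunk_cache(int ncid, int varid, size_t size,
                              size_t nelems, float preemption);

With the 1 MB default, 1000 variables work out to roughly 1 GB of cache
buffers alone, which would explain the memory exhaustion on a 2 GB machine.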
I have reproduced your problem; it can be solved by explicitly setting
the buffer size for each variable to a lower value. I have checked my
tests into libsrc4/tst_vars3.c, but here's the part with the cache
setting:
   for (v = 0; v < NUM_VARS; v++)
   {
      sprintf(var_name, "var_%d", v);

      /* Define the variable, then shrink its chunk cache to zero so
         HDF5 does not hold a 1 MB buffer for it. */
      if (nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)) ERR_RET;
      if (nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)) ERR_RET;
   }
Note the call to nc_set_var_chunk_cache(), right after the call to
nc_def_var(). When I take that line out, I get a serious slowdown at
around 4000 variables. (I have more memory available than you do.)
But when I add the call to nc_set_var_chunk_cache(), setting the chunk
cache to zero, there is no slowdown, even for 10,000 variables.
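For reference, here's a minimal self-contained sketch of the same
pattern (this is not the checked-in test; the file name, "rec"
dimension name, NVARS, and the CHECK macro are just illustrative
choices):

   #include <stdio.h>
   #include <stdlib.h>
   #include <netcdf.h>

   #define NVARS 1000
   #define CHECK(e) do { int s = (e); if (s) { \
      fprintf(stderr, "netCDF error: %s\n", nc_strerror(s)); exit(1); } } while (0)

   int main(void)
   {
      int ncid, dimid, varid, v;
      char var_name[NC_MAX_NAME + 1];

      /* Create a netCDF-4 file with one shared unlimited dimension. */
      CHECK(nc_create("many_vars.nc", NC_CLOBBER | NC_NETCDF4, &ncid));
      CHECK(nc_def_dim(ncid, "rec", NC_UNLIMITED, &dimid));

      /* Define many int variables, shrinking each variable's chunk
         cache to zero right after definition so memory use stays flat. */
      for (v = 0; v < NVARS; v++)
      {
         sprintf(var_name, "var_%d", v);
         CHECK(nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid));
         CHECK(nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75));
      }

      CHECK(nc_close(ncid));
      return 0;
   }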
Thanks,
Ed
--
Ed Hartnett -- ed@xxxxxxxxxxxxxxxx