Hi Dani,

If you are really interested in program efficiency, then the HDF5 API should be considered. The NetCDF4 API is actually a wrapper on top of the HDF5 API that provides an interface familiar to NetCDF users. NetCDF4 provides a "simpler" interface in one respect: the user doesn't have to worry about closing objects opened or created earlier in the program. But this comes at a price: the NetCDF API must keep the whole file structure in memory. That's why the NetCDF API runs much slower (and takes much more memory) than the HDF5 API on files with a complex structure.

I have replaced the NetCDF code with HDF5 in your example. The resulting code is shorter and it will run much faster: please try. (A rough sketch of the HDF5-only approach is included after the quoted thread below.)

Regards,
Sergei

-----Original Message-----
From: netcdfgroup-bounces@xxxxxxxxxxxxxxxx [mailto:netcdfgroup-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Dani
Sent: 03 May 2010 10:41
To: Ed Hartnett
Cc: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: Re: [netcdfgroup] File with large number of variables

Setting the cache to 0 has solved the problem with the definition of the file. Thanks a lot.

Unfortunately, I'm still not able to write efficiently to the file I just created. It looks like every call to nc_put_vara takes memory that is not released. I attach a code snippet to illustrate this. It is very clear when executing with num_var = 100 (makes the test faster), num_elements_var = 10000 and buffer_size = 1. If I increase buffer_size the problem is less obvious, but it's still there (set buffer_size = 10 and increase num_elements_var to 100000). It does not seem to be related to num_var this time, but to the number of times nc_put_vara is called.

Any ideas?

Thanks in advance,
Dani

On Fri, Apr 30, 2010 at 8:26 PM, Ed Hartnett <ed@xxxxxxxxxxxxxxxx> wrote:
> Dani <pressec@xxxxxxxxx> writes:
>
>> Hi,
>> I have to write and read data to/from a netCDF file that has 750
>> variables, all of them using unlimited dimensions (only one per
>> variable, some dimensions shared) and 10 fixed dimensions.
>>
>> I have to use netCDF-4 (because of the multiple unlimited dimensions
>> requirement) and the C API.
>>
>> I'm doing some prototyping on my development machine (Linux, 2 GB RAM)
>> and found several performance issues that I hope someone can help me
>> fix/understand:
>>
>> (1) When I create a file and try to define 1000 variables (all int)
>> and a single shared unlimited dimension, the process takes all
>> available RAM (swap included) and fails with "Error (data:def closed)
>> -- HDF error" after a (long) while.
>>
>> If I do the same, closing and reopening the file every 10 or 100
>> new definitions, it works fine. I can bypass this by creating the
>> file once (ncgen) and using a copy of it for every new file, but I
>> would prefer not to. Why does creating the variables take that much
>> memory?
>
> When you create a netCDF variable, HDF5 allocates a buffer for that
> variable. The default size of the buffer is 1 MB.
>
> I have reproduced your problem, but it can be solved by explicitly
> setting the buffer size for each variable to a lower value. I have
> checked in my tests in libsrc4/tst_vars3.c, but here's the part with the
> cache setting:
>
>    for (v = 0; v < NUM_VARS; v++)
>    {
>       sprintf(var_name, "var_%d", v);
>       if (nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)) ERR_RET;
>       if (nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)) ERR_RET;
>    }
>
> Note the call to nc_set_var_chunk_cache(), right after the call to
> nc_def_var.
>
> When I take this line out, I get a serious slowdown around 4000
> variables. (I have more memory available than you do.)
>
> But when I add the call to set_var_chunk_cache(), setting the chunk
> cache to zero, then there is no slowdown, even for 10,000 variables.
>
> Thanks,
>
> Ed
> --
> Ed Hartnett -- ed@xxxxxxxxxxxxxxxx
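[Editor's note: Sergei's HDF5 rewrite is attached to his message and is not reproduced in the archive. The following is only a minimal, hypothetical sketch of the approach he describes -- creating, writing, and closing each dataset explicitly through the HDF5 C API so that no per-variable state has to stay cached. All names and sizes below are illustrative, not taken from his attachment.]

/* Hypothetical sketch: explicit create/write/close of many datasets
 * with the HDF5 C API. Compile with:  h5cc sketch_hdf5.c
 * NUM_VARS and NUM_ELEMENTS mirror Dani's num_var / num_elements_var. */
#include <stdio.h>
#include <hdf5.h>

#define NUM_VARS     100
#define NUM_ELEMENTS 10000

int main(void)
{
    hid_t file, space, dset;
    hsize_t dims[1] = { NUM_ELEMENTS };
    static int data[NUM_ELEMENTS];
    char name[32];
    int v, i;

    for (i = 0; i < NUM_ELEMENTS; i++)
        data[i] = i;

    file = H5Fcreate("testlimits.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    space = H5Screate_simple(1, dims, NULL);

    for (v = 0; v < NUM_VARS; v++) {
        sprintf(name, "var_%d", v);
        dset = H5Dcreate2(file, name, H5T_NATIVE_INT, space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
        H5Dclose(dset);  /* the caller, not the library, decides when to release it */
    }

    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

Note that this fixed-size case is only the simplest illustration of the explicit open/write/close pattern; an unlimited (extendible) dimension, as in Dani's real file, would additionally need a chunked dataset creation property list (H5Pset_chunk) and H5Dset_extent before each append.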
Attachment: testlimits.c
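[Editor's note: the attached testlimits.c is not included in the archive. The following is a hypothetical reconstruction of the kind of write loop Dani describes -- num_var variables along an unlimited dimension, buffer_size values written per nc_put_vara call -- written only to show the pattern whose memory growth he reports, not his actual code and not a fix.]

/* Hypothetical sketch of the reported write pattern (not the real testlimits.c). */
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define NUM_VAR          100     /* illustrative: num_var */
#define NUM_ELEMENTS_VAR 10000   /* illustrative: num_elements_var */
#define BUFFER_SIZE      1       /* illustrative: buffer_size */

#define CHECK(e) do { int _s = (e); if (_s) { \
    fprintf(stderr, "%s\n", nc_strerror(_s)); exit(1); } } while (0)

int main(void)
{
    int ncid, dimid, varid[NUM_VAR];
    int buf[BUFFER_SIZE] = {0};
    size_t start[1], count[1] = { BUFFER_SIZE };
    size_t rec;
    char name[32];
    int v;

    CHECK(nc_create("testlimits.nc", NC_NETCDF4, &ncid));
    CHECK(nc_def_dim(ncid, "rec", NC_UNLIMITED, &dimid));
    for (v = 0; v < NUM_VAR; v++) {
        sprintf(name, "var_%d", v);
        CHECK(nc_def_var(ncid, name, NC_INT, 1, &dimid, &varid[v]));
        /* per-variable chunk cache set to zero, as Ed suggested */
        CHECK(nc_set_var_chunk_cache(ncid, varid[v], 0, 0, 0.75));
    }
    CHECK(nc_enddef(ncid));

    /* Memory use reportedly grows with the number of nc_put_vara calls. */
    for (rec = 0; rec < NUM_ELEMENTS_VAR; rec += BUFFER_SIZE) {
        start[0] = rec;
        for (v = 0; v < NUM_VAR; v++)
            CHECK(nc_put_vara_int(ncid, varid[v], start, count, buf));
    }

    CHECK(nc_close(ncid));
    return 0;
}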