Hi Dani,

If you are really interested in program efficiency, then the HDF5 API should be considered. The NetCDF4 API is actually a wrapper on top of the HDF5 API that provides an interface familiar to NetCDF users. NetCDF4 provides a "simpler" interface in one respect: the user doesn't have to worry about closing objects opened or created earlier in the program. But this comes at a price: the NetCDF API must keep the whole file structure in memory. That's why the NetCDF API runs much slower (and takes much more memory) than the HDF5 API on files with a complex structure.

I have replaced the NetCDF code with HDF5 in your example. The resulting code is shorter and it will run much faster: please try. (A rough sketch of the HDF5-only approach is included after the quoted thread below.)

Regards,
Sergei

-----Original Message-----
From: netcdfgroup-bounces@xxxxxxxxxxxxxxxx [mailto:netcdfgroup-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Dani
Sent: 03 May 2010 10:41
To: Ed Hartnett
Cc: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: Re: [netcdfgroup] File with large number of variables

Setting the cache to 0 has solved the problem with the definition of the file. Thanks a lot.

Unfortunately, I'm still not able to write efficiently to the file I just created. It looks like every call to nc_put_vara takes memory that is not released. I attach a code snippet to illustrate this. It is very clear when executing with num_var = 100 (makes the test faster), num_elements_var = 10000 and buffer_size = 1. If I increase buffer_size the problem is less obvious, but it's still there (set buffer_size = 10 and increase num_elements_var to 100000). It does not seem to be related to num_var this time, but to the number of times nc_put_vara is called.

Any ideas?

Thanks in advance,
Dani

On Fri, Apr 30, 2010 at 8:26 PM, Ed Hartnett <ed@xxxxxxxxxxxxxxxx> wrote:
> Dani <pressec@xxxxxxxxx> writes:
>
>> Hi,
>> I have to write and read data to/from a netCDF file that has 750
>> variables, all of them using unlimited dimensions (only one per
>> variable, some dimensions shared) and 10 fixed dimensions.
>>
>> I have to use netCDF-4 (because of the multiple unlimited dimensions
>> requirement) and the C API.
>>
>> I'm doing some prototyping on my development machine (Linux, 2 GB RAM)
>> and found several performance issues that I hope someone can help me
>> fix/understand:
>>
>> (1) When I create a file and try to define 1000 variables (all int)
>> and a single shared unlimited dimension, the process takes all
>> available RAM (swap included) and fails with "Error (data:def closed)
>> -- HDF error" after a (long) while.
>>
>> If I do the same, closing and reopening the file every 10 or 100
>> new definitions, it works fine. I can bypass this by creating the
>> file once (ncgen) and using a copy of it for every new file, but I
>> would prefer not to. Why does creating the variables take that much
>> memory?
>
> When you create a netCDF variable, HDF5 allocates a buffer for that
> variable. The default size of the buffer is 1 MB.
>
> I have reproduced your problem, but it can be solved by explicitly
> setting the buffer size for each variable to a lower value. I have
> checked in my tests in libsrc4/tst_vars3.c, but here's the part with the
> cache setting:
>
>    for (v = 0; v < NUM_VARS; v++)
>    {
>       sprintf(var_name, "var_%d", v);
>       if (nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)) ERR_RET;
>       if (nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)) ERR_RET;
>    }
>
> Note the call to nc_set_var_chunk_cache(), right after the call to
> nc_def_var.
>
> When I take this line out, I get a serious slowdown around 4000
> variables. (I have more memory available than you do.)
>
> But when I add the call to set_var_chunk_cache(), setting the chunk
> cache to zero, then there is no slowdown, even for 10,000 variables.
>
> Thanks,
>
> Ed
> --
> Ed Hartnett -- ed@xxxxxxxxxxxxxxxx
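[Editor's note: Sergei's HDF5 rewrite is attached to his message and is not reproduced in the archive. The following is only a minimal, hypothetical sketch of the approach he describes -- creating, writing, and closing each dataset explicitly through the HDF5 C API so that no per-variable state has to stay cached. All names and sizes below are illustrative, not taken from his attachment.]

/* Hypothetical sketch: explicit create/write/close of many datasets
 * with the HDF5 C API. Compile with:  h5cc sketch_hdf5.c
 * NUM_VARS and NUM_ELEMENTS mirror Dani's num_var / num_elements_var. */
#include <stdio.h>
#include <hdf5.h>

#define NUM_VARS     100
#define NUM_ELEMENTS 10000

int main(void)
{
    hid_t file, space, dset;
    hsize_t dims[1] = { NUM_ELEMENTS };
    static int data[NUM_ELEMENTS];
    char name[32];
    int v, i;

    for (i = 0; i < NUM_ELEMENTS; i++)
        data[i] = i;

    file = H5Fcreate("testlimits.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    space = H5Screate_simple(1, dims, NULL);

    for (v = 0; v < NUM_VARS; v++) {
        sprintf(name, "var_%d", v);
        dset = H5Dcreate2(file, name, H5T_NATIVE_INT, space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
        H5Dclose(dset);  /* the caller, not the library, decides when to release it */
    }

    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

Note that this fixed-size case is only the simplest illustration of the explicit open/write/close pattern; an unlimited (extendible) dimension, as in Dani's real file, would additionally need a chunked dataset creation property list (H5Pset_chunk) and H5Dset_extent before each append.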
Attachment: testlimits.c
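[Editor's note: the attached testlimits.c is not included in the archive. The following is a hypothetical reconstruction of the kind of write loop Dani describes -- num_var variables along an unlimited dimension, buffer_size values written per nc_put_vara call -- written only to show the pattern whose memory growth he reports, not his actual code and not a fix.]

/* Hypothetical sketch of the reported write pattern (not the real testlimits.c). */
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define NUM_VAR          100     /* illustrative: num_var */
#define NUM_ELEMENTS_VAR 10000   /* illustrative: num_elements_var */
#define BUFFER_SIZE      1       /* illustrative: buffer_size */

#define CHECK(e) do { int _s = (e); if (_s) { \
    fprintf(stderr, "%s\n", nc_strerror(_s)); exit(1); } } while (0)

int main(void)
{
    int ncid, dimid, varid[NUM_VAR];
    int buf[BUFFER_SIZE] = {0};
    size_t start[1], count[1] = { BUFFER_SIZE };
    size_t rec;
    char name[32];
    int v;

    CHECK(nc_create("testlimits.nc", NC_NETCDF4, &ncid));
    CHECK(nc_def_dim(ncid, "rec", NC_UNLIMITED, &dimid));
    for (v = 0; v < NUM_VAR; v++) {
        sprintf(name, "var_%d", v);
        CHECK(nc_def_var(ncid, name, NC_INT, 1, &dimid, &varid[v]));
        /* per-variable chunk cache set to zero, as Ed suggested */
        CHECK(nc_set_var_chunk_cache(ncid, varid[v], 0, 0, 0.75));
    }
    CHECK(nc_enddef(ncid));

    /* Memory use reportedly grows with the number of nc_put_vara calls. */
    for (rec = 0; rec < NUM_ELEMENTS_VAR; rec += BUFFER_SIZE) {
        start[0] = rec;
        for (v = 0; v < NUM_VAR; v++)
            CHECK(nc_put_vara_int(ncid, varid[v], start, count, buf));
    }

    CHECK(nc_close(ncid));
    return 0;
}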