[netcdfgroup] File with large number of variables

Hi,
I have to write and read data to/from a netCDF file that has 750
variables, each of them using an unlimited dimension (only one per
variable, some dimensions shared between variables), plus 10 fixed
dimensions.

I have to use netCDF-4 (because of the multiple unlimited dimensions
requirement) and the C API.
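
For reference, the setup looks roughly like the sketch below (the file
name, dimension/variable names and counts are just placeholders); each
record variable gets its own unlimited dimension, which is why netCDF-4
is required:

/* Simplified sketch: two independent unlimited (record) dimensions,
 * one int variable on each -- only possible with netCDF-4/HDF5. */
#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(e) do { int _s = (e); if (_s != NC_NOERR) { \
    fprintf(stderr, "%s\n", nc_strerror(_s)); exit(1); } } while (0)

int main(void)
{
    int ncid, dim_a, dim_b, var_a, var_b;

    CHECK(nc_create("proto.nc", NC_CLOBBER | NC_NETCDF4, &ncid));

    /* Two separate unlimited dimensions. */
    CHECK(nc_def_dim(ncid, "rec_a", NC_UNLIMITED, &dim_a));
    CHECK(nc_def_dim(ncid, "rec_b", NC_UNLIMITED, &dim_b));

    /* One int variable per record dimension. */
    CHECK(nc_def_var(ncid, "var_a", NC_INT, 1, &dim_a, &var_a));
    CHECK(nc_def_var(ncid, "var_b", NC_INT, 1, &dim_b, &var_b));

    CHECK(nc_enddef(ncid));
    CHECK(nc_close(ncid));
    return 0;
}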

I'm doing some prototyping on my development machine (Linux, 2 GB RAM)
and have found several performance issues that I hope someone can help
me fix or understand:

(1) When I create a file and try to define 1000 variables (all int)
sharing a single unlimited dimension, the process takes all available
RAM (swap included) and fails after a (long) while with "Error
(data:def closed) -- HDF error".

If I do the same thing but close and reopen the file after every 10 or
100 new definitions, it works fine.  I could bypass this by creating
the file once (with ncgen) and using a copy of it for every new file,
but I would prefer not to. Why does defining the variables take that
much memory?

(2) When writing and reading data to/from the variables, there is a
huge performance difference between reading/writing one record at a
time and reading/writing several records at a time (buffering). To
keep the logic of my program simple, my first approach was to go one
record at a time (that is how the program works: it reads one record
from each variable, processes it, and writes it out) and to play with
the chunk size and chunk cache, but so far that hasn't helped much.

Should I build a custom "buffering" layer, or can the chunk cache help
here? Or should I simply get more RAM? :)
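
To be concrete, this is roughly the difference between the two access
patterns, plus the chunking/cache tuning I've been experimenting with
(a sketch only; NBUF, the chunk size and the cache settings are
guesses, not recommendations, and error checking is omitted):

#include <netcdf.h>

#define NBUF 1024   /* records buffered per variable before writing */

void write_one(int ncid, int varid, size_t rec, int value)
{
    /* One record at a time: keeps the program logic simple, but slow. */
    size_t index[1] = { rec };
    nc_put_var1_int(ncid, varid, index, &value);
}

void write_buffered(int ncid, int varid, size_t first_rec, const int *buf)
{
    /* NBUF records in one call: much faster, but complicates the logic. */
    size_t start[1] = { first_rec };
    size_t count[1] = { NBUF };
    nc_put_vara_int(ncid, varid, start, count, buf);
}

void tune_var(int ncid, int varid)
{
    /* Chunking must be set after nc_def_var and before any data is
     * written; the chunk cache can be adjusted per variable. */
    size_t chunks[1] = { 4096 };   /* records per chunk */
    nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
    nc_set_var_chunk_cache(ncid, varid,
                           1 << 20 /* bytes */, 521 /* slots */, 0.75f);
}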

(3) Even when buffering, I see performance degradation (free memory
drops fast and processing time increases) as the number of records
processed per variable (written or read) increases.

I could really use some "expert" advice on the best way to address these issues.

Thanks in advance.

Dani


