Hi,
I have to write and read data to/from a netCDF file that has 750
variables, all of which use unlimited dimensions (only one per
variable, with some dimensions shared) and 10 fixed dimensions.
I have to use netCDF-4 (because of the multiple unlimited dimensions
requirement) and the C API.
I'm doing some prototyping on my development machine (Linux, 2 GB RAM)
and have found several performance issues that I hope someone can help
me understand and fix:
(1) When I create a file and try to define 1000 variables (all int)
sharing a single unlimited dimension, the process consumes all
available RAM (swap included) and, after a long while, fails with
"Error (data:def closed) -- HDF error".
If I close and reopen the file after every 10 or 100 new definitions,
it works fine. I can work around this by creating the file once (with
ncgen) and using a copy of it for every new file, but I would prefer
not to. Why does defining the variables take that much memory?
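A minimal sketch of the definition loop I mean (file name, variable
names and error handling are simplified; the real code differs in the
details):

    #include <stdio.h>
    #include <netcdf.h>

    int main(void)
    {
        int ncid, timedim, varid, status;
        char name[32];

        /* create a netCDF-4 file */
        status = nc_create("test.nc", NC_NETCDF4, &ncid);
        if (status != NC_NOERR) {
            fprintf(stderr, "create: %s\n", nc_strerror(status));
            return 1;
        }

        /* one shared unlimited dimension */
        nc_def_dim(ncid, "time", NC_UNLIMITED, &timedim);

        /* define 1000 int variables along that dimension;
           memory use keeps growing during this loop */
        for (int i = 0; i < 1000; i++) {
            snprintf(name, sizeof name, "var%04d", i);
            status = nc_def_var(ncid, name, NC_INT, 1, &timedim, &varid);
            if (status != NC_NOERR) {
                fprintf(stderr, "def %s: %s\n", name, nc_strerror(status));
                break;
            }
        }

        nc_close(ncid);
        return 0;
    }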
(2) When writing and reading variable data, there is a huge performance
difference between handling one record at a time and handling several
records at a time (buffering). To keep the logic of my program simple,
my first approach was to write record by record (that is how the
program works: it reads one record from each variable, processes it,
and writes it out) and to play with the chunk size and chunk cache, but
so far that hasn't helped much.
Should I build a custom buffering layer, can the chunk cache help here,
or should I simply get more RAM :)?
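To be concrete, these are the two access patterns and the chunk-cache
tuning I have been experimenting with (the buffer size, chunk size and
cache parameters below are made-up example values, not my real
settings):

    #include <netcdf.h>

    #define NBUF 100   /* records buffered per call; example value only */

    /* one record at a time -- how my program naturally works */
    static int write_one(int ncid, int varid, size_t rec, int value)
    {
        size_t index[1] = { rec };
        return nc_put_var1_int(ncid, varid, index, &value);
    }

    /* buffered: NBUF records per call -- much faster in my tests */
    static int write_block(int ncid, int varid, size_t first_rec,
                           const int *values)
    {
        size_t start[1] = { first_rec };
        size_t count[1] = { NBUF };
        return nc_put_vara_int(ncid, varid, start, count, values);
    }

    /* per-variable chunking and chunk cache (set before writing data) */
    static int tune_var(int ncid, int varid)
    {
        size_t chunksizes[1] = { 1024 };   /* example chunk length */
        int status = nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunksizes);
        if (status != NC_NOERR)
            return status;
        /* 4 MB cache, 1009 slots, 0.75 preemption -- example values */
        return nc_set_var_chunk_cache(ncid, varid, 4 * 1024 * 1024, 1009, 0.75f);
    }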
(3) Even when buffering, I see performance degrade (free memory drops
fast and processing time increases) as the number of records processed
(written or read) per variable grows.
I really could use some expert advice on the best way to address these issues.
Thanks in advance.
Dani