Re: [netcdfgroup] Fwd: Reading groups is very very slow. What am I doing wrong?

To: Paul van Delst <paul.vandelst@xxxxxxxx>
Subject: Re: [netcdfgroup] Fwd: Reading groups is very very slow. What am I doing wrong?
From: "David W. Pierce" <dpierce@xxxxxxxx>
Date: Wed, 30 Oct 2013 12:31:17 -0700

You know, I've never tried ragged arrays for any real purpose. I'd be
really interested to hear if they solve your efficiency problem.

Tell you what, if you do try them and they solve your problem, I'll make
sure the files can be read in R. :)

Regards

Dave
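
For readers of the archive, here is a minimal sketch of the contiguous ragged-array idea discussed in this thread. It is an illustration under stated assumptions, not something either poster tested. It uses plain Python lists rather than the netCDF library. The names `pack_ragged`, `unpack_profile`, `flat` and `row_size` are hypothetical. In a netCDF file the two lists would presumably become two variables in a single group, for example a data variable along one long "observation" dimension and a per-profile count variable, so dimensions and variables are defined once rather than once per profile.

```python
from itertools import accumulate


def pack_ragged(profiles):
    """Flatten variable-length profiles into one flat list plus per-profile counts."""
    row_size = [len(p) for p in profiles]        # number of levels in each profile
    flat = [v for p in profiles for v in p]      # all values, stored back to back
    return flat, row_size


def unpack_profile(flat, row_size, i):
    """Recover profile i from the flat list using cumulative offsets."""
    starts = [0] + list(accumulate(row_size))    # start index of each profile
    return flat[starts[i]:starts[i + 1]]


# Two made-up profiles with 101 and 91 levels, matching the level counts
# in the example groups quoted below. The values themselves are placeholders.
profiles = [[float(k) for k in range(101)], [float(k) for k in range(91)]]
flat, row_size = pack_ragged(profiles)
second = unpack_profile(flat, row_size, 1)
```

Whether this layout actually removes the per-group overhead described in the thread is not established here; it only shows how profiles of different lengths can share one set of arrays.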
 On Oct 30, 2013 11:55 AM, "Paul van Delst" <paul.vandelst@xxxxxxxx> wrote:

> Hi Dave,
>
> My data consists of atmospheric profiles (pressure, temperature, h2o, o3,
> trace gas absorbers..., etc..). Each profile can have a different number of
> atmospheric levels and the absorber data can be for a different number or
> set of absorbers. A single profile is about 25Kb.
>
> So one group would look like:
>
> group: atmprofile-1 {
>   dimensions:
>       n_Levels = 101 ;
>       n_Layers = 100 ;
>       n_Absorbers = 28 ;
>   variables:
>       double Level_Pressure(n_Levels) ;
>       double Level_Temperature(n_Levels) ;
>       double Level_Absorber(n_Absorbers, n_Levels) ;
>   ...etc...
>   } // group atmprofile-1
>
> and another like
>
> group: atmprofile-2 {
>   dimensions:
>       n_Levels = 91 ;
>       n_Layers = 90 ;
>       n_Absorbers = 2 ;
>   variables:
>   ...etc...
>   } // group atmprofile-2
>
> The dimensions within a group are what I meant by "base" dimensions. Each
> group has the same dimensions, but they have different values.
>
> With netCDF3, I always had to ensure the profile data was at the same set
> (or, at least, number) of pressure levels and with the same number of
> gaseous absorbers to pack all the data into arrays, e.g.
>
> netcdf ECMWF52.AtmProfile {
> dimensions:
>     n_levels = 101 ;
>     n_layers = 100 ;
>     n_absorbers = 2 ;
>     n_profiles = UNLIMITED ; // (52 currently)
> variables:
>     double level_pressure(n_profiles, n_levels) ;
>     double level_temperature(n_profiles, n_levels) ;
>     double level_absorber(n_profiles, n_absorbers, n_levels) ;
>     ...etc...
> }
>
> Adding individual profiles as a separate group allows me more freedom (and
> with less processing) to use profiles as they are delivered, but at the
> cost of long I/O times for large(ish) datasets.
>
> I guess I've fundamentally misinterpreted how groups in netCDF4 should be
> used. Your point about the multiplicative time of reading a single group
> makes sense. It just seemed to me that, since the data content is
> effectively the same (for my tests they are identical), the I/O time should
> be also.
>
> But I guess not. The overhead of reading lots of little groups of data (as
> in my dataset) is dominant. Bummer. :o(
>
> Is there a way of storing this type of dataset in netCDF4 in, e.g., ragged
> arrays?
>
> cheers,
>
> paulv
>
>
> On 10/30/13 14:28, David W. Pierce wrote:
>
>> Hi Paul,
>>
>> Well, you don't say what the size of each timestep is, but as the size of
>> each timestep becomes small (< 50 MB maybe?) I would think that doing each
>> timestep as a separate group (if that's what you're doing) would, for a
>> 5000 timestep array, take ~5000 times as long. That's because the setup time
>> is very considerable, and the incremental time for a second timestep after
>> you've set up for a first timestep is small (unless each timestep is quite
>> large).
>>
>> For someone who doesn't know just what you're doing this part is pretty
>> hard to parse:
>>
>>
>> "I did this so each group can have different "base" dimensions for the
>> data arrays."
>>
>> Maybe you could give the specific example? Not knowing the details it's
>> hard to see why it would be desirable to take the multi-group approach, or
>> to think about alternate approaches that would accomplish your goal but
>> might be more efficient.
>>
>> Regards,
>>
>> --Dave
>>
>>
>>
>> On Wed, Oct 30, 2013 at 8:00 AM, Paul van Delst
>> <paul.vandelst@xxxxxxxx> wrote:
>>
>>     Hello,
>>
>>     I've just converted some of my netCDF writing code to write/read
>>     multiple groups rather than use an unlimited dimension. I did this
>>     so each group can have different "base" dimensions for the data
>>     arrays.
>>
>>     I have one data set where the unlimited dimension is 5000. The
>>     read/write of this data in netCDF3 format is almost instantaneous.
>>     When I use the netCDF4 approach (reading and writing 5000 separate
>>     groups) the reads and writes can take upwards of 10 minutes (I
>>     started the program at 10:33am. It is now 10:51am and the read of
>>     the created file is still going on).
>>
>>     I realise there's going to be additional overhead using the
>>     "groups" approach (defining dimensions and variables for each
>>     group) but I presume I'm doing something very wrong/stupid to
>>     cause the I/O to be as slow as it is. Before I start posting code
>>     snippets, does anyone have any experience or hints as to what could
>>     be causing this supa slow I/O?
>>
>>     Thanks for any info.
>>
>>     cheers,
>>
>>     paulv
>>
>>     p.s. It's now 11:00am and the dataset reading is still going on...
>>
>>     _______________________________________________
>>     netcdfgroup mailing list
>>     netcdfgroup@xxxxxxxxxxxxxxxx
>>     For list information or to unsubscribe, visit:
>>     http://www.unidata.ucar.edu/mailing_lists/
>>
>>
>>
>> --
>> David W. Pierce
>> Division of Climate, Atmospheric Science, and Physical Oceanography
>> Scripps Institution of Oceanography, La Jolla, California, USA
>> (858) 534-8276 (voice) / (858) 534-8561 (fax)
>> dpierce@xxxxxxxx
>>
>>
>>

