Re: [netcdfgroup] abysmal performance

  • To: Burlen Loring <bloring@xxxxxxx>
  • Subject: Re: [netcdfgroup] abysmal performance
  • From: Dave Allured - NOAA Affiliate <dave.allured@xxxxxxxx>
  • Date: Thu, 2 Jun 2016 14:47:11 -0600
Burlen,

I am not a fan of advising people to rewrite large datasets when there is
an alternate solution.  In your original description you said your
application must scan the dataset, made up of many netCDF files, to
determine the available time steps.

Can you change the application to determine the time range more
efficiently?  Presumably the file names include a reliable indication of
the dates or times?  How about just sampling the time coordinate in the
"first" and "last" files by date-sorted name?

--Dave


On Thu, Jun 2, 2016 at 2:35 PM, Burlen Loring <bloring@xxxxxxx> wrote:

> That sounds like it; ncdump on one file shows "time = UNLIMITED ; // (8
> currently)".  It's kind of unexpected that these 8 values would not be in
> a contiguous array! Oh well. Thanks for clarifying. This is simulation
> output, so our options may be limited. I will be sure to mention this to
> the scientists. Hopefully they can write time as a fixed dimension.
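For files that have already been written, the nccopy utility shipped with
the netCDF C library can do the conversion without touching the simulation
code: its -u option converts unlimited dimensions to fixed size.  The file
names below are hypothetical:

```shell
# Rewrite a file so the unlimited time dimension becomes fixed; record
# variables, including the time coordinate, then land contiguously.
nccopy -u scattered.nc contiguous.nc

# The header should now show "time = 8 ;" rather than
# "time = UNLIMITED ; // (8 currently)".
ncdump -h contiguous.nc
```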
>
> On 06/02/2016 01:24 PM, Bowman, Kenneth P wrote:
>
> Hi Burlen,
>
> If time is your unlimited (record) dimension, then the time values are
> scattered through the 433 MB file.  That is true for any variables that
> have a time dimension.  To read the time variable, the netCDF library has
> to jump through the file and collect the values.
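The effect can be sketched with a small layout calculation.  In the
classic netCDF format, each record interleaves one time value with one
time-slice of every other record variable.  The sizes below are partly
hypothetical: lon = 1152 comes from this thread, but the 768-point lat
dimension and the single float32 field are assumed for illustration:

```python
# Classic-netCDF record layout: with time UNLIMITED, record r starts at
# r * record_size, and the lone time value sits at the front of each
# record, so consecutive time values are a full record apart on disk.
def time_value_offsets(n_records, other_record_bytes, time_bytes=8):
    record_size = time_bytes + other_record_bytes
    return [r * record_size for r in range(n_records)]

# 8 records, one float32 field of 1152 x 768 points per time step
# (the 768 is an assumed value for illustration):
offsets = time_value_offsets(8, 1152 * 768 * 4)
gap = offsets[1] - offsets[0]   # about 3.4 MB between time values
# Reading all 8 time values therefore costs 8 widely separated seeks,
# while the 1152 lon values sit in one contiguous run.
```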
>
> The longitude variable is contiguous in the file and can be read quickly.
>
> If you know the number of time steps in the file before you write the
> file, you can change the unlimited time dimension to a fixed dimension.
> Then something dimensioned by (only) time will be contiguous in the file.
>
> Or you can rewrite the files with fixed dimensions.  That read performance
> penalty is one of the tradeoffs of having the flexibility of an unlimited
> dimension.
>
> Good luck!
>
> Ken
>
>
> Date: Thu, 2 Jun 2016 12:41:53 -0700
> From: Burlen Loring <bloring@xxxxxxx>
> To: Tom Fogal <tfogal@xxxxxxxxxxxx>, netcdfgroup@xxxxxxxxxxxxxxxx
> Subject: Re: [netcdfgroup] abysmal performance
> Message-ID: <961631fd-2aad-d348-ce1d-8a70a9e67287@xxxxxxx>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
> Hi Tom,
>
> That's not an option, and it has its own issues. For example, if the file
> size exceeds the size of a tape drive, we can't archive it. Besides, it
> doesn't seem like a Lustre metadata issue: open is relatively fast, about
> 0.096 sec, and that wouldn't explain why reading the time dimension with
> only 8 values takes on the order of 1 sec while reading the lon dimension
> with 1152 values takes on the order of 1e-4 sec.
>
> Burlen
>
>
>
>
> -----------------------------------------------------------------------------
> Dr. Kenneth P. Bowman                                1014A Eller Building
> David Bullock Harris Professor of Geosciences        979-862-4060
> Department of Atmospheric Sciences                   979-862-4466 fax
> Texas A&M University
> 3150 TAMU
> College Station, TX   77843-3150
>
> http://atmo.tamu.edu/people/faculty/bowmankenneth.html
>
> _______________________________________________
> NOTE: All exchanges posted to Unidata maintained email lists are
> recorded in the Unidata inquiry tracking system and made publicly
> available through the web.  Users who post to any of the lists we
> maintain are reminded to remove any personal information that they
> do not want to be made public.
>
>
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit:
> http://www.unidata.ucar.edu/mailing_lists/
>