[netcdfgroup] FW: [DIWG] netCDF I/O performance concerns

Dear NetCDF Group,

Please see the discussion below regarding the netCDF performance concerns
identified by Bill Rossow, who has been CC'd here.

Please direct any questions you have about the nature of this issue to Bill
directly.

Respectfully,
David

==================================================
David Moroni
Ocean Wind and Scatterometry Data Engineer
Physical Oceanography Distributed Active Archive Center
Jet Propulsion Laboratory
4800 Oak Grove Dr
M/S 158-242
Pasadena, CA 91109
Phone:  818.354.2038
Fax:  818.353.2718
==================================================

From: Moroni, David F <David.F.Moroni@xxxxxxxxxxxx>
Date: Friday, September 26, 2014 3:20 PM
To: William Rossow <wbrossow@xxxxxxxxx>
Cc: esdswg-interoperability@xxxxxxxxxxxxxx
Subject: Re: [DIWG] netCDF I/O performance concerns

Hi Bill,

To better diagnose your issue, it would help to know your baseline technical
approach.

For starters, it seems like you are working with large-volume data files,
where, as you put it, "data size precludes the second" option of storing all
of the data in an array. NetCDF gives you considerable flexibility to read in
only specific variables, and only specific array elements of those variables
(think of the concept of subsetting), without having to read the entire file
or the entire variable array into memory. You can go even further with this
and use OPeNDAP servers, which let you remotely retrieve and read into memory
only the specific bits of data needed for processing, without downloading the
entire netCDF file.
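
As a rough illustration of why subsetting avoids reading a whole variable, the
sketch below computes which byte ranges a start/count request touches in a
contiguous, row-major variable (last dimension varying fastest), such as a
netCDF classic-format non-record variable. It is plain Python and does not
call the netCDF library; it assumes a fixed element size, no compression, and
a hypothetical base offset, array shape and subset. In real Fortran code the
request would go through the library's own subsetting, e.g. `nf90_get_var`
with its optional `start` and `count` arguments.

```python
"""Sketch: byte ranges touched by a start/count subset of a contiguous,
row-major variable. Shapes, offsets and the subset are hypothetical."""
from itertools import product


def subset_byte_runs(shape, start, count, elem_size, base_offset=0):
    """Return a list of (offset, nbytes) runs covering the requested subset.

    Each run is one contiguous stretch of count[-1] elements along the
    fastest-varying (last) dimension.
    """
    if not (len(shape) == len(start) == len(count)):
        raise ValueError("shape, start and count must have the same rank")
    for n, s, c in zip(shape, start, count):
        if s < 0 or c < 1 or s + c > n:
            raise ValueError("subset lies outside the variable")

    # Row-major strides, measured in elements.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]

    runs = []
    # Visit every index combination of the leading dimensions of the subset.
    leading = [range(s, s + c) for s, c in zip(start[:-1], count[:-1])]
    for idx in product(*leading):
        first = sum(i * st for i, st in zip(idx, strides[:-1])) + start[-1]
        runs.append((base_offset + first * elem_size, count[-1] * elem_size))
    return runs


if __name__ == "__main__":
    # Hypothetical float32 field (time, lat, lon) = (365, 180, 360):
    # one day, a 10 x 20 lat/lon box.
    runs = subset_byte_runs((365, 180, 360), (0, 80, 100), (1, 10, 20), 4)
    print(len(runs), "runs,", sum(n for _, n in runs), "bytes requested")
    print("whole variable:", 365 * 180 * 360 * 4, "bytes")
```

Here the box needs 10 short runs totalling 800 bytes, against roughly 94.6 MB
for the whole variable; how efficiently the library and file system serve such
runs is a separate question.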

I've worked with both flat binary files and netCDF files in various kinds of
data processing code, and aside from memory consumption issues, which can be
addressed through either hardware or software configuration, I've never
observed significant differences in processing speed based on the source
format of the data.

This is why I'm intrigued by the issue you've observed and think it warrants 
further investigation into the technical side of your approach.

Cheers,
David

From: William Rossow <wbrossow@xxxxxxxxx>
Date: Friday, September 26, 2014 3:03 PM
To: David F Moroni <David.F.Moroni@xxxxxxxxxxxx>
Subject: Re: netCDF I/O performance concerns

David, All too technical... the basic point is that both of the approaches you 
mention are impractical... data size precludes the second and performance 
really drags with the first.

On Fri, Sep 26, 2014 at 6:00 PM, Moroni, David F (398M)
<David.F.Moroni@xxxxxxxxxxxx> wrote:
Bill,

So to understand more clearly, are you reading the data from the netCDF file 
iteratively using "for" or "do" loops, or are you first storing the netCDF data 
into either a static or dynamic array?

The first option above would conserve memory but would take longer to process,
because many more individual read calls (and CPU cycles) are required; the
latter option would be much faster (i.e., more comparable to reading directly
from a flat binary file) but would require more system memory.
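
To make the trade-off between the two options concrete, the sketch below reads
the same values once element by element and once in a single bulk request, and
counts the read requests each issues. A plain binary file of float32 values
stands in for a netCDF variable, so this shows only the request-count
difference, not netCDF's own overheads. The `CountingReader` class and both
reader functions are hypothetical helpers. In netCDF-Fortran, the analogous
choice is calling `nf90_get_var` inside a loop with a scalar `start`, versus
one call whose `start` and `count` cover the whole block.

```python
"""Sketch: element-by-element reads vs one bulk read of the same values.
A flat binary file of float32 values stands in for a netCDF variable."""
import array
import os
import tempfile


class CountingReader:
    """Wraps a binary file and counts how many read requests are issued."""

    def __init__(self, path):
        self._f = open(path, "rb")
        self.calls = 0

    def read_at(self, offset, nbytes):
        self.calls += 1
        self._f.seek(offset)
        return self._f.read(nbytes)

    def close(self):
        self._f.close()


def read_one_at_a_time(reader, n, elem_size=4):
    """Option 1: a loop that requests one element per call (little memory)."""
    out = array.array("f")
    for i in range(n):
        out.frombytes(reader.read_at(i * elem_size, elem_size))
    return out


def read_in_bulk(reader, n, elem_size=4):
    """Option 2: one request that loads all n elements into memory at once."""
    out = array.array("f")
    out.frombytes(reader.read_at(0, n * elem_size))
    return out


if __name__ == "__main__":
    n = 100_000
    values = array.array("f", (float(i) for i in range(n)))
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        values.tofile(f)

    for fn in (read_one_at_a_time, read_in_bulk):
        reader = CountingReader(path)
        data = fn(reader, n)
        reader.close()
        print(fn.__name__, reader.calls, "read calls, correct:", data == values)
    os.remove(path)
```

A middle ground, reading in blocks of a few thousand elements at a time, keeps
memory bounded while avoiding most of the per-call overhead.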

Also, when you encountered these performance issues, were you using netCDF data
following the "classic" model (i.e., netCDF version 3 or earlier) or the
"extended" model (i.e., netCDF-4, based on the hierarchical HDF5 data model)?

There are distinct differences between these netCDF data models. The netCDF
"classic" model is essentially a flat binary file preceded by a
self-describing header that holds the metadata. The netCDF "extended" model is
hierarchical and can use groups to organize the data arrays. I haven't
examined this myself, but there may be some performance differences between
the "extended" and "classic" data models due to the simplicity of a flat data
structure versus a multi-tiered data structure.
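
One concrete layout difference worth checking is chunking: netCDF-4 files
store variables through HDF5, and variables that have an unlimited dimension
or use compression are stored in chunks, which the library reads (and
decompresses) whole. The sketch below counts how many chunks a subset touches
under two hypothetical chunk shapes for a daily global field. It is plain
Python, not a call into HDF5, and the chunk shapes are illustrative
assumptions; real chunk shapes are chosen when the file is written (e.g. with
`nc_def_var_chunking` in C or the `chunksizes` argument of `nf90_def_var`).

```python
"""Sketch: chunks touched by a start/count read in chunked storage, as used by
netCDF-4/HDF5 files. All shapes below are hypothetical."""
import math
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the set of chunk indices overlapped by a start/count subset."""
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch
        last = (s + c - 1) // ch
        ranges.append(range(first, last + 1))
    return set(product(*ranges))


def elements_read(start, count, chunk_shape):
    """Elements pulled from storage: every touched chunk is read in full."""
    return len(chunks_touched(start, count, chunk_shape)) * math.prod(chunk_shape)


if __name__ == "__main__":
    # Hypothetical daily global 1-degree field: (time, lat, lon) = (365, 180, 360).
    layouts = {"map chunks": (1, 180, 360), "series chunks": (365, 10, 10)}
    requests = {
        "time series at one point": ((0, 90, 180), (365, 1, 1)),
        "one full daily map": ((0, 0, 0), (1, 180, 360)),
    }
    for name, chunks in layouts.items():
        for label, (start, count) in requests.items():
            n = len(chunks_touched(start, count, chunks))
            print(name, "/", label, ":", n, "chunks,",
                  elements_read(start, count, chunks), "elements")
```

Neither chunk shape suits both requests: each one makes one access pattern
cheap and the other touch about 23.6 million elements, so the access pattern
of the analysis matters as much as the choice of format.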

Cheers,
David


From: William Rossow <wbrossow@xxxxxxxxx>
Date: Friday, September 26, 2014 2:45 PM
To: David F Moroni <David.F.Moroni@xxxxxxxxxxxx>
Cc: esdswg-interoperability@xxxxxxxxxxxxxx
Subject: Re: netCDF I/O performance concerns

David, This issue is not strictly a computer performance or software
interaction issue; it arises when performing calculations involving many
co-located, coincident variables that are extensive (large space-time scope).
It is just that all of these "graphics" formats are arranged badly for this case.
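
Bill's point about co-located variables is essentially about storage order.
The sketch below contrasts two hypothetical layouts of the same data: one where
each variable is stored as its own contiguous block (the usual case for a
netCDF variable) and one where all variables for a given grid point are
interleaved. It counts how many separate contiguous reads are needed to gather
every variable at a single point. It assumes fixed 4-byte values and a
flattened spatial index; the sizes and function names are illustrative only.

```python
"""Sketch: gathering many co-located variables at one grid point under two
hypothetical file layouts."""


def variable_major_offsets(point, n_points, n_vars, elem_size=4):
    """Each variable is one contiguous block; return byte offsets at `point`."""
    return [(v * n_points + point) * elem_size for v in range(n_vars)]


def point_major_offsets(point, n_points, n_vars, elem_size=4):
    """All variables for one point sit side by side (interleaved layout)."""
    return [(point * n_vars + v) * elem_size for v in range(n_vars)]


def contiguous_runs(offsets, elem_size=4):
    """Count how many separate contiguous reads the given offsets require."""
    offsets = sorted(offsets)
    runs = 1 if offsets else 0
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + elem_size:
            runs += 1
    return runs


if __name__ == "__main__":
    n_points = 180 * 360  # one global 1-degree map, flattened
    n_vars = 20           # hypothetical number of co-located variables
    point = 12_345
    for layout in (variable_major_offsets, point_major_offsets):
        offs = layout(point, n_points, n_vars)
        print(layout.__name__, contiguous_runs(offs), "separate reads")
```

The variable-major layout needs one scattered read per variable, while the
interleaved layout needs one. The trade-off reverses when reading a single
variable over the whole map, which is why no single layout suits every
analysis.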

On Fri, Sep 26, 2014 at 5:08 PM, Moroni, David F (398M)
<David.F.Moroni@xxxxxxxxxxxx> wrote:
Hi Bill,

In response to the issue you raised about netCDF I/O performance in your
Fortran code, I've already contacted Ethan Davis at Unidata to see whether this
has already been captured as a known issue and whether there is a fix for it.

In the meantime, I thought it would be worthwhile to connect you with the
ESDSWG Interoperability working group to ensure this matter is also on their
radar screen. There may be a member of this group who has wrestled with the
same issue you've encountered, so my hope is that a solution already exists;
if not, at least one could be on the horizon.

Best Regards,
David


--
Dr. William B. Rossow
Distinguished Professor of Remote Sensing
CREST at The City College of New York
Steinman Hall (T-107)
140th Street and Convent Avenue
New York, NY 10031
1-212-650-5389
wbrossow@xxxxxxxxxxxxx


