Dear NetCDF Group,
Please see the discussion below regarding the netCDF performance concerns
raised by Bill Rossow, who is CC'd here.
Please direct any questions about this issue to Bill.
Respectfully,
David
==================================================
David Moroni
Ocean Wind and Scatterometry Data Engineer
Physical Oceanography Distributed Active Archive Center
Jet Propulsion Laboratory
4800 Oak Grove Dr
M/S 158-242
Pasadena, CA 91109
Phone: 818.354.2038
Fax: 818.353.2718
==================================================
From: David F Moroni <David.F.Moroni@xxxxxxxxxxxx>
Date: Friday, September 26, 2014 3:20 PM
To: William Rossow <wbrossow@xxxxxxxxx>
Cc: esdswg-interoperability@xxxxxxxxxxxxxx
Subject: Re: [DIWG] netCDF I/O performance concerns
Hi Bill,
To better diagnose your issue, it's good to know your baseline technical
approach.
For starters, it seems you are working with large-volume data files, where, as
you put it, "data size precludes the second" option of storing all of the data
in an array. NetCDF lets you read specific variables, and specific array
elements within them (think of subsetting), without reading the entire file or
the entire variable array into memory. You can go further and use OPeNDAP
servers, which let you remotely fetch and read into memory only the specific
bits of data needed for processing, without downloading the entire netCDF
file.
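To make the subsetting idea concrete, here is a minimal sketch in Python. It
does not use the netCDF library at all: it assumes a hypothetical flat,
row-major float32 grid on disk (the names NROWS, NCOLS, write_grid and read_row
are made up for illustration) and reads a single row by seeking to its byte
offset. A netCDF library does comparable bookkeeping for you; in Fortran, for
example, nf90_get_var accepts start and count arguments for this purpose.

```python
import os
import struct
import tempfile

NROWS, NCOLS = 1000, 360        # hypothetical grid shape
ITEM = struct.calcsize("<f")    # bytes per little-endian float32

def write_grid(path):
    """Write a row-major float32 grid where value = row * NCOLS + col."""
    with open(path, "wb") as f:
        for r in range(NROWS):
            row = [float(r * NCOLS + c) for c in range(NCOLS)]
            f.write(struct.pack(f"<{NCOLS}f", *row))

def read_row(path, r):
    """Read only row r: seek to its byte offset, then read NCOLS values."""
    with open(path, "rb") as f:
        f.seek(r * NCOLS * ITEM)
        return struct.unpack(f"<{NCOLS}f", f.read(NCOLS * ITEM))

path = os.path.join(tempfile.mkdtemp(), "grid.bin")
write_grid(path)
row = read_row(path, 500)       # touches 1,440 bytes of a 1.44 MB file
```

Only the requested row is pulled into memory, so the memory footprint depends
on the size of the subset rather than the size of the file.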
I've worked with flat binary files and netCDF files side by side in various
kinds of data processing code, and aside from memory consumption issues, which
can be addressed through hardware or software configuration, I've never
observed significant differences in processing speed attributable to the
source of the data.
This is why I'm intrigued by the issue you've observed and think it warrants
further investigation into the technical side of your approach.
Cheers,
David
From: William Rossow <wbrossow@xxxxxxxxx>
Date: Friday, September 26, 2014 3:03 PM
To: David F Moroni <David.F.Moroni@xxxxxxxxxxxx>
Subject: Re: netCDF I/O performance concerns
David, All too technical... the basic point is that both of the approaches you
mention are impractical... data size precludes the second and performance
really drags with the first.
On Fri, Sep 26, 2014 at 6:00 PM, Moroni, David F (398M)
<David.F.Moroni@xxxxxxxxxxxx> wrote:
Bill,
So to understand more clearly, are you reading the data from the netCDF file
iteratively using "for" or "do" loops, or are you first storing the netCDF data
into either a static or dynamic array?
The first option above would conserve memory but would take longer to process
due to more CPU cycles being required; the latter option would be much faster
(i.e., should be more comparable to reading directly from a flat binary file)
but would require more system memory.
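The trade-off between the two options can be sketched as follows. This is an
illustration only, using a hypothetical flat binary file of float64 values
rather than a netCDF file, and it counts read calls instead of timing anything.
The third function, reading in fixed-size chunks, is my own addition as a
possible middle ground when the full array does not fit in memory.

```python
import array
import os
import tempfile

N = 10_000  # hypothetical number of float64 values in the file

path = os.path.join(tempfile.mkdtemp(), "values.bin")
with open(path, "wb") as f:
    array.array("d", (float(i) for i in range(N))).tofile(f)

def sum_elementwise(path):
    """Option 1: one small read per value; memory use stays constant."""
    total, calls = 0.0, 0
    buf = array.array("d")
    with open(path, "rb") as f:
        for _ in range(N):
            buf.fromfile(f, 1)      # read a single value
            total += buf.pop()      # consume it so buf stays empty
            calls += 1
    return total, calls

def sum_bulk(path):
    """Option 2: one large read that holds every value in memory."""
    buf = array.array("d")
    with open(path, "rb") as f:
        buf.fromfile(f, N)
    return sum(buf), 1

def sum_chunked(path, chunk=1000):
    """Middle ground: bounded memory, far fewer reads than option 1."""
    total, calls, remaining = 0.0, 0, N
    with open(path, "rb") as f:
        while remaining > 0:
            buf = array.array("d")
            buf.fromfile(f, min(chunk, remaining))
            total += sum(buf)
            remaining -= len(buf)
            calls += 1
    return total, calls

elementwise = sum_elementwise(path)
bulk = sum_bulk(path)
chunked = sum_chunked(path)
```

All three return the same sum; they differ in the number of read calls (10,000,
1, and 10 here) and in how much data is held in memory at once.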
Also, when you encountered these performance issues, were you using netCDF data
following the "classic" data model (i.e., version 3 or below) or the "enhanced"
data model (i.e., netCDF-4, built on the hierarchical HDF5 format)?
There are distinct differences between these netCDF data models. A netCDF
"classic" file is essentially a flat binary file with a self-describing header
that holds the metadata. The netCDF "enhanced" model is hierarchical and can
use groups to organize the data arrays. I haven't examined this myself, but
there could be some performance differences between the "enhanced" and
"classic" data models due to the simplicity of a flat data structure versus a
multi-tiered data structure.
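If it is unclear which kind of file is in hand, the leading bytes give a quick
answer: classic-format files begin with the characters CDF followed by a
version byte, while netCDF-4 files are HDF5 files and begin with the HDF5
signature. A minimal sketch, assuming the signature sits at offset 0 (HDF5 also
permits a user block before it, which this does not handle); the function name
netcdf_kind and the sample files are hypothetical:

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"   # first 8 bytes of an HDF5 file
CLASSIC_VERSIONS = {
    1: "classic",
    2: "classic, 64-bit offset",
    5: "classic, CDF-5 (64-bit data)",
}

def netcdf_kind(path):
    """Guess the on-disk format from the leading bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head == HDF5_SIGNATURE:
        # An HDF5 file is not necessarily netCDF-4, hence "possibly".
        return "HDF5-based (possibly netCDF-4)"
    if len(head) >= 4 and head[:3] == b"CDF":
        return CLASSIC_VERSIONS.get(head[3], "unknown CDF version")
    return "not recognized"

# Stand-in files that hold only the magic bytes, for demonstration.
tmp = tempfile.mkdtemp()
samples = {
    "a.nc": b"CDF\x01" + b"\x00" * 4,
    "b.nc": HDF5_SIGNATURE,
    "c.bin": b"\x00" * 8,
}
kinds = {}
for name, magic in samples.items():
    p = os.path.join(tmp, name)
    with open(p, "wb") as f:
        f.write(magic)
    kinds[name] = netcdf_kind(p)
```

With the netCDF utilities installed, running ncdump -k on the file reports the
same information without any code.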
Cheers,
David
From: William Rossow <wbrossow@xxxxxxxxx>
Date: Friday, September 26, 2014 2:45 PM
To: David F Moroni <David.F.Moroni@xxxxxxxxxxxx>
Cc: esdswg-interoperability@xxxxxxxxxxxxxx
Subject: Re: netCDF I/O performance concerns
David, This issue is not a strict computer performance or software interaction
issue; it is an issue that arises for performing calculations involving many
co-located, coincident variables that are extensive (large space-time scope).
It is just that all of these "graphics" formats are arranged badly in this case.
On Fri, Sep 26, 2014 at 5:08 PM, Moroni, David F (398M)
<David.F.Moroni@xxxxxxxxxxxx> wrote:
Hi Bill,
In response to the issue you raised with netCDF I/O performance with your
Fortran code, I've already contacted Ethan Davis at Unidata to see if this has
already been captured as a known issue and if there is a fix for this.
In the meantime, I thought it would be worthwhile to connect you with the
ESDSWG Interoperability working group to ensure this matter is also on their
radar screen. There may be a member within this group who has wrestled with the
same issue you've encountered, so my hope is that a solution may already exist,
but if not at least it could potentially be on the horizon.
Best Regards,
David
==================================================
David Moroni
Ocean Wind and Scatterometry Data Engineer
Physical Oceanography Distributed Active Archive Center
Jet Propulsion Laboratory
4800 Oak Grove Dr
M/S 158-242
Pasadena, CA 91109
Phone: 818.354.2038
Fax: 818.353.2718
==================================================
--
Dr. William B. Rossow
Distinguished Professor of Remote Sensing
CREST at The City College of New York
Steinman Hall (T-107)
140th Street and Convent Avenue
New York, NY 10031
1-212-650-5389
wbrossow@xxxxxxxxxxxxx