Given that essentially no information has been provided about the file (such
as a header dump from ncdump -h filename, plus the file type from
ncdump -k filename) and no information on how he is trying to read the data,
it is almost impossible to say what is going on.
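As a rough illustration of why the read pattern matters as much as the format, here is a minimal sketch in Python (standard library only). It assumes a single fixed-size variable stored contiguously in row-major order as big-endian 32-bit floats, which is roughly how the classic format lays out a non-record variable. Record variables and chunked or compressed netCDF-4/HDF5 files are laid out differently. The helpers contiguous_runs and read_hyperslab are hypothetical and are not part of the netCDF API; they only count how many separate seek/read operations a given subset would need under that assumed layout.

```python
import io
import struct
from itertools import product


def contiguous_runs(shape, start, count):
    """Return (element_offset, n_elements) runs covering a hyperslab of a
    row-major (C-order) array; each run needs one seek and one read."""
    ndim = len(shape)
    # Element strides for C order: the last dimension varies fastest.
    strides = [1] * ndim
    for d in range(ndim - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]

    # Trailing dimensions that are read in full merge into one longer run.
    k = ndim - 1
    while k > 0 and start[k] == 0 and count[k] == shape[k]:
        k -= 1
    run_len = count[k] * strides[k]

    runs = []
    # One run per index combination of the leading dimensions 0..k-1.
    leading = [range(start[d], start[d] + count[d]) for d in range(k)]
    for idx in product(*leading):
        offset = sum(i * strides[d] for d, i in enumerate(idx))
        runs.append((offset + start[k] * strides[k], run_len))
    return runs


def read_hyperslab(f, shape, start, count):
    """Read a hyperslab of big-endian float32 values from file object f.
    Returns the values and the number of read calls issued."""
    itemsize = struct.calcsize(">f")
    values, n_reads = [], 0
    for offset, n in contiguous_runs(shape, start, count):
        f.seek(offset * itemsize)
        values.extend(struct.unpack(f">{n}f", f.read(n * itemsize)))
        n_reads += 1
    return values, n_reads


# Hypothetical (time, lat, lon) variable; an in-memory buffer stands in for a file.
nt, ny, nx = 100, 18, 36
shape = (nt, ny, nx)
data = [float(v) for v in range(nt * ny * nx)]
f = io.BytesIO(struct.pack(f">{len(data)}f", *data))

# Whole map at one time step: a single contiguous read.
map_vals, map_reads = read_hyperslab(f, shape, (5, 0, 0), (1, ny, nx))
# Time series at one grid point: one small read per time step.
ts_vals, ts_reads = read_hyperslab(f, shape, (0, 3, 7), (nt, 1, 1))
print(f"map: {map_reads} read(s), time series: {ts_reads} read(s)")
# prints: map: 1 read(s), time series: 100 read(s)
```

Under this assumed layout, a whole (lat, lon) map at one time step comes back in one read, while a time series at one grid point needs one read per time step. That kind of access pattern, repeated over many co-located variables, is one plausible reason a time-major file can feel slow for point-wise calculations, and it is the sort of thing the header dump and a description of the read loop would help confirm or rule out.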
-Roy
On Sep 26, 2014, at 3:26 PM, "Moroni, David F (398M)"
<David.F.Moroni@xxxxxxxxxxxx> wrote:
> Dear NetCDF Group,
>
> Please refer to the discussion below as to the nature of the netCDF
> performance concerns identified by Bill Rossow, who has been CC'd here.
>
> Please address any questions you have as to the nature of this issue with
> Bill directly.
>
> Respectfully,
> David
>
> ==================================================
> David Moroni
> Ocean Wind and Scatterometry Data Engineer
> Physical Oceanography Distributed Active Archive Center
> Jet Propulsion Laboratory
> 4800 Oak Grove Dr
> M/S 158-242
> Pasadena, CA 91109
> Phone: 818.354.2038
> Fax: 818.353.2718
> ==================================================
>
> From: <Moroni>, David F Moroni <David.F.Moroni@xxxxxxxxxxxx>
> Date: Friday, September 26, 2014 3:20 PM
> To: William Rossow <wbrossow@xxxxxxxxx>
> Cc: "esdswg-interoperability@xxxxxxxxxxxxxx"
> <esdswg-interoperability@xxxxxxxxxxxxxx>
> Subject: Re: [DIWG] netCDF I/O performance concerns
>
>> Hi Bill,
>>
>> To better diagnose your issue, it would help to know your baseline
>> technical approach.
>>
>> For starters, it seems like you are working with large-volume data files,
>> where, as you put it, "data size precludes the second" option of storing
>> all of the data in an array. NetCDF gives you the flexibility to read in
>> specific variables, and specific elements of their arrays (think of the
>> concept of subsetting), without having to read the entire file or the
>> entire variable array into memory. You can go even further and use OPeNDAP
>> servers, which let you remotely request and read into memory only the
>> specific pieces of data needed for processing, without having to download
>> the entire netCDF file.
>>
>> I've worked with both flat binary files and netCDF files, using various
>> types of data processing code, and aside from memory consumption issues
>> (which can be addressed by either hardware or software configuration), I've
>> never observed significant differences in processing speed based on the
>> source of the data.
>>
>> This is why I'm intrigued by the issue you've observed and think it warrants
>> further investigation into the technical side of your approach.
>>
>> Cheers,
>> David
>>
>> From: William Rossow <wbrossow@xxxxxxxxx>
>> Date: Friday, September 26, 2014 3:03 PM
>> To: David F Moroni <David.F.Moroni@xxxxxxxxxxxx>
>> Subject: Re: netCDF I/O performance concerns
>>
>>> David, all too technical... the basic point is that both of the approaches
>>> you mention are impractical... data size precludes the second and
>>> performance really drags with the first.
>>>
>>> On Fri, Sep 26, 2014 at 6:00 PM, Moroni, David F (398M)
>>> <David.F.Moroni@xxxxxxxxxxxx> wrote:
>>>> Bill,
>>>>
>>>> So to understand more clearly, are you reading the data from the netCDF
>>>> file iteratively using "for" or "do" loops, or are you first reading the
>>>> netCDF data into either a static or a dynamic array?
>>>>
>>>> The first option above would conserve memory but would take longer to
>>>> process, because of the overhead of many more individual read calls; the
>>>> latter option would be much faster (i.e., more comparable to reading
>>>> directly from a flat binary file) but would require more system memory.
>>>>
>>>> Also, when you encountered these performance issues, were you using netCDF
>>>> data in the "classic" model (i.e., version 3 or below) or the "extended"
>>>> model (i.e., netCDF-4, built on the hierarchical HDF5 data model)?
>>>>
>>>> There are distinct differences between these netCDF data models. The
>>>> netCDF "classic" model is essentially a flat binary file preceded by a
>>>> compact binary header that holds the self-describing metadata. The netCDF
>>>> "extended" model is hierarchical and uses groups to organize the data
>>>> arrays. I haven't examined this myself, but there could be some
>>>> performance differences between the "extended" and "classic" data models
>>>> due to the simplicity of a flat data structure versus a multi-tiered one.
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>
>>>> From: William Rossow <wbrossow@xxxxxxxxx>
>>>> Date: Friday, September 26, 2014 2:45 PM
>>>> To: David F Moroni <David.F.Moroni@xxxxxxxxxxxx>
>>>> Cc: "esdswg-interoperability@xxxxxxxxxxxxxx"
>>>> <esdswg-interoperability@xxxxxxxxxxxxxx>
>>>> Subject: Re: netCDF I/O performance concerns
>>>>
>>>>> David, this issue is not strictly a computer performance or software
>>>>> interaction issue; it is an issue that arises when performing
>>>>> calculations involving many co-located, coincident variables that are
>>>>> extensive (large space-time scope). It is just that all of these
>>>>> "graphics" formats are arranged badly in this case.
>>>>>
>>>>> On Fri, Sep 26, 2014 at 5:08 PM, Moroni, David F (398M)
>>>>> <David.F.Moroni@xxxxxxxxxxxx> wrote:
>>>>>> Hi Bill,
>>>>>>
>>>>>> In response to the issue you raised about netCDF I/O performance with
>>>>>> your Fortran code, I've contacted Ethan Davis at Unidata to see whether
>>>>>> this has already been captured as a known issue and whether there is a
>>>>>> fix for it.
>>>>>>
>>>>>> In the meantime, I thought it would be worthwhile to connect you with
>>>>>> the ESDSWG Interoperability working group to ensure this matter is also
>>>>>> on their radar. A member of this group may have wrestled with the same
>>>>>> issue you've encountered, so I hope a solution already exists; if not,
>>>>>> at least one may be on the horizon.
>>>>>>
>>>>>> Best Regards,
>>>>>> David
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dr. William B. Rossow
>>>>> Distinguished Professor of Remote Sensing
>>>>> CREST at The City College of New York
>>>>> Steinman Hall (T-107)
>>>>> 140th Street and Convent Avenue
>>>>> New York, NY 10031
>>>>> 1-212-650-5389
>>>>> wbrossow@xxxxxxxxxxxxx
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
**********************
"The contents of this message do not reflect any position of the U.S.
Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Mendelssohn@xxxxxxxx www: http://www.pfeg.noaa.gov/
"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.