Re: [netcdf-java] fastest way to determine feature type

Hi Tom-

On 6/21/12 3:13 PM, Tom Kunicki wrote:
Do your files happen to have an unlimited dimension when it is not
required?

Probably most have an unlimited time dimension - in some cases the files are still being appended to, in others not. For context, I'm looking at this in RAMADDA, but I would assume that TDS has the same issues, since they use (essentially) the same code.

In the past we've had performance issues dealing with static data
sets, only to later realize that the slow load times were due to the
reading of data associated with an unlimited dimension (i.e. "time").
When a dimension is unlimited, the values associated with it are
stored sparsely throughout the file. Converting the unlimited
dimension to fixed significantly reduced the time-to-open for these
files (i.e. the values for the "time" axis are stored contiguously, no
longer sparsely).  You'll want unlimited if you intend to append data
along that dimension to the file in the future; otherwise make sure
it's fixed if you are concerned about performance on initial open.
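
For netCDF classic files (CDF-1 and CDF-2), one way to check whether a file has an unlimited dimension without opening it through netcdf-java is to read the dimension list at the start of the header, where a dimension length of 0 marks the record (unlimited) dimension. The following is only a sketch under that assumption: the class and method names are made up for illustration, and it does not handle netCDF-4/HDF5 or CDF-5 files.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of netcdf-java.
final class ClassicHeaderProbe {
    private static final int NC_DIMENSION = 0x0A; // tag that starts a non-empty dim_list
    private static final int MAX_NAME_BYTES = 4096; // arbitrary sanity bound for this sketch

    /**
     * Returns the name of the record (unlimited) dimension of a CDF-1/CDF-2
     * header, or null if every dimension is fixed. Throws IOException for
     * anything that is not a CDF-1/CDF-2 header.
     */
    static String recordDimensionName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw); // header fields are big-endian
        byte[] magic = new byte[4];
        in.readFully(magic);
        boolean classic = magic[0] == 'C' && magic[1] == 'D' && magic[2] == 'F'
                && (magic[3] == 1 || magic[3] == 2);
        if (!classic) {
            throw new IOException("not a CDF-1/CDF-2 header");
        }
        in.readInt();             // numrecs: current length of the record dimension
        int tag = in.readInt();   // NC_DIMENSION, or 0 when the dim_list is absent
        int count = in.readInt(); // number of dimensions
        if (tag == 0) {
            return null;          // no dimensions at all
        }
        if (tag != NC_DIMENSION) {
            throw new IOException("unexpected dim_list tag: " + tag);
        }
        for (int i = 0; i < count; i++) {
            int nameLen = in.readInt();
            if (nameLen < 0 || nameLen > MAX_NAME_BYTES) {
                throw new IOException("implausible name length: " + nameLen);
            }
            byte[] name = new byte[nameLen];
            in.readFully(name);
            in.readFully(new byte[(4 - nameLen % 4) % 4]); // names are padded to 4 bytes
            int length = in.readInt();
            if (length == 0) {    // length 0 marks the record dimension
                return new String(name, StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            String name = recordDimensionName(in);
            System.out.println(name == null ? "no unlimited dimension" : "unlimited dimension: " + name);
        }
    }
}
```

Because only the first few bytes of the file are read, this check should be cheap even for a multi-gigabyte file.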

That makes sense, and if it has to seek far into the 3.2 GB file, I can see where that would matter. However, I still think most of the time is related to OS caching. For example, on my 3.2 GB file (with an unlimited dimension), the first time I run my sample program, it takes ~50 seconds to open the file using either method (FeatureDataSet or GridDataset). I exit the program (so there's no VM/netCDF caching) and run it again, and it takes less than 0.5 seconds.
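
The cold-versus-warm comparison described above can be sketched with a small timing harness. This is not the attached TestOpen.java; the class, the Opener interface, and the readAll stand-in are all hypothetical, and in a real test the opener would wrap whichever netcdf-java open call is being measured. Note that repeating the open inside one JVM cannot separate library caching from OS caching, which is why the program was restarted between runs in the test described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical timing harness, not the attached TestOpen.java.
final class OpenTimer {
    /** Stand-in for whatever opens the dataset (an assumption of this sketch). */
    interface Opener {
        void open(Path file) throws IOException;
    }

    /** Runs the opener the given number of times and returns each elapsed time in milliseconds. */
    static long[] timeOpens(Path file, Opener opener, int runs) throws IOException {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            opener.open(file);
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    /** Simple opener that reads the whole file, which also pulls it into the OS cache. */
    static void readAll(Path file) throws IOException {
        byte[] buffer = new byte[1 << 16];
        try (InputStream in = Files.newInputStream(file)) {
            while (in.read(buffer) != -1) {
                // discard the bytes; only the elapsed time matters here
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        long[] millis = timeOpens(file, OpenTimer::readAll, 3);
        for (int i = 0; i < millis.length; i++) {
            System.out.println("run " + (i + 1) + ": " + millis[i] + " ms");
        }
    }
}
```

If the first run is much slower than the later ones, that points at caching rather than at the open method itself.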

Don

Tom Kunicki Center for Integrated Data Analytics U.S. Geological
Survey 8505 Research Way Middleton, WI  53562

On Jun 21, 2012, at 4:13 PM, Don Murray wrote:

Just as a followup, the attached program tests the speed of opening
a file using the method in FeatureScan vs. GridDataset.open.  In my
test, the latter is actually faster by a few milliseconds.  The
real slowdown is the initial OS caching of the file (in this case a
3.3 GB file). Once the file is in the OS cache, both methods are
pretty quick.

Thanks to John (and Roland) for their help.

Don

On 6/20/12 8:14 PM, John Caron wrote:
On 6/19/2012 3:19 PM, Don Murray wrote:
Hi-

I have a bunch of netCDF files and I want to quickly determine
whether they are grids, trajectories, or point features.  For
grids, I've been using GridDataset gds = GridDataset.open(path)
and catching the exception if it's not a grid, but for a 3.3 GB
file, that can take 2 minutes (or longer) to open and create
the dataset if it is a grid.  I was wondering if there's a
quicker method of determining the feature type of a netCDF
file.

Thanks for your help.

Don
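
The open-and-catch approach described in the question can be generalized to trying one opener per feature type, in order, and taking the first that does not throw. The sketch below shows only that control flow. The Probe interface, the class name, and the file-name-based stubs are hypothetical; in real use the GRID probe would wrap the GridDataset.open(path) call from the question and close the dataset afterwards.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "try each type, catch the failure".
final class FeatureTypeGuess {
    /** One attempt to open a file as a given feature type; it throws if the file is not that type. */
    interface Probe {
        void tryOpen(String path) throws Exception;
    }

    /** Returns the label of the first probe that opens the file without throwing, or "UNKNOWN". */
    static String firstMatch(String path, Map<String, Probe> probes) {
        for (Map.Entry<String, Probe> entry : probes.entrySet()) {
            try {
                entry.getValue().tryOpen(path);
                return entry.getKey();
            } catch (Exception notThisType) {
                // This probe rejected the file; move on to the next one.
            }
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the probes run in the order they are added.
        Map<String, Probe> probes = new LinkedHashMap<>();
        // Stub probes that only look at the file name, to keep the sketch self-contained.
        probes.put("GRID", p -> {
            if (!p.contains("grid")) throw new IllegalArgumentException("not a grid");
        });
        probes.put("POINT", p -> {
            if (!p.contains("point")) throw new IllegalArgumentException("not point data");
        });
        System.out.println(firstMatch("station_point_obs.nc", probes));
    }
}
```

The cost of this pattern is that every rejected probe still pays the full price of a failed open, which is the slowness the question is about.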

Hi Don:

The most convenient thing is to use ToolsUI / FeatureTypes /
FeatureScan, and give it a file or directory. It will try to
figure out the type and report on what it finds.

The code is in ucar.nc2.ft.scan.FeatureScan.java, you can copy
the parts you need.

It's an ongoing process; I think I'm not doing it as well as it can
be done. Send me reports on files it misidentifies.

John

_______________________________________________ netcdf-java
mailing list netcdf-java@xxxxxxxxxxxxxxxx For list information or
to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/

-- Don Murray NOAA/ESRL/PSD and CIRES 303-497-3596
http://www.esrl.noaa.gov/psd/people/don.murray/


<TestOpen.java>







