Hi Tom-
On 6/21/12 3:13 PM, Tom Kunicki wrote:
Do your files happen to have an unlimited dimension when it is not
required?
Probably most have an unlimited time dimension - in some cases the files
are still being appended to, in others not. For context, I'm looking at
this in the RAMADDA context, but I would assume that TDS has the same
issues, since they use (essentially) the same code.
In the past we've had performance issues dealing with static data
sets, only to later realize the slow load times were due to the
reading of data associated with an unlimited dimension (i.e. "time").
When a dimension is unlimited, the values associated with it are
stored sparsely throughout the file. Converting the unlimited
dimension to fixed significantly reduced the time to open these files
(i.e. the values for the "time" axis are stored contiguously, no
longer sparsely). You'll want unlimited if you intend to append data
along that dimension to the file in the future; otherwise make sure
it's fixed if you are concerned about performance on initial open.
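To illustrate why the time values end up scattered, here is a minimal sketch assuming the netCDF classic (netCDF-3) layout, in which each record holds one time step of every record variable, so consecutive time values sit one full record apart. The thread does not say which format these files use, and the class name, grid size, record count, and offsets below are made-up numbers for illustration, not taken from the files discussed here.

```java
public class RecordLayoutSketch {

    /** Byte offset of record recordIndex of a record variable (classic layout assumed). */
    public static long recordOffset(long varBegin, long recordSize, long recordIndex) {
        return varBegin + recordSize * recordIndex;
    }

    /** Distance in bytes between the first and the last value of a record variable. */
    public static long spanBytes(long recordSize, long numRecords) {
        return numRecords <= 1 ? 0 : recordSize * (numRecords - 1);
    }

    public static void main(String[] args) {
        // Hypothetical file: one double "time" value plus one 360x180 float grid per step.
        long timeBegin = 4096;                    // assumed offset of the first time value
        long gridBytesPerStep = 4L * 360 * 180;   // one float grid per time step
        long recordSize = 8 + gridBytesPerStep;   // 8-byte time value + grid
        long numRecords = 10_000;

        System.out.println("offset of time[1]: " + recordOffset(timeBegin, recordSize, 1));
        System.out.println("span of time axis (unlimited): " + spanBytes(recordSize, numRecords) + " bytes");
        System.out.println("size of time axis (fixed, contiguous): " + 8 * numRecords + " bytes");
    }
}
```

With these hypothetical numbers the time values are spread over roughly 2.6 GB of the file in the record layout, versus 80,000 contiguous bytes when the dimension is fixed.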
That makes sense and if it has to seek far into the 3.2 GB file, I can
see where that would matter. However, I still think most of the time is
related to OS caching. For example, on my 3.2 GB file (with an
unlimited dimension), the first time I run my sample program, it takes
~50 seconds to open the file using either method (FeatureDataSet or
GridDataset). I exit the program (so there's no VM/netCDF caching) and
run it again and it takes < .5 seconds.
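One way to check how much of the first-open time is disk I/O rather than library work is to time a plain sequential read of the same file, once cold and once warm. The following is a minimal sketch using only the JDK; the class name ColdWarmRead is hypothetical and nothing in it calls netCDF-Java.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ColdWarmRead {

    /** Reads the whole file sequentially and returns the number of bytes read. */
    public static long readAll(Path file) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB chunks
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // discard the data; only the I/O cost matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Runs the action once and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Path file = Paths.get(args[0]);
        long ms = timeMillis(() -> readAll(file));
        System.out.println("read " + file + " in " + ms + " ms");
    }
}
```

Run it in a fresh JVM each time (java ColdWarmRead followed by the file path) so that only the OS cache differs between runs. If the second run is far faster than the first, the difference is coming from the OS file cache and not from the netCDF library.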
Don
Tom Kunicki Center for Integrated Data Analytics U.S. Geological
Survey 8505 Research Way Middleton, WI 53562
On Jun 21, 2012, at 4:13 PM, Don Murray wrote:
Just as a followup, the attached program tests the speed of opening
a file using the method in FeatureScan vs. GridDataset.open. In my
test, the latter is actually faster by a few milliseconds. The
real slowdown is the initial OS caching of the file (in this case a
3.3 GB file). Once the file is in the OS cache, both methods are
pretty quick.
Thanks to John (and Roland) for their help.
Don
On 6/20/12 8:14 PM, John Caron wrote:
On 6/19/2012 3:19 PM, Don Murray wrote:
Hi-
I have a bunch of netCDF files and I want to quickly determine
whether they are grids, trajectories, or point features. For
grids, I've been using GridDataset gds = GridDataset.open(path)
and catch the exception if it's not a grid, but for a 3.3 GB
file, that can take 2 minutes (or longer) to open and create
the dataset if it is a grid. I was wondering if there's a
quicker method of determining the feature type of a netCDF
file.
Thanks for your help.
Don
Hi Don:
The most convenient thing is to use ToolsUI / FeatureTypes /
FeatureScan, and give it a file or directory. It will try to
figure out the type and report on what it finds.
The code is in ucar.nc2.ft.scan.FeatureScan.java, you can copy
the parts you need.
It's an ongoing process; I think I'm not doing it as well as it can
be done. Send me reports on files it misidentifies.
John
_______________________________________________ netcdf-java
mailing list netcdf-java@xxxxxxxxxxxxxxxx For list information or
to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/
-- Don Murray NOAA/ESRL/PSD and CIRES 303-497-3596
http://www.esrl.noaa.gov/psd/people/don.murray/
<TestOpen.java>