Hi Kevin,
Sorry for the delay in responding. I was busy with the release of 4.6.6, but
I have some time to work on this issue now. A couple of questions:
1. What does your webapp do? It sounds like it takes a user-defined subset
of the data in a NetCDF file and returns it in JSON format. How similar is
it to our NetCDF Subset Service (example
<http://thredds.ucar.edu/thredds/ncss/grib/NCEP/NAM/Alaska_11km/Best/dataset.html>
)?
2. What version of NetCDF-Java are you using? I suspect that much of the
slowness you're encountering was already fixed
<https://github.com/cwardgar/thredds/commit/075e9a819ee10714d53b355481a7cccac88b1fb9#diff-99981060deed76f1a9ddedc4362acd7fL155>
in v4.6.5.
Cheers,
Christian
On Wed, Jun 8, 2016 at 4:17 PM, Kevin Off - NOAA Affiliate <
kevin.off@xxxxxxxx> wrote:
> Hi all,
>
> I am trying to understand caching when it comes to the file and the actual
> data. The application that I am working on will provide data from 133
> NetCDF files that range in size from 50 MB to 400 MB. These are weather
> forecast files that contain about 22 variables that we are interested in.
> Each variable has between 1 and 55 or so time steps as dimensions.
>
> This is a Spring web application running in an embedded Tomcat instance.
> All of the files on disk amount to about 22GB of data.
>
> When I receive a request I:
>
> 1. Re-project the lat/lon to the dataset's projection (Lambert
> Conformal)
> 2. Look up the index of the data from the coordinate variables
> 3. Loop through every variable
> 4. Perform the Array a = var.read()
> 5. Loop through every time step and retrieve the value at the
> specified point
> 6. Return it all in a JSON document.
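
Steps 4 and 5 above read a whole variable and then keep one value per time step. A minimal sketch of just the extraction part follows, assuming the values sit in a flat row-major array laid out as [time][y][x]. `PointSeries` and `extract` are hypothetical names for illustration and are not part of NetCDF-Java. In NetCDF-Java itself, `Variable.read(int[] origin, int[] shape)` may let you request only that column instead of the full array; whether that is faster for these files would need measuring.

```java
import java.util.Arrays;

/** Illustrative helper only; not part of NetCDF-Java. */
final class PointSeries {

    /**
     * Copies the value at grid cell (y, x) for every time step out of a
     * row-major array laid out as [time][y][x] (an assumed layout).
     */
    static double[] extract(double[] data, int nTime, int nY, int nX, int y, int x) {
        if (data.length != nTime * nY * nX) {
            throw new IllegalArgumentException("shape does not match data length");
        }
        if (y < 0 || y >= nY || x < 0 || x >= nX) {
            throw new IndexOutOfBoundsException("point (" + y + ", " + x + ") is outside the grid");
        }
        double[] series = new double[nTime];
        int stride = nY * nX;    // number of values in one time step
        int offset = y * nX + x; // position of the point within a time step
        for (int t = 0; t < nTime; t++) {
            series[t] = data[t * stride + offset];
        }
        return series;
    }

    public static void main(String[] args) {
        // 2 time steps on a 2 x 3 grid; each value is its own flat index.
        double[] data = new double[2 * 2 * 3];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(Arrays.toString(extract(data, 2, 2, 3, 1, 2))); // prints [5.0, 11.0]
    }
}
```

The point is that a request only needs nTime values per variable, so anything that avoids touching the rest of the grid on each request should help.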
>
> This application needs to be extremely fast. We will be serving thousands
> of requests per second (in production on a scaled system) depending on
> weather conditions.
>
> I have been told that hardware is not an obstacle and that I can use as
> much memory as I need.
> During my coding and debugging I have been able to achieve a response time
> of about 200ms - 400ms on average (this does not include any network time).
> As I add timers to every part of the application I find that most of the
> time is spent in the Variable.read() function.
>
> Here is a summary of the configuration of the app.
>
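> NetcdfDataset.initNetcdfFileCache(100, 200, 0);
> NetcdfDataset nc = NetcdfDataset.acquireDataset(filename, null);
> // coverageNames, nTimes, y and x are placeholders for values computed earlier
> for (String name : coverageNames) {
>     Variable v = nc.findVariable(name);
>     Array d = v.read();  // reads the entire variable
>     Index idx = d.getIndex();
>     for (int t = 0; t < nTimes; t++) {
>         double value = d.getDouble(idx.set(t, y, x));
>     }
> }
> nc.close();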
>
> I have several questions.
>
> 1. I noticed that when the NetcdfDataset.close() function is called it
> detects that I am using caching and performs releases. This causes the
> IOServiceProvider (AbstractIOServiceProvider).release() to be called which
> closes and nulls the RandomAccessFile. Then, next time that
> NetcdfDataset.acquireDataset() is called it causes the
> FileCache.acquireCacheOnly() to return null because the cached
> NetcdfDataset.raf (RandomAccessFile) is null so it makes the lastModified =
> 0. Am I missing something or is there no way to reuse the NetcdfDataset
> after you call close()?
> 2. What does NetcdfDataset.acquireDataset() actually cache? Is it just
> the metadata or does it actually read in the data to all of the variables?
> 3. Can I avoid having to do a Variable.read() for every request?
> Shouldn't this data be cached inside of the NetCDF file?
> 4. I see that there are caching functions on the Variable object.
> Should I be using those caching options and just storing those Variable
> objects in memory in my own cache instead?
> 5. Would it be a better option to use NetcdfFile.openInMemory()?
>
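
On question 4, one possible shape for an application-level cache is sketched below, assuming the values of each variable have already been copied into a plain `double[]`. It is a small LRU map keyed by file and variable name, built on `java.util.LinkedHashMap`. `ArrayCache`, `getOrLoad`, and the key format are hypothetical names for illustration; nothing here is NetCDF-Java API, and it is not meant as a statement of how the library's own caches work.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Illustrative LRU cache of variable data; not part of NetCDF-Java. */
final class ArrayCache {
    private final Map<String, double[]> entries;

    ArrayCache(final int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache grows past capacity.
        this.entries = new LinkedHashMap<String, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, double[]> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached array, calling the loader only on a miss. */
    synchronized double[] getOrLoad(String file, String variable, Supplier<double[]> loader) {
        String key = key(file, variable);
        double[] cached = entries.get(key);
        if (cached == null) {
            cached = loader.get(); // runs while holding the lock; simple but serializes loads
            entries.put(key, cached);
        }
        return cached;
    }

    /** Checks for an entry without changing its recency. */
    synchronized boolean contains(String file, String variable) {
        return entries.containsKey(key(file, variable));
    }

    synchronized int size() {
        return entries.size();
    }

    private static String key(String file, String variable) {
        return file + "#" + variable;
    }
}
```

With 133 files and about 22 variables each, a capacity large enough to hold everything would keep roughly the full 22 GB on the heap, so this only makes sense given that memory really is not a constraint. The loader runs under the cache lock here to keep the sketch short; a production version would probably want per-key loading.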
> I know this is a bit long-winded, but I just want to make sure to explore
> all of my options. I have spent a lot of time stepping through the ucar
> library and have already learned a lot. I just need a little guidance
> regarding some of the more abstract caching functionality. Thanks for your
> help.
>
> --
> Kevin Off
> Internet Dissemination Group, Kansas City
> Shared Infrastructure Services Branch
> National Weather Service
> Software Engineer / Ace Info Solutions, Inc.
> <http://www.aceinfosolutions.com>
>