Hi Eric, Nathan:
On 1/4/2011 1:29 PM, Eric Nienhouse wrote:
Hi Roland, John, Ethan,
I'm sorry for not posting this to the thredds list, which I am happy
to do. However, I thought I would raise this to you first as it
relates to the TDS and performance in our environment here in
CISL/VETS at NCAR.
I like to post to that group so others can follow along and see if it
also applies to them, so I'm cc'ing there.
We're close to the 2 million file mark in one of our production ESG
TDS servers (which supports www.earthsystemgrid.org.) I can get you
the specs on our machine running this service (~2 year old AMD
multi-core CentOS). In our experience, it takes about 5 minutes to
initialize the TDS from the underlying thredds catalogs. There are
many catalog refs, all for local catalog files, which represent about
3200 datasets over ~2800 catalog files. (I can provide more detail if
you would like. The service is at: tds.ucar.edu/thredds)
Could you send me a typical config catalog, so I get a sense of what you
are doing?
This service requires ~30 GB of JVM memory to successfully initialize,
which is a scalability concern for us.
yes indeed
We re-init the TDS often during a new data publication process. We
find after some number of re-inits (likely 50 - 200) the TDS will
re-initialize *very slowly*, often taking hours to re-init. I
speculate this is due to memory resources and perhaps "perm gen" space
with the tomcat / JVM process and/or GC thrashing.
Yes, you need to restart Tomcat when/before that happens. Apparently,
Tomcat 7 may be better, but we haven't tested it yet.
BTW, in the latest TDS 4.2, reinit is a little flaky, though I expect it
will work for your case. Let me know if you see problems (besides the
permgen problem).
How often do you reinit?
We're anticipating at least double the number of files will be served
at NCAR due to CMIP5 modeling efforts over the next 18 months.
We've considered some possible solutions to the eventual, slow load
such as:
1) Restarting the TDS routinely.
2) "partitioning" TDS instances and thereby the files over multiple
processes or hosts.
We're curious, too, if there may be some tuning we could do w.r.t. the
TDS that may help the situation (so far we've only increased JVM heap
memory.) Do you have any initial recommendations?
At the moment we don't have any tuning for this, but I think a quick fix
is to add the ability to not cache the catalogs, but read them each
time, maybe by setting the "expires" attribute or adding a "cache"
attribute. Better would be to use an LRU cache like ehcache, but that
will take longer to implement.
This won't help the startup time that much (it will help some); it
mostly helps the memory use.
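
To illustrate the LRU idea, here is a minimal sketch, written in Python
for brevity (TDS itself is Java). It is not TDS or ehcache code: the
class name LruCatalogCache, the capacity, and the loader callback are
hypothetical. Only a bounded number of parsed catalogs stay in memory,
and the least recently used one is dropped when the limit is exceeded.

```python
from collections import OrderedDict


class LruCatalogCache:
    """Hypothetical bounded cache: keeps at most `capacity` parsed catalogs."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader            # assumed callable: path -> parsed catalog
        self._entries = OrderedDict()   # ordered from least to most recently used

    def get(self, path):
        if path in self._entries:
            # Cache hit: mark this catalog as most recently used.
            self._entries.move_to_end(path)
            return self._entries[path]
        # Cache miss: read and parse the catalog again.
        catalog = self.loader(path)
        self._entries[path] = catalog
        if len(self._entries) > self.capacity:
            # Evict the least recently used catalog to bound memory use.
            self._entries.popitem(last=False)
        return catalog

    def __contains__(self, path):
        return path in self._entries

    def __len__(self):
        return len(self._entries)
```

With a capacity well below the ~2800 catalog files, memory use is
bounded by the capacity rather than by the total number of catalogs, at
the cost of re-reading a catalog whenever it has been evicted.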
To improve startup time we need caching of the info in catalogs that
don't change. Do all your catalogs get rewritten, or only the ones that
change (i.e., can we use lastModified on the OS File to detect changes)?
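
The lastModified check could look roughly like the sketch below, again
in Python for brevity and not actual TDS code. CatalogInfoCache and the
parse callback are hypothetical names. It assumes that unchanged
catalogs are not rewritten, so a catalog's modification time only moves
when its content does.

```python
import os


class CatalogInfoCache:
    """Hypothetical cache that re-parses a catalog only when its mtime changes."""

    def __init__(self, parse):
        self.parse = parse      # assumed callable: path -> extracted catalog info
        self._cached = {}       # path -> (mtime in nanoseconds, info)

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        hit = self._cached.get(path)
        if hit is not None and hit[0] == mtime:
            # File untouched since the last parse: reuse the stored info.
            return hit[1]
        # New or modified file: parse it and remember its modification time.
        info = self.parse(path)
        self._cached[path] = (mtime, info)
        return info
```

For this to help startup time, the cached info would have to be
persisted across restarts (for example on disk), which the sketch leaves
out.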
John