Just to add a few notes. Handling very large and numerous catalogs is one
of the things we want to address in a refactor. While we do need to find
all "data roots", we don't need to cache the catalogs; that's a
performance optimization that should be configurable.
The main problem is the memory used by caching bloated catalog objects.
We have the start of a catalog refactor (thredds.catalog2, if anyone
wants to have a look) in which the catalog objects are much lighter
weight and generally better. We would probably use ehcache for
caching. This is "scheduled" for the 4.3 release.
From another POV, we have always tried to obviate large/many catalogs
with things like datasetScan, and now featureCollection elements. But
there are obviously good reasons for users to generate them.
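For illustration (a sketch only; the names, path, and location here are all invented), a single datasetScan element can stand in for a whole directory tree of datasets, rather than listing each one by hand:

```xml
<!-- Hypothetical example: the name, path, and location values are made up.
     One datasetScan element exposes every matching file under a directory,
     so the config catalog stays small even when the data holdings are large. -->
<datasetScan name="Example model output" path="models/example"
             location="/data/models/example/">
  <filter>
    <include wildcard="*.nc"/>
  </filter>
</datasetScan>
```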
Anyway, I would welcome experience reports and advice.
On 1/4/2011 9:35 AM, Roland Schweitzer wrote:
Thanks John. Among the groups we collaborate with, there are some
folks who are quite concerned about the scaling issue. Personally,
my direct experience at this point indicates that the performance
is just fine (at least so far), even with our largest catalogs.
What's the experience of the list? Are folks seeing unacceptably slow
TDS initialization because of time spent reading catalogs? The thread
from John Maurer about aggregation access issues notwithstanding.
Roland
On 01/03/2011 07:34 PM, John Caron wrote:
On 1/3/2011 10:53 AM, Roland Schweitzer wrote:
Hi,
We're starting to put together some "big" server-side configuration
catalogs (both with "lots" of dataset elements and "lots" of
catalogRef elements). We are wondering about the process TDS goes
through to read the catalogs when it starts. What gets cached? Does
it have a way to know a referenced catalog is unchanged? When do
referenced catalogs get scanned? And so on.
Is there some documentation or a flow chart on how TDS initializes
itself?
Thanks,
Roland
_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/
Hi Roland:
The sad answer is there's not much documentation. We've been on the
verge of redoing the initialization sequence for a few years now, so
we've been waiting so we can document the clean, cool refactor instead
of the crufty, lame current one.
Anyway, the TDS reads in all the config catalogs at startup. It
caches all of them, and uses the "expires" attribute on the catalog
to decide if/when it needs to reread a catalog. It needs to read all
catalogs, including those referenced by catalogRef, because it has to
know what the possible dataset URLs are, and there is no contract that
a client has to read a catalog first.
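To illustrate the "expires" attribute (a sketch only; the catalog name, date, and catalogRef target are invented), it sits on the catalog element itself:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: name, expires value, and catalogRef target are invented.
     The expires attribute tells the TDS when its cached copy of this
     catalog must be reread from disk. -->
<catalog name="Example catalog" expires="2011-02-01T00:00:00"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <catalogRef xlink:href="subcatalog.xml" xlink:title="Sub catalog" name=""/>
</catalog>
```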
Obviously this doesn't scale forever. Ethan can probably fill in some
details.
see:
http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/v1.0.2/InvCatalogSpec.html#catalog
John