Re: [thredds] TDS initialization

Just to add a few notes. Handling very large and very numerous catalogs is one of the things we want to address in a refactor. While we do need to find all "data roots", we don't need to cache the catalogs; that's a performance optimisation that should be configurable.
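A minimal Java sketch of the separation described above follows. It assumes a hypothetical `StartupScan` class and a simplified `Catalog` type, and neither is part of the TDS or `thredds.catalog2` API. The startup walk always follows catalogRef links and records every data root. Keeping the parsed catalogs in memory is only a constructor flag.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch, not TDS code: data roots are always collected,
// while catalog caching is an optional, configurable optimisation.
public class StartupScan {

    // Simplified stand-in for a parsed config catalog (hypothetical type).
    public static final class Catalog {
        final List<String> dataRoots;   // data root paths declared in this catalog
        final List<String> catalogRefs; // paths of catalogs it references

        public Catalog(List<String> dataRoots, List<String> catalogRefs) {
            this.dataRoots = dataRoots;
            this.catalogRefs = catalogRefs;
        }
    }

    private final boolean cacheCatalogs;
    private final Set<String> dataRoots = new TreeSet<>();
    private final Map<String, Catalog> cache = new HashMap<>();

    public StartupScan(boolean cacheCatalogs) {
        this.cacheCatalogs = cacheCatalogs;
    }

    // Reads the root catalog and every catalog reachable through catalogRefs.
    public void scan(String rootPath, Function<String, Catalog> reader) {
        Deque<String> todo = new ArrayDeque<>();
        todo.push(rootPath);
        Set<String> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            String path = todo.pop();
            if (!seen.add(path)) {
                continue; // already read: guards against catalogRef cycles
            }
            Catalog catalog = reader.apply(path);
            dataRoots.addAll(catalog.dataRoots); // always needed to resolve dataset URLs
            todo.addAll(catalog.catalogRefs);
            if (cacheCatalogs) {
                cache.put(path, catalog); // optional: trades memory for speed
            }
            // Otherwise the parsed catalog can be garbage-collected after this iteration.
        }
    }

    public Set<String> dataRoots() {
        return Collections.unmodifiableSet(dataRoots);
    }

    public int cachedCount() {
        return cache.size();
    }
}
```

With `cacheCatalogs` set to false, only the set of data roots stays in memory after startup. With it set to true, every parsed catalog is kept as well, which is the memory cost discussed in this thread.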

The main problem is the memory used by caching bloated catalog objects. We have the start of a catalog refactor (thredds.catalog2, if anyone wants to have a look) in which the catalog objects are much lighter weight and generally better. We would probably use ehcache for caching. This is "scheduled" for the 4.3 release.

From another point of view, we have always tried to obviate the need for large or numerous catalogs with things like datasetScan and, now, featureCollection elements. But there are obviously good reasons for users to generate them.

Anyway, I would welcome experience reports and advice.

On 1/4/2011 9:35 AM, Roland Schweitzer wrote:
Thanks John. Among the groups we collaborate with, there are some folks who are quite concerned about the scaling issue. Personally, my direct experience so far indicates that performance is just fine, even with our largest catalogs.

What's the experience of the list? Are folks seeing unacceptably slow TDS initialization because of time spent reading catalogs? (The thread from John Maurer about aggregation access issues notwithstanding.)

Roland

On 01/03/2011 07:34 PM, John Caron wrote:
On 1/3/2011 10:53 AM, Roland Schweitzer wrote:
Hi,

We're starting to put together some "big" server-side configuration catalogs (both with "lots" of dataset elements and "lots" of catalogRef elements). We are wondering about the process TDS goes through to read the catalogs when it starts. What gets cached? Does it have a way to know that a referenced catalog is unchanged? When do referenced catalogs get scanned? And so on.

Is there some documentation or a flow chart on how TDS initializes itself?

Thanks,
Roland

_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/

Hi Roland:

The sad answer is that there's not much documentation. We've been on the verge of redoing the initialization sequence for a few years now, and we've been waiting so that we can document the clean, cool refactor instead of the crufty, lame current one.

Anyway, the TDS reads in all the config catalogs at startup. It caches all of them, and uses the "expires" attribute on each catalog to decide if and when it needs to reread that catalog. It needs to read all catalogs, including those referenced through catalogRef, because it has to know all the possible dataset URLs, and there is no contract requiring a client to read a catalog first.
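The expiry rule described above might look roughly like the following Java sketch. The `ExpiringCatalogCache` class and its `Catalog` stand-in are illustrative assumptions, and so is the choice to treat a catalog with no "expires" value as never expiring. None of this is the actual TDS code.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not TDS code: a cached catalog is reread only once
// the instant given by its "expires" attribute has passed.
public class ExpiringCatalogCache {

    // Simplified stand-in for a parsed catalog (hypothetical type).
    public static final class Catalog {
        final String path;
        final Instant expires; // null = no "expires" attribute (assumed: never reread)

        public Catalog(String path, Instant expires) {
            this.path = path;
            this.expires = expires;
        }
    }

    private final Map<String, Catalog> cache = new HashMap<>();
    private final Function<String, Catalog> reader; // parses a catalog from its path
    private int reads = 0;

    public ExpiringCatalogCache(Function<String, Catalog> reader) {
        this.reader = reader;
    }

    // Returns the cached catalog, rereading it if it is missing or expired.
    public Catalog get(String path, Instant now) {
        Catalog catalog = cache.get(path);
        boolean expired = catalog != null
                && catalog.expires != null
                && !now.isBefore(catalog.expires);
        if (catalog == null || expired) {
            catalog = reader.apply(path);
            reads++;
            cache.put(path, catalog);
        }
        return catalog;
    }

    // Number of times the reader was invoked (useful for checking cache hits).
    public int readCount() {
        return reads;
    }
}
```

This sketch checks expiry only when a catalog is requested. Whether a server checks lazily like this or rescans on a schedule is a separate design choice.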

Obviously this doesn't scale forever. Ethan can probably fill in some details.

See:
http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/v1.0.2/InvCatalogSpec.html#catalog

John
