On 9/22/2011 8:18 AM, Roland Schweitzer wrote:
On 09/22/2011 09:13 AM, Roland Schweitzer wrote:
Hi,
Some folks at NCAR have put together a THREDDS catalog
(http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml) which I
would like to read to prepare configuration information for LAS. The
catalog consists of 3000+ catalogRef elements that point to other
local catalogs. When running through this catalog doing the obvious
thing:
    List<InvDataset> datasets = catalog.getDatasets();
    for (InvDataset invDataset : datasets) {
        System.out.println("\t" + invDataset.getName());
    }
Addendum:
Of course, you have to actually look at the datasets in the
sub-catalogs to get the catalog behind each catalogRef read, like
this:
    List<InvDataset> datasets = catalog.getDatasets();
    for (InvDataset invDataset : datasets) {
        System.out.println("\t" + invDataset.getName());
        // Asking a catalogRef for its datasets is what causes the
        // referenced sub-catalog to be read.
        for (InvDataset subDataset : invDataset.getDatasets()) {
            System.out.println("\t\t" + subDataset.getName());
        }
    }
But the point is the same:
the JVM heap gets larger as each successive dataset (catalogRef) is
read, as observed by setting the JVM options that log garbage
collection. This makes sense in that the catalogRef gets read and
the information gets kept in memory. The problem is that eventually
you will run out of heap. When you run out depends on how much
memory you give the JVM.
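
Besides the GC log (for example the standard -verbose:gc option), the
same growth can be watched from inside the program. The following is
only a sketch: HeapProbe and usedHeapBytes are made-up names, and the
byte arrays in main are stand-ins for parsed sub-catalogs, not anything
from the THREDDS library. The number is approximate, since it includes
garbage that has not been collected yet.

```java
import java.util.ArrayList;
import java.util.List;

class HeapProbe {
    /** Approximate number of heap bytes currently in use by this JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the catalog loop: each retained
        // array plays the role of one sub-catalog kept in memory.
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            retained.add(new byte[1 << 20]);
            System.out.printf("after item %d: %d KB used%n",
                    i, usedHeapBytes() / 1024);
        }
    }
}
```

Calling usedHeapBytes() at the bottom of the real loop would show
whether the figure keeps climbing from one catalogRef to the next.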
If folks are going to be publishing catalogs this large, we need some
way to read them in a memory-efficient way. I know that once I reach
the bottom of the loop I'm finished with that dataset and it would be
ok with me to boot it out of memory, but I haven't figured out a
clever way to do that.
What are the options for reading such a large catalog using the
Java-netCDF tools?
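
One possible client-side workaround, sketched here under assumptions
and not part of the Java-netCDF API: stream the top-level catalog with
the JDK's StAX parser and pull out only the xlink:href of each
catalogRef, so no dataset objects are built or retained. Each
referenced catalog could then be opened, processed and dropped one at a
time. CatalogRefScanner and scan are hypothetical names, and the sample
XML in main is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CatalogRefScanner {
    // Standard XLink namespace used by the href attribute.
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    /** Streams catalog XML and returns the xlink:href of every catalogRef. */
    static List<String> scan(InputStream in) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> hrefs = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // Only start tags named catalogRef are of interest;
                // everything else is skipped without being stored.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "catalogRef".equals(reader.getLocalName())) {
                    String href = reader.getAttributeValue(XLINK_NS, "href");
                    if (href != null) {
                        hrefs.add(href);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Made-up two-entry catalog, for illustration only.
        String xml =
              "<catalog xmlns:xlink=\"http://www.w3.org/1999/xlink\">"
            + "<catalogRef name=\"a\" xlink:href=\"a/catalog.xml\"/>"
            + "<catalogRef name=\"b\" xlink:href=\"b/catalog.xml\"/>"
            + "</catalog>";
        InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String href : scan(in)) {
            System.out.println(href);
        }
    }
}
```

A list of 3000+ href strings is small next to the parsed sub-catalogs
themselves; the hrefs are relative to the top-level catalog URL and
would still need to be resolved against it before being opened.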
Roland
TDS >= 4.2.8 has a new option to turn off catalog caching, added to
support ESG catalogs. Use it when you have a large number of static
catalogs to minimize memory use.
It seems to work AFAICT, with a minor performance penalty.
http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/ThreddsConfigXMLFile.html#CatalogCaching