Hi,
Some folks at NCAR have put together a THREDDS catalog
(http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml) which I would
like to read to prepare configuration information for LAS. The catalog
consists of 3000+ catalogRef elements that point to other local
catalogs. I run through this catalog doing the obvious thing:
    List<InvDataset> datasets = catalog.getDatasets();
    for (Iterator<InvDataset> iterator = datasets.iterator(); iterator.hasNext();) {
        InvDataset invDataset = iterator.next();  // typed iterator, no cast needed
        System.out.println("\t" + invDataset.getName());
    }
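For reference, the catalog object above is just the top-level catalog
read with the default factory (InvCatalogFactory / InvCatalogImpl from
the thredds.catalog package), roughly:

    InvCatalogFactory factory = InvCatalogFactory.getDefaultFactory(false); // no validation
    InvCatalogImpl catalog = factory.readXML(
            "http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml");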
The JVM heap grows as each successive dataset (catalogRef) is read, as
observed by turning on garbage-collection logging for the JVM. That
makes sense: each catalogRef gets read and its contents are kept in
memory. The problem is that eventually you will run out of heap; when
you run out just depends on how much memory you give the JVM.
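I was watching the heap with the standard GC logging options on the
java command line, something along these lines (the driver class name
here is just a placeholder):

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xmx512m CatalogReader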
If folks are going to be publishing catalogs this large, we need some
memory-efficient way to read them. I know that once I reach the bottom
of the loop I'm finished with that dataset, and it would be fine with me
to boot it out of memory, but I haven't figured out a clever way to do
that.
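The sort of thing I have in mind is sketched below. It walks the
top-level catalog, descends into each catalogRef, and then tries to
throw that branch away before moving on. I am guessing at parts of the
API here, in particular that calling getDatasets() on an InvCatalogRef
is what pulls the referenced catalog into memory, and that InvCatalogRef
has a release()-style method to undo that read; if those guesses are
wrong, that is exactly the part I would like advice on.

    import java.util.List;

    import thredds.catalog.InvCatalogFactory;
    import thredds.catalog.InvCatalogImpl;
    import thredds.catalog.InvCatalogRef;
    import thredds.catalog.InvDataset;

    public class CatalogWalker {
        public static void main(String[] args) {
            // Read the top-level catalog; "false" skips XML validation.
            InvCatalogFactory factory = InvCatalogFactory.getDefaultFactory(false);
            InvCatalogImpl top = factory.readXML(
                    "http://tds.prototype.ucar.edu/thredds/esgcet/catalog.xml");

            List<InvDataset> datasets = top.getDatasets();
            for (InvDataset invDataset : datasets) {
                System.out.println("\t" + invDataset.getName());

                if (invDataset instanceof InvCatalogRef) {
                    InvCatalogRef catRef = (InvCatalogRef) invDataset;

                    // As I understand it, asking the catalogRef for its datasets
                    // is what reads the referenced catalog into memory.
                    List<InvDataset> children = catRef.getDatasets();
                    for (InvDataset child : children) {
                        System.out.println("\t\t" + child.getName());
                    }

                    // The guess: tell the catalogRef to forget the catalog it
                    // just read so the garbage collector can reclaim it.
                    catRef.release();
                }
            }
        }
    }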
What are the options for reading such a large catalog using the
Java-netCDF tools?
Roland