Hi Bas,
I'm going to respond to some questions from several of your earlier emails.
1. How do I decide if a URL is a "Collection" or "Atomic"? It seems I
cannot count on the trailing slash, as it is always removed by
thredds.crawlabledataset.CrawlableDatasetFactory.normalizePath(). I
have a nasty solution for now that involves checking the URL for
known file extensions (like .html, .hdf, .nc, .bz2, etc.). If the
extension is not in my list, the URL is a "Collection" and can be
crawled further.
Well, that is kind of a problem. I was trying to keep the paths nice and
clean, but there isn't really a good way to tell whether an OPeNDAP URL is
a collection. Generally, if they end in "/" they are collections, but my
cleaning of the paths screws that up. One problem is that the OPeNDAP
spec doesn't define the dods_dir response as well as it could, which
leads to another problem: different OPeNDAP server implementations handle
dods_dir a bit differently. But if I recall correctly, the servers
you are looking at are both OPeNDAP C++ servers.
I would stick with the extension test and maybe add a check of whether the
URL is a real OPeNDAP dataset: append the ".dds" extension and see whether
the value of the HTTP header "Content-Description" is "dods_dds" or
"dods_error". If it is "dods_dds", you don't have a collection.
Sorry this isn't a pretty solution. I'll have to rethink the normalize
stuff, but that may be a while.
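In case a sketch helps, here is roughly what those two checks might look
like. The class name, method names, and extension list are just
illustrative (not part of the THREDDS API), and some servers may respond
to the ".dds" probe a bit differently, so treat this as a starting point
rather than a finished implementation:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

public class OpendapCollectionCheck
{
  // Extensions that indicate an atomic (non-collection) dataset; adjust as needed.
  private static final List<String> ATOMIC_EXTENSIONS =
      Arrays.asList( ".html", ".hdf", ".nc", ".bz2", ".gz" );

  // Quick test: a path ending in a known file extension is treated as atomic.
  public static boolean looksAtomicByExtension( String path )
  {
    String lower = path.toLowerCase();
    for ( String ext : ATOMIC_EXTENSIONS )
    {
      if ( lower.endsWith( ext ) )
        return true;
    }
    return false;
  }

  // Probe test: append ".dds" and check the "Content-Description" header.
  // Returns true if the server answers "dods_dds", i.e., the URL is an
  // atomic OPeNDAP dataset rather than a collection.
  public static boolean isOpendapDataset( String datasetUrl )
  {
    try
    {
      URL ddsUrl = new URL( datasetUrl + ".dds" );
      HttpURLConnection conn = (HttpURLConnection) ddsUrl.openConnection();
      String desc = conn.getHeaderField( "Content-Description" ); // issues a GET
      conn.disconnect();
      return "dods_dds".equals( desc );
    }
    catch ( IOException e )
    {
      return false; // couldn't reach it or bad URL; caller can treat as "unknown"
    }
  }
}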
One thing I'm sorry I didn't mention earlier, and which may matter
depending on your time frame: the OPeNDAP folks are working on the Server 4
architecture, which includes automatic generation of THREDDS catalogs. I'm
not sure what their time frame is, or what the time frame would be for
various server sites to upgrade, but I thought I should mention it.
2. I am using thredds.cataloggen.config.DodsURLExtractor (like the
original code in thredds.cataloggen.config.DodsDirDataSource, parts of
which I have reused). You had mentioned that you do not like this very
much. However, it works well. Can I keep using it, or did you have
something else in mind?
I don't have anything else in mind. Please feel free to continue using it.
3. In thredds.examples.MockOpendapDSP, there is an assumption that
only CrawlableDatasetFile exists (in other cases, an
SC_INTERNAL_SERVER_ERROR is generated). Given the Java package it is in,
this seems to be just example code, so the problem is not critical. Do you
think there are other places where a CrawlableDataset other than
CrawlableDatasetFile is unexpected?
That is correct, the MockOpendapDSP is just example code.
In terms of generating catalogs, I believe there shouldn't be any other
locations where CrawlableDatasetFile is assumed.
In terms of serving datasets from the TDS, we do currently assume
CrawlableDatasetFile in some places (but do plan on getting rid of that
assumption). I had been assuming that you would be setting things up so
that the generated catalogs would point to the remote server for OPeNDAP
access. Do I have that wrong?
I have looked a bit further for possible mistakes in my Java code and
in my catalog.xml. I have seen that I need to put something useful in the
<serviceName> tag of <datasetScan>. However, I have not yet discovered
how this field is used, and thus whether I should choose a Compound,
OPeNDAP, or HTTPServer service, or something else, or whether it does not
matter as long as it is != null.
This kind of relates to my comment just above about how you want your
catalog to point to the dataset. The <serviceName> needs to reference a
<service> element at the top of your config catalog: the content of
<serviceName> must match the name attribute of an existing <service>
element, e.g., <service name="remoteOPeNDAP" ... /> and
<serviceName>remoteOPeNDAP</serviceName>. The referenced service describes
how to access the datasets; its base URL is combined with each dataset's
urlPath to form the access URL. Here's a reference on how access URLs are
built from THREDDS catalogs:
http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/InvCatalogSpec.html#constructingURLs
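To make that concrete, here is a rough sketch of the relevant config
catalog pieces. The service name, base URL, and urlPath are made-up
placeholders, and I'm assuming you want the generated catalogs to point at
the remote OPeNDAP server:

<service name="remoteOPeNDAP" serviceType="OPENDAP"
         base="http://remote.server/dods/" />

<dataset name="Example dataset" urlPath="data/example.nc">
  <serviceName>remoteOPeNDAP</serviceName>
</dataset>

A client builds the access URL by appending the dataset urlPath to the
service base, giving http://remote.server/dods/data/example.nc in this
case. A <serviceName> inside your <datasetScan> is resolved the same way.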
Unfortunately, so far I have made no progress. I get the same problem as
last Friday, which is that I cannot get my new class
CrawlableDatasetDods to be called.
If you have remote management set up on your TDS
(http://motherlode.ucar.edu:8080/thredds/docs/RemoteManagement.html),
you can go to http://your.server:port/thredds/debug and set the
log4j logging levels for selected packages. You might bump all the
thredds.cataloggen stuff up to "ALL" and see if you get any hints.
I'm out of the office again (sorry, kind of a hectic summer vacation
schedule :-) until next Wednesday. Feel free to send more questions.
Just wanted to let you know my response might be a bit delayed.
Enjoy your weekend.
Ethan
--
Ethan R. Davis Telephone: (303) 497-8155
Software Engineer Fax: (303) 497-8690
UCAR Unidata Program Center E-mail: edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO 80307-3000 http://www.unidata.ucar.edu/