Hi Bas,
I'm afraid it isn't as easy as getting a working config file. The catGen
servlet used to support crawling an OPeNDAP/DODS server but no one was
using it and the code
(src/thredds/cataloggen/config/DodsDirDatasetSource.java and related
files) did not evolve with the rest of the catGen servlet and so it no
longer works.
Basically, what DodsDirDatasetSource did was take a DODS_dir URL (e.g.,
http://reason.gsfc.nasa.gov/opendap-bin/nph-dods/FTP_DATA/Giovanni/OPS/TOMS/EP/)
and scrape the returned HTML for further URLs to crawl. The scraping was
done by thredds.cataloggen.config.DodsURLExtractor. Not terribly elegant
but it got the job done. One problem is that dods_dir responses are not
very standardized, so it might need munging for different OPeNDAP servers.
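The scraping step can be sketched as follows. This is a minimal Python illustration, not the Java DodsURLExtractor. It assumes that the page URL is a directory URL ending in "/", that links ending in "/" are subdirectories, that a "<name>.html" link is a dataset's access form whose base OPeNDAP URL is the same path without ".html", and that parent-directory and query-string links should be skipped. As noted above, dods_dir pages vary between servers, so these rules are assumptions to adjust per server.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_dods_dir_links(page_url, html):
    """Split links on a dods_dir page into (subdirectories, dataset base URLs).

    Hypothetical rules (adjust per server):
      * page_url is a directory URL ending in "/";
      * links ending in "/" are subdirectories to crawl further;
      * "<dataset>.html" is the Dataset Access Form, so ".html" is stripped
        to get the base OPeNDAP URL;
      * links outside page_url (e.g. the parent directory) and links with a
        query string (e.g. column-sort links) are ignored.
    """
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    dirs, datasets = [], []
    for href in parser.hrefs:
        url, _fragment = urldefrag(urljoin(page_url, href))
        if "?" in url or url == page_url or not url.startswith(page_url):
            continue
        if url.endswith("/"):
            if url not in dirs:
                dirs.append(url)
        elif url.endswith(".html"):
            base = url[: -len(".html")]
            if base not in datasets:
                datasets.append(base)
    return dirs, datasets
```

For example, on a page at http://example.org/dods/data/ (a made-up URL), the links "2002/", "a.hdf.html" and "../" yield one subdirectory (http://example.org/dods/data/2002/), one dataset base URL (http://example.org/dods/data/a.hdf), and nothing for the parent link.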
This document might be useful when you start implementing a
CrawlableDataset to deal with dods_dir pages:
http://www.unidata.ucar.edu/projects/THREDDS/tech/cataloggen/devel/
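Crawling with a crawlable-dataset style abstraction amounts to wrapping each URL in a node that knows whether it is a collection and how to list its children, then recursing. The sketch below is a hypothetical Python analogue of that idea: DodsDirNode, fetch_links and crawl are illustrative names, not the methods of the THREDDS CrawlableDataset Java interface. Fetching and scraping a page is injected as a function so the recursion can be shown (and tested against a canned tree) on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical: given a dods_dir URL, return (subdirectory URLs, dataset URLs).
# In practice this would fetch the page over HTTP and scrape its links.
LinkFetcher = Callable[[str], Tuple[List[str], List[str]]]


@dataclass(frozen=True)
class DodsDirNode:
    """One node of a remote dods_dir tree (illustrative, not the THREDDS API)."""
    url: str
    is_collection: bool
    fetch_links: LinkFetcher

    @property
    def name(self) -> str:
        # Last path segment, ignoring a trailing "/".
        return self.url.rstrip("/").rsplit("/", 1)[-1]

    def list_children(self) -> List["DodsDirNode"]:
        if not self.is_collection:
            return []
        dirs, datasets = self.fetch_links(self.url)
        return ([DodsDirNode(d, True, self.fetch_links) for d in dirs]
                + [DodsDirNode(d, False, self.fetch_links) for d in datasets])


def crawl(node: DodsDirNode, max_depth: int = 10) -> Dict:
    """Build a nested, catalog-like dict; max_depth bounds the recursion."""
    entry = {"name": node.name, "url": node.url}
    if node.is_collection and max_depth > 0:
        entry["datasets"] = [crawl(child, max_depth - 1)
                             for child in node.list_children()]
    return entry
```

The max_depth bound is a design choice for remote servers, where a misparsed or circular link could otherwise keep the crawl going indefinitely.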
I hope this is helpful. Sorry the existing code isn't really working.
Let me know if I can be of further assistance.
Ethan
Bas Retsios wrote:
Hello Ethan,
Valentijn Venus asked me to assist him in getting the remote DODS
catalog generator running. He forwarded me your recent email
communication (see below).
We have spent a lot of time trying to accomplish this with the
cataloggen servlet in THREDDS 3.10. As far as I understand, it is a
matter of adding a correct XML file as a new task with
http://<servername>:8080/thredds/cataloggen/admin/ , and have
"Period(minutes)" > 0.
Unfortunately I do not have a proper working example of an XML file
that crawls a remote DODS server. The existing example (under
config/examples/cetGenConf.exampleDods.xml) does nothing, perhaps
because the links in that file are outdated. "Does nothing" means that
after the "Initial Delay", the "Resulting Catalog" is an XML file of 9
lines which, in my opinion, does not seem to contain any useful
information.
Could you please send me a working version of
config/examples/cetGenConf.exampleDods.xml that I could try? (I'd
also be happy with another example, as long as it is about remote DODS
data.) Or, if you suspect there is another problem with our approach,
could you give us another hint?
Our intention was to follow your suggestion and transform the
DodsDirDatasetSource into a CrawlableDataset, but first I would like to
understand how the original "cataloggen servlet" code works.
Best regards,
Bas
--
Ir. V. (Bas) Retsios
Software Developer
IT Department, Sector Remote Sensing & GIS
International Institute for Geo-information Science and Earth Observation (ITC)
P.O. Box 6, 7500 AA Enschede, The Netherlands
Phone +31 (0)53 4874 573, telefax +31 (0)53 4874 335
E-mail retsios@xxxxxx, Internet http://www.itc.nl
Valentijn Venus wrote:
-----Original Message-----
From: Ethan Davis [mailto:edavis@xxxxxxxxxxxxxxxx]
Sent: Wednesday, June 07, 2006 12:29 AM
To: Valentijn Venus
Cc: support-thredds@xxxxxxxxxxxxxxxx
Subject: Re: Catalog generator crawling remote OPenDAP/DODS servers
Hi Valentijn,
Actually, the CatalogGen servlet is included as part of the TDS. If you
are writing configuration files that include <datasetScan> elements, you
are using the newer TDS configuration. The URLs for this stuff are of
this form: "http://<server>:<port>/thredds/...". If you are writing
configuration files that include <catalogGenConfig> elements, you are
using the old catGen framework. The CatalogGen servlet is at
"http://<server>:<port>/thredds/cataloggen/...".
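The rule of thumb above can be expressed as a small check on a configuration file. This is a hypothetical helper (config_framework is not part of THREDDS) that only looks for the two element names mentioned here, ignoring XML namespaces; it does not validate the file against any schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{urn:x}datasetScan' -> 'datasetScan'."""
    return tag.rsplit("}", 1)[-1]


def config_framework(xml_text: str) -> str:
    """Classify a catalog config file by the elements it contains.

    Rule of thumb from this thread:
      <datasetScan>      -> newer TDS config, URLs http://<server>:<port>/thredds/...
      <catalogGenConfig> -> old catGen,       URLs http://<server>:<port>/thredds/cataloggen/...
    """
    # ElementTree drops comments and processing instructions by default,
    # so every element's tag here is a string.
    names = {local_name(el.tag) for el in ET.fromstring(xml_text).iter()}
    has_tds = "datasetScan" in names
    has_catgen = "catalogGenConfig" in names
    if has_tds and has_catgen:
        return "both"
    if has_tds:
        return "TDS"
    if has_catgen:
        return "catGen"
    return "unknown"
```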
The 3.8 code (and now the 3.10 code) is available at
ftp://ftp.unidata.ucar.edu/pub/thredds/tmp/
The code for both the old and new style catalog generation are included.
The old implementation for crawling OPeNDAP/DODS servers is in
src/thredds/cataloggen/config/DodsDirDatasetSource.java. In the new
framework, an implementation of
src/thredds/crawlabledataset/CrawlableDataset.java would have to be
written to handle crawling an OPeNDAP/DODS server. The old stuff might
be easy to get up and running quickly but since the plan is to move
everything to the CrawlableDataset framework, I would encourage you to
go that route.
Let me know what you decide.
Ethan
Valentijn Venus wrote:
Thanks for your response. I guess with thredds.war 3.8 I'm using the
newer TDS configuration, which also generates catalogs?
Can you send me a pointer to that old code please? Thanks again...
Valentijn
-----Original Message-----
From: Ethan Davis [mailto:edavis@xxxxxxxxxxxxxxxx]
Sent: Mon 6/5/2006 19:58
To: Valentijn Venus
Cc: support-thredds@xxxxxxxxxxxxxxxx
Subject: Re: Catalog generator crawling remote OPenDAP/DODS servers
Hi Valentijn,
Valentijn Venus wrote:
Hi Ethan,
First, a fundamental question related to THREDDS: the idea of the
catalog generator is to crawl whatever source of data you have (local
or remote) and generate a catalog, correct?
That is the idea, but unfortunately crawling a local file system is all
that is currently implemented.
Now I'm looking for some
directions on how to explore all sub-directories on a remote DODS
server. The following servers I have in mind:
MODIS AQUA/TERRA:
http://g0dup05u.ecs.nasa.gov/OPENDAP_DP/long_term/MOAA/MYD04_L2.004/2002.07.16/
TOMS Ozone:
http://reason.gsfc.nasa.gov/opendap-bin/nph-dods/FTP_DATA/Giovanni/OPS/TOMS/EP/
Recently they moved these servers, and since then only the DODS
Dataset Access Form for TOMS seems to work. When I access one of
these HDF files (e.g.,
http://reason.gsfc.nasa.gov/opendap-bin/nph-dods/FTP_DATA/Giovanni/OPS/TOMS/EP/TOMS-EP_L3-TOMSEPL3_1996m0725_v8.HDF.html)
and set up the form, I can see the resulting "get"/"post" parameters
trailing the "?" in a DODS URL (...?Ozone[0:1:179][0:1:287]). How does
this work once the catalog is generated and a client (IDV, Matlab,
etc.) accesses it in real time? I guess THREDDS supports
geographic/parameter subsetting in a similar manner, but how does it
work technically?
Generally, the THREDDS catalog provides the base DODS URL and the
client needs to understand the OPeNDAP/DODS protocol (the ".dds",
".das", ".dods" extensions and all the stuff after the "?"). THREDDS
allows for metadata to indicate the geographic coverage of a dataset
but it doesn't support subsetting a particular dataset. The actual
subsetting is entirely dependent on the protocol being used to access
the dataset, in this case OPeNDAP.
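To illustrate that division of labour: the catalog supplies only the base URL, and the client appends a response suffix and, optionally, a constraint expression. Below is a minimal sketch of that URL construction for simple array hyperslabs of the form [start:stride:stop] (stop inclusive), as in the Ozone example above. The function names are hypothetical, only the three response types named here are handled, and real clients such as the IDV do this internally.

```python
from typing import Sequence, Tuple

# Response suffixes mentioned in this thread: .dds (structure),
# .das (attributes) and .dods (binary data).
RESPONSE_SUFFIXES = ("dds", "das", "dods")


def hyperslab(ranges: Sequence[Tuple[int, int, int]]) -> str:
    """Format (start, stride, stop) triples as '[start:stride:stop]...' (stop inclusive)."""
    parts = []
    for start, stride, stop in ranges:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid range {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return "".join(parts)


def opendap_url(base_url: str, response: str, variable: str = "",
                ranges: Sequence[Tuple[int, int, int]] = ()) -> str:
    """Append a response suffix and an optional constraint expression to a base URL."""
    if response not in RESPONSE_SUFFIXES:
        raise ValueError(f"unknown response type {response!r}")
    url = f"{base_url}.{response}"
    if variable:
        # Constraint expression: the variable name followed by one
        # hyperslab per dimension.
        url += "?" + variable + hyperslab(ranges)
    return url
```

With base_url set to the TOMS file URL from this thread without its ".html" form suffix (http://reason.gsfc.nasa.gov/opendap-bin/nph-dods/FTP_DATA/Giovanni/OPS/TOMS/EP/TOMS-EP_L3-TOMSEPL3_1996m0725_v8.HDF), opendap_url(base_url, "dods", "Ozone", [(0, 1, 179), (0, 1, 287)]) returns that URL followed by ".dods?Ozone[0:1:179][0:1:287]". The brackets are left unencoded to match the form's output; some servers or HTTP libraries may require percent-encoding them.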
Back to the catalog generation. At one point I did have OPeNDAP server
crawling capabilities in the catalog generator but no one was using it
and I don't think it is still working after a few major modifications.
Are you using the stand-alone CatalogGen application, the catalog
generation servlet interface, or the newer TDS configuration which
also generates catalogs? The TDS config is a completely different
framework than that used by the app and servlet. The code (non-TDS
framework) for crawling OPeNDAP servers is still around if you want to
look into getting it running again. However, I'm planning on moving
the CatGen app and servlet over to the new TDS framework at some point
but that may be a while.
Hope this helps.
Ethan
Cheers, Valentijn
--
Ethan R. Davis Telephone: (303) 497-8155
Software Engineer Fax: (303) 497-8690
UCAR Unidata Program Center E-mail: edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO 80307-3000 http://www.unidata.ucar.edu/
---------------------------------------------------------------------------