Hi Ethan & Rich,
Yes - there are just a couple of "trivial" (they really are :-)) changes: one to
the HTML scraper (thredds.util.DodsURLExtractor) to remove invalid
collection-level entries, and one to thredds.cataloggen.CollectionLevelScanner
to fix problems with crawling through collections in the catalog that is created.
It works fine on the few OPeNDAP servers I know about - e.g. marine.csiro.au and
the Argo GDAC server - but, going by Ethan's comments, it obviously needs a try
on others. Note: this inexperience was also why I asked in the original message
if there was a better way to do this! :-)
I'll email the changed .java files (for the netcdf-4.0 jar) through separately
so you can try them if you're still interested, Rich/Pauline.
The Argo GDAC OPeNDAP server is a public one that does work with the changes, so
here is a simple catalog you could try with the changes if you still want to
use this:
<?xml version="1.0" encoding="UTF-8"?>
<catalog name="ARGO GDAC Test Catalog"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <service name="argofloatdata" serviceType="OpenDAP"
      base="http://www.ifremer.fr/cgi-bin/nph-dods/data/in-situ/"/>
  <datasetScan name="argodac" path="argo/dac"
      location="http://www.ifremer.fr/cgi-bin/nph-dods/data/in-situ/argo/dac">
    <serviceName>argofloatdata</serviceName>
    <crawlableDatasetImpl
        className="thredds.crawlabledataset.CrawlableDatasetDods" />
  </datasetScan>
</catalog>
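
Before pointing a server at a catalog like this, it may be worth a quick consistency check. The following is only a sketch in Python using the standard library: the function name check_catalog and the three checks are assumptions drawn from the two example catalogs in this thread (each datasetScan names a defined service, its location sits under that service's base, and it carries a crawlableDatasetImpl className). They are not rules taken from the THREDDS schema or code.

```python
import xml.etree.ElementTree as ET

# Namespace used by the catalogs in this thread.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# The Argo GDAC test catalog from this message, copied as-is.
CATALOG = """<?xml version="1.0" encoding="UTF-8"?>
<catalog name="ARGO GDAC Test Catalog"
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <service name="argofloatdata" serviceType="OpenDAP"
      base="http://www.ifremer.fr/cgi-bin/nph-dods/data/in-situ/"/>
  <datasetScan name="argodac" path="argo/dac"
      location="http://www.ifremer.fr/cgi-bin/nph-dods/data/in-situ/argo/dac">
    <serviceName>argofloatdata</serviceName>
    <crawlableDatasetImpl
        className="thredds.crawlabledataset.CrawlableDatasetDods" />
  </datasetScan>
</catalog>"""


def check_catalog(xml_text):
    """Return a list of problem strings (hypothetical checks, see lead-in)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Map each top-level service name to its base URL.
    services = {s.get("name"): s.get("base", "")
                for s in root.findall("t:service", NS)}
    problems = []
    for scan in root.findall("t:datasetScan", NS):
        name = scan.get("name")
        ref = scan.findtext("t:serviceName", default="", namespaces=NS).strip()
        if ref not in services:
            problems.append(f"{name}: unknown service {ref!r}")
        elif not scan.get("location", "").startswith(services[ref]):
            problems.append(f"{name}: location is not under the service base")
        impl = scan.find("t:crawlableDatasetImpl", NS)
        if impl is None or not impl.get("className"):
            problems.append(f"{name}: no crawlableDatasetImpl className")
    return problems


problems = check_catalog(CATALOG)
print(problems if problems else "catalog looks consistent")
```

An empty list only means the catalog is self-consistent by these assumed checks; it says nothing about whether the remote server can actually be crawled.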
Cheers,
Simon
________________________________________
From: thredds-bounces@xxxxxxxxxxxxxxxx [thredds-bounces@xxxxxxxxxxxxxxxx] On
Behalf Of Ethan Davis [edavis@xxxxxxxxxxxxxxxx]
Sent: Saturday, 11 April 2009 8:20 AM
To: thredds@xxxxxxxxxxxxxxxx
Subject: Re: [thredds] Running THREDDS on top of old OPeNDAP servers
Hi all,
It has been a while since I worked on the CrawlableDataset code. There is
some documentation, but it's a bit rough and was never really announced or
linked in to the rest of the documentation.
A quick summary:
Behind the scenes of the datasetScan element, the
thredds.cataloggen.CatalogBuilder interface is used to build catalogs.
It uses the thredds.crawlabledataset.CrawlableDataset interface to scan
for datasets. The default CrawlableDataset implementation is
thredds.crawlabledataset.CrawlableDatasetFile. The only other
implementation in the TDS distribution is
thredds.crawlabledataset.CrawlableDatasetDods.
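
To make the division of labour concrete, here is a rough sketch of the pattern in Python: a small "crawlable" abstraction that knows how to list its children, and a builder that walks it without caring where the data lives. The names Crawlable, FileCrawlable and build_catalog are hypothetical and only illustrate the idea; they are not the THREDDS Java interfaces or their method signatures.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Crawlable(ABC):
    """Hypothetical stand-in for the crawlable-dataset idea (not the THREDDS API)."""

    @abstractmethod
    def name(self):
        """Short name of this dataset or collection."""

    @abstractmethod
    def is_collection(self):
        """True if this node can contain other datasets."""

    @abstractmethod
    def list_datasets(self):
        """Return the child Crawlable objects of a collection."""


class FileCrawlable(Crawlable):
    """Local-filesystem implementation, analogous in spirit to a file-based crawler."""

    def __init__(self, path):
        self.path = path

    def name(self):
        return os.path.basename(self.path.rstrip(os.sep))

    def is_collection(self):
        return os.path.isdir(self.path)

    def list_datasets(self):
        return [FileCrawlable(os.path.join(self.path, entry))
                for entry in sorted(os.listdir(self.path))]


def build_catalog(node):
    """Walk any Crawlable and return a nested dict describing what was found."""
    if node.is_collection():
        return {"name": node.name(),
                "children": [build_catalog(child) for child in node.list_datasets()]}
    return {"name": node.name()}


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "dac"))
    open(os.path.join(root, "dac", "float1.nc"), "w").close()
    tree = build_catalog(FileCrawlable(root))
    print(tree["children"])
```

The point of the split is that a second implementation (for example one that lists a remote server's directory pages) could be swapped in without touching build_catalog.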
Simon, I'm curious about the changes you've made as well. Are they in
CrawlableDatasetDods? Since the old OPeNDAP servers don't have a
standard directory interface, the code ends up scraping HTML and that
just gets ugly and hard to be general. So, I wouldn't be surprised if
that code might need tweaking depending on the OPeNDAP server it is
crawling.
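
To give a feel for why the scraping gets ugly, here is a minimal sketch in Python of pulling child links out of a directory-listing page and dropping the entries that are not real children. It is not what thredds.util.DodsURLExtractor does; the function name child_links, the filtering rules, and the sample page (an Apache-style listing on the placeholder host used elsewhere in this thread) are all assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def child_links(base_url, html):
    """Return absolute URLs that look like children of base_url (assumed rules)."""
    if not base_url.endswith("/"):
        base_url += "/"
    collector = _LinkCollector()
    collector.feed(html)
    seen, result = set(), []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        parts = urlsplit(url)
        if parts.query or parts.fragment:
            continue  # e.g. column-sort links on the listing page
        if url == base_url or not url.startswith(base_url):
            continue  # the page itself, the parent directory, or another site
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Made-up listing page for demonstration only.
SAMPLE = """
<a href="?C=N;O=D">Name</a>
<a href="/dods/nph-dods/">Parent Directory</a>
<a href="bluelink/">bluelink/</a>
<a href="sst.nc.html">sst.nc</a>
<a href="http://www.example.org/help">Help</a>
<a href="bluelink/">bluelink/</a>
"""

links = child_links("http://www.yoursite.com/dods/nph-dods/dods-data", SAMPLE)
for link in links:
    print(link)
```

Different servers lay out their listing pages differently, so rules like these would likely need adjusting per server, which is the tweaking referred to above.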
Anyway, here are some of the docs we have:
http://www.unidata.ucar.edu/projects/THREDDS/tech/cataloggen/devel/architecture.html
http://www.unidata.ucar.edu/projects/THREDDS/tech/cataloggen/devel/userImplementation.html
http://www.unidata.ucar.edu/projects/THREDDS/tech/cataloggen/devel/datasetScanElement.html
Also, javadoc for thredds.cataloggen and thredds.crawlabledataset is
available in our complete javadoc:
http://www.unidata.ucar.edu/software/netcdf-java/v4.0/javadocAll/index.html
Ethan
Rich Signell wrote:
> Simon,
>
> This sounds extremely useful and I'd love to give it a try.
>
> Can you please tell us what the "trivial" changes are to NetCDF-Java?
>
> And do you have a real-life example of the catalog below that works
> with publicly available OpenDAP data?
>
> Thanks,
> Rich
>
> On Wed, Apr 8, 2009 at 8:28 PM, <Simon.Pigot@xxxxxxxx> wrote:
>> Hi Pauline,
>>
>> The following works ok for us (as an example - non-essential details
>> removed):
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <catalog name="YOUR SITE OPeNDAP Catalog"
>>     xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
>>     xmlns:xlink="http://www.w3.org/1999/xlink">
>>
>>   <service name="yoursiteopendap" serviceType="OpenDAP"
>>       base="http://www.yoursite.com/dods/nph-dods/dods-data/"/>
>>   <datasetScan name="climatology-netcdf" path="climatology-netcdf"
>>       location="http://www.yoursite.com/dods/nph-dods/dods-data/climatology-netcdf">
>>     <serviceName>yoursiteopendap</serviceName>
>>     <crawlableDatasetImpl
>>         className="thredds.crawlabledataset.CrawlableDatasetDods" />
>>   </datasetScan>
>>   <datasetScan name="bluelink" path="bluelink"
>>       location="http://www.yoursite.com/dods/nph-dods/dods-data/bluelink">
>>     <serviceName>yoursiteopendap</serviceName>
>>     <crawlableDatasetImpl
>>         className="thredds.crawlabledataset.CrawlableDatasetDods" />
>>   </datasetScan>
>> </catalog>
>>
>> I'm not sure if it's all documented somewhere - I worked it out the slow way
>> by poking around in the netcdf java code and hunting through the archives of
>> the thredds mailing list. There are also some trivial changes you need to
>> make to the code (in netcdf-java) to filter out some unwanted artifacts
>> created when the scan picks through the html from the OpenDAP server -
>> otherwise you end up with some strange, non-functional things in your
>> catalog. Maybe there is a better way to do the above?
>>
>> By way of introduction, we want this sort of catalog to work as part of a
>> thredds metadata harvester I'm adding to GeoNetwork which produces ISO19115
>> metadata records and ISO19119 records for thredds services. It's nearly at
>> the stage where it is working reliably but there are a few more issues I
>> need to solve and I'm still learning about Thredds :-)
>>
>> Cheers and I hope this helps,
>> Simon
>>
>> ________________________________________
>> From: thredds-bounces@xxxxxxxxxxxxxxxx [thredds-bounces@xxxxxxxxxxxxxxxx] On
>> Behalf Of Pauline Mak [Pauline.Mak@xxxxxxxxxxx]
>> Sent: Thursday, 9 April 2009 8:56 AM
>> To: thredds@xxxxxxxxxxxxxxxx
>> Subject: [thredds] Running THREDDS on top of old OPeNDAP servers
>>
>> Hi all,
>>
>> I'm figuring out ways to serve data using THREDDS on top of old OPeNDAP
>> servers. I'm aware that you can configure datasets based on a URL, but
>> that's for a single file... (correct me if I'm wrong!) However, is
>> there a way to apply this to a directory? Sort of like a datasetScan +
>> filters for a directory URL? When poking through the THREDDS catalog
>> XSD, there's a crawlableDatasetImpl element. Is that the sort of thing
>> I need to look at?
>>
>> Thanks,
>>
>> -Pauline.
>>
>> --
>> Pauline Mak
>>
>> ARCS Data Services
>> Ph: (03) 6226 7518
>> Email: pauline.mak@xxxxxxxxxxx
>> Jabber: pauline.mak@xxxxxxxxxxx
>> http://www.arcs.org.au/
>>
>> TPAC
>> Email: pauline.mak@xxxxxxxxxxx
>> http://www.tpac.org.au/
>>
>>
>>
>> _______________________________________________
>> thredds mailing list
>> thredds@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe, visit:
>> http://www.unidata.ucar.edu/mailing_lists/
>>
>>
>
>
>
--
Ethan R. Davis Telephone: (303) 497-8155
Software Engineer Fax: (303) 497-8690
UCAR Unidata Program Center E-mail: edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO 80307-3000 http://www.unidata.ucar.edu/
---------------------------------------------------------------------------