John Caron wrote:
>
> ----- Original Message -----
> From: "Peter Cornillon" <pcornillon@xxxxxxxxxxx>
> To: "John Caron" <caron@xxxxxxxxxxxxxxxx>
> Cc: <thredds@xxxxxxxxxxxxxxxx>
> Sent: Friday, December 14, 2001 11:30 AM
> Subject: Re: THREDDS/DLESE Connections slides
>
> > Hi John,
> >
> > An interesting presentation. I have a couple of comments.
> >
> > 1) On slide 6 you say that there is no efficient way to discover data
> > sets within the system. First, I think that you should be careful to
> > separate issues related to the data access protocol from those related
> > to other services. The DODS DAP does not provide an efficient way to
> > locate data. That was not its intent. There are, however, sites that
> > are beginning to provide for this. For example, the GCMD lists a subset
> > of DODS-accessible data sets (~2/3 of the total) and one can use any of
> > the GCMD search mechanisms to locate a data set of interest.
> > Furthermore, Dan Holloway has developed a Java program that will do
> > this search as well as searches in other directories as they come
> > on line. The important point here is that this search capability can be
> > launched from within a user's application, e.g., from within Matlab,
> > and through the suite of associated interfaces the user can drill down
> > to the data themselves and request that they be moved into the
> > application. This capability is currently under development, with the
> > search portion in beta.
>
> Yes, services can be built that allow data discovery with DODS servers, but
> they seem to need to know some info specific to the server (e.g. root URL
> directories). As far as I know it's impossible to write a data discovery
> service that would work for any DODS server without having some extra
> information specific to each server.
We have built a prototype crawler that crawls a DODS site given the URL
for the site and finds all DODS files at the site. The problem is that
it has no way at present of differentiating between files in a data set
and the data set itself. At our site (a satellite archive) there are
currently in excess of 50,000 files and will soon be in excess of 100,000.
This makes sorting out the information returned by the crawler difficult
at best. (In situ archives can have 100,000s to millions of files - one
per XBT, depending on the organization of the site.) Steve Hankin's group
is working on adding the ability to group files into data sets. I believe
that he is working with the GCMD on this.
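
To make the grouping problem concrete, here is a minimal sketch in Python of one possible heuristic. It is purely an illustration and not what our crawler or Steve's group actually does: it assumes that files belonging to one data set share a host, a directory, and a file name that differs only in its digits (dates, sequence numbers). The function names and example URLs are hypothetical.

```python
import posixpath
import re
from collections import defaultdict
from urllib.parse import urlparse


def dataset_key(url):
    """Return a grouping key for a crawled file URL.

    Assumption (illustrative only): files in the same data set live in the
    same directory and differ only in the digits of their file names.
    """
    parsed = urlparse(url)
    directory, filename = posixpath.split(parsed.path)
    # Collapse every run of digits so f19990101.hdf and f19990102.hdf match.
    pattern = re.sub(r"\d+", "#", filename)
    return (parsed.netloc, directory, pattern)


def group_into_datasets(urls):
    """Group a flat list of file URLs into candidate data sets."""
    groups = defaultdict(list)
    for url in urls:
        groups[dataset_key(url)].append(url)
    return dict(groups)
```

A heuristic like this would still split a data set that is spread over per-year directories, so it only reduces the sorting problem rather than solving it.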
> The situation with ADDE servers is somewhat different. You can (more or
> less) query the server to find out what's available, but this collection of
> information takes a while (e.g. 8 minutes for complete image data on Unidata's
> ADDE server), too long for interactive (e.g. MetApp) access.
But you still have to know the URL for the server itself. I assume that
there is more than one server? If that is the case there needs to be a
high level list somewhere of server sites. This high level list could
just as well be a list of data set URLs (where there might be a number
at a given site - back to the DODS data set list). Is your concern
the currency of the directory? This is the issue that we hope to address
with the crawler. In fact, we hope to take the crawler one step further
by adding a web page in the htdocs directory that says "I'm a DODS
server, here I am". A crawler can then not only crawl a given site
but, when combined with a network crawler, crawl the entire network.
Well, not really; the way I see such a harvester is that it would use
existing repositories (dogpile, yahoo,...) to find server sites and
then direct the site crawler to crawl the sites.
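
A rough Python sketch of the harvester idea follows. It is a sketch under stated assumptions, not an existing tool: the marker page name (dods-server.html), its exact text, and the crawl_site callback standing in for the site crawler are all hypothetical, and the list of candidate sites is assumed to come from some external repository or search engine.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical marker page a DODS site might place in its htdocs directory.
MARKER_PATH = "dods-server.html"
# Hypothetical text the marker page is assumed to contain.
MARKER_TEXT = "I'm a DODS server"


def fetch_text(url, timeout=10):
    """Fetch a page as text, or return None if it cannot be retrieved."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def is_dods_site(site_url, fetch=fetch_text):
    """Check whether a site advertises itself through the marker page."""
    page = fetch(urljoin(site_url, MARKER_PATH))
    return page is not None and MARKER_TEXT in page


def harvest(candidate_sites, crawl_site, fetch=fetch_text):
    """Run the site crawler on every candidate that has the marker page.

    candidate_sites: site URLs obtained from existing repositories.
    crawl_site: callable standing in for the site crawler; it takes a
        site URL and returns the file URLs found there.
    """
    found = {}
    for site in candidate_sites:
        if is_dods_site(site, fetch):
            found[site] = crawl_site(site)
    return found
```

Passing the fetch function in as a parameter keeps the site check separate from the network access, so the logic can be exercised against a canned set of pages before it is pointed at real sites.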
> In both cases the role of THREDDS might be the middleware that does data
> discovery on the servers to provide fast responses to the clients.
Yes, this is a possibility.
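
One way to picture such middleware is a cache that pays the slow inventory query once and then answers clients from memory until the entry goes stale. The Python sketch below is only an illustration of that idea and not a THREDDS design; the class name, the query_server callback, and the one-hour default are assumptions.

```python
import time


class CatalogCache:
    """Cache slow server inventories so clients get fast answers."""

    def __init__(self, query_server, max_age_seconds=3600.0, clock=time.monotonic):
        # query_server: hypothetical callable that performs the slow
        # inventory query against a data server and returns its result.
        self._query_server = query_server
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}  # server URL -> (time fetched, inventory)

    def inventory(self, server_url):
        """Return the cached inventory, refreshing it if missing or stale."""
        entry = self._entries.get(server_url)
        now = self._clock()
        if entry is None or now - entry[0] > self._max_age:
            entry = (now, self._query_server(server_url))
            self._entries[server_url] = entry
        return entry[1]
```

The trade-off is the currency question again: a longer maximum age gives faster responses but a staler view of what the server holds.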
> I am looking forward to hearing more about data search and discovery at the
> developers meeting. We hope to corner Dan and see how his work might relate
> to THREDDS catalogs, and if we can merge our efforts in some way.
Sounds good to me.
> > 2) I could find very little information on APPE. Do you have a URL for a
> > site where it is described?
>
> It's ADDE (Abstract Data Distribution Environment, I think). Hopefully Tom
> Yoksas or Tom Whittaker can provide a good URL.
Sorry for the typo; I looked up ADDE when I did the web search.
> >
> > It might also be of interest to the group that the DODS effort will be
> > funded to look at a SOAP implementation associated with the core. This
> > is a prototype effort and should be complete within a year.
>
> That will be interesting.
>
> Thanks for your comments.
--
Peter Cornillon
Graduate School of Oceanography - Telephone: (401) 874-6283
University of Rhode Island - FAX: (401) 874-6728
Narragansett RI 02882 USA - Internet: pcornillon@xxxxxxxxxxx