
Re: New Catalog XML Draft



----- Original Message -----
From: "Joe Wielgosz" <address@hidden>
To: "John Caron" <address@hidden>
Cc: <address@hidden>
Sent: Tuesday, May 14, 2002 3:52 PM
Subject: Re: New Catalog XML Draft


> John,
>
> I like it.
>
> My suggestions:
>
> 1) Don't restrict service types to known values. It is certain that
> people will want to add new service types, so the catalog format should
> be extensible in this area.
>
> In order to prevent ambiguity (does "dods" equal "DODS" equals
> "distributed oceanographic data system"?) perhaps these types could
> somehow resolve to the url of the service's home page (e.g.
> DODS->http://unidata.ucar.edu/packages/dods). I don't know the most
> XML-savvy way to do this but perhaps the known mappings can be included
> in the DTD?
>
> 2) Same suggestion for metadata types.

In both of these cases, the XML way is to use a URI as the unique
identifier. The options are, for example:

1.  xlink:arcrole="http://unidata.ucar.edu/packages/dods"

vs

2.  metadataType="DODS"

Pros of 1: allows services to be added by anyone; the URI can optionally
point to an explanation
Pros of 2: compact; explicitly documents the allowable types
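
To make the trade-off concrete, here is a minimal Python sketch of how a
client might accept both forms. The table KNOWN_SERVICE_TYPES and the
function service_type_uri are hypothetical names made up for illustration;
only the DODS URL comes from this thread. It assumes that short names are
matched case-insensitively, which is one way to settle the "dods" vs "DODS"
question raised above.

```python
# Hypothetical table of well-known short names (option 2).
# Only the DODS entry is taken from this thread.
KNOWN_SERVICE_TYPES = {
    "DODS": "http://unidata.ucar.edu/packages/dods",
}


def service_type_uri(value):
    """Return a URI that identifies the service type.

    A value that already looks like a URI is passed through unchanged
    (option 1); a short name is looked up case-insensitively in the
    table of known types (option 2).
    """
    if "://" in value:
        return value
    uri = KNOWN_SERVICE_TYPES.get(value.upper())
    if uri is None:
        raise ValueError("unknown service type: %r" % (value,))
    return uri
```

With this, "dods" and "DODS" resolve to the same identifier, while a new
service can be introduced by anyone who supplies a full URI.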

>
> 3) How about a catalogNS attribute, that can be added to <catalog>,
> <collection>, or <dataset>?  This would specify a namespace in which
> dataset ID's can be considered unique.
>
> For example, if a THREDDS-crawler found two catalogs with
> catalogNS="http://cola.iges.org/thredds", and both contained a dataset
> with ID="avn0300", then it could consider these two datasets identical.
>
> This would make it possible to uniquely identify datasets in multiple
> catalogs (in fact across the entire THREDDS web).

Really good idea; I'd probably use "datasetNamespace" as the name.

The semantics are that if datasetNamespace exists for a dataset, then
datasetNamespace/ID must be globally unique, and the same dataset at
multiple locations should have the same datasetNamespace/ID.
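
As a rough illustration of these semantics, the following Python sketch shows
how a crawler might recognize the same dataset in two catalogs. The element
and attribute layout is only a guess at the draft format, and the sketch
assumes that a datasetNamespace set on an enclosing element is inherited by
the datasets inside it; dataset_keys and same_datasets are hypothetical
helper names.

```python
import xml.etree.ElementTree as ET


def dataset_keys(catalog_xml):
    """Yield (datasetNamespace, ID) for each dataset that has both.

    Assumption: datasetNamespace is inherited from the nearest
    enclosing element that declares it.
    """
    def walk(elem, inherited_ns):
        ns = elem.get("datasetNamespace", inherited_ns)
        if elem.tag == "dataset" and ns is not None and elem.get("ID"):
            yield (ns, elem.get("ID"))
        for child in elem:
            yield from walk(child, ns)

    return walk(ET.fromstring(catalog_xml), None)


def same_datasets(xml_a, xml_b):
    """Return the (namespace, ID) keys that appear in both catalogs."""
    return set(dataset_keys(xml_a)) & set(dataset_keys(xml_b))


# Two made-up catalogs that both list avn0300 in the same namespace.
CATALOG_A = """<catalog datasetNamespace="http://cola.iges.org/thredds">
  <dataset ID="avn0300"/>
</catalog>"""

CATALOG_B = """<catalog>
  <collection datasetNamespace="http://cola.iges.org/thredds">
    <dataset ID="avn0300"/>
    <dataset ID="other"/>
  </collection>
  <dataset ID="local-only"/>
</catalog>"""
```

Datasets without a namespace, such as "local-only" here, are never matched,
since nothing guarantees that their IDs are unique outside their own catalog.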

>
> 4) COARDS and CF should go on the list of known metadata types.
>
> 5) it would probably be clearer if the "DatasetDesc" metadataType was
> renamed to "THREDDS".

OK, but if we use URIs, the list will just be in some web document, rather
than listed in the DTD.


>
> 5) IMHO, the dataType attribute is metadata. Thus it should be part of
> the THREDDS/DatasetDesc metadata file, rather than the catalog itself.

It did occur to me to put it in the DatasetDesc when I made it optional. The
use case is if a client can only read GRID files, you'd like to eliminate
the non-GRID datasets without having to dereference a whole lot of other XML
files. So it could be thought of as a keyword for fast filtering.
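
A small Python sketch of that filtering step, under the assumption that
dataType is an optional attribute on <dataset> and that datasets carry an ID
attribute. The function name, the sample catalog, and the "IMAGE" value are
made up for illustration.

```python
import xml.etree.ElementTree as ET


def split_by_data_type(catalog_xml, wanted):
    """Return (matching_ids, undeclared_ids) for the <dataset> elements.

    matching_ids: datasets whose dataType equals `wanted`.
    undeclared_ids: datasets with no dataType (the attribute is optional),
    which a client would still have to look up elsewhere.
    """
    matching, undeclared = [], []
    for ds in ET.fromstring(catalog_xml).iter("dataset"):
        data_type = ds.get("dataType")
        if data_type is None:
            undeclared.append(ds.get("ID"))
        elif data_type == wanted:
            matching.append(ds.get("ID"))
    return matching, undeclared


# Made-up catalog: one GRID dataset, one of another type, one undeclared.
EXAMPLE = """<catalog>
  <dataset ID="a" dataType="GRID"/>
  <dataset ID="b" dataType="IMAGE"/>
  <dataset ID="c"/>
</catalog>"""
```

Here a GRID-only client can drop "b" immediately and only needs to fetch
further metadata for "c".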

As for "metadata", I'd say everything in a Catalog is metadata, but I agree
it's more like DatasetDesc metadata than Catalog metadata.


>
> 6) How about allowing inline dataset metadata, via a <metadata> tag?
> Then it would be possible for a site to completely describe its holdings
> in a single file if necessary.

That's a good idea; we could make it like <documentation>, which can be a
reference and/or inline. In fact, perhaps <documentation> should become
<metadata> of type "documentation"?