-------- Original Message --------
Organization: UCAR/Unidata
CC: dmurray@xxxxxxxxxxxxxxxx, Ethan Davis <edavis@xxxxxxxx>
References: <406B0D0B.5040107@xxxxxxxxxxxxxxxx>
<406DF175.9090902@xxxxxxxxxxxxxxxx>
Jeff McWhirter wrote:
John Caron wrote:
A proposed new version of the THREDDS Dataset Inventory Catalog is
ready for your comments. Please send them to
thredds@xxxxxxxxxxxxxxxx, or to me.
John,
Here are some comments about the catalog specification.
First of all, it would be great if there were a full-blown example
catalog that shows all of the different pieces of the specification in
one place.
(Or am I just missing it?) I'd really like to see some examples of how
the metadata, coherent tags, variables, vocabulary, etc., all fit
together.
yes, i'll get a decent example out this week.
Under the changes document you have:
access
remove serviceType (no anonymous service)
What does "no anonymous service" mean?
it used to be you could define a service by adding a serviceType
attribute to an access element.
we are withdrawing that feature to make things simpler
I'm a bit confused about how to use an alias. An example would help.
i'll add an example
You say:
"For more complicated situations, use nested access elements."
What is the difference between the nested access element and
simply having the serviceName, etc., right in the dataset? When
and why would I choose one approach or the other?
Can you have multiple contained access elements?
use explicit access elements when there is more than one way to access
the dataset. i've tried to rewrite that section to be clearer:
"The serviceName and urlPath attributes on the dataset element are used
for the common case that a dataset has a single access. The serviceName
refers to the unique name of a service element. The urlPath is appended
to the service's base to get the dataset URL. (see constructing URLs).
Logically the use of these two attributes creates an access element for
this dataset. When you have more than one way to access a dataset,
explicitly define them using more than one nested access element."
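
A minimal sketch of the rule quoted above, in Python. The catalog snippet is a made-up illustration (no XML namespace, example.org URLs, invented service and dataset names), and access_urls is a hypothetical helper, not part of any THREDDS library. It only shows that the serviceName/urlPath shorthand on a dataset and explicit nested access elements can be treated the same way: look up the named service, then append the urlPath to its base.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment, simplified for illustration only.
CATALOG_XML = """
<catalog>
  <service name="dods" serviceType="DODS" base="http://example.org/dods/"/>
  <service name="http" serviceType="HTTP" base="http://example.org/files/"/>
  <dataset name="single access" serviceName="dods" urlPath="model/run1.nc"/>
  <dataset name="two accesses">
    <access serviceName="dods" urlPath="model/run2.nc"/>
    <access serviceName="http" urlPath="model/run2.nc"/>
  </dataset>
</catalog>
"""


def access_urls(catalog_root, dataset):
    """Return one URL per way of accessing the dataset."""
    # Map each service's unique name to its base URL.
    bases = {s.get("name"): s.get("base") for s in catalog_root.iter("service")}

    pairs = []
    # Shorthand: serviceName + urlPath on the dataset act as one implied access.
    if dataset.get("urlPath") is not None:
        pairs.append((dataset.get("serviceName"), dataset.get("urlPath")))
    # Explicit form: any number of nested access elements.
    for access in dataset.findall("access"):
        pairs.append((access.get("serviceName"), access.get("urlPath")))

    # The urlPath is appended to the service's base.
    return [bases[name] + path for name, path in pairs]


root = ET.fromstring(CATALOG_XML)
datasets = {d.get("name"): d for d in root.iter("dataset")}
single_urls = access_urls(root, datasets["single access"])
multi_urls = access_urls(root, datasets["two accesses"])
```

Here single_urls holds one URL and multi_urls holds two, one per nested access element.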
Maybe I missed it, but I assume the serviceName of "this" implies that
it is relative to the URL where we got the catalog from?
formally there are no semantics to naming a service "this". in the case
that a catalog is written to describe the datasets from a particular
data server, we use the idiom of naming that service "this". For the
aggServer, we have told people to make it a relative URL, for
various reasons related to the aggServer implementation.
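
A rough sketch of how a client might handle a relative service base, in Python. It assumes, as the question above suggests but the reply does not confirm, that a relative base is resolved against the URL the catalog was fetched from. The function name and all URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_dataset_url(catalog_url, service_base, url_path):
    """Build a dataset URL from a service base that may be relative."""
    # Assumption: a relative base is resolved against the catalog's own URL.
    # An absolute base is returned unchanged by urljoin.
    absolute_base = urljoin(catalog_url, service_base)
    # The urlPath is then appended to the (now absolute) base.
    return absolute_base + url_path


catalog_url = "http://example.org/thredds/catalog.xml"
relative_case = resolve_dataset_url(catalog_url, "/thredds/dodsC/", "agg/run1.nc")
absolute_case = resolve_dataset_url(
    catalog_url, "http://other.example.org/dods/", "agg/run1.nc"
)
```

With a relative base the catalog can move to another host without being edited; with an absolute base the catalog URL plays no part in the result.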
metadata: Your example shows a metadata tag pointing to an ncml file.
You also have ncml as a data format type. Why would you use the ncml as
metadata?
thanks for catching that. the data portal people are using NcML in a way
that i wouldn't, although it's legal from a catalog POV. i will change the
example to avoid confusion.
as an aside, a recent conversation with the ESML group reveals that we
probably would point to ESML as metadata, so that the data URL can point
to the actual data file.
Can you give an example of how a client would use the variables tag.
The main purpose of <variables> is for digital libraries; in particular
we need it for GCMD, which requires a list of available "parameters" from
their controlled vocabulary. A client might want to show those
"alternative names" to the users. Perhaps we should automatically add
them to the netcdf data model so it can be done in a standard way?
Would you have a variables tag in a composite data set?
I assume you mean collection dataset?
Yes, it would make the most sense if the collection was a group of
datasets with the same variables (e.g. a time series), and so you'd put an
inherit=true tag on it to convey that info.
even if that wasn't the case, it may still make sense as a high-level
description of a dataset for a digital library.
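
A small sketch of how a client could read inherited variables, in Python. The element layout below (a variables element carrying inherit="true" inside a collection dataset) is a guess made up for illustration, not the actual schema, and effective_variables is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: a time series whose members share variable "T".
COLLECTION_XML = """
<dataset name="time series">
  <variables inherit="true">
    <variable name="T">Air Temperature</variable>
  </variables>
  <dataset name="day 1"/>
  <dataset name="day 2">
    <variables>
      <variable name="p">Pressure</variable>
    </variables>
  </dataset>
</dataset>
"""


def effective_variables(root, dataset):
    """Names of a dataset's own variables plus those inherited from ancestors."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # The dataset's own variables always apply.
    blocks = list(dataset.findall("variables"))
    # From ancestors, only blocks marked inherit="true" apply.
    ancestor = parent_of.get(dataset)
    while ancestor is not None:
        blocks += [v for v in ancestor.findall("variables")
                   if v.get("inherit") == "true"]
        ancestor = parent_of.get(ancestor)

    return [var.get("name") for block in blocks for var in block.findall("variable")]


root = ET.fromstring(COLLECTION_XML)
by_name = {d.get("name"): d for d in root.iter("dataset")}
day1_vars = effective_variables(root, by_name["day 1"])
day2_vars = effective_variables(root, by_name["day 2"])
```

In this sketch "day 1" picks up the collection's variable without declaring anything itself, while "day 2" adds its own variable on top of the inherited one.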
Can you give an example of how a client would use the vocabulary?
hmmm, again its main point is for DL, but in some cases it might be
helpful for the user to know what vocabulary was being used. If you had
more than one vocabulary (which i think will happen) the user might want
to select which s/he prefers.
What are the semantics behind the data types? e.g., what does Grid mean?
Or Station? Would a shapefile be classified as a Trajectory?
Yeah baby! Now those are the good questions! ;^)
Currently i'm thinking of letting people use the vocabularies they are
used to (e.g. Grid, Swath, Point for HDF-EOS), then clarify their mappings
into a "common data model" and visad. I have some vague notions of what
that means, so i don't think we've made the task impossible, but there's a
lot of work to be done. i'm hoping this is one of the main foci for
THREDDS/IDV collaboration.
Shapefiles are probably a degenerate Trajectory (because a shapefile has
no time dimension); i'd probably use "Feature" or something.
I like the "coherent" data set attribute. As we talked about earlier,
perhaps there can be a further elaboration on this that describes
whether the sub-datasets of a coherent data set can be viewed/accessed
individually or whether a UI should just show the parent.
i'm thinking that the UI would allow a user to select:
1) a direct dataset
2) a coherent dataset parent
3) a sub-collection of a coherent dataset (which would also be a
coherent dataset).
The collectionType attribute says that this dataset is coherent, and
adds enough semantics (time series, station collection, what else?) that
the client knows how to deal with the collection.
So this design puts the decision in the hands of the user if they want
to view the entire collection or individual elements of it. Seems like
both would be reasonable, not sure of a use case where it wouldn't.
The coherent flag addresses some of the issues the IDV has had about
when to treat a collection of dataset urls as a whole.
yes, they are pretty much a direct response to your and Don's ideas
(just because we're slow doesn't mean we aren't listening ;^)
thanks for your input! do you mind if i forward this to the thredds email group?