Hi John,

I didn't respond directly to all the questions you asked but I hope that
what I wrote is sufficient...

John Caron wrote:


>>Thus I agree with benno that there is not a very
>>meaningful distinction between them (and reconsider my listing of them
>>as orthogonal concepts in my previous message).
>>I wonder if it would be a good idea to merge these concepts and use a
>>less loaded word, say "entry", to refer to an entity that has meaning to
>>THREDDS and to end users, but not to a data access protocol, i.e.
>><service name="X"/>
>><service name="Y"/>
>><entry name="my_dataset">
>>    <metadata name="global-metadata" url="..."/>
>>    <access name="global-X-access"/>
>>    <entry name="monthly-data">
>>      <metadata name="monthly-metadata" url="..."/>
>>      <access name="X-with-COARDS" serviceType="X" url="..."/>
>>      <access name="X-with-no-COARDS" serviceType="X" url="..."/>
>>      <access name="X-flattened-to-2D" serviceType="X" url="http://..."/>
>>      <access name="Y" serviceType="Y" url="..."/>
>>      ....
>>    </entry>
> Ok so an "entry" meets meaning 1), while an "access" meets meaning 3) (we
> dont need to worry about meaning 2) here).
> Some questions:
> 1) Should we understand that all the access elements within an entry are
> different versions of the same dataset? Should we disallow:
>      <entry name="monthly-data">
>        <metadata name="monthly-metadata" url="..."/>
>        <access name="monthly-data from MARS" serviceType="X" url="..."/>
>        <access name="monthly-data from VENUS" serviceType="X" url="..."/>
>      </entry>

No, I was not implying that for an <entry> tag. I would allow your example.

 > 2) is there any relationship between peer elements, in your example
 >      <access name="global-X-access"/>
 >      <entry name="monthly-data">

Not necessarily.

I think what I am trying to suggest is that, while it may be useful for
humans to think of some consistent object being accessed via different
services, this really does not translate to anything meaningful at
the machine level.

Unless we actually try to define some machine-readable relationship
between the accesses (e.g. Type 1 aggregation, etc - which gets into the
whole data model can of worms) the only thing a machine can understand
is a named and described hierarchy of access objects.

Of course, something is being lost here from the human's point of view.
Humans seem to want to make a distinction that is not significant to
machines, between:

"a collection of accesses to some single underlying object"
"a collection of accesses to different underlying objects, that share
some common theme"

Is this what <dataset> and <collection> have been intended to mean?

If this is the case then I would suggest that

a) this distinction be preserved by allowing both tags to be
used (possibly renamed if it would clarify things); and

b) data providers should be encouraged to mark up their catalogs
appropriately using the two tags, so that THREDDS client UIs can take
advantage of this to present catalogs in an intuitive way; but

c) these tags should be completely interchangeable in all other ways
(i.e. same type in the DTD/Schema, and same API calls, any tag that can
go in a dataset can also go in a collection), since they are
semantically equivalent at a machine level.
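
To make (c) a bit more concrete, here is a minimal Python sketch of a
client that handles both tags through one code path. The tag names are
the ones from this thread; the walk() helper and the sample catalog are
assumptions made up for illustration, not part of any THREDDS API.

```python
import xml.etree.ElementTree as ET

# Assumption: both tags share one type, so a client can treat them alike.
NODE_TAGS = {"dataset", "collection"}

def walk(node, depth=0):
    """Yield (depth, tag, name, access names) for each catalog node."""
    # Only direct <access> children belong to this node.
    accesses = [a.get("name") for a in node.findall("access")]
    yield depth, node.tag, node.get("name"), accesses
    for child in node:
        if child.tag in NODE_TAGS:
            yield from walk(child, depth + 1)

# Hypothetical catalog fragment, modelled on the example quoted earlier.
CATALOG = """
<collection name="my_dataset">
  <access name="global-X-access"/>
  <dataset name="monthly-data">
    <access name="X-with-COARDS" serviceType="X" url="..."/>
    <access name="Y" serviceType="Y" url="..."/>
  </dataset>
</collection>
"""

rows = list(walk(ET.fromstring(CATALOG)))
for depth, tag, name, accesses in rows:
    # The tag is used only for display; the traversal never branches on it.
    print("  " * depth + f"{tag} {name}: {accesses}")
```

The tag survives only as a label that a UI could use for presentation,
which is the extent of the distinction I am proposing.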

Does that make any sense? Benno, would that satisfy you?

- Joe (ready for a checkup with my ontologist)


Joe Wielgosz
joew@xxxxxxxxxxxxx / (707)826-2631
Center for Ocean-Land-Atmosphere Studies (COLA)
Institute for Global Environment and Society (IGES)

Hi John and Joe,

Since I was asked, I am answering, not that I am adding anything useful.

Yes, if collections and datasets are completely interchangeable in all
machine-type ways, that works for me.   I think John gives the definitive
summary below.  Of course, if I have a dataset that temporarily does not have
any functioning access methods on a particular server, one may not always feel
the need to relabel it a collection...


Quoting John Caron and Joe <caron@xxxxxxxxxxxxxxxx>:

> If this is the case then I would suggest that
> a) this distinction be preserved by allowing both tags to be
> used(possibly renamed if it would clarify things); and
> b) data providers should be encouraged to mark up their catalogs
> appropriately using the two tags, so that THREDDS client UI's can take
> advantage of this to present catalogs in an intuitive way; but
> c) these tags should be completely interchangeable in all other ways
> (i.e. same type in the DTD/Schema, and same API calls, any tag that can
> go in a dataset can also go in a collection), since they are
> semantically equivalent at a machine level.
> Does that make any sense? Benno, would that satisfy you?
> - Joe (ready for a checkup with my ontologist)

Quoting John:
Actually I'm inclined to take it a bit further.

Currently a collection is just some collection of datasets that share some
common theme. If we allow it also to be a dataset (meaning it has a URL,
can be selected, etc.) then I think it should have the meaning that contained
datasets are subsets or specializations of it. Because if they are not it
seems to me that you might as well just represent the collection-as-dataset
as a contained dataset element. [Maybe in this whole discussion I have been
trying to convince myself of that :^] Does everyone agree with that meaning
of nested datasets inside of collection-as-dataset?

PS: There is still a semantic difference between collections and datasets: a
dataset has one or more access elements, a collection zero or more.
Collections can contain datasets and nested collections.
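
The access-count rule above could be checked mechanically. A minimal
Python sketch, assuming the dataset/collection/access tag names used in
this thread; cardinality_errors() and the sample catalog are hypothetical
and not part of any THREDDS API:

```python
import xml.etree.ElementTree as ET

def cardinality_errors(root):
    """Return the names of <dataset> elements with no direct <access> child.

    A <collection> may have zero access elements, so it is never reported.
    """
    return [d.get("name") for d in root.iter("dataset")
            if d.find("access") is None]

# Hypothetical catalog: one valid dataset, one invalid, one empty collection.
CATALOG = """
<collection name="model-output">
  <dataset name="monthly-data">
    <access name="X" serviceType="X" url="..."/>
  </dataset>
  <dataset name="daily-data"/>
  <collection name="empty-is-fine"/>
</collection>
"""

errors = cardinality_errors(ET.fromstring(CATALOG))
print(errors)
```
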
OTOH, datasets and collections look so similar already in the XML, it's
tempting to combine them (which I was playing with earlier in

