Re: orthogonality (was Re: New attempt)

I think the word "dataset" is causing trouble. There are at least three potential meanings for this word in the context of THREDDS:

1) an entity that is considered as a unit by human beings

2) an entity that can be operated on as a unit by the THREDDS API

3) an entity that can be operated on as a unit by a data access protocol

Right now, only the entities described by "access" tags meet all of 1, 2, and 3.

The tags "dataset" and "collection" both describe entities that only meet 1 and 2. Thus I agree with Benno that there is not a very meaningful distinction between them (and I am reconsidering my listing of them as orthogonal concepts in my previous message).

I wonder if it would be a good idea to merge these concepts and use a less loaded word, say "entry", to refer to an entity that has meaning to THREDDS and to end users, but not to a data access protocol. For example:

<service name="X"/>
<service name="Y"/>

<entry name="my_dataset">

   <metadata name="global-metadata" url="..."/>
   <access name="global-X-access"/>

   <entry name="monthly-data">
     <metadata name="monthly-metadata" url="..."/>
     <access name="X-with-COARDS" serviceType="X" url="..."/>
     <access name="X-with-no-COARDS" serviceType="X" url="..."/>
     <access name="X-flattened-to-2D" serviceType="X" url="http://..."/>
     <access name="Y" serviceType="Y" url="..."/>
   </entry>

</entry>
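
As a rough sketch only, the multifile WMS/DODS case raised in the quoted discussion below might look like this under the same scheme. The entry names, access names, and the use of "WMS" and "DODS" as service names are placeholders I am assuming for illustration; none of this is taken from the current draft.

```xml
<service name="WMS"/>
<service name="DODS"/>

<entry name="multifile-dataset">

   <!-- WMS serves the files as one unit, so its access sits on the outer entry -->
   <access name="WMS-aggregate" serviceType="WMS" url="..."/>

   <entry name="file-1">
     <!-- DODS serves each file separately, so its access sits one level down -->
     <access name="DODS-file-1" serviceType="DODS" url="..."/>
   </entry>

   <entry name="file-2">
     <access name="DODS-file-2" serviceType="DODS" url="..."/>
   </entry>

</entry>
```

Here the thing a person would call the dataset is the outer entry in both cases; only the level at which each protocol attaches differs.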


- Joe

Daniel Holloway wrote:

Benno Blumenthal wrote:

John Caron wrote:

A much harder question is the distinction between a dataset and a
collection, since a dataset is a collection of data. I have
conceptualized it as follows: a dataset is something that can be
selected, and then it is processed in a protocol-dependent way. A
collection is a protocol-independent mechanism for grouping datasets.

I think this is what is getting us into trouble. The concept of a
dataset should be independent of the services available for it: a
dataset served from two different servers could very well have
different services/protocols available, depending on the server (the
aggregation server converts collections to datasets, for example).
Yet from the THREDDS/educational point of view, it is the same object.

I agree with this as well. I've been trying to reconcile how a catalog
might look for a particular multifile 'dataset' which has both WMS and
DODS access available for it. For WMS (for multifile datasets) the
access point would be at the collection level, while for
'non-aggregated' datasets the DODS access would be lower than the
collection level, at the THREDDS dataset level. It seems that the
concept of a dataset resides more at the collection level; maybe the
access binding is too tightly coupled to the dataset concept in the
current draft.



Dr. M. Benno Blumenthal          benno@xxxxxxxxxxxxxxxx
International Research Institute for climate prediction
Lamont-Doherty Earth Observatory of Columbia University
Palisades NY 10964-8000                  (845) 680-4450

Joe Wielgosz
joew@xxxxxxxxxxxxx / (707)826-2631
Center for Ocean-Land-Atmosphere Studies (COLA)
Institute for Global Environment and Society (IGES)