John,
John Caron wrote:
Roland Schweitzer wrote:
John Caron wrote:
Roland Schweitzer wrote:
John,
I have a question about the THREDDS Dataset Inventory Catalog XML.
I don't intend this as a criticism, but rather I'm curious about
the choices and trade-offs. All of us that are messing around with
XML are wrestling with similar issues.
In general, it seems that relationships between elements in the XML
are done via attributes. For example, a <service> element is
referred to in the document via the serviceName attribute in the
<dataset> element. And a <dataset> element can be repeated by
referencing the name of another <dataset> element via the alias
attribute.
It seems to me that using this technique then requires that client
code must be written to follow these connections. By contrast, it
seems that the XML community has attempted to create languages
(like XPointer) that would "standardize" these sorts of
references. Admittedly, even though the XPointer recommendation is
a year old, I have not found (m)any implementations in general
purpose XML software.
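As a rough illustration of what "following these connections" means for client code, here is a minimal sketch using Python's standard xml.etree.ElementTree. The sample catalog is hypothetical: the `name`, `serviceType` and `base` attributes on <service> and the example.org URL are assumptions for illustration, and the alias is taken to refer to a dataset ID. It is not the THREDDS client library's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample catalog (no namespace, for brevity). The attribute
# names on <service> are assumptions made for this illustration.
CATALOG = """\
<catalog>
  <service name="dods" serviceType="DODS" base="http://example.org/dods/"/>
  <dataset name="billy" ID="b1" serviceName="dods"/>
  <dataset name="billy again" alias="b1"/>
</catalog>
"""


def resolve_links(xml_text):
    """Follow serviceName and alias attributes by hand.

    Returns a list of (dataset name, attribute, target element or None).
    """
    root = ET.fromstring(xml_text)
    # Index the possible targets once, keyed by the attribute being referenced.
    services = {s.get("name"): s for s in root.iter("service")}
    by_id = {d.get("ID"): d for d in root.iter("dataset") if d.get("ID")}

    links = []
    for d in root.iter("dataset"):
        if d.get("serviceName") is not None:
            target = services.get(d.get("serviceName"))
            links.append((d.get("name"), "serviceName", target))
        if d.get("alias") is not None:
            target = by_id.get(d.get("alias"))
            links.append((d.get("name"), "alias", target))
    return links


if __name__ == "__main__":
    for name, attr, target in resolve_links(CATALOG):
        # Compare against None explicitly: childless elements are falsy.
        label = "unresolved" if target is None else target.get("name")
        print(f"{name!r}: {attr} -> {label}")
```

Every client has to repeat this kind of lookup, which is the cost being weighed against a standardized reference mechanism.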
Can you please comment on these choices and trade-offs for defining
the internal connections between bits of XML that went into
developing the Inventory Catalog?
Thanks,
Roland
Hi Roland:
<excuse> Sorry it's taken me so long to answer this </excuse>
Anyway, it's not clear that the XPointer spec will become an official
standard. XPath seems usable though, and I am open to it. Both the
serviceName and the alias = dataset ID are more or less the simple
case of XPath using IDs. I think using IDs for datasets is so useful
that it should probably be required, which I would do if we could do
so and still allow minimal datasets like the DODS File Server.
This ID reference is so simple that even DTDs have it.
So I'd say full XPath is a bit of overkill right now, but I am open
to using it in the future. Do you foresee any new features that might
need it?
No excuses needed and no worries.
I don't have any particular features in mind that require full XPath,
but my question was directed at the idea that we should get the most
bang for the buck that we can out of the validation of documents.
In the new catalog schema, every attribute (except name) is optional
on the dataset element. This means simple catalogs are possible.
But, I think it also means that there is no way from simply
validating the XML to guarantee that the alias references are
available in the document. This is a valid document (according to
the schema and XML Spy):
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="blah blah blah">
<dataset name="billy" ID="b1"/>
<dataset name="pointer to nothing" alias="sam"/>
</catalog>
even though the dataset named "pointer to nothing" does just that.
I'll be the first to admit I'm not even sure if what I'm thinking
about is possible, but I think if there were some way to use the
"standard" constructs of XML to enforce the relationship between
dataset elements with alias attributes and the dataset elements to
which they refer it would somehow be "better". I assume when you
"validate" a document with your client library you enforce this
relationship, but it seems it might be "better" if off-the-shelf
validation software (like XML Spy) could enforce this relationship. As I
said, I don't know if it is possible and I'm trying to figure this
out for XML I'm designing so I'm hoping to benefit from our
discussion and your experience designing these catalogs.
Thanks,
Roland
I agree with you on all this; we continue to try to use standard
validation as much as possible.
On this particular example, we can actually validate this now (with
the latest version of the schema, put out about a week ago and cleverly
not announced to anyone yet ;^) at
http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.xsd
The way it works is using the "keyref" constraint:
<!--
  Enforce dataset ID references:
  1) Each dataset ID must be unique in the document.
  2) Each dataset alias must reference a dataset ID in the document.
-->
<xsd:unique name="datasetID">
  <xsd:selector xpath=".//dataset" />
  <xsd:field xpath="@ID" />
</xsd:unique>
<xsd:keyref name="datasetAlias" refer="datasetID">
  <xsd:selector xpath=".//dataset" />
  <xsd:field xpath="@alias" />
</xsd:keyref>
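For readers whose validator does not handle identity constraints, the two rules in the comment above can be approximated by hand. The following is only a sketch in Python's standard library, not a schema validator: it checks the whole document rather than honoring the scope of the element that declares the constraint, and the `urn:example:catalog` namespace is a placeholder standing in for the real catalog namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def check_dataset_refs(xml_text):
    """Return a list of violations of the two rules in the schema comment."""
    root = ET.fromstring(xml_text)
    datasets = [e for e in root.iter() if local(e.tag) == "dataset"]

    # Rule 1: each dataset ID must be unique in the document.
    ids = Counter(d.get("ID") for d in datasets if d.get("ID") is not None)
    errors = [f"duplicate ID {i!r}" for i, n in ids.items() if n > 1]

    # Rule 2: each dataset alias must reference an existing dataset ID.
    errors += [
        f"alias {d.get('alias')!r} matches no dataset ID"
        for d in datasets
        if d.get("alias") is not None and d.get("alias") not in ids
    ]
    return errors


# The two-dataset example from earlier in the thread, with a placeholder
# namespace URI (an assumption; the original elides the real one).
EXAMPLE = """\
<catalog xmlns="urn:example:catalog">
  <dataset name="billy" ID="b1"/>
  <dataset name="pointer to nothing" alias="sam"/>
</catalog>
"""

if __name__ == "__main__":
    for problem in check_dataset_refs(EXAMPLE):
        print(problem)
```

Run on the example, this reports the dangling alias "sam", which is the same defect the keyref constraint is meant to catch at validation time.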
Interestingly enough, it appears that Xerces is not yet handling this
constraint, but XMLSpy seems to. I haven't yet tracked this down, or
found out if I need a more current version of Xerces. (I didn't get a
chance to try this on your example; let me know if you do...)
I tried XML Spy on my little example and indeed it was found to be
invalid under the new schema. Cool!
IMO, schemas are still bleeding-edge; I'm hoping they get more mature
soon. There's a lot of sentiment against W3C Schema; I toyed with
Relax NG as an alternative. Just have to keep trying different stuff
for now....
I understand. I too have been considering Relax NG because it's
"easier" to specify ideas like an element should have either this set of
attributes or this other set of attributes, but not both sets of
attributes. However, nothing is obvious.
Thanks,
Roland