Roland Schweitzer wrote:
John,
John Caron wrote:
Roland Schweitzer wrote:
John,
I have a question about the THREDDS Dataset Inventory Catalog XML.
I don't intend this as a criticism, but rather I'm curious about the
choices and trade-offs. All of us that are messing around with XML
are wrestling with similar issues.
In general, it seems that relationships between elements in the XML
are done via attributes. For example, a <service> element is
referred to in the document via the serviceName attribute in the
<dataset> element. And a <dataset> element can be repeated by
referencing the name of another <dataset> element via the alias
attribute.
It seems to me that using this technique then requires that client
code must be written to follow these connections. By contrast, it
seems that the XML community has attempted to create languages (like
XPointer) that would "standardize" these sorts of references.
Admittedly, even though the XPointer recommendation is a year old, I
have not found (m)any implementations in general purpose XML software.
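To make the point about client code concrete, here is a minimal Python sketch (standard library only) of what "following the connections" by hand might look like. The catalog fragment, its attribute values, and the helpers `resolve_alias` and `find_service` are illustrative assumptions, not part of any THREDDS client library; namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment; names and URLs are made up for illustration.
CATALOG = """
<catalog>
  <service name="dods" serviceType="DODS" base="http://example.org/dods/"/>
  <dataset name="sst" ID="sst1" serviceName="dods"/>
  <dataset name="sst again" alias="sst1"/>
</catalog>
"""

def resolve_alias(root, dataset):
    """Return the dataset that `dataset` points to via alias, or itself."""
    alias = dataset.get("alias")
    if alias is None:
        return dataset
    for candidate in root.iter("dataset"):
        if candidate.get("ID") == alias:
            return candidate
    raise KeyError(f"alias {alias!r} does not match any dataset ID")

def find_service(root, dataset):
    """Look up the <service> element named by the dataset's serviceName."""
    name = dataset.get("serviceName")
    if name is None:
        return None
    for service in root.iter("service"):
        if service.get("name") == name:
            return service
    return None

root = ET.fromstring(CATALOG)
aliased = root.findall("dataset")[1]      # the dataset that carries alias="sst1"
target = resolve_alias(root, aliased)     # follow the alias to the real dataset
service = find_service(root, target)      # then follow serviceName to <service>
print(target.get("name"), service.get("base"))
```

Every client has to repeat this kind of lookup, which is the cost being asked about: the links are plain attribute values, so a generic XML tool does not know they are references.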
Can you please comment on these choices and trade-offs for defining
the internal connections between bits of XML that went into
developing the Inventory Catalog?
Thanks,
Roland
Hi Roland:
<excuse> Sorry it's taken me so long to answer this </excuse>
Anyway, it's not clear that the XPointer spec will become an official
standard. XPath seems usable though, and I am open to it. Both the
serviceName and the alias = dataset ID are more or less the simple
case of XPath using IDs. I think using IDs for datasets is so useful
that it should probably be required. Which I would do if we could do
so and still allow the minimal datasets like the DODS File Server.
This ID reference is so simple that even DTDs have it.
So I'd say full XPath is a bit of overkill right now, but I am open to
using it in the future. Do you foresee any new features that might
need it?
No excuses needed and no worries.
I don't have any particular features in mind that require full XPath,
but my question was directed at the idea that we should get the most
bang for the buck that we can out of the validation of documents.
In the new catalog schema, every attribute (except name) is optional
on the dataset element. This means simple catalogs are possible.
But, I think it also means that there is no way from simply validating
the XML to guarantee that the alias references are available in the
document. This is a valid document (according to the schema and XML
Spy):
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="blah blah blah">
<dataset name="billy" ID="b1"/>
<dataset name="pointer to nothing" alias="sam"/>
</catalog>
even though the dataset named "pointer to nothing" does just that.
I'll be the first to admit I'm not even sure if what I'm thinking
about is possible, but I think if there were some way to use the
"standard" constructs of XML to enforce the relationship between
dataset elements with alias attributes and the dataset elements to
which they refer it would somehow be "better". I assume when you
"validate" a document with your client library you enforce this
relationship, but it seems it might be "better" if an off-the-shelf
validator (like XML Spy) could enforce this relationship. As I
said, I don't know if it is possible and I'm trying to figure this out
for XML I'm designing so I'm hoping to benefit from our discussion and
your experience designing these catalogs.
Thanks,
Roland
I agree with you on all this; we continue to try to use standard
validation as much as possible.
On this particular example, we can actually now validate this (with the
latest version of the schema, put out about a week ago and cleverly not
announced to anyone yet ;^) at
http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.xsd
It works by using the "keyref" constraint:
<!--
Enforce dataset ID references:
1) Each dataset ID must be unique in the document.
2) Each dataset alias must reference a dataset ID in the document.
-->
<xsd:unique name="datasetID">
  <xsd:selector xpath=".//dataset" />
  <xsd:field xpath="@ID" />
</xsd:unique>
<xsd:keyref name="datasetAlias" refer="datasetID">
  <xsd:selector xpath=".//dataset" />
  <xsd:field xpath="@alias" />
</xsd:keyref>
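As a rough illustration of what these two rules ask a validator to check, the following Python sketch (standard library only) emulates them by hand and runs them on Roland's example document. It is not how XMLSpy or Xerces implement the constraint; the function name `check_dataset_ids` is made up, and the xmlns attribute is dropped from the example for simplicity. One detail it tries to mirror: xsd:unique, unlike xsd:key, does not require the field to be present, so datasets without an ID are simply skipped.

```python
import xml.etree.ElementTree as ET

def check_dataset_ids(xml_text):
    """Return a list of violations of the unique/keyref rules, checked by hand."""
    root = ET.fromstring(xml_text)
    # Like the ".//dataset" selector: descendants of the root only.
    # split("}") strips a namespace prefix such as "{uri}dataset" if present.
    datasets = [el for el in root.iter()
                if el is not root and el.tag.split("}")[-1] == "dataset"]
    problems = []
    seen = set()
    # Rule 1: each dataset ID must be unique in the document.
    for ds in datasets:
        ds_id = ds.get("ID")
        if ds_id is None:
            continue  # no ID attribute: nothing to check for this element
        if ds_id in seen:
            problems.append(f"duplicate ID: {ds_id}")
        seen.add(ds_id)
    # Rule 2: each dataset alias must reference an existing dataset ID.
    for ds in datasets:
        alias = ds.get("alias")
        if alias is not None and alias not in seen:
            problems.append(f"dangling alias: {alias}")
    return problems

# Roland's example, minus the placeholder namespace.
EXAMPLE = """<catalog>
<dataset name="billy" ID="b1"/>
<dataset name="pointer to nothing" alias="sam"/>
</catalog>"""

print(check_dataset_ids(EXAMPLE))  # ['dangling alias: sam']
```

The advantage of the keyref approach over a hand-written check like this is that the rule lives in the schema, so any validator that supports identity constraints can enforce it without catalog-specific code.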
Interestingly enough, it appears that Xerces is not yet handling this
constraint, but XMLSpy seems to. I haven't yet tracked this down, or
found out if I need a more current version of Xerces. (I didn't get a
chance to try this on your example, let me know if you do...)
IMO, schemas are still bleeding-edge; I'm hoping they get more mature
soon. There's a lot of sentiment against W3C Schema; I toyed with
Relax-NG as an alternative. Just have to keep trying different stuff for
now....