netCDF Identifiers and Character Escape Mechanisms (sigh!)
Ideally, netCDF should allow any printable UTF-8 character
to be used in an identifier. Currently, that is almost the case,
with forward slash being the exception because of the syntax of HDF5
identifiers.
More and more, the netCDF API is being used as a wrapper for a wide
variety of other formats: HDF5, HDF4, GRIB, BUFR, DAP2, DAP4, etc.
When defining translations between netCDF and these
other formats, it is necessary to implicitly or explicitly define
netCDF identifiers from the schemas of those formats.
The canonical example is HDF5.
In HDF5, many API functions take a path, which is a
sequence of identifiers separated by '/'.
A path may be absolute ("/g1/g2/x") or relative ("y").
It appears that there is no way in HDF5 to specify
an identifier containing '/'; such names are always interpreted as
paths. So if one naively defined, through the netCDF-4 API, a variable
named "/x/y", there is no apparent way to get it stored
properly in HDF5. It is this fact that has led to the current,
IMO undesirable, restriction that netCDF identifiers may not contain '/'.
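To make the ambiguity concrete, here is a minimal Python sketch that models only the '/'-separator rule described above. The helper `hdf5_path_components` is hypothetical and is not part of any HDF5 or netCDF API. It also does not reproduce HDF5's real name resolution; for example, dropping empty components is a simplification chosen for the sketch.

```python
from __future__ import annotations


def hdf5_path_components(path: str) -> tuple[bool, list[str]]:
    """Split an HDF5-style path into (is_absolute, components).

    Hypothetical helper: models only the rule that every '/' is a
    separator, not the actual HDF5 library's name lookup.
    """
    is_absolute = path.startswith("/")
    # Every '/' acts as a separator; nothing lets a '/' stay inside a name.
    # Empty components (leading or doubled slashes) are dropped here.
    components = [c for c in path.split("/") if c]
    return is_absolute, components


if __name__ == "__main__":
    print(hdf5_path_components("/g1/g2/x"))  # absolute path, three components
    print(hdf5_path_components("y"))         # relative path, one component
    # A netCDF variable *named* "/x/y" comes out as group "x" containing "y".
    print(hdf5_path_components("/x/y"))
```

The last call shows the problem: once the name reaches a path-based API, nothing distinguishes "a variable called /x/y" from "variable y inside group x".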
Super Escapes
This situation is going to recur as the netCDF API is used to wrap
other data formats. What we need is a mechanism for converting
an identifier containing arbitrary UTF-8 characters into
another identifier drawn from some rather restricted set of legal
identifier characters. In addition, I would impose the rule
that the conversion must be invertible.
This kind of "super-escaping" is very hard because, in the worst
case, we are likely to encounter formats whose legal
identifier characters are restricted to something like
the alphanumerics plus underscore.
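One possible way to meet that worst case while keeping the conversion invertible is sketched below. This is an illustration under stated assumptions, not netCDF's actual mechanism. The function names `super_escape` and `super_unescape` and the `_XX` hex scheme are hypothetical. The sketch assumes the legal set is ASCII letters, digits, and underscore, and it reserves '_' as the escape character. Every UTF-8 byte that is not a letter or digit, including '_' itself, becomes '_' followed by two hex digits. The sketch does not handle formats that also forbid a leading digit, and escaped names can grow to three times their byte length.

```python
import string

# Hypothetical legal set: ASCII letters and digits pass through unchanged.
# '_' is deliberately excluded because it is reserved as the escape character.
_SAFE = frozenset((string.ascii_letters + string.digits).encode("ascii"))
_HEX = frozenset(string.hexdigits)


def super_escape(name: str) -> str:
    """Map an arbitrary Unicode identifier to [A-Za-z0-9_]* invertibly."""
    out = []
    for byte in name.encode("utf-8"):
        if byte in _SAFE:
            out.append(chr(byte))
        else:
            # Any other byte, including '_' itself, becomes '_' + two hex digits.
            out.append(f"_{byte:02X}")
    return "".join(out)


def super_unescape(escaped: str) -> str:
    """Invert super_escape; raise ValueError on input it could not produce."""
    data = bytearray()
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch == "_":
            hex_digits = escaped[i + 1:i + 3]
            if len(hex_digits) != 2 or not all(c in _HEX for c in hex_digits):
                raise ValueError(f"malformed escape at position {i}")
            data.append(int(hex_digits, 16))
            i += 3
        elif ord(ch) in _SAFE:
            data.append(ord(ch))
            i += 1
        else:
            raise ValueError(f"illegal character {ch!r} at position {i}")
    # Decoding fails (UnicodeDecodeError, a ValueError) if bytes are not UTF-8.
    return data.decode("utf-8")


if __name__ == "__main__":
    for original in ["/x/y", "a_b", "température"]:
        escaped = super_escape(original)
        print(original, "->", escaped, "->", super_unescape(escaped))
```

Reserving the escape character is what makes the mapping invertible. If '_' were allowed through literally, an escaped name containing "_2F" could have come from either "/" or the literal text "_2F", and decoding would be ambiguous.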