Unidata Developer's Blog

netCDF Identifiers and Character Escape Mechanisms (sigh!)

04 April 2012

Ideally, netCDF should allow any printable UTF-8 character to be used in an identifier. Currently, that is almost the case, with forward slash being the exception because of the syntax of HDF5 identifiers.

More and more, the netCDF API is being used as a wrapper for a wide variety of other formats: HDF5, HDF4, GRIB, BUFR, DAP2, DAP4, etc. In defining translations between netCDF and these other formats, it is necessary to implicitly or explicitly define netCDF identifiers from the schemas of these other formats.

The canonical example is HDF5. In HDF5, many API functions take a path, which is a sequence of identifiers separated by '/'. A path may be absolute ("/g1/g2/x") or relative ("y"). There appears to be no way in HDF5 to specify an identifier containing '/'; any such name is always interpreted as a path. So, if one naively defined, through the netCDF-4 API, a variable named "/x/y", there is no apparent way to get it defined properly in HDF5. It is this fact that has led to the current, IMO undesirable, restriction that netCDF identifiers may not contain '/'.
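To make the ambiguity concrete, here is a minimal Python sketch of how a '/'-separated name gets read as a path. It does not call HDF5 at all; the function name interpret_as_path is hypothetical and only mimics the splitting behavior described above.

```python
def interpret_as_path(name: str):
    """Read a name as a '/'-separated path (illustration only, not HDF5 code).

    Returns (is_absolute, components).
    """
    is_absolute = name.startswith("/")
    # Empty pieces come from the leading '/' (or doubled slashes); drop them.
    components = [part for part in name.split("/") if part]
    return is_absolute, components


print(interpret_as_path("/g1/g2/x"))  # an absolute path with three components
print(interpret_as_path("y"))         # a relative path with one component
print(interpret_as_path("/x/y"))      # meant as ONE variable name, read as two
```

The last call is the problem case: a name intended as the single identifier "/x/y" comes back as an absolute path with the components x and y, and nothing in the result records that one identifier was meant.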

Super Escapes

This situation is going to recur as the netCDF API is used to wrap other data formats. What we will need is a mechanism by which we can convert an identifier containing arbitrary UTF-8 characters into another identifier drawn from some rather restricted set of legal identifier characters. In addition, I would impose the rule that the conversion be invertible.

This kind of "super-escaping" is very hard because, in the worst case, the legal identifier characters are likely to be restricted to something like the alphanumerics plus underscore.
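As a rough illustration of what such a scheme could look like, the Python sketch below maps any UTF-8 identifier into alphanumerics plus underscore and back again. This is an assumption made for the sake of example, not an encoding that netCDF or any of the wrapped formats defines: it treats underscore as the escape character and writes every byte that is not an ASCII letter or digit (including underscore itself) as "_" followed by two hex digits. The names escape_identifier and unescape_identifier are hypothetical.

```python
import string

# Characters that pass through unchanged. Underscore is deliberately NOT
# included, because it is reserved here as the escape character.
SAFE = set(string.ascii_letters + string.digits)


def escape_identifier(name: str) -> str:
    """Encode an arbitrary identifier using only [A-Za-z0-9_]."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            # Escape the byte as '_' plus two uppercase hex digits.
            out.append("_%02X" % byte)
    return "".join(out)


def unescape_identifier(escaped: str) -> str:
    """Invert escape_identifier."""
    raw = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "_":
            # The next two characters are the hex value of one byte.
            raw.append(int(escaped[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(escaped[i]))
            i += 1
    return raw.decode("utf-8")


print(escape_identifier("/x/y"))
print(unescape_identifier(escape_identifier("/x/y")))
```

Because underscore is always escaped, every "_" in the output starts an escape sequence, which is what makes the mapping invertible. The sketch ignores further constraints a target format might impose, such as a maximum identifier length or a ban on a leading digit, and those are part of what makes the general problem hard.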
