
[netCDF #ZYO-239963]: Characters allowed in netcdf variable and attribute names



Hi James!

> We are using the following for netCDF identifiers: 
> string allowed = 
> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-+_.@" ;
> // string of allowed first characters in netcdf naming
> // convention
> string first = 
> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_" ;
> 
> Are those character sets close to being correct? Anything missing? How about 
> '%' characters?

You may be thinking of names allowed in CF-compliant netCDF files,
which are fairly restricted:

  2.3. Naming Conventions

  Variable, dimension and attribute names should begin with a letter
  and be composed of letters, digits, and underscores. Note that this
  is in conformance with the COARDS conventions, but is more
  restrictive than the netCDF interface which allows use of the hyphen
  character. The netCDF interface also allows leading underscores in
  names, but the NUG states that this is reserved for system use.
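
If it helps, the CF rule quoted above can be sketched as a short Python
check. The function name is my own, and restricting "letter" to ASCII
letters is an assumption, since the quoted text does not say which
alphabet it means.

```python
import re

# CF section 2.3 as quoted above: begin with a letter, then letters,
# digits, and underscores. ASCII-only letters are an assumption here.
CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_cf_name(name: str) -> bool:
    """Return True if `name` fits the quoted CF naming convention."""
    return CF_NAME.fullmatch(name) is not None
```

Under this sketch "air_temperature" passes, while "_FillValue",
"5DegAvg" and "wind-speed" do not.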

NetCDF names are considerably more flexible, since we added Unicode
name support for both netCDF-3 and netCDF-4 libraries and file
formats.  The details for names are in the format spec in the NetCDF
User's Guide, "Appendix C.1 The NetCDF Classic Format Specification",

  http://www.unidata.ucar.edu/netcdf/docs/netcdf.html#NetCDF-Classic-Format

and are also included in the NASA ESDS standard at

  http://www.esdswg.com/spg/rfc/esds-rfc-011/ESDS-RFC-011v2.00.pdf

Here's the description in English:

  Note on names: Earlier versions of the netCDF C-library reference
  implementation enforced a more restricted set of characters in
  creating new names, but permitted reading names containing arbitrary
  bytes. This specification extends the permitted characters in names
  to include multi-byte UTF-8 encoded Unicode and additional printing
  characters from the US-ASCII alphabet. The first character of a name
  must be alphanumeric, a multi-byte UTF-8 character, or '_' (reserved
  for special names with meaning to implementations, such as the
  “_FillValue” attribute). Subsequent characters may also include
  printing special characters, except for '/' which is not allowed in
  names. Names that have trailing space characters are also not
  permitted.

  Implementations of the netCDF classic and 64-bit offset format must
  ensure that names are normalized according to Unicode NFC
  normalization rules during encoding as UTF-8 for storing in the file
  header. This is necessary to ensure that gratuitous differences in
  the representation of Unicode names do not cause anomalies in
  comparing files and querying data objects by name.
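
To illustrate why the normalization matters, here is a small Python
sketch using the standard `unicodedata` module. The helper name is
hypothetical; it only shows that two spellings of the same visible
character become identical bytes after NFC, and is not the library's
actual code path.

```python
import unicodedata


def nfc_utf8(name: str) -> bytes:
    """Normalize a name to NFC, then encode it as UTF-8 bytes."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# The raw encodings differ, but the NFC-normalized encodings agree.
raw_differs = composed.encode("utf-8") != decomposed.encode("utf-8")
normalized_agree = nfc_utf8(composed) == nfc_utf8(decomposed)
```

Without the normalization step, a lookup by name could fail simply
because the caller typed the decomposed form.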

The regular expression for netCDF names (for dimensions, attributes,
variables, groups, user-defined types, compound type members, and
enumeration labels) is:

  ([a-zA-Z0-9_]|{MUTF8})([^\x00-\x1F/\x7F-\xFF]|{MUTF8})*

where "{MUTF8}" means any multibyte, UTF-8 encoded, NFC-normalized
Unicode character.
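
As a rough sketch, the rule can be written as a Python check on decoded
strings rather than raw bytes. Assumptions on my part: the second group
repeats zero or more times (as the prose about "subsequent characters"
implies), "{MUTF8}" is treated as any code point above U+007F, the name
is NFC-normalized before testing, and the function name is invented for
illustration.

```python
import re
import unicodedata

# First character: ASCII alphanumeric, '_', or any non-ASCII code point
# (which is multi-byte in UTF-8). Later characters: printable ASCII
# 0x20-0x7E except '/' (0x2F), or any non-ASCII code point.
_NAME = re.compile(
    r"[A-Za-z0-9_\u0080-\U0010FFFF]"
    r"[\x20-\x2E\x30-\x7E\u0080-\U0010FFFF]*"
)


def is_netcdf_name(name: str) -> bool:
    """Check `name` against the naming rule described above (a sketch)."""
    name = unicodedata.normalize("NFC", name)
    if name.endswith(" "):
        return False  # trailing spaces are not permitted
    return _NAME.fullmatch(name) is not None
```

With this reading, '%' is fine anywhere except as the first character:
"pct%" passes and "%pct" does not. Likewise '-', '+', '.' and '@' are
allowed after the first character, and a leading digit is allowed.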

The Unicode/UTF-8 stuff was added in versions 3.6.3 and 4.0, in June 2008.

Note that the CDL notation has to escape some characters in names, for
example leading numeric characters, so that a variable named "5DegAvg"
would appear in CDL as "\5DegAvg".
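
A minimal Python sketch of just that one case follows. The function
name is hypothetical, and it handles only a leading ASCII digit; CDL
escapes other characters as well, which this deliberately does not
attempt.

```python
def cdl_escape_leading_digit(name: str) -> str:
    """Prefix a backslash when the name starts with an ASCII digit.

    Only the leading-digit case mentioned above is covered; escaping of
    other special characters is out of scope for this sketch.
    """
    if name and name[0] in "0123456789":
        return "\\" + name
    return name
```

Names that do not start with a digit are returned unchanged.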

Finally, the question has come up about whether adding Unicode name
support violated our commitment to backwards compatibility.  It
doesn't, as the FAQ answer here explains:

  http://www.unidata.ucar.edu/netcdf/docs/faq.html#fv22

Too much information? :-)

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: ZYO-239963
Department: Support netCDF
Priority: Normal
Status: Closed