Re: fixed strings in NetCDF

> We have tried to keep the C and Fortran interfaces parallel.  Occasionally
> this means using a lowest-common-denominator approach (e.g. character arrays)
> that is not ideal for either C or Fortran.  Another goal is to permit the
> same netCDF files to be written from either language interface or read from
> either interface.  If I understand it correctly, the primitives for fixed
> strings you propose would be useful solely for the Fortran interface, and
> the C interface would have to change to be able to access such data.  I
> think a more detailed interface proposal is necessary to permit us to
> evaluate the benefits and costs of such a change.

The concept of a fixed string primitive does not exclude its use by C.
The routines that read and write variables and attributes always copy
consecutive values between netCDF data and language variables using the
fixed width of the datatype in question (1 byte for character, 8 for
double, etc.).  A fixed string data type with a width potentially greater
than one can be handled just like any current data type.  The extra work
needed to define variable-length strings from Fortran (pages 93-94) is
there to accommodate a concept that really belongs to C.  Fixed strings
can be used directly by either language.
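
As a minimal sketch of the point above, assuming hypothetical names
(elem_type, elem_width, copy_elems) that are not part of the netCDF
library: a copy routine that only knows the width of one datum treats a
fixed string exactly like a char or a double.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type codes for illustration; only T_FSTRING is new. */
typedef enum { T_CHAR, T_DOUBLE, T_FSTRING } elem_type;

/* Width in bytes of one datum.  For T_FSTRING the width is not implied
 * by the type and has to be supplied by the caller. */
static size_t elem_width(elem_type t, size_t fstring_len)
{
    switch (t) {
    case T_CHAR:    return 1;
    case T_DOUBLE:  return 8;
    case T_FSTRING: return fstring_len;
    }
    return 0;
}

/* Copy `count` consecutive data between a file buffer and a language
 * variable.  Nothing here depends on what the bytes mean. */
static void copy_elems(void *dst, const void *src, size_t count,
                       elem_type t, size_t fstring_len)
{
    size_t w = elem_width(t, fstring_len);
    assert(w > 0);
    memcpy(dst, src, count * w);
}
```

With this shape, a Fortran CHARACTER*4 array of three elements and a C
char[3][4] array are both moved by copy_elems(dst, src, 3, T_FSTRING, 4).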

The key is in the definition, since one additional piece of information,
namely the data size, is needed.  ncvardef/NCVDEF, for example, passes only
the conventional dimensions; the data size is fixed and implicit in the
data type.  One horrid way around this, with the current interface, is
to use the NC_STRING (or whatever it would be called) type as a flag
meaning that the first value passed in the dimension array is the intrinsic
variable width.  The ncvarinq/NCVINQ routines would then pass the same
information back.  The problem is that this redefines, for a special
case, the meaning of the dimensions.  To be consistent, one must keep the
dimension information separate so that calls to nctyplen/NCTLEN return
the fixed length of this datum.  I don't see how to be completely
consistent, since the type length I am proposing could be different for
each variable or attribute.  The call to nctyplen/NCTLEN would have to key
off of a variable id.

A less horrid way to include this information in the definition is to
extend the scalar nc_type/VARTYP to be a vector with the first entry
being the usual variable type and the second being the byte count
corresponding to this.
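
A rough sketch of this second approach, again with hypothetical names
(x_type, x_vardef, x_typelen) standing in for whatever the real
interface would be: the type becomes a two-entry record, and the
type-length query is keyed off a variable id rather than the type alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base type codes, for illustration only. */
enum { X_CHAR = 0, X_INT = 1, X_DOUBLE = 2 };

/* Two-entry type "vector": the usual type plus the byte count of one datum. */
typedef struct {
    int    base;    /* first entry: usual variable type */
    size_t nbytes;  /* second entry: width of one datum in bytes */
} x_type;

#define MAX_VARS 8
static x_type var_types[MAX_VARS];
static int    n_vars = 0;

/* Stand-in for a variable definition call: records the type vector and
 * returns a variable id, or -1 if the table is full or the width is zero. */
static int x_vardef(int base, size_t nbytes)
{
    if (n_vars >= MAX_VARS || nbytes == 0)
        return -1;
    var_types[n_vars].base = base;
    var_types[n_vars].nbytes = nbytes;
    return n_vars++;
}

/* Type length keyed off a variable id, since it can differ per variable. */
static size_t x_typelen(int varid)
{
    assert(varid >= 0 && varid < n_vars);
    return var_types[varid].nbytes;
}
```

The dimension array is left untouched here, so dimensions keep their
usual meaning.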

>> If the documentation (V1.11, page 23-24) is any indication, there are
>> plans afoot for "a new type for multibyte characters".  Perhaps someone
>> knows if this new anticipated primitive is motivated by multinational
>> character sets (often two bytes) and if it will look like a "fixed string"
>> data type as I have discussed.

> This was anticipated for the wide characters required for
> internationalization.

One could use the current CHAR primitive with a width of "2" for
international character sets.  One could use the current INT primitive
with the width of "8" instead of hyperlong.

Another possibility is to allow the definition of a datum size only with
added primitives.  So, for example, one could add vint, vfloat and vchar
that could each be an arbitrary number of bytes long.  Extension of
integers to arbitrary length is well defined.  Extension of characters
to arbitrary length is also well defined, but in the case of multi-national
character sets, one needs to know the language as well.  I don't know of
a multi-byte standard for floating point, but one could be adopted.  From
the standpoint of netcdf itself, it is not necessary to know the details
of the contents of each datum.  Only the operation of ncdump/ncgen is
affected.
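
To illustrate why extending integers to arbitrary length is well
defined, here is a small sketch of what a tool such as ncdump might do
to print a "vint".  The function name vint_decode, the big-endian
two's-complement layout, and the 8-byte limit are all assumptions made
for the example, not part of any proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian two's-complement integer of nbytes (1..8) into a
 * 64-bit value by sign extension.  The byte order and the width limit
 * are assumptions of this sketch. */
static int64_t vint_decode(const unsigned char *buf, size_t nbytes)
{
    uint64_t v;
    size_t i;

    assert(nbytes >= 1 && nbytes <= 8);

    /* Start from all ones if the sign bit of the leading byte is set,
     * so the high-order bytes end up sign-extended. */
    v = (buf[0] & 0x80) ? ~(uint64_t)0 : 0;

    for (i = 0; i < nbytes; i++)
        v = (v << 8) | buf[i];

    /* Conversion assumes a two's-complement int64_t, as on common platforms. */
    return (int64_t)v;
}
```

The copy routines never need this; only code that interprets a datum's
contents does.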


