I'm trying to understand how netcdf handles the alignment of compound
types. The empirical evidence suggests that the library assumes that
the data is aligned according to the default alignment of the C
compiler, regardless of what the user specifies for the offsets of the
compound fields. Consider the following C program:
#include <stdlib.h>
#include <stdio.h>
#include "netcdf.h"

/* Abort with a message if a netCDF call fails. */
#define CHECK(call) do { \
        int stat_ = (call); \
        if (stat_ != NC_NOERR) { \
            fprintf(stderr, "%s: %s\n", #call, nc_strerror(stat_)); \
            exit(EXIT_FAILURE); \
        } \
    } while (0)

int
main(void)
{
    int ncid, varid, dimid, ndims, natts;
    int dimids[NC_MAX_VAR_DIMS];
    nc_type typeid;
    size_t offset, size;
    char name[NC_MAX_NAME + 1];

    /* Packed struct (GCC/Clang extension): no padding between i and j. */
    struct s1
    {
        short i;
        int j;
    } __attribute__ ((__packed__));
    struct s1 data[1];

    /* Create some phony data. */
    data[0].i = 20000;
    data[0].j = 300000;

    /* Create a file with a compound type. Write a little data. */
    CHECK(nc_create("test.nc", NC_NETCDF4, &ncid));
    printf("size of compound %zu\n", sizeof(struct s1));
    CHECK(nc_def_compound(ncid, sizeof(struct s1), "cmp1", &typeid));
    printf("offset 1 %zu\n", NC_COMPOUND_OFFSET(struct s1, i));
    CHECK(nc_insert_compound(ncid, typeid, "i",
                             NC_COMPOUND_OFFSET(struct s1, i), NC_SHORT));
    printf("offset 2 %zu\n", NC_COMPOUND_OFFSET(struct s1, j));
    CHECK(nc_insert_compound(ncid, typeid, "j",
                             NC_COMPOUND_OFFSET(struct s1, j), NC_INT));
    CHECK(nc_def_dim(ncid, "phony_dim", 1, &dimid));
    CHECK(nc_def_var(ncid, "phony_var", typeid, 1, &dimid, &varid));
    CHECK(nc_put_var(ncid, varid, data));
    CHECK(nc_close(ncid));

    /* Reopen the file and read back info about the compound type. */
    /* Note that the size and the offsets are different from */
    /* what was specified above. */
    CHECK(nc_open("test.nc", NC_NOWRITE, &ncid));
    CHECK(nc_inq_varid(ncid, "phony_var", &varid));
    CHECK(nc_inq_var(ncid, varid, name, &typeid, &ndims, dimids, &natts));
    CHECK(nc_inq_compound_size(ncid, typeid, &size));
    printf("size of compound %zu\n", size);
    CHECK(nc_inq_compound_fieldoffset(ncid, typeid, 0, &offset));
    printf("offset 1 %zu\n", offset);
    CHECK(nc_inq_compound_fieldoffset(ncid, typeid, 1, &offset));
    printf("offset 2 %zu\n", offset);
    CHECK(nc_close(ncid));
    return 0;
}
When I run this I get (on Mac OS 10.5 using the June 1 netcdf-4.1 snapshot):
    size of compound 6
    offset 1 0
    offset 2 2
    size of compound 8
    offset 1 0
    offset 2 4
Note that my data is packed (no padding), and I specified the offsets
consistent with that packing, but when I read the data back in I find
that the library actually used a different alignment (with padding
consistent with the default compiler alignment).
When I raised this issue before, Ed said that packed data is not
allowed and that you must use the default alignment of the C compiler.
There are at least three big problems with this policy:
1) It's very confusing that the library ignores the offsets you provide.
2) If you compile a program with a C compiler whose default alignment
differs from the one netcdf assumes, you will get surprising (and wrong)
results.
3) There is no way to know what alignment netcdf will actually use,
short of creating a compound type and then reading it back in. Since the
user can't rely on the offsets they specify, there's no way to know
ahead of time how to lay out the data passed to nc_put_var.
Shouldn't netcdf respect the offsets the user provides, even if they
don't agree with the default compiler alignment? I know this makes it
hard to read and write C structs; in that case the user must interpret
the data on his or her own using the specified offsets. This is how the
HDF5 library behaves, and it seems to me that if netcdf deviates from
this it will cause all kinds of problems for users down the line when
they try to read and write data that doesn't conform to the alignment
the netcdf library expects. It has certainly caused a lot of headaches
for me already.
-Jeff