Re: [netcdfgroup] How to find the full dimension names (paths with groups) for a variable?

ok, I think I found the solution...

The fact that group dimension IDs are in fact unique makes it possible to match 
them with the dimension IDs of variables...

But only if I have a list of

1) The full path of all dimensions in the file
2) The full path of all dimensions for each variable

I already had this. I constructed my "path only" model by recursively iterating 
over the file, starting at the root; for every group I store the current path, 
which is passed as a parameter to the recursive function.
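
A minimal sketch of that kind of traversal, assuming a hypothetical in-memory
group tree rather than real netCDF calls. The names grp_node, pth_lst and
trv_grp, and the two size limits, are made up for illustration and are not NCO
or netCDF identifiers. It only shows how passing the current path down the
recursion yields full dimension names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PTH_LEN 256 /* assumed maximum path length for this sketch */
#define LST_MAX 32  /* assumed maximum number of stored dimensions */

/* Hypothetical in-memory group; a real traversal would query the file */
typedef struct grp_node {
  const char *nm;              /* relative group name (unused for root) */
  const char *const *dmn;      /* names of dimensions defined in this group */
  size_t nbr_dmn;              /* number of dimensions in this group */
  const struct grp_node *chl;  /* child groups */
  size_t nbr_chl;              /* number of child groups */
} grp_node;

/* Output list of full dimension paths */
typedef struct {
  char pth[LST_MAX][PTH_LEN];
  size_t nbr;
} pth_lst;

/* Visit grp, whose full path is grp_nm_fll, then recurse into its children */
void trv_grp(const grp_node *grp, const char *grp_nm_fll, pth_lst *lst)
{
  char pth[PTH_LEN];
  /* Root is "/", so only deeper groups need a separator before the next name */
  const char *sep = (strcmp(grp_nm_fll, "/") == 0) ? "" : "/";

  assert(grp != NULL && lst != NULL);

  /* Store the full path of every dimension defined in this group */
  for (size_t idx = 0; idx < grp->nbr_dmn && lst->nbr < LST_MAX; idx++)
    snprintf(lst->pth[lst->nbr++], PTH_LEN, "%s%s%s", grp_nm_fll, sep, grp->dmn[idx]);

  /* Build each child's full path and pass it down as the new current path */
  for (size_t idx = 0; idx < grp->nbr_chl; idx++) {
    snprintf(pth, sizeof pth, "%s%s%s", grp_nm_fll, sep, grp->chl[idx].nm);
    trv_grp(&grp->chl[idx], pth, lst);
  }
}
```

Calling trv_grp(&root, "/", &lst) on a tree where the root defines "lon" and a
child group g16 defines "lon1" would store "/lon" and "/g16/lon1".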

The API gets me all local info for variables, for the current group, including 
dimensions for variables and dimensions for groups.

The additional step is to store the ID of every dimension defined in each group, 
and the ID of every dimension used by each variable.
Then match them.
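
The matching step itself can be sketched as a simple table lookup. This is
only an illustration under stated assumptions: the struct dim_entry and the
function find_dim_path are hypothetical names, and the table is assumed to
have been filled during the traversal (in a real program the IDs would come
from calls such as nc_inq_dimids for groups and nc_inq_vardimid for
variables):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: one per dimension found while traversing the file */
typedef struct {
  int dmn_id;          /* dimension ID reported for the defining group */
  const char *nm_fll;  /* full path of the dimension, e.g. "/g16/lon1" */
} dim_entry;

/* Return the full path stored for dimension ID dmn_id, or NULL if absent */
const char *find_dim_path(const dim_entry *tbl, size_t nbr, int dmn_id)
{
  assert(tbl != NULL || nbr == 0);

  /* Linear search; this relies on dimension IDs being unique in the file */
  for (size_t idx = 0; idx < nbr; idx++)
    if (tbl[idx].dmn_id == dmn_id) return tbl[idx].nm_fll;

  return NULL;
}
```

For each ID a variable reports, find_dim_path() then gives the full path of
the group dimension with the same ID, or NULL if the table has no such entry.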

So, I take back my comment that "IDs are a recipe for disaster", for dimensions 
they are actually the solution.

I was thinking more of variable IDs, which can have duplicated values across 
groups; somehow I missed that dimension IDs behave differently.

Here's my output with this patch applied

ncks: INFO nco_bld_dmn_ids_trv() traversing variable </g16/g16g2/lon1_var>
match <8> for var dim </g16/lon1> and group dim </g16/lon1>

In summary

1) the API does not get me the full dimension path for each variable, but it's 
possible to construct them.
2) I don't need variable IDs and group IDs


Pedro



------
Pedro Vicente, Earth System Science
University of California, Irvine
http://www.ess.uci.edu/


  ----- Original Message ----- 
  From: Pedro Vicente 
  To: netcdfgroup@xxxxxxxxxxxxxxxx 
  Sent: Monday, March 04, 2013 1:40 AM
  Subject: Re: [netcdfgroup] How to find the full dimension names (paths with groups) for a variable?



  hmm.. another correction

  >> that is, if I compare the nc_inq_vardimid variable dimension IDs with the 
nc_inq_dimid dimension group IDs for *all* groups in the file 


  I did a little experiment that does just this, and whenever a variable 
  dimension ID matched a group dimension ID I printed a message like

  ncks: INFO nco_bld_dmn_ids_trv() traversing variable </g16/g16g1/lon1_var>
  match <8> for var dim <lon1> and group dim <lon1>

  this tells me that for the variable  </g16/g16g1/lon1_var> 
  I have a dimension with a *relative* name  <lon1> and an ID <8>
  and that the IDs for variable dimensions and group dimensions are the same 

  Furthermore, I do not have duplicated dimension IDs, they are nicely ordered 
from 0 to the number of unique dimensions in my file
  ...so, it seems that dimension IDs are unique.

  If this is the case, this is good ...

  Is this the case ?

  But back to the original problem: the API is not telling me the absolute 
  location of the dimension.

  That output just tells me that for the variable </g16/g16g1/lon1_var> (an 
  absolute location) there is, *somewhere* in scope, a dimension called "lon1" 
  that has ID=8.

  Pedro


  ------
  Pedro Vicente, Earth System Science
  University of California, Irvine
  http://www.ess.uci.edu/


    ----- Original Message ----- 
    From: Pedro Vicente 
    To: Pedro Vicente ; netcdfgroup@xxxxxxxxxxxxxxxx 
    Sent: Sunday, March 03, 2013 8:42 PM
    Subject: Re: [netcdfgroup] How to find the full dimension names (paths with groups) for a variable?



    Correction,

    "nc_inq_vardimid" is the function to get the dimension IDs for the 
*variable*

    It is part of the "nc_inq_var" family


    http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c.html

    this page does not mention the function in the top index , in the list of 
Variables functions


    From the Manual:


    int nc_inq_vardimid (int ncid, int varid, int dimids[]);

    ncid NetCDF ID, from a previous call to nc_open or nc_create.
    varid Variable Id
    dimids Returned vector of *ndimsp dimension IDs corresponding to the 
variable dimensions
    The caller must allocate enough space for a vector of at least *ndimsp 
integers to be returned. 
    The maximum possible number of dimensions for a variable is given by the 
predefined constant NC_MAX_VAR_DIMS.


    But this does not mention anything about the *location* of these dimensions 
(location in the group hierarchy).


    Can these IDs obtained with nc_inq_vardimid  be compared with the IDs 
obtained with the function that gets dimension IDs for groups ? 

    int nc_inq_dimid (int ncid, const char *name, int *dimidp);
    ncid
        NetCDF ID, from a previous call to nc_open or nc_create.
    name
        Dimension name.
    dimidp
        Pointer to location for the returned dimension ID. 



    that is, if I compare the nc_inq_vardimid variable dimension IDs with the 
    nc_inq_dimid dimension group IDs for *all* groups in the file (making sure 
    that at each iteration I get only the dimension IDs for that group, not 
    for ancestor groups), and they have the same numerical value, does that 
    mean that the matched dimension ID from the variable is located in that 
    particular group?

    Do I have to traverse the whole file, or just the groups that are "in scope" 
    of the variable?

    I already have a list of all the objects in the file, with the full group 
    name for each object (groups and variables); using it might be easier 
    than building an "in scope" function based on name comparison (basically 
    what I did in the code I sent).

    Thanks

    Pedro

    ------
    Pedro Vicente, Earth System Science
    University of California, Irvine
    http://www.ess.uci.edu/


    PS: here's the function in its full glory :)


    void                          
    nco_bld_dmn_trv                       /* [fnc] Build dimension info for all 
variables */
    (const int nc_id,                     /* I [ID] File ID */
     trv_tbl_sct * const trv_tbl)         /* I/O [sct] GTT (Group Traversal 
Table) */
    {
      /* Purpose: a netCDF4 variable can have its dimensions located anywhere 
      below *in the group path*
      Construction of this list *must* be done after traversal table is built 
      in nco_grp_itr(),
      where we know the full picture of the file tree
      */

      char dmn_nm_var[NC_MAX_NAME+1];/* [sng] Dimension name for variable (+1 for the terminating NUL) */ 
      char dmn_nm_grp[NC_MAX_NAME+1];/* [sng] Dimension name for group (+1 for the terminating NUL) */ 

      const int flg_prn=1;         /* [flg] Dimensions in all parent groups 
will also be retrieved */ 

      int dmn_id_grp[NC_MAX_DIMS]; /* [id] Dimensions IDs array for group */
      int dmn_id_var[NC_MAX_DIMS]; /* [id] Dimensions IDs array for variable */

      int nbr_dmn_grp;             /* [nbr] Number of dimensions for group  */
      int nbr_dmn_var;             /* [nbr] Number of dimensions for variable */
      int var_id;                  /* [id] ID of variable  */
      int grp_id;                  /* [id] ID of group */

      char *ptr_chr;               /* [sng] Pointer to character '/' in full name */
      int psn_chr;                 /* [nbr] Position of character '/' in full name */

      /* Loop *object* traversal table */
      for(unsigned uidx=0;uidx<trv_tbl->nbr;uidx++){
        if(trv_tbl->lst[uidx].nco_typ == nco_obj_typ_var){
          trv_sct trv=trv_tbl->lst[uidx];  

          /* Obtain group ID using full group name */
          (void)nco_inq_grp_full_ncid(nc_id,trv.grp_nm_fll,&grp_id);

          /* Obtain variable ID using group ID */
          (void)nco_inq_varid(grp_id,trv.nm,&var_id);

          /* Get number of dimensions for variable */
          (void)nco_inq_varndims(grp_id,var_id,&nbr_dmn_var);

          /* Get dimension IDs for variable */
          (void)nco_inq_vardimid(grp_id,var_id,dmn_id_var);

          /* Obtain dimension IDs for group. NB: go to parents */
          (void)nco_inq_dimids(grp_id,&nbr_dmn_grp,dmn_id_grp,flg_prn);

          /* Loop over dimensions of variable */
          for(int dmn_idx_var=0;dmn_idx_var<nbr_dmn_var;dmn_idx_var++){

            /* Get dimension name */
            (void)nco_inq_dimname(grp_id,dmn_id_var[dmn_idx_var],dmn_nm_var);

            /* Now the exciting part; we have to locate where "dmn_nm_var" is 
located
            1) Dimensions are defined in *groups*: find group where variable 
resides
            2) Most common case is for the dimension to be defined in the same 
group where variable is
            3) If not, we have to traverse the group back until the dimension 
name is found

            From: "Dennis Heimbigner" <dmh@xxxxxxxxxxxxxxxx>
            Subject: Re: [netcdfgroup] defining dimensions in groups
            1. The inner dimension is used. The rule is to look up the group 
tree
            from innermost to root and choose the first one that is found
            with a matching name.
            2. The fact that it is a dimension for a coordinate variable is not 
relevant for the
            choice.
            However, note that this rule is only used by ncgen when 
disambiguating a reference
            in the CDL.  The issue does not come up in the netcdf API because
            you have to specifically supply the dimension id when defining the 
dimension
            for a variable.

            4) Use case example: /g5/g5g1/rz variable and rz(rlev), where 
dimension "rlev" resides in /g5/rlev 
            */

            /* Loop over dimensions of group *and* parents */
            for(int dmn_idx_grp=0;dmn_idx_grp<nbr_dmn_grp;dmn_idx_grp++){

              /* Get dimension name for group */
              (void)nco_inq_dimname(grp_id,dmn_id_grp[dmn_idx_grp],dmn_nm_grp);

              /* Does dimension name for *variable* match dimension name for 
*group* ? */ 
              if(strcmp(dmn_nm_var,dmn_nm_grp) == 0){

                /* Now...we know that *somewhere* for all this group dimensions 
one is the real deal 
                Attempt to construct a *possible* full dimension name and 
compare with the table dimension list
                until a full name match is found ... */

                /* Was the dimension found?: handy in all this *tortured* 
logic; needs revision, but works ! */
                nco_bool dmn_was_found=False;

                /* Construct *possible* dimension full name */
                char 
*dmn_nm_fll=(char*)nco_malloc(strlen(trv.grp_nm_fll)+strlen(dmn_nm_var)+2L);
                strcpy(dmn_nm_fll,trv.grp_nm_fll);
                if(strcmp(trv.grp_nm_fll,"/")) strcat(dmn_nm_fll,"/");
                strcat(dmn_nm_fll,dmn_nm_var);

                /* Brute-force approach to find valid "dmn_nm_fll":
                Start at grp_nm_fll/dmn_nm_var and build all possible paths 
with dmn_nm_var. 
                Use cases are:
                Real life output of: ncks --get_grp_info  ~/nco/data/in_grp.nc
                /g1/lon: 1 dimensions: /lon : 
                /g5/g5g1/rz: 1 dimensions: /g5/rlev : 
                /g10/three_dmn_rec_var: 3 dimensions: /time : /lat : /lon :     
      
                */

                /* Find last occurrence of '/' */
                ptr_chr=strrchr(dmn_nm_fll,'/');
                psn_chr=ptr_chr-dmn_nm_fll;

                /* While there is a possible dimension path */
                while(ptr_chr && !dmn_was_found){

                  /* Search table dimension list */
                  for(unsigned int 
dmn_lst_idx=0;dmn_lst_idx<trv_tbl->nbr_dmn;dmn_lst_idx++){
                    dmn_fll_sct dmn_fll=trv_tbl->lst_dmn[dmn_lst_idx];  

                    /* Does the *possible* dimension full name match a *real* 
dimension full name ? */
                    if(strcmp(dmn_fll.nm_fll,dmn_nm_fll) == 0){

                      /* Store full dimension name  */
                      
trv_tbl->lst[uidx].var_dmn[dmn_idx_var].dmn_nm_fll=strdup(dmn_nm_fll);

                      /* The relative dimension name was already stored   */
                      
assert(strcmp(trv_tbl->lst[uidx].var_dmn[dmn_idx_var].dmn_nm,dmn_nm_var) == 0);

                      /* Store full group name where dimension is located. 
NOTE: using member "grp_nm_fll" of dimension  */
                      
trv_tbl->lst[uidx].var_dmn[dmn_idx_var].grp_nm_fll=strdup(dmn_fll.grp_nm_fll);

                      /* Free allocated */
                      dmn_nm_fll=(char *)nco_free(dmn_nm_fll);

                      /* Found */
                      dmn_was_found=True;

                      /* Exit table dimension list loop */
                      break;
                    } /* End Does the *possible* dimension full name match a 
*real* dimension full name */
                  } /* End Search table dimension list loop */

                  /* Keep on trying... Re-add dimension name to shortened path 
*/ 

                  /* If a valid (pointer) name here, then the constructed name 
was not found */
                  if(dmn_nm_fll) {
                    dmn_nm_fll[psn_chr]='\0';
                    ptr_chr=strrchr(dmn_nm_fll,'/');
                    if(ptr_chr){
                      psn_chr=ptr_chr-dmn_nm_fll;
                      dmn_nm_fll[psn_chr]='\0';
                      if(strcmp(dmn_nm_fll,"/")) strcat(dmn_nm_fll,"/");
                      strcat(dmn_nm_fll,dmn_nm_var);
                      ptr_chr=strrchr(dmn_nm_fll,'/');
                      psn_chr=ptr_chr-dmn_nm_fll;
                    } /* !ptr_chr */
                  } /* If dmn_nm_fll */
                } /* End While there is a possible dimension path */ 

                /* Free allocated (this should never happen here; a dimension 
must always be found) */
                if(dmn_nm_fll) dmn_nm_fll=(char *)nco_free(dmn_nm_fll);

              } /* End Does dimension name for variable match dimension name 
for group ?  */
            } /* End Loop over dimensions of group *and* parents */
          } /* End Loop over dimensions of variable */
        } /* End object is variable nco_obj_typ_var */
      } /* End Loop *object* traversal table  */


    } /* end nco_bld_dmn_trv() */



      ----- Original Message ----- 
      From: Pedro Vicente 
      To: netcdfgroup@xxxxxxxxxxxxxxxx 
      Sent: Sunday, March 03, 2013 5:10 PM
      Subject: [netcdfgroup] How to find the full dimension names (paths with groups) for a variable?



      Hi netCDF team

      This email is rather long, so please bear with me ...

      The short read and main question is:

      How to find the full dimension names (paths with groups) for all 
dimensions that a  variable has?

      Example:
      Note: Incomplete CDL syntax

      group: g16 { 
          dimensions:
          lon1=4;  //dimension that has a coordinate variable down in scope at /g16/g16g1/lon1(lon1) 
          
          group: g16g1 { 
           variables:
           float lon1(lon1);  //coordinate variable /g16/g16g1/lon1 that has dimension (/g16/lon1) in scope
           float lon1_var(lon1); // variable /g16/g16g1/lon1_var that has dimension (/g16/lon1) in scope *and* coordinate (/g16/g16g1/lon1) in scope
           
           data:
           lon1=0.,1.,2.,3.;
           lon1_var=0.,1.,2.,3.;  
           
           
      Note that coordinate variables can share dimensions; here's a case of a 
"parallel" group /g16/g16g2/ of /g16/g16g1/
      where variables have their own local coordinate variable that share the 
ancestor dimension (/g16/lon1)

           group: g16g2 { 
           variables:
           //coordinate variable (/g16/lon1)
           float lon1(lon1); 
           float lon1_var(lon1);   

          
      It is possible to construct other  cases, variables with n dimensions, 
each one defined in different groups (and each one of these dimensions can have 
coordinate 
      variables in *other* different groups )


      More broadly, I am trying to construct a model for ncks of a netCDF4 file 
that includes :

      1) A list of all "objects" in the file

      I call an "object" what I call an object in HDF5: either a group or a 
variable (a variable is commonly called in HDF5 a "dataset" ).

      2) netCDF4 has dimensions. HDF5 does not. (Let's ignore HDF5 dimension 
      scales for now, to keep this simple... Coincidentally, netCDF4 *happens* to 
      use HDF5 dimension scales in its inner model, but my understanding is that 
      it did not have to be that way... I think. Imagine for example that HDF5 
      dimension scales did not exist...
      It would still be perfectly possible for netCDF4 to use HDF5 as the 
      underlying format... HDF5 dimension scales are not part of the HDF5 format, 
      they are just an abstraction layer built on top of HDF5 with a so-called 
      "High Level" API.... At the time the requirement was for HDF5 to have 
      the equivalent of HDF(4) "coordinate variables", that could be shared 
      between HDF5 datasets.)

      excellent article about dimension scales 

      
http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_shared_dimensions


      Let's call these netCDF4 dimensions, "unique dimensions".
      These are defined in groups.

      3) This model stores *full names* of things: full names, for groups, 
variables and unique dimensions. Also, full names for coordinate variables.

      4) Coordinate variables.

      From the netCDF manual

      "It is legal for a variable to have the same name as a dimension. Such 
variables have no
      special meaning to the netCDF library. However there is a convention that 
such variables
      should be treated in a special way by software using this library.
      A variable with the same name as a dimension is called a coordinate 
variable."


      Dimensions and coordinate variables are used by variables. So, variables 
must know where dimensions and coordinate variables (if existent for that 
variable) are.

      Example of an output, that prints either a dimension or a coordinate 
variable for any variable

      /g16/g16g1/lon1   ---> coordinate variable 
      lon1[0]=0 
      lon1[1]=1 
      lon1[2]=2 
      lon1[3]=3

      /g16/g16g1/lon1_var ---> variable with coordinate variable 
      lon1[0]=0 lon1_var[0]=0 
      lon1[1]=1 lon1_var[1]=1 
      lon1[2]=2 lon1_var[2]=2 
      lon1[3]=3 lon1_var[3]=3 

      The API function that returns a dimension name for a variable is

      From the netCDF C manual

      
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c/nc_005finq_005fdim-Family.html#nc_005finq_005fdim-Family


      int nc_inq_dimname (int ncid, int dimid, char *name);
      ncid NetCDF ID, from a previous call to nc_open or nc_create.
      dimid Dimension ID, from a previous call to nc_inq_dimid or nc_def_dim.
      name Returned dimension name.

      Note: here "ncid" is actually a "location" ID (either a group or the main 
netCDF file ID), so I think you should change this in the documentation

      The "dimid" parameter is an ID of a dimension.

      This is obtained with the API function

      int nc_inq_dimid (int ncid, const char *name, int *dimidp);
      ncid NetCDF ID, from a previous call to nc_open or nc_create.
      name Dimension name.
      dimidp Pointer to location for the returned dimension ID.

      From the manual:
      "When searching for a dimension, the specified group is searched, and 
then its parent group,
      and then its grandparent group, etc., up to the root group."

      Ok, great, the dimension ID "dimidp" can be in an ancestor group, but how 
      to know where?


      My understanding is that netCDF4 group IDs are "unique"; dimension IDs 
are not, they can have duplicated values in several groups.

      In the above call nc_inq_dimid, dimension IDs in ancestor groups are 
returned, but duplicates may happen. 

      I think storing IDs, even unique group IDs, in the model above is a 
recipe for disaster. 
      I see IDs as the equivalent of the paper ticket number I am given when I 
      take the train and want to keep my luggage at a station for a while.
      When I get my bags back, I dispose of the ticket. 
      That ticket is only helpful for the person who has to identify my bags. 

      As a developer, for debugging purposes, or even as a netCDF4 user, it is 
also much easier to identify something by name than by ID.

      Possible ways to solve this (to get full dimension name for a variable):

      1) Iterate ancestor groups, get all variables for each group, get the 
      variables' dimension IDs, and compare them with the group dimension IDs?
      2) Iterate ancestor groups, try to construct a possible full dimension 
      name and match?
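
      Option 2) can be illustrated in isolation with a small helper that builds
      candidate full names from the innermost group out to the root and checks
      each one against a list of known full dimension names. This is a sketch
      under stated assumptions, not NCO or netCDF code: bld_dmn_cnd and
      fnd_dmn_fll are hypothetical names, and the list of known names is assumed
      to come from an earlier traversal of the file. As the comment quoted from
      Dennis Heimbigner in the code notes, the innermost-first name rule is what
      ncgen uses for CDL, so a name-based match like this is a heuristic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write into buf the candidate full name of dmn_nm in the ancestor that is
   lvl levels above grp_nm_fll (lvl 0 is the group itself).
   Returns 1 on success, 0 if no such ancestor exists or buf is too small. */
int bld_dmn_cnd(const char *grp_nm_fll, const char *dmn_nm, unsigned lvl,
                char *buf, size_t buf_sz)
{
  size_t len;
  int nbr_wrt;

  assert(grp_nm_fll != NULL && grp_nm_fll[0] == '/');
  assert(dmn_nm != NULL && buf != NULL);

  len = strlen(grp_nm_fll);
  if (len == 1) len = 0; /* root "/" contributes an empty prefix */

  /* Strip one trailing "/name" component per level */
  for (; lvl > 0; lvl--) {
    if (len == 0) return 0; /* already at root: no more ancestors */
    while (len > 0 && grp_nm_fll[len - 1] != '/') len--;
    len--; /* drop the '/' itself */
  }

  nbr_wrt = snprintf(buf, buf_sz, "%.*s/%s", (int)len, grp_nm_fll, dmn_nm);
  return nbr_wrt >= 0 && (size_t)nbr_wrt < buf_sz;
}

/* Try candidates from the innermost group out to root; on success buf holds
   the first candidate found in lst and 1 is returned, otherwise 0. */
int fnd_dmn_fll(const char *grp_nm_fll, const char *dmn_nm,
                const char *const *lst, size_t nbr, char *buf, size_t buf_sz)
{
  for (unsigned lvl = 0; bld_dmn_cnd(grp_nm_fll, dmn_nm, lvl, buf, buf_sz); lvl++)
    for (size_t idx = 0; idx < nbr; idx++)
      if (strcmp(lst[idx], buf) == 0) return 1;

  return 0;
}
```

      With known names "/lon", "/g16/lon1" and "/g5/rlev", a search for "lon1"
      starting in group "/g16/g16g1" first tries "/g16/g16g1/lon1" and then
      stops at "/g16/lon1".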

      Below is some code sample that tries to solve this using option 2) above, 

      But as a netCDF API user, I don't think that I should have to do this, 
      mainly because it could just be wrong (it might not cover all cases, for 
      example).

      What I think is needed here is a new  API function that returns the 
*full* dimension names for all dimensions used by a variable, instead of an ID 
and relative name only.
      With information if that dimension "name" is a coordinate variable or 
just a dimension. 

      Would it be possible for the netCDF group to supply this function?

      There is a similar function for groups:

      
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c/nc_005finq_005fgrpname_005ffull.html#nc_005finq_005fgrpname_005ffull

      int nc_inq_grpname_full(int ncid, size_t *lenp, char *full_name);
      ncid The group id for this operation.
      full_name
      Pointer to allocated space of correct length.

      That returns the *full name* of the group from the group ID, this is one 
of the most helpful functions to construct this "full path" model

      Thanks for your help

      Pedro


      ------
      Pedro Vicente, Earth System Science
      University of California, Irvine
      http://www.ess.uci.edu/


      PS: here's the code that tries to find full dimension names


      /* Loop *object* traversal table */
        for(unsigned uidx=0;uidx<trv_tbl->nbr;uidx++){
          if(trv_tbl->lst[uidx].nco_typ == nco_obj_typ_var){
            trv_sct trv=trv_tbl->lst[uidx];  

            /* Obtain group ID using full group name */
            (void)nco_inq_grp_full_ncid(nc_id,trv.grp_nm_fll,&grp_id);

            /* Obtain variable ID using group ID */
            (void)nco_inq_varid(grp_id,trv.nm,&var_id);

            /* Get number of dimensions for variable */
            (void)nco_inq_varndims(grp_id,var_id,&nbr_dmn_var);

            /* Get dimension IDs for variable */
            (void)nco_inq_vardimid(grp_id,var_id,dmn_id_var);

            /* Obtain dimension IDs for group. NB: go to parents */
            (void)nco_inq_dimids(grp_id,&nbr_dmn_grp,dmn_id_grp,flg_prn);

            /* Loop over dimensions of variable */
            for(int dmn_idx_var=0;dmn_idx_var<nbr_dmn_var;dmn_idx_var++){

              /* Get dimension name */
              (void)nco_inq_dimname(grp_id,dmn_id_var[dmn_idx_var],dmn_nm_var);

              /* Now the exciting part; we have to locate where "dmn_nm_var" is 
located
              1) Dimensions are defined in *groups*: find group where variable 
resides
              2) Most common case is for the dimension to be defined in the 
same group where variable is
              3) If not, we have to traverse the group back until the dimension 
name is found

              From: "Dennis Heimbigner" <dmh@xxxxxxxxxxxxxxxx>
              Subject: Re: [netcdfgroup] defining dimensions in groups
              1. The inner dimension is used. The rule is to look up the group 
tree
              from innermost to root and choose the first one that is found
              with a matching name.
              2. The fact that it is a dimension for a coordinate variable is 
not relevant for the
              choice.
              However, note that this rule is only used by ncgen when 
disambiguating a reference
              in the CDL.  The issue does not come up in the netcdf API because
              you have to specifically supply the dimension id when defining 
the dimension
              for a variable.

              4) Use case example: /g5/g5g1/rz variable and rz(rlev), where 
dimension "rlev" resides in /g5/rlev 
              */

              /* Loop over dimensions of group *and* parents */
              for(int dmn_idx_grp=0;dmn_idx_grp<nbr_dmn_grp;dmn_idx_grp++){

                /* Get dimension name for group */
                
(void)nco_inq_dimname(grp_id,dmn_id_grp[dmn_idx_grp],dmn_nm_grp);

                /* Does dimension name for *variable* match dimension name for 
*group* ? */ 
                if(strcmp(dmn_nm_var,dmn_nm_grp) == 0){





--------------------------------------------------------------------------


      _______________________________________________
      netcdfgroup mailing list
      netcdfgroup@xxxxxxxxxxxxxxxx
      For list information or to unsubscribe,  visit: 
http://www.unidata.ucar.edu/mailing_lists/

