
Re: Suggested LDM updates



Hi Bob,

Thanks for the clarification - I see now what you are trying to do with the
filtered products.

> We have tried the sequence number approach and it didn't work...
> In pqinsert.c you'll see that the MD5 signature, which is used
> to identify duplicates, is calculated on the product contents
> only, and does not depend on any part of the prod_info
> structure.
>

The sequence number is also part of the product contents.  (I assume that's
where the ldm gets it in order to put it in the prod_info.)  Here are the first
few bytes of a typical product.  You'll see that the sequence number starts at
the 5th byte of the product.

001  \r  \r  \n   5   4   1      \r  \r  \n   Y   U   I   E   5

I was suggesting that you modify those bytes, not the info in prod_info.
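
To make that concrete, here is a rough sketch of overwriting the sequence
number bytes in place.  The function name set_seq_num is made up for this
example and is not part of the LDM.  It assumes the layout shown above (SOH,
\r\r\n, a field of 1 to 4 characters, \r\r\n) and writes a zero-padded number of
the same width, which may not match what every ingest site produces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the LDM: overwrite the sequence
 * number field of a product in place.  Assumes the product starts with
 * SOH \r \r \n, followed by a field of 1 to 4 characters, followed by
 * \r \r \n.  Returns 0 on success, -1 if the layout doesn't match.
 */
int
set_seq_num(char *prod, size_t len, unsigned seq)
{
  static const char head[4] = {'\001', '\r', '\r', '\n'};
  static const char tail[3] = {'\r', '\r', '\n'};
  size_t width, i;

  assert(prod != NULL);

  if (len < sizeof(head) || memcmp(prod, head, sizeof(head)) != 0)
    return -1;

  /* Find the width of the existing field (at most 4 characters). */
  for (width = 1; width <= 4; width++) {
    size_t end = sizeof(head) + width;
    if (end + sizeof(tail) > len)
      return -1;
    if (memcmp(prod + end, tail, sizeof(tail)) == 0)
      break;
  }
  if (width > 4)
    return -1;

  /* Write seq right to left, zero padded, keeping the field width. */
  for (i = width; i > 0; i--) {
    prod[sizeof(head) + i - 1] = (char)('0' + seq % 10);
    seq /= 10;
  }
  return 0;
}
```

Because the field width is preserved, the rest of the product stays at the same
offsets.  Only the digits change, so the MD5 computed over the contents changes
with them.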

I assume there are products for which this isn't the case.  The truth is that I
don't yet know enough about the products to say which ones have a sequence
number in this location and which don't.

Anyway, for the products to which this applies, there is some variation in the
format of the sequence number.  I think some ingest sites pad it with blanks and
some don't, and I'm not sure what all the variations are.  Below is some code I
wrote that I use to determine whether to classify a product as a WMO product or
not.  It does this by looking for two particular strings in the first few bytes
of a product.  The characters in between those two strings are considered the
sequence number.  I use this function to skip over all those chars in
calculating the checksum if the user invokes the -5 option to do so.  That way
products that are the same except for their sequence numbers will be considered
duplicates if the option is invoked.

I hope this helps.  If you'd like more info let me know.

Anne

-----

#include <string.h>  /* strncmp, strstr */

/*
 * Determine if a product starts with the string
 * "\001\r\r\n<sequenceNumber>\r\r\n".  If it doesn't, return 0.  If it
 * does, return a pointer to the first byte after that string, skipping
 * over those leading control chars and the sequence number.
 *
 * A sequence number is expected to be any string of at most MAX_SEQ_NUM_LEN
 * digits with possibly leading or trailing blanks included in that count.
 * However, the only check done here is to see that the sequence number
 * consists of MAX_SEQ_NUM_LEN or fewer characters. No other checks are
 * performed.
 */
char *
wmo_prod(const char *prod)
{
#define PART1_SIZE 4
#define PART2_SIZE 4
#define MAX_SEQ_NUM_LEN 4

  char part1[PART1_SIZE] = {'\001', '\r', '\r', '\n'}; /* SOH, CR, CR, LF */
  char part2[PART2_SIZE] = {'\r', '\r', '\n', '\0'}; /* '\0' is for strstr */
  char *startPart2;
  int seqNumLength;

  /*
   * If part1 is not at start of product, return
   */
  if ((strncmp(part1, prod, PART1_SIZE)) != 0)
    return 0;

  /*
   * If part2 doesn't occur somewhere after part1, return
   */
  if ((startPart2 = strstr (prod+PART1_SIZE, part2)) == 0)
    return 0;

  /*
   * Pick out substring between part1 and part2 that contains the
   * sequence number
   */
  seqNumLength = startPart2 - (prod+PART1_SIZE);

  /*
   * Sanity check: if the length of the sequence number string is
   * too big, return
   */
  if (seqNumLength > MAX_SEQ_NUM_LEN)
    return 0;

  /*
   * If we got here, we've classified it as a wmo product.
   * Return a pointer to the first byte after the second "\r\r\n",
   * i.e. the start of the product proper.
   */
  return startPart2 + PART2_SIZE - 1;  /* exclude trailing '\0'  */
}
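
For illustration, here is roughly how a checksum could skip the header and
sequence number.  This is only a sketch under some assumptions: the names
wmo_header_len and checksum_skipping_seq are invented for this example, the
header check is a length-bounded rewrite of the function above rather than the
function itself, and a short FNV-1a hash stands in for the MD5 that the LDM
really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, length-bounded variant of wmo_prod(): return the offset
 * of the first byte after "SOH \r\r\n <seq> \r\r\n", or 0 if the product
 * doesn't start that way.  <seq> may be 0 to 4 characters long.
 */
size_t
wmo_header_len(const char *prod, size_t len)
{
  static const char head[4] = {'\001', '\r', '\r', '\n'};
  static const char tail[3] = {'\r', '\r', '\n'};
  size_t seq_len;

  if (len < sizeof(head) || memcmp(prod, head, sizeof(head)) != 0)
    return 0;

  for (seq_len = 0; seq_len <= 4; seq_len++) {
    size_t end = sizeof(head) + seq_len;
    if (end + sizeof(tail) > len)
      return 0;
    if (memcmp(prod + end, tail, sizeof(tail)) == 0)
      return end + sizeof(tail);
  }
  return 0;
}

/*
 * Stand-in checksum (32-bit FNV-1a), used here only because it is
 * short.  Bytes belonging to the header and sequence number are left
 * out; products without such a header are hashed in full.
 */
uint32_t
checksum_skipping_seq(const char *prod, size_t len)
{
  uint32_t h = 2166136261u;
  size_t i;

  assert(prod != NULL);

  for (i = wmo_header_len(prod, len); i < len; i++) {
    h ^= (unsigned char)prod[i];
    h *= 16777619u;
  }
  return h;
}
```

With this, two products that differ only in the sequence number field (say
"541" versus " 12") get the same checksum, while any difference after the
header still changes it.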

--
***************************************************
Anne Wilson                     UCAR Unidata Program
address@hidden                  P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************