
[LDM #TVK-349198]: Files dupping out



I will go ahead and reply with some scenarios to drive these into our support
archive in case others ever have this problem.

I will provide a couple of cases, with at least one solution for each.

In the first case study, you have a file with a simple heartbeat message to
send via LDM to NCO for inclusion in the messages distributed to the broader
NOAA community.

The file contains only the following (/data/static/templates/planes_stationary.txt):

-- cut here --
ADMNHC

All hurricane hunters are on station, fueled for operations.
-- cut here --

You also have a script that checks whether any new active status messages have
been sent out; if none have, it sends this heartbeat file with a WMO header
prepended for the current date and time.

#!/bin/bash
.....
createHeader > /tmp/current_status.txt
cat /data/static/templates/planes_stationary.txt >> /tmp/current_status.txt
.....

The result is a bulletin similar to this:

-- cut here --
NOUS87 KNHC 190915
ADMNHC

All hurricane hunters are on station, fueled for operations.
-- cut here --

The script then injects it with a call to pqinsert:

pqinsert -s 999 -f IDS -p "NOUS87 KNHC 190915" /tmp/current_status.txt
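The script calls a createHeader function whose body is not shown, and the -p argument above hard-codes the DDHHMM group. A hypothetical sketch of createHeader, assuming the NOUS87/KNHC designators from this example, could print the WMO abbreviated heading (TTAAii CCCC DDHHMM, with day, hour and minute in UTC):

```shell
#!/bin/bash
# Hypothetical sketch: the real createHeader is not shown in the original script.
# Prints a WMO abbreviated heading "TTAAii CCCC DDHHMM" for the current UTC time.
createHeader() {
    local ttaaii="${1:-NOUS87}"   # assumed data designator, taken from the example
    local cccc="${2:-KNHC}"       # assumed originating office, taken from the example
    # DDHHMM = UTC day of month, hour, minute
    printf '%s %s %s\n' "$ttaaii" "$cccc" "$(date -u '+%d%H%M')"
}

header=$(createHeader)
echo "$header"   # prints something like: NOUS87 KNHC 190915
```

In a full script, the heading could be captured once and reused both as the first line of /tmp/current_status.txt and as the -p argument, so the product ID always matches the bulletin. That wiring is a suggestion, not part of the original script.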

By default, the MD5 checksum is calculated over the data *after* the header.
Because the heartbeat body never changes, every subsequent insert produces the
same checksum, and the product is dropped on the floor as a duplicate.
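The following is only a rough illustration of the collision. It uses md5sum as a local stand-in for LDM's internal computation and assumes, as described above, that only the text after the header line is signed. It shows two bulletins issued an hour apart that still get the same signature. The helper names body_md5 and write_bulletin are hypothetical.

```shell
#!/bin/bash
# Illustration only: approximates the signature described above by hashing
# everything after the first (WMO header) line. This is NOT LDM's own code.
body_md5() {
    tail -n +2 "$1" | md5sum | cut -d' ' -f1
}

# Writes a heartbeat bulletin with the given header line ($1) to file $2.
write_bulletin() {
    printf '%s\nADMNHC\n\nAll hurricane hunters are on station, fueled for operations.\n' "$1" > "$2"
}

tmpdir=$(mktemp -d)
# Different DDHHMM groups, identical bodies.
write_bulletin "NOUS87 KNHC 190915" "$tmpdir/first.txt"
write_bulletin "NOUS87 KNHC 191015" "$tmpdir/second.txt"

if [ "$(body_md5 "$tmpdir/first.txt")" = "$(body_md5 "$tmpdir/second.txt")" ]; then
    echo "same signature: the second insert would be dropped as a duplicate"
fi
rm -r "$tmpdir"
```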

In the first potential solution, the end of the file is salted with a time stamp
(down to the second) to force a different MD5 checksum on the entire product:

#!/bin/bash
.....
createHeader > /tmp/current_status.txt
cat /data/static/templates/planes_stationary.txt >> /tmp/current_status.txt
echo " " >> /tmp/current_status.txt
date -u "+%Y%m%d_%H%M_%S_%s" >> /tmp/current_status.txt
.....
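A quick local check can confirm that the salt does its job. The sketch below wraps the salting step in a hypothetical salt_file function and reuses the same md5sum stand-in for the signature. It assumes GNU date, and it adds %N (nanoseconds, a GNU extension) so that two runs within the same second still get different salts.

```shell
#!/bin/bash
# Sketch of the salting step, wrapped in a hypothetical function.
# Assumes GNU date: %N (nanoseconds) is a GNU extension, added here so that two
# runs within the same second still produce different salts.
salt_file() {
    echo " " >> "$1"
    date -u '+%Y%m%d_%H%M_%S_%N' >> "$1"
}

# Local stand-in for the signature: md5sum of everything after the header line.
body_md5() {
    tail -n +2 "$1" | md5sum | cut -d' ' -f1
}

tmpdir=$(mktemp -d)
for f in first second; do
    printf 'NOUS87 KNHC 190915\nADMNHC\n\nAll hurricane hunters are on station, fueled for operations.\n' > "$tmpdir/$f.txt"
    salt_file "$tmpdir/$f.txt"
done

if [ "$(body_md5 "$tmpdir/first.txt")" != "$(body_md5 "$tmpdir/second.txt")" ]; then
    echo "salted bodies differ: both inserts would be accepted"
fi
rm -r "$tmpdir"
```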

The hazard with this solution is that the extra line may disrupt downstream
decoders that parse the bulletin content to process the data automatically.

The second potential solution is to use the "-i" option with pqinsert:

man pqinsert
.....
OPTIONS
   -i Compute the MD5 signature from the product-identifier rather than from
      the product’s data.  You should only use this option if computing the
      MD5 signature from the product’s data takes too long and if the
      product-identifier is unique.
.....

The pqinsert call then becomes:

pqinsert -i -s 999 -f IDS -p "NOUS87 KNHC 190915" /tmp/current_status.txt

The hazard with this solution is that multiple bulletins can go out with the
same WMO header, which gives them the same product ID and therefore the same
MD5 signature.  Additional characters could, however, be appended to the
product ID to differentiate each insert and give it its own MD5 signature:

timestamp=$(date -u "+%Y%m%d%H%M%S")
pqinsert -i -s 999 -f IDS -p "NOUS87 KNHC 190915 /pADMNHC /t${timestamp}" \
    /tmp/current_status.txt
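With "-i", the signature depends only on the product ID, so it is worth checking that the salted IDs actually differ. The sketch below builds the ID the same way as the command above, using a hypothetical make_product_id helper. It hashes each ID with md5sum as a stand-in for LDM's ID-based signature, not as LDM's own code.

```shell
#!/bin/bash
# Hypothetical helper: builds the salted product ID used with "pqinsert -i".
make_product_id() {
    local header="$1"
    local timestamp
    timestamp=$(date -u '+%Y%m%d%H%M%S')
    printf '%s /pADMNHC /t%s\n' "$header" "$timestamp"
}

id1=$(make_product_id "NOUS87 KNHC 190915")
sleep 1   # the salt resolves only to the second, so wait before the next ID
id2=$(make_product_id "NOUS87 KNHC 190915")

echo "$id1"
echo "$id2"
# md5sum stands in for the ID-based signature: different IDs, different hashes.
if [ "$(printf '%s' "$id1" | md5sum)" != "$(printf '%s' "$id2" | md5sum)" ]; then
    echo "IDs differ: both inserts would be accepted"
fi
```

Because %Y%m%d%H%M%S resolves only to the second, two inserts within the same second would still share an ID. That is why the sketch sleeps one second between calls.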

For the second case, you want to send a simple file describing a server's
status from NHC to NCO:

#!/bin/bash
......
hostname -f > /tmp/nhc_server.txt
df >> /tmp/nhc_server.txt
w >> /tmp/nhc_server.txt
pqinsert -f EXP /tmp/nhc_server.txt
......

If the server is fairly stable, successive files may be essentially identical,
and thus rejected as duplicates.

As in the first case, adding a time stamp to the body or the end of the file
produces a unique MD5 checksum, so that multiple messages with similar, but not
identical, data can pass.  The key here: do not prepend the unique string to
the file, because the first 30 bytes or so are ignored in the MD5 checksum
calculation.  Put it in the middle, or at the end.
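The following sketch shows why the position of the salt matters. It assumes that exactly the first 30 bytes are skipped, which is only an approximation of the "30 bytes or so" above, and it hashes the remainder with md5sum as a stand-in for LDM's calculation. A salt prepended to the file falls inside the skipped region, so it does not change the hash as long as every salt has the same length. A salt appended at the end does change the hash. The names SKIP, signed_md5 and report, and the report contents, are hypothetical.

```shell
#!/bin/bash
# Illustration only: SKIP mimics "the first 30 bytes or so are ignored".
# The value 30 and this md5sum stand-in are assumptions, not LDM's code.
SKIP=30
signed_md5() {
    tail -c +"$((SKIP + 1))" "$1" | md5sum | cut -d' ' -f1
}

# Hypothetical, unchanging report standing in for hostname/df/w output.
report() {
    printf '%s\n' "server.example.gov" "/dev/sda1 50% /" "load average: 0.01"
}

tmpdir=$(mktemp -d)
for run in 1 2; do
    stamp="20250101120000_${run}"                      # fixed stand-in salts, same length
    { echo "$stamp"; report; } > "$tmpdir/pre_$run"    # salt prepended
    { report; echo "$stamp"; } > "$tmpdir/post_$run"   # salt appended
done

echo "prepended: $(signed_md5 "$tmpdir/pre_1") $(signed_md5 "$tmpdir/pre_2")"
echo "appended:  $(signed_md5 "$tmpdir/post_1") $(signed_md5 "$tmpdir/post_2")"
rm -r "$tmpdir"
```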

One more solution uses this same simple file insert.  As in the WMO header
example, salt the product ID slightly so that it is unique:

timestamp=$(date -u "+%Y%m%d%H%M%S")
pqinsert -i -f EXP -p "/tmp/nhc_server.txt ${timestamp}" /tmp/nhc_server.txt

These examples are not necessarily the solution for your specific situation,
but they give us a starting point for our conversation in January.
-
Stonie Cooper, PhD
Software Engineer III
NSF Unidata Program Center
University Corporation for Atmospheric Research
I acknowledge that the land I live and work on is the traditional territory of 
The Pawnee, The Omaha, and The Otoe.

Ticket Details
===================
Ticket ID: TVK-349198
Department: Support LDM
Priority: Normal
Status: Open
===================
NOTE: All email exchanges with NSF Unidata User Support are recorded in the 
Unidata inquiry tracking system and then made publicly available through the 
web.  If you do not want to have your interactions made available in this way, 
you must let us know in each email you send to us.