
"pbuf_flush: time elapsed" problem (was: Problem with LDM 6.3.0)



Justin,

>Date: Thu, 15 Sep 2005 08:22:48 -0400
>From: Justin Cooke <address@hidden>
>Organization: NOAA
>To: address@hidden
>Subject: Problem with LDM 6.3.0

The above message contained the following:

> We have recently installed version 6.3.0 of LDM and are seeing 
> occasional errors with two of our PIPE processes.  I have included an 
> excerpt from the ldmd.log of one of the errors:
> 
> ---
> Sep 14 21:42:17 b2n1 eldm4[1171480]:   452967 20050914214215.998    PCWS 
> 000  FSL.CompressedNetCDF.MADIS.acars.20050914_2100.gz
> Sep 14 21:42:17 b2n1 pqact[1511588]:   452967 20050914214215.998    PCWS 
> 000  FSL.CompressedNetCDF.MADIS.acars.20050914_2100.gz
> Sep 14 21:42:17 b2n1 pqact[1511588]:                pipe: -close 
> /home/decdev/bin/run_dctamd.sh 
> /dcomdev/us007003/ldmdata/obs/upperair/tamdar 20050914_2100.gz
> Sep 14 21:44:17 b2n1 pqact[1511588]: pbuf_flush 2: time elapsed 120.000054
> Sep 14 21:44:17 b2n1 pqact[1511588]: pbuf_flush (2) Timed out
> Sep 14 21:44:17 b2n1 pqact[1511588]: pipe_put: 
> -close/home/decdev/bin/run_dctamd.sh/dcomdev/us007003/ldmdata/obs/upperair/tamdar20050914_2100.gz
>  
> write error

The error messages above mean that the pqact(1) process was unable to
flush the pipe to the script /home/decdev/bin/run_dctamd.sh.  The pipe
was open, but the script did not read from it within the timeout
interval (120 seconds, per the "time elapsed 120.000054" message).  The
command in the script that reads from the pipe is

    gzip -d > ${1}/$$.${2}

It's possible (though unlikely) that the gzip(1) process encountered a
problem with the data-product that caused it to terminate reading from
the standard input stream.
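
One way to check that possibility is to test a saved copy of the
data-product directly.  The following is only a sketch: check_gzip is a
hypothetical helper name, and it assumes that the file written by the
"file:" entry in your log excerpt is a copy of the same product that was
sent down the pipe.

```shell
#!/bin/sh
# Sketch: report whether a saved data-product is a readable gzip stream.
# check_gzip is a hypothetical helper name, not part of the LDM.
check_gzip() {
    # "gzip -t" tests integrity without writing any decompressed output.
    # Reading from stdin avoids any dependence on the file's suffix.
    if gzip -t < "$1" 2>/dev/null; then
        echo "ok: $1"
    else
        echo "corrupt: $1"
        return 1
    fi
}
```

For example, "check_gzip /dcomdev/us007003/ldmdata/test/acars.20050914_2100.gz_214215"
would show whether gzip(1) can read that particular product to the end.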

In any case, a definitive diagnosis is impossible unless a mechanism for
reporting errors is added to the script.  I suggest adding the command

    exec >> $HOME/logs/run_dctamd.log 2>&1

to the top of the script to help determine the cause of the problem.
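
The redirection affects only standard output and standard error, so it
should not interfere with gzip(1) reading the pipe on standard input.
Because the script starts with "#!/bin/sh -vx", the shell's trace output
(which goes to standard error) would also land in the log.  The sketch
below illustrates the idea in a form that can be tried by hand.  The
names LOG_DIR and decode_with_log, the timestamp line, and the status
messages are my assumptions for illustration and are not in your script;
in the real script the exec line would simply go at the top.

```shell
#!/bin/sh
# Sketch only: LOG_DIR and decode_with_log are hypothetical names.
LOG_DIR="${LOG_DIR:-${TMPDIR:-/tmp}/run_dctamd_logs.$$}"
mkdir -p "$LOG_DIR"

decode_with_log() (
    # Subshell body, so the redirection ends when the function returns.
    # Only stdout and stderr are redirected; stdin (the pipe from pqact)
    # is left alone, so gzip can still read the data-product.
    exec >> "$LOG_DIR/run_dctamd.log" 2>&1
    echo "start: $(date -u +%Y%m%dT%H%M%SZ) args: $*"
    if gzip -d > "$1/$$.$2"; then
        echo "gzip ok"
    else
        echo "gzip failed with status $?"
    fi
)
```

Piping a product into "decode_with_log <directory> <name>" appends a
start line and the gzip(1) outcome to run_dctamd.log under LOG_DIR, along
with anything gzip(1) wrote to standard error.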

Please contact me if you have any questions or discover anything further.

> Sep 14 21:44:17 b2n1 pqact[1511588]:                file: 
> /dcomdev/us007003/ldmdata/test/acars.20050914_2100.gz_214215
> ---
> 
> Throughout the day we receive hundreds of these acars messages but only 
> a couple will result in a time out and then the write error.  After this 
> error occurs the script that was acted on by LDM remains in the process 
> table and has to be purged with a kill -9.  We are also receiving this 
> feed to a different system but we are not seeing these errors.  On that 
> system the only difference is the version of LDM, 6.0.15, the pqact.conf 
> and script are the same for this datatype.
> 
> We tried version 6.4.1 and the same errors occurred, we also recompiled 
> 6.3.0 and increased DEFAULT_PIPE_TIMEO to 120 in pqact.c
> 
> #define DEFAULT_PIPE_TIMEO 120
> 
> again the errors still occurred.
> 
> I have attached the /home/decdev/bin/run_dctamd.sh script, it basically 
> unzips the stdin and puts the resulting data into a decoder.
> 
> Any ideas?
> 
> Thanks,
> 
> Justin Cooke
> NCEP Central Operations
> 
> [Attachment: run_dctamd.sh]
> 
> #!/bin/sh -vx
> 
> #
> #  This script is EXECed directly by DBNet in order to run the
> #  dctamd decoder on the data file given in the first argument.
> #
> #    Usage: ./run_dctamd.sh <tamdar_filename>
> #
> #  Once this is done, the data file itself is then compressed
> #  within its native directory for more efficient short-term
> #  storage.
> #
> 
> # The gzip line must be the first, noncomment line in this script
> # so that stdin is processed correctly
> 
> gzip -d > ${1}/$$.${2}
> madisfilename=${1}/`echo ${2} | cut -c1-13`
> hhmm=`date -u +%H%M`
> decoderfilename=${madisfilename}.${hhmm}
> mv ${1}/$$.${2} ${decoderfilename}
> 
> . /ioddev/dbndev/.profile
> 
> export MADIS_STATIC=$DCDROOT/lib/sorc/madis-2.5/static
> export MADIS_DATA=/dcomdev/us007003/ldmdata
> 
> ln -sf ${decoderfilename} ${madisfilename}
> 
> nice $DCDROOT/bin/decod_dctamd -v 2 \
>   -d /dcomdev/us007003/decoder_logs/decod_dctamd.log \
>   ${decoderfilename} /dcomdev/us007003/bufrtab.004
> 
> rm -f ${madisfilename}
> 
> #
> #  Compress the decoder input file within its native directory,
> #  in order to conserve disk space for these large files!
> #
> 
> gzip ${decoderfilename}
> 
> #
> #  Explicitly set the script return code to 0, in order to prevent
> #  the "compress" return code from becoming the script return code
> #  (and thereby prevent DBNet from re-running the script for this
> #  particular data file if there is a problem with the compress!)
> #
> 
> exit 0
> 

Regards,
Steve Emmerson