20030902: 20030828: 20030815: corrupted GEMPAK sounding/upperair files (fwd)



Harry,

Below is the snippet from our ldmd.conf that execs the various pqact
processes. Running several pqact processes splits the processing load and
keeps each one under the maximum number of streams a pqact may have open.

Note that the pqact.gempak_* files are provided in
$NAWIPS/ldm/etc/templates.

Steve Chiswell


exec    "pqact -f ANY-CRAFT-NNEXRAD-CONDUIT"
#
# Exec GEMPAK specific pqact processing
exec    "pqact -f NNEXRAD /opt/ldm/etc/pqact.gempak_nexrad"
exec    "pqact -f ANY-NNEXRAD-CRAFT-NIMAGE /opt/ldm/etc/pqact.gempak_decoders"
exec    "pqact -f MCIDAS|NIMAGE /opt/ldm/etc/pqact.gempak_images"
exec    "pqact -f WMO /opt/ldm/etc/pqact.gempak_nwx"
exec    "pqact -f WMO|SPARE|CONDUIT /opt/ldm/etc/pqact.gempak_upc"
#
# The following 3 entries split the pqact processing of CRAFT so that
# the number of open streams is less than the MAX (currently MAX=32).
exec    "pqact -f CRAFT -p BZIP2/K[A-E] /opt/ldm/etc/GEMPAK/pqact.gempak_craft"
exec    "pqact -f CRAFT -p BZIP2/K[F-L] /opt/ldm/etc/GEMPAK/pqact.gempak_craft"
exec    "pqact -f CRAFT -p BZIP2/K[M-Z] /opt/ldm/etc/GEMPAK/pqact.gempak_craft"
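A quick way to check how the three `-p` patterns divide CRAFT products is to run them against sample product IDs. The Python sketch below uses `re.search` as an assumed stand-in for pqact's unanchored regular-expression match; the product IDs are hypothetical, and the real CRAFT ID layout may differ.

```python
import re

# The three -p patterns from the ldmd.conf entries above.
CRAFT_PATTERNS = ["BZIP2/K[A-E]", "BZIP2/K[F-L]", "BZIP2/K[M-Z]"]


def craft_routes(product_id):
    """Return the indices of the pqact processes whose -p pattern matches.

    re.search is an assumed stand-in for pqact's unanchored ERE matching.
    """
    return [i for i, pat in enumerate(CRAFT_PATTERNS) if re.search(pat, product_id)]


if __name__ == "__main__":
    # Hypothetical product IDs, for illustration only.
    for pid in ["L2-BZIP2/KATX/20030902204455/1",
                "L2-BZIP2/KFTG/20030902204455/1",
                "L2-BZIP2/KRTX/20030902204455/1",
                "L2-BZIP2/PHKI/20030902204455/1"]:
        print(pid, craft_routes(pid))
```

Every station ID beginning with K matches exactly one of the three processes. Under these assumptions, an ID whose station does not start with K (the hypothetical PHKI above) matches none of them, so if such IDs occur in the feed they would not be handled by these three entries.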






>From: Harry Edmon <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200309022235.h82MZfLd001198

>Can you send me an example ldmd.conf file that shows the separate pqact lines?
>  
>Currently I have divisions between NEXRAD, HDS, and everything else.
>
>>>>>> Forwarded message from David Ovens <address@hidden>
>
>Harry,
>
>Our upper-air GEMPAK files (/home/disk/data/gempak/upperair/*gem) have
>been getting corrupted.  Here's what Steve Chiswell suggests we look
>at.  Can you please try/look into these things he suggests?
>
>David
>
>Forwarded message:
>> From address@hidden  Tue Sep  2 13:45:06 2003
>> Message-Id: <address@hidden>
>> Organization: UCAR/Unidata
>> Keywords: 200309022044.h82KiuLd016228
>> To: David Ovens <address@hidden>
>> cc: address@hidden (Unidata Support),
>>    address@hidden (Lynn McMurdie)
>> Subject: 20030828: 20030815: corrupted GEMPAK sounding/upperair files 
>> In-reply-to: Your message of "Thu, 28 Aug 2003 13:17:06 PDT."
>>              <address@hidden> 
>> Date: Tue, 02 Sep 2003 14:44:55 -0600
>> From: Unidata Support <address@hidden>
>> 
>> 
>> David,
>> 
>> From looking at your logs, the likely problem is that more than one
>> instance of the dcuair decoder is writing to the file. Your log file
>> shows small batches of bulletins being decoded by one process ID,
>> interspersed with batches decoded by other invocations. Concurrent
>> writers like this would appear to be what is corrupting your output file.
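The symptom described here, bulletins from one process ID interleaved with those from another, means two decoder processes were alive at the same time. A minimal Python sketch of that check, assuming you have already pulled (timestamp, pid) pairs out of dcuair.log; the log's line format isn't shown in this thread, so the parsing step is left out.

```python
from itertools import combinations


def overlapping_pids(entries):
    """Return pairs of PIDs whose [first, last] activity windows overlap.

    entries: iterable of (timestamp_seconds, pid) tuples, assumed to have
    been extracted from the decoder log beforehand.
    """
    spans = {}
    for t, pid in entries:
        first, last = spans.get(pid, (t, t))
        spans[pid] = (min(first, t), max(last, t))

    overlaps = []
    for (a, (a0, a1)), (b, (b0, b1)) in combinations(sorted(spans.items()), 2):
        if a0 <= b1 and b0 <= a1:  # the two windows intersect
            overlaps.append((a, b))
    return overlaps


if __name__ == "__main__":
    # Hypothetical sample: PIDs 101 and 202 were active at the same time.
    sample = [(0, 101), (3, 202), (5, 101), (8, 202), (20, 303), (25, 303)]
    print(overlapping_pids(sample))  # prints [(101, 202)]
```

With a single decoder per output file the windows should not overlap, so any reported pair points at concurrent writers.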
>> 
>> I have also found that the -O flag with SUNWspro 7.0 creates buggy
>> executables, but you mentioned before that you weren't at that level, so
>> it is likely not the culprit here.
>> 
>> One possible cause is that pqact's I/O is falling behind, so the PIPE
>> fills up and pqact starts a new instance of the PIPE (a second dcuair
>> process) while the first one is still writing.
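To see what "the PIPE is filled up" means at the operating-system level, here is a small Python sketch of plain POSIX pipe behavior (not LDM code): when nothing drains the read end, as when a decoder falls behind, the write end accepts only a bounded amount of data. The capacity depends on the OS (64 KiB by default on modern Linux).

```python
import os


def pipe_capacity(chunk=512):
    """Count bytes a pipe accepts before a non-blocking write would block.

    Nothing ever reads from the pipe, mimicking a decoder that has stopped
    draining its input.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # raise BlockingIOError instead of hanging
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, b"x" * chunk)
            except BlockingIOError:
                break  # buffer full: a blocking writer would now stall here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print("pipe buffer holds about", pipe_capacity(), "bytes")
```

A 512-byte chunk is used because POSIX guarantees writes of at most PIPE_BUF (at least 512 bytes) are atomic, so each write either fits whole or fails.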
>> 
>> Does it appear that data takes a while to show up in the decoded file
>> when you know the LDM received it in a timely fashion?
>> 
>> Is air.atmos.washington.edu doing lots of pqact processing, or have you
>> noticed I/O wait times increasing on the system? You can send
>> "kill -USR2" twice to the pqact processes to enter debug logging. If you
>> see "Delay" messages (the time between when a product arrives in the LDM
>> product queue and when pqact gets around to processing its action) of more
>> than a few seconds, then a tune-up is probably needed. Issue the kill one
>> more time to cycle back to silent logging (otherwise you will really fill
>> up your ldmd.log file).
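Once debug logging is on, scanning ldmd.log for large "Delay" values can be scripted. The Python sketch below assumes only that a relevant line contains the word "Delay" followed by a number of seconds; the exact pqact log format is not shown in this thread, so `DELAY_RE` is a guess to adjust against your own log. The 5-second default threshold is an arbitrary stand-in for "more than a few seconds".

```python
import re
import sys

# Assumed format: "Delay" followed (after non-digits) by seconds, e.g. "Delay: 12.5".
DELAY_RE = re.compile(r"Delay\D*?(\d+(?:\.\d+)?)")


def slow_products(lines, threshold=5.0):
    """Return (seconds, line) for each log line whose Delay exceeds threshold."""
    hits = []
    for line in lines:
        m = DELAY_RE.search(line)
        if m and float(m.group(1)) > threshold:
            hits.append((float(m.group(1)), line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_delays.py /path/to/ldmd.log
    with open(sys.argv[1]) as log:
        for secs, line in slow_products(log):
            print(f"{secs:8.2f}s  {line}")
```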
>> 
>> To make upgrading easier here, I run several pqact processes from
>> ldmd.conf, each handling specific pqact.conf files, to separate decoder
>> processes from filing processes, GEMPAK from McIDAS, etc. For example,
>> the separate pieces of $NAWIPS/ldm/etc/templates run from different pqact
>> processes here. This makes upgrading the distribution easier, since I
>> don't have to edit large pqact.conf files shared between packages and
>> other users.
>> 
>> I have seen pqact fall behind when a large number of FILE actions
>> (like filing every NEXRAD product) are pending. This usually shows up as
>> NEXRAD products being written to disk an hour or more after they are
>> received. The high I/O wait times this causes also slow down other
>> decoders running from pqact. Another thing to look for is dcmetr output
>> lagging behind current data.
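A simple way to spot the lag described here is to compare the newest file in a decoder's output directory with the current time. A minimal Python sketch; the directory you point it at (for example, wherever your dcmetr output lands) is up to you, and what counts as "lagging" depends on how often that decoder normally writes.

```python
import os
import sys
import time


def newest_file_age(directory, now=None):
    """Seconds since the most recently modified regular file in directory.

    Returns None when the directory contains no regular files.
    """
    now = time.time() if now is None else now
    with os.scandir(directory) as entries:
        mtimes = [e.stat().st_mtime for e in entries if e.is_file()]
    return now - max(mtimes) if mtimes else None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python newest_age.py /path/to/decoder/output
    age = newest_file_age(sys.argv[1])
    if age is None:
        print("no files found")
    else:
        print(f"newest file is {age / 60:.1f} minutes old")
```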
>> 
>> If none of these seem to fit your situation let me know.
>> 
>> Steve Chiswell
>> 
>> 
>> 
>> >From: David Ovens <address@hidden>
>> >Organization: UCAR/Unidata
>> >Keywords: 200308282017.h7SKHCLd001081
>> 
>> >Steve,
>> >
>> >Well, we are definitely still having problems; in fact, the files are
>> >even more corrupt now -- snlist will bomb out on 00Z as well as 12Z
>> >times.  We are decoding under Solaris (SunOS air.atmos.washington.edu
>> >5.9).  I have tried snlist on the same 5.9 machine and on my 5.8
>> >machine and they both are giving the
>> >
>> >*** TERMINATING  snlist
>> >*** Received signal 11 SIGSEGV
>> >Segmentation fault
>> >
>> >problem.  dbx shows this:
>> >
>> >27 air% dbx -I. snlist
>> >Reading snlist
>> >Reading ld.so.1
>> >Reading libF77.so.4
>> >Reading libM77.so.2
>> >Reading libsunmath.so.1
>> >Reading libm.so.1
>> >Reading libc.so.1
>> >Reading libdl.so.1
>> >Reading libc_psr.so.1
>> >(/usr/local/SUNWspro/bin/../WS5.0/bin/sparcv9/dbx) run
>> >Running: snlist 
>> >(process id 4733)
>> > GEMPAK-SNLIST>snfile = $UPA/20030828_upa.gem
>> > GEMPAK-SNLIST>area = dset
>> > GEMPAK-SNLIST>dattim = 0000
>> > GEMPAK-SNLIST>snparm = dset
>> > GEMPAK-SNLIST>output = f//tmp/junk
>> > GEMPAK-SNLIST>mrgdat = yes
>> > GEMPAK-SNLIST>save
>> > GEMPAK-SNLIST>run
>> >signal SEGV (no mapping at the fault address) in mr_mand_ at 0x59084
>> >0x00059084: mr_mand_+0x0210:    ld      [%o3 + %l1], %f2
>> >That was a problem with MRGDAT, I think.
>> >
>> >For MRGDAT = NO, we see:
>> >signal SEGV (no mapping at the fault address) in _single_to_decimal at
>> >0xff124170
>> >0xff124170: _single_to_decimal+0x0008:  ld      [%i0], %f2
>> >
>> >Note, if I run gdb on a Linux box trying to read this file, I see the
>> >following: 
>> > MRGDAT   = YES
>> > GEMPAK-SNLIST>r
>> >Program received signal SIGSEGV, Segmentation fault.
>> >0x08067c58 in dp_unpk_ ()
>> >
>> >and
>> >
>> > GEMPAK-SNLIST>mrgdat = no
>> > GEMPAK-SNLIST>r
>> > [FL -4]  Cannot read file ....
>> >Program received signal SIGSEGV, Segmentation fault.
>> >0x08067c58 in dp_unpk_ ()
>> >
>> >
>> >I have also placed a copy of our $UPA/20030828_upa.gem file into
>> >http://www.atmos.washington.edu/~ovens/gempak_sounding_problem/
>> >directory.  I would assume that you get the same problems running
>> >snlist, since I believe it is the file that is corrupt, not snlist.  I
>> >also put our dcuair.log file in that directory.
>> >
>> >David
>> >Unidata Support wrote:
>> >> 
>> >> 
>> >> David,
>> >> 
>> >> I have not found any problem here. Can you tell me what OS
>> >> your decoder is running on, as well as your plotting programs?
>> >> 
>> >> If you built the distribution locally, you could see whether dbx or gdb
>> >> can tell you where the program crashed.
>> >> 
>> >> Otherwise, you could send me a copy of a file that exhibited the behavior
>> >> and I could try to duplicate your problem here.
>> >> 
>> >> Steve Chiswell
>> >> 
>> >> 
>> >> 
>> >> >From: David Ovens <address@hidden>
>> >> >Organization: UCAR/Unidata
>> >> >Keywords: 200308152057.h7FKvALd000889
>> >> 
>> >> >To: address@hidden
>> >> >
>> >> >Hello,
>> >> >
>> >> >We are decoding upperair data into GEMPAK sounding files using the
>> >> >standard dcuair command as found in the LDM section:
>> >> >
>> >> >WMO      ^U[ABCDEFGHIJKLMNPQRSTWX].... .... ([0-3][0-9])([0-2][0-9])
>> >> > PIPE    /home/disk/ldm/NAWIPS-5.6.H/bin/sol/dcuair -b 24 -m 16
>> >> > -d data/gempak/logs/dcuair.log
>> >> > -e GEMTBL=/home/disk/ldm/NAWIPS-5.6.H/gempak/tables
>> >> > -s snstns.tbl
>> >> > data/gempak/upperair/YYYYMMDD_upa.gem
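The WMO pattern in this entry can be checked outside pqact. The Python sketch below copies the expression verbatim (it uses only regular-expression features that Python's `re` treats the same way as a POSIX ERE); the two groups capture the day and hour of the bulletin's date/time group. The sample headers are hypothetical.

```python
import re

# WMO pattern copied from the pqact.conf entry above.
UPPERAIR = re.compile(r"^U[ABCDEFGHIJKLMNPQRSTWX].... .... ([0-3][0-9])([0-2][0-9])")


def upperair_day_hour(header):
    """Return (day, hour) strings if header matches the upper-air pattern, else None."""
    m = UPPERAIR.search(header)
    return m.groups() if m else None


if __name__ == "__main__":
    # Hypothetical headers, for illustration only.
    for hdr in ["USUS01 KWBC 021200", "UOUS01 KWBC 021200", "SAUS70 KWBC 021200"]:
        print(hdr, upperair_day_hour(hdr))
```

The second letter must be one of the listed characters, so a header such as "UOUS01" is rejected even though it starts with U.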
>> >> >
>> >> >The files are getting corrupted somehow.  GARP crashes when selecting
>> >> >the 09/1200, 13/1200, and 14/1200 date/times to plot.  When I run
>> >> >snlist with the following:
>> >> >
>> >> > SNFILE   = $UPA/20030814_upa.gem
>> >> > AREA     = dset
>> >> > DATTIM   = 0000
>> >> > SNPARM   = dset
>> >> > STNDEX   = SHOW
>> >> > LEVELS   = 500
>> >> > VCOORD   = PRES
>> >> > OUTPUT   = T
>> >> > MRGDAT   = NO
>> >> >
>> >> >The last few entries look like corrupted station info:
>> >> > STID =               STNM =    40811   TIME = 030814/0000         
>> >> > SLAT = ******     SLON = -9999.00   SELV = -9999.0
>> >> >
>> >> >    TTAA      0
>> >> >          PRES      TMPC      DWPC      DRCT      SPED      HGHT
>> >> >        994.00     36.40     13.40      0.00      0.00  -9999.00
>> >> >       1000.00  -9999.00  -9999.00  -9999.00  -9999.00    -34.00
>> >> >        850.00     35.60    -13.40  -9999.00  -9999.00   1454.00
>> >> >        700.00     19.40    -12.60  -9999.00  -9999.00   3165.00
>> >> >        500.00     -6.70    -13.70  -9999.00  -9999.00   5920.00
>> >> >        400.00    -14.10    -59.10  -9999.00  -9999.00   7640.00
>> >> >        300.00    -29.50    -69.50  -9999.00  -9999.00   9750.00
>> >> >
>> >> >    TTBB      0
>> >> >          PRES      TMPC      DWPC
>> >> >        994.00     36.40     13.40
>> >> >        982.00     39.80      2.80
>> >> >        964.00     39.20     -6.80
>> >> >        958.00     40.00     -9.00
>> >> >        941.00     40.80     -8.20
>> >> >        906.00     40.00     -9.00
>> >> >        668.00     15.60     -8.40
>> >> >        635.00     12.00      0.00
>> >> >        506.00     -5.90    -11.90
>> >> >        494.00     -7.70    -15.70
>> >> >        482.00     -7.50    -42.50
>> >> >        460.00     -8.10    -50.10
>> >> >        440.00    -10.30    -57.30
>> >> >        420.00    -12.90    -28.90
>> >> >        413.00    -13.90    -58.90
>> >> >        410.00    -13.90    -58.90
>> >> >        395.00    -14.30    -59.30
>> >> >        337.00    -24.50    -66.50
>> >> >        285.00    -31.70    -70.70
>> >> >        277.00    -33.30  -9999.00
>> >> >
>> >> >Setting DATTIM = 1200 and output = f//tmp/junk gives:
>> >> > GEMPAK-SNLIST>r
>> >> >
>> >> >*** TERMINATING  snlist
>> >> >*** Received signal 11 SIGSEGV
>> >> >Segmentation fault
>> >> >
>> >> >Is anyone else having these problems with dcuair, GARP, and/or snlist?
>> >> >
>> >> >David
>> >> >-- 
>> >> >
>> >> >David Ovens              e-mail: address@hidden
>> >> >(206) 685-8108          plan: Real-time MM5 forecasting for Pacific Northwest
>> >> >Research Meteorologist
>> >> >Dept of Atmospheric Sciences, Box 351640
>> >> >University of Washington 
>> >> >Seattle, WA  98195
>> >> >
>> >> 
>> >> ****************************************************************************
>> >> Unidata User Support                                    UCAR Unidata Program
>> >> (303)497-8643                                                  P.O. Box 3000
>> >> address@hidden                                   Boulder, CO 80307
>> >> ----------------------------------------------------------------------------
>> >> Unidata WWW Service              http://my.unidata.ucar.edu/content/support
>> >> ****************************************************************************
>> >> 
>> >
>> >
>> 
>> 
>
>
>-- 
>
>       David Ovens     
>
><<  End forwarded message
>
>
>-- 
> Dr. Harry Edmon                       E-MAIL: address@hidden
> 206-543-0547                          address@hidden
> Dept of Atmospheric Sciences          FAX:    206-543-0308
> University of Washington, Box 351640, Seattle, WA 98195-1640
>