
20040818: Gempak decoder crashing problem



Tom,

Are you using the Solaris binary distribution, or did you build locally?
I ask because the -O optimization may be affecting a local build.

I'll see if I can reproduce your problem with the 5.7.3 release
I'm working on.
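
In the meantime, here is a rough sketch (not part of GEMPAK or the LDM)
of one way to pair the 'terminated by signal' lines in ldmd.log with the
last entries the same PID wrote to the decoder log. The function names
and the regular expressions are assumptions based only on the log
excerpts quoted in this message; adjust them if your log lines differ.

```python
import re
from collections import defaultdict

# Assumed pqact format, e.g.:
#   Aug 12 18:04:05 vortex pqact[29049]: child 7437 terminated by signal 11
CRASH_RE = re.compile(r"child (\d+) terminated by signal (\d+)")

# Assumed decoder log format, e.g.:
#   [7437] 040812/1358 [DCGRIB -50] bulletin too long 112306 > 84306
DECODER_RE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+\[(\w+)\s+(-?\d+)\]\s*(.*)$")


def crashed_children(ldmd_lines):
    """Return {pid: signal} for each child pqact reports as killed by a signal."""
    crashes = {}
    for line in ldmd_lines:
        m = CRASH_RE.search(line)
        if m:
            crashes[int(m.group(1))] = int(m.group(2))
    return crashes


def decoder_entries(decoder_lines):
    """Group decoder log entries by PID as (timestamp, tag, code, message)."""
    by_pid = defaultdict(list)
    for line in decoder_lines:
        m = DECODER_RE.match(line.strip())
        if m:
            pid, stamp, tag, code, msg = m.groups()
            by_pid[int(pid)].append((stamp, tag, int(code), msg))
    return by_pid


def last_entries_before_crash(ldmd_lines, decoder_lines, n=3):
    """For each crashed PID, return the last n decoder log entries it wrote."""
    crashes = crashed_children(ldmd_lines)
    entries = decoder_entries(decoder_lines)
    return {pid: entries.get(pid, [])[-n:] for pid in crashes}
```

Given the lines of ldmd.log and dcgrib2_ocean.log, this would list, for
child 7437, the entries logged just before the crash, which could then be
compared with the instances that exit normally.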

Steve Chiswell
Unidata User Support




>From: Tom McDermott <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200408181620.i7IGKLaW014390

>Hi,
>
>I did a fresh install of GEMPAK5.7.2p2 on my Sun Solaris 8 sparc ldm
>server, and now I'm trying to track down some of the decoder
>crashing problems that I've had for ages.  I've had a couple of
>messages like these in my ldmd.log file:
>
>Aug 12 18:04:05 vortex pqact[29049]: pbuf_flush (5) write: Broken pipe
>Aug 12 18:04:05 vortex pqact[29049]: pbuf_flush 5: time elapsed   4.100739
>Aug 12 18:04:05 vortex pqact[29049]: pipe_dbufput:
>decoders/dcgrib2-ddata/gempak/logs/dcgrib2_ocean.log-eGEMTBL=/weather/GEMPAK5.7.2p2/gempak/tables
>write error
>Aug 12 18:04:05 vortex pqact[29049]: pipe_prodput: trying again
>Aug 12 18:04:05 vortex pqact[29049]: child 7437 terminated by signal 11
>
>Here are the corresponding entries for child 7437 in
>dcgrib2_ocean.log:
>
>[7437] 040812/1358 [DC 3]  Starting up. Version 5.7.2p2
>[7437] 040812/1358 [DC -11]  No command line arguments found.
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 112306 > 84306
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 114160 > 84306
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 102206 > 84306
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 144889 > 78160
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 88306 > 84306
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 122164 > 78160
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 88306 > 84306
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 141184 > 84306
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 119054 > 84306
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 103196 > 84306
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 191070 > 108898
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 93994 > 56636
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 109468 > 67604
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 157832 > 108898
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 137832 > 108898
>[7437] 040812/1404 [DCGRIB -50] bulletin too long 162084 > 108898
>[8458] 040812/1404 [DC 3]  Starting up. Version 5.7.2p2
>
>It appears that the 'bulletin too long' is not causing it to die, because
>other instances of dcgrib terminate normally after a whole bunch of those
>messages.
>
>From browsing your answers in the gempak support archives, I've looked at
>multiple invocations of the decoder writing to the same file, and this
>does not appear to be happening in the dcgrib2_ocean.log.  As to whether
>pqact is falling behind as the cause, I would have used the -USR2 signal
>to put ldmd.log into verbose mode, but these 'terminated by signal 11'
>messages are so sporadic that I could wait a couple of hours for another
>one while the log fills up.  However, I may have to do it
>eventually.  Recently I did put the log into verbose mode to diagnose
>another problem, and these were the 'delay' messages that occurred:
>
>Aug 11 18:20:41 vortex pqact[6831]: Delay: 0.0225 sec
>Aug 11 18:20:41 vortex pqact[6831]: Delay: 0.0383 sec
>
>Admittedly, I was not getting the signal 11 messages for the ocean files
>at the time, so it may not tell us much.
>
>I hope I've given you enough information to go on so you can make some
>suggestions. Thank you.
>
>Tom
>-----------------------------------------------------------------------------
>Tom McDermott                          Email: address@hidden
>Systems Administrator                  Phone: (585) 395-5718
>Earth Sciences Dept.                   Fax: (585) 395-2416
>SUNY College at Brockport
>
>
>
>From address@hidden  Thu Aug 12 15:09:19 2004
>Return-Path: <address@hidden>
>Received: from vortex.esc.brockport.edu (vortex.esc.brockport.edu [137.21.88.151])
>       by unidata.ucar.edu (UCAR/Unidata) with ESMTP id i7CL9IaW023531
>       for <address@hidden>; Thu, 12 Aug 2004 15:09:19 -0600 (MDT)
>Organization: UCAR/Unidata
>Keywords: 200408122109.i7CL9IaW023531
>Received: from cyclone (cyclone [137.21.88.157])
>       by vortex.esc.brockport.edu (8.11.7p1+Sun/8.8.8) with ESMTP id i7CL9Hg15875
>       for <address@hidden>; Thu, 12 Aug 2004 17:09:18 -0400 (EDT)
>Date: Thu, 12 Aug 2004 17:09:17 -0400 (EDT)
>From: Tom McDermott <address@hidden>
>X-X-Sender: <tmcderm@cyclone>
>To: <address@hidden>
>Subject: GEMPAK: minor decoder crashing problem
>Message-ID: <Pine.SOL.4.31.0408121603290.13187-100000@cyclone>
>MIME-Version: 1.0
>Content-Type: TEXT/PLAIN; charset=US-ASCII
>X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on 
>       laraine.unidata.ucar.edu
>X-Spam-Level: 
>X-Spam-Status: No, hits=-3.9 required=5.0 tests=AWL,BAYES_00 autolearn=ham 
>       version=2.63
>
>Steve,
>
>I did a fresh install of GEMPAK5.7.2p2 on my Sun Solaris 8 sparc ldm
>server yesterday, and now I'm trying to track down some of the decoder
>crashing problems that I've had for ages.  Today I've had a couple of
>messages like these in my ldmd.log file:
>
>Aug 12 18:04:05 vortex pqact[29049]: pbuf_flush (5) write: Broken pipe
>Aug 12 18:04:05 vortex pqact[29049]: pbuf_flush 5: time elapsed   4.100739
>Aug 12 18:04:05 vortex pqact[29049]: pipe_dbufput:
>decoders/dcgrib2-ddata/gempak/logs/dcgrib2_ocean.log-eGEMTBL=/weather/GEMPAK5.7.2p2/gempak/tables
>write error
>Aug 12 18:04:05 vortex pqact[29049]: pipe_prodput: trying again
>Aug 12 18:04:05 vortex pqact[29049]: child 7437 terminated by signal 11
>
>Here are the corresponding entries for child 7437 in
>dcgrib2_ocean.log:
>
>[7437] 040812/1358 [DC 3]  Starting up. Version 5.7.2p2
>[7437] 040812/1358 [DC -11]  No command line arguments found.
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 112306 > 84306
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 114160 > 84306
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 102206 > 84306
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 144889 > 78160
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 88306 > 84306
>[7437] 040812/1358 [DCGRIB -50] bulletin too long 122164 > 78160
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 88306 > 84306
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 141184 > 84306
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 119054 > 84306
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 103196 > 84306
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 191070 > 108898
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 93994 > 56636
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 109468 > 67604
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 157832 > 108898
>[7437] 040812/1359 [DCGRIB -50] bulletin too long 137832 > 108898
>[7437] 040812/1404 [DCGRIB -50] bulletin too long 162084 > 108898
>[8458] 040812/1404 [DC 3]  Starting up. Version 5.7.2p2
>
>It appears that the 'bulletin too long' is not causing it to die, because
>other instances of dcgrib terminate normally after a whole bunch of those
>messages.
>
>From browsing your answers in the gempak support archives, I've looked at
>multiple invocations of the decoder writing to the same file, and this
>does not appear to be happening in the dcgrib2_ocean.log.  As to whether
>pqact is falling behind as the cause, I would have used the -USR2 signal
>to put ldmd.log into verbose mode, but these 'terminated by signal 11'
>messages are so sporadic that I could wait a couple of hours for another
>one while the log fills up.  However, I may have to do it
>eventually.  Yesterday I did put the log into verbose mode to diagnose
>another problem, and these were the 'delay' messages that occurred:
>
>Aug 11 18:20:41 vortex pqact[6831]: Delay: 0.0225 sec
>Aug 11 18:20:41 vortex pqact[6831]: Delay: 0.0383 sec
>
>Admittedly, I was not getting the signal 11 messages for the ocean files
>at the time, so it may not tell us much.
>
>I hope I've given you enough information so you can make some suggestions.
>Thank you.
>
>Tom
>-----------------------------------------------------------------------------
>Tom McDermott                          Email: address@hidden
>Systems Administrator                  Phone: (585) 395-5718
>Earth Sciences Dept.                   Fax: (585) 395-2416
>SUNY College at Brockport
>
>
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303)497-8643                                                  P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata WWW Service              http://my.unidata.ucar.edu/content/support
----------------------------------------------------------------------------
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.