
Re: 20000509: LDM dumping core (fwd)



"Jason J. Levit" wrote:

>   Hi Anne,
>
>   After trying what you suggested above, I watched the output of LDM to
> determine the cause of our crashes...here's the last several lines from
> the output:
>
> Jun 06 15:14:59 stokes[1513477]: hereis:
> b2aad94bae282d3361a6cc196fa79470     3656 20000606150101.154 IDS|DDPLUS
> 534  ASUS43 KMKE 061500 /pSWRWI
>         SIGCONT
>         End of Queue
> Jun 06 15:14:59 stokes[1513477]: Growing data by 5275648
>         Mapping 197509120
> Jun 06 15:15:05 172.31.10.18[1524962]: FEEDME(172.31.10.18):
> h_clnt_create(172.31.10.18): Timed out while creating connection
>         SIGALRM
>         FEEDME(172.31.10.18): h_clnt_create(172.31.10.18): Timed out
> while creating connection
> Jun 06 15:16:13 172.24.10.66[1505970]: Growing data by 5275648
>         Mapping 197509120
>         SIGCHLD
> Jun 06 15:16:19 rpc.ldmd[1522902]: child 1513477 terminated by signal 4
> Jun 06 15:16:20 rpc.ldmd[1522902]: Killing (SIGINT) process group
> Jun 06 15:16:20 rpc.ldmd[1522902]: Interrupt
> Jun 06 15:16:20 rpc.ldmd[1522902]: Exiting
> Jun 06 15:16:20 172.31.10.18[1524962]: Interrupt
> Jun 06 15:16:20 172.31.10.18[1524962]: Exiting
> Jun 06 15:16:20 172.31.10.10[1526057]: Interrupt
> Jun 06 15:16:20 172.31.10.10[1526057]: Exiting
> Jun 06 15:16:20 172.24.10.2[1526312]: Interrupt
> Jun 06 15:16:20 172.24.10.2[1526312]: Exiting
> Jun 06 15:16:20 orion(feed)[1522322]: Interrupt
> Jun 06 15:16:20 orion(feed)[1522322]: Exiting
>         SIGCHLD
> Jun 06 15:16:20 rpc.ldmd[1522902]: Terminating process group
>         child 1522322 exited with status 0
>         child 1524962 exited with status 0
>         child 1526057 exited with status 0
>         child 1526312 exited with status 0
>         SIGCHLD
>         child 1524678 exited with status 0
> Jun 06 15:17:00 172.24.240.2[1526472]: Growing data by 5275648
>         Mapping 197509120
>         SIGCHLD
> Jun 06 15:17:00 rpc.ldmd[1522902]: child 1505970 terminated by signal 4
> Jun 06 15:17:00 rpc.ldmd[1522902]: Killing (SIGINT) process group
> Jun 06 15:17:00 172.24.10.34[1481064]: Growing data by 5275648
>         Mapping 197509120
>         SIGCHLD
> Jun 06 15:17:00 rpc.ldmd[1522902]: child 1526472 terminated by signal 9
>         SIGCHLD
> Jun 06 15:18:03 rpc.ldmd[1522902]: child 1481064 terminated by signal 11
> Jun 06 15:18:03 rpc.ldmd[1522902]: Killing (SIGINT) process group
>         SIGCHLD
> Jun 06 15:18:48 rpc.ldmd[1522902]: child 1517117 terminated by signal 11
> Jun 06 15:18:48 rpc.ldmd[1522902]: Killing (SIGINT) process group
>
>  ---- I can't see any recognizable errors that would kill LDM.  Just
> because a connection can't be made (time out) shouldn't kill LDM,
> right?  From time to time one of our feeds goes down and it never caused
> a problem before.
>
>   Can you see anything in the above output that would suggest why LDM is
> suddenly crashing?  Thanks for any help!
>
>   Jason
>
> --
> ----------------------------------------------------------------------------
> Jason J. Levit, N9MLA                       Research Scientist,
> address@hidden                  Center for Analysis and Prediction of Storms
> Room 1014                                  University of Oklahoma
> 405/325-3503                              http://www.caps.ou.edu/

Hi Jason,

Were you able to isolate the problem to a particular entry in your
pqact.conf file?  If so, please show me the entry.  If not, please send me
a copy of your pqact.conf file.

Also, how large is your queue?  Are you using the default queue size?
There can be a problem when the queue is growing, as it is on your machine
(see the "Growing data by 5275648" lines in your log), while pqexpire is
running at the same time.  The solution to this problem is to make the
queue larger.  Give that a try and let me know the results.
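
To make the pattern in a log like the one above easier to see, here is a
minimal Python sketch that tallies the "terminated by signal" lines and sums
the "Growing data by" amounts.  It is only an illustration, not part of LDM:
the function name summarize and the signal-number table are assumptions (4,
9 and 11 are SIGILL, SIGKILL and SIGSEGV on most Unix systems, but check
signal.h on your host), and the sample lines are copied from your output.

```python
import re
from collections import Counter

# Conventional signal numbers on most Unix systems. This table is an
# assumption; confirm against signal.h on the machine that wrote the log.
SIGNAL_NAMES = {4: "SIGILL", 9: "SIGKILL", 11: "SIGSEGV"}

TERMINATED = re.compile(r"child (\d+) terminated by signal (\d+)")
GROWING = re.compile(r"Growing data by (\d+)")


def summarize(lines):
    """Return (Counter of signal names, total bytes of reported growth)."""
    signals = Counter()
    grown = 0
    for line in lines:
        match = TERMINATED.search(line)
        if match:
            number = int(match.group(2))
            # Fall back to the raw number for signals not in the table.
            signals[SIGNAL_NAMES.get(number, f"signal {number}")] += 1
        match = GROWING.search(line)
        if match:
            grown += int(match.group(1))
    return signals, grown


# A few lines taken from the quoted log output.
SAMPLE = [
    "Jun 06 15:14:59 stokes[1513477]: Growing data by 5275648",
    "Jun 06 15:16:19 rpc.ldmd[1522902]: child 1513477 terminated by signal 4",
    "Jun 06 15:17:00 rpc.ldmd[1522902]: child 1526472 terminated by signal 9",
    "Jun 06 15:18:03 rpc.ldmd[1522902]: child 1481064 terminated by signal 11",
]

if __name__ == "__main__":
    signals, grown = summarize(SAMPLE)
    for name, count in sorted(signals.items()):
        print(f"{name}: {count}")
    print(f"total growth reported: {grown} bytes")
```

Run over a whole log file (for example, summarize(open(path)) with your own
path), this shows at a glance whether the children are dying from illegal
instructions or segmentation faults and how often the growth messages appear
near them.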

Anne

--
***************************************************
Anne Wilson                     UCAR Unidata Program
address@hidden                  P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************