
19990119: LDM crash problem



Kathy,

The product queue is corrupt, as the "Que corrupt" messages and the
pq.c assertion failures in your log show. You will need to delete the
current queue with "ldmadmin delqueue" and create a new queue with
"ldmadmin mkqueue" before restarting the LDM.
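
The steps above can be sketched as a small shell function. This is only
an illustration: it assumes you are logged in as the LDM user with
ldmadmin on your PATH, and that "ldmadmin start" is how you normally
start the LDM. The LDMADMIN variable and the rebuild_queue name are
hypothetical and exist only so the steps can be dry-run.

```shell
#!/bin/sh
# Sketch: replace a corrupt LDM product queue, then restart the LDM.
# Assumption: run as the LDM user, with ldmadmin on PATH.
# LDMADMIN is a hypothetical override; LDMADMIN=echo prints the steps only.
LDMADMIN="${LDMADMIN:-ldmadmin}"

rebuild_queue() {
    "$LDMADMIN" delqueue &&   # remove the corrupt queue
    "$LDMADMIN" mkqueue &&    # create a new, empty queue
    "$LDMADMIN" start         # start the LDM again
}
```

Calling rebuild_queue runs the three commands in order and stops at the
first one that fails. Any products still in the old queue are lost when
it is deleted.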

If the machine goes down unexpectedly while the LDM is still writing
data to the queue, ldm.pq can be corrupted. Since data is written to
the queue almost continuously, rebooting without first shutting down
the LDM is risky. The LDM guide shows an example boot-time start script
whose stop (kill) action has the system shut down the LDM as part of
an orderly shutdown. If the system is not shut down gracefully,
however, the queue can still be corrupted.
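
As a rough sketch of what such a boot script might look like (this is
not the script from the LDM guide), the following assumes the LDM runs
as user "ldm" and that ldmadmin lives in /usr/local/ldm/bin. Both
values, the RUNNER dry-run hook, and the ldm_boot name are assumptions
made for illustration; adjust them for your system.

```shell
#!/bin/sh
# Sketch of a SysV-style boot script for the LDM (illustrative only).
# Assumptions: LDM user "ldm", ldmadmin in /usr/local/ldm/bin.
LDM_USER="${LDM_USER:-ldm}"
LDMADMIN="${LDMADMIN:-/usr/local/ldm/bin/ldmadmin}"
# RUNNER is a hypothetical hook: RUNNER=echo prints the command
# instead of running it.
RUNNER="${RUNNER:-}"

ldm_boot() {
    case "$1" in
    start|stop)
        # Run ldmadmin as the LDM user, not as root.
        $RUNNER su - "$LDM_USER" -c "$LDMADMIN $1"
        ;;
    *)
        echo "Usage: ldm_boot {start|stop}" >&2
        return 1
        ;;
    esac
}

# In a real init script the last line would be:  ldm_boot "$1"
```

On a SysV-style system the script would typically be linked into the rc
directories twice, once as a start (S) link and once as a kill (K)
link, so that an orderly shutdown or reboot calls it with "stop" before
the machine goes down. It cannot help when the machine loses power or
is reset.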

Steve Chiswell
Unidata User Support


>From: address@hidden (Kathy Fryberger)
>Organization: .
>Keywords: 199901191754.KAA18822

>A non-authorized person this last weekend decided to boot our LDM server 
>machine.  When I tried to restart LDM, I receive the following error 
>messages in the ldm log file:
>  
>
>Jan 19 17:46:42 5Q:squall rpc.ldmd[4600]: Starting Up (built: Dec 15 1997 11:13:56)
>Jan 19 17:46:42 5Q:squall chinook[4605]: run_requester: Starting Up: chinook.unl.edu
>Jan 19 17:46:43 5Q:squall pqexpire[4601]: Starting Up
>Jan 19 17:46:43 3Q:squall pqexpire[4601]: assertion "status != NULL" failed: file "pq.c", line 3992
>Jan 19 17:46:43 5Q:squall pqexpire[4601]: Exiting
>Jan 19 17:46:43 5Q:squall pqexpire[4601]: > Up since:      19990119174643.143
>Jan 19 17:46:43 5Q:squall pqexpire[4601]: > Queue usage (bytes):121663488
>Jan 19 17:46:43 5Q:squall pqexpire[4601]: >          (nregions):   17049
>Jan 19 17:46:43 5Q:squall pqexpire[4601]: > nprods deleted 0
>Jan 19 17:46:43 5Q:squall pqact[4604]: Starting Up
>Jan 19 17:46:43 3Q:squall chinook[4605]: Que corrupt: tqe_find:tq: TS_NONE -1
>Jan 19 17:46:43 3Q:squall last message repeated 4 times
>Jan 19 17:46:43 3Q:squall chinook[4605]: Que corrupt: tq: 19990116152200.996 no data at 49562208
>Jan 19 17:46:43 3Q:squall chinook[4605]: Que corrupt: tqe_find:tq: TS_NONE -1
>Jan 19 17:46:43 3Q:squall last message repeated 2 times
>Jan 19 17:46:43 3Q:squall chinook[4605]: assertion "tvp->tv_sec >= TS_ZERO.tv_sec && tvp->tv_usec >= TS_ZERO.tv_usec && tvp->tv_sec <= TS_ENDT.tv_sec && tvp->tv_usec <= TS_ENDT.tv_usec" failed: file "pq.c", line 3659
>Jan 19 17:46:43 5Q:squall chinook[4605]: Exiting
>Jan 19 17:46:43 3Q:squall pqact[4604]: Que corrupt: tqe_find:tq: TS_NONE -1
>Jan 19 17:46:43 3Q:squall last message repeated 4 times
>Jan 19 17:46:43 3Q:squall pqact[4604]: Que corrupt: tq: 19990116152200.996 no data at 49562208
>Jan 19 17:46:43 3Q:squall pqact[4604]: Que corrupt: tqe_find:tq: TS_NONE -1
>Jan 19 17:46:43 3Q:squall last message repeated 2 times
>Jan 19 17:46:43 3Q:squall pqact[4604]: assertion "tvp->tv_sec >= TS_ZERO.tv_sec && tvp->tv_usec >= TS_ZERO.tv_usec && tvp->tv_sec <= TS_ENDT.tv_sec && tvp->tv_usec <= TS_ENDT.tv_usec" failed: file "pq.c", line 3659
>Jan 19 17:46:43 5Q:squall pqact[4604]: Exiting
>Jan 19 17:46:43 5Q:squall pqbinstats[4603]: Starting Up (4600)
>Jan 19 17:46:43 3Q:squall pqbinstats[4603]: Que corrupt: tqe_find:tq: TS_NONE -1
>Jan 19 17:46:43 3Q:squall last message repeated 4 times
>Jan 19 17:46:43 3Q:squall pqbinstats[4603]: Que corrupt: tq: 19990116152200.996 no data at 49562208
>Jan 19 17:46:43 3Q:squall pqbinstats[4603]: Que corrupt: tqe_find:tq: TS_NONE -1
>Jan 19 17:46:43 3Q:squall last message repeated 2 times
>Jan 19 17:46:43 3Q:squall pqbinstats[4603]: assertion "tvp->tv_sec >= TS_ZERO.tv_sec && tvp->tv_usec >= TS_ZERO.tv_usec && tvp->tv_sec <= TS_ENDT.tv_sec && tvp->tv_usec <= TS_ENDT.tv_usec" failed: file "pq.c", line 3659
>Jan 19 17:46:43 5Q:squall pqbinstats[4603]: Exiting
>Jan 19 17:46:45 5Q:squall udp.ldmd[4606]: Starting Up
>Jan 19 17:46:45 5Q:squall localhost[4612]: Connection from localhost
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: child 4605 terminated by signal 6
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: Killing (SIGINT) process group
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: Interrupt
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: Exiting
>Jan 19 17:46:45 5Q:squall udp.ldmd[4606]: Interrupt
>Jan 19 17:46:45 5Q:squall udp.ldmd[4606]: Exiting
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: Terminating process group
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: child 4603 terminated by signal 6
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: Killing (SIGINT) process group
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: child 4601 terminated by signal 6
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: Killing (SIGINT) process group
>Jan 19 17:46:45 5Q:squall rpc.ldmd[4600]: child 4602 terminated by signal 2
>Jan 19 17:46:45 5Q:squall localhost[4612]: Interrupt
>Jan 19 17:46:45 5Q:squall localhost[4612]: Exiting
>Jan 19 17:47:14 5Q:squall rpc.ldmd[4600]: child 4604 terminated by signal 6
>Jan 19 17:47:14 5Q:squall rpc.ldmd[4600]: Killing (SIGINT) process group
>
>What can I do to fix this problem and get LDM back up and running?
>  thanks!   kathy fryberger    address@hidden  605-399-1528
>             South Dakota School of Mines, etc.
>**********************************************************
>address@hidden  kathy fryberger  605-394-2291
>**********************************************************
>