
Re: 19991108: I need some assistance



Karli,

The log messages still say "Que corrupt:", so the product queue is still
damaged.  Go to the data directory and delete the ldm.pq file; it may not
actually have been deleted last time.  Also, the data directory needs to
be on a local disk drive.  If you are still having a problem, can I get a
login to the machine?
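
As a rough sketch of that cleanup step (not an official procedure): the
helper names remove_queue and show_filesystem below are hypothetical, not
LDM commands, and the code is written for a POSIX sh rather than the csh
shown in the session. It assumes the LDM has been stopped first and that
the queue is the ldm.pq file in the data directory.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the LDM distribution.

# Delete a product queue file and confirm that it is really gone.
# Stop the LDM before calling this so nothing has the queue open.
remove_queue() {
    pq="$1"
    if [ ! -e "$pq" ]; then
        echo "no queue file at $pq"
        return 0
    fi
    rm -f "$pq"
    if [ -e "$pq" ]; then
        echo "could not delete $pq" >&2
        return 1
    fi
    echo "deleted $pq"
}

# Print the filesystem that holds a directory. An NFS mount usually
# shows up as host:/path in the first column, a local disk as a device.
show_filesystem() {
    df -k "$1"
}
```

From the LDM home directory this might be used as `remove_queue data/ldm.pq`
followed by `show_filesystem data` to check that the data directory is not
on a network mount.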

Robb...



On Sun, 28 Nov 1999, McIDAS wrote:

> The machine's clock shows the correct time.  This is an Octane Machine
> with 128MB Ram running IRIX 6.4 and has two partitions with 1.3GB and
> 5.0GB free respectively.  It is running McIDAS 7.5, ldm-5.0.5 and
> ldm-mcidas-7.1.1 (or 7.1.3 if it had been configured correctly).
> 
> After commenting out the line 'exec "pqact"' I got this output:
> -----------------------------------------------------------------------
> ldm@breeze 45% alias sverb   "bin/rpc.ldmd -vl - etc/ldmd.conf"
> ldm@breeze 46% sverb
> Nov 28 19:01:34 rpc.ldmd[5644]: Starting Up (built: Aug 22 1997 12:07:40)
> Nov 28 19:01:34 aqua[5646]: run_requester: Starting Up: aqua.atmos.uah.edu
> Nov 28 19:01:34 striker[5593]: run_requester: Starting Up: striker.atmos.albany.edu
> Nov 28 19:01:35 udp.ldmd[5647]: Starting Up
> Nov 28 19:01:59 aqua[5646]: lastmatch: c9896c74abb279dea769a3a091a1b891    44766 19991128180616.338  MCIDAS 000  LWTOA3 205 DIALPROD=U3 99332 180612
> Nov 28 19:01:59 aqua[5646]: run_requester: 19991128180616.338 TS_ENDT {{FSL2|MCIDAS,  ".*"}}
> Nov 28 19:01:59 striker[5593]: lastmatch: b9144b67fb6c5fd60c2fb5938b418cef 84 19991128185500.141    NLDN 000  99332184853
> Nov 28 19:01:59 striker[5593]: run_requester: 19991128185500.141 TS_ENDT {{NLDN,  ".*"}}
> Nov 28 19:01:59 striker[5593]: FEEDME(striker.atmos.albany.edu): OK
> Nov 28 19:01:59 aqua[5646]: FEEDME(aqua.atmos.uah.edu): reclass: 19991128180616.338 TS_ENDT {{MCIDAS,  ".*"}}
> Nov 28 19:01:59 striker[5593]: hereis: dup:       84 19991128185500.141    NLDN 000  99332184853
> Nov 28 19:01:59 aqua[5646]: FEEDME(aqua.atmos.uah.edu): OK
> Nov 28 19:02:00 striker[5593]: Que corrupt: ftbl
> Nov 28 19:02:00 striker[5593]:       84 19991128190100.696    NLDN 000  99332185459
> Nov 28 19:02:00 aqua[5646]: dup    :    44766 19991128180616.338  MCIDAS 000  LWTOA3 205 DIALPROD=U3 99332 180612
> Nov 28 19:02:03 aqua[5646]:   189889 19991128181050.145  MCIDAS 000  LWTOA3 193 DIALPROD=U1 99332 181048
> Nov 28 19:02:03 aqua[5646]: assertion "rp->prev == OFF_NONE" failed: file "pq.c", line 678
> Nov 28 19:02:05 rpc.ldmd[5644]: child 5648 terminated by signal 6
> Nov 28 19:02:05 rpc.ldmd[5644]: Killing (SIGINT) process group
> Nov 28 19:02:05 rpc.ldmd[5644]: Interrupt
> Nov 28 19:02:05 rpc.ldmd[5644]: Exiting
> Nov 28 19:02:05 striker[5593]: Interrupt
> Nov 28 19:02:05 striker[5593]: Exiting
> Nov 28 19:02:05 udp.ldmd[5647]: Interrupt
> Nov 28 19:02:05 udp.ldmd[5647]: Exiting
> Nov 28 19:02:06 rpc.ldmd[5644]: Terminating process group
> Nov 28 19:02:29 rpc.ldmd[5644]: child 5646 terminated by signal 6
> Nov 28 19:02:29 rpc.ldmd[5644]: Killing (SIGINT) process group
> -----------------------------------------------------------------------
> after eliminating all requests I still had the same problem:
> -----------------------------------------------------------------------
> 
> ldm@breeze 56% !s
> sverb
> Nov 28 19:31:04 rpc.ldmd[5454]: Starting Up (built: Aug 22 1997 12:07:40)
> Nov 28 19:31:05 udp.ldmd[5767]: Starting Up
> Nov 28 19:31:45 rpc.ldmd[5454]: child 5717 terminated by signal 6
> Nov 28 19:31:45 rpc.ldmd[5454]: Killing (SIGINT) process group
> Nov 28 19:31:45 rpc.ldmd[5454]: Interrupt
> Nov 28 19:31:45 rpc.ldmd[5454]: Exiting
> Nov 28 19:31:45 udp.ldmd[5767]: Interrupt
> Nov 28 19:31:45 udp.ldmd[5767]: Exiting
> Nov 28 19:31:45 rpc.ldmd[5454]: Terminating process group
> Nov 28 19:31:45 rpc.ldmd[5454]: child 5791 terminated by signal 15
> -----------------------------------------------------------------------
> and this is what top was showing me while I ran LDM without any
> requests:
> -----------------------------------------------------------------------
> IRIX64 breeze 6.4 02121744 IP30 Load[0.00,0.07,0.08] 20:08:28   50 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>    karli  6011  6011   0.37    0   20   115    75    0:00 top
>     root  1038  1038   0.06    *   20   140    60   10:30 mediad
>     root  1117  1112   0.03    *   20  1078    34    8:25 clogin
>     root  1102  1102   0.03    *   20   879    96    6:49 Xsgi
>     root  5708   171   0.01    *   20   111    53    0:00 telnetd
>     root   261   261   0.01    *   +0   121   121    2:36 xntpd
>     root  1039   171   0.01    *   20   120    45    0:50 fam
> 
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.13,0.09,0.09] 20:08:33   60 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>      ldm  5997  5950  11.54    *   20  6353   780    0:00 pqexpire
>      ldm  6014  5950   3.67    *   20   287    61    0:00 dmmisc.k
>      ldm  6021  5950   3.48    *   20   283    59    0:00 dmsyn.k
>      ldm  6022  5950   3.36    *   20   271    58    0:00 dmraob.k
>    karli  6011  6011   3.20    0   20   116    76    0:00 top
>      ldm  6030  5950   2.11    *   20   279    59    0:00 dmsfc.k
>     root  5708   171   0.12    *   20   111    53    0:00 telnetd
>     root  1102  1102   0.06    *   20   879    96    6:49 Xsgi
>     root  1117  1112   0.06    *   20  1078    34    8:25 clogin
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.13,0.09,0.09] 20:08:35   60 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>      ldm  5997  5950   7.84    *   20  6353  1212    0:00 pqexpire
>    karli  6011  6011   1.11    0   20   116    76    0:00 top
>     root  1117  1112   0.07    *   20  1078    34    8:25 clogin
>     root  5708   171   0.07    *   20   111    53    0:00 telnetd
>     root  1102  1102   0.05    *   20   879    96    6:49 Xsgi
>     root   261   261   0.02    *   +0   121   121    2:36 xntpd
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.18,0.11,0.10] 20:08:38   60 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>      ldm  5997  5950  10.93    *   20  6353  2192    0:00 pqexpire
>    karli  6011  6011   1.20    0   20   116    76    0:00 top
>     root  1117  1112   0.07    *   20  1078    34    8:25 clogin
>     root  1102  1102   0.06    *   20   879    96    6:49 Xsgi
>     root  5708   171   0.03    *   20   111    53    0:00 telnetd
>      ldm  6005  5950   0.02    *   20  6364    30    0:00 rpc.ldmd
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.18,0.11,0.10] 20:08:39   60 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>      ldm  5997  5950   4.14    *   20  6353  2338    0:00 pqexpire
>    karli  6011  6011   0.97    0   20   116    76    0:00 top
>     root  1038  1038   0.15    *   20   140    60   10:30 mediad
>     root  5708   171   0.06    *   20   111    53    0:00 telnetd
>     root  1117  1112   0.06    *   20  1078    34    8:25 clogin
>     root  1102  1102   0.05    *   20   879    96    6:49 Xsgi
>     root  1039   171   0.03    *   20   120    45    0:50 fam
>     root   261   261   0.02    *   +0   121   121    2:36 xntpd
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.24,0.12,0.10] 20:08:43   60 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>    karli  6011  6011   1.76    0   20   116    76    0:00 top
>     root  1117  1112   0.05    *   20  1078    34    8:25 clogin
>     root  5708   171   0.05    *   20   111    53    0:00 telnetd
>     root  1102  1102   0.04    *   20   879    96    6:49 Xsgi
>      ldm  5950  5950   0.02    *   20  6363    31    0:00 rpc.ldmd
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.29,0.13,0.10] 20:08:46   60 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>      ldm  5997  5950   6.30    *   20  6353  4050    0:01 pqexpire
>    karli  6011  6011   0.95    0   20   116    76    0:00 top
>     root   816   816   0.74    *   20   120    58    2:10 sendmail
>     root  1117  1112   0.06    *   20  1078    34    8:25 clogin
>     root  5708   171   0.05    *   20   111    53    0:00 telnetd
>     root  1102  1102   0.05    *   20   879    96    6:49 Xsgi
>     root   261   261   0.02    *   +0   121   121    2:36 xntpd
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.33,0.14,0.11] 20:08:50   60 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>      ldm  5997  5950  10.59    *   20  6353  4457    0:01 pqexpire
>    karli  6011  6011   0.75    0   20   116    70    0:00 top
>     root  1117  1112   0.07    *   20  1078    23    8:25 clogin
>     root  5708   171   0.06    *   20   111    51    0:00 telnetd
>     root  1102  1102   0.06    *   20   879    88    6:49 Xsgi
>      ldm  5950  5950   0.02    *   20  6363    23    0:00 rpc.ldmd
>     root   261   261   0.01    *   +0   121   121    2:36 xntpd
>      ldm  6005  5950   0.01    *   20  6364    23    0:00 rpc.ldmd
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.37,0.15,0.11] 20:08:54   60 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>      ldm  5997  5950  10.17    *   20  6353  4860    0:01 pqexpire
>    karli  6011  6011   0.91    0   20   116    70    0:00 top
>     root  1038  1038   0.32    *   20   140    54   10:30 mediad
>     root  1102  1102   0.13    *   20   879    85    6:49 Xsgi
>     root   261   261   0.11    *   +0   121   121    2:36 xntpd
>     root  1117  1112   0.11    *   20  1078    21    8:25 clogin
>     root  5708   171   0.07    *   20   111    50    0:00 telnetd
> 
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.37,0.15,0.11] 20:08:56   53 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>    karli  6011  6011   0.88    0   20   116    70    0:00 top
>      ldm  5950  5950   0.24    *   20  6363    38    0:00 rpc.ldmd
>     root  1117  1112   0.14    *   20  1078    32    8:25 clogin
>     root  1102  1102   0.13    *   20   879    91    6:49 Xsgi
>      ldm  6024  5950   0.09    *   20   222    40    0:00 startxcd.k
>     root  5437   171   0.09    *   20   111    49    0:00 telnetd
>     root  5708   171   0.05    *   20   111    50    0:00 telnetd
>      ldm  6005  5950   0.04    *   20  6364    35    0:00 rpc.ldmd
>     root    77     0   0.02    *   20    96    43    0:02 syslogd
>     root   165     0   0.02    *   20    92    42    0:07 portmap
>     root   261   261   0.02    *   +0   121   121    2:36 xntpd
> 
> IRIX64 breeze 6.4 02121744 IP30 Load[0.87,0.26,0.15] 20:08:57   50 procs
>     user   pid  pgrp   %cpu proc  pri  size   rss    time command
>    karli  6011  6011   0.96    0   20   116    70    0:00 top
>     root     1     0   0.59    *   20    26    18    0:27 init
>     root  1038  1038   0.15    *   20   140    60   10:30 mediad
>     root    77     0   0.13    *   20    96    50    0:02 syslogd
>     root   165     0   0.12    *   20    92    46    0:07 portmap
>     root  1039   171   0.11    *   20   120    45    0:50 fam
>     root  5708   171   0.06    *   20   111    50    0:00 telnetd
>     root  5437   171   0.06    *   20   111    49    0:00 telnetd
>      ldm  5463  5463   0.05    *   20    36    16    0:00 csh
>     root  1117  1112   0.05    *   20  1078    32    8:25 clogin
>     root  1102  1102   0.04    *   20   879    91    6:49 Xsgi
>     root   261   261   0.01    *   +0   121   121    2:36 xntpd
> -----------------------------------------------------------------------
> Karli Lopez
> 
> 
> Robb Kambic wrote:
> > 
> > On Mon, 22 Nov 1999, McIDAS wrote:
> > 
> > > Rob,
> > > thanks for the tip.  Executing the command yielded some pretty
> > > interesting output:
> > >
> > > ---------------------------------------------------------------------
> > > ldm@breeze 1% bin/rpc.ldmd -vl - etc/ldmd.conf
> > > Nov 22 19:59:03 rpc.ldmd[21390]: Starting Up (built: Aug 22 1997 12:07:40)
> > > Nov 22 19:59:03 aqua[21329]: run_requester: Starting Up: aqua.atmos.uah.edu
> > > Nov 22 19:59:03 striker[21395]: run_requester: Starting Up: striker.atmos.albany.edu
> > > Nov 22 19:59:04 udp.ldmd[21382]: Starting Up
> > > Nov 22 19:59:30 aqua[21329]: pq_sequence: xdr_prod_info() failed
> > > Nov 22 19:59:30 striker[21395]: pq_sequence: xdr_prod_info() failed
> > > Nov 22 19:59:30 aqua[21329]: pq_last: seq:I/O error (errno = 5)
> > > Nov 22 19:59:30 aqua[21329]: run_requester: 19991122185903.945 TS_ENDT {{UNIDATA,  ".*"},{FSL2|MCIDAS,  ".*"}}
> > > Nov 22 19:59:30 striker[21395]: pq_last: seq:I/O error (errno = 5)
> > > Nov 22 19:59:30 striker[21395]: run_requester: 19991122185903.951 TS_ENDT {{NLDN,  ".*"}}
> > 
> > Karli,
> > 
> > The first thing to check is that your machine time is correct. Also,
> > comment out the "exec pqact ...." line in your ldmd.conf file.  I would
> > also comment out the other request lines in the ldmd.conf until it runs
> > correctly.  What type of machine is this?  What's the output of top?
> > 
> > Robb...
> > 
> > > Nov 22 19:59:36 rpc.ldmd[21390]: child 21416 terminated by signal 6
> > > Nov 22 19:59:36 rpc.ldmd[21390]: Killing (SIGINT) process group
> > > Nov 22 19:59:36 rpc.ldmd[21390]: Interrupt
> > > Nov 22 19:59:36 rpc.ldmd[21390]: Exiting
> > > Nov 22 19:59:36 striker[21395]: Interrupt
> > > Nov 22 19:59:36 striker[21395]: Exiting
> > > Nov 22 19:59:36 aqua[21329]: Interrupt
> > > Nov 22 19:59:36 aqua[21329]: Exiting
> > > Nov 22 19:59:36 udp.ldmd[21382]: Interrupt
> > > Nov 22 19:59:36 udp.ldmd[21382]: Exiting
> > > Nov 22 19:59:36 rpc.ldmd[21390]: Terminating process group
> > > ldm@breeze 2%
> > >
> > > ---------------------------------------------------------------------
> > > I got this output in less than a minute.  My guess is that the data
> > > stream is failing (but this wouldn't cause it to die) or something is
> > > externally killing it.
> > > Karli
> > >
> > > Robb Kambic wrote:
> > > >
> > > > Karli,
> > > >
> > > > Run the LDM from its home directory on the command line with the
> > > > log messages sent to the screen, i.e.
> > > >
> > > > % bin/rpc.ldmd -vl - etc/ldmd.conf
> > > >
> > > > This should give us a clue of the problem.
> > > >
> > > > Robb...
> 
> ===============================================================================
> > Robb Kambic                                Unidata Program Center
> > Software Engineer III                      Univ. Corp for Atmospheric Research
> > address@hidden                   WWW: http://www.unidata.ucar.edu/
> > ===============================================================================
> 
> -- 
> 
> ====================================================================
> Amos Winter                                  address@hidden
> Director
> Puerto Rico Climatology Center 
> P.O. Box 9013                           
> Department of Marine Sciences                  phone: (787) 265-5416    
> University of Puerto Rico - Mayaguez             fax: (787) 265-2195
> Mayaguez, PR 00681-9013
> 

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================