
Re: 19990510: Any known problems with ldm-5.0.5 and irix 6.5?



Pete,

I would try to rule out some of the variables first:

- Make your queue 100 MB.
- Reboot the machine.

If that doesn't help, start eliminating the sites you're feeding, pqact,
pqbinstats, etc. If you still have problems running a bare-bones setup, let
me know. Also, has anything changed recently on the machine? New OS,
patches, etc.? We are running Irix 6.5 with no problems. In fact, thelma
is an SGI running 6.5.

Robb...



On Mon, 10 May 1999, Unidata Support wrote:

> >To: address@hidden
> >From: address@hidden (Pete Pokrandt)
> >Subject: Any known problems with ldm-5.0.5 and irix 6.5?
> >Organization: .
> >Keywords: 199905101718.LAA06627
> 
> 
> Hi,
> 
> I've recently been having some fairly regular crashes with our ldm
> running on an Irix 6.5 machine.
> 
> As far as I can diagnose it, the queue is getting corrupted somehow,
> and the corruption affects both rpc.ldmd and pqexpire.
> 
> I don't know if this is a symptom of some flaky hardware (though I'm
> skeptical of that, since the LDM is the only program having
> problems, and no other memory or hardware errors are turning up in the log
> files) or if there is something strange about the program's behavior that
> we are running into.
> 
> Our queue is currently set at 80 MB. Perhaps that is too small, and
> growing it is what gets us into trouble?
> 
> Thanks for any info (or sympathy!) you can provide.
> 
> Pete Pokrandt
> 
> From the ldmd logs, and from dbx'ing the programs and core files, I've
> been able to get the following info, if that helps at all to diagnose
> what's happening:
> 
> This was a crash resulting from pqexpire:
> 
> from ldmd.log:
> 
> May 09 20:25:07 3Q:sunset pqexpire[223950]: Que corrupt: pq_seqdel: 19990509191245.397 no signature at 25009864
> May 09 20:25:08 3Q:sunset pqexpire[223950]: assertion "nr->prev == rp->offset" failed: file "pq.c", line 703
> May 09 20:26:15 5Q:sunset 192.52.106.21[224372]: Connection reset by peer
> May 09 20:26:15 5Q:sunset thelma[202854]: Connection reset by peer
> May 09 20:26:15 5Q:sunset 192.52.106.21[224372]: Disconnect
> May 09 20:26:15 5Q:sunset thelma[202854]: Disconnect
> May 09 20:26:15 5Q:sunset unidata[224182]: Connection reset by peer
> May 09 20:26:15 5Q:sunset unidata[224182]: Disconnect
> May 09 20:26:21 5Q:sunset rpc.ldmd[226910]: child 223950 terminated by signal 6
> May 09 20:26:21 5Q:sunset rpc.ldmd[226910]: Killing (SIGINT) process group
> May 09 20:26:21 5Q:sunset rpc.ldmd[226910]: Interrupt
> May 09 20:26:21 5Q:sunset cirrus(feed)[217703]: Interrupt
> May 09 20:26:21 5Q:sunset mcidas(feed)[215432]: Interrupt
> May 09 20:26:21 5Q:sunset mapmaker(feed)[222081]: Interrupt
> May 09 20:26:21 5Q:sunset cirrus(feed)[193599]: Interrupt
> May 09 20:26:21 5Q:sunset cirrus(feed)[223568]: Interrupt
> May 09 20:26:21 5Q:sunset findeisen(feed)[225984]: Interrupt
> May 09 20:26:21 5Q:sunset udp.ldmd[70892]: Interrupt
> May 09 20:26:21 5Q:sunset unidata[224182]: Interrupt
> May 09 20:26:21 5Q:sunset striker[225282]: Interrupt
> May 09 20:26:21 5Q:sunset 192.52.106.21[224372]: Interrupt
> May 09 20:26:21 5Q:sunset thelma[202854]: Interrupt
> May 09 20:26:21 5Q:sunset pqact[199571]: Interrupt
> May 09 20:26:21 5Q:sunset pqbinstats[224599]: Interrupt
> May 09 20:26:21 5Q:sunset rpc.ldmd[226910]: Exiting
> May 09 20:26:21 5Q:sunset cirrus(feed)[217703]: Exiting
> May 09 20:26:21 5Q:sunset mapmaker(feed)[222081]: Exiting
> May 09 20:26:21 5Q:sunset cirrus(feed)[193599]: Exiting
> May 09 20:26:21 5Q:sunset ProfHorn(feed)[203100]: Interrupt
> May 09 20:26:21 5Q:sunset cirrus(feed)[223568]: Exiting
> May 09 20:26:21 5Q:sunset findeisen(feed)[225984]: Exiting
> May 09 20:26:21 5Q:sunset udp.ldmd[70892]: Exiting
> May 09 20:26:21 5Q:sunset unidata[224182]: Exiting
> May 09 20:26:21 5Q:sunset striker[225282]: Exiting
> May 09 20:26:21 5Q:sunset 192.52.106.21[224372]: Exiting
> May 09 20:26:21 5Q:sunset thelma[202854]: Exiting
> May 09 20:26:21 5Q:sunset pqact[199571]: Exiting
> May 09 20:26:21 5Q:sunset ProfHorn(feed)[203100]: Exiting
> May 09 20:26:21 5Q:sunset mcidas(feed)[215432]: Exiting
> May 09 20:26:21 5Q:sunset pqbinstats[224599]: Exiting
> May 09 20:26:21 5Q:sunset rpc.ldmd[226910]: Terminating process group
> 
> 
> 
> sunset 2% ls -l core
> -rw-r--r--    1 ldm      user     127809192 May  9 15:26 core
> 
> sunset 3% file core
> core:           IRIX N32 core dump of 'pqexpire'
> 
> sunset 4% dbx ~/bin/pqexpire core
> dbx version 7.2.1 patch 2991 May 14 1998 17:09:10
> Core from signal SIGABRT: Abort (see abort(3c))
> (dbx) where
> >  0 _kill(0x36ace, 0x6, 0x0, 0x0, 0x0, 0x200000, 0x3, 0x200e70) ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/signal/kill.s":15, 0xfac1334]
>    1 _raise(0x36ace, 0x6, 0x0, 0x0, 0x0, 0x200000, 0x3, 0x200e70) ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/signal/raise.c":27, 0xfac1c28]
>    2 abort(0x36ace, 0x6, 0x0, 0x0, 0x0, 0x200000, 0x3, 0x200e70) ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/gen/abort.c":52, 0xfa37038]
>    3 _uassert(ex = 0x10028880 = "nr->prev == rp->offset", file = 0x1002a880 = "pq.c", line = 703) ["/usr/local/ldm/ldm-5.0.5/src/ulog/ulog.c":816, 0x100106f8]
>    4 bin_hashunlink(binp = 0x4000050, rlp = 0xb869000, rp = 0xb887718) ["/usr/local/ldm/ldm-5.0.5/src/pq/pq.c":703, 0x10005804]
>    5 ftbl_hashunlink(htp = 0x4000040, rlp = 0xb869000, rp = 0xb887718) ["/usr/local/ldm/ldm-5.0.5/src/pq/pq.c":720, 0x1000588c]
>    6 consolidate(htp = 0x4000040, rlp = 0xb869000, rp = 0xb887738) ["/usr/local/ldm/ldm-5.0.5/src/pq/pq.c":786, 0x10005b2c]
>    7 ftbl_rp_delete(htp = 0x4000040, rlp = 0xb869000, rp = 0xb887738) ["/usr/local/ldm/ldm-5.0.5/src/pq/pq.c":815, 0x10005c7c]
>    8 pq_seqdel(pq = 0x1002b448, mt = TV_GT=1, clss = 0x7fff2db0, wait = 0, extentp = 0x7fff2df0, timestampp = 0x7fff2df8) ["/usr/local/ldm/ldm-5.0.5/src/pq/pq.c":4071, 0x1000eb68]
>    9 main(ac = 1, av = 0x7fff2e74) ["/usr/local/ldm/ldm-5.0.5/src/pqexpire/pqexpire.c":408, 0x10003f20]
>    10 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/csu/crt1text.s":177, 0x1000314c]
> (dbx)
> 
> 
> This was a crash of rpc.ldmd about 20 minutes ago:
> 
> From ldmd.log:
> 
> 
> May 10 16:50:10 5Q:sunset rpc.ldmd[240353]: Terminating process group
> May 10 16:51:30 5Q:sunset pqexpire[243592]: Interrupt
> May 10 16:51:30 5Q:sunset pqexpire[243592]: Exiting
> May 10 16:51:30 3Q:sunset striker[238358]: Que corrupt: ftbl
> May 10 16:51:30 5Q:sunset striker[238358]: Growing data by 2813952
> May 10 16:51:31 5Q:sunset rpc.ldmd[240353]: child 231729 terminated by signal 6
> May 10 16:51:31 5Q:sunset rpc.ldmd[240353]: Killing (SIGINT) process group
> May 10 16:51:31 5Q:sunset pqexpire[243592]: > Up since:      19990510134708.877
> May 10 16:51:40 3Q:sunset striker[238358]: assertion "rp->offset + Extent(rp) == offset" failed: file "pq.c", line 2854
> May 10 16:53:20 5Q:sunset thelma[94487]: Interrupt
> May 10 16:53:20 5Q:sunset thelma[94487]: Exiting
> May 10 16:53:20 5Q:sunset pqexpire[243592]: Interrupt
> May 10 16:53:27 5Q:sunset rpc.ldmd[240353]: child 238358 terminated by signal 6
> May 10 16:53:27 5Q:sunset rpc.ldmd[240353]: Killing (SIGINT) process group
> 
> sunset 27% dbx ~/bin/rpc.ldmd core
> dbx version 7.2.1 patch 2991 May 14 1998 17:09:10
> Core from signal SIGABRT: Abort (see abort(3c))
> (dbx) where
> >  0 _kill(0x3a316, 0x6, 0x0, 0x0, 0x0, 0x200000, 0x2, 0x200e70) ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/signal/kill.s":15, 0xfac1334]
>    1 _raise(0x3a316, 0x6, 0x0, 0x0, 0x0, 0x200000, 0x2, 0x200e70) ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/signal/raise.c":27, 0xfac1c28]
>    2 abort(0x3a316, 0x6, 0x0, 0x0, 0x0, 0x200000, 0x2, 0x200e70) ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/gen/abort.c":52, 0xfa37038]
>    3 _uassert(ex = 0x10046348 = "rp->offset + Extent(rp) == offset", file = 0x1004b0e0 = "pq.c", line = 2854) ["/usr/local/ldm/ldm-5.0.5/src/ulog/ulog.c":816, 0x1002516c]
>    4 data_grow(pq = 0x1004e4d0, extent = 13720, rgnpp = 0x7fff2454) ["/usr/local/ldm/ldm-5.0.5/src/pq/pq.c":2854, 0x1001ef94]
>    5 rpqe_new(pq = 0x1004e4d0, extent = 13720, sxi = 0x1005da88 = "\201^\vmW9k\r@\222\246.\260\003\335\252\020\005\022\330", vpp = 0x7fff24a4, sxepp = 0x7fff24ac) ["/usr/local/ldm/ldm-5.0.5/src/pq/pq.c":3092, 0x1001f96c]
>    6 pqe_new(pq = 0x1004e4d0, infop = 0x1005da80, ptrp = 0x1004af20, indexp = 0x1004a610) ["/usr/local/ldm/ldm-5.0.5/src/pq/pq.c":3380, 0x10020560]
>    7 comingsoon_5_svc(argsp = 0x7fff2560, rqstp = 0x7fff2b30) ["/usr/local/ldm/ldm-5.0.5/src/server/svc.c":329, 0x10013834]
>    8 ldmprog_5(rqstp = 0x7fff2b30, transp = 0x10056a20) ["/usr/local/ldm/ldm-5.0.5/src/protocol/ldm_svc.c":86, 0x100301ac]
>    9 _svc_getreqset(0x3a316, 0x6, 0x0, 0x7fff2970, 0x0, 0x200000, 0x2, 0x200e70) ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/rpc/svc.c":399, 0xfabbc28]
>    10 one_svc_run(xp_sock = 3, inactive_timeo = 720, donep = 0x1004c264) ["/usr/local/ldm/ldm-5.0.5/src/protocol/ldmclnt.c":242, 0x1001621c]
>    11 forn(proc = 4, remote = 0x100512b0 = "striker.atmos.albany.edu.", clsspp = 0x1004c388, rpctimeo = 60, interval = 30, inactive_timeo = 720, donep = 0x1004c264, dispatch = 0x1002ffac) ["/usr/local/ldm/ldm-5.0.5/src/protocol/ldmclnt.c":479, 0x10016a28]
>    12 prog_requester(source = 0x100512b0 = "striker.atmos.albany.edu.", clssp = 0x10052870) ["/usr/local/ldm/ldm-5.0.5/src/server/acl.c":658, 0x10007990]
>    13 run_requester(source = 0x100512b0 = "striker.atmos.albany.edu.", clssp = 0x10052870) ["/usr/local/ldm/ldm-5.0.5/src/server/acl.c":700, 0x10007afc]
>    14 new_requester(source = 0x1004f860 = "striker.atmos.albany.edu.", clssp = 0x10052870) ["/usr/local/ldm/ldm-5.0.5/src/server/acl.c":726, 0x10007bfc]
>    15 requester_add(source = 0x1004f860 = "striker.atmos.albany.edu.", clssp = 0x10052870) ["/usr/local/ldm/ldm-5.0.5/src/server/acl.c":740, 0x10007c6c]
>    16 invert_request_acl() ["/usr/local/ldm/ldm-5.0.5/src/server/acl.c":796, 0x10007eb4]
>    17 read_conf(conf_path = 0x7fff3027 = "/usr/local/ldm/etc/ldmd.conf") ["/usr/local/ldm/ldm-5.0.5/src/server/conf.y":180, 0x1000a314]
>    18 main(ac = 4, av = 0x7fff2e64) ["/usr/local/ldm/ldm-5.0.5/src/server/ldmd.c":1035, 0x1000e77c]
>    19 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/csu/crt1text.s":177, 0x10005d4c]
> 
> 
> --
> +>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+
> ^ Pete Pokrandt                    V 1515  AOSS Bldg  1225 W Dayton St^
> ^ Systems Programmer               V Madison,         WI     53706    ^
> ^                                  V      address@hidden       ^
> ^ Dept of Atmos & Oceanic Sciences V (608) 262-3086 (W)  222-0919 (H) ^
> ^ University of Wisconsin-Madison  V       262-0166 (Fax)262-3086 (VM)^
> +<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+
> 

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================