
[LDM #VXR-367622]: ldm segfault



Hi Karen!

> I have an unusual situation here.  Something I've not seen before anyway...
> 
> I have 6 ldm servers right now.  Three sets of redundant pairs with
> different data on them.  Two are older hardware, while 4 are relatively
> new hardware.  All are running 6.8.1 on RedHat Linux 5 and all are
> running the queue out of RAM disk.  They've been running since October
> with no problems.
> 
> This past weekend I had a very unusual occurrence.  Two of the servers
> (with the newer hardware) that have duplicate feeds, both had rpc.ldmd
> segfault within 1 minute of each other.

Were the two servers also feeding from each other?

> All ldm processes exited and
> the queues were not zeroed out.  I ran ldmadmin clean, remade my queues
> and restarted and everything was good until Monday.  Monday morning one
> of the servers segfaulted, and the other followed suit, but not until
> several hours later.   Surprisingly I didn't get any core dumps,
> although I'm not entirely sure why at this time.
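
One possible reason for the missing core dumps is a core-file size limit
of zero in the environment that started the LDM. This is only a guess;
Linux can also suppress core files for setuid programs, and the process's
working directory must be writable. A minimal sketch for checking the
limit, assuming Python is available on the servers (the helper name
core_dumps_enabled is hypothetical, not part of the LDM):

```python
import resource


def core_dumps_enabled(soft_limit: int) -> bool:
    """Return True if a soft RLIMIT_CORE value permits writing a core file."""
    # A limit of 0 disables core files; RLIM_INFINITY or any positive
    # value allows them (possibly truncated to that size).
    return soft_limit == resource.RLIM_INFINITY or soft_limit > 0


# Query the limit of the current process; child processes inherit it.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("core soft limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
print("core dumps enabled:", core_dumps_enabled(soft))
```

Run it as the LDM user from the same kind of shell that starts the LDM,
since the limit is inherited from that environment.
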
> 
> These systems don't run a pqact, and with the exception of a monitoring
> program (hobbit--which informed me that rpc.ldmd had stopped!)
> everything running on them is stock.  Basically ldm is their entire job.
> 
> Not sure if it's important, but as for data feeds, they get a lot of
> different feed types, including NEXRAD2, NNEXRAD, FSL2, FSL4, FSL5, WMO,
> DDS, HDS, IDS and some EXP (mesonet and refractivity).

I think we get all of those feeds here (except, probably, some of the EXP),
and nothing happened. I haven't heard of any other LDMs segfaulting recently.

> I'm just wondering if you guys have any ideas or insight?

Besides the LDM setups, what else do the two systems have in common? The
same OS version? The same hardware?

> -------------------------------------------
> There are 2 kinds of people in the world:
> 
> 1) Those who can extrapolate from incomplete data.
> 
> -------------------------------------------
> address@hidden
> 
> Phone:  405-325-6982
> Cell: 405-834-8559
> INDUS Corporation
> National Severe Storms Laboratory

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: VXR-367622
Department: Support LDM
Priority: Normal
Status: Closed