
20030604: problem with new LDM: rtstats & SIGSEGV



David,

> To: "Unidata Support (address@hidden)" <address@hidden>
> cc: David Fitzgerald <address@hidden>
> From: David Fitzgerald <address@hidden>
> Subject: problems with new LDM server
> Organization: Millersville University

The above message contained the following:

> I have just installed the most recent LDM version on a new Linux
> server.  I am replacing my Sun Ultra-5 as my LDM server and today was
> supposed to be the "big switch".  I had my new server in test mode
> feeding everything from my old machine just fine.  After changing the
> hostname/IP on the new machine and the ldmd.conf file to point to my
> upstream sites, the LDM starts up, then dies.  I get a ton of
> pmap_unset(LDMPROG 300029, LDMVERS 5) failed messages in ldmd.log.  I
> can connect to all of my upstream sites via notifyme and traceroute.
> The help archives say I may have a poor connection to my upstream
> sites, but I don't believe it.  My new LDM pukes even pulling down one
> product from one site, plus my old, slow Ultra-5 doesn't have any
> problems.
> 
> I have attached my ldmd.log file, could you help?
> 
> Thanks!!
> Dave
> *********************************************
> David Fitzgerald
> Distributed System Specialist II
> Millersville University
> Millersville PA 17551
> Phone: 717-871-2394
> Fax:     717-871-4725
> E-mail: address@hidden

The rtstats(1) utility that is executed by an EXEC entry in
the LDM configuration file (ldmd.conf) is terminating due to a
segmentation-violation signal:

...
> Jun 04 12:56:01 twister rtstats[2405]: Starting Up (2401)
...
> Jun 04 12:56:09 twister rpc.ldmd[2401]: child 2405 terminated by signal 11
> Jun 04 12:56:09 twister rpc.ldmd[2401]: Killing (SIGINT) process group
...

This causes the top-level LDM process to terminate the entire LDM
process-group.
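
For illustration only, here is a minimal Python sketch of how a supervising
parent can tell that a child was killed by a signal rather than exiting
normally.  It is not the LDM's own code (the LDM is written in C); the
function names describe_exit and run_child are hypothetical, and the
message format merely imitates the log lines above.

```python
import signal
import subprocess
import sys


def describe_exit(pid, returncode):
    """Format a child's fate in the style of the log lines above.

    subprocess reports a negative return code -N when the child was
    terminated by signal N (POSIX only).
    """
    if returncode < 0:
        return f"child {pid} terminated by signal {-returncode}"
    return f"child {pid} exited with status {returncode}"


def run_child(argv):
    """Start a child process, wait for it, and return (pid, returncode)."""
    proc = subprocess.Popen(argv)
    returncode = proc.wait()
    return proc.pid, returncode


if __name__ == "__main__":
    # Hypothetical stand-in for a crashing utility: the child disables
    # core dumps and then sends itself SIGSEGV.
    crash = (
        "import os, signal, resource; "
        "resource.setrlimit(resource.RLIMIT_CORE, (0, 0)); "
        "os.kill(os.getpid(), signal.SIGSEGV)"
    )
    pid, rc = run_child([sys.executable, "-c", crash])
    print(describe_exit(pid, rc))
```

Signal 11 is SIGSEGV on Linux, which is why the log reports the crash as
"terminated by signal 11".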

We've never seen rtstats(1) do this before.  Did it create a core-file?
The directory in which to look is the LDM home-directory (see the
"ldmhome" value in the output of the command "ldmadmin config").  If a
core-file was created, would you please send me its stack dump?
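
If it helps, the following is a rough Python sketch of one way to look for
a core-file and build a gdb(1) command that prints its stack dump.  The
paths are assumptions (that the LDM home-directory is the home of a user
named "ldm" and that the rtstats executable is in its bin/ subdirectory),
as is the idea that core-files are named "core" or "core.<pid>"; adjust
them to your installation.  The helper names are hypothetical.

```python
import os
import shlex


def find_core_files(ldmhome):
    """Return core-file candidates directly inside ldmhome, newest first.

    Assumes core-files are named "core" or "core.<something>"; the actual
    name depends on the system's configuration.
    """
    candidates = []
    for name in os.listdir(ldmhome):
        path = os.path.join(ldmhome, name)
        if os.path.isfile(path) and (name == "core" or name.startswith("core.")):
            candidates.append(path)
    return sorted(candidates, key=os.path.getmtime, reverse=True)


def gdb_backtrace_command(executable, core_file):
    """Build a gdb command that prints a backtrace ("bt") and then exits."""
    return ["gdb", "-batch", "-ex", "bt", executable, core_file]


if __name__ == "__main__":
    ldmhome = os.path.expanduser("~ldm")                # assumption
    rtstats = os.path.join(ldmhome, "bin", "rtstats")   # assumption
    if os.path.isdir(ldmhome):
        for core in find_core_files(ldmhome):
            # Print the command rather than running it, so it can be
            # reviewed first.
            print(shlex.join(gdb_backtrace_command(rtstats, core)))
```

Running the printed command and mailing back its output would give me the
stack dump.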

I understand from Tom that you fixed a problem in /etc/hosts and that
this has resulted in correct behavior of the LDM system.  This is good.
I'd really like to know what the problem is in rtstats(1), however.  If
you can help, then that would also be good.

Regards,
Steve Emmerson