
Re: troubles stopping ldm with ldmadmin on linux (fwd)




===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================

---------- Forwarded message ----------
Date: Thu, 18 May 2000 12:10:17 -0600
From: Doug Hunt <address@hidden>
To: address@hidden
     address@hidden, address@hidden
Subject: Re: troubles stopping ldm with ldmadmin on linux

Gerry, all:  I suspect that is why my patched ldmadmin does the shutdown
quicker--it kills all children and the parent at the same time, not just
killing the parent and letting the kids clean up.  There may be some
negative side effects of this approach, such as unflushed data.
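
As an illustration of the "signal everything at once" approach described
above, here is a minimal sketch. It is not the actual ldmadmin patch, which
is not shown in this thread; Python is used purely for illustration, and the
names group_alive and stop_group are hypothetical. It assumes the parent and
its children share one process group.

```python
import os
import signal
import time


def group_alive(pgid: int) -> bool:
    """Return True if at least one process in the group still exists."""
    try:
        os.killpg(pgid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the group exists but belongs to another user
    return True


def stop_group(pgid: int, timeout: float = 5.0, poll: float = 0.1) -> bool:
    """Send SIGTERM to every process in the group at once, then wait.

    Returns True if the whole group is gone within `timeout` seconds.
    """
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # nothing left to stop
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not group_alive(pgid):
            return True
        time.sleep(poll)
    return not group_alive(pgid)
```

SIGTERM still lets each process run its own cleanup handler, which matters
for the unflushed-data concern; escalating to SIGKILL would skip that
cleanup. Note that an exited process that has not yet been reaped by its
parent still counts as existing for the purposes of group_alive.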

Regards, 

  Doug Hunt

Gerry Creager wrote:
> 
> There have been a couple of similar discussions about this in my
> department here of late.
> 
> Seems that, when you kill a parent process in Linux the forked children
> die gracefully, which could take time.   In Solaris, (_now!_) the forked
> children die immediately, while... in previous incarnations of Solaris,
> the processes could well have been left zombies and never died nor
> responded appropriately to failed IO.
> 
> "D. J. Raymond" wrote:
> >
> > I also see this on my Debian Linux box.  I suspect that the processes
> >  won't die until some crucial I/O is done -- probably a good thing!
> >  Why it takes longer on Linux than on Solaris, I do not know.  Maybe
> >  it is a matter of writing stuff from virtual memory to disk, which
> >  could take a long time if, say, 50 MB or so of the product queue were
> >  memory mapped.  The time to die is quite variable -- if the ldm has
> >  just been started, the processes die quickly, but if it has been
> >  running for a while, they take longer.
> 
> >    From: Jeff Masters <address@hidden>
> >
> >    I have the same trouble on my Slackware linux boxes, my pqexpire and some
> >    of my rpc.ldmd processes don't die for many minutes.  I find that even a
> >    manual kill -9 to the slow-to-die rpc.ldmd and pqexpire jobs will not
> >    make them exit, you just have to wait up to 8 minutes for them to die. On our
> >    Solaris box, everything exits immediately.
> 
> >    On Wed, 17 May 2000, Devin Kramer wrote:
> >
> >    > Doug,
> >    >
> >    > Although a minute or so is a bit long I don't think it is strange to
> >    > see some of the rpc.ldmd children hanging on after a stop.  I believe b/c
> >    > some of these  are UDP connections they will not necessarily die
> >    > instantly.  I could have the TCP and UDP thing backwards but I could
> >    > swear that one or the other suffers from this.   I see it on our
> >    > Solaris 2.6 box quite often. Our wait time is more like 10-15 seconds but we
> >    > often find we need to wait a bit between ldmadmin stop's and ldmadmin
> >    > start's  or it will just hang.  If as you say it is more like minutes
> >    > then maybe there is some other issue.
> 
> Gerry
> --
> Gerry Creager                        |      Never ascribe to Malice that
> Computer Science Department          |      which can adequately be
> Texas A&M University                 |      explained by Stupidity.
> 979.458.4020  (Phone)                |      -- Lazarus Long
> 979.847.8578  (Fax)
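
The report quoted above, that even kill -9 has no effect for several
minutes, fits how Linux treats a process in uninterruptible sleep (state
"D", typically waiting on I/O): the signal is only acted on once the process
leaves that state. Whether that is what happens to rpc.ldmd and pqexpire
here is an assumption, not something this thread confirms. A minimal,
Linux-only sketch for checking the state of a given PID follows; the
function names are hypothetical.

```python
import os


def proc_state(pid: int) -> str:
    """Return the one-letter state code from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so take the fields after the last ')'.
    after_comm = data[data.rfind(")") + 1:]
    return after_comm.split()[0]


def unkillable_for_now(pid: int) -> bool:
    """True if the process is in uninterruptible sleep ('D') or a zombie ('Z')."""
    return proc_state(pid) in ("D", "Z")
```

Calling unkillable_for_now on the PID of a lingering process right after a
stop would show whether it is stuck in "D" (waiting, possibly on disk I/O
for the memory-mapped product queue) or is merely a zombie awaiting its
parent.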

-- 
address@hidden
Software Engineer III
UCAR - COSMIC
Tel. (303) 497-2611