
Re: troubles stopping ldm with ldmadmin on linux (fwd)




===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================

---------- Forwarded message ----------
Date: Thu, 18 May 2000 09:28:02 -0400
From: James D. Marco <address@hidden>
To: Jim Koermer <address@hidden>
Subject: Re: troubles stopping ldm with ldmadmin on linux

Hi All,
        Yes, I agree, and it is not restricted to LDM. This is correct
behavior from a computer science standpoint. Several data-logger
daemons I wrote do the same thing on HP, Sun, SGI, and Linux. These
processes keep files open for a long time while monitoring/collecting
'bursty' data, but are not in continuous use: they receive data spaced
over a 'long time' in terms of CPU utilization (more than one second).
I always assumed this was caused by a combination of:
                Large files (as you mention)
                The process-owned buffered file IO (streams)
                The operating system disk buffering
                OS swapping: a process is swapped to disk if not memory locked
                The hardware caching - CPU cache, Disk cache, Controller cache.

The entire network of dependencies for all this is quite large.
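
As an illustration of the first two buffering layers listed above, here
is a minimal Python sketch, not taken from LDM or from the daemons
mentioned. The function name sizes_through_flush and the 64 KB buffer
size are hypothetical choices for the example. It writes a small record
to a buffered stream and compares the file size the OS reports before
and after the stream is flushed:

```python
import os
import tempfile


def sizes_through_flush(payload=b"x" * 100):
    """Return the file size seen by the OS before and after a flush."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Large process-owned buffer (assumed 64 KB) so a small write
        # stays inside the process until it is flushed.
        with open(path, "wb", buffering=64 * 1024) as f:
            f.write(payload)
            before = os.path.getsize(path)  # data still in the stream buffer
            f.flush()                       # stream buffer -> OS buffer cache
            os.fsync(f.fileno())            # ask the OS to push it to the disk
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(sizes_through_flush())
```

A process that is killed or asked to stop still has to drain each of
these layers before its output files are complete. The sketch does not
show swap or the hardware caches.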

One other item occurs to me: the 'garbage collection' mechanisms
in modern OSes.  Large processes and heavy process buffer use will
leave large holes in real memory when killed. Assuming a threshold value
for memory fragmentation, the OS will probably initiate a cleanup...
which can be expensive in terms of CPU time.

Usually, all this happens within 10-15 seconds, but 60-120 seconds is
probably not unusual for a well-tuned system.
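
Given those timings, a wrapper script could wait for the rpc.ldmd
processes to disappear before attempting a restart. The following is
only a sketch and is not part of ldmadmin: the helper names
ldmd_running and wait_for_exit are hypothetical, the 120-second default
is taken from the upper figure above, and it assumes the pgrep utility
is installed.

```python
import subprocess
import time


def ldmd_running(name="rpc.ldmd"):
    """Return True if pgrep finds a process with exactly this name."""
    # pgrep exits with status 0 when at least one process matches;
    # any other status (no match or an error) is treated as "not running".
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def wait_for_exit(is_running=ldmd_running, timeout=120.0, interval=1.0):
    """Poll until is_running() is False; return True if that happened in time."""
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False  # still running when the timeout expired
        time.sleep(interval)
    return True


if __name__ == "__main__":
    if wait_for_exit():
        print("all rpc.ldmd processes have exited")
    else:
        print("rpc.ldmd still running after timeout; look elsewhere")
```

Passing the check in as a function keeps the polling loop independent
of how the processes are detected, so a different check (for example,
one based on a PID file) could be substituted.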

If delays are much longer, I would look elsewhere.  My first guess is
that the hard drive is badly fragmented, overloaded, or in need of
reformatting (low-level and file system). File system organization can
be a problem. Put swap space and large LDM queues/decoder outputs on
separate drives, not just separate partitions on the same physical
drive. Increase the amount of RAM. More....
                                                        jdm     
At 09:47 PM 5/17/00 -0400, you wrote:
>Doug,
>
>Unless this is something unique to linux, I'm not sure that this
>behavior is all that unusual. I've noticed it for quite some time on
>FreeBSD and AIX systems running LDM. Usually this occurs during the
>ingestion of a large McIDAS area file that may take some time to
>download completely. I assume that this could also happen with some of
>the larger grib files. You can check this by looking at the file sizes
>after doing the "ldmadmin stop". After the file in question completely
>downloads, the associated rpc.ldmd process will end. I've noticed that
>if a large (~25MB) file just started downloading after the stop, it can
>take several minutes for it to complete.
>
>--
>James P. Koermer             E-Mail: address@hidden
>Professor of Meteorology     Office Phone: (603)535-2574
>Natural Science Department   Office Fax: (603)535-2723
>Plymouth State College       WWW: http://vortex.plymouth.edu/
>Plymouth, NH 03264
>
>
>Doug Hunt wrote:
>> 
>> Hi all:  I have recently been having troubles stopping LDM via 'ldmadmin
>> stop' on linux.  The ldmadmin script seems to not check correctly if all
>> LDM kids are killed off.  The result is that after an 'ldmadmin stop',
>> one must wait for a minute or so for all rpc.ldmd children to die.  If
>> one tries 'ldmadmin start' during this time, it hangs...
>> 
>> I have made a small patch to 'ldmadmin' which seems to clean up this
>> problem.  Instead of just killing off the rpc.ldmd process group leader,
>> it kills off all the kids too.
>> 
>> Attached is the new ldmadmin script.
>> 
>> Regards,
>> 
>>   Doug Hunt
>> 
>> --
>> address@hidden
>> Software Engineer III
>> UCAR - COSMIC
>> Tel. (303) 497-2611
>>
>
James D. Marco, address@hidden, address@hidden
Programmer/Analyst, System/Network Administration, 
Computer Support, Et Al. 
Office:         1020 Bradfield Hall, Cornell University 
Home:           302 Mary Lane, Varna      (607)273-9132
Computer Lab:   1125 Bradfield            (607)255-5589