Gabe Langbauer wrote:
Thanks Art,
I've checked out the hardware and it seems fine...We don't have any NFS
mounted, so that's not a problem...I've checked all system logs, none
produce anything...I've done a little logging now and it appears that a
gempak "gf" program that runs at about the same time as my cleanup script
"runs away" such that it runs for several hours, taking up 99% of one of
our CPUs, then CRASH!!!
Gabe,
I would have responded to this thread earlier, but I was out late last
week. We had tons of problems with our LDM server running Linux last
fall, which sounds very similar to the behavior you are describing. I
had migrated our LDM from FreeBSD to Linux (Slackware, not that it makes
much difference) and had huge problems with I/O waits bogging the entire
system down. Our LDM server runs numerous gempak scripts from cron,
which would frequently go weird on me and add to the system load. Our
LDM is running on a dual-CPU Dell with a hardware RAID-5 SCSI controller
(384MB cache on the controller). I tweaked the frequency and
configuration of our scour scripts endlessly and never could get the
system to run reliably. It would eventually have numerous instances of
scour all competing for the disk. I spent a LOT of time and effort
tweaking the filesystem parameters, kernel configuration, application
configurations, etc. Regardless, it would eventually load up to the
point that it became totally unresponsive and I could not log in on the
console to clean things up. We have a two-channel NOAAport feed and file
nearly everything for both gempak and mcidas, so our load is probably a
bit atypical.
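
For what it's worth, one way to keep several scour instances from piling
up is to have the cron job take a lock and skip the run if a previous one
is still going. This is only a rough sketch, not what we actually ran:
the lock path and the function names (try_lock, run_exclusive) are made
up for illustration, and the command you pass in would be whatever your
scour job is.

```python
import fcntl
import subprocess

# Hypothetical lock file location; pick a path the LDM user can write to.
LOCK_PATH = "/tmp/scour.lock"


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    fh = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting in line.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd only if no other holder has the lock.

    Returns the command's exit code, or None if the run was skipped
    because an earlier instance is still active.
    """
    lock = try_lock(lock_path)
    if lock is None:
        return None
    try:
        return subprocess.run(cmd).returncode
    finally:
        lock.close()  # closing the file releases the lock
```

From cron you would call run_exclusive() with your scour command as a
list of arguments. A skipped run just means the next cron slot tries
again, so the jobs never stack up on the disk.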
Eventually, I tired of babysitting the server and migrated the system
back to FreeBSD. The combination of the runaway gempak processes and
scour problems proved to be too much hassle. Since then, the system has
run nearly flawlessly. The only problems I've had were created by my
own actions: tweaking a script or something of that nature. The system
I/O waits are about 1/10th of what they were with Linux (using ext3,
reiserfs, or XFS) when running the LDM scour. The scour now takes about
1/20th the time that it did running under Linux. Gempak scripts behave
*much* more predictably as well. Personally, I prefer to run Linux
whenever possible, as I've been using it since '94 or '95, but I think it
could be worth your time to look at FreeBSD or Solaris x86, as either
would perform much better under high load.
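
In the meantime, if you stay on Linux, a stopgap for the runaway "gf"
process would be to launch it through a wrapper that kills it after a
time limit. Again, just a sketch under my own assumptions: the function
name run_with_timeout and the time limit are placeholders, and this only
kills the process it starts directly, not anything that process spawns.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it hit the time limit.

    subprocess.run kills the child itself when the timeout expires and
    then raises TimeoutExpired, which is caught here.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None
```

A return value of None tells the calling script that the job was cut
off, so it can log that and move on rather than leave a process
spinning on a CPU for hours.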
--
Mark Tucker
Meteorology Dept. Systems Administrator
Lyndon State College
http://apollo.lsc.vsc.edu
mark.tucker@xxxxxxxxxxxxxxx
(802)-626-6328