
20030920: mesodata load average



>From:  Gerry Creager N5JXS <address@hidden>
>Organization:  Texas A&M University -- AATLT
>Keywords:  200309201339.h8KDdPk1012811

Hi Gerry,

>Yesterday afternoon, when I logged in, it looked like things had calmed 
>down.

I did a couple of things on mesodata yesterday to try and calm things:

- set up the runtime links to all point at ldm-6.0.14; this was
  housekeeping rather than something that would reduce the load

- commented out the ~ldm/scour.conf entry that was scouring the
  ~ldm/data/gempak/nexrad directories.  I FTPed down a script
  designed to scour NEXRAD data, prune_nexrad.csh, and set it up
  to keep about a day's worth of images.  It is "about" a day
  because prune_nexrad.csh keeps a fixed number of images rather
  than everything newer than a given age; I set it to keep 288
  files, which is 1 day of images when the radar is operating in
  storm mode

- changed the LDM scour to run less often than every 2 hours; I
  noticed that what was using up most of the machine was multiple
  simultaneous invocations of the scouring script, and, since
  scouring hits the disk _hard_, it is better to run it as seldom
  as possible

- cleaned the .stats files (produced by running pqbinstats) out of
  the ~ldm/logs directory; there were 2462 of them.  These files were
  not being sent to Unidata because there was no crontab entry to
  send and then remove them, so I added an entry that runs
  'bin/ldmadmin dostats' at 35 minutes past each hour.

- cleaned out the ~ldm/data/nexrad/NIDS directory, since you are now
  FILEing NEXRAD images in the ~ldm/data/gempak/nexrad/NIDS directory.
  This freed up over 1 GB of disk (there were 122000+ files there)
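prune_nexrad.csh itself is not reproduced here.  As a rough
illustration of the "keep the newest N files" idea it is described as
using, here is a minimal POSIX sh sketch.  The function name
prune_keep_newest and the example path are hypothetical, and unlike
the real script it handles only a single directory of images.

```shell
#!/bin/sh
# Hypothetical sketch of "keep the newest N files" pruning; this is NOT
# the actual prune_nexrad.csh script.  It handles one directory only.
prune_keep_newest() {
    dir=$1
    keep=$2
    (
        cd "$dir" || exit 1
        # ls -t lists newest first; -p marks directories with a trailing
        # "/" so grep can drop them; tail -n +K prints line K onward,
        # i.e. every file older than the newest $keep files.
        ls -tp | grep -v '/$' | tail -n +"$((keep + 1))" |
        while IFS= read -r f; do
            rm -f -- "$f"
        done
    )
}

# Example (STN/PROD are placeholders for a station and product):
# prune_keep_newest ~ldm/data/gempak/nexrad/NIDS/STN/PROD 288
```

The arithmetic behind "about a day": with roughly 5-minute volume
scans in storm mode, 288 files span 288 x 5 = 1440 minutes, i.e. 24
hours; with the roughly 10-minute scans of clear-air mode the same 288
files cover about 48 hours.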

I noticed that the ~ldm/data/ddplus directory is huge (roughly 33 GB),
as are a number of other directories under ~ldm/data:

$ cd ~ldm/data
$ du -sk *
19660     AR
2141676   ARCHIVES
79456     binex
408       combhourly_pwv
0         cronlog
34389440  ddplus
4         decoded
124176    difax
4         fcst
4         forecasts
...

If scouring is attempted in any of these directories, your system
will slow to a crawl.  I ran out of time yesterday afternoon, so I
haven't determined whether any of them is actually being scoured, or
whether that is what is causing your problems.
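To see which directories under ~ldm/data are the largest, and
therefore the most expensive to scour, the du listing above can simply
be sorted.  A small sketch; the function name largest_entries is
hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the N largest entries in a directory,
# biggest first.  du -sk reports each entry's size in kilobytes and
# sort -rn orders those numbers from largest to smallest.
largest_entries() {
    dir=$1
    count=${2:-10}
    (
        cd "$dir" || exit 1
        du -sk -- * 2>/dev/null | sort -rn | head -n "$count"
    )
}

# Example: the five biggest directories under ~ldm/data.
# largest_entries ~ldm/data 5
```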

>Last night, radar was flowing nicely.  And, of course, this 
>morning, the load avg was back around 15.

Yesterday the load average was hovering around 13-15.  I found 5
invocations of LDM's scour running at once (5 instances of 'find').
Only after I stopped the LDM and killed those scour processes did the
load average drop back below 1.
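Running the scour less often makes it less likely that one run is
still going when the next one starts, but it does not rule it out.
One common way to guarantee that scour invocations cannot pile up is
to wrap the job in a lock.  The sketch below uses an atomic mkdir as
the lock; the function name run_exclusive, the lock path, and the
'bin/ldmadmin scour' command in the example are assumptions, not
something currently set up on mesodata.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command only if no earlier copy still holds
# the lock.  mkdir either creates the directory or fails, atomically, so
# two overlapping invocations cannot both acquire the lock.
run_exclusive() {
    lock=$1
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "run_exclusive: $lock exists; previous run still active?" >&2
        return 1
    fi
    "$@"
    rc=$?
    # Release the lock.  If the job is killed before this line runs, the
    # stale lock directory has to be removed by hand.
    rmdir "$lock"
    return "$rc"
}

# Example (assumed command and lock path):
# run_exclusive /tmp/ldm-scour.lock bin/ldmadmin scour
```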

>I've cut scour back to 0100 local, once per day.  I'm still trying to
>find something causing the load to shoot up.

This is a good step, and now that the NEXRAD directories are being
scoured by a different script, a once-a-day LDM scour should be all
that you need.

>I may have to revamp some of the parsing and db stuff to fix this.

I don't think that this is the problem, but I haven't had enough time
to look at things in enough detail to know for sure.

>Any thoughts?

The other thing I saw was that your GEMPAK decoding was not set up
exactly as Chiz recommends.  I would like to revamp this setup so
that future GEMPAK upgrades can be done without a lot of thinking.
A standard installation would also allow the GEMPAK utility to
rotate GEMPAK log files.  At least one of the logs that I renamed
yesterday was 2 GB in size, and, since that is the maximum file size,
nothing more was being written to it.
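The GEMPAK rotation utility that a standard installation provides is
not shown here.  As a generic illustration of what numbered log
rotation does, and why it keeps any single file from reaching the 2 GB
limit, here is a minimal sh sketch; rotate_log, the number of
generations kept, and the example path are all hypothetical.  Note
that a decoder which keeps its log open will continue writing into the
renamed file until it reopens the log.

```shell
#!/bin/sh
# Hypothetical sketch of numbered log rotation (not the GEMPAK utility).
# After rotation: LOG.1 is the previous log, LOG.2 the one before that,
# ..., up to LOG.<keep>; anything older is overwritten and lost.
rotate_log() {
    log=$1
    keep=${2:-5}
    i=$((keep - 1))
    # Shift existing generations up by one, oldest first.
    while [ "$i" -ge 1 ]; do
        if [ -f "$log.$i" ]; then
            mv -f "$log.$i" "$log.$((i + 1))"
        fi
        i=$((i - 1))
    done
    if [ -f "$log" ]; then
        mv -f "$log" "$log.1"
    fi
    : > "$log"   # start a new, empty log file
}

# Example (hypothetical path), e.g. run nightly from cron:
# rotate_log /path/to/decoder.log 7
```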

>Thanks, gerry

Got to run...

Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+