I recently had an interesting discussion with Mike Schmidt about large
numbers of cores and LDM. It sounds like the real key isn't core-count,
but memory for the queue, and not (ever) having to swap that off to
disk. I'm wondering if your large queues are going to disk for
virtual memory, and hanging, because of the core count? Just thinking out loud.
I'll keep you posted!
gerry
Pete Pokrandt wrote:
Gerry,
Let me know how it goes for you running Scientific Linux 6.0. I recently
tried it on the replacement for idd.aos.wisc.edu, and found that with a
large queue, the ldm would stop responding after a certain amount of
time. It seemed to work with smaller queue sizes (less than half the
size of my physical RAM) but if I went bigger than that, it would always
hang. One of the ldmd processes would peg at 100% of one CPU and no
data would flow after that.
There's a back-and-forth between me and Unidata support in the archive at
http://www.unidata.ucar.edu/support/help/MailArchives/ldm/maillist.html
(the subject is "New IDD relay - ldm is hanging after some time"). It
looked like it was hanging somewhere in the glibc library while doing I/O.
I backed it out to CentOS 5.6 and for now it seems to be stable.
The new server has dual Opteron 6128 processors (2x8 cores), 32 GB of
RAM and dual 300 GB SAS drives. With any queue of 16 GB or bigger, I ran
into trouble.
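For anyone who wants to sanity-check a planned queue size against that
observation, here is a minimal Python sketch. It assumes the "less than
half of physical RAM" behaviour seen on this one server is a useful rule
of thumb; it is not a documented LDM limit. The function names are
made up for illustration, and sizes are treated as binary gigabytes.

```python
import os

GIB = 1024 ** 3  # assumption: "GB" in the message is read as binary gigabytes


def physical_ram_bytes():
    """Return installed physical RAM in bytes via POSIX sysconf (Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def queue_within_threshold(queue_bytes, ram_bytes, max_fraction=0.5):
    """True if the queue is strictly smaller than max_fraction of RAM.

    The 0.5 default mirrors the empirical observation in this thread
    (queues under half of RAM worked; 16 GB on a 32 GB box did not).
    It is a heuristic, not an LDM requirement.
    """
    return queue_bytes < ram_bytes * max_fraction


if __name__ == "__main__":
    ram = 32 * GIB  # the server described above; or use physical_ram_bytes()
    for size_gib in (8, 12, 16, 24):
        ok = queue_within_threshold(size_gib * GIB, ram)
        verdict = "under the threshold" if ok else "at or over the threshold"
        print(f"{size_gib} GiB queue on 32 GiB RAM: {verdict}")
```

On a live host, passing physical_ram_bytes() instead of the hard-coded
32 GiB checks against the RAM actually installed.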
Pete
On 6/25/2011 8:08 AM, Gerry Creager wrote:
bigbird is back and should be working. I had one config error
yesterday (firewall) which is corrected. Apologies for the inconvenience.
If you see problems or anomalies with bigbird, please let me know.
Background:
bigbird has been acting a bit unstable of late. It was also running
CentOS 4.8 and I was having problems getting security patches on it.
When I had problems Thursday evening and again Friday morning, it was
time to update the OS.
I made the switch from CentOS to Scientific Linux. SciLinux is
supported by Fermilab, with a full-time staff, and is along the same
lines as CentOS: a free (as in beer) version of RHEL. Installation
from DVD went very well, and a base server install took about half the
time a CentOS install does, based on what I recall. Like so much else,
it does start SELinux in enforcing mode by default, but this is an easy
first-boot fix.
CentOS appears to be suffering some community fragmentation and
strife. RHEL 6.1 is out but CentOS hasn't released a v6.0 yet.
SciLinux was fast off the mark to get their 6.0 out. I'm a little
concerned about the future of CentOS.
gerry
--
Gerry Creager -- gerry.creager@xxxxxxxx
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843