
[McIDAS #FQS-529833]: Hang on...another ride coming.



Hi Gilbert,

Sorry I was not able to respond yesterday, but we had the second day of the
Users Committee meeting during the first half of the day, and I spent the
second half working with VMware Player in Windows XP.  More on VMware Player
if you are interested... it's great!

re:
> OK, so my original weather.admin.niu.edu fried thanks to a
> thunderstorm/lightning hit with its raid array of two hard drives.

Yup.

> Then, I got new machines to replace weather2 and weather3.

Yup, yup...

> But, they were to become weather and weather2 since weather died.

Yup, yup, yup...

> Well, guess what.
> 
> As you recall, I got bad hard drives on the new weather2. Weather2's died
> about two weeks ago, and I just got it in today.
> 
> But guess what.
> 
> Weather.admin started crashing today, and file system checks indicate the
> hard drive on that one is bad too!

OK, so getting this many bad hard disks is either extremely bad luck or
an indication that there is something else wrong with your setup.  Our
experience with hard disks has been very good over the past decade.  Yes,
we have had some fail, but only a small number, and those were in systems
that were either _heavily loaded_ (e.g., motherlode.ucar.edu) or had
sustained power hits.

If I were you, I would check to make sure that your cases are adequately
ventilated.  I just upgraded the motherboard in my home machine (to an excessed
dual Athlon 2400+ populated Tyan board), and I ran into unexpected cooling
problems with the 400W power supply that I had to install to run the replacement
motherboard.  I first thought that there was something wrong with either the
motherboard (excessive current draw) or power supply itself, but I found that
I had an airflow loop that was sucking the hot air being exhausted from the
power supply back into the case right next to the power supply.  I installed
an 80 mm (standard size) fan in the case and the temperature inside the case
dropped dramatically.  If your system does not have proper ventilation, it
is possible that you are frying your hard drives from the excess heat.  Just
a thought...

> So, the new weather2, which had a bad hard drive, has suddenly become the
> new weather.admin.niu.edu. And, I just called my vendor, and he's sending
> out a new hard drive today, which I'll probably get Tuesday, for
> weather2.admin.niu.edu. Which is the current weather.admin.niu.edu with
> the failing hard drive. I will be gone much of Tuesday, so I may not get
> it done until Wednesday. If I get the new hard drive Monday (UPS ground, I
> think), I'll do it then, but I doubt it will be there.
> 
> Can you check weather to make sure it is OK? I just swapped machines. I
> think you might have to install a few minor things, of which I have
> forgotten.

I don't know if you are busily working on your machines, or if your local
network is down, but I was unable to access any of your machines a few
minutes ago (it is currently 9:57 MDT on Saturday).  I will keep trying
as time permits today, tonight, and/or tomorrow.

> I am getting an error in /var/log/messages stating that
> weather.admin.niu.edu !=localhost.localdomain. Anyone over there hear of
> that problem? I temporarily renamed it as such to download patches. Then
> when I renamed it to weather.admin.niu.edu, I'm still getting this error.
> A config file must not have taken, but I am clueless as to which one. I
> was using system-config-network to make it all work. I don't know if it is
> causing problems. I checked everything, and it looks fine...

I agree that this sounds like there is conflicting configuration information
somewhere, and I don't know exactly where to look.  I would, however, first
verify that you do not have a conflict between definitions in /etc/hosts
and /etc/sysconfig/network.  I will poke around to see what I can see
as soon as I can get on your machines.
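
In the meantime, here is a rough sketch of the kind of comparison I have in
mind.  It assumes the usual Red Hat layout (a HOSTNAME= line in
/etc/sysconfig/network and address/name lines in /etc/hosts).  The function
names are made up for illustration, and the two conditions it reports are only
things worth a second look, not proof of a misconfiguration:

```python
import socket


def hostname_from_sysconfig(text):
    """Return the HOSTNAME= value from sysconfig network content, or None."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "HOSTNAME":
            return value.strip().strip("\"'")
    return None


def names_by_address(text):
    """Map each address in hosts-file content to the names on its line."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2:
            table.setdefault(fields[0], []).extend(fields[1:])
    return table


def find_conflicts(sysconfig_text, hosts_text):
    """Return a list of things to review; an empty list means none found."""
    configured = hostname_from_sysconfig(sysconfig_text)
    if configured is None:
        return ["no HOSTNAME= entry in sysconfig network file"]
    hosts = names_by_address(hosts_text)
    problems = []
    all_names = [name for names in hosts.values() for name in names]
    if configured not in all_names:
        problems.append(f"{configured} is not listed in hosts file")
    # Assumption: a real hostname on the loopback line is worth reviewing.
    if configured in hosts.get("127.0.0.1", []):
        problems.append(
            f"{configured} is mapped to the loopback address 127.0.0.1"
        )
    return problems


def check_files(sysconfig_path="/etc/sysconfig/network",
                hosts_path="/etc/hosts"):
    """Read both files and report the running hostname plus any findings."""
    with open(sysconfig_path) as f:
        sysconfig_text = f.read()
    with open(hosts_path) as f:
        hosts_text = f.read()
    return socket.gethostname(), find_conflicts(sysconfig_text, hosts_text)
```

Calling check_files() on the machine returns the hostname the running system
reports along with the list of findings, so you can compare all three
sources at a glance.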

Hang in there...

Cheers,

Tom
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: FQS-529833
Department: Support McIDAS
Priority: Normal
Status: Open