
Re: 20040526: bigbird status (cont.)



Tom Yoksas wrote:
From:  Gerry Creager N5JXS <address@hidden>
Organization:  Texas A&M University -- AATLT
Keywords:  200405262241.i4QMfhtK005798 LDM RAID JFS


Hi Gerry,


IM==Instant Messaging... Sometimes convenient for on-line communications while remote troubleshooting...


I should have known...


I've seen the RAID failure on reboot several times.


Interesting...  Did you run fsck (or variant) to get things patched
up before remounting the RAID filesystem?

Yes.  Took a few (10 or less) min.  This time was taking a lot longer.

I really want to get rid of this card and get into a 3Ware card. New 'Net find today suggests that, as suspected, Promise's proprietary RAID is less than advertised. They said nicer things 'bout HighPoint, but the only "real RAID" comments were reserved for Adaptec and 3Ware... noting Adaptec followed 3Ware's lead.


I believe that Pete Pokrandt of U Wisc/AOS is using a 3Ware card in
his Linux PC.

That's where I should have gone.

Effectively, when rebooting, the system now times out while flushing the product queue.


Product queue?  If you mean LDM product queue, that is on a different
file system.  Also, I did not see a startup script for the LDM
in /etc/init.d, so I added one:

Was, or should have been, embedded in rc.local

/etc/init.d/ldmd

Since this wasn't there on reboot, the LDM queue would not have
been checked by it.
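
For illustration only, here is a minimal sketch of how one might confirm that the startup script is in place before the next reboot. The path /etc/init.d/ldmd is the one given above; the function name is hypothetical, and this is not something that was actually run on bigbird.

```python
import os


def init_script_ready(path="/etc/init.d/ldmd"):
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    if init_script_ready():
        print("LDM init script present and executable")
    else:
        print("LDM init script missing or not executable")
```

This only checks that the file exists and is executable. It does not check that the script is linked into any runlevel, which would also be needed for it to run at boot.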


That apparently is tied to the RAID corruption in some manner. If we really saw a RAID corruption while running today, that's a first for me on this system. Further, there are spares. It should have alarmed and fixed itself.


I agree, but the load average did go up to 400...

Yep.

I'll keep looking. By doing the reboot, we did salvage the messages logs, and there might be some clues.


OK.


Thanks for spotting the problem. I was working on bigfoot and didn't even look at bigbird today, save to place it on a KVM. Hmmm. It's possible that caused a hiccup, but it shouldn't have. It's been in idle state WRT the monitor, keyboard, mouse for weeks. The reboot for the box is serendipitous. I wasn't planning to reboot 'til needed anyway, so I'd not have had console access (at least for X) 'til I did. Keyboard and video worked as expected...


OK.


Later, gerry


fsck.jfs is still running on /dev/md0, and it will
take time to finish.  I will try to look in on bigbird later tonight
or early tomorrow morning.  As soon as fsck.jfs finishes, I will try
to mount /data and crank up the LDM.
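
The plan above (wait for fsck.jfs to finish, mount /data, start the LDM) could be sketched roughly as follows. Several things here are assumptions rather than facts about bigbird: that `pgrep` is available and the check shows up as a process named fsck.jfs, that /data has an fstab entry so a bare `mount /data` works, and that the /etc/init.d/ldmd script accepts a `start` argument. The helper names and the polling interval are made up for illustration.

```python
import subprocess
import time


def fsck_running(run=subprocess.run):
    """Return True while a process named fsck.jfs exists (pgrep exits 0 on a match)."""
    result = run(["pgrep", "-x", "fsck.jfs"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def recovery_steps(mount_point="/data", init_script="/etc/init.d/ldmd"):
    """Commands to run, in order, once the filesystem check is over."""
    return [["mount", mount_point], [init_script, "start"]]


def bring_up(poll_seconds=60, run=subprocess.run, sleep=time.sleep):
    """Poll until fsck.jfs is gone, then mount the filesystem and start the LDM."""
    while fsck_running(run):
        sleep(poll_seconds)
    for cmd in recovery_steps():
        # check=True raises on a non-zero exit, so the LDM is not started
        # if the mount fails.
        run(cmd, check=True)


if __name__ == "__main__":
    bring_up()
```

Note that the sketch only detects that fsck.jfs has exited, not that it succeeded. In practice one would look at the fsck output before mounting.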

Time to head home...

It's taking a lot longer than it ever has before....

I'll be writing most of the night.  Proposal time.
TTFN, Gerry
--
Gerry Creager -- address@hidden
Network Engineering -- AATLT, Texas A&M University  
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
Page: 979.228.0173
Office: 903A Eller Bldg, TAMU, College Station, TX 77843