
Re: NOAA port Datoo failure (fwd)



Russ,

I believe that you are the main contact person for the NOAAport ingestor
software, but hopefully you also handle hardware problems. Here's the
situation: the NOAAport ingestor at LSU has never been stable since day
one; the machine crashes about every five days. This problem has not been
reported on any other NOAAport ingest machine, so this is new territory.
The machine itself has been maintained by Robert L and it has also been
checked by Mike S.  The OS has been updated with patches, the SDI software
has been updated, and nothing else unusual is running on the machine except
the LDM.  This is the common configuration on most of the ingest machines.
By process of elimination, the current conclusion is that the SDI card
might be defective.  This system was originally purchased by
Alden for LSU, but the machine was not installed until recently because
other parts of the system needed to be installed. So what are the steps
to get another SDI card for the machine?  Is it under warranty?  Is
there a way to check the card on the machine? etc.  At this time, the
machine is rebooted every 5 days to keep it from locking up.

Thanks,
Robb...

===============================================================================
Robb Kambic                                Unidata Program Center
Software Engineer III                      Univ. Corp for Atmospheric Research
address@hidden             WWW: http://www.unidata.ucar.edu/
===============================================================================

---------- Forwarded message ----------
Date: Mon, 05 Mar 2001 15:38:55 -0600
From: Robert Leche <address@hidden>
To: Robb Kambic <address@hidden>
Subject: Re: NOAA port Datoo failure

Hello Robb,


Robb Kambic wrote:

> Robert,
>
> I am resending this message in case you didn't get it the first time,
> since I heard that you had a break-in on some systems.  I think a good
> starting point would be to install the latest SDI s/w.

OK, I am game.


> I will put it on a ftp dir
> only for a short period of time because it's not public s/w.  So I need
> a note saying that you are ready to receive the s/w.
>

Let's do it. What is the ftp url?


>
> After talking to our system admin, we feel that there might be a h/w
> problem on the machine since you said that the machine hangs ( os hang )
> and that you need to do a hard reboot.  SA wants to log in to the machine
> to check for problems and also to set up the machine so the next time it
> hangs he can extract some information about the problem.
>

That's fine, do you still have the login id and password to datoo?

>
> If you change the slot that the SDI card was in, the system needs to be
> rebooted with "boot -r" so it can check the card configurations.
>

I am having trouble finding "boot". I found reboot, but not boot. I assume
this is a Unix command. Also I did not see an obvious CMOS boot function.
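
For what it is worth, a hedged sketch of one way to get the same effect
from a running system. It assumes datoo runs Solaris, which this thread
never states. On Solaris, "boot -r" is typed at the boot prompt before
the OS loads, not at a shell, which would explain why no "boot" command
turns up. Creating /reconfigure and then rebooting asks for the same
reconfiguration boot ("reboot -- -r" is another route). The run and
request_reconfigure_boot helpers and the DRY_RUN switch are illustrative
names only; by default the sketch prints the commands instead of running
them.

```shell
#!/bin/sh
# Sketch only: assumes a Solaris host and root privileges.
# DRY_RUN defaults to 1, so nothing is executed unless it is set to 0.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: ask for a reconfiguration boot, then restart.
request_reconfigure_boot() {
    # Solaris looks for /reconfigure at startup and rescans devices if present.
    run touch /reconfigure
    # init 6 shuts down cleanly and reboots.
    run init 6
}
```

Calling request_reconfigure_boot with DRY_RUN left at its default only
lists the two commands; setting DRY_RUN=0 as root would actually restart
the machine.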

>
> None of the other ingest systems demonstrate the hang symptoms.  Have any
> machine configuration changes taken place around the time the system
> started to hang?
>

This platform has always had this problem from day one.

>
> Anything else you can think of about this problem would be helpful.
>
>

I do not see any obvious reasons for the problem. We have gone through 3
versions of the software and I do not note a change in the failure pattern.
Today, I wanted to try changing the PCI slot that the card is in, but at
the moment I am unable to make the machine reboot with the -r reconfigure
option.



Bob

> Robb...
>
> On Fri, 9 Feb 2001, Robb Kambic wrote:
>
> > Robert,
> >
> > There is a new version of the NOAAport ingest s/w that came out last
> > week; I haven't obtained it yet. I plan on getting it Mon and installing
> > it Tues.  I'll let you know when the s/w is ready for you to download.
> > Hopefully this will help with some of your problems.  If it doesn't, I
> > will take another look at your machine. So far none of the other ingest
> > sites are having this problem.
> >
> > Robb...
> >
> >
> >
> > On Fri, 9 Feb 2001, Robert Leche wrote:
> >
> > > Hello Robb,
> > > For weeks now, I have looked for an appropriate time to troubleshoot the
> > > ingestion lockup problem that happens every 4~5 days on
> > > Datoo.srcc.lsu.edu. When it occurs, this problem requires a reboot of
> > > the computer. Killing the ingcntl and ldm processes does not result in
> > > a successful restart of the ldm. We have an opportunity to
> > > troubleshoot Datoo this morning. The NOAAport stopped just after
> > > midnight. Opportunity awaits us!
> > >
> > > I have three files attached to this email:
> > > 1) a text file that has the results of the top command and the NOAAport
> > > console messages.
> > > 2) ldm logs tarred up.
> > > 3) Results of the ps command.
> > >
> > > I will leave Datoo in its present condition until 3:00 PM cdt or until I
> > > hear from you. After 3:00 PM I will reboot Datoo and return it to
> > > service.
> > >
> > >
> > > Many Thanks,
> > >
> > > Bob
> > > address@hidden
> > > 225-578-5023
> > >
> > >