
20030519: NLDN inject machine problems after upgrading to LDM-6?



>From: Unidata User Support <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200305191728.h4JHSALd011729 LDM-6 NLDN IDD

Hi David,

This morning, I was made aware of intermittent (?) problems on the NLDN
IDD injection machine, striker.atmos.albany.edu, by Tom McDermott of
SUNY Brockport:

  >On Mon, 19 May 2003, Unidata Support wrote:
  >
  >> We have had no reports from SUNY Albany about problems running the
  >> LDM-6.0.1[01] on striker, so your report of a regular crash on it is
  >> news to us.
  >
  >Well you can see evidence of it by examining the latencies for NLDN at
  >Steve's rtstats page for pretty much any host.  To take one at random,
  >'sundog.atmos.ucla.edu', they go off the chart starting around 20Z
  >Saturday until around 1153Z today.  Albany is aware of the problem.  Here
  >is what I received from David Knight regarding an earlier episode:
  >
  >-----------------------------------------------------------------------------
  >From: David Knight <address@hidden>
  >Date: Tue, 22 Apr 2003 12:28:19 +0000 (GMT)
  >To: Tom McDermott <address@hidden>
  >Cc: address@hidden
  >Subject: Re: Unable to Connect to Striker
  >
  >Tom,
  >     OK thanks. Looks like there is a problem with striker.
  >I've rebooted it, and, you should be able to connect again.
  >
  >Kevin,
  >Apr 22 11:45:50 striker rpc.ldmd[5138]: accept: Too many open files
  >Looks like striker hit the limit on the number of files it can
  >have open... Not sure exactly why yet...
  >
  >David
  >
  >> Hi,
  >>
  >> vortex.esc.brockport.edu has been unable to connect to striker since
  >> 1724Z yesterday.
  >>
  >> Tom

A review of the real-time statistics pages from sites receiving NLDN data
from striker shows that there was a data outage from around 19Z on the
17th until around 12Z today.

Did you see the same "Too many open files" problem on striker
today?  If so, has the soft limit on the number of open files been raised
from its default (e.g., on Solaris the soft limit is 128; the hard
limit is 1024)?  If it has been increased significantly (e.g., to the
maximum), is it possible that some other process is opening files and
not closing them properly (e.g., the process that creates NLDN products
and injects them into the LDM queue)?
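
As a rough illustration of the checks suggested above, here is a small
Python sketch that reads a process's soft and hard descriptor limits,
raises the soft limit toward the hard limit, and counts the descriptors a
process currently holds.  It is not part of the LDM.  The function names
are made up for this example, and the /proc/<pid>/fd lookup assumes a
system that exposes that directory (Linux and Solaris do; others may not).

```python
import os
import resource


def nofile_limits():
    """Return (soft, hard) limits on open descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit to `target`, capped at the hard limit.

    Only the calling process and its children are affected.  Lowering is
    not attempted, and an unlimited soft limit is left alone.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid):
    """Count descriptors held by `pid`, or return None if /proc is absent.

    Assumption: the OS provides /proc/<pid>/fd and the caller may read it.
    """
    fd_dir = "/proc/%d/fd" % pid
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


if __name__ == "__main__":
    print("limits (soft, hard):", nofile_limits())
    print("descriptors held by this process:", count_open_fds(os.getpid()))
```

Because a limit set this way applies only to the calling process and its
children, a daemon would normally get its limit from the shell that starts
it (e.g., "ulimit -n" in the startup script).  Sampling count_open_fds()
for a suspect process every few minutes would show whether its descriptor
count climbs steadily, which is what a leak would look like.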

Is there anything we can do to help you troubleshoot this problem?  If
yes, please let us know.

Tom Yoksas