
Re: Top level CONDUIT relay



On Wed, 2007-06-20 at 00:04 -0400, Chi Y Kang wrote:
> I set up the test LDM server for the NCEP folks to test the local pull 
> from the LDM servers.  That should give us some information on whether 
> this is a network- or system-related issue.  We'll handle that tomorrow.  
> I am a little bit concerned that the slowdown occurred at the same time 
> as the ldm1 crash last week.

Chi,

When ldm1 was restarted last week, all connected hosts would have had to
re-establish their connections.

My original thought was that a change in the I2 volume limitation had
been made at your end at some point, but existing connections were
still grandfathered by the router, so the change only became visible
when the LDM was restarted.
Since we do not know the details of your setup there, we were wondering
whether a global limitation for all ports was added.

This would also explain why removing one of the downstream feeders
improved the latency we are seeing. I can't explain the difference
between ldm1 and ldm2, though, so I still do not have a good feeling
about which variables we have been hitting.

In the meantime, the data to NSF and Wisconsin from ldm2 has been great
since removing the connection from Unidata at 13Z this morning.

Steve

> 
> Also, can NCEP check whether there are any bad dbnet queues on the 
> backend servers?  Just to verify.
> 
> 
> 
> Steve Chiswell wrote:
> > Thanks Justin,
> > 
> > I also had a typo in my message:
> > ldm1 is running slower than ldm2
> > 
> > Now, if the feed to ldm2 suddenly slows down when Pete and other
> > sites add a request to it, that would really signal some sort of total
> > bandwidth limitation on the I2 connection. It seemed a little
> > coincidental that we had a short period of good connectivity to ldm1,
> > after which it slowed way down.
> > 
> > Steve
> > 
> > 
> > On Tue, 2007-06-19 at 17:01 -0400, Justin Cooke wrote:
> >> I just realized the issue. When I disabled the "pqact" process on ldm2 
> >> earlier today it caused our monitor script (in cron, every 5 min) to 
> >> kill the LDM and restart it. I have removed the check for the pqact in 
> >> that monitor...things should be a bit better now.
> >>
> >> Chi.Y.Kang wrote:
> >>> Huh, I thought you guys were on the system.  Let me take a look on ldm2
> >>> and see what is going on.
> >>>
> >>>
> >>> Justin Cooke wrote:
> >>>   
> >>>> Chi.Y.Kang wrote:
> >>>>     
> >>>>> Steve Chiswell wrote:
> >>>>>  
> >>>>>       
> >>>>>> Pete and David,
> >>>>>>
> >>>>>> I changed the CONDUIT request lines at NSF and Unidata to request data
> >>>>>> from ldm1.woc.noaa.gov rather than ncepldm.woc.noaa.gov after seeing
> >>>>>> lots of disconnect/reconnects to the ncepldm virtual name.
> >>>>>>
> >>>>>> The LDM appears to have caught up here as an interim solution.
> >>>>>>
> >>>>>> Still don't know the cause of the problem.
> >>>>>>
> >>>>>> Steve
> >>>>>>       
> >>>>>>         
> >>>>> I know NCEP was stopping and starting the LDM service on the ldm2 box,
> >>>>> where the VIP address is pointed at this time.  How is the current
> >>>>> connection to ldm1?  Is the speed of the CONDUIT feed acceptable?
> >>>>>   
> >>>>>       
> >>>> Chi, NCEP has not restarted the LDM on ldm2 at all today. But looking
> >>>> at the logs it appears to be dying and getting restarted by cron.
> >>>>
> >>>> I will watch and see if I see anything.
> >>>>
> >>>> Justin
> >>>>     
> >>>
> >>>   
> 
> 
-- 
Steve Chiswell <address@hidden>
Unidata
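
Editor's illustration, not part of the original thread: the restart loop Justin describes came from a cron-driven monitor that treated a missing "pqact" process as a failed LDM. The sketch below shows one way such a check could be structured so that only the server process is required. The actual NCEP monitor script is not shown in the thread, so every name here is hypothetical, including the assumption that the server process is called "ldmd".

```python
import subprocess

# Assumption: the LDM server shows up in the process table as "ldmd".
# "pqact" is deliberately NOT listed, so disabling it does not trigger a restart.
REQUIRED = ("ldmd",)


def missing_processes(running, required=REQUIRED):
    """Return the required process names that are absent from `running`."""
    running_set = set(running)
    return [name for name in required if name not in running_set]


def should_restart(running, required=REQUIRED):
    """True when at least one required process is not running."""
    return bool(missing_processes(running, required))


def running_process_names():
    """List command names of all running processes using POSIX `ps`."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

A cron job could call should_restart(running_process_names()) every five minutes and restart the LDM only when it returns True. With "pqact" in the required list, the same check would report a failure whenever pqact is intentionally disabled, which matches the behaviour described in the thread.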