
Re: 20000809: LDM feed?



Unidata Support wrote:
------- Forwarded Message

>To: Mark Tucker <address@hidden>
>cc: address@hidden
>From: Pete Pokrandt <address@hidden>
>Subject: Re: LDM feed?
>Organization: UCAR/Unidata
>Keywords: 200008091656.e79Gu4T26059

In a previous message to me, you wrote:

 >Pete,
 >Hi.  We've had a bit of a problem in feeding from Sunset.  So far I have
 >not been able to pull anything from your ldm to ours (cirrus.lsc.vsc.edu).
 >Initially I thought the problems were with a recent upgrade of our ldm
 >to 5.0.10 but I have since moved over some of our product requests to PSU
 >and they are coming in without any problems. Here is a sample of the
 >ldmd.log:
 >
 >Aug 09 14:35:13 cirrus sunset[2043]: run_requester: 20000809133512.760
 >TS_ENDT {{MCIDAS,  "^pnga2area Q[01]"},{SPARE,  "^meso"}}
 >Aug 09 14:35:14 cirrus sunset[2043]: FEEDME(sunset.meteor.wisc.edu):
 >reclass: 20000809133512.760 TS_ENDT {{MCIDAS,  "^pnga2area Q[01]
 >Aug 09 14:35:14 cirrus sunset[2043]: FEEDME(sunset.meteor.wisc.edu): OK
 >Aug 09 14:35:25 cirrus sunset[2043]: RECLASS: 20000809133525.665 TS_ENDT
 >{{MCIDAS,  "^pnga2area Q[01]"}}
 >Aug 09 14:35:25 cirrus sunset[2043]: skipped: 20000809133518.397 (7.268
 >seconds)
 >Aug 09 14:35:26 cirrus sunset[2043]: RECLASS: 20000809133526.782 TS_ENDT
 >{{MCIDAS,  "^pnga2area Q[01]"}}
 >Aug 09 14:35:27 cirrus sunset[2043]: Connection reset by peer
 >Aug 09 14:35:27 cirrus sunset[2043]: Disconnect
 >
 >I get the RECLASS error regardless of what product I request from sunset.
 >Do you have any ideas as to what may be the problem?
 >
 >Mark Tucker
 >Information Technology
 >Lyndon State College
 >address@hidden
 >http://apollo.lsc.vsc.edu
 >

Mark,

I've got similar messages in my logs (attached below).

I had assumed it was either a bad route, or a problem with the
queue on your end, but apparently that is not the case.

Usually such reclass problems are related to network problems,
but a traceroute shows a pretty clean route between us:

sunset.meteor.wisc.edu 7% /usr/etc/traceroute cirrus.lsc.vsc.edu
traceroute to cirrus.lsc.vsc.edu (155.42.21.3), 30 hops max, 40 byte packets
 1  144.92.130.1 (144.92.130.1)  2 ms  2 ms  2 ms
 2  r-peer.net.wisc.edu (144.92.128.131)  2 ms  2 ms  2 ms
 3  UWMadisonISP-atm0-0-252.core.wiscnet.net (216.56.1.17)  2 ms  3 ms  3 ms
 4  NChicago1-core0.nap.net (207.227.0.201)  7 ms  8 ms  9 ms
 5  p4-2.chcgil1-ba1.bbnplanet.net (4.24.6.113)  7 ms  8 ms  7 ms
 6  4.24.5.230 (4.24.5.230)  7 ms  8 ms  9 ms
 7  p2-2.chicago1-nbr1.bbnplanet.net (4.0.5.233)  8 ms  7 ms  7 ms
 8  p5-0-0.chicago1-br1.bbnplanet.net (4.0.1.206)  13 ms  8 ms  10 ms
 9  p0-0-0.chicago2-cr4.bbnplanet.net (4.0.3.165)  9 ms  9 ms  10 ms
10  h3-0.uswest-ch.bbnplanet.net (4.0.196.250)  9 ms  11 ms  9 ms
11  168.103.1.14 (168.103.1.14)  31 ms  31 ms  31 ms
12  155.42.5.2 (155.42.5.2)  41 ms  39 ms  39 ms
13  cirrus.lsc.vsc.edu (155.42.21.3)  38 ms  40 ms  38 ms

I don't know what the deal is here; other sites seem to be
having no problem feeding from sunset.

Perhaps the support folks might have some idea what's going on?

I am running ldm 5.1.2beta1, for what it's worth, but I think you
were having trouble before I switched from 5.0.11.

Here's what the ldmd.log messages look like from my end (on
sunset.meteor.wisc.edu):

Aug 09 16:35:50 5Q:sunset cirrus[243262]: Connection from cirrus.lsc.vsc.edu
Aug 09 16:35:50 5Q:sunset cirrus(feed)[243262]: Starting Up: 20000809153548.822 TS_ENDT {{MCIDAS,  "^pnga2area Q[01]"}}
Aug 09 16:35:50 5Q:sunset cirrus(feed)[243262]: topo:  cirrus.lsc.vsc.edu MCIDAS
Aug 09 16:35:51 5Q:sunset cirrus(feed)[243262]: RECLASS: 20000809153551.424 TS_ENDT {{MCIDAS,  "^pnga2area Q[01]"}}
Aug 09 16:35:51 3Q:sunset cirrus(feed)[243262]: pq_sequence failed: I/O error (errno = 5)
Aug 09 16:35:51 5Q:sunset cirrus(feed)[243262]: Exiting
Aug 09 16:35:57 5Q:sunset rpc.ldmd[155071]: child 243262 exited with status 1
Aug 09 16:36:23 5Q:sunset cirrus[231544]: Connection from cirrus.lsc.vsc.edu
Aug 09 16:36:23 5Q:sunset cirrus(feed)[231544]: Starting Up: 20000809153621.663 TS_ENDT {{MCIDAS,  "^pnga2area Q[01]"}}
Aug 09 16:36:23 5Q:sunset cirrus(feed)[231544]: topo:  cirrus.lsc.vsc.edu MCIDAS
Aug 09 16:36:23 5Q:sunset cirrus(feed)[231544]: RECLASS: 20000809153623.959 TS_ENDT {{MCIDAS,  "^pnga2area Q[01]"}}
Aug 09 16:36:24 3Q:sunset cirrus(feed)[231544]: pq_sequence failed: I/O error (errno = 5)
Aug 09 16:36:24 5Q:sunset cirrus(feed)[231544]: Exiting
Aug 09 16:36:30 5Q:sunset rpc.ldmd[155071]: child 231544 exited with status 1
Aug 09 16:36:56 5Q:sunset cirrus[241255]: Connection from cirrus.lsc.vsc.edu
Aug 09 16:36:57 5Q:sunset cirrus(feed)[241255]: Starting Up: 20000809153654.434 TS_ENDT {{MCIDAS,  "^pnga2area Q[01]"}}
Aug 09 16:36:58 5Q:sunset cirrus(feed)[241255]: topo:  cirrus.lsc.vsc.edu MCIDAS
Aug 09 16:37:08 5Q:sunset cirrus(feed)[241255]: RECLASS: 20000809153708.120 TS_ENDT {{MCIDAS,  "^pnga2area Q[01]"}}
Aug 09 16:37:10 5Q:sunset cirrus(feed)[241255]: RECLASS: 20000809153710.825 TS_ENDT {{MCIDAS,  "^pnga2area Q[01]"}}
Aug 09 16:37:11 3Q:sunset cirrus(feed)[241255]: pq_sequence failed: I/O error (errno = 5)
Aug 09 16:37:11 5Q:sunset cirrus(feed)[241255]: Exiting

Thanks, Robb/Anne, for any assistance or ideas you come up with.

Pete

--
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+
^ Pete Pokrandt                    V 1447  AOSS Bldg  1225 W Dayton St^
^ Systems Programmer               V Madison,         WI     53706    ^
^                                  V      address@hidden       ^
^ Dept of Atmos & Oceanic Sciences V (608) 262-3086 (Phone/voicemail) ^
^ University of Wisconsin-Madison  V       262-0166 (Fax)             ^
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+

------- End of Forwarded Message

Hi Mark (and Pete),

This looks like a fundamental issue of a poor network connection.   pq_sequence errors like those in Pete's log occur when the
LDM is reading a product from the queue in an attempt to feed it to a downstream site and the connection to that
downstream site goes down.  Most likely something along the route timed out waiting for a response and dropped the connection.  Traceroute may not show such problems because it sends only very small packets (40 bytes in Pete's run), which are much more likely to get through.
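Pete's log shows each feed connection from cirrus lasting only seconds before a pq_sequence failure ends it. If you want to pull those lifetimes out of a longer ldmd.log, here is a minimal Python sketch. It assumes the line layout shown in Pete's excerpt; the regular expression and the function names are hypothetical, and because ldmd.log omits the year, the year is assumed to be 2000.

```python
import re
from datetime import datetime

# Excerpt from Pete's ldmd.log on sunset (one feed child per PID).
SAMPLE = [
    "Aug 09 16:35:50 5Q:sunset cirrus[243262]: Connection from cirrus.lsc.vsc.edu",
    "Aug 09 16:35:51 3Q:sunset cirrus(feed)[243262]: pq_sequence failed: I/O error (errno = 5)",
    "Aug 09 16:35:51 5Q:sunset cirrus(feed)[243262]: Exiting",
    "Aug 09 16:36:23 5Q:sunset cirrus[231544]: Connection from cirrus.lsc.vsc.edu",
    "Aug 09 16:36:24 3Q:sunset cirrus(feed)[231544]: pq_sequence failed: I/O error (errno = 5)",
    "Aug 09 16:36:24 5Q:sunset cirrus(feed)[231544]: Exiting",
    "Aug 09 16:36:56 5Q:sunset cirrus[241255]: Connection from cirrus.lsc.vsc.edu",
    "Aug 09 16:37:11 3Q:sunset cirrus(feed)[241255]: pq_sequence failed: I/O error (errno = 5)",
    "Aug 09 16:37:11 5Q:sunset cirrus(feed)[241255]: Exiting",
]

# Groups: timestamp, PID, message.  The host field ("5Q:sunset") and the
# program name ("cirrus" or "cirrus(feed)") are matched but not captured.
LINE_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ \S+?\[(\d+)\]: (.*)$")


def connection_lifetimes(lines, year=2000):
    """Return {pid: seconds from 'Connection from' to 'Exiting'}."""
    started = {}
    lifetimes = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp, pid, msg = m.groups()
        # ldmd.log has no year, so one is prepended before parsing.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if msg.startswith("Connection from"):
            started[pid] = when
        elif msg == "Exiting" and pid in started:
            lifetimes[pid] = (when - started.pop(pid)).total_seconds()
    return lifetimes


def pq_sequence_failures(lines):
    """Count log lines reporting a pq_sequence failure."""
    return sum("pq_sequence failed" in line for line in lines)


for pid, seconds in connection_lifetimes(SAMPLE).items():
    print(f"child {pid}: connected for {seconds:.0f} s")
print("pq_sequence failures:", pq_sequence_failures(SAMPLE))
```

On Pete's excerpt this gives lifetimes of 1, 1 and 15 seconds, each ending in a pq_sequence failure.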

There may or may not be something to be done about this.

First, you could request less data.  That's always an option, albeit undesirable.  Smaller products are probably more likely to
get through than larger ones.

Next, the only other thing you might have some control over has to do with your campus network and the people who
administer it.   From Pete's traceroute, above, it looks like the slowest times are occurring on your campus.  (The last
two hops share the same network prefix, 155.42, and they are the slowest.  Bear in mind that traceroute reports the round-trip
time from the sending host to each hop, so the times accumulate along the route; the increase from one hop to the next shows
where delay is being added.)
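A short Python sketch that tabulates Pete's traceroute hop by hop may make the pattern easier to see. It takes the median of the three probes at each hop and subtracts the previous hop's median. Because every traceroute time is a round trip from sunset to that hop, the difference is only a rough estimate of the delay added there. The hop data are copied from the traceroute above; the function name is illustrative.

```python
import statistics

# (hop number, host, three probe times in ms) from Pete's traceroute
HOPS = [
    (1, "144.92.130.1", (2, 2, 2)),
    (2, "r-peer.net.wisc.edu", (2, 2, 2)),
    (3, "UWMadisonISP-atm0-0-252.core.wiscnet.net", (2, 3, 3)),
    (4, "NChicago1-core0.nap.net", (7, 8, 9)),
    (5, "p4-2.chcgil1-ba1.bbnplanet.net", (7, 8, 7)),
    (6, "4.24.5.230", (7, 8, 9)),
    (7, "p2-2.chicago1-nbr1.bbnplanet.net", (8, 7, 7)),
    (8, "p5-0-0.chicago1-br1.bbnplanet.net", (13, 8, 10)),
    (9, "p0-0-0.chicago2-cr4.bbnplanet.net", (9, 9, 10)),
    (10, "h3-0.uswest-ch.bbnplanet.net", (9, 11, 9)),
    (11, "168.103.1.14", (31, 31, 31)),
    (12, "155.42.5.2", (41, 39, 39)),
    (13, "cirrus.lsc.vsc.edu", (38, 40, 38)),
]


def hop_increments(hops):
    """Yield (hop, host, median_ms, increase_ms) for each hop.

    The increase is this hop's median RTT minus the previous hop's median.
    Small negative values usually just reflect probe-to-probe noise.
    """
    previous = 0.0
    for hop, host, probes in hops:
        median = statistics.median(probes)
        yield hop, host, median, median - previous
        previous = median


for hop, host, median, increase in hop_increments(HOPS):
    print(f"{hop:2d} {host:42s} {median:5.1f} ms  {increase:+5.1f} ms")
```

On these numbers the largest single increase (about 22 ms) appears at hop 11 (168.103.1.14), and roughly 8 ms more is added at hop 12 (155.42.5.2).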

Robb and I think you need to build a case to present to your campus network people.  If traceroute consistently shows slow
speeds near cirrus, I would show that to them.  Maybe they can move you closer to the exterior connection so that you can
avoid some slow hops on campus.

Notice that the third to last site reported by the traceroute is also slow.   Perhaps that's the router for the campus' ISP.   If
so, since the campus network people are subscribers to that ISP, they could present this information to the ISP and request
better service.

I suggest you run ldmping:

    ldmping -i 5 -h sunset.meteor.wisc.edu

Let it run for a while so that you can see the variation in transmission times.  Perhaps there are patterns in the fluctuations
that can be attributed to something that occurs on campus.  This is also something that I would show to the campus network people.
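ldmping is the right tool here. If it also helps to record timings from a script, below is a rough Python sketch that times only the TCP connection setup to the LDM server's port (388, the port registered for the LDM) at a fixed interval and then summarizes the spread. This is an assumption-laden complement, not a reimplementation of ldmping: it does not speak the LDM protocol at all, the function names are hypothetical, and each probe opens and immediately closes a connection that the upstream ldmd will see.

```python
import socket
import statistics
import time

LDM_PORT = 388  # TCP port registered for the LDM


def connect_time(host, port=LDM_PORT, timeout=10.0):
    """Seconds needed to open a TCP connection, or None on failure/timeout."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:  # includes refused connections and timeouts
        return None


def summarize(samples):
    """Summarize connect times, counting failed attempts separately."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "min": min(ok) if ok else None,
        "median": statistics.median(ok) if ok else None,
        "max": max(ok) if ok else None,
    }


def probe(host, count=12, interval=5.0):
    """Take `count` samples `interval` seconds apart and summarize them."""
    samples = []
    for i in range(count):
        samples.append(connect_time(host))
        if i < count - 1:
            time.sleep(interval)
    return summarize(samples)
```

For example, print(probe("sunset.meteor.wisc.edu")) would take twelve samples five seconds apart; failures and a wide min-to-max spread are the kind of pattern worth showing to the network staff.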

Otherwise, I'm afraid that's the best we can come up with.  We've written a web page about this issue - if you haven't seen it
already, take a look at: http://www.unidata.ucar.edu/packages/ldm/troubleshooting/networkTrouble.html.

Also, the RECLASS message isn't about network problems per se, but is about product latency.  If you want to know more
about RECLASS, see http://www.unidata.ucar.edu/packages/ldm/troubleshooting/reclassMsg/reclassDoc.html, although
that won't really help you with this problem.
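The timestamps in these log lines are written as YYYYMMDDhhmmss.fff. As a small worked check on Mark's log, the 7.268 seconds on his "skipped" line equals the RECLASS timestamp just before it (20000809133525.665) minus the skipped product's timestamp (20000809133518.397). A minimal Python sketch of that arithmetic follows. The function names are illustrative, and reading the timestamps this way is an assumption based only on the format seen in the logs above.

```python
from datetime import datetime


def parse_ldm_time(stamp):
    """Parse a log timestamp such as '20000809133512.760' (YYYYMMDDhhmmss.fff)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")


def gap_seconds(later, earlier):
    """Seconds between two such timestamps."""
    return (parse_ldm_time(later) - parse_ldm_time(earlier)).total_seconds()


# RECLASS time and skipped-product time from Mark's ldmd.log on cirrus
print(gap_seconds("20000809133525.665", "20000809133518.397"))  # prints 7.268
```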

Anne

-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                  P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************