
[LDM #IIX-893121]: Question...



Gerry,

Sorry for the delayed response.  We're having our Policy Committee
meeting.

> We've started seeing some additional problems with loss of data on some
> of our systems.  We've unloaded some of the filing and processing off
> dc-ldm2, but it's appearing now that sasquatch may have some issues.
> I'm including the latest log, but I'd like to call attention to the
> following, and state up front that I don't recall what these mean and am
> confused.  Again.
...
> Oct 29 19:57:42 sasquatch weather.renci.org(feed)[6772] NOTE: feed or
> notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe
> Oct 29 19:57:42 sasquatch rpc.ldmd[6646] NOTE: child 6772 exited with
> status 7
> Oct 29 20:11:03 sasquatch gambit.itsc.uah.edu(feed)[6774] NOTE: feed or
> notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe
> Oct 29 20:11:03 sasquatch rpc.ldmd[6646] NOTE: child 6774 exited with
> status 7
> Oct 29 20:12:30 sasquatch weather.renci.org(feed)[6981] NOTE: feed or
> notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe
> Oct 29 20:12:30 sasquatch rpc.ldmd[6646] NOTE: child 6981 exited with
> status 7
> Oct 29 20:12:30 sasquatch weather.renci.org(feed)[7162] NOTE: Starting
> Up(6.6.2/6): 20071029191229.735 TS_ENDT {{EXP,  ".*"}},
> SIG=17e8813a38c6f1e280124c4d502ed161, Primary
> Oct 29 20:12:30 sasquatch weather.renci.org(feed)[7162] NOTE: topo:
> weather.renci.org {{EXP, (.*) - (SADC....-UNC)}}
> 
> Something tells me these are normal but I'm no longer confident of my
> recall on this.

The messages are relatively normal.  The upstream LDM processes on
sasquatch couldn't write to the downstream LDM processes because the
connections broke ("Broken pipe" means the other end of the connection
had already gone away).  I know those messages aren't very
informative, but that's the best that can be done on the upstream
side.  To discover the reason, you'd have to look at the log files
on the downstream side.
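
To illustrate what "Broken pipe" means at the operating-system level,
here is a minimal sketch.  It is not LDM code: it assumes a POSIX
system and uses a local socket pair in place of the real TCP
connection, and the names "upstream" and "downstream" are only labels
for the two ends.  With a real TCP connection the first write after
the peer closes may still appear to succeed, so treat this as a
demonstration of the error only.

```python
import errno
import socket


def write_after_peer_close():
    """Return the errno raised when writing to a peer that has closed."""
    # Hypothetical stand-ins for the two ends of a feed connection.
    upstream, downstream = socket.socketpair()
    downstream.close()  # the receiving end goes away
    try:
        upstream.sendall(b"HEREIS")  # the sender only learns of it on write
    except BrokenPipeError as err:
        return err.errno
    finally:
        upstream.close()
    return None


if __name__ == "__main__":
    code = write_after_peer_close()
    print(code == errno.EPIPE)
```

The sender sees only that the write failed, which is why the upstream
log can't say why the connection went away.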

One reason could be network congestion.  Another is termination of
the downstream LDM processes.  And sometimes the reason can't be
found in any log file (e.g., gateways terminating the connections,
new firewall rules, etc.).

> We're working specifically with weather.renci.org on this and had some
> issues last week and over the weekend getting data back and forth
> between the systems.  Any help you might be able to offer is already
> greatly appreciated.

Look at the downstream log files (on weather.renci.org and
gambit.itsc.uah.edu) around the times of those messages.  What do
they say?
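
If it helps to line up the times, something like the following sketch
could pull the failure times for each downstream host out of the
upstream log.  It assumes each log entry is on a single line (the
entries quoted above are wrapped by the mail client), and the pattern
is inferred only from those quoted entries, so it may need adjusting
for other LDM versions.  The function name and the SAMPLE list are
made up for illustration.

```python
import re
from collections import defaultdict

# Pattern inferred from the quoted log entries; an assumption, not an LDM spec.
FAILURE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<upstream>\S+)\s+"
    r"(?P<downstream>\S+)\(feed\)\[(?P<pid>\d+)\]\s+"
    r"NOTE: feed or notify failure; (?P<reason>.*)$"
)


def feed_failures(lines):
    """Group feed failures by downstream host as (time, pid, reason) tuples."""
    by_host = defaultdict(list)
    for line in lines:
        m = FAILURE.match(line.strip())
        if m:
            by_host[m["downstream"]].append(
                (m["when"], int(m["pid"]), m["reason"])
            )
    return dict(by_host)


# Entries taken from the log quoted above, rejoined onto single lines.
SAMPLE = [
    "Oct 29 19:57:42 sasquatch weather.renci.org(feed)[6772] NOTE: feed or "
    "notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe",
    "Oct 29 19:57:42 sasquatch rpc.ldmd[6646] NOTE: child 6772 exited with "
    "status 7",
    "Oct 29 20:11:03 sasquatch gambit.itsc.uah.edu(feed)[6774] NOTE: feed or "
    "notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe",
    "Oct 29 20:12:30 sasquatch weather.renci.org(feed)[6981] NOTE: feed or "
    "notify failure; HEREIS: RPC: Unable to send; errno = Broken pipe",
]

if __name__ == "__main__":
    for host, events in sorted(feed_failures(SAMPLE).items()):
        for when, pid, reason in events:
            print(host, when, pid, reason)
```

The times listed for each host are the ones to search for in that
host's own log file.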

> Gerry
> --
> Gerry Creager -- address@hidden
> Texas Mesonet -- AATLT, Texas A&M University
> Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: IIX-893121
Department: Support LDM
Priority: Normal
Status: Closed