
Re: 20010129: ldm trouble



Hi Jennie,

On Tom's advice, I removed the /home/ldma/ldm-mcidas/bin path from
batch.k, which seems to have eliminated the pqact error that was filling
up the logs.  (He said it was not needed, as the required software is in
/home/mcidas/bin, which comes before it on the path.)
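
As background on that pqact error: the "exited with status 127" lines in
the ldmd.log tail quoted further down are presumably the error in
question, and 127 is the status a POSIX shell returns when it cannot
find the command it was asked to run, which is consistent with a path
problem.  A minimal sketch of where that number comes from (the command
name is made up for the demonstration):

```shell
# Ask a shell to run a command that does not exist and capture its status.
# "no_such_command_for_demo" is a hypothetical name, chosen not to exist.
status=0
sh -c 'no_such_command_for_demo' 2>/dev/null || status=$?
echo "exit status: $status"   # prints "exit status: 127"
```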

This fix caused another error to appear in mcidas.log: 

/usr/local/ldm/util/batch.k: /home/mcidas/uvaworkdata/ROUTEPP.LOG:
cannot create

This was because there was already a file in that directory having that
name that was owned by ldma (although it was group writable - I don't
understand why ldm could not write to it.)  It is now owned by ldm - I
changed the ownership of all the files owned by ldma in that directory
to be owned by ldm.  
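
A rough sketch of how leftover files like this can be found, assuming a
find that supports -maxdepth (GNU and BSD find both do).  The function
name is only illustrative; it is not part of the LDM or McIDAS.

```shell
# List regular files directly inside directory $1 that are owned by user $2.
# list_owned_by is a hypothetical helper name used for this sketch.
list_owned_by() {
    find "$1" -maxdepth 1 -type f -user "$2" -print
}

# Example use, run as root, with the directory and users from this message:
#   list_owned_by /home/mcidas/uvaworkdata ldma
#   find /home/mcidas/uvaworkdata -maxdepth 1 -type f -user ldma -exec chown ldm {} +
#   id ldm    # shows which groups ldm actually belongs to
```

Checking the groups reported by id against the group on the file is one
way to follow up on why a group-writable file was still not writable.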

So, things seem to be working as far as I can tell.  No problems are
appearing in ldmd.log or mcidas.log.

I'm sorry for this problem.  Please let me know if you find any others.

To answer your questions: no, your upstream host does not care what user
ID you're using. 

To see if you're 'allow'ed at your upstream feeds and to see whether they
were getting data, I tried a notifyme to each one for which there was an
ldmd.X file in the etc directory.  navier's up and getting data:

windfall: /usr/local/ldm/etc $ notifyme -vl - -h navier.meteo.psu.edu
Jan 29 23:12:39 notifyme[16646]: Starting Up: navier.meteo.psu.edu:
20010129231239.843 TS_ENDT {{ANY,  ".*"}}
Jan 29 23:12:40 notifyme[16646]: NOTIFYME(navier.meteo.psu.edu):
reclass: 20010129231239.843 TS_ENDT {{NNEXRAD,  ".*"}}
Jan 29 23:12:40 notifyme[16646]: NOTIFYME(navier.meteo.psu.edu): OK
Jan 29 23:12:40 notifyme[16646]:    15498 20010129231239.993 NNEXRAD
540  SDUS53 KGRR 292306 /pNCRGRR
Jan 29 23:12:40 notifyme[16646]:      740 20010129231240.005 NNEXRAD
541  SDUS53 KGRR 292306 /pNVLGRR
Jan 29 23:12:40 notifyme[16646]:      235 20010129231240.008 NNEXRAD
542  SDUS51 KCAR 292307 /pNVLCBW
Jan 29 23:12:40 notifyme[16646]:      296 20010129231240.128 NNEXRAD
544  SDUS55 KSLC 292306 /pNVLMTX
Jan 29 23:12:40 notifyme[16646]:     4853 20010129231240.128 NNEXRAD
545  SDUS50 PHFO 292312 /pN0VHKI
...
It is very odd, though, that when I run this command from your machine
the request is reclassed to the NNEXRAD feed.  When I run it from my
machine it is reclassed to a request for ANY.  Weird.

Your "allowance" on sunset looks fine:

windfall: /usr/local/ldm/etc $ notifyme -vl - -h sunset.meteor.wisc.edu
Jan 30 00:18:29 notifyme[17682]: Starting Up: sunset.meteor.wisc.edu:
20010130001829.187 TS_ENDT {{ANY,  ".*"}}
Jan 30 00:18:29 notifyme[17682]: NOTIFYME(sunset.meteor.wisc.edu):
reclass: 20010130001829.187 TS_ENDT {{DIFAX|FSL2|UNIDATA,  ".*"}}
Jan 30 00:18:29 notifyme[17682]: NOTIFYME(sunset.meteor.wisc.edu): OK
Jan 30 00:18:30 notifyme[17682]:      465 20010130001830.415 IDS|DDPLUS
991  SRUS53 KGID 300018 /pRR2GRI
Jan 30 00:18:34 notifyme[17682]:      360 20010130001833.998 IDS|DDPLUS
999  USUS45 KRIW 300018 /pMANRIW
Jan 30 00:18:35 notifyme[17682]:      312 20010130001835.705 IDS|DDPLUS
004  FTUS44 KLZK 300018 AAA /pTAFPBF
Jan 30 00:18:35 notifyme[17682]:      375 20010130001835.707 IDS|DDPLUS
005  USUS43 KDTX 300018 /pMANDTX
Jan 30 00:18:39 notifyme[17682]:      360 20010130001839.486 IDS|DDPLUS
011  USUS41 KILN 300018 /pMANILN
^
But I see that the conf file in your etc dir isn't named right.
Instead of ldmd.meteor.wisc.edu it should be called
ldmd.sunset.meteor.wisc.edu.  I've changed that.
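
The check above, one notifyme per ldmd.X file, can be sketched as a
small loop.  This assumes each per-host file is named ldmd.<full host
name> as just described, and that ldmd.conf is the active configuration
and should be skipped; the function name is made up for illustration.

```shell
# Print the upstream host name encoded in each ldmd.<host> file in directory $1.
# upstream_hosts is a hypothetical helper, not an LDM utility.
upstream_hosts() {
    for f in "$1"/ldmd.*; do
        [ -f "$f" ] || continue
        name=${f##*/}          # strip the directory part
        host=${name#ldmd.}     # strip the "ldmd." prefix
        # Assumption: ldmd.conf is the active config, not a per-host file.
        [ "$host" = "conf" ] && continue
        printf '%s\n' "$host"
    done
}

# Example use with the notifyme options shown above; each notifyme keeps
# running until it is interrupted by hand:
#   for h in $(upstream_hosts /usr/local/ldm/etc); do
#       notifyme -vl - -h "$h"
#   done
```

A file named ldmd.meteor.wisc.edu would come out of this as the host
meteor.wisc.edu rather than sunset.meteor.wisc.edu, which shows the
naming problem directly.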


And, your name server can't find blueskies:  

windfall: /usr/local/ldm/etc $ notifyme -vl - -h
blueskies.sprl.umich.edu     
Jan 29 23:10:13 notifyme[16628]: Starting Up: blueskies.sprl.umich.edu:
20010129231013.995 TS_ENDT {{ANY,  ".*"}}
Jan 29 23:10:14 notifyme[16628]: NOTIFYME(blueskies.sprl.umich.edu): 13:
gethostbyname(blueskies.sprl.umich.edu): lookup failed

I get the same from my machine. I'll have to ask if anyone knows what's
up with them.
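
A lookup failure like this can be confirmed independently of the LDM.
A minimal sketch, assuming the system has getent (most Linux and Solaris
systems do; that availability is an assumption here), with an
illustrative function name:

```shell
# Return 0 if the system's name service can find host $1, non-zero otherwise.
# resolves is a hypothetical helper name; getent goes through the same
# name service configuration that ordinary host name lookups use.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Example use:
#   resolves blueskies.sprl.umich.edu || echo "lookup failed"
```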

But, it looks like your ldmfail should work now.  I don't know why you
got that response from navier.  Perhaps it went down just when you were
looking at things, then the failover failed because the file wasn't
named right???

Please let me know how your ldm is doing.

Anne
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************


Unidata Support wrote:
> 
> ------- Forwarded Message
> 
> >To: address@hidden
> >cc: address@hidden
> >From: Local Data Manager <address@hidden>
> >Subject: ldm trouble
> >Organization: UCAR/Unidata
> >Keywords: 200101291912.f0TJCgX19824
> 
> Anne,
> 
> Well, as I was trying to look at what the ldm was doing, it
> seemed to fail altogether.  I did an ldmadmin watch, and nothing
> came up, but just about that time, I got a message that the ldm
> had failed over.  When I looked at the logs, I found that the response
> to a FEEDME on navier (my default upstream host) was "RPC: Program
> not registered", and then everything seemed to stop altogether.
> 
> I am uncertain about restarting things at the moment.  A few
> thoughts come to mind: did changing the user making requests
> have any impact on our upstream host?  (They only know that
> requests come from a certain IP address, correct?  So it doesn't
> "register" whether we are user ldma or user ldm?)
> 
> As I noted, we have been getting some data updated, so some things
> were getting through.  I did note that there is an old PATH
> in the /usr/local/ldm/util file batch.k.  This is the script
> that launches mcidas commands, and it needs the path of the
> ldm-mcidas.  It is still pointing to /home/ldma/bin/ldm-mcidas
> and it should now be /usr/local/ldm/ldm-mcidas/bin (I think;
> I only have one terminal open at the moment, so I cannot look).
> This is probably minor, unless the new ldm required a new version
> of ldm-mcidas and we were telling it to use the old one.  That would
> potentially mess up some of our scripts that make new products.
> 
> Here is the tail of the ldmd.log file:
> 
> Jan 29 18:38:56 windfall.evsc.Virginia.EDU pqact[6694]: child 19618 exited with status 127
> Jan 29 18:38:57 windfall.evsc.Virginia.EDU pqact[6694]: child 19620 exited with status 127
> Jan 29 18:38:57 windfall.evsc.Virginia.EDU pqact[6694]: child 19622 exited with status 127
> Jan 29 18:38:57 windfall.evsc.Virginia.EDU pqact[6694]: child 19624 exited with status 127
> Jan 29 18:39:21 windfall.evsc.Virginia.EDU navier[6697]: Connection reset by peer
> Jan 29 18:39:21 windfall.evsc.Virginia.EDU navier[6697]: Disconnect
> Jan 29 18:39:51 windfall.evsc.Virginia.EDU navier[6697]: run_requester: 20010129183831.903 TS_ENDT {{HDS|DDPLUS,  ".*"},{MCIDAS,  "^pnga2area Q[01]"}}
> Jan 29 18:39:51 windfall.evsc.Virginia.EDU navier[6697]: FEEDME(navier.meteo.psu.edu): RPC: Program not registered
> Jan 29 18:50:00 windfall.evsc.Virginia.EDU ldmping[25551]: SVC_UNAVAIL   0.252739    0   navier.meteo.psu.edu  RPC: Program not registered
> Jan 29 18:50:02 windfall.evsc.Virginia.EDU rpc.ldmd[6692]: Exiting
> Jan 29 18:50:02 windfall.evsc.Virginia.EDU rpc.ldmd[6692]: Terminating process group
> Jan 29 18:50:02 windfall.evsc.Virginia.EDU pqact[6694]: Exiting
> Jan 29 18:50:02 windfall.evsc.Virginia.EDU pqbinstats[6696]: Exiting
> Jan 29 18:50:32 windfall.evsc.Virginia.EDU navier[6697]: Exiting
> 
> Unsure of what to do ....
> 
> Jennie
> 
> --
> 
> ------- End of Forwarded Message