20021017: Setting up LDM for McIDAS-XCD (cont.)



>From: Richard Massa <address@hidden>
>Organization: UC Davis
>Keywords: 200210150109.g9F19F112387 McIDAS-XCD LDM

Richard,

re: login
>I'd be happy to give you a login, but I'd rather fix the problems myself, for
>no other reason than I'm responsible for the system and I'd like to go through
>these steps with you so I can learn them and hopefully fix them myself here
>eventually.

I understand.  The only reason I suggested a login is that I am _the_
McIDAS support person at Unidata, and I have many more sites than just
yours to support :-(.

re: run 'ldmadmin watch' to verify that the LDM is getting data

>I'm always running that on my workstation to make sure that data is still 
>getting to us :)

OK.

re: verify that XCD decoders are running

>They were running, but I restarted anyways to see if it would help.  Sorry...
>should have included that in my email.
>
>mcidas@atm20:/var/data/xcd$ ps -ef |grep inge  
>ldm      21857 21830  0 14:25 ?        00:00:00 ingetext.k DDS
>ldm      21858 21830  0 14:25 ?        00:00:00 ingebin.k HRS
>ldm      21989 21858  0 14:25 ?        00:00:00 ingebin.k HRS
>ldm      21998 21857  0 14:25 ?        00:00:01 ingetext.k DDS
>
>mcidas@atm20:/var/data/xcd$ ps -ef |grep start
>ldm      21831 21824  0 14:25 ?        00:00:00 startxcd.k
>ldm      22002 21831  0 14:25 ?        00:00:00 startxcd.k

These listings look OK.

>I did try stopping and restarting... that has made some improvement (I think)
>as I'm seeing MDXX0060, and MDXX0070, which weren't there before, as well as a
>substantial increase in file size.  Perhaps things are working, but I am still
>unable to look at temperature (just my test case) in mcidas...

Which says that the data isn't there.

>(A good 
>question: which data set is that, and where can I find a list of 
>filename->realname mappings?)

The POINT data is all decoded into an ADDE dataset with group name
RTPTSRC.  The different sets in the group were defined when you
ran BATCH LSSERVE.BAT.  You can review the settings by running
DSSERVE LIST RTPTSRC, and this should give:

Group/Descriptor         Type  Format & Range     RT Comment
------------------------ ----- ------------------ -- --------------------
RTPTSRC/AIRCRAFT         POINT MD   61-70         Y  Real-Time Aircraft data
RTPTSRC/FOUS14           POINT MD   41-50         Y  Real-Time FOUS14 data
RTPTSRC/LIGHTNING        POINT MD   71-80         Y  Real-Time Lightning data
RTPTSRC/PROF6MIN         POINT MD   91-100        Y  Real-Time 6-Minute Profiler data
RTPTSRC/PROFHOURLY       POINT MD   81-90         Y  Real-Time Hourly Profiler data
RTPTSRC/PTSRCS           POINT MD   1-100         Y  All point data in MDXX files
RTPTSRC/SFCHOURLY        POINT MD   1-10          Y  Real-Time SFC Hourly
RTPTSRC/SHIPBUOY         POINT MD   31-40         Y  Real-Time Ship and Buoy data
RTPTSRC/SYNOPTIC         POINT MD   51-60         Y  Real-Time SYNOPTIC data
RTPTSRC/UPPERMAND        POINT MD   11-20         Y  Real-Time Upper Air (Mandatory)
RTPTSRC/UPPERSIG         POINT MD   21-30         Y  Real-Time Upper Air (Significant)
DSSERVE: done

This gives the breakdown you are after:

RTPTSRC/SFCHOURLY:= Real-Time SFC Hourly -> MD files 1-10 (MDXX0001 - MDXX0010)

and so on.
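
If it helps, the same breakdown can be written as a small shell function
that turns an MD file name into its dataset name.  This is only a sketch:
the function name md_descriptor is made up for illustration, and the
ranges are copied from the DSSERVE LIST RTPTSRC output above, so they
hold only if your LSSERVE.BAT definitions match that listing.

```shell
#!/bin/sh
# Map an MD file name (e.g. MDXX0060) to its RTPTSRC descriptor.
# Ranges are taken from the DSSERVE LIST RTPTSRC output; md_descriptor
# is a hypothetical helper, not part of McIDAS.
md_descriptor() {
    # Drop the MDXX prefix, then leading zeros, to get a plain number.
    num=${1#MDXX}
    num=$(printf '%s' "$num" | sed 's/^0*//')
    case $num in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
    if   [ "$num" -le 10 ];  then echo "RTPTSRC/SFCHOURLY"
    elif [ "$num" -le 20 ];  then echo "RTPTSRC/UPPERMAND"
    elif [ "$num" -le 30 ];  then echo "RTPTSRC/UPPERSIG"
    elif [ "$num" -le 40 ];  then echo "RTPTSRC/SHIPBUOY"
    elif [ "$num" -le 50 ];  then echo "RTPTSRC/FOUS14"
    elif [ "$num" -le 60 ];  then echo "RTPTSRC/SYNOPTIC"
    elif [ "$num" -le 70 ];  then echo "RTPTSRC/AIRCRAFT"
    elif [ "$num" -le 80 ];  then echo "RTPTSRC/LIGHTNING"
    elif [ "$num" -le 90 ];  then echo "RTPTSRC/PROFHOURLY"
    elif [ "$num" -le 100 ]; then echo "RTPTSRC/PROF6MIN"
    else echo "unknown"; return 1
    fi
}
```

For example, 'md_descriptor MDXX0070' prints RTPTSRC/AIRCRAFT.  Every
file in the 1-100 range is also reachable through RTPTSRC/PTSRCS.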

>I also still don't see anything from a PTLIST command.
>
>ldm@atm20:/a/data/xcd$watch -n1 "ls -la MD*"
>Every 1s: ls -la MD*                                    Thu Oct 17 14:36:45 2002
>-rw-r--r--    1 ldm      ldm       5545608 Oct 16 03:38 MDXX0018
>-rw-r--r--    1 ldm      ldm       5781328 Oct 16 13:47 MDXX0019
>-rw-r--r--    1 ldm      ldm       5635920 Oct 17 14:33 MDXX0020
>-rw-r--r--    1 ldm      ldm        784960 Oct 16 03:24 MDXX0028
>-rw-r--r--    1 ldm      ldm       1275344 Oct 16 13:47 MDXX0029
>-rw-r--r--    1 ldm      ldm       1264688 Oct 17 14:33 MDXX0030
>-rw-r--r--    1 ldm      ldm       4987588 Oct 16 13:31 MDXX0038
>-rw-r--r--    1 ldm      ldm       4984780 Oct 16 13:48 MDXX0039
>-rw-r--r--    1 ldm      ldm       4578052 Oct 17 14:35 MDXX0040
>-rw-r--r--    1 ldm      ldm       2215544 Oct 16 08:26 MDXX0049
>-rw-r--r--    1 ldm      ldm       4909200 Oct 16 05:08 MDXX0057
>-rw-r--r--    1 ldm      ldm       8858712 Oct 16 11:22 MDXX0058
>-rw-r--r--    1 ldm      ldm       7778136 Oct 17 14:29 MDXX0059
>-rw-r--r--    1 ldm      ldm       8859048 Oct 17 14:35 MDXX0060
>-rw-r--r--    1 ldm      ldm       3112844 Oct 16 04:04 MDXX0068
>-rw-r--r--    1 ldm      ldm       5436048 Oct 16 13:47 MDXX0069
>-rw-r--r--    1 ldm      ldm       5449292 Oct 17 14:32 MDXX0070

This gives you the same sort of info that the PTLIST command gave.

>Also, I've been seeing something that might be indicative of a (the) problem:
>I've got a zombie dmsfc.k process, that seems to be rapidly changing its PID
>(or dying and getting a new one started). Its parent is the slave startxcd.k
>process...

OK, this is telling us that dmsfc.k, the SAO/METAR decoder, is dying and
is being automatically restarted by the XCD supervisory process, startxcd.k.
The question is _why_ dmsfc.k is dying.

>I've grepped -i dms (and startxcd) in the ldmd.log.* files and it 
>doesn't turn up anything at all.

The McIDAS-XCD processes don't log to the LDM log file.  Instead, they
log to the file you defined as MCLOG in xcd_run.
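
To find that log file quickly, something like the following should work.
It is a sketch only: it assumes MCLOG is set in xcd_run with a plain
shell assignment of the form MCLOG=<path>, and the function name
mclog_setting is made up for illustration.  You pass in the location of
your own copy of xcd_run.

```shell
#!/bin/sh
# Print the value assigned to MCLOG in an xcd_run-style script.
# Assumes a plain assignment line such as MCLOG=<path>; commented-out
# lines are ignored.  mclog_setting is a hypothetical helper name.
mclog_setting() {
    # Keep only the right-hand side of uncommented MCLOG= lines and
    # report the last one, since a later assignment would win.
    sed -n 's/^[[:space:]]*MCLOG=//p' "$1" | tail -n 1
}
```

Run it as 'mclog_setting <path to your xcd_run>'.  If the value printed
refers to other variables set in the script, the shell will not have
expanded them here, so you will need to substitute them by hand before
looking at the file.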

>I've currently restarted the ldm with the -v
>flag so its being noisy, and I don't see anything besides products coming in.
>
>ldm@atm20:/a/data/xcd$ ps -ef |grep 7141 
>ldm       7141  7005  0 14:45 ?        00:00:00 startxcd.k
>ldm       7144  7141  0 14:45 ?        00:00:00 DMRAOB
>ldm       7145  7141  0 14:45 ?        00:00:00 DMSYN
>ldm       7146  7141  0 14:45 ?        00:00:00 DMMISC
>ldm       7157  7141  9 14:45 ?        00:00:11 DMGRID
>ldm      30730  7141 85 14:47 ?        00:00:04 [dmsfc.k <defunct>]

dmsfc.k is dying.  We must find out why.

re: perhaps SCHEMA has been deleted.

>The SCHEMA file is still there:
>ldm@atm20:/a/data/xcd$ ls -la /a/data/xcd/SCHEMA 
>-rw-r--r--    1 mcidas   mcidas     481792 Oct 15 12:57 /a/data/xcd/SCHEMA

I offered this suggestion since many folks make the mistake of trying
to scour the McIDAS data files (MDXX*, etc.) using the LDM scour facility.
If this is not done _very carefully_, then the SCHEMA file will eventually
get deleted since its timestamp never changes.  Once that happens, the
decoders that need the database information in SCHEMA will stop working
since they won't know how to create the output files (the MD files are
Meteorological Database files; SCHEMA contains the database file schemas
for the output data files).
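
As an illustration of what "carefully" means, here is a sketch that
selects old files strictly by the MDXX name pattern, so that SCHEMA can
never be picked up no matter how old its timestamp is.  It uses plain
'find' rather than the LDM scour facility, it only lists candidates and
deletes nothing, and the function name old_md_files is made up for
illustration.

```shell
#!/bin/sh
# List MDXX files in one directory that are older than a given number
# of days.  Only names matching MDXX* are considered, so SCHEMA is
# never selected.  Nothing is removed; candidates are only printed.
# old_md_files is a hypothetical helper name.
old_md_files() {
    dir=$1
    days=$2
    find "$dir" -maxdepth 1 -type f -name 'MDXX*' -mtime +"$days" -print
}
```

For example, 'old_md_files /a/data/xcd 3' lists the MDXX files there
that were last modified more than three days ago.  Whatever tool ends
up doing the removal, the point is the same: match on the file name, not
just on the age of everything in the directory.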

>Thanks for bearing with me through all of this.

No worries.

>If none of this sheds any
>light, then you're welcome do the login thing, as its probably a lot more
>efficient for you...

I just logged in to your system and see that the contents of your
~mcidas/workdata directory do not look complete.  To me this means that
the install was not done correctly/completely.  A number of configuration
files that should be found in the workdata directory are located in the
~mcidas/data directory.  I don't understand how this could happen since
the 'make install.xcd' will install the files in the workdata directory.
Did you copy these files to the /home/mcidas/data directory?

Also, I see a ~mcidas/update directory and a SKEDFILE in ~mcidas.
It is very possible that you inherited this kind of structure from
Erick.  

Instead of troubleshooting things in the current setup, I would much
rather clean things up and get the installation looking like a standard
McIDAS installation.  That way, troubleshooting new problems will be
much easier.

So, I recommend:

<as 'ldm'>

o shutting down the LDM

<as 'mcidas'>

o uninstalling and then reinstalling McIDAS-X 2002 (this is pretty quick):

cd mcidas2002/src
make uninstall.all

At this point, there will be several files in the ~mcidas/data
directory that shouldn't be there (e.g., STRTABLE, LWPATH.NAM, etc.).
I would recommend removing all files from that directory EXCEPT the
ones that you explicitly created during the configuration process
or that were created as custom files for your site.

Here are the files that I would definitely keep:

ADDESITE.TXT
LSSERVE.BAT
LOCAL.NAM
LOCDATA.BAT
RESOLV.SRV      <- this file should be in the workdata directory; move it there
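
If you would rather not delete anything outright, a sketch like the
following moves everything except the files listed above into a backup
directory.  Moving instead of removing is my cautious variation for this
example, and the function name stash_non_site_files is made up for
illustration.  The keep list is only the files named above; add any
other files you created for your site to the case pattern before
running it.

```shell
#!/bin/sh
# Move every regular file in a data directory into a backup directory,
# except the site-specific files named in the keep list.
# stash_non_site_files is a hypothetical helper name.
stash_non_site_files() {
    data_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in "$data_dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty dir.
        [ -f "$f" ] || continue
        case $(basename "$f") in
            ADDESITE.TXT|LSSERVE.BAT|LOCAL.NAM|LOCDATA.BAT|RESOLV.SRV)
                ;;  # site file: leave it where it is
            *)
                mv "$f" "$backup_dir"/ ;;
        esac
    done
}
```

Run it with the data directory and a backup directory of your choosing
as the two arguments.  RESOLV.SRV is left in place by this sketch, so it
still needs to be moved to the workdata directory by hand as noted
above.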

Once the directory is cleaned out, then a reinstall of the package can be
done:

cd ~mcidas/mcidas2002/src
make install.all

The other thing I would recommend doing -- unless specific setups at
your site rely on the split between XCD-created data and ldm-mcidas-created
data -- is to combine the contents of /var/data/ldm/mcidas and /var/data/xcd.
You will need to trust me when I say that the consolidation process will make
life easier for you down the road.  What it will do in the short term,
however, is make you redo some of the setup work you did just a little
while ago.

I want to blast this off to you now before finishing my thoughts so that
you can get started if you are there or give me the green light to
do the modifications myself.

Tom