
[TIGGE #CUA-629523]: Re: dataportal not receiving data from tigge-ldm.ecmwf.int



Manuel,

> That can be achieved with lsof(1), which lists the open file
> descriptors, including network sockets. This is the list of processes
> with port 'unidata-ldm' open:
> 
> COMMAND     PID       USER   FD      TYPE             DEVICE       SIZE NODE NAME
> rpc.ldmd  31906        ldm    0u     IPv4           21859518            TCP *:unidata-ldm (LISTEN)
...
> rpc.ldmd  31907        ldm    0u     IPv4           21859518            TCP *:unidata-ldm (LISTEN)
...
> This shows two processes listening on port unidata-ldm: 31906 and 31907

Indeed, it does show two processes listening on port 388 for TCP connections.  
This should be impossible and probably indicates a problem with your 
operating system.

Because both processes are listening on port 388, this might be the cause of 
your problem (although I don't see how, because the O/S should prevent multiple 
processes from listening on the same port for the same type of connection).

> This is the list of rpc.ldmd's (ps -fu ldm| grep rpc.ldmd):
> ldm      31906     1  0 Apr11 ?        00:00:12 rpc.ldmd -P 388 -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm      31907     1  0 Apr11 ?        00:21:33 rpc.ldmd -P 388 -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm      32091     1  0 Apr11 ?        00:08:01 rpc.ldmd -P 388 -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm      32145     1  0 Apr11 ?        00:03:47 rpc.ldmd -P 388 -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm      32147     1  0 Apr11 ?        00:07:41 rpc.ldmd -P 388 -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> ldm      18695     1  0 Apr12 ?        00:00:59 rpc.ldmd -P 388 -v -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
> 
> 
> So just to confirm, you want me to kill the following processes:
> 32091 32091 32145 32147 18695

PID 32091 is listed twice.

At this point, I'm not sure which LDM server process should continue to run.  I 
suggest executing the "ldmadmin stop" command and then manually sending a 
SIGTERM signal to any and all remaining LDM processes.  If they don't terminate, 
then try a SIGINT and, finally, a SIGKILL.
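The escalation described above (SIGTERM, then SIGINT, then SIGKILL) could be scripted roughly as follows. This is a minimal Python sketch, not part of the LDM distribution: `stop_process` and `_is_alive` are hypothetical names, and the 5-second grace period after each signal is an arbitrary assumption. It is meant only for PIDs that are still running after "ldmadmin stop".

```python
import os
import signal
import time


def _is_alive(pid: int) -> bool:
    """Return True if a process with *pid* still exists."""
    try:
        # Reap the process if it happens to be our own child, so that a
        # zombie is not mistaken for a live process.
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child; fall through to the signal-0 probe
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def stop_process(pid: int, grace: float = 5.0, poll: float = 0.1):
    """Send SIGTERM, SIGINT, then SIGKILL, waiting *grace* seconds after each.

    Returns the last signal sent before the process disappeared, or None
    if it was already gone. Raises RuntimeError if it survives SIGKILL.
    """
    sent = None
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGKILL):
        if not _is_alive(pid):
            return sent
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return sent  # exited between the probe and the signal
        sent = sig
        deadline = time.monotonic() + grace
        while time.monotonic() < deadline:
            if not _is_alive(pid):
                return sent
            time.sleep(poll)
    raise RuntimeError(f"process {pid} survived SIGKILL")
```

Escalating gradually gives each server a chance to exit cleanly before SIGKILL, which cannot be caught and so leaves no opportunity for cleanup.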

Once that's done, run "ldmadmin clean" to clean up.  Then execute "pqcheck 
-v" to check the product-queue for corruption.  Recreate the product-queue if 
necessary.

Then, very carefully, execute an "ldmadmin start" and see if it creates 
multiple top-level LDM servers (it shouldn't).
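One way to check the result of "ldmadmin start" is to look for rpc.ldmd processes whose parent PID is 1 in the output of `ps -fu ldm`. The Python sketch below assumes the default `ps -f` columns (UID PID PPID C STIME TTY TIME CMD) with one process per line. It also assumes that a top-level LDM server is re-parented to init (PPID 1) while the servers it forks report it as their parent; note that in the listing quoted above, all six rpc.ldmd processes report PPID 1. `top_level_ldmds` is a hypothetical helper.

```python
def top_level_ldmds(ps_text: str) -> list[int]:
    """Return PIDs of rpc.ldmd processes whose parent PID is 1.

    Assumes `ps -f` columns: UID PID PPID C STIME TTY TIME CMD, with one
    process per (unwrapped) line.
    """
    pids = []
    for line in ps_text.splitlines():
        fields = line.split(None, 7)  # CMD (with its arguments) stays intact
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header line or malformed/wrapped line
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if ppid == 1 and cmd.split()[0].endswith("rpc.ldmd"):
            pids.append(pid)
    return pids
```

Under those assumptions, a result with more than one PID would indicate the multiple top-level servers seen here.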

Is there an EXEC entry in the LDM configuration file (etc/ldmd.conf) that 
starts another LDM?

Keep me apprised.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: CUA-629523
Department: Support IDD TIGGE
Priority: Normal
Status: On Hold