
20021113: ldm having issues starting up related to McIDAS-XCD (cont.)



>From: William C Klein <address@hidden>
>Organization: Valparaiso
>Keywords: 200211131551.gADFpqL24688 LDM ldmd.conf xcd_run McIDAS-XCD

Bill,

I logged onto aeolus as 'ldm'.  Here is what I looked for and found:

-> Time to move onto the rest:

re: permissions on the McIDAS-XCD executables that 'xcd_run' tries to run

These look OK.

re: the permissions on the directories that XCD decoders want to write to

These also look OK.

re: there are no interprocess communication handles left on the system.

This is the problem.  As 'ldm', I ran ipcs and got a long listing of
interprocess communication handles that needed to be removed:

[ 35 ] > ipcs
IPC status from <running system> as of Wed Nov 13 14:30:23 CST 2002
T         ID      KEY        MODE        OWNER    GROUP
Message Queues:
Shared Memory:
m          0   0x50000d2f --rw-r--r--     root     root
m          1   0          --rw-------      ldm vumcidas
m          2   0          --rw-------      ldm vumcidas
m          3   0          --rw-------      ldm vumcidas
m          4   0          --rw-------      ldm vumcidas
m          5   0          --rw-------      ldm vumcidas
m          6   0          --rw-------      ldm vumcidas
m          7   0          --rw-------      ldm vumcidas
m          8   0          --rw-------      ldm vumcidas
m          9   0          --rw-------      ldm vumcidas
m         10   0          --rw-------      ldm vumcidas
m         11   0          --rw-------      ldm vumcidas
m         12   0          --rw-------      ldm vumcidas
 ...
m         97   0          --rw-------      ldm vumcidas
m         98   0          --rw-------      ldm vumcidas
m         99   0          --rw-------      ldm vumcidas
Semaphores:

Also, as I mentioned earlier, the ~ldm/.mctmp directory does contain lots of
subdirectories:

[ aeolus : ldm : ~ ]                                                            
[ 39 ] > ls .mctmp
1     116   126   136   20    30    40    5     6     7     78    88    97
10    117   127   138   21    31    403   50    60    70    79    89    98
102   118   128   14    22    32    41    51    61    701   8     9     99
103   119   129   15    23    33    42    52    62    702   80    90
104   12    13    16    24    34    43    53    63    71    81    91
105   120   130   17    25    35    44    54    64    72    82    92
106   121   131   1742  26    36    45    55    65    73    83    93
108   122   132   1743  27    37    46    56    66    74    84    94
11    123   133   18    28    38    47    57    67    75    85    946
114   124   134   19    29    39    48    58    68    76    86    95
115   125   135   2     3     4     49    59    69    77    87    96

I removed the .mctmp subdirectories and cleaned up the ipc handles:

[ aeolus : ldm : ~/.mctmp ]                                                     
[ 42 ] > rm -rf *

The next step was to delete all ipc segments:

set COUNT=1
while ( $COUNT <= 99 )
  echo COUNT = $COUNT
  ipcrm -m $COUNT
  @ COUNT = $COUNT + 1
end
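
The loop above removes IDs 1 through 99 regardless of who owns them.  A
more selective variant could read the ipcs listing and act only on the
shared memory segments owned by 'ldm'.  The following is a rough sketch
of that idea, not something that was run on aeolus.  It is written for a
POSIX shell rather than csh, it assumes the column layout shown in the
ipcs listing above (T, ID, KEY, MODE, OWNER, GROUP), and the function
name list_shm_cleanup is made up for illustration:

```shell
#!/bin/sh
# Sketch: print the ipcrm commands needed to remove the shared memory
# segments owned by one user, given `ipcs` output on standard input.
# Assumes Solaris-style columns: T  ID  KEY  MODE  OWNER  GROUP
# list_shm_cleanup is a hypothetical helper, not part of LDM or McIDAS.
list_shm_cleanup() {
    owner="$1"
    while read -r type id key mode seg_owner group; do
        # Shared memory rows have type "m"; header and section lines
        # ("Shared Memory:", "Semaphores:", ...) never match.
        if [ "$type" = "m" ] && [ "$seg_owner" = "$owner" ]; then
            echo "ipcrm -m $id"
        fi
    done
}

# Dry run: print the commands for review instead of executing them.
# ipcs | list_shm_cleanup ldm
```

Printing the commands first makes it easy to check that the root-owned
segment (ID 0 in the listing above) is left alone before anything is
actually removed.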

After doing this, I decided to become McIDAS and see if I could
create a McIDAS session (since this would exercise the shared memory
system on aeolus).  Here is what happened:

<login as 'mcidas'>
cd workdata
mcenv
ld.so.1: mcenv: fatal: relocation error: file mcenv: symbol __s_rsFe_pv: referenced symbol not found
Killed


This indicates some sort of a shared memory system problem on aeolus.
The next step I would normally suggest is a reboot, but I see that
aeolus has been up for only a little over 7 hours.

Question:  did your problems start after a reboot earlier today?

Tom