
[Support #VOV-751174]: Re: [ldm-users] Problems getting data from idd.aos.wisc.edu



Pete,

> Looks like updating to CentOS 6.3 and recompiling had no effect. I just
> restarted using 6.3 and again, the CPU usage on individual ldmd
> processes is very high (50-90%) and data is moving at a crawl.
> 
> I do notice that in 6.11.3 the CPU utilization is almost entirely
> consumed by system, rather than user context.

That is very odd. We don't see that here at all: our LDM 6.11.3 server handles 
about 88 downstream connections with a load average around 1 to 2.
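For comparison, the sort of quick check we run here is something like the following (the grep assumes the standard LDM port, 388):

    # count established LDM connections, then show the load averages
    netstat -tn | grep ':388 ' | grep -c ESTABLISHED
    uptime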

> Here's what a 'top' looks like on 6.11.2 vs 6.11.3
> 
> 2/15/2013
> 6.11.3
> 
> top - 14:51:08 up 20 min,  2 users,  load average: 21.46, 17.09, 11.79
> Tasks: 575 total,  17 running, 558 sleeping,   0 stopped,   0 zombie
> Cpu0  :  1.6%us, 95.1%sy,  0.0%ni,  3.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  :  1.7%us, 88.3%sy,  0.0%ni,  7.0%id,  0.0%wa,  0.0%hi,  3.0%si,  0.0%st
> Cpu2  :  1.7%us, 97.7%sy,  0.0%ni,  0.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  :  2.0%us, 96.7%sy,  0.0%ni,  1.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu4  :  1.7%us, 95.4%sy,  0.0%ni,  2.6%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu5  :  1.7%us, 97.7%sy,  0.0%ni,  0.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu6  :  1.7%us, 95.3%sy,  0.0%ni,  3.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  :  1.7%us, 93.4%sy,  0.0%ni,  5.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu8  :  1.7%us, 97.0%sy,  0.0%ni,  1.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu9  :  1.7%us, 98.0%sy,  0.0%ni,  0.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu10 :  1.7%us, 97.7%sy,  0.0%ni,  0.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu11 :  1.0%us, 98.0%sy,  0.0%ni,  1.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu12 :  1.3%us, 94.1%sy,  0.0%ni,  4.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu13 :  1.0%us, 92.7%sy,  0.0%ni,  6.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu14 :  1.0%us, 98.4%sy,  0.0%ni,  0.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu15 :  1.3%us, 98.3%sy,  0.0%ni,  0.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  32874844k total, 19404024k used, 13470820k free,    39280k buffers
> Swap: 32767984k total,        0k used, 32767984k free, 17950924k cached
> 
> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
> 28750 ldm       20   0 23.0g 872m 871m R 81.6  2.7   0:59.36 ldmd
> 28752 ldm       20   0 23.0g 927m 926m R 80.3  2.9   1:04.70 ldmd
> 28810 ldm       20   0 23.0g 154m 152m R 79.6  0.5   0:05.12 ldmd
> 28749 ldm       20   0 23.0g 847m 846m R 78.7  2.6   1:03.10 ldmd
> 28782 ldm       20   0 23.0g 443m 441m R 73.7  1.4   0:26.03 ldmd
> 28746 ldm       20   0 23.0g 865m 864m R 72.0  2.7   1:00.12 ldmd
> 28754 ldm       20   0 23.0g 867m 866m R 72.0  2.7   0:59.15 ldmd
> 28808 ldm       20   0 23.0g 187m 185m R 69.7  0.6   0:07.03 ldmd
> 28753 ldm       20   0 23.0g 884m 882m R 69.4  2.8   0:59.83 ldmd
> 28807 ldm       20   0 23.0g 170m 169m R 68.4  0.5   0:07.60 ldmd
> 28813 ldm       20   0 23.0g  86m  84m R 54.5  0.3   0:01.65 ldmd
> 28812 ldm       20   0 23.0g  81m  80m R 52.9  0.3   0:01.60 ldmd
> 28814 ldm       20   0 23.0g  75m  73m R 38.3  0.2   0:01.16 ldmd
> 28282 ldm       20   0 23.0g 2876 1720 S 35.7  0.0   0:56.41 ldmd
> 28281 ldm       20   0 23.0g 3388 2224 S 35.0  0.0   0:59.65 ldmd
> 28294 ldm       20   0 23.0g  14m  12m S 35.0  0.0   1:00.19 ldmd
> 28302 ldm       20   0 23.0g 3592 2404 S 35.0  0.0   0:59.76 ldmd
> 28278 ldm       20   0 23.0g  20m  18m S 34.7  0.1   1:00.39 ldmd
> 28300 ldm       20   0 23.0g  30m  28m S 34.7  0.1   1:01.31 ldmd
> 28301 ldm       20   0 23.0g 3720 2528 S 34.4  0.0   0:59.61 ldmd
> 28280 ldm       20   0 23.0g  32m  31m S 34.0  0.1   1:00.48 ldmd
> 28296 ldm       20   0 23.0g  13m  12m S 34.0  0.0   0:58.95 ldmd
> 28279 ldm       20   0 23.0g 3932 2776 S 33.7  0.0   1:00.65 ldmd
> 28288 ldm       20   0 23.0g  13m  12m S 33.7  0.0   0:59.04 ldmd
> 28290 ldm       20   0 23.0g  14m  12m S 33.7  0.0   0:59.94 ldmd

Very high system loads, indeed.

> 
> 
> 6.11.2
> top - 15:14:45 up 44 min,  3 users,  load average: 1.21, 2.22, 4.88
> Tasks: 580 total,   1 running, 579 sleeping,   0 stopped,   0 zombie
> Cpu0  :  0.7%us, 15.0%sy,  0.0%ni, 84.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  :  1.0%us, 15.3%sy,  0.0%ni, 83.1%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
> Cpu2  :  1.0%us, 14.8%sy,  0.0%ni, 83.8%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu3  :  1.3%us, 16.0%sy,  0.0%ni, 82.3%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu4  :  1.0%us, 15.3%sy,  0.0%ni, 83.1%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
> Cpu5  :  1.0%us, 16.8%sy,  0.0%ni, 81.8%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu6  :  1.0%us, 16.6%sy,  0.0%ni, 82.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  :  0.3%us, 15.7%sy,  0.0%ni, 83.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu8  :  0.3%us, 15.6%sy,  0.0%ni, 84.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu9  :  0.7%us, 15.2%sy,  0.0%ni, 84.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu10 :  0.7%us, 15.2%sy,  0.0%ni, 84.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu11 :  0.7%us, 15.3%sy,  0.0%ni, 84.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu12 :  0.7%us, 15.2%sy,  0.0%ni, 84.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu13 :  0.7%us, 15.2%sy,  0.0%ni, 83.8%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu14 :  0.7%us, 15.3%sy,  0.0%ni, 84.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu15 :  1.0%us, 14.6%sy,  0.0%ni, 84.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  32874844k total, 24002200k used,  8872644k free,    34976k buffers
> Swap: 32767984k total,        0k used, 32767984k free, 20543600k cached
> 
> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
> 38385 ldm       20   0 23.0g 552m 551m S  2.3  1.7   0:52.59 ldmd
> 38405 ldm       20   0 23.0g 642m 640m S  2.3  2.0   0:52.01 ldmd
> 38424 ldm       20   0 23.0g 558m 556m S  2.3  1.7   0:54.13 ldmd
> 38433 ldm       20   0 23.0g 549m 547m S  2.3  1.7   0:53.95 ldmd
> 38497 ldm       20   0 23.0g 526m 525m S  2.3  1.6   0:48.50 ldmd
> 38507 ldm       20   0 23.0g 1.7g 1.7g S  2.3  5.5   2:31.76 ldmd
> 38510 ldm       20   0 23.0g 2.1g 2.1g S  2.3  6.7   2:54.38 ldmd
> 38872 ldm       20   0 23.0g 168m 167m S  2.3  0.5   0:11.16 ldmd
> 39034 ldm       20   0 23.0g 176m 174m S  2.3  0.5   0:03.17 ldmd
> 38386 ldm       20   0 23.0g 1.2g 1.2g S  2.0  3.8   1:44.74 ldmd
> 38387 ldm       20   0 23.0g 572m 570m S  2.0  1.8   0:53.63 ldmd
> 38388 ldm       20   0 23.0g 1.2g 1.2g S  2.0  3.8   1:45.35 ldmd
> 38391 ldm       20   0 23.0g 644m 642m S  2.0  2.0   0:52.13 ldmd
> 38394 ldm       20   0 23.0g 579m 578m S  2.0  1.8   0:52.05 ldmd
> 38400 ldm       20   0 23.0g 1.1g 1.1g S  2.0  3.6   1:28.46 ldmd
> 38401 ldm       20   0 23.0g 2.1g 2.1g S  2.0  6.7   2:57.24 ldmd
> 38403 ldm       20   0 23.0g 1.2g 1.2g S  2.0  3.8   0:52.49 ldmd
> 38404 ldm       20   0 23.0g 526m 524m S  2.0  1.6   0:52.45 ldmd
> 38408 ldm       20   0 23.0g 539m 537m S  2.0  1.7   0:51.08 ldmd
> 38414 ldm       20   0 23.0g 632m 630m S  2.0  2.0   0:48.43 ldmd
> 38415 ldm       20   0 23.0g 558m 556m S  2.0  1.7   0:53.70 ldmd
> 38419 ldm       20   0 23.0g 526m 525m S  2.0  1.6   0:51.41 ldmd
> 38422 ldm       20   0 23.0g 563m 561m S  2.0  1.8   0:52.63 ldmd
> 38423 ldm       20   0 23.0g 558m 556m S  2.0  1.7   0:53.92 ldmd
> 38426 ldm       20   0 23.0g 563m 561m S  2.0  1.8   0:52.66 ldmd

I can't imagine what could be causing LDM 6.11.3 to have much higher system 
loads than LDM 6.11.2 on your system.
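If you have a moment before I get on, one thing that might narrow it down is a syscall summary of one of the busy processes. This is just a sketch, assuming strace is installed; the PID below is one from your 6.11.3 top listing, and any ldmd that top shows at high %sy would do:

    # attach for a while, then press Ctrl-C to get the per-syscall time summary
    strace -c -p 28750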

> My queue size is 24 Gb, my system RAM is 32 Gb, could that have anything
> to do with it?

That is cutting it close. The product-queue is memory-mapped, so it should fit comfortably in memory. We like to have about twice as much memory as the size of the product-queue, although we have gotten by with less; our backend server, for example, has about 74 GB of memory and a 30 GB queue.
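A quick way to see whether the queue is staying resident is to compare the "cached" figure from free(1) against the on-disk size of the queue. The path below is only the usual default; adjust it to wherever your queue actually lives:

    # page cache vs. total memory (the memory-mapped queue lives in the cache)
    free -m
    # size of the product-queue file (default path is an assumption)
    ls -lh ~ldm/var/queues/ldm.pq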

> Steve, I can get you ldm access to idd.aos.wisc.edu if you want to poke
> around at all.

That would help.

> I also can send ldmd.log when running 6.11.3 vs 6.11.2.

If I can get onto the system, then I can look at the log file directly. Would you 
mind if I switched back and forth between the two LDM versions (or is the system 
operational)?
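If switching is acceptable, it should just be a matter of stopping the LDM, repointing the runtime link, and starting it again. The directory names below assume the usual versioned layout under the LDM home:

    ldmadmin stop
    # point "runtime" at the other version (ldm-6.11.2 or ldm-6.11.3)
    cd $HOME && ln -fns ldm-6.11.2 runtime
    ldmadmin start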

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: VOV-751174
Department: Support LDM
Priority: Normal
Status: Closed