[thredds] TDS performance issues on a production server

Hi, we’ve recently released a web application that uses TDS for mapping, and it 
is getting a lot of traffic. At one point the server stopped responding 
altogether, which is a major problem. A quick restart of Tomcat got it going 
again, so I’m starting to dig into the logs. We normally see each GET followed 
by a "request complete", but occasionally we’ll have:

GET …url…
GET …url…
GET …url…
GET …url…
GET …url…
GET …url…
GET …url…
GET …url…

while the CPU spikes to 100% (with 12 CPUs) for a minute or more, and then:

request complete
request complete
request complete
request cancelled by client
request cancelled by client
request complete
request complete

The few times I’ve seen this occur while watching the logs, the server seemed 
to pull out of it OK. However, the time the server failed, requests were never 
returned. From the logs, requests came in for roughly 40 minutes without being 
completed. Unfortunately, due to the high visibility, we started to get emails 
from users and the press about the application no longer working. 
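
To put numbers on the pattern described above (many GETs arriving before any of them complete), here is a minimal sketch in Python. It assumes the log has already been parsed into (timestamp, request id, kind) tuples; that tuple layout, the "start"/"end" labels, and the function name are hypothetical and are not the actual TDS log format. A "request cancelled by client" line would be treated as an "end" event.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event kinds; map your real log lines onto these yourself.
START = "start"   # a GET line
END = "end"       # "request complete" or "request cancelled by client"


def summarize_in_flight(
    events: Iterable[Tuple[float, str, str]]
) -> Tuple[int, List[str]]:
    """Return (peak number of simultaneously open requests,
    ids of requests that never finished, oldest first).

    `events` must be in log order: (timestamp_seconds, request_id, kind).
    """
    open_requests: Dict[str, float] = {}  # request_id -> start timestamp
    peak = 0
    for timestamp, request_id, kind in events:
        if kind == START:
            open_requests[request_id] = timestamp
            peak = max(peak, len(open_requests))
        elif kind == END:
            # Ignore END lines whose START fell outside the parsed window.
            open_requests.pop(request_id, None)
    unfinished = sorted(open_requests, key=lambda rid: open_requests[rid])
    return peak, unfinished


if __name__ == "__main__":
    sample = [
        (0.0, "a", START),
        (1.0, "b", START),
        (2.0, "a", END),
        (3.0, "c", START),
    ]
    print(summarize_in_flight(sample))
```

A steadily growing peak with a long list of unfinished ids would match the 40-minute failure, while a peak that drops back to zero would match the spikes that recover on their own.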

Has anyone experienced this before and/or can you give guidance on how to 
diagnose or prevent this?

Here are some config settings:
CentOS 5.7 
Java 1.6
TDS 4.3.17
only WMS is enabled
Java -Xmx set to 8 GB (currently using 5.3 GB; the dataset is 600 GB of 
30-arcsecond grids for the continental US, 3.4 GB per file)
For better or worse we are configured to use 2 instances of TDS to keep the 
catalogs and configuration isolated. I’m not sure if this matters, but I didn’t 
want to omit it. Since it is a live server I can’t easily change to the 
preferred proxy configuration.

I am trying not to panic yet. However, if the server goes unresponsive again, 
staying calm may no longer be an option.
Jay Alder
US Geological Survey
Oregon State University
104 COAS Admin Building
Office Burt Hall 166
http://ceoas.oregonstate.edu/profile/alder/
