The server is CentOS 5.7 with 48 GB of main memory, running Sun/Oracle Java
1.6 and Tomcat 6.
Our catalog is pretty small, with only a couple hundred datasets. With such a
small catalog I set up our machine to restart Tomcat each night, which only
takes about 2 seconds. Spiking 12 CPUs to 100% for over a minute is still
curious, and it bogs down response time when it occurs. Hopefully the nightly
restarts will keep TDS from becoming completely unresponsive.
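To figure out which threads are actually burning CPU during one of these
spikes, the simplest thing is probably to run "jstack <pid>" a few times while
the CPUs are pegged and compare the dumps. The same idea can be done
programmatically with the standard java.lang.management API; below is a rough
sketch of mine (the class name and the 200 ms threshold are just placeholders,
and it only sees the TDS threads if it runs inside the Tomcat JVM, e.g. wired
into a debug servlet):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Sketch only: sample per-thread CPU time twice, one second apart, and print
// the threads (with partial stack traces) that used the most CPU in between.
public class BusyThreadSampler {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.out.println("per-thread CPU time not supported on this JVM");
            return;
        }

        // first sample: CPU nanoseconds per thread id
        Map<Long, Long> before = new HashMap<Long, Long>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }

        Thread.sleep(1000L); // sampling interval

        // second sample: report threads that used more than ~20% of one core
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            if (start == null || start < 0 || end < 0) continue;

            long deltaMs = (end - start) / 1000000L;
            if (deltaMs > 200L) {
                ThreadInfo info = mx.getThreadInfo(id, 8); // up to 8 stack frames
                if (info == null) continue;
                System.out.println(info.getThreadName() + " used " + deltaMs + " ms CPU");
                for (StackTraceElement frame : info.getStackTrace()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }
}

If the same WMS-related stack frames show up in every spike, that should at
least narrow down which requests are responsible.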
On Dec 12, 2013, at 9:47 AM, Gerry Creager - NOAA Affiliate
<gerry.creager@xxxxxxxx> wrote:
> Jay,
>
> What OS, release, and Tomcat version are you running? I've seen a similar
> issue with another piece of software (RAMADDA). Since I've seen this behavior
> with both the standalone server and the Tomcat-based server, I'm beginning to
> suspect my Java installation, but I haven't had enough time to investigate
> yet.
>
> There may be an OS correlation here, so I'm interested. I'm running RHEL 6
> and the various updated flavors of OpenJDK and Tomcat 6.
>
> gerry
>
>
>
>
> On Wed, Dec 11, 2013 at 4:33 PM, Jay Alder <alderj@xxxxxxxxxxxxxxxxxxxx>
> wrote:
> Hi, we’ve recently released a web application that uses TDS for mapping,
> which is getting a lot of traffic. At one point the server stopped responding
> altogether, which is a major problem. A quick restart of Tomcat got it going
> again, so I’m starting to dig into the logs. We normally see each GET
> followed by a “request complete” in the logs, but occasionally we’ll have:
>
> GET …url…
> GET …url…
> GET …url…
> GET …url…
> GET …url…
> GET …url…
> GET …url…
> GET …url…
>
> meanwhile the server has a 100% CPU spike (across all 12 CPUs) for a minute
> or more, and then:
>
> request complete
> request complete
> request complete
> request cancelled by client
> request cancelled by client
> request complete
> request complete
>
> While watching the logs the few times I’ve seen this occur, it seems to pull
> out of it OK. However, the time the server failed, requests were never
> returned; from the logs, requests came in for roughly 40 minutes without
> being completed. Unfortunately, due to the application’s high visibility, we
> started to get emails from users and the press about it no longer working.
>
> Has anyone experienced this before and/or can you give guidance on how to
> diagnose or prevent this?
>
> Here are some config settings:
> CentOS 5.7
> Java 1.6
> TDS 4.3.17
> only WMS is enabled
> Java -Xmx set to 8 GB (currently using about 5.3 GB; the dataset is 600 GB of
> 30-arcsecond grids for the continental US, 3.4 GB per file; see the
> heap-logging sketch below)
> For better or worse we are configured to use 2 instances of TDS to keep the
> catalogs and configuration isolated. I’m not sure if this matters, but I
> didn’t want to omit it. Since it is a live server I can’t easily change to
> the preferred proxy configuration.
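>
> Since the heap is sitting at about 5.3 GB of an 8 GB ceiling, it might be
> worth logging heap usage over time to see whether the spikes line up with the
> heap getting full. A minimal sketch using MemoryMXBean (the class name and
> the one-minute interval are just illustrative, not anything from TDS;
> watching the same numbers remotely with jconsole/JMX would work just as
> well):
>
> import java.lang.management.ManagementFactory;
> import java.lang.management.MemoryMXBean;
> import java.lang.management.MemoryUsage;
>
> // Sketch only: print heap used/max once a minute; it has to run inside the
> // Tomcat JVM to report the TDS heap.
> public class HeapLogger implements Runnable {
>     public void run() {
>         MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
>         while (true) {
>             MemoryUsage heap = mem.getHeapMemoryUsage();
>             System.out.println(System.currentTimeMillis()
>                     + " heap used=" + heap.getUsed() / (1024L * 1024L) + " MB"
>                     + " max=" + heap.getMax() / (1024L * 1024L) + " MB");
>             try {
>                 Thread.sleep(60L * 1000L);
>             } catch (InterruptedException e) {
>                 return;
>             }
>         }
>     }
> }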
>
> I am trying not to panic yet. However, if the server goes unresponsive again,
> staying calm may no longer be an option.
>
> Jay Alder
> US Geological Survey
> Oregon State University
> 104 COAS Admin Building
> Office Burt Hall 166
> http://ceoas.oregonstate.edu/profile/alder/
>
>
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>
>
>
> --
> Gerry Creager
> NSSL/CIMMS
> 405.325.6371
> ++++++++++++++++++++++
> “Big whorls have little whorls,
> That feed on their velocity;
> And little whorls have lesser whorls,
> And so on to viscosity.”
> Lewis Fry Richardson (1881-1953)
Jay Alder
US Geological Survey
Oregon State University
104 COAS Admin Building
Office Burt Hall 166
http://ceoas.oregonstate.edu/profile/alder/