The server is CentOS 5.7 with 48 GB of main memory, running Sun/Oracle Java 1.6
and Tomcat 6.
Our catalog is pretty small, with only a couple hundred datasets. With such a
small catalog, I set up our machine to restart Tomcat each night, which only
takes about 2 seconds. Spiking 12 CPUs to 100% for over a minute is still
curious, and it bogs down response time when it occurs. Hopefully the nightly
restarts will prevent TDS from becoming completely unresponsive.
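For anyone chasing the same kind of spike, one possible way to see which threads are burning CPU is to sample per-thread CPU time twice and rank the difference. The following is only a minimal sketch using the standard java.lang.management API, written to compile on Java 1.6. The class name HotThreads, the 5-second window, and the top-5 cutoff are arbitrary choices for illustration, not anything from TDS or Tomcat. It inspects the JVM it runs in, so to see TDS request threads it would have to be invoked from inside the Tomcat process (for example from a small diagnostic servlet, which is an assumption about the setup). From outside the process, `top -H -p <pid>` together with `jstack <pid>` gives a similar picture.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks threads of the current JVM by CPU used in a window.
final class HotThreads {

    /** CPU time in nanoseconds per live thread id; threads reporting -1 are skipped. */
    static Map<Long, Long> snapshot(ThreadMXBean mx) {
        Map<Long, Long> cpu = new HashMap<Long, Long>();
        for (long id : mx.getAllThreadIds()) {
            long ns = mx.getThreadCpuTime(id);
            if (ns >= 0) {
                cpu.put(id, ns);
            }
        }
        return cpu;
    }

    /** CPU nanoseconds used by one thread between snapshots; new threads start at 0. */
    static long delta(Map<Long, Long> before, Map<Long, Long> after, Long id) {
        Long start = before.get(id);
        return after.get(id) - (start == null ? 0L : start);
    }

    /** Thread ids present in the second snapshot, busiest first. */
    static List<Long> busiest(final Map<Long, Long> before, final Map<Long, Long> after) {
        List<Long> ids = new ArrayList<Long>(after.keySet());
        Collections.sort(ids, new Comparator<Long>() {
            public int compare(Long a, Long b) {
                long da = delta(before, after, a);
                long db = delta(before, after, b);
                return da < db ? 1 : (da > db ? -1 : 0);
            }
        });
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            System.err.println("Per-thread CPU time is not available on this JVM");
            return;
        }
        Map<Long, Long> before = snapshot(mx);
        Thread.sleep(5000L); // sampling window, chosen arbitrarily
        Map<Long, Long> after = snapshot(mx);

        List<Long> ranked = busiest(before, after);
        for (Long id : ranked.subList(0, Math.min(5, ranked.size()))) {
            ThreadInfo info = mx.getThreadInfo(id, 10); // up to 10 stack frames
            if (info == null) {
                continue; // thread exited after the second snapshot
            }
            System.out.println(info.getThreadName() + " [" + info.getThreadState() + "] "
                    + delta(before, after, id) / 1000000L + " ms CPU");
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

If the busiest threads during a spike turn out to be garbage-collection or request-handling threads, that would at least narrow down where to look next. The sketch itself does not assume either outcome.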
On Dec 12, 2013, at 9:47 AM, Gerry Creager - NOAA Affiliate
<gerry.creager@xxxxxxxx> wrote:
> Jay,
>
> What OS, release and Tomcat version are you running? I've seen a similar
> issue on another piece of software (Ramadda). Since I've seen this behavior
> with the standalone server and the tomcat-based server, I'm beginning to
> suspect my Java installation, but have not had sufficient time to investigate
> yet.
>
> There may be an OS correlation here, so I'm interested. I'm running RHEL 6
> and the various updated flavors of OpenJDK and Tomcat 6.
>
> gerry
>
> On Wed, Dec 11, 2013 at 4:33 PM, Jay Alder <alderj@xxxxxxxxxxxxxxxxxxxx>
> wrote:
> Hi, we’ve recently released a web application that uses TDS for mapping,
> which is getting a lot of traffic. At one point the server stopped responding
> altogether, which is a major problem. A quick restart of Tomcat got it going
> again, so I’m starting to dig into the logs. We normally get the GET /
> request complete behavior, but occasionally we’ll have:
>
> GET …url…
> GET …url…
> GET …url…
> GET …url…
> GET …url…
> GET …url…
> GET …url…
> GET …url…
>
> meanwhile having a 100% CPU spike (with 12 CPUs) for a minute or more
>
> request complete
> request complete
> request complete
> request cancelled by client
> request cancelled by client
> request complete
> request complete
>
> While watching the logs, the few times I’ve seen this occur it seems to pull
> out of it OK. However, the time the server failed, requests were never
> returned. From the logs, requests came in for roughly 40 minutes without
> being completed. Unfortunately, due to the high visibility, we started to get
> emails from users and the press about the application no longer working.
>
> Has anyone experienced this before and/or can you give guidance on how to
> diagnose or prevent this?
>
> Here are some config settings:
> CentOS 5.7
> Java 1.6
> TDS 4.3.17
> only WMS is enabled
> Java -Xmx set to 8 GB (currently taking 5.3 GB; the dataset is 600 GB of
> 30-arcsecond grids for the continental US, 3.4 GB per file)
> For better or worse we are configured to use 2 instances of TDS to keep the
> catalogs and configuration isolated. I’m not sure if this matters, but I
> didn’t want to omit it. Since it is a live server I can’t easily change to
> the preferred proxy configuration.
>
> I am trying not to panic yet. However, if the server goes unresponsive again,
> staying calm may no longer be an option.
>
> Jay Alder
> US Geological Survey
> Oregon State University
> 104 COAS Admin Building
> Office Burt Hall 166
> http://ceoas.oregonstate.edu/profile/alder/
>
>
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>
>
>
> --
> Gerry Creager
> NSSL/CIMMS
> 405.325.6371
> ++++++++++++++++++++++
> “Big whorls have little whorls,
> That feed on their velocity;
> And little whorls have lesser whorls,
> And so on to viscosity.”
> Lewis Fry Richardson (1881-1953)
Jay Alder
US Geological Survey
Oregon State University
104 COAS Admin Building
Office Burt Hall 166
http://ceoas.oregonstate.edu/profile/alder/