Nicolas,
> I'm thinking about using TDS as the main gateway for data retrieval.
> However, I never used tomcat before, and I'm quite worried about Java
> overhead / memory footprint.
This will greatly depend on your dataset size and on which THREDDS
services you plan on enabling (NCSS, WMS, etc.). For best performance,
the amount of RAM allocated to Java/Tomcat should be greater than your
largest single unit (or aggregated group) of data. We (HYCOM.org) are
typically serving out very large global ocean data files (each of our
daily salinity, temp, uvel, and vvel files is ~2GB). When these files
are aggregated into a single day (which we do), you get a memory
footprint of >8GB (NCSS requires all of these files to be loaded into
RAM). We have a lot of RAM in our data servers (at least 32GB), and we
allocate as much of the system RAM as we possibly can to THREDDS
(nearly all of it).
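
As a rough sketch of the sizing arithmetic above (the 1.5x headroom
factor and the function names are illustrative assumptions on my part,
not a THREDDS or Tomcat rule), you could estimate a minimum heap like
this:

```python
# Rough heap sizing from the file sizes quoted above.
# HEADROOM is a hypothetical safety margin, not an official recommendation.
FILE_SIZE_GB = 2.0                      # approximate size of each daily file
VARIABLES = ["salinity", "temp", "uvel", "vvel"]
HEADROOM = 1.5                          # assumed padding on top of the data


def largest_unit_gb(file_size_gb, n_files):
    """Size of one aggregated day: all per-variable files together."""
    return file_size_gb * n_files


def suggested_heap_gb(unit_gb, headroom=HEADROOM):
    """The heap should exceed the largest unit; pad it by the headroom."""
    return unit_gb * headroom


unit = largest_unit_gb(FILE_SIZE_GB, len(VARIABLES))
heap = suggested_heap_gb(unit)
print(f"largest unit ~{unit:.0f} GB, heap at least ~{heap:.0f} GB")
print(f"e.g. JVM option: -Xmx{int(heap)}g")
```

The resulting value would go into the JVM's maximum heap option (-Xmx)
for Tomcat. Plug in your own file sizes and variable count.
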
> So, is TDS suitable for serving data to many clients ?
> Can you gives me samples of requests rate / bandwidth / hardware / cpu
> charge / memory usage / io bottleneck you have on your servers ?
Yes. We have a single THREDDS server serving dozens of clients.
Outgoing bandwidth is between 150 and 350GB/day. The bottlenecks we've
seen are memory and disk I/O (reading source data quickly enough). You
don't need a lot of cores (like the 64-core Bulldozer); 8~16 very fast
cores work better for THREDDS. Tomcat uses a lot of RAM, and uses it
heavily. Allocate a lot of it and make your THREDDS server a
single-purpose appliance. We've also seen better performance when we
don't put Apache in front of Tomcat as a proxy (the setup the
documentation recommends). We run Tomcat directly on port 80, which
eliminated our Apache+Tomcat request timeouts (Tomcat and Apache have
different timeout values).
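
To put the daily bandwidth figures in more familiar units, here is a
small conversion sketch. It assumes decimal gigabytes (1 GB = 10^9
bytes) and traffic spread evenly over 24 hours, which real traffic is
not, so treat the results as sustained averages rather than peaks. The
function name is just for illustration.

```python
SECONDS_PER_DAY = 86_400


def gb_per_day_to_mbit_s(gb_per_day):
    """Convert GB/day to an average rate in Mbit/s (decimal units assumed)."""
    bits_per_day = gb_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


for gb in (150, 350):
    print(f"{gb} GB/day ~ {gb_per_day_to_mbit_s(gb):.1f} Mbit/s average")
```
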