Hi John, Robert,
Thanks very much for your replies.
From what I understand now, TDS would take advantage of GPFS to
optimize request processing across several users (in different threads
reading files in parallel) if GPFS is configured to work like
Hadoop (using IBM's FPO). For a single user request, performance
would be roughly equivalent (if files are read sequentially by TDS).
We can test this. We'll let you know.
In addition, we'll investigate reading a long list of netCDF files in
parallel in different threads (5, 10, more?) and see. We can do it in
a standalone benchmark or in one of our application servers (oceanotron).
We'll let you know about this as well.
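A minimal sketch of such a standalone benchmark, assuming the files sit on an ordinary POSIX mount and are read through standard Java I/O. It reads raw bytes rather than decoding netCDF, so it only measures file system throughput. The class name ParallelReadBenchmark and its methods are hypothetical, and the thread counts 1, 5, and 10 are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical benchmark: reads a list of files with a fixed number of threads.
public class ParallelReadBenchmark {

    /** Reads one file to the end and returns the number of bytes read. */
    public static long readFully(Path file) throws IOException {
        byte[] buffer = new byte[1 << 20]; // 1 MiB read buffer (arbitrary choice)
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Reads every file on a pool of `threads` workers and returns the total bytes read. */
    public static long readAll(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path f : files) {
                Callable<Long> job = () -> readFully(f);
                results.add(pool.submit(job));
            }
            long total = 0;
            for (Future<Long> r : results) {
                total += r.get(); // blocks until that file has been read
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String a : args) {
            files.add(Path.of(a));
        }
        // Illustrative thread counts; adjust to the cluster being tested.
        for (int threads : new int[] {1, 5, 10}) {
            long start = System.nanoTime();
            long bytes = readAll(files, threads);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d threads: %d bytes in %.3f s%n", threads, bytes, seconds);
        }
    }
}
```

Run it with the file paths as arguments. Repeated passes over the same files may be served from the operating system's page cache, so a file set larger than RAM (or a fresh set per pass) gives a fairer comparison between thread counts.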
Thomas
On 03/01/2016 06:36 PM, Robert Casey wrote:
Hi Thomas and John-
From what I have been able to gather, GPFS is a parallel cluster file
system that behaves according to POSIX standards and looks to an OS just
like any other file mount. You should be able to use all of the same file
I/O commands you already use. I'm not aware of any specialized enhancements.
All of the I/O libraries for GPFS appear to be very low level. It's
optimized for fast parallel reads and writes and parallelizes the
metadata servers to each disk node as well, which is much more capable
than even Parallel NFS. Based on this article, it looks like a good
alternative to HDFS:
http://www.datanami.com/2014/02/18/what_can_gpfs_on_hadoop_do_for_you_/
As they suggest, you can get Hadoop-like behavior on GPFS by using
IBM's File Placement Optimization (FPO), mapping compute cycles to
each of the data nodes in parallel.
-Rob
On Mar 1, 2016, at 8:57 AM, John Caron <jcaron1129@xxxxxxxxx> wrote:
Hi Thomas:
TDS uses standard Java interfaces to the filesystem, so it wouldn't be
taking advantage of anything that needed special commands. Both the
netCDF library and TDS are thread-safe, so they can scale up to a large
number of simultaneous requests, and it seems likely that a clustered
Tomcat environment would work well.
Perhaps by distributing data correctly over data nodes, significant
improvements might be possible. So much depends on access patterns,
so a good way to proceed would be to create a synthetic load (e.g.
script a bunch of requests to the TDS) that mimics what you expect
users to need, and measure performance as you modify your system.
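One way to script such a synthetic load is sketched below, assuming Java 11 or later for java.net.http.HttpClient. The class SyntheticLoad, its method names, and the choice of 10 concurrent users are hypothetical; the request URLs are passed on the command line and should mimic what real users would ask of the TDS.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical load generator: times a list of tasks run by concurrent "users".
public class SyntheticLoad {

    /** Runs each task on a pool of `users` threads; returns latencies in ms, sorted ascending. */
    public static List<Long> run(List<Callable<?>> tasks, int users) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (Callable<?> task : tasks) {
                Callable<Long> timed = () -> {
                    long start = System.nanoTime();
                    task.call();
                    return (System.nanoTime() - start) / 1_000_000;
                };
                pending.add(pool.submit(timed));
            }
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : pending) {
                latencies.add(f.get());
            }
            Collections.sort(latencies);
            return latencies;
        } finally {
            pool.shutdown();
        }
    }

    /** Builds a task that sends a GET to `url`, discards the body, and returns the status code. */
    public static Callable<Integer> get(HttpClient client, String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return () -> client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java SyntheticLoad <url> [<url> ...]");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        List<Callable<?>> tasks = new ArrayList<>();
        for (String url : args) {
            tasks.add(get(client, url));
        }
        List<Long> ms = run(tasks, 10); // 10 concurrent users is an arbitrary starting point
        System.out.printf("requests=%d median=%d ms max=%d ms%n",
                ms.size(), ms.get(ms.size() / 2), ms.get(ms.size() - 1));
    }
}
```

Running the same URL list before and after each system change gives comparable median and maximum latencies.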
I don't know enough about GPFS to know what features could be used to
go beyond what you get from the POSIX API. Anyone else?
John
On Thu, Feb 25, 2016 at 2:27 AM, Thomas LOUBRIEU
<thomas.loubrieu@xxxxxxxxxx> wrote:
Dear all,
In our data center, the new high-performance clustered file
system we're going to use is GPFS (General Parallel File System).
I am wondering whether the netCDF-Java library or the THREDDS Data
Server can benefit from this high-performance file system if the
netCDF files are stored on it.
Are you aware of work being done or systems working with GPFS or
other similar high-performance systems (HDFS, MooseFS,
...)? I am definitely not an expert, and any information regarding
reading netCDF in Java on these clustered file systems (preferably
GPFS) would help us very much.
Thanks,
Thomas
_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/