Re: [netcdf-java] [thredds] improved performances through GPFS

        Hi Thomas and John-

        From what I have been able to gather, GPFS is a parallel cluster file 
system that behaves according to POSIX standards and looks to an OS just like 
any other file mount.  You should be able to use all of the same file I/O 
commands you already use.  I am not aware of any specialized enhancements; all 
of the I/O libraries for GPFS appear to be very low level.  It is optimized 
for fast parallel reads and writes and also distributes metadata serving 
across the disk nodes, which is much more capable than even Parallel NFS.  
Based on this article, it looks like a good alternative to HDFS.

        http://www.datanami.com/2014/02/18/what_can_gpfs_on_hadoop_do_for_you_/ 

        As they suggest, you can get Hadoop-like behavior on GPFS by using 
IBM's File Placement Optimizer (FPO), mapping compute cycles to each of the 
data nodes in parallel.

        -Rob
 

> On Mar 1, 2016, at 8:57 AM, John Caron <jcaron1129@xxxxxxxxx> wrote:
> 
> Hi Thomas:
> 
> TDS uses standard Java interfaces to the filesystem, so it wouldn't be taking 
> advantage of anything that needed special commands. Both the netcdf library 
> and TDS are thread-safe, so they can scale up to a large number of 
> simultaneous requests, and it seems likely that a clustered Tomcat 
> environment would work well.
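
To make the "standard Java interfaces" point concrete, here is a minimal sketch of reading a file header through the ordinary java.nio API. Nothing in it is GPFS-specific, and the same call works against any POSIX mount. It is not TDS or netcdf-java code; the class name, the method name, and the /gpfs/... path mentioned afterwards are assumptions made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PosixReadSketch {

    // Returns true if the first four bytes look like a netCDF file:
    // "CDF" plus a version byte (classic formats), or the start of the
    // HDF5 signature 0x89 'H' 'D' 'F' (netCDF-4 files).
    public static boolean looksLikeNetcdf(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer head = ByteBuffer.allocate(4);
            // Keep reading until the buffer is full or end of file is reached.
            while (head.hasRemaining() && ch.read(head) >= 0) {
                // loop body intentionally empty
            }
            if (head.position() < 4) {
                return false; // file too short to carry a signature
            }
            byte b0 = head.get(0), b1 = head.get(1), b2 = head.get(2), b3 = head.get(3);
            boolean classic = b0 == 'C' && b1 == 'D' && b2 == 'F';
            boolean hdf5 = (b0 & 0xFF) == 0x89 && b1 == 'H' && b2 == 'D' && b3 == 'F';
            return classic || hdf5;
        }
    }
}
```

A call such as PosixReadSketch.looksLikeNetcdf(Path.of("/gpfs/data/example.nc")) would behave the same way on a local disk, which is why no special commands are needed to read from the mount.
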
> 
> Perhaps by distributing data correctly over data nodes, significant 
> improvements might be possible. So much depends on access patterns, so a good 
> way to proceed would be to create a synthetic load (e.g., script a bunch of 
> requests to the TDS) that mimics what you expect users to need, and measure 
> performance as you modify your system.
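
The synthetic load described above could be sketched roughly as follows, assuming Java 11 or later for java.net.http. The class name, the thread count, and the example URL (including its dataset path) are hypothetical; substitute requests that mimic your own users. It fires each URL once from a fixed thread pool and reports latency percentiles, which you can compare before and after each change to the system.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TdsLoadSketch {

    // Nearest-rank percentile of the given latencies (milliseconds).
    public static long percentile(long[] values, double pct) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    // Sends one GET per URL from a pool of `threads` workers and returns
    // the time each request took, in milliseconds, in submission order.
    public static long[] run(List<String> urls, int threads) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<Long> task = () -> {
                    HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
                    long start = System.nanoTime();
                    // The response body is read and thrown away; only timing matters here.
                    client.send(req, HttpResponse.BodyHandlers.discarding());
                    return (System.nanoTime() - start) / 1_000_000;
                };
                pending.add(pool.submit(task));
            }
            long[] millis = new long[pending.size()];
            for (int i = 0; i < millis.length; i++) {
                millis[i] = pending.get(i).get();
            }
            return millis;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical request: replace with URLs that reflect real usage.
        String url = "http://localhost:8080/thredds/dodsC/example/file.nc.dds";
        long[] ms = run(Collections.nCopies(200, url), 16);
        System.out.println("median ms: " + percentile(ms, 50));
        System.out.println("p95 ms:    " + percentile(ms, 95));
    }
}
```

Repeating the same URL mostly exercises caches, so a realistic run would mix datasets, variables, and subset sizes in the proportions you expect from users.
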
> 
> I don't know enough about GPFS to know what features could be used to go 
> beyond what you get from the POSIX API. Anyone else?
> 
> John
> 
> On Thu, Feb 25, 2016 at 2:27 AM, Thomas LOUBRIEU 
> <thomas.loubrieu@xxxxxxxxxx> wrote:
> Dear all,
> 
> In our data center, the new high-performance clustered file system we're 
> going to use is GPFS (General Parallel File System). I am wondering whether 
> the netcdf-java library or the THREDDS Data Server can benefit from this 
> high-performance file system if the netCDF files are stored on it.
> 
> Are you aware of any work being done, or of systems running, with GPFS or 
> with similar high-performance file systems (HDFS, MooseFS, ...)? I am 
> definitely not an expert, and any information about reading netCDF in Java 
> on these clustered file systems (preferably GPFS) would help us very much.
> 
> Thanks,
> 
> Thomas
> 
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit: 
> http://www.unidata.ucar.edu/mailing_lists/
