Re: [thredds] THREDDS Data Server serving from Amazon S3

To throw my two cents in here, I’m going to reiterate Roy’s comment that S3 
virtual file systems are SLOW. 

It’s an object store, so anything that goes in must come out whole to be 
accessed. Reading just the coordinate variables of a file, or just getting 
the attributes, would mean moving the whole file to your local EBS volume or 
at least into memory. While there may be a model that would work with S3, it 
would require the data to be broken down into individual objects that match 
typical dataset access patterns. 
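
As a rough illustration of the "individual objects" model, here is a minimal 
sketch, not anything TDS does today. It assumes a stand-in store that only 
supports whole-object get/put (mirroring the premise above), and every name in 
it (InMemoryObjectStore, upload_chunked, ChunkedReader, CHUNK_SIZE) is 
hypothetical. The point is only that a seek()/read() style reader can touch 
just the objects covering the requested bytes, rather than the whole dataset.

```python
CHUNK_SIZE = 4  # hypothetical; real chunks would be far larger


class InMemoryObjectStore:
    """Stand-in for an object store: whole-object get/put only (assumption)."""

    def __init__(self):
        self._objects = {}
        self.get_count = 0  # counts whole-object fetches, for illustration

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        self.get_count += 1
        return self._objects[key]


def upload_chunked(store, name, data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size objects keyed '<name>/<index>'."""
    for index, start in enumerate(range(0, len(data), chunk_size)):
        store.put(f"{name}/{index}", data[start:start + chunk_size])
    return len(data)  # total size, needed later by the reader


class ChunkedReader:
    """File-like seek()/read() that fetches only the chunks a read touches."""

    def __init__(self, store, name, size, chunk_size=CHUNK_SIZE):
        self.store = store
        self.name = name
        self.size = size
        self.chunk_size = chunk_size
        self.pos = 0

    def seek(self, offset):
        # Clamp the position to the valid byte range of the dataset.
        self.pos = max(0, min(offset, self.size))
        return self.pos

    def read(self, length):
        end = min(self.pos + length, self.size)
        out = bytearray()
        while self.pos < end:
            # Work out which chunk holds the current byte and where in it.
            index, within = divmod(self.pos, self.chunk_size)
            chunk = self.store.get(f"{self.name}/{index}")
            take = min(end - self.pos, len(chunk) - within)
            out += chunk[within:within + take]
            self.pos += take
        return bytes(out)


if __name__ == "__main__":
    store = InMemoryObjectStore()
    payload = bytes(range(20))
    size = upload_chunked(store, "demo", payload)
    reader = ChunkedReader(store, "demo", size)
    reader.seek(5)
    print(reader.read(6), "fetched objects:", store.get_count)
```

In practice the chunk boundaries would have to follow the dataset layout 
(header and coordinate variables in their own objects, for example), which is 
exactly the part that depends on the access pattern.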

- Dave

On Jul 14, 2015, at 2:21 PM, Roy Mendelssohn - NOAA Federal 
<roy.mendelssohn@xxxxxxxx> wrote:

> Look at s3fs (Bob Simons alerted me to this).  It makes a virtual file system 
> for S3.  It is also S…L…O…W.
> 
> -Roy
>> On Jul 14, 2015, at 1:00 PM, Jeff McWhirter <jeff.mcwhirter@xxxxxxxxx> wrote:
>> 
>> 
>> Glacier could be used for storage of all that data that you need to keep 
>> around but rarely if ever access, e.g., level-0 instrument output, raw 
>> model output, etc. If your usage model supports this type of latency then 
>> the cost savings (1/10th) are significant.
>> 
>> This is where hiding the storage semantics behind a file system breaks down. 
>> The application can't be agnostic of the underlying storage, as it needs to 
>> support delays in staging data, communicating to the end-user, caching, etc.
>> 
>> -Jeff
>> 
>> 
>> 
>> On Tue, Jul 14, 2015 at 1:35 PM, Robert Casey <rob@xxxxxxxxxxxxxxxxxxx> 
>> wrote:
>> 
>>      Hi Jeff-
>> 
>>      Of note, Amazon Glacier is meant for infrequently needed data, so a 
>> call-up for data from that source will require something on the order of a 
>> 5-hour wait to retrieve to S3.  I think they are developing a near-line 
>> storage solution that is a bit more expensive to compete with Google's new 
>> near-line storage, which provides retrieval times on the order of seconds.
>> 
>>      -Rob
>> 
>>> On Jul 14, 2015, at 10:10 AM, Jeff McWhirter <jeff.mcwhirter@xxxxxxxxx> 
>>> wrote:
>>> 
>>> On this note -
>>> What I really want is a file system that can transparently manage data 
>>> between primary (SSD), secondary (S3) and tertiary (Amazon Glacier) 
>>> stores.  Actively used data would migrate into primary storage. Old 
>>> archived data moves off into cheaper tertiary storage. I've thought of 
>>> implementing this at the application level in RAMADDA, but a 
>>> file-system-based approach would be much smarter.
>>> 
>>> How do the archive folks on this list manage these kinds of storage 
>>> environments?
>>> 
>>> -Jeff
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Jul 14, 2015 at 10:44 AM, John Caron <caron@xxxxxxxx> wrote:
>>> Hi David:
>>> 
>>> At the bottom of the TDM, we rely on RandomAccessFile. Do you know if S3 
>>> supports that abstraction (essentially POSIX file semantics, e.g. seek(), 
>>> read())? My guess is that S3 only allows complete file transfers (?)
>>> 
>>> Would be worth investigating if anyone has implemented a Java 
>>> FileSystemProvider for S3.
>>> 
>>> Will have a closer look when I get time.
>>> 
>>> John
>>> 
>>> On Mon, Jul 13, 2015 at 7:59 PM, David Nahodil <David.Nahodil@xxxxxxxxxxx> 
>>> wrote:
>>> Hi all,
>>> 
>>> 
>>> We are looking at moving our THREDDS Data Server to Amazon EC2 instances 
>>> with the data hosted on S3. I'm just wondering if anyone has tried using 
>>> TDS with data hosted on S3?
>>> 
>>> 
>>> I had a quick back-and-forth with Sean at Unidata (see below) about this.
>>> 
>>> 
>>> Regards,
>>> 
>>> 
>>> David
>>> 
>>> 
>>>>> Unfortunately, I do not know of anyone who has done this, although we 
>>>>> have had at least one other person ask. From what I understand, there is 
>>>>> a way to mount an S3 storage as a virtual file system, in which case I 
>>>>> would *think* that the TDS would work as it normally does (depending on 
>>>>> the kind of data you have).
>>> 
>>>> We have considered mounting the S3 storage as a filesystem and running it 
>>>> like that. However, our feeling was that the tools were not really 
>>>> production ready and that we're really misrepresenting S3 by pretending it 
>>>> is a file system. So this is why we're investigating if anyone has used 
>>>> TDS with the S3 API directly.
>>> 
>>>>> What kind of data do you have? Will your TDS also be in the cloud? Do you 
>>>>> plan on serving the data inside of Amazon to other EC2 instances, or do 
>>>>> you plan on crossing the cloud/commodity web boundary with the data, in 
>>>>> which case that could get very expensive quite quickly?
>>> 
>>>> We have about 2 terabytes of marine and climate data that we are currently 
>>>> serving from our existing infrastructure. The plan is to move the 
>>>> infrastructure to Amazon Web Services so TDS would be hosted on EC2 
>>>> machines and the data on S3. We're hoping this setup should work okay, but 
>>>> we might still have a hurdle or two to come. :)
>>> 
>>>> We have someone here who once wrote a plugin/adapter for TDS to work with 
>>>> an obscure filesystem that our data used to be stored on. So we have a 
>>>> little experience in what might be involved in doing the same with S3. We 
>>>> just wanted to make sure that if anyone had done some work already that we 
>>>> made use of that.
>>> 
>>>>> We very, very recently (as in a day ago) got some Amazon resources to 
>>>>> play around on, but we won't have a chance to kick those tires until 
>>>>> after our training workshops at the end of the month.
>>> 
>>> 
>>> _______________________________________________
>>> thredds mailing list
>>> thredds@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe,  visit: 
>>> http://www.unidata.ucar.edu/mailing_lists/ 
> 
> **********************
> "The contents of this message do not reflect any position of the U.S. 
> Government or NOAA."
> **********************
> Roy Mendelssohn
> Supervisory Operations Research Analyst
> NOAA/NMFS
> Environmental Research Division
> Southwest Fisheries Science Center
> ***Note new address and phone***
> 110 Shaffer Road
> Santa Cruz, CA 95060
> Phone: (831)-420-3666
> Fax: (831) 420-3980
> e-mail: Roy.Mendelssohn@xxxxxxxx www: http://www.pfeg.noaa.gov/
> 
> "Old age and treachery will overcome youth and skill."
> "From those who have been given much, much will be expected" 
> "the arc of the moral universe is long, but it bends toward justice" -MLK Jr.


