Greetings Shane,
Sorry for the delay. I'll answer in-line:
> * Without trying this out at all or doing more than a cursory read of the PR
> (I know, sorry), it looks like the S3 support is limited to single file
> datasets (no aggregation support). This makes sense, but are there any
> existing approaches/best practices for aggregating S3 nc files or other
> RandomAccessFiles, either internal to THREDDS or external (maybe some kind of
> index engine)?
At the moment, yes, only single-file datasets are supported for S3. The plan
is to support our various types of aggregations (NcML, FMRC, GRIB, and a
work-in-progress general Grid aggregation) over multiple files stored in
an object store, much like we do for random access files on disk. I
plan on implementing a simple S3 blob indexer, but that might be tough
to do performantly, at least as a one-size-fits-all solution. I'd be
interested in working to come up with an interface that could support
an external index engine, but the first pass will be based on a
simple list-bucket style call.
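To make the list-bucket idea a bit more concrete, here is a rough sketch (not the actual netCDF-Java code) of what a very simple indexer might do with the keys a listing call returns: keep the keys under a given prefix that end in a given suffix, and sort them so an aggregation sees its members in a stable order. The class and method names are hypothetical, and the listing itself is assumed to come from elsewhere (for example, an S3 ListObjectsV2 call), so the sketch only works on a plain list of key strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: builds a sorted list of aggregation members from
// object keys. The keys are assumed to come from a list-bucket call
// (e.g. S3 ListObjectsV2); fetching them is out of scope here.
public class SimpleKeyIndexer {

    private final String prefix;
    private final String suffix;

    public SimpleKeyIndexer(String prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix;
    }

    /** Returns the keys under the prefix that end with the suffix, sorted lexicographically. */
    public List<String> index(List<String> listedKeys) {
        List<String> members = new ArrayList<>();
        for (String key : listedKeys) {
            if (key.startsWith(prefix) && key.endsWith(suffix)) {
                members.add(key);
            }
        }
        // Lexicographic order is only meaningful for time series when the
        // keys embed zero-padded timestamps.
        Collections.sort(members);
        return members;
    }

    public static void main(String[] args) {
        List<String> listed = List.of(
            "ABI-L1b-RadC/2019/363/21/b.nc",
            "ABI-L1b-RadC/2019/363/21/a.nc",
            "ABI-L1b-RadC/2019/363/21/readme.txt",
            "ABI-L1b-RadC/2019/363/22/c.nc");
        SimpleKeyIndexer indexer = new SimpleKeyIndexer("ABI-L1b-RadC/2019/363/21/", ".nc");
        System.out.println(indexer.index(listed));
    }
}
```

A flat listing like this is exactly the part that gets expensive for large buckets, which is why a pluggable external index engine would be attractive.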
> * Do you expect that this will work with any S3-compatible API (MinIO,
> OpenStack Swift3, etc.), or does it assume some AWS specifics?
At this point we're using the AWS SDK (v2) to manage credentials, HTTP
calls, etc. I've heard of people using the AWS SDK (v1) to call
into other S3-compatible stores, but I do not know the specifics
and do not have access to any of those systems to test myself.
However, it would not take much code to implement a new remote
RandomAccessFile provider like I did for S3: you basically have to
code up the call needed to make a range request by implementing
ReadableRemoteFile:
https://github.com/Unidata/netcdf-java/blob/master/cdm/core/src/main/java/ucar/unidata/io/ReadableRemoteFile.java
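As a rough illustration of what "code up a range request" means, here is a sketch using only the JDK's java.net.http client. The RangeReadable interface and HttpRangeReader class are hypothetical stand-ins; the real ReadableRemoteFile interface may use different method names and signatures, so check the linked source before building on this. The sketch does no authentication or request signing, so it assumes an object that can be fetched with a plain HTTP GET (for example, a public object on a store that honors Range headers).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical interface standing in for ReadableRemoteFile; the real
// interface in netCDF-Java may differ.
interface RangeReadable {
    /** Reads up to len bytes starting at pos into buff[offset..], returning the count read. */
    int readRange(long pos, byte[] buff, int offset, int len) throws IOException;
}

// Sketch of a provider for any store that honors HTTP Range requests.
// No credentials or request signing are handled here.
public class HttpRangeReader implements RangeReadable {

    private final HttpClient client = HttpClient.newHttpClient();
    private final URI objectUri;

    public HttpRangeReader(URI objectUri) {
        this.objectUri = objectUri;
    }

    /** Builds an inclusive HTTP byte range, e.g. pos=0, len=4 gives "bytes=0-3". */
    static String rangeHeader(long pos, int len) {
        if (pos < 0 || len <= 0) {
            throw new IllegalArgumentException("pos must be >= 0 and len > 0");
        }
        return "bytes=" + pos + "-" + (pos + len - 1);
    }

    @Override
    public int readRange(long pos, byte[] buff, int offset, int len) throws IOException {
        HttpRequest request = HttpRequest.newBuilder(objectUri)
            .header("Range", rangeHeader(pos, len))
            .GET()
            .build();
        HttpResponse<byte[]> response;
        try {
            response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("range request interrupted", e);
        }
        // 206 Partial Content is the expected reply to a satisfiable range
        // request; a 200 would mean the server ignored the Range header.
        if (response.statusCode() != 206) {
            throw new IOException("expected 206, got " + response.statusCode());
        }
        byte[] body = response.body();
        int n = Math.min(body.length, len);
        System.arraycopy(body, 0, buff, offset, n);
        return n;
    }
}
```

For S3 itself, requests generally have to be signed, which is one of the things the AWS SDK takes care of for us; a provider for another store would swap that part in.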
It's all pretty fresh code, but I'd be happy to talk details if you're
interested.
Cheers,
Sean
> Thanks!
> Shane St Savage
> Axiom Data Science
>
> On Wed, Mar 4, 2020 at 10:56 AM David Neufeld via thredds
> <thredds@xxxxxxxxxxxxxxxx> wrote:
>>
>> Hi Ethan,
>>
>> That's great news!
>>
>> By way of follow-up, can you share any forward-looking plans the team may
>> have related to Zarr support in the future? Is this part of a roadmap for
>> THREDDS, or is the expectation that netCDF developers can leverage the S3
>> support and then write to Zarr as part of their own workflow?
>>
>> Thanks,
>> Dave
>>
>>
>> On Wed, Mar 4, 2020 at 9:53 AM Ethan Davis <edavis@xxxxxxxx> wrote:
>>>
>>> Hi Joe,
>>>
>>> [Sorry for the delayed response.]
>>>
>>> The S3 work moved to the Unidata/netCDF-java repo in PR #173 ("S3
>>> Support"). This PR got merged into master a week or so ago and is available
>>> in the netCDF-Java 5.3.0-SNAPSHOT release (and will be in the upcoming
>>> 5.3.0 release). The latest TDS code built with netCDF-Java 5.3.0-SNAPSHOT
>>> can be configured to serve an individual netCDF file stored as an S3 object
>>> using a datasetRoot configuration, e.g.
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <catalog name="Test TDS S3"
>>>     xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
>>>     xmlns:xlink="http://www.w3.org/1999/xlink"
>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>     xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
>>>         https://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.6.xsd">
>>>
>>>   <datasetRoot path="s3-test" location="s3://noaa-goes16" />
>>>
>>>   <dataset name="Test GOES-16 S3" ID="testS3Grid"
>>>       urlPath="s3-test/ABI-L1b-RadC/2019/363/21/OR_ABI-L1b-RadC-M6C16_G16_s20193632101189_e20193632103574_c20193632104070.nc"
>>>       dataType="Grid"/>
>>>
>>> </catalog>
>>>
>>> In this case, the datasetRoot location is the bucket name, and the urlPath
>>> is the datasetRoot path combined with the object key. We rely on the AWS
>>> Java SDK (v2) to handle credentials, region selection, etc. For now, you
>>> can set the region by creating a credentials file, ~/.aws/credentials,
>>> that looks like:
>>>
>>>
>>> [default]
>>> region=us-east-1
>>>
>>>
>>> This is how netCDF-Java knows which region to use for bucket access. We
>>> may look at other mechanisms to integrate that more closely with the TDS
>>> configuration, but for now that should work.
>>>
>>>
>>> Once netCDF-Java 5.3.0 is released, TDS snapshot builds will include
>>> this capability. Until then, you would need to build the TDS yourself and
>>> explicitly tell it to build against netCDF-Java 5.3.0-SNAPSHOT.
>>>
>>> Cheers,
>>>
>>> Ethan
>>>
>>> On Tue, Feb 4, 2020 at 2:30 PM H. Joe Lee <hyoklee@xxxxxxxxxxxx> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Is it possible to serve netCDF data on AWS S3 using THREDDS?
>>>> It seems possible, based on the S3 feature branch [1].
>>>>
>>>> If so, can someone share an example THREDDS catalog configuration?
>>>>
>>>> Regards,
>>>>
>>>> [1] https://github.com/Unidata/thredds/tree/feature/s3+hdfs
>>>>
>>>>
>>>> _______________________________________________
>>>> NOTE: All exchanges posted to Unidata maintained email lists are
>>>> recorded in the Unidata inquiry tracking system and made publicly
>>>> available through the web. Users who post to any of the lists we
>>>> maintain are reminded to remove any personal information that they
>>>> do not want to be made public.
>>>>
>>>>
>>>> thredds mailing list
>>>> thredds@xxxxxxxxxxxxxxxx
>>>> For list information or to unsubscribe, visit:
>>>> https://www.unidata.ucar.edu/mailing_lists/
>>>
>>
>