Hi Antonio,
Thank you for sharing! Would you mind also sharing the configuration (CPU, RAM,
etc.) of the servers that run TDS?
Guan
----- Original Message -----
From: "Antonio S. Cofiño" <cofinoa@xxxxxxxxx>
To: thredds@xxxxxxxxxxxxxxxx
Cc: "y kudo" <y_kudo@xxxxxxxxxxxx>
Sent: Friday, February 26, 2016 8:52:33 AM
Subject: Re: [thredds] TDS as a big data platform
Yoshi,
Yoshi,
Below is my experience with TDS (v4.3 and v4.6).
On 19/02/2016 at 7:26, Yoshiyuki Kudo wrote:
> Hi,
>
> I am in a project where a group of EO data researchers will use a set of data
> access services in an attempt to create new data products out of the wealth
> of the data pool. The data will be EO data (coverage data) in netCDF, a few
> GB per data granule, amounting to over 120TB and about 300,000 data
> files in total (one year's worth of collection).
>
> I feel TDS or Hyrax could be a good candidate for this platform, but I would
> like to hear your advice before we estimate the work further and purchase
> hardware. I very much appreciate your expertise on this.
>
> 1) I have seen some past threads about how aggregating large volumes of
> data can be problematic. I will need to consider aggregation as well:
> is a 100TB+ aggregation possible, both technically and
> performance-wise?
We run an operational service that aggregates collections of datasets.
One of the aggregations consists of 135k files in GRIB1 format, totalling
13TB of data. Another collection has 300k+ files but is only 8TB in size.
Each of these collections is aggregated into a single NetCDF dataset
using one NcML file. A 100TB+ aggregation should be possible, but the
limiting factor will be performance, because of the number of files.
>
> 2) Are there any hardware restrictions for the TDS setup I should keep in
> mind before preparing the hardware? Do I need a single disk drive
> (partition) to manage the 100+TB of data in TDS?
No, you don't need a single partition. In our case, though, we have
400TB of disk on ZFS (OpenIndiana): a pool of 150 desktop HDDs arranged
as raidz2 vdevs (10+2 disks each). For the TDS services we use a
load-balanced configuration, with TDS instances running in a cluster.
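To make the 10+2 raidz2 layout concrete, here is a small Python sketch of the capacity arithmetic. It is only an illustration under stated assumptions: every vdev is assumed to be 12 disks wide (10 data + 2 parity), the 4 TB disk size is hypothetical (the message gives none) and is not meant to reproduce the 400TB figure, and ZFS metadata, padding and reserved space are ignored, so real usable space will be lower.

```python
from dataclasses import dataclass


@dataclass
class RaidzLayout:
    vdevs: int            # number of full-width raidz2 vdevs
    disks_used: int       # disks placed in those vdevs
    disks_left_over: int  # disks that don't fill another vdev (e.g. spares)
    raw_tb: float         # raw capacity of the disks in vdevs
    usable_tb: float      # data capacity before any ZFS overhead


def raidz_layout(total_disks: int, disk_tb: float,
                 data_disks: int = 10, parity_disks: int = 2) -> RaidzLayout:
    """Split total_disks into vdevs of (data_disks + parity_disks) disks each."""
    width = data_disks + parity_disks
    vdevs = total_disks // width
    disks_used = vdevs * width
    return RaidzLayout(
        vdevs=vdevs,
        disks_used=disks_used,
        disks_left_over=total_disks - disks_used,
        raw_tb=disks_used * disk_tb,
        # Each raidz2 vdev spends two disks' worth of space on parity,
        # so only data_disks per vdev hold data.
        usable_tb=vdevs * data_disks * disk_tb,
    )


if __name__ == "__main__":
    # 4 TB per disk is a hypothetical size, chosen only for illustration.
    print(raidz_layout(150, disk_tb=4.0))
    # RaidzLayout(vdevs=12, disks_used=144, disks_left_over=6, raw_tb=576.0, usable_tb=480.0)
```

With 150 disks, 12 full vdevs use 144 of them and 6 are left over; what those remaining disks are used for is not stated here. Whatever the disk size, a 10+2 vdev stores data on 10/12 (about 83%) of its raw space.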
>
> 3) Could you share any success stories you know of about handling large
> volumes of data in TDS?
https://rd-alliance.org/sites/default/files/attachment/20150924_Day2_1330_End-userGatewayForClimateServicesAndDataInitiatives_Cofino.pdf
>
> 4) Any other recommendations or things I need to keep in mind?
At the beginning we considered dynamic aggregation based on the
directory-scan facilities provided by TDS, but in the end it didn't
perform well, so what we do now is generate static NcML aggregations.
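As an illustration of what generating a static NcML aggregation can look like, here is a minimal Python sketch. It is not the script used in the service described above. It assumes a `joinExisting` aggregation along a dimension named `time`, one `<netcdf location=...>` entry per file, and that sorting the paths lexicographically puts the files in time order. The function names, the `.nc` suffix and the paths are hypothetical. The optional `ncoords` attribute (the number of time steps in each file) is, as we understand the NcML aggregation docs, a hint that lets TDS avoid opening every file to count coordinates, so it is only safe when every file really has that many steps.

```python
import os
import xml.etree.ElementTree as ET
from typing import Iterable, List, Optional

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def find_files(root_dir: str, suffix: str = ".nc") -> List[str]:
    """Walk root_dir once and collect every file ending in suffix (hypothetical layout)."""
    return [os.path.join(d, f)
            for d, _, files in os.walk(root_dir)
            for f in files if f.endswith(suffix)]


def build_static_ncml(paths: Iterable[str], dim_name: str = "time",
                      ncoords: Optional[int] = None) -> str:
    """Return an NcML document that joins the given files along dim_name."""
    # Setting xmlns as a plain attribute makes it the default namespace
    # of the serialized document.
    root = ET.Element("netcdf", {"xmlns": NCML_NS})
    agg = ET.SubElement(root, "aggregation",
                        {"dimName": dim_name, "type": "joinExisting"})
    # Assumption: lexicographic path order equals time order.
    for path in sorted(paths):
        attrs = {"location": path}
        if ncoords is not None:
            attrs["ncoords"] = str(ncoords)
        ET.SubElement(agg, "netcdf", attrs)
    return ET.tostring(root, encoding="unicode")


def write_ncml(root_dir: str, out_path: str, ncoords: Optional[int] = None) -> None:
    """Scan root_dir once, offline, and write the static aggregation to out_path."""
    xml_text = build_static_ncml(find_files(root_dir), ncoords=ncoords)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(xml_text)
```

The point of the design is that the expensive directory scan happens once, offline (for example from a cron job when new files arrive), rather than inside TDS. TDS then only reads the resulting NcML file, which can be referenced from the catalog.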
>
> Thank you so much for your support.
Please feel free to ask.
Regards
Antonio
--
Antonio S. Cofiño
Grupo de Meteorología de Santander
Dep. de Matemática Aplicada y
Ciencias de la Computación
Universidad de Cantabria
http://www.meteo.unican.es
> Yoshi
>
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/