On Sat, Oct 8, 2016 at 3:22 AM, ashwinD12 . <winash12@xxxxxxxxx> wrote:
> Also, since many of us may be downloading many sets of variables over many
> levels and times, perhaps a couple of examples could focus on whether it would
> be worthwhile to send these OPeNDAP requests using multi-threaded Python.
> The script could check whether there are multiple processors and then
> retrieve data using different threads.
>
> As an example, I download six variables from the NCEP Reanalysis 2 data every
> day: specific humidity, temperature, u and v velocities, geopotential height,
> and surface pressure. That is 86 calls (17 levels * 5 upper-air variables, plus
> surface pressure). So if I have more than one processor, I can send them off as
> batches to different processors.
>
Ashwin,
Thanks for the suggestion!
In this case, though, I'm not sure I really want to be promoting parallel
requests. Downloading the data is limited not by your CPU, but by the
available bandwidth of your computer, the network itself, and the remote
server. Also, multiple requests only help hide latency. This
is good for hitting a web page, but for data of any significant size (MBs),
the time to fill the request is dominated by transfer time and parallel
requests won't improve things greatly. In the case of THREDDS servers like
the ones Unidata runs, there are also lots of people hitting the server; teaching
people to make a bunch of requests at once will only serve to increase the
load on the server. It's not really friendly to access a shared resource in
that manner.
For the specific case of downloading data for multiple variables from a
TDS, the netCDF Subset Service (NCSS) is a good bet for this use case. This
service will assemble the data for the variables you request (for your
desired time/space domains) into a single netCDF file and return
that, reducing the many separate requests down to one.
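As a rough sketch of what that looks like from Python (using the requests
library; the server path and variable names below are just placeholders, not
a real dataset), a single NCSS call can pull several variables at once:

    from datetime import datetime

    import requests

    # Hypothetical NCSS grid endpoint on a TDS; substitute the real dataset
    # path from the server's catalog page.
    NCSS_URL = ('https://thredds.example.edu/thredds/ncss/'
                'grib/NCEP/GFS/Global_0p5deg/Best')

    # One request covering all variables over the desired space/time domain.
    # Repeated 'var' entries ask for multiple variables in the same call.
    params = [
        ('var', 'Temperature_isobaric'),           # placeholder variable names
        ('var', 'Geopotential_height_isobaric'),
        ('var', 'Relative_humidity_isobaric'),
        ('north', 50), ('south', 20), ('west', 60), ('east', 100),
        ('time', datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')),
        ('accept', 'netcdf'),                      # ask for a netCDF file back
    ]

    resp = requests.get(NCSS_URL, params=params)
    resp.raise_for_status()

    # Save the returned netCDF file for use with netCDF4, xarray, etc.
    with open('subset.nc', 'wb') as outfile:
        outfile.write(resp.content)

(Unidata's Siphon package also provides a Python wrapper around NCSS, if
you'd rather not build the query by hand.)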
Ryan
--
Ryan May, Ph.D.
Software Engineer
UCAR/Unidata
Boulder, CO