Siphon 0.5.0 has been released with a few improvements and features:
- Datasets and catalog references can now be retrieved from their collections by position (numeric index) as well as by name.
- Collections of datasets and catalog references now have helper functions that allow extracting a time range or item closest to a time, assuming the entries have appropriately formatted times in the names.
- Datasets gained functions that simplify setting up access over various TDS services.
- A catalog with a latest dataset now has a latest attribute that points directly to this dataset.
Full release notes are available on the GitHub Release page.
Siphon packages are available for Conda on the conda-forge channel, and for pip from the Python Package Index.
Let us know if you run into any problems, either at Siphon's issue tracker, or on the Unidata python-users list.
Specific examples of new APIs
Two of the main improvements to the Siphon API are direct access to the collection of datasets by numeric index and simplified helpers for the different data access services. Previously in Siphon, one might do:
from siphon.catalog import TDSCatalog
from siphon.ncss import NCSS
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p25deg/catalog.xml')
ds = list(cat.datasets.values())[0]
ncss = NCSS(ds.access_urls['NetcdfSubset'])
This becomes:
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p25deg/catalog.xml')
ds = cat.datasets[0]
ncss = ds.subset()
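To give a feel for how a collection can answer to both names and positions, here is a minimal, standard-library-only sketch. It is not Siphon's own code: the class name PositionalDict and the dataset names are made up for illustration, and it assumes Python 3.7+ so that a plain dict preserves insertion order.

```python
class PositionalDict(dict):
    """Mapping that accepts either a name or an integer position (hypothetical)."""

    def __getitem__(self, key):
        # Names are tried first, so ordinary lookups behave as usual.
        if key in self:
            return super().__getitem__(key)
        # Otherwise, treat an int as a position in insertion order.
        if isinstance(key, int):
            return list(self.values())[key]
        raise KeyError(key)


# Hypothetical dataset names, stored in the order a catalog might list them
datasets = PositionalDict()
datasets['run_a.grib2'] = 'first run'
datasets['run_b.grib2'] = 'second run'

print(datasets[0])              # lookup by position
print(datasets['run_b.grib2'])  # lookup by name
```

Trying names first keeps existing name-based code working unchanged; positions only come into play when the key is not a known name.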
For OPeNDAP or CDMRemote access, you can now do:
nc = ds.remote_access()
where nc is a netCDF4-python Dataset object (or similar for CDMRemote).
By default, this uses CDMRemote where available (since it's built into Siphon) and falls back to OPeNDAP; the service can also be selected manually.
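The fallback behavior described above can be pictured roughly as follows. This is only a sketch of the idea, not Siphon's implementation: the helper name pick_service, the 'CdmRemote' service key, and the example.com URLs are assumptions made for illustration.

```python
def pick_service(access_urls, preferred=('CdmRemote', 'OPENDAP')):
    """Return (service, url) for the first preferred service that is offered."""
    for service in preferred:
        if service in access_urls:
            return service, access_urls[service]
    raise ValueError('none of these services are available: '
                     + ', '.join(preferred))


# Hypothetical access URLs for a dataset that does not offer CDMRemote
urls = {
    'OPENDAP': 'http://example.com/thredds/dodsC/data.nc',
    'NetcdfSubset': 'http://example.com/thredds/ncss/data.nc',
}
service, url = pick_service(urls)
print(service, url)

# Manual selection amounts to narrowing the preference list
service_manual, url_manual = pick_service(urls, preferred=('OPENDAP',))
```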
There is also support for getting a file-like object for accessing the raw data using HTTP, or just downloading the file locally:
fobj = ds.remote_open()
# Download locally
ds.download('local/file/path')
Siphon has also simplified access to the automatically resolved latest dataset identified on THREDDS servers. Previously, this involved manually finding the latest within the collection of datasets, or using the helper function as:
from netCDF4 import Dataset
from siphon.catalog import get_latest_access_url
latest_opendap = get_latest_access_url('http://thredds.ucar.edu/thredds/catalog/grib/'
                                       'NCEP/GFS/Global_0p25deg/catalog.xml', 'OPENDAP')
nc = Dataset(latest_opendap)
This now becomes:
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p25deg/catalog.xml')
nc = cat.latest.remote_access()
Siphon has also gained the ability to filter particular datasets from those in the catalog using dates and times. This relies on extracting times from the names using an assumed time format (defaults to YYYYMMDD_HHMM). So now users can do:
from datetime import datetime, timedelta
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p25deg/catalog.xml')
# Find the run closest to 6 hours ago
time = datetime.utcnow() - timedelta(hours=6)
ds = cat.datasets.filter_time_nearest(time)
# Find all runs from the last day
end = datetime.utcnow()
start = end - timedelta(days=1)
datasets = cat.datasets.filter_time_range(start, end)
It is possible to pass a custom regular expression to support other time formats.
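To make the name-based time matching more concrete, here is a small standalone sketch that uses only the standard library. It is an illustration of the idea rather than Siphon's code: the helper names (name_to_time, nearest, in_range), the regular expressions, and the file names are all assumptions. The pattern uses named groups so that a custom expression can cover a different layout.

```python
import re
from datetime import datetime

# Assumed default layout: YYYYMMDD_HHMM somewhere in the name
DEFAULT_REGEX = (r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})_'
                 r'(?P<hour>\d{2})(?P<minute>\d{2})')


def name_to_time(name, regex=DEFAULT_REGEX):
    """Extract a datetime from a name, or return None if nothing matches."""
    match = re.search(regex, name)
    if match is None:
        return None
    parts = {key: int(val) for key, val in match.groupdict().items()
             if val is not None}
    # Hour and minute default to 0 when a custom regex leaves them out
    return datetime(parts['year'], parts['month'], parts['day'],
                    parts.get('hour', 0), parts.get('minute', 0))


def nearest(names, when, regex=DEFAULT_REGEX):
    """Return the name whose embedded time is closest to `when`."""
    timed = [(name_to_time(name, regex), name) for name in names]
    timed = [(time, name) for time, name in timed if time is not None]
    if not timed:
        raise ValueError('no names matched the time pattern')
    return min(timed, key=lambda pair: abs(pair[0] - when))[1]


def in_range(names, start, end, regex=DEFAULT_REGEX):
    """Return the names whose embedded time falls within [start, end]."""
    matches = []
    for name in names:
        time = name_to_time(name, regex)
        if time is not None and start <= time <= end:
            matches.append(name)
    return matches


# Hypothetical dataset names; the last one carries no time and is skipped
names = ['GFS_20170301_0000.grib2', 'GFS_20170301_0600.grib2',
         'GFS_20170301_1200.grib2', 'readme.txt']
closest = nearest(names, datetime(2017, 3, 1, 7, 0))
window = in_range(names, datetime(2017, 3, 1, 5), datetime(2017, 3, 1, 13))

# A custom pattern for names that only carry a date, such as obs-2017-03-01.nc
DATE_ONLY = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
obs_time = name_to_time('obs-2017-03-01.nc', regex=DATE_ONLY)
```

Names that do not match the pattern are silently skipped here, which is one reasonable choice when a catalog mixes timed datasets with other entries.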