Re: [thredds] Fwd: Fwd: [uaf_tech] Re: time start / end

My answers are interspersed below...

On 6/15/2010 4:29 PM, John Caron wrote:
On 6/15/2010 3:04 PM, John Caron wrote:
This is a message from Bob Simons, the developer of ERDDAP, that I
think is of general interest, in case others want to comment also.

-------- Original Message --------
Subject:        Fwd: [uaf_tech] Re: time start / end
Date:   Fri, 11 Jun 2010 09:58:50 -0700
From:   Bob Simons <Bob.Simons@xxxxxxxx>
Organization:   NOAA/ERD
To:     John Caron <caron@xxxxxxxxxxxxxxxx>



In case you aren't on the uaf_tech mail list, this is my pitch for
adding a subscription service to THREDDS. (I understand you are very
busy and this is likely low priority.)


We have talked about a notification service. We made a proposal to NSF
that's related, but we haven't heard whether it will be funded. If so, we
will at least consider the possibility of this feature.

There are some tricky parts. For example, we have an IDD metar dataset
that is appended to several times per second. In that case you really want
a message service whose socket stays open. For very small data, you
might as well send the data instead of a notification.

I'm wondering what kinds of datasets you tried, their frequency, message
size, etc. What would your canonical "use case" be?

Most Common Use -
Most of the THREDDS-like datasets I have worked with are updated roughly every hour, every day, every month, or very infrequently. For these, and even for datasets updated as often as every 10 minutes, a subscription system is efficient, useful, and works well. For these datasets, metadata with an exact time
<attribute name="time_coverage_end" value="2010-06-16T08:00:00"/>
seems appropriate and not burdensome for the server to maintain since the server knows what data it has.
ERDDAP's system supports:
* Pinging a client-specified remote URL (without reading the reply); the ping itself is the notification, so there is no separate message.
* Sending an email. The message size is ~400 - 1000 bytes.
But clearly this subscription system would become troublesome if the dataset updated more frequently and/or if there were a huge number of datasets (e.g., one for each of 10,000 sensors).
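
(As a rough sketch, registering one of these subscriptions is just a web request to ERDDAP's add-subscription form; the parameter names here are illustrative assumptions, not a documented API:
   http://coastwatch.pfeg.noaa.gov/erddap/subscriptions/add.html?datasetID=someDatasetID&email=someone@example.com )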

This approach lets clients find out about dataset changes quickly and efficiently and vastly reduces the load on the server from clients that are constantly checking if a dataset has been updated.


Frequently Updated Datasets -
The datasets I have seen that update frequently (as often as every 6 seconds) have tended to be sensor readings which produce a scalar value (or a few values) at each time point. OPeNDAP hyperslab requests ([start:stop:stride]) seem ill-suited to this type of data: if a client requests the time-axis values to see what times are available, the time axis is likely to have changed by the time the client sends its follow-up request for data at a specific time. It sort of works, but it is awkward.
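For example, a client might request the time axis, see 10,000 values, and then request data for index 9999; by the time that second request arrives, the axis may have grown to 10,003 values, so index 9999 no longer refers to the latest reading. (The numbers are just for illustration.)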

For these frequently updated datasets, the data is often stored in a database or some other easily updated data structure. And the more suitable way to request data seems to be OPeNDAP constraint expressions: the client can just ask for any recent data (e.g., &time>="2010-06-16T00:00:00Z") and doesn't need to know the exact end time. It would also be nice if there were an option to request data for the last available time (e.g., &time="last").
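As a sketch, such requests might look like the following (the server, dataset ID, and variable names are invented, and "last" is the proposed option, not something that exists today):
   http://some.server/erddap/tabledap/sensor123.csv?time,temperature&time>="2010-06-16T00:00:00Z"
   http://some.server/erddap/tabledap/sensor123.csv?time,temperature&time="last"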
For these datasets, it seems like the metadata
<attribute name="time_coverage_end" value="present"/>
<attribute name="time_coverage_frequency" value="PT6S"/>
(stating the intent) seem more appropriate than a specific time that changes very frequently. If the sensor stops generating data, I admit that "present" can be misleading, and at some point it should be replaced by the time of the last datum. By using "present" for frequently updated datasets (and for datasets where the client doesn't need to know the exact end time in order to make the data request), we can avoid frequent use of the subscription system.

Data Push -
I agree that, for datasets updated frequently with small amounts of data, the subscription service might as well transmit the new data. Perhaps a subscription service shouldn't be used for very frequently updated datasets or for pushing data; those datasets should fall into the realm of a data-push system like the IDD. Given the huge number of sensors (now and coming), I think someone will propose a system to deal with this (if they haven't already). But I think it would be good if the system didn't simplistically transmit all data for all sensors to all clients. For many use cases, it makes more sense for the client to request the desired data as needed, less frequently. Yes, a few people really need a continuously updated graph (while they are actively watching it), but I suspect most just need to see an up-to-date graph a few times a day. Given the potentially *huge* effort required to push sensor data to clients, I hope it is done only when it is really needed and when there aren't better alternatives.



P.S. I also wonder if there is another solution.  Why does THREDDS need
metadata from the catalog.xml files to determine the dataset's start and
end time? Doesn't it have this information from when it does the
dataset's aggregation?

It has the info when the data is a "feature type" like a Grid. In the
general case, it doesn't know how to extract the coordinate info from a
random CDM file.

For that matter, it seems like a lot of the metadata we put in the
catalog.xml files could be gathered by THREDDS from the dataset's
metadata and data.  Could THREDDS be modified to use the dataset's
metadata and data so we wouldn't have to duplicate the information in
the catalog.xml files?


Yes, we are working on that (albeit slowly). The FMRC adds this kind of
metadata, and we are trying to do point datasets (now called discrete
sampled datasets in CF) next.

Again, I don't think grids and hyperslab requests are ideal for these types of datasets. Tables and constraint requests seem more suitable.



Thank you.

-------- Original Message --------
Subject: [uaf_tech] Re: time start / end
Date: Thu, 10 Jun 2010 10:42:03 -0700
From: Bob Simons <Bob.Simons@xxxxxxxx>
Organization: NOAA/ERD
To: _OAR PMEL UAF Tech List <uaf_tech@xxxxxxxx>

My $0.02:

The Ideal - The ideal situation is to have Start and End have specific
dates and times, e.g.,
  Start: 2010-06-03 12:00:00Z
  End: 2010-06-10 12:00:00Z
and to have this always perfectly up-to-date.

Statement of Intent - Something like
   Start: present - 7 days
   End: present
is pretty good as is, because it is a statement of intent.

Not Really Right - If it gets translated to some instantaneous values, e.g.,
  Start: 2010-06-03 12:04:57Z
  End: 2010-06-10 12:04:57Z
then it is less desirable. It implies accuracy and precision, but isn't
correct (e.g., perhaps the dataset is just updated daily, sometime each
morning).

Polling - Having a downstream server (e.g., RAMADDA or ERDDAP) frequently
check with the TDS to find out the actual Start and End times isn't ideal.
The extreme cases are:
* The downstream server polls infrequently, and so is usually way
out-of-date.
* The downstream server polls frequently, and so is closer, but still never
perfectly up-to-date. And if lots of downstream servers are polling hundreds
of datasets frequently, the polling itself becomes a burden on the TDS. So
polling is never an ideal solution.
(Note that one implementation of polling is RSS.)
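(For a rough sense of scale: 10 downstream servers each polling 500 datasets
once a minute would generate 5,000 requests per minute, over 7 million
requests per day, against the TDS, just to ask whether anything has changed.
The numbers are purely illustrative.)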

Subscriptions - It would be great if the TDS had a subscription service, so
that the TDS would automatically send an email or ping some client-specified
URL whenever a dataset changed. This is *much* more efficient than polling,
and downstream servers are notified within seconds when a dataset changes.
With subscriptions, a downstream server could display accurate (up-to-date)
and precise Start and End times. And people would find other uses for a
general-purpose subscription system.

As an example of how subscriptions are useful:
ERDDAP has
* A subscription system
   (http://coastwatch.pfeg.noaa.gov/erddap/information.html#subscriptions
   and more specifically
   http://coastwatch.pfeg.noaa.gov/erddap/subscriptions/add.html)
* A flag system
   (http://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#flag).
If one ERDDAP is pointing to a dataset at a remote ERDDAP, it subscribes
to the remote ERDDAP's dataset (humans have to confirm the subscriptions).
Whenever the remote dataset changes, the remote ERDDAP contacts a
special URL on the first ERDDAP to set a flag, which indicates that a
specific dataset should be reloaded/checked because it has changed.
As soon as possible, the first ERDDAP reloads the dataset.
So the two ERDDAPs stay in sync, usually within a few seconds.
It would be great if TDS could offer a similar subscription system so
other TDS installations, ERDDAP, RAMADDA, and other clients could be
notified immediately whenever a specific TDS dataset changes.
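
For concreteness, the flag-setting contact mentioned above is just a web
request of roughly this form (the server name, dataset ID, and flag key are
invented placeholders; see the setup link above for the real details):
   http://first.erddap.server/erddap/setDatasetFlag.txt?datasetID=someDatasetID&flagKey=123456789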



On 6/10/2010 9:03 AM, Kevin O'Brien wrote:
>
>  Below is a bounced message from John Caron.....
>
>
>>  ------------------------------------------------------------------------
>>
>>  Subject:
>>  BOUNCEuaf_tech@xxxxxxxx:  Non-member submission from [John Caron
>>  <caron@xxxxxxxxxxxxxxxx>]
>>  From:
>>  uaf_tech-owner@xxxxxxxx
>>  Date:
>>  Thu, 10 Jun 2010 05:13:15 -0700
>>  To:
>>  uaf_tech-approval@xxxxxxxx
>>
>>  To:
>>  uaf_tech-approval@xxxxxxxx
>>
>>
>>  Date: Thu, 10 Jun 2010 06:13:06 -0600
>>  From: John Caron <caron@xxxxxxxxxxxxxxxx>
>>  Subject: Re: [uaf_tech] Next UAF telcon: June 10th, 12:30pm EDT
>>  In-reply-to: <AANLkTimmSgUqJCXhWj91GL1-7Dkdj1_aaI17xYNWbQQN@xxxxxxxxxxxxxx>
>>  To: Rich Signell <rsignell@xxxxxxxx>
>>  Cc: Ted Habermann <ted.habermann@xxxxxxxx>,
>>           Steve Hankin <Steven.C.Hankin@xxxxxxxx>,
>>           David Neufeld <David.Neufeld@xxxxxxxx>,
>>           _OAR PMEL UAF Tech List <uaf_tech@xxxxxxxx>,
>>           Ethan Davis <edavis@xxxxxxxxxxxxxxxx>,
>>           support-thredds@xxxxxxxxxxxxxxxx
>>
>>
>>  Hi Rich, et al:
>>
>>  I agree that modifying NcML in the TDS when files arrive is not a viable
>>  solution. You need to use a scan element for this, although we are
>>  replacing <scan> elements with <collection> elements (in the FMRC right
>>  now; this will be extended to other aggregations in 4.3).
>>
>>  1) Specifying the time range in the catalog for this case is possible.
>>  Here's how we do it on motherlode:
>>
>>           <timeCoverage>
>>             <end>present</end>
>>             <duration>7 days</duration>
>>           </timeCoverage>
>>
>>  this means that the starting time is "present" - 7 days. The TDS
>>  generates the actual ISO dates in the catalog, e.g., at this moment:
>>
>>  TimeCoverage:
>>
>>  Start: 2010-06-03 12:04:57Z
>>  End: 2010-06-10 12:04:57Z
>>  Duration: 7 days
>>
>>  A bit more detail at:
>>
>>  http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/v1.0.2/InvCatalogSpec.html#timeCoverageType
>>
>>
>>  2) One can also generate time ranges from the filename, see "Adding
>>  timeCoverage" in
>>
>>  http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/DatasetScan.html
>>
>>  this is used when you have files with the starting time embedded in the
>>  filename and a known duration.
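
For illustration, the "Adding timeCoverage" element described at that link
matches each dataset's file name with a regular expression, builds the start
time from the captured groups, and applies a fixed duration. It looks roughly
like this (the file-name pattern and duration are made-up examples):

   <addTimeCoverage datasetNameMatchPattern="([0-9]{4})([0-9]{2})([0-9]{2})_gfs_211\.nc$"
                    startTimeSubstitutionPattern="$1-$2-$3T00:00:00"
                    duration="60 hours"/>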
>>
>>
>>  3) We are moving towards automatic generation of the time coverage. As
>>  Rich mentioned, we do that now in the FMRC, and we will try to extend it
>>  to other aggregations where the time coordinate can be extracted.
>>
>>  Not sure if I covered all the issues.
>>
>>  John
>>
>>  Rich Signell wrote:
>>
>>>  Guys,
>>>
>>>  Sorry to send this twice, but I wanted to cc John Caron and Ethan
>>>  Davis to allow them to comment.
>>>
>>>  -Rich
>>>
>>>  On Wed, Jun 9, 2010 at 6:02 PM, Rich Signell <rsignell@xxxxxxxx> wrote:
>>>
>>>>  Ted,
>>>>
>>>>  With time aggregations, the virtual dataset is served dynamically via
>>>>  THREDDS as new data arrives without modifying the underlying catalog
>>>>  that specifies the aggregation.    We don't want to be modifying NcML
>>>>  in the catalog every time a file arrives.   So it seems we have two
>>>>  choices:  (1) have the crawler actually read the last time value;
>>>>  since it's CF-compliant, this is easy (there is a NetCDF-Java function
>>>>  for this).   I think both ncISO and RAMADDA already do this.   (2) we
>>>>  ask Unidata to modify the TDS so that it automatically generates the
>>>>  stop time as THREDDS metadata.  It already does this for FMRC
>>>>  aggregations.   On the plus side, this ensures that we get the right
>>>>  time without reading the time values.  The disadvantage is that it
>>>>  would only work for TDS served data.
>>>>
>>>>  -Rich
>>>>
>>>>  On Wed, Jun 9, 2010 at 5:42 PM, Ted Habermann <ted.habermann@xxxxxxxx> wrote:
>>>>
>>>>>  Rich et al.,
>>>>>
>>>>>  Seems to me our first choice should be to use an existing standard for
>>>>>  describing time periods. In my experience the most commonly used is ISO
>>>>>  8601. Describing time periods of known duration is straightforward if we
>>>>>  know the starting point. For example, a period with a duration of 7
>>>>>  days starting today would be: 20100609/P7D. There are probably a couple
>>>>>  of ways to express this explicitly in NcML:
>>>>>
>>>>>  <attribute name="time_coverage_start" value="2010-06-09"/>
>>>>>  <attribute name="time_coverage_duration" value="P7D"/>
>>>>>
>>>>>  or, it may make sense to just calculate the end time and write it into
>>>>>  the file:
>>>>>
>>>>>  <attribute name="time_coverage_start" value="2010-06-09"/>
>>>>>  <attribute name="time_coverage_end" value="2010-06-16"/>
>>>>>
>>>>>  If we are dealing with collection-level NcML (?), one could say
>>>>>  <attribute name="time_coverage_start" value="present"/>
>>>>>  <attribute name="time_coverage_duration" value="P7D"/>
>>>>>
>>>>>  I'm not sure offhand how this would get translated to ISO. Maybe
>>>>>  <gmd:temporalElement>
>>>>>     <gmd:EX_TemporalExtent>
>>>>>       <gmd:extent>
>>>>>         <gml:TimePeriod gml:id="t3">
>>>>>           <gml:beginPosition indeterminatePosition="now"/>
>>>>>           <gml:endPosition>P7D</gml:endPosition>
>>>>>         </gml:TimePeriod>
>>>>>       </gmd:extent>
>>>>>     </gmd:EX_TemporalExtent>
>>>>>  </gmd:temporalElement>
>>>>>
>>>>>  Ted
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  On 6/9/2010 12:34 PM, Steve Hankin wrote:
>>>>>
>>>>>  David Neufeld wrote:
>>>>>
>>>>>  Hi Rich, Steve,
>>>>>
>>>>>  I think if we move toward a model where metadata is handled as a
>>>>>  service as opposed to a static file, this problem starts to go away.
>>>>>
>>>>>  Agree in principle.  I have argued this same pov with Ted -- that we should
>>>>>  not insist that metadata be inserted into files, if that metadata is
>>>>>  derivable from information already contained in the file.
>>>>>
>>>>>  Ideas for implementing this approach?  The most appealing to me is that TDS,
>>>>>  itself, would generate data discovery metadata such as
>>>>>
>>>>>  time_coverage_start = "present minus 30 days";   // a running archive
>>>>>  time_coverage_end = "present plus 10 days";   // a forecast
>>>>>
>>>>>  based upon coordinates and use metadata found inside the dataset, and
>>>>>  perhaps some new ncML directives that govern the "metadata service".  But
>>>>>  the questions remain: who would do this work and when?  And what
>>>>>  should UAF do in the interim (i.e., now)?
>>>>>
>>>>>       - Steve
>>>>>
>>>>>  So for example, if we generate metadata dynamically and it contains the
>>>>>  standard static attributes along side of dynamically retrieved values for
>>>>>  geographic and temporal bounds then we're in good shape at the catalog
>>>>>  level.  There is still the issue of how often to harvest the metadata in
>>>>>  other clearinghouses like RAMADDA or Geonetwork, but that can be left
>>>>>  more for the portal provider to determine.
>>>>>
>>>>>  Dave
>>>>>
>>>>>  On 6/9/2010 10:39 AM, Steve Hankin wrote:
>>>>>
>>>>>
>>>>>  Rich Signell wrote:
>>>>>
>>>>>  UAF Folks,
>>>>>
>>>>>  I can't make the 12:30 ET/9:30 PT meeting tomorrow, but here are my two
>>>>>  issues:
>>>>>
>>>>>  Hi Rich,
>>>>>
>>>>>  Sorry you cannot make it.   With that in mind, I have started the
>>>>>  conversations here by email ...
>>>>>
>>>>>  1) How to handle temporal metadata for time aggregated datasets that are
>>>>>  changing every day (or perhaps every 15 min for the HF Radar measurements).
>>>>>  I got bit by this when I did a temporal/geospatial search in RAMADDA for
>>>>>  UAF data in the Gulf of Mexico during the last week and turned up no
>>>>>  datasets.  It should have turned up the NCOM Region 1 model data, HF radar
>>>>>  data and USGS COAWST model results.   I'm pretty sure the problem is that
>>>>>  RAMADDA harvested the data from the clean catalog more than a week ago,
>>>>>  so the "stop dates" in the metadata database are older than one week ago.
>>>>>  How should this best be fixed?
>>>>>
>>>>>  Might this be best addressed by using the Unidata Discovery Attribute
>>>>>  recommendations?
>>>>>  http://www.unidata.ucar.edu/software/netcdf-java/formats/DataDiscoveryAttConvention.html
>>>>>  They offer the global attribute:
>>>>>
>>>>>      time_coverage_end = "present"
>>>>>
>>>>>  Arguably within UAF we should insert such global attributes into the
>>>>>  relevant datasets and also work to communicate the need back to the data
>>>>>  providers to do so on their own THREDDS servers.  An alternative to consider
>>>>>  is putting this information into the THREDDS metadata instead of into the
>>>>>  ncML of the dataset.
>>>>>
>>>>>  btw: A seeming omission in the Unidata recommendations is any way to
>>>>>  represent "3 months ago" as the start time.  A start time of this style 
is
>>>>>  pretty common in operational outputs.
>>>>>
>>>>>
>>>>>  2) How to represent FMRC data.   If we scan a catalog with a Forecast
>>>>>  Model Run Collection we currently get hundreds of datasets, because the
>>>>>  FMRC automatically produces datasets for the daily forecasts as well as
>>>>>  the "Best Time Series" dataset that most people are interested in.   In
>>>>>  the latest version of the THREDDS Data Server (4.2 beta), the provider
>>>>>  can specify that they only want the best time series dataset to be
>>>>>  exposed.   This will help significantly, but it will take a while to get
>>>>>  everybody with FMRCs retrofitted.   I will bring this up on the Model
>>>>>  Data Interoperability Google Group.
>>>>>
>>>>>  Might be best to hold off on this topic until you are on the phone,
>>>>>  since you are our resident expert.  No?
>>>>>
>>>>>      - Steve
>>>>>
>>>>>  --
>>>>>  ==== Ted Habermann ===========================
>>>>>        Enterprise Data Systems Group Leader
>>>>>        NOAA, National Geophysical Data Center
>>>>>        V: 303.497.6472   F: 303.497.6513
>>>>>        "I entreat you, I implore you, I exhort you,
>>>>>        I challenge you: To speak with conviction.
>>>>>        To say what you believe in a manner that bespeaks
>>>>>        the determination with which you believe it.
>>>>>        Because contrary to the wisdom of the bumper sticker,
>>>>>        it is not enough these days to simply QUESTION AUTHORITY.
>>>>>        You have to speak with it, too."
>>>>>        Taylor Mali, www.taylormali.com
>>>>>  ====Ted.Habermann@xxxxxxxx  ==================
>>>>>
>>>>  --
>>>>  Dr. Richard P. Signell   (508) 457-2229
>>>>  USGS, 384 Woods Hole Rd.
>>>>  Woods Hole, MA 02543-1598
>>>>
>>>>
>>>
>>>
>>
>>
>
>  --
>  Kevin O'Brien                   UW/JISAO  
>  Research Scientist              NOAA/PMEL/TMAP
>  206-526-6751    http://www.pmel.noaa.gov
>
>  "The contents of this message are mine personally and do
>  not necessarily reflect any position of the Government
>  or the  National Oceanic and Atmospheric Administration."
>

--
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
Phone: (831)658-3205
Fax:   (831)648-8440
Email: bob.simons@xxxxxxxx

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric
Administration.
<><  <><  <><  <><  <><  <><  <><  <><  <><



_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  
visit:http://www.unidata.ucar.edu/mailing_lists/




--
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
Phone: (831)658-3205
Fax:   (831)648-8440
Email: bob.simons@xxxxxxxx

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric
Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><


