Re: [thredds] Simple fix => much smaller TDS WMS GetCapabilities size (for model output)

Note that the discussion has shifted from getCapabilities (which is the client asking the server for a description of the data that is available) to (I presume) getMap (which is a client asking the server for a map of specific data).

On 12/22/2010 10:50 AM, Kyle Wilcox wrote:
The WMS 1.3.0 spec states that a server can optionally return a
"nearest value" for any Dimension (Table C.1 and Section C.4.3).

The WMS server should return the "nearest value" as part of the
response header, so the client can determine that the requested
Dimension was rounded.

Nearest value is interesting, but it is easy to see a slippery slope toward abuse when it is combined with a weak getCapabilities. If the server weakens getCapabilities so that it only indicates the range of time for which data is available and says that the values are evenly spaced (e.g., every day), then the client will feel free to ask for a map for any day in that range. That isn't so bad for a model dataset that has 4 missing days in 20 years, but it is real trouble for the SeaWiFS dataset, which has big gaps in coverage: a request for a map for Jan 1 might return a "nearest value" map for Mar 10. Also, it's great that the header indicates the "nearest value" that was used, but the response body is an image file (e.g., .jpg or .png). I'm not sure that many clients will be sophisticated enough to look in the header for the actual (nearest) value and display that information to the human user. Yes, that's the client's fault. But it is also the server's fault for falsely advertising what data it has.

Yes. That's an extreme case. I'm just saying: let's be wary of this extreme case when we decide how loosely to interpret the getCapabilities idea of "evenly" spaced.
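To make the SeaWiFS worry concrete, here is a minimal Python sketch of a naive "nearest value" lookup. It is a plain binary search over the available times, an assumption made for illustration rather than the ncWMS or TDS code. The daily series and its gap are hypothetical, chosen so that a request for Jan 1 lands on Mar 10.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def nearest_value(available, requested):
    """Return the available time closest to `requested`.

    `available` must be a sorted, non-empty list of datetimes.
    Ties go to the earlier time (an arbitrary choice).
    """
    i = bisect_left(available, requested)
    if i == 0:
        return available[0]
    if i == len(available):
        return available[-1]
    before, after = available[i - 1], available[i]
    return before if requested - before <= after - requested else after

# Hypothetical daily coverage with a long gap: Sep 1 - Oct 1, 2000,
# then nothing until Mar 10, 2001 (30 days from there).
available = [datetime(2000, 9, 1) + timedelta(days=d) for d in range(31)]
available += [datetime(2001, 3, 10) + timedelta(days=d) for d in range(30)]

requested = datetime(2001, 1, 1)
served = nearest_value(available, requested)
print(f"asked for {requested:%Y-%m-%d}, got {served:%Y-%m-%d} "
      f"({(served - requested).days} days away)")
```

The served map is 68 days from the requested date, and nothing in the image itself says so; only the offset (or whatever the server puts in the response header) carries that information.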


If the server does not support "nearest value", it should just return
an exception of "InvalidDimensionValue".
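A sketch of the two server behaviours described above, plus one hypothetical safeguard: the `max_offset` parameter refuses a "nearest value" that is too far from the request, which would address the SeaWiFS case. The exception class only stands in for a WMS service exception with code InvalidDimensionValue; the function and parameter names are made up for illustration.

```python
from datetime import datetime, timedelta

class InvalidDimensionValue(Exception):
    """Stand-in for a WMS ServiceException with code InvalidDimensionValue."""

def resolve_time(available, requested, allow_nearest=False, max_offset=None):
    """Map a requested time onto one the layer really has, or raise.

    available:     collection of datetimes actually present (hypothetical input)
    allow_nearest: whether this server supports "nearest value"
    max_offset:    optional timedelta; a nearest value farther away is refused
    """
    if requested in available:
        return requested
    if not allow_nearest:
        raise InvalidDimensionValue(f"time {requested.isoformat()} is not available")
    # Ties are broken arbitrarily by min().
    best = min(available, key=lambda t: abs(t - requested))
    if max_offset is not None and abs(best - requested) > max_offset:
        raise InvalidDimensionValue(
            f"nearest time {best.isoformat()} is more than {max_offset} away")
    return best

hours = {datetime(2000, 1, 1, h) for h in (0, 1, 2, 5)}
print(resolve_time(hours, datetime(2000, 1, 1, 3), allow_nearest=True))
```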


Both "nearest value" and the issue Rich brought up (listing all
available times per layer) have patches ready for NcWMS, although
they shouldn't be considered production quality.  Currently the
"nearest value" implementation does nothing to the response headers.

"nearest time":
https://github.com/asascience/ncWMS/commit/99a542e80a169a50afee1ebf8540ae6d9f1ee206


"time intervals": https://github.com/asascience/ncWMS/commit/9e2925fc607a05d6484299e017db0180a2200fa4



---------
Kyle Wilcox, Engineer
Applied Science Associates
55 Village Square Drive
South Kingstown, RI 02879
p: (401) 789-6224
e: kwilcox@xxxxxxxxxxxxxx


-----Original Message-----
From: thredds-bounces@xxxxxxxxxxxxxxxx [mailto:thredds-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John Caron
Sent: Wednesday, December 22, 2010 1:22 PM
To: thredds@xxxxxxxxxxxxxxxx
Subject: Re: [thredds] Simple fix => much smaller TDS WMS GetCapabilities size (for model output)

I'm wondering what the spec says (or doesn't say) about this: how does
the response have to match the request? Can the response simply send
back the time values it has that match the request, so that the client
can assume that any missing layers are, well, missing?
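If the server behaved the way John suggests, its job would reduce to splitting the requested times into those it has and those it lacks. A minimal sketch, assuming requested and available times are exact, comparable values (the function name is hypothetical):

```python
from datetime import datetime

def split_request(requested, available):
    """Return (found, missing): requested times the server has, and those it lacks."""
    have = set(available)
    found = [t for t in requested if t in have]
    missing = [t for t in requested if t not in have]
    return found, missing

available = [datetime(2000, 1, 1, h) for h in (0, 1, 3)]
requested = [datetime(2000, 1, 1, h) for h in range(4)]
found, missing = split_request(requested, available)
print("returned:", [t.hour for t in found], "missing:", [t.hour for t in missing])
```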

On 12/22/2010 11:14 AM, Doug Lindholm wrote:
I'd like to add to that. At what point is it sufficient to fill
gaps (assuming uniform time steps) with fill values? In some
cases, determining the precise time samples available is an
expensive task. I'd rather get a single time range quickly and
then deal with fill values (such as NaNs, which often require no
special handling).

Doug
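Doug's alternative, advertising a single uniform range and padding the holes, might look like this on the data side. A minimal sketch, assuming the real samples fall exactly on the uniform grid; `fill_to_uniform` is a hypothetical helper, not part of any TDS API.

```python
import math
from datetime import datetime, timedelta

def fill_to_uniform(samples, start, stop, step):
    """Lay samples onto a uniform grid from start to stop (inclusive).

    samples: dict mapping datetime -> value (hypothetical input format).
    Grid points with no sample get NaN.
    """
    grid, values = [], []
    t = start
    while t <= stop:
        grid.append(t)
        values.append(samples.get(t, math.nan))
        t += step
    return grid, values

samples = {datetime(2000, 1, 1, h): float(h) for h in (0, 1, 3)}
grid, values = fill_to_uniform(samples, datetime(2000, 1, 1, 0),
                               datetime(2000, 1, 1, 3), timedelta(hours=1))
print(values)
```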

On 12/22/10 11:01 AM, Bob Simons wrote:
Some other issues to consider:

* Where is the dividing line between a few missing values (where
it's "okay" to say the values are evenly spaced) and too many
missing values (where it isn't okay)?  Some number?  Some
percentage?  Does it matter if the missing values are adjacent or
scattered?

* The purpose of getCapabilities is to tell the client what is
available. It is probably fine for a human to read that the data
is evenly spaced (even if it isn't perfectly); humans are
sometimes forgiving.  But will there be problems if a computer
program client expects (perfectly) evenly spaced values and
requests data for those values?
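One way to put numbers on Bob's first question is to measure, for an assumed nominal step, how many values are missing and how many separate gaps they form (adjacent missing values count as one gap). Where to draw the line remains a policy decision; this hypothetical helper only reports the figures.

```python
from datetime import datetime, timedelta

def gap_report(times, step):
    """Describe how far a sorted list of times departs from even spacing.

    Assumes every actual spacing is a whole multiple of `step`.
    Returns (expected_count, missing_count, gap_count, longest_gap),
    where a gap is a run of adjacent missing values.
    """
    missing = gaps = longest = 0
    for a, b in zip(times, times[1:]):
        skipped = (b - a) // step - 1  # values absent between a and b
        if skipped > 0:
            missing += skipped
            gaps += 1
            longest = max(longest, skipped)
    return len(times) + missing, missing, gaps, longest

times = [datetime(2000, 1, 1, h) for h in (0, 1, 2, 5, 6, 9)]
expected, missing, gaps, longest = gap_report(times, timedelta(hours=1))
print(f"{missing}/{expected} missing ({missing / expected:.0%}) "
      f"in {gaps} gaps, longest {longest}")
```

Here 4 of 10 values (40%) are missing, in 2 gaps of 2 steps each; a percentage threshold and a longest-gap threshold would judge this series differently.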

On 12/22/2010 9:47 AM, Ethan Davis wrote:
Hi Rich,

I've added this feature request to our list. Jon Blower might
have some thoughts on this as well.

One thing that I wonder about is client support. In particular,
does Godiva2 support this? Again, a question for Jon.

Ethan

On 12/22/2010 6:00 AM, Rich Signell wrote:
For getCapabilities requests, it would be great if the TDS
would express the available times using the WMS multiple time
interval syntax whenever that is more efficient than listing each
time value separately.

This can result in huge savings (a factor of 100 or more) in the
WMS getCapabilities size when dealing with model output, which is
usually equally spaced, but perhaps with a few gaps. Currently,
every available time step is listed in ISO format.

In WMS 1.1.1, Annex C.3 states that multiple intervals are
allowed in the "Extent" element. In WMS 1.3.0, Annex C.2
states that multiple intervals are allowed in the "Dimension"
element.

Both list the sample format:
"min1/max1/res1,min2/max2/res2,..." (thanks to Kyle Wilcox
for digging out this info)
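The compression Rich asks for can be done in one pass over the sorted times: start a new interval whenever the spacing differs from the nominal step. A minimal sketch, assuming naive UTC datetimes, a single known step, and a resolution written as `PT<seconds>S` as in the example intervals in this thread. Emitting an isolated time as a degenerate `t/t/res` interval is a choice made here to stay within the `min1/max1/res1,...` form; the helper names are hypothetical.

```python
from datetime import datetime, timedelta

def iso(t):
    """Format a naive UTC datetime like 1985-01-01T01:00:00.000Z."""
    return t.strftime("%Y-%m-%dT%H:%M:%S.") + f"{t.microsecond // 1000:03d}Z"

def to_intervals(times, step):
    """Collapse a sorted, non-empty list of datetimes into 'min/max/res' strings.

    A new interval starts wherever the spacing differs from `step`.
    An isolated time becomes the degenerate interval 't/t/res'.
    """
    res = f"PT{int(step.total_seconds())}S"
    intervals = []
    run_start = prev = times[0]
    for t in times[1:]:
        if t - prev != step:  # spacing broke: close the current run
            intervals.append(f"{iso(run_start)}/{iso(prev)}/{res}")
            run_start = t
        prev = t
    intervals.append(f"{iso(run_start)}/{iso(prev)}/{res}")
    return ",".join(intervals)

times = [datetime(1985, 1, 1, h) for h in (1, 2, 3)]
times += [datetime(1985, 1, 2, h) for h in (1, 2)]
print(to_intervals(times, timedelta(hours=1)))
```

A series with four gaps therefore collapses to five intervals, however many time steps it contains.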

For example, we have a dataset

http://testbedapps.sura.org/thredds/clean.html?dataset=estuarine_hypoxia/ch3d/agg

that contains hourly output over 21 years (183984 time records) for 11
different variables.  There are only 4 gaps longer than 1
hour.

If you access the WMS getCapabilities document for this
dataset, be prepared to wait for a while, because it's
51MB!!

The problem is that each time value is listed in ISO ASCII:
<Dimension name="time" units="ISO8601" multipleValues="true"
current="true" default="2006-01-01T00:00:00.000Z">
1985-01-01T01:00:00.000Z,1985-01-01T02:00:00.000Z,1985-01-01T03:00:00.000Z,1985-01-01T04:00:00.000Z,1985-01-01T05:00:00.000Z,1985-01-01T06:00:00.000Z,1985-01-01T07:00:00.000Z,1985-01-01T08:00:00.000Z,1985-01-01T09:00:00.000Z,1985-01-01T10:00:00.000Z,
...

which goes on for 5MB of ASCII values and then this whole
mess is repeated for each variable "layer".
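A quick back-of-envelope check that "5MB per layer, repeated for each layer" accounts for the 51MB, using the record and variable counts given above and ignoring the surrounding XML:

```python
records = 183_984   # hourly time records, from the message above
layers = 11         # variables, each listing the full time dimension
bytes_per_value = len("1985-01-01T01:00:00.000Z,")  # 24 characters + comma

per_layer = records * bytes_per_value
total = per_layer * layers
print(f"per layer: {per_layer / 1e6:.1f} MB, all layers: {total / 1e6:.1f} MB")
# per layer: 4.6 MB, all layers: 50.6 MB
```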

Instead, the entire time record could be simply expressed
using 5 intervals of the form:

"1985-01-01T01:00:00.000Z/1988-12-31T00:00:00.000Z/PT3600S",
"1989-01-01T01:00:00.000Z/1992-12-31T00:00:00.000Z/PT3600S",
"1993-01-01T01:00:00.000Z/2000-12-31T00:00:00.000Z/PT3600S",
"2001-01-01T01:00:00.000Z/2004-12-31T00:00:00.000Z/PT3600S",
"2005-01-01T01:00:00.000Z/2006-01-01T00:00:00.000Z/PT3600S",
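As a sanity check, the five intervals can be expanded back into a count of time steps. This sketch parses only the `PT<seconds>S` resolution form used here, not general ISO 8601 durations:

```python
from datetime import datetime, timedelta

intervals = [
    "1985-01-01T01:00:00.000Z/1988-12-31T00:00:00.000Z/PT3600S",
    "1989-01-01T01:00:00.000Z/1992-12-31T00:00:00.000Z/PT3600S",
    "1993-01-01T01:00:00.000Z/2000-12-31T00:00:00.000Z/PT3600S",
    "2001-01-01T01:00:00.000Z/2004-12-31T00:00:00.000Z/PT3600S",
    "2005-01-01T01:00:00.000Z/2006-01-01T00:00:00.000Z/PT3600S",
]

def parse_iso(s):
    """Parse the fixed 'YYYY-MM-DDTHH:MM:SS.fffZ' form used in this thread."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ")

def count_values(interval):
    """Number of time steps in 'min/max/PT<seconds>S', both ends inclusive."""
    start, end, res = interval.split("/")
    step = timedelta(seconds=int(res[2:-1]))  # strip 'PT' and 'S'
    return (parse_iso(end) - parse_iso(start)) // step + 1

total = sum(count_values(i) for i in intervals)
print(total)
```

It prints 183984, matching the record count given above, so the five intervals describe exactly the same times as the 51MB list.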


Existing way (every time step written out): 51MB
New way (specifying intervals):             100KB (500 times smaller!!!!!)

This would greatly reduce the file sizes on Motherlode, our
IOOS testbed server, and every other TDS (or WMS, actually)
with aggregated model output.

_______________________________________________ thredds mailing
list thredds@xxxxxxxxxxxxxxxx For list information or to
unsubscribe,  visit:
http://www.unidata.ucar.edu/mailing_lists/


--
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
Phone: (831)658-3205
Fax:   (831)648-8440
Email: bob.simons@xxxxxxxx

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric
Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><


