Re: [thredds] data won't return from

Hey Sean,
Interesting, thanks for digging into this. I didn’t realize that a single 
variable (project in this case) could affect the rest of the variables in the 
aggregated dataset. I’ll have to re-process this data set.

I appreciate the help!

Spicer

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Spicer Bak, Ph.D.
Research Coastal Engineer
USACE ERDC CHL
Field Research Facility
1261 Duck Road
Kitty Hawk, NC 27949
Office: 252 – 261 – 6840  x 238
Cell (personal): 252 – 305 – 9975
Cell (work): 252 – 751 – 7196
Website: frf.usace.army.mil
Email: Spicer.Bak@xxxxxxxxxxxxx

From: Sean Arms <sarms@xxxxxxxx>
Sent: Monday, February 3, 2020 9:14 AM
To: Spicer Bak <spicer.bak.frf@xxxxxxxxx>
Cc: THREDDS community <thredds@xxxxxxxxxxxxxxxx>; Bak, Spicer ERC-RDE-CHL-NC 
CIV <Spicer.Bak@xxxxxxxxxxxxx>; Dickhudt, Patrick J ERDC-RDE-CHL-NC CIV 
<Patrick.J.Dickhudt@xxxxxxxxxxxxx>
Subject: Re: [thredds] data won't return from

Greetings Spicer,

I dug into this more over the weekend, and it turns out two files are missing 
the project variable:

FRF_20161116_1128_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc
FRF_20200110_1180_FRF_NAVD88_LARC_GPS_UTC_v20200113_grid_latlon.nc

If you add a project variable to those files, the aggregation works (tested 
locally with your files, ncml, and original python code).

One thing I noticed - there are several files with the same time value, so in 
the aggregation you end up with duplicate time values without a way for users 
to distinguish where they came from (i.e. which version). A list of those files 
is at the end of this message.

Cheers!

Sean

Files with the same time:

FRF_19950420_0743_FRF_NAVD88_CRAB_Geodimeter_UTC_v20151115_grid_latlon.nc, 
FRF_19950420_0743_FRF_NAVD88_CRAB_Geodimeter_UTC_v20190326_grid_latlon.nc

FRF_20150429_1100_FRF_NAVD88_LARC_GPS_UTC_v20160323_grid_latlon.nc, 
FRF_20150429_1100_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc

FRF_20150618_1102_FRF_NAVD88_LARC_GPS_UTC_v20170328_grid_latlon.nc, 
FRF_20150618_1102_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc

FRF_20151014_1108_FRF_NAVD88_LARC_GPS_UTC_v20170328_grid_latlon.nc, 
FRF_20151014_1108_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc

FRF_20151221_1115_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc, 
FRF_20151221_1115_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc

FRF_20160817_1122_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc, 
FRF_20160817_1122_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc

FRF_20160926_1124_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc, 
FRF_20160926_1124_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc

FRF_20161003_1125_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc, 
FRF_20161003_1125_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc

FRF_20161020_1126_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc, 
FRF_20161020_1126_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc

FRF_20161116_1128_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc (also 
missing project variable), 
FRF_20161116_1128_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc

FRF_20170105_1129_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc, 
FRF_20170105_1129_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc

FRF_20171011_1142_FRF_NAVD88_LARC_GPS_UTC_v20171012_grid_latlon.nc, 
FRF_20171011_1143_FRF_NAVD88_LARC_GPS_UTC_v20171221_grid_latlon.nc

FRF_20171121_1143_FRF_NAVD88_LARC_GPS_UTC_v20171129_grid_latlon.nc, 
FRF_20171121_1144_FRF_NAVD88_LARC_GPS_UTC_v20171221_grid_latlon.nc, 
FRF_20171121_1144_FRF_NAVD88_LARC_GPS_UTC_v20180130_grid_latlon.nc

FRF_20180418_1149_FRF_NAVD88_LARC_GPS_UTC_v20180427_grid_latlon.nc, 
FRF_20180418_1149_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc

FRF_20190917_1170_FRF_NAVD88_CRAB_GPS_UTC_v20190919_grid_latlon.nc, 
FRF_20190917_1170_FRF_NAVD88_CRAB_GPS_UTC_v20191029_grid_latlon.nc
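
One way to surface duplicates like these before aggregating is to group the filenames by survey date and keep only the newest version of each. The sketch below assumes the filename layout seen in this thread (FRF_<survey date>_<survey number>_..._v<version date>_grid_latlon.nc) and assumes that files sharing a survey date share a time value, which matches the list above but is not guaranteed. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout: FRF_<YYYYMMDD>_<survey number>_..._v<YYYYMMDD>_grid_latlon.nc
FILENAME_RE = re.compile(r"^FRF_(\d{8})_\d+_.+_v(\d{8})_grid_latlon\.nc$")


def group_by_survey_date(filenames):
    """Map survey date -> filenames sorted by version date (oldest first)."""
    groups = defaultdict(list)
    for name in filenames:
        match = FILENAME_RE.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed layout
        survey_date, version_date = match.groups()
        groups[survey_date].append((version_date, name))
    return {date: [name for _, name in sorted(entries)]
            for date, entries in groups.items()}


def find_duplicates(filenames):
    """Return only the survey dates that have more than one file."""
    return {date: names
            for date, names in group_by_survey_date(filenames).items()
            if len(names) > 1}


def latest_versions(filenames):
    """Keep the newest version of each survey date, ordered by survey date."""
    groups = group_by_survey_date(filenames)
    return [groups[date][-1] for date in sorted(groups)]
```

Passing the output of latest_versions to whatever builds the aggregation would leave one file per survey date; whether the older versions should be dropped or kept under a separate dataset is a data-management choice this sketch does not make.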

On Fri, Jan 31, 2020 at 3:42 PM Sean Arms 
<sarms@xxxxxxxx> wrote:
Greetings Spicer,

I think there is an issue with your new project variable. In previous files, 
it's a float,

https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/FRF_20191206_1179_FRF_NAVD88_LARC_GPS_UTC_v20191209_grid_latlon.nc.ascii?project%5B0:1:0%5D

Dataset {
    Float64 project[time = 1];
} 
frf/geomorphology/DEMs/surveyDEM/FRF_20191206_1179_FRF_NAVD88_LARC_GPS_UTC_v20191209_grid_latlon.nc;
---------------------------------------------
project[1]
-999.0

but in the latest file, it's a string:

https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/FRF_20200110_1180_FRF_NAVD88_LARC_GPS_UTC_v20200113_grid_latlon.nc.ascii?project

Dataset {
    String project;
} 
frf/geomorphology/DEMs/surveyDEM/FRF_20200110_1180_FRF_NAVD88_LARC_GPS_UTC_v20200113_grid_latlon.nc;
---------------------------------------------
project, "F"

That might cause new kinds of issues for a full variable read.
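
A mismatch like this can be caught before a file joins the aggregation by comparing its variables against the first file, which is noted elsewhere in this thread to act as the aggregation template. The sketch below works on plain dictionaries mapping a variable name to a (type, dimensions) pair, so it runs without any data files. The function name and dictionary layout are assumptions for illustration, not part of THREDDS or netCDF4-python.

```python
def compare_to_template(template, candidate):
    """Describe how `candidate` differs from `template`.

    Both arguments map variable name -> (type name, tuple of dimension names).
    Returns a list of problem descriptions (empty if the two are consistent).
    """
    problems = []
    for name, (t_type, t_dims) in template.items():
        if name not in candidate:
            problems.append("missing variable: {}".format(name))
            continue
        c_type, c_dims = candidate[name]
        if c_type != t_type:
            problems.append(
                "type of {} is {}, expected {}".format(name, c_type, t_type))
        if tuple(c_dims) != tuple(t_dims):
            problems.append("dimensions of {} are {}, expected {}".format(
                name, tuple(c_dims), tuple(t_dims)))
    return problems


# Values taken from the two DDS responses quoted above.
older = {"project": ("Float64", ("time",))}
latest = {"project": ("String", ())}

for problem in compare_to_template(older, latest):
    print(problem)
```

With netCDF4-python, the dictionaries could be filled from Dataset.variables, using each variable's dtype and dimensions attributes.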

Cheers!

Sean


On Fri, Jan 31, 2020 at 3:33 PM Spicer Bak 
<spicer.bak.frf@xxxxxxxxx> wrote:
Hey Sean,
Glad we were able to help find that bug, but I don't think the "project" 
variable (or lack of it) is the root of our problem, as I chose your option 3 (my 
mistake, this was supposed to be the same after the last one) and I get a 
similar response.  Good news: when I add the #noprefetch option, it seems to 
fix it.  Hopefully this helps provide answers, as demonstrated by the code below.

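# Failure with python (matlab as well); the retry with #noprefetch succeeds
import netCDF4 as nc

# `urls` is not defined in this message; it is assumed to be a list of dataset URLs to test
for url in urls:
    print(nc.Dataset(url)['time'])
    variables = nc.Dataset(url).variables.keys()
    for var in variables:
        try:
            nc.Dataset(url)[var][0]
            print('Success! {} from {}'.format(var, url))
        except IndexError:
            print("won't load variable {} from {}".format(var, url))
            # Retry once with prefetch disabled, in its own try block so a
            # second failure is reported instead of escaping the loop
            retry_url = url + "#noprefetch"
            try:
                nc.Dataset(retry_url)[var][0]
                print('Success! {} from {}'.format(var, retry_url))
            except IndexError as e:
                print("FAIL: load variable {} from {}".format(var, retry_url))
                print('    {}'.format(e))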



On Fri, Jan 31, 2020 at 4:25 PM Sean Arms 
<sarms@xxxxxxxx> wrote:
I thought it was working for me, but for the wrong reasons. Sorry about that. 
But, now I have it.

The error message from the server is...well...garbage. That's something we need 
to look into. The underlying problem here is that the new data file does not 
include the variable project. Because variable "project" exists in the other 
files (or more specifically, the first file of the aggregation), and 
specifically because it has time as its outer dimension, currently it must 
exist in all files of the aggregation. So, we can make an ascii request for 
that variable specifically, and request everything up to the last value, and it 
works [0:1:424]:

https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml.ascii?project%5B0:1:424%5D

However, once we ask for the last value, it bombs [0:1:425]:

https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml.ascii?project%5B0:1:425%5D

It's only like this for the full read code path, though. If I slice it from 
[1:1:425] (skip the first value), it works:

https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml.ascii?project%5B1:1:425%5D

...and the missing value gets filled with a zero, because why not (well, ok, I 
can think of a few reasons). So, from the TDS side, and more specifically 
netCDF-Java, this is a bug (well, bugs, because returning zero isn't quite the 
thing to do here either).
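
The three requests above differ only in the index range inside the constraint expression. A small helper, sketched here with the standard library, makes it easier to probe where a read starts failing. ascii_request_url is a hypothetical name, and the start:stride:stop form with an inclusive stop is inferred from the URLs in this message.

```python
from urllib.parse import quote

BASE = ("https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/"
        "DEMs/surveyDEM/surveyDEM.ncml")


def ascii_request_url(base, variable, start, stop, stride=1):
    """Build an OPeNDAP .ascii URL for variable[start:stride:stop]."""
    constraint = "{}[{}:{}:{}]".format(variable, start, stride, stop)
    # Percent-encode the brackets as in the URLs above; leave ':' readable
    return "{}.ascii?{}".format(base, quote(constraint, safe=":"))


print(ascii_request_url(BASE, "project", 0, 424))  # all but the last value
print(ascii_request_url(BASE, "project", 0, 425))  # full read
print(ascii_request_url(BASE, "project", 1, 425))  # skips the first value
```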

From what I can understand, netCDF-C tries to preload some data from remote 
servers (if the variable is considered "small enough"), and because the 
variable "project" is "small enough", the library tries to grab it all and the 
request bombs out. You can tell the C library to not do that by adding 
"#noprefetch" at the end of the url when opening the dataset, but that just 
delays things (unless you or the underlying library you are using never tries 
to fully read the variable "project"). So, in your sample python code, pass 
netCDF4-python's Dataset the following URL and give it a spin:

https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml#noprefetch
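
When the fragment is added inside a retry loop, it helps to add it only once per URL. A minimal sketch, assuming the fragment is simply appended to the dataset URL as described above. with_noprefetch is a hypothetical helper, not part of netCDF4-python or netCDF-C, and joining several fragment options with "&" is an assumption about the client's URL syntax.

```python
def with_noprefetch(url):
    """Return `url` with a 'noprefetch' fragment option, added at most once."""
    base, _, fragment = url.partition("#")
    # Assumption: multiple fragment options are separated by '&'
    options = [opt for opt in fragment.split("&") if opt]
    if "noprefetch" not in options:
        options.append("noprefetch")
    return base + "#" + "&".join(options)
```

The result can be passed straight to the dataset constructor, for example nc.Dataset(with_noprefetch(url)), and calling the helper twice on the same URL leaves it unchanged.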

Cool...so what to do until I can get this fixed? Three things off the top of my 
head:

1. Add "#noprefetch" to your URL (and anyone else using the dataset). Barf.
2. Use NcML in the aggregation to remove the variable "project" altogether, 
as it seems to be -999.0 when it does exist anyways.
3. Add the variable "project" to the latest file (NcML or by rewriting it).

Sorry this took so long to debug. There will be a fix, it will just take some 
time (maybe early next week since I believe I know exactly what needs done and 
where).

Cheers,

Sean


On Fri, Jan 31, 2020 at 12:27 PM Spicer Bak 
<spicer.bak.frf@xxxxxxxxx> wrote:
Hey Sean,
Thanks for getting back to me.  I was still getting the same symptoms earlier 
today.  Did it work on second inquiry for you?

I did make a change to that last file so the dims match up with previous files.  
After I did that, I checked again to make sure that wasn't causing the problem.  
It seems I am still getting the same IndexError I was getting, so this (as you 
mentioned) is not the root of the problem.

On Fri, Jan 31, 2020 at 11:03 AM Sean Arms 
<sarms@xxxxxxxx> wrote:
Greetings Spicer,

It looks like you have solved the issue? I was having problems the other day as 
well. I downloaded the files locally to see if I could reproduce the issue, and 
I am unable to do so now. However, I do see something that might cause an issue 
down the road. The dimension order on some of the variables in the latest file 
does not match what is in other files. For example, if we compare the latest 
file:

https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/FRF_20200110_1180_FRF_NAVD88_LARC_GPS_UTC_v20200113_grid_latlon.nc.dds

with the next latest file:

https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/FRF_20191206_1179_FRF_NAVD88_LARC_GPS_UTC_v20191209_grid_latlon.nc.dds

The dimension order of latitude, longitude, northing, and easting switches from 
(xFRF, yFRF) (20191206) to (yFRF, xFRF) (20200113). Might not be a big issue 
overall since the use of an NcML file on the server (as opposed to NcML 
directly in the catalog) uses the first file of the aggregation as a template.

Cheers,

Sean


On Wed, Jan 29, 2020 at 3:03 PM Spicer Bak 
<spicer.bak.frf@xxxxxxxxx> wrote:
Hello TDS community,
I have a problem I'm quite pickled on.  We had a dataset that was working fine 
until (I think) we pushed the latest file.  Our server is continually updated 
with new files and all of the other datasets seem to work fine.

I'm able to get the data from the individual file URLs, but not from the 
time-concatenated ncml file.  I can display the file or variables, but cannot 
obtain any of the data through OPeNDAP, although I'm able to see the data 
returned via the "get ASCII" button on the OPeNDAP page.  The Python script 
below demonstrates the problem:

# success with ncml:
https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml.ascii?latitude%5B0:1:0%5D%5B0:1:0%5D,time%5B0:1:425%5D,elevation%5B0:1:0%5D%5B0:1:0%5D%5B0:1:0%5D

# Failure with python (matlab as well)
import netCDF4 as nc
# `urls` is not defined in this message; it is assumed to be a list of dataset URLs to test
for url in urls:
    print(nc.Dataset(url)['time'])
    variables = nc.Dataset(url).variables.keys()
    for var in variables:
        try:
            nc.Dataset(url)[var][0]
            print('Success! {} from {}'.format(var, url))
        except IndexError as e:
            print("won't load variable {} from {}".format(var, url))
            print('    {}'.format(e))

Any help would be much appreciated!

--
+++++++++++++++++++++++++++
Spicer Bak, PhD
USACE CHL Field Research Facility
252-305-9975
_______________________________________________
NOTE: All exchanges posted to Unidata maintained email lists are
recorded in the Unidata inquiry tracking system and made publicly
available through the web.  Users who post to any of the lists we
maintain are reminded to remove any personal information that they
do not want to be made public.


thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit: 
https://www.unidata.ucar.edu/mailing_lists/

