Greetings Spicer,
Keep in mind that you only see this aggregation bug when a request is made
for all of the data from the variable "project". That can happen if a user
actually asks for it (on the Python side, by slicing with [:]), or if the
underlying library tries to preemptively grab it to improve performance.
The netCDF-C library does the latter by default at the point you
try to read data from any variable via OPeNDAP. Adding the
"#noprefetch" fragment to the URL tells the C library to only grab data you
explicitly tell it to grab, and so you get a lot further along before
running into this issue. What I don't know is if there has been a change
in the C library default that allowed us to identify this bug right off the
bat. Regardless, I'll get the bug fixed on the netCDF-Java side, which will
certainly help :-)
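
For anyone following along, here is a minimal sketch of the difference on the Python side. The helper names `with_noprefetch` and `read_first_and_all` are made up for illustration and are not part of any library; the only library pieces used are `netCDF4.Dataset` and ordinary variable slicing. The sketch assumes the URL carries no other fragment.

```python
def with_noprefetch(url):
    """Append the "#noprefetch" fragment unless the URL already ends with it."""
    if url.endswith("#noprefetch"):
        return url
    return url + "#noprefetch"


def read_first_and_all(url, name="project"):
    """Read one value, then the whole variable, from an OPeNDAP URL.

    Needs the netCDF4 package and network access, so it is only
    defined here and never called in this sketch.
    """
    import netCDF4 as nc  # lazy import: the helper above stays dependency-free

    with nc.Dataset(with_noprefetch(url)) as ds:
        first = ds[name][0]       # explicit single-value request
        everything = ds[name][:]  # full read: the request that triggers the bug
    return first, everything
```

With prefetching disabled, the `[0]` read should go through, while the `[:]` read of "project" is still expected to fail until the server-side fix is in.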
Sean
On Mon, Feb 3, 2020 at 11:00 AM Bak, Spicer ERC-RDE-CHL-NC CIV <
Spicer.Bak@xxxxxxxxxxxxx> wrote:
> Hey Sean,
>
> Interesting, Thanks for digging into this. I didn’t realize that a single
> variable (project in this case) could affect the rest of the variables in
> the aggregated dataset. I’ll have to re-process this data set.
>
>
>
> I appreciate the help!
>
>
>
> Spicer
>
>
>
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
>
>
>
> Spicer Bak, Ph.D.
> Research Coastal Engineer
>
> USACE ERDC CHL
>
> Field Research Facility
>
> 1261 Duck Road
>
> Kitty Hawk, NC 27949
>
> Office: 252 – 261 – 6840 x 238
>
> Cell (personal): 252 – 305 – 9975
>
> Cell (work): 252 – 751 – 7196
>
> Website: frf.usace.army.mil
>
> Email: Spicer.Bak@xxxxxxxxxxxxx
>
>
>
> *From:* Sean Arms <sarms@xxxxxxxx>
> *Sent:* Monday, February 3, 2020 9:14 AM
> *To:* Spicer Bak <spicer.bak.frf@xxxxxxxxx>
> *Cc:* THREDDS community <thredds@xxxxxxxxxxxxxxxx>; Bak, Spicer
> ERC-RDE-CHL-NC CIV <Spicer.Bak@xxxxxxxxxxxxx>; Dickhudt, Patrick J
> ERDC-RDE-CHL-NC CIV <Patrick.J.Dickhudt@xxxxxxxxxxxxx>
> *Subject:* Re: [thredds] data won't return from
>
>
>
> Greetings Spicer,
>
>
>
> I dug into this more over the weekend, and it turns out two files are
> missing the project variable:
>
>
>
> FRF_20161116_1128_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc
> FRF_20200110_1180_FRF_NAVD88_LARC_GPS_UTC_v20200113_grid_latlon.nc
>
>
>
> If you add a project variable to those files, the aggregation works
> (tested locally with your files, ncml, and original python code).
>
>
>
> One thing I noticed: there are several files with the same time value, so
> in the aggregation you end up with duplicate time values without a way for
> users to distinguish where they came from (i.e. which version). A list of
> those files is at the end of this message.
>
>
>
> Cheers!
>
>
>
> Sean
>
>
>
> Files with the same time:
>
>
>
> FRF_19950420_0743_FRF_NAVD88_CRAB_Geodimeter_UTC_v20151115_grid_latlon.nc,
> FRF_19950420_0743_FRF_NAVD88_CRAB_Geodimeter_UTC_v20190326_grid_latlon.nc
>
>
> FRF_20150429_1100_FRF_NAVD88_LARC_GPS_UTC_v20160323_grid_latlon.nc,
> FRF_20150429_1100_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc
>
>
> FRF_20150618_1102_FRF_NAVD88_LARC_GPS_UTC_v20170328_grid_latlon.nc,
> FRF_20150618_1102_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc
>
>
> FRF_20151014_1108_FRF_NAVD88_LARC_GPS_UTC_v20170328_grid_latlon.nc,
> FRF_20151014_1108_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc
>
>
> FRF_20151221_1115_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc,
> FRF_20151221_1115_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc
>
>
> FRF_20160817_1122_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc,
> FRF_20160817_1122_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc
>
>
> FRF_20160926_1124_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc,
> FRF_20160926_1124_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc
>
>
> FRF_20161003_1125_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc,
> FRF_20161003_1125_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc
>
>
> FRF_20161020_1126_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc,
> FRF_20161020_1126_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc
>
>
> FRF_20161116_1128_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc (also
> missing project variable),
> FRF_20161116_1128_FRF_NAVD88_LARC_GPS_UTC_v20190330_grid_latlon.nc
>
>
> FRF_20170105_1129_FRF_NAVD88_LARC_GPS_UTC_v20170320_grid_latlon.nc,
> FRF_20170105_1129_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc
>
>
> FRF_20171011_1142_FRF_NAVD88_LARC_GPS_UTC_v20171012_grid_latlon.nc,
> FRF_20171011_1143_FRF_NAVD88_LARC_GPS_UTC_v20171221_grid_latlon.nc
>
>
> FRF_20171121_1143_FRF_NAVD88_LARC_GPS_UTC_v20171129_grid_latlon.nc,
> FRF_20171121_1144_FRF_NAVD88_LARC_GPS_UTC_v20171221_grid_latlon.nc,
> FRF_20171121_1144_FRF_NAVD88_LARC_GPS_UTC_v20180130_grid_latlon.nc
>
>
>
> FRF_20180418_1149_FRF_NAVD88_LARC_GPS_UTC_v20180427_grid_latlon.nc,
> FRF_20180418_1149_FRF_NAVD88_LARC_GPS_UTC_v20190326_grid_latlon.nc
>
>
> FRF_20190917_1170_FRF_NAVD88_CRAB_GPS_UTC_v20190919_grid_latlon.nc,
> FRF_20190917_1170_FRF_NAVD88_CRAB_GPS_UTC_v20191029_grid_latlon.nc
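
A rough way to spot candidates like these from the file names alone is sketched below. It assumes the second "_"-separated field is the survey date and that files sharing that date also share a time value, which is only a guess from the naming pattern; the authoritative check is to compare the actual time variable in each file. Both function names are hypothetical.

```python
from collections import defaultdict


def group_by_survey_date(filenames):
    """Map survey date (second "_" field, YYYYMMDD) to the files sharing it."""
    groups = defaultdict(list)
    for name in filenames:
        fields = name.split("_")
        groups[fields[1]].append(name)
    # Keep only the dates that occur more than once.
    return {date: sorted(names) for date, names in groups.items() if len(names) > 1}


def version_stamp(filename):
    """Return the "vYYYYMMDD" field of a file name, or None if there is none."""
    for field in filename.split("_"):
        if len(field) == 9 and field[0] == "v" and field[1:].isdigit():
            return field
    return None
```

Sorting each group by `version_stamp` would be one way to decide which version to keep in the aggregation.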
>
>
>
> On Fri, Jan 31, 2020 at 3:42 PM Sean Arms <sarms@xxxxxxxx> wrote:
>
> Greetings Spicer,
>
>
>
> I think there is an issue with your new project variable. In previous
> files, it's a float,
>
>
>
>
> https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/FRF_20191206_1179_FRF_NAVD88_LARC_GPS_UTC_v20191209_grid_latlon.nc.ascii?project%5B0:1:0%5D
>
>
>
> Dataset {
> Float64 project[time = 1];
> }
> frf/geomorphology/DEMs/surveyDEM/FRF_20191206_1179_FRF_NAVD88_LARC_GPS_UTC_v20191209_grid_latlon.nc;
> ---------------------------------------------
> project[1]
> -999.0
>
>
>
> but in the new latest file, it's a string:
>
>
>
>
> https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/FRF_20200110_1180_FRF_NAVD88_LARC_GPS_UTC_v20200113_grid_latlon.nc.ascii?project
>
>
>
> Dataset {
> String project;
> }
> frf/geomorphology/DEMs/surveyDEM/FRF_20200110_1180_FRF_NAVD88_LARC_GPS_UTC_v20200113_grid_latlon.nc;
> ---------------------------------------------
> project, "F"
>
>
>
> That might cause new kinds of issues for a full variable read.
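
A check along these lines could flag such files before they go into the aggregation. This is only a sketch: `variable_signature` and `find_mismatches` are hypothetical helpers, and the comparison against the first file is an assumption based on the first file acting as the aggregation template. The netCDF4 calls used are `Dataset`, `.variables`, `.dtype`, and `.dimensions`.

```python
def variable_signature(path, name="project"):
    """Return (type name, dimension names) for a variable, or None if missing.

    Needs the netCDF4 package, so it is only defined here, not called.
    """
    import netCDF4 as nc  # lazy import keeps find_mismatches dependency-free

    with nc.Dataset(path) as ds:
        if name not in ds.variables:
            return None
        var = ds.variables[name]
        return (str(var.dtype), var.dimensions)


def find_mismatches(signatures):
    """Return the entries that differ from the first file's signature.

    signatures: {file name: (type name, dimension names) or None}
    """
    if not signatures:
        return {}
    reference = next(iter(signatures.values()))
    return {name: sig for name, sig in signatures.items() if sig != reference}
```

Because the dimension names are compared as an ordered tuple, the same check would also catch a variable whose dimension order differs from the template.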
>
>
>
> Cheers!
>
>
> Sean
>
>
>
>
>
> On Fri, Jan 31, 2020 at 3:33 PM Spicer Bak <spicer.bak.frf@xxxxxxxxx>
> wrote:
>
> Hey Sean,
>
> Glad we were able to help find that bug, but I don't think the "project"
> variable (or lack of it) is the root of our problem, as I chose your option 3
> (my mistake, this was supposed to be the same after the last one) and I
> get a similar response. Good news: when I add the #noprefetch option, it
> seems to fix it. Hopefully this helps provide answers, as demonstrated by
> the code below.
>
>
>
> # Failure with python (matlab as well)
> import netCDF4 as nc
>
> for url in urls:  # urls: list of OPeNDAP dataset URLs, defined elsewhere
>     print(nc.Dataset(url)['time'])
>     variables = nc.Dataset(url).variables.keys()
>     for var in variables:
>         try:
>             nc.Dataset(url)[var][0]
>             print('Success! {} from {}'.format(var, url))
>         except IndexError:
>             print("won't load variable {} from {}".format(var, url))
>             # Retry with prefetching disabled, without modifying url itself
>             retry_url = url + "#noprefetch"
>             try:
>                 nc.Dataset(retry_url)[var][0]
>                 print('Success! {} from {}'.format(var, retry_url))
>             except IndexError as e:
>                 print("FAIL: load variable {} from {}".format(var, retry_url))
>                 print('    {}'.format(e))
>
>
>
>
>
>
> On Fri, Jan 31, 2020 at 4:25 PM Sean Arms <sarms@xxxxxxxx> wrote:
>
> I thought it was working for me, but for the wrong reasons. Sorry about
> that. But, now I have it.
>
>
>
> The error message from the server is...well...garbage. That's something we
> need to look into. The underlying problem here is that the new data file
> does not include the variable project. Because variable "project" exists in
> the other files (or more specifically, the first file of the aggregation),
> and specifically because it has time as its outer dimension, currently it
> must exist in all files of the aggregation. So, we can make an ascii
> request for that variable specifically, and request everything up to the
> last value, and it works [0:1:424]:
>
>
>
>
> https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml.ascii?project%5B0:1:424%5D
>
>
>
>
> However, once we ask for the last value, it bombs [0:1:425]:
>
>
>
>
> https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml.ascii?project%5B0:1:425%5D
>
>
>
> It's only like this for the full read code path, though. If I slice it
> from [1:1:425] (skip the first value), it works
>
>
>
>
> https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml.ascii?project%5B1:1:425%5D
>
>
>
> ...and the missing value gets filled with a zero, because why not (well,
> ok, I can think of a few reasons). So, from the TDS side, and more
> specifically netCDF-Java, this is a bug (well, bugs, because returning zero
> isn't quite the thing to do here either).
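
To reproduce these slice requests without hand-editing URLs, something like the following sketch may help. `ascii_request` is a hypothetical helper; it only assembles the `.ascii` URL with a `[start:stride:stop]` constraint and percent-encodes the brackets the same way the URLs in this thread do.

```python
from urllib.parse import quote


def ascii_request(dataset_url, variable, start, stop, stride=1):
    """Build a ".ascii" OPeNDAP request for variable[start:stride:stop]."""
    constraint = "{}[{}:{}:{}]".format(variable, start, stride, stop)
    # Percent-encode the brackets but leave ":" readable.
    return "{}.ascii?{}".format(dataset_url, quote(constraint, safe=":"))
```

For example, calling it with the surveyDEM.ncml URL, "project", 0, and 424 yields the request that works, and changing the stop index to 425 yields the one that fails.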
>
>
>
> From what I can understand, netCDF-C tries to preload some data from
> remote servers (if the variable is considered "small enough"), and because
> the variable "project" is "small enough", the library tries to grab it all
> and the request bombs out. You can tell the C library to not do that by
> adding "#noprefetch" at the end of the url when opening the dataset, but
> that just delays things (unless you or the underlying library you are using
> never tries to fully read the variable "project"). So, in your sample
> Python code, pass netCDF4-python's Dataset the following URL and give it a
> spin:
>
>
>
>
> https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml#noprefetch
>
>
>
> Cool...so what to do until I can get this fixed. Three things off the top
> of my head:
>
>
>
> 1. Add "#noprefetch" to your URL (and anyone else using the dataset). Barf.
>
> 2. Use NcML in the aggregation to remove the variable "project"
> altogether, as it seems to be -999.0 when it does exist anyway.
>
> 3. Add the variable "project" to the latest file (NcML or by rewriting it).
>
>
>
> Sorry this took so long to debug. There will be a fix, it will just take
> some time (maybe early next week since I believe I know exactly what needs
> done and where).
>
>
>
> Cheers,
>
>
>
> Sean
>
>
>
>
>
> On Fri, Jan 31, 2020 at 12:27 PM Spicer Bak <spicer.bak.frf@xxxxxxxxx>
> wrote:
>
> Hey Sean,
>
> Thanks for getting back to me. I was still getting the same symptoms
> earlier today. Did it work on second inquiry for you?
>
>
>
> I did make a change to that last file so the dims match up with the
> previous ones. After I did that, I checked again to make sure that wasn't
> causing the problem. It seems I am still getting the same IndexError I was
> getting, so this (as you mentioned) is not the root of the problem.
>
>
>
> On Fri, Jan 31, 2020 at 11:03 AM Sean Arms <sarms@xxxxxxxx> wrote:
>
> Greetings Spicer,
>
>
>
> It looks like you have solved the issue? I was having problems the other
> day as well. I downloaded the files locally to see if I could reproduce the
> issue, and I am unable to do so now. However, I do see something that might
> cause an issue down the road. The dimension order on some of the variables
> in the latest file does not match what is in other files. For example, if
> we compare the latest file:
>
>
>
>
> https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/FRF_20200110_1180_FRF_NAVD88_LARC_GPS_UTC_v20200113_grid_latlon.nc.dds
>
>
>
> with next latest file:
>
>
>
>
> https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/FRF_20191206_1179_FRF_NAVD88_LARC_GPS_UTC_v20191209_grid_latlon.nc.dds
>
>
>
> The dimension order of latitude, longitude, northing, and easting switches
> from (xFRF, yFRF) (20191206) to (yFRF, xFRF) (20200110). It might not be a
> big issue overall, since the use of an NcML file on the server (as opposed
> to NcML directly in the catalog) uses the first file of the aggregation as
> the template.
>
>
>
> Cheers,
>
>
> Sean
>
>
>
>
>
> On Wed, Jan 29, 2020 at 3:03 PM Spicer Bak <spicer.bak.frf@xxxxxxxxx>
> wrote:
>
> hello TDS community,
>
> I have a problem I'm quite pickled on. We had a dataset that was working
> fine until (I think) we pushed the latest file. Our server is continually
> updated with new files and all of the other datasets seem to work fine.
>
>
>
> I'm able to get the data from the individual file URLs, but not from the
> time-concatenated NcML file. I can display the file or variables, but I
> cannot obtain any of the data through an OPeNDAP client (Python or MATLAB),
> even though I can see the data returned via the "get ASCII" button on the
> OPeNDAP page. The Python script below demonstrates the problem:
>
>
>
> # success with ncml:
>
>
> https://chldata.erdc.dren.mil/thredds/dodsC/frf/geomorphology/DEMs/surveyDEM/surveyDEM.ncml.ascii?latitude%5B0:1:0%5D%5B0:1:0%5D,time%5B0:1:425%5D,elevation%5B0:1:0%5D%5B0:1:0%5D%5B0:1:0%5D
>
>
>
> # Failure with python (matlab as well)
>
> import netCDF4 as nc
>
> for url in urls:  # urls: list of OPeNDAP dataset URLs, defined elsewhere
>     print(nc.Dataset(url)['time'])
>     variables = nc.Dataset(url).variables.keys()
>     for var in variables:
>         try:
>             nc.Dataset(url)[var][0]
>             print('Success! {} from {}'.format(var, url))
>         except IndexError as e:
>             print("won't load variable {} from {}".format(var, url))
>             print('    {}'.format(e))
>
>
> Any help would be much appreciated!
>
>
> --
>
> +++++++++++++++++++++++++++
>
> Spicer Bak, PhD
>
> USACE CHL Field Research Facility
> 252-305-9975
>
> _______________________________________________
> NOTE: All exchanges posted to Unidata maintained email lists are
> recorded in the Unidata inquiry tracking system and made publicly
> available through the web. Users who post to any of the lists we
> maintain are reminded to remove any personal information that they
> do not want to be made public.
>
>
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> https://www.unidata.ucar.edu/mailing_lists/
>
>
>
>