Ken,

As someone who works with netcdf every day, I wouldn't want you to grid your data if I were a user. Your data are basically 20,000 different time series at 20,000 fixed lat/lon/elev locations. It's not inherently on a grid, and there's no advantage to be had in gridding it into a mostly empty hypercube.

I think netcdf is very well suited for these sorts of data as well, and in my work we exchange atmospheric observation data -- also time series at discrete locations and times -- as netcdf files all the time. Given the simple data structure you've gone for, it's easy enough for a user to write their own subsetting script should they want to.

It's true that you see a lot of discussion in this forum about handling gridded netcdf data with NCO and viewing them with panoply/ncview, but I think that's just because netcdf tends to get used heavily in the world of modelers -- be it land surface, atmosphere or ocean -- and models typically output gridded fields. I personally wish more people with data like yours would consider netcdf, so I'm really happy that you are :-)

Cheers,
Sourish

On 3/18/19 5:25 PM, Ken Mankoff wrote:
> I could always grid the data - I have 20,000 outlets around the coast of
> Greenland at 90 m resolution. The total number of cells is 500 million. This
> seems inefficient, but the 499,980,000 empty cells do compress quite well. It
> is ~15 MB/day (5 GB/year). Maybe compressing multiple days improves on this?
> 5 GB/year is tractable for 40 years of data.
>
> But as someone who works with NetCDF, wouldn't this data presentation make
> you cringe? On the other hand, NetCDF works best when gridded, right?
>
> -k.
>
>
> On 2019-03-18 at 16:19 -0700, Gus Correa <gus@xxxxxxxxxxxxxxxxx> wrote...
>> Hi Ken
>>
>> That is true.
>> I suppose both CDO and NCO (ncks) assume the lat and lon are
>> monotonic (increasing or decreasing) coordinate variables, and that
>> runoff has (time,lat,lon) dimensions, not (time,ID).
>> ID is not a coordinate, it is just a label for your observation stations, I
>> guess.
>>
>> You could devise a more elaborate scheme to define lat and lon dimensions,
>> then lat(lat) and lon(lon) coordinate variables, and from there create a 3D
>> runoff(time,lat,lon) variable.
>> There are several hurdles though:
>> 1) The values of lat and lon in your CSV file may have repetitions (this
>> affects the actual length of each dimension, which may be <20000).
>> 2) The values of lat and lon in your CSV file may not be monotonically
>> ordered (either in increasing or decreasing order).
>> I didn't spot any repetitions in the sample file you sent (but the full
>> file may have repetitions), but the lat and lon are definitely not
>> monotonically ordered; they can go up, then down, then up again ...
>> Bona fide coordinate variables must be monotonic.
>> 3) Even if you weed out repetitions in lat or lon, and sort them in
>> increasing or decreasing order, you would also have to exchange the
>> corresponding runoff values, so that they continue to belong to the
>> correct station/location/ID, i.e. sort the whole file with (lat,lon)
>> as primary and secondary keys.
>>
>> Maybe Python has a sort routine that does all that for you gracefully, some
>> variant of qsort perhaps.
>>
>> Gus
>>
>>
>> On Mon, Mar 18, 2019 at 6:50 PM Ken Mankoff <mankoff@xxxxxxxxx> wrote:
>>
>>> Hi Sourish, Gus, and Elizabeth,
>>>
>>> Thank you all for your suggestions. I think I've found something that
>>> works, except for one issue. Please excuse my likely incorrect use of
>>> terminology - being new to NetCDF creation I may say something incorrect,
>>> but I hope the data dump below speaks for itself.
>>>
>>> Because my data are 2D (time, ID), those are the dimensions, and
>>> lon,lat,x,y become variables on the ID dimension. This means my standard
>>> netcdf tools for slicing based on spatial dimension don't work.
>>> For example,
>>>
>>> cdo sellonlatbox,83.5,85,-27,-28 ds.nc bar.nc
>>>
>>> or
>>>
>>> ncks -d lat,83.5,85 -d lon,-27,-28 ds.nc bar.nc
>>> # ncks: ERROR dimension lat is not in input file
>>>
>>> Is there a way to make the data 2D but have the 2nd dimension be
>>> (lon,lat)? Even if yes, I don't imagine the cdo and ncks tools would work
>>> on that dimension... Is there a cdo, nco, or ncks (or other) simple tool
>>> I'm missing that can work with this non-gridded data the way those tools
>>> so easily work with gridded data?
>>>
>>> Anyway, here is the Python xarray code I got working to produce the NetCDF
>>> file, reading in the 'foo.csv' from my previous email and generating ds.nc.
>>> Once I understood the NetCDF structure from the file Sourish provided, I
>>> was able to generate something similar using a higher-level API - one that
>>> takes care of time units, calendar, etc. I leave out (x,y,elev) for brevity.
>>>
>>> -k.
>>>
>>>
>>> import pandas as pd
>>> import xarray as xr
>>>
>>> df = pd.read_csv('foo.csv', index_col=0, header=[0,1,2,3,4,5])
>>> df.index = pd.to_datetime(df.index)
>>>
>>> # Build the dataset
>>> ds = xr.Dataset()
>>> ds['lon'] = (('ID'), df.columns.get_level_values('lon'))
>>> ds['lat'] = (('ID'), df.columns.get_level_values('lat'))
>>> ds['runoff'] = (('time', 'ID'), df.values)
>>> ds['ID'] = df.columns.get_level_values('ID')
>>> ds['time'] = df.index
>>>
>>> # Add metadata
>>> ds['lon'].attrs['units'] = 'Degrees East'
>>> ds['lon'].attrs['long_name'] = 'Longitude'
>>> ds['lat'].attrs['units'] = 'Degrees North'
>>> ds['lat'].attrs['long_name'] = 'Latitude'
>>> ds['runoff'].attrs['units'] = 'm^3/day'
>>> ds['ID'].attrs['long_name'] = 'Basin ID'
>>>
>>> ds.to_netcdf('ds.nc')
>>>
>>>
>>> And here is the ncdump of the file
>>>
>>>
>>> netcdf ds {
>>> dimensions:
>>> ID = 10 ;
>>> time = 5 ;
>>> variables:
>>> string lon(ID) ;
>>> lon:units = "Degrees East" ;
>>> lon:long_name = "Longitude" ;
>>> string lat(ID) ;
>>> lat:units = "Degrees North" ;
>>> lat:long_name = "Latitude" ;
>>> double runoff(time, ID) ;
>>> runoff:_FillValue = NaN ;
>>> runoff:units = "m^3/day" ;
>>> runoff:long_name = "RACMO runoff" ;
>>> string ID(ID) ;
>>> ID:long_name = "Basin ID" ;
>>> int64 time(time) ;
>>> time:units = "days since 1980-01-01 00:00:00" ;
>>> time:calendar = "proleptic_gregorian" ;
>>>
>>> // global attributes:
>>> :Creator = "Ken Mankoff" ;
>>> :Contact = "kdm@xxxxxxx" ;
>>> :Institution = "GEUS" ;
>>> :Version = 0.1 ;
>>>
>>> data:
>>>
>>> lon = "-27.983", "-27.927", "-27.894", "-28.065", "-28.093", "-28.106",
>>> "-28.155", "-27.807", "-27.455", "-27.914" ;
>>>
>>> lat = "83.505", "83.503", "83.501", "83.502", "83.501", "83.499", "83.498",
>>> "83.485", "83.471", "83.485" ;
>>>
>>> runoff =
>>> 0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
>>> 0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
>>> 0.024, 0.013, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
>>> 0.025, 0.012, 0.023, 0.005, 0, 42, 0, 0, 0, 0,
>>> 0.023, 0.005, 0.023, 0.005, 0, 0, 0, 0, 0, 0 ;
>>>
>>> ID = "1", "2", "5", "8", "9", "10", "12", "13", "15", "16" ;
>>>
>>> time = 0, 1, 2, 3, 4 ;
>>> }
>>>
>>> _______________________________________________
>>> NOTE: All exchanges posted to Unidata maintained email lists are
>>> recorded in the Unidata inquiry tracking system and made publicly
>>> available through the web. Users who post to any of the lists we
>>> maintain are reminded to remove any personal information that they
>>> do not want to be made public.
>>>
>>>
>>> netcdfgroup mailing list
>>> netcdfgroup@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe, visit:
>>> http://www.unidata.ucar.edu/mailing_lists/
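P.S. To make the "write your own subsetting script" suggestion concrete, here is a rough sketch of how a user might select the outlets inside a lon/lat box with xarray, given the (time, ID) layout in your ncdump. The function name and the tiny in-memory stand-in dataset are mine, not from your file; it assumes lon/lat are stored as strings, as they are in your ds.nc, so they get converted to float before comparing.

```python
import numpy as np
import xarray as xr

def sel_lonlat_box(ds, lon_min, lon_max, lat_min, lat_max):
    """Keep only the outlets (entries along ID) whose lon/lat fall in the box."""
    lon = ds['lon'].astype(float)  # lon/lat are stored as strings in the file
    lat = ds['lat'].astype(float)
    keep = ((lon >= lon_min) & (lon <= lon_max) &
            (lat >= lat_min) & (lat <= lat_max))
    return ds.isel(ID=np.where(keep.values)[0])

# Tiny stand-in for ds.nc with the same (time, ID) structure; with the
# real file you would use ds = xr.open_dataset('ds.nc') instead.
ds = xr.Dataset(
    {'runoff': (('time', 'ID'), np.arange(6.).reshape(2, 3))},
    coords={'time': [0, 1], 'ID': ['1', '2', '5']},
)
ds['lon'] = ('ID', ['-27.983', '-27.927', '-28.155'])
ds['lat'] = ('ID', ['83.505', '83.503', '83.498'])

sub = sel_lonlat_box(ds, -28.0, -27.9, 83.50, 83.51)
# sub retains only IDs '1' and '2'; ID '5' falls outside the lon range
```

The result stays in your (time, ID) layout, so it can be written straight back out with sub.to_netcdf(), no gridding needed.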