Re: [netcdfgroup] storing sparse matrices data in NetCDF

  • To: Ken Mankoff <mankoff@xxxxxxxxx>
  • Subject: Re: [netcdfgroup] storing sparse matrices data in NetCDF
  • From: Sourish Basu <Sourish.Basu@xxxxxxxxxxxx>
  • Date: Mon, 18 Mar 2019 17:04:31 -0600
Ken,

You could make lat, lon, etc. NetCDF dimensions by having variables
called lat, lon, etc. and defining values for them, e.g. lat(lat) =
(30., 40., 50., ...): a variable 'lat' defined along the dimension
'lat' and holding the latitude values. However, you won't be able to
slice with standard tools unless your data array actually uses that
dimension as one of its dimensions, and given the nature of your data,
I don't see how that's possible. It seems like you're trying to subset
from the command line, i.e., "which points have lat between L1 and L2?"
I don't know if there's an NCO tool that does that; maybe someone else
has a better idea.
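For what it's worth, here is a rough, untested sketch of that kind of
bounding-box subset in Python with xarray, using the layout from your
ncdump below and assuming lat/lon are stored as numbers rather than
strings (see my question below):

import xarray as xr

ds = xr.open_dataset('ds.nc')

# lat and lon are 1-D variables on the ID dimension, not grid
# dimensions, so filter on their values instead of slicing.
box = ((ds.lat >= 83.5) & (ds.lat <= 85) &
       (ds.lon >= -28) & (ds.lon <= -27))
ds.where(box, drop=True).to_netcdf('bar.nc')

That keeps only the IDs whose lat/lon fall inside the box, and
runoff(time, ID) shrinks along ID accordingly.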

Any reason why you're making lon, lat and ID strings? They're all
numbers, no?
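If they're strings only because they come out of the CSV header,
casting the pandas index levels in your script below should be enough.
A sketch, assuming the same names:

# Cast the MultiIndex header levels to numeric types before building
# the dataset, so lat/lon/ID end up as numbers in the NetCDF file.
ds['lon'] = (('ID'), df.columns.get_level_values('lon').astype(float))
ds['lat'] = (('ID'), df.columns.get_level_values('lat').astype(float))
ds['ID'] = df.columns.get_level_values('ID').astype(int)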

-Sourish

On 3/18/19 4:50 PM, Ken Mankoff wrote:
> Hi Sourish, Gus, and Elizabeth,
>
> Thank you all for your suggestions. I think I've found something that works, 
> except for one issue. Please excuse my likely incorrect use of terminology - 
> being new to NetCDF creation I may say something incorrect, but I hope the 
> data dump below speaks for itself.
>
> Because my data is 2D (time, ID), those are the dimensions, and 
> lon, lat, x, y become variables on the ID dimension. This means my standard 
> netcdf tools for slicing along spatial dimensions don't work. For example,
>
> cdo sellonlatbox,83.5,85,-27,-28 ds.nc bar.nc
>
> or
>
> ncks -d lat,83.5,85 -d lon,-27,-28 ds.nc bar.nc
> # ncks: ERROR dimension lat is not in input file
>
> Is there a way to make the data 2D but have the 2nd dimension be (lon,lat)? 
> Even if yes, I don't imagine the cdo and ncks tools would work on that 
> dimension... Is there a simple cdo, nco, or ncks (or other) tool I'm missing 
> that can work with this non-gridded data the way those tools work so easily 
> with gridded data?
>
>
> Anyway, here is the Python xarray code I got working to produce the NetCDF 
> file, reading in the 'foo.csv' from my previous email and generating ds.nc. 
> Once I understood the NetCDF structure from the file Sourish provided, I was 
> able to generate something similar using a higher level API - one that takes 
> care of time units, calendar, etc. I leave out (x,y,elev) for brevity.
>
>
>   -k.
>
>
>
> import pandas as pd
> import xarray as xr
>
> df = pd.read_csv('foo.csv', index_col=0, header=[0,1,2,3,4,5])
> df.index = pd.to_datetime(df.index)
>
> # Build the dataset
> ds = xr.Dataset()
> ds['lon'] = (('ID'), df.columns.get_level_values('lon'))
> ds['lat'] = (('ID'), df.columns.get_level_values('lat'))
> ds['runoff'] = (('time', 'ID'), df.values)
> ds['ID'] = df.columns.get_level_values('ID')
> ds['time'] = df.index
>
> # Add metadata
> ds['lon'].attrs['units'] = 'Degrees East'
> ds['lon'].attrs['long_name'] = 'Longitude'
> ds['lat'].attrs['units'] = 'Degrees North'
> ds['lat'].attrs['long_name'] = 'Latitude'
> ds['runoff'].attrs['units'] = 'm^3/day'
> ds['ID'].attrs['long_name'] = 'Basin ID'
>
> ds.to_netcdf('ds.nc')
>
>
>
>
> And here is the ncdump of the file
>
>
>
>
>
> netcdf ds {
> dimensions:
>       ID = 10 ;
>       time = 5 ;
> variables:
>       string lon(ID) ;
>               lon:units = "Degrees East" ;
>               lon:long_name = "Longitude" ;
>       string lat(ID) ;
>               lat:units = "Degrees North" ;
>               lat:long_name = "Latitude" ;
>       double runoff(time, ID) ;
>               runoff:_FillValue = NaN ;
>               runoff:units = "m^3/day" ;
>               runoff:long_name = "RACMO runoff" ;
>       string ID(ID) ;
>               ID:long_name = "Basin ID" ;
>       int64 time(time) ;
>               time:units = "days since 1980-01-01 00:00:00" ;
>               time:calendar = "proleptic_gregorian" ;
>
> // global attributes:
>               :Creator = "Ken Mankoff" ;
>               :Contact = "kdm@xxxxxxx" ;
>               :Institution = "GEUS" ;
>               :Version = 0.1 ;
> data:
>
>  lon = "-27.983", "-27.927", "-27.894", "-28.065", "-28.093", "-28.106", 
>     "-28.155", "-27.807", "-27.455", "-27.914" ;
>
>  lat = "83.505", "83.503", "83.501", "83.502", "83.501", "83.499", "83.498", 
>     "83.485", "83.471", "83.485" ;
>
>  runoff =
>   0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
>   0.023, 0.01, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
>   0.024, 0.013, 0.023, 0.005, 0, 0, 0, 0, 0, 0,
>   0.025, 0.012, 0.023, 0.005, 0, 42, 0, 0, 0, 0,
>   0.023, 0.005, 0.023, 0.005, 0, 0, 0, 0, 0, 0 ;
>
>  ID = "1", "2", "5", "8", "9", "10", "12", "13", "15", "16" ;
>
>  time = 0, 1, 2, 3, 4 ;
> }

