Re: Old Problem - Sparse Data

[Jeremy Beal wrote that he has large quantities of both spatially and
temporally irregular/sparse data that he needs to store and retrieve
efficiently in a platform-independent manner, and wonders how best to
do this.]

One additional question comes to mind immediately:  

    Do you want fast selective random access?

Be sure of your answer:  in many cases it can make an astounding
difference to the style of work you do.  Fast selective random access
makes an enormous difference for analysis and visualization. (I've also
seen too many met and met-related models built around sequential
files that have become vast conspiracies to manipulate a complex
shared state centered on the positions of a multiplicity of
sequential file pointers.)

If you don't need/want fast selective random access, then the
XDR'ed binary file is an acceptable solution.  Otherwise, for sparse
data you need files with built-in indexing.  HDF VSets are a partial 
solution to this, provided you don't have very many time steps:  they 
have a doubly-linked list of index blocks interspersed with data blocks.
Be aware, though, that the overhead of sequential access to those index 
blocks can kill you if you do have lots of time steps.  If you have a 
year's worth of hourly met observations stored this way and you want
to look at the 0Z Dec 1 observations, be prepared to sit for five or
ten minutes while your disk drive grinds through the 8000 or so index
blocks for Jan 1-Dec 1 before it can even begin to think about data.
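To put a number on that, here is a toy Python model, not HDF's actual
on-disk format: each hypothetical IndexBlock stands in for one index
block on disk, chained the way the VSet index blocks are, and we count
how many blocks must be read to reach a given hourly time step.  The
flat offset table is shown for contrast.  The 1 KiB-per-step data
offset is an arbitrary assumption.

```python
from datetime import datetime

def hourly_step(t, start):
    """Index of time t in an hourly series that begins at start."""
    return int((t - start).total_seconds()) // 3600

class IndexBlock:
    """Hypothetical stand-in for one on-disk index block in a doubly-linked chain."""
    def __init__(self, step, data_offset):
        self.step = step
        self.data_offset = data_offset
        self.next = None
        self.prev = None

def build_chain(n_steps):
    """Chain of n_steps (>= 1) index blocks; assumes 1 KiB of data per step."""
    head = IndexBlock(0, 0)
    node = head
    for k in range(1, n_steps):
        nxt = IndexBlock(k, k * 1024)
        node.next, nxt.prev = nxt, node
        node = nxt
    return head

def find_linked(head, step):
    """Walk the chain from the head; return (data offset, index blocks read)."""
    reads = 0
    node = head
    while node is not None:
        reads += 1  # every hop is another index-block read from disk
        if node.step == step:
            return node.data_offset, reads
        node = node.next
    raise KeyError(step)

def find_direct(offsets, step):
    """One lookup in a flat offset table; return (data offset, index blocks read)."""
    return offsets[step], 1

if __name__ == "__main__":
    step = hourly_step(datetime(1997, 12, 1), datetime(1997, 1, 1))
    print(step)                                      # 8016
    print(find_linked(build_chain(8760), step)[1])   # 8017 block reads
    print(find_direct([k * 1024 for k in range(8760)], step)[1])  # 1
```

0Z Dec 1 is hour 334 * 24 = 8016 of the year, so the chained lookup
touches about 8000 index blocks before it reaches any data; the flat
table touches one.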

Something else worth checking is PDB, which is part of Livermore's
Portable Application Code Toolkit (PACT); see

    http://www.llnl.gov/def_sci/pact/hact_homepage.html

It seems to be a lower-level interface than netCDF, but does have support
for building efficient index structures.
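For what "built-in indexing" can look like in its simplest form, here
is a minimal Python sketch of a self-indexed file.  It is NOT PDB's
format or API; the layout (a count, a table of 8-byte offsets, then one
variable-length record per time step) is a hypothetical example.  It
uses big-endian struct formats, which match XDR's encoding of 4-byte
unsigned ints, 8-byte unsigned hypers, and IEEE doubles, and every
field is a multiple of 4 bytes as XDR requires.

```python
import io
import struct

def write_indexed(f, steps):
    """Write variable-length time steps (lists of floats) behind a flat offset table."""
    n = len(steps)
    f.write(struct.pack('>I', n))              # number of time steps
    offsets = []
    pos = 4 + 8 * n                            # data starts right after the table
    for values in steps:
        offsets.append(pos)
        pos += 4 + 8 * len(values)             # count word + doubles
    f.write(struct.pack('>%dQ' % n, *offsets))
    for values in steps:
        f.write(struct.pack('>I', len(values)))
        f.write(struct.pack('>%dd' % len(values), *values))

def read_step(f, k):
    """Fetch time step k with a fixed number of seeks, whatever k is."""
    f.seek(0)
    (n,) = struct.unpack('>I', f.read(4))
    if not 0 <= k < n:
        raise IndexError(k)
    f.seek(4 + 8 * k)                          # jump straight to k's table entry
    (offset,) = struct.unpack('>Q', f.read(8))
    f.seek(offset)
    (count,) = struct.unpack('>I', f.read(4))
    return list(struct.unpack('>%dd' % count, f.read(8 * count)))

if __name__ == "__main__":
    buf = io.BytesIO()
    write_indexed(buf, [[1.0, 2.5], [], [3.0]])
    print(read_step(buf, 2))                   # [3.0]
```

Empty time steps cost only a 4-byte count, which suits sparse data;
the price is that the whole offset table has to be known (or
rewritten) when the file is finished.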

fwiw

                                                 xcc@xxxxxxxxxxxxxxxx
Carlie J. Coats, Jr.                                   coats@xxxxxxxx
MCNC Environmental Programs                      phone: (919)248-9241
North Carolina Supercomputing Center               fax: (919)248-9245
3021 Cornwallis Road                                  P. O. Box 12889
Research Triangle Park, N. C.  27709-2889                         USA
"My opinions are my own, and I've got *lots* of them!"

