A while back I polled the users of Splus, HDF and NetCDF for programs
that interface Splus to a common data format. I'd like to thank
everyone for their timely and helpful responses. Below, I have
compiled the comments from the various newsgroups.
Thanks again,
Andrea Wood
Pacific Northwest Laboratory
-------------------------------------------------------------------
***From phil@xxxxxxxxxxx Wed Mar 16 14:05:18 1994***
Why don't you take a look (anonymous ftp) at the code in:
leghorn.geog.ubc.ca:/pub/splus? We'd be interested in hearing
from anyone else that may contact you on this, as well.
Our interface is currently under development, and it crashes
occasionally due to a bug (we believe) in Splus, but we use it daily
to get random access to large netCDF arrays. (Note that CDFchange
isn't working yet). I've appended the file CDF_SPLUS.doc, and the
letter we have written to someone at Statsci about the bug.
<phil@coot: pub/splus> cat CDF_SPLUS.doc
# $Id: CDF_SPLUS.doc,v 1.2 1993/09/09 06:27:49 mintha Exp mintha $
#
# Documentation for the CDF->Splus->CDF interface
#
# $Log: CDF_SPLUS.doc,v $
# Revision 1.2 1993/09/09 06:27:49 mintha
# Prelim SPLUS->CDF docs & clean up
#
# Revision 1.1 1993/08/13 22:50:06 mintha
# Initial revision
#
CDF->Splus To read files into Splus.
User functions:
CDFgetinfo(filename, close=T)
- Reads in all the information about the given cdf file and
returns a structure with all the global attributes and a
component for each variable, and all its dimensions and
attributes. If 'close' is FALSE then the file is not closed
(used internally).
CDFget(filename, varname, start=NULL, count=NULL)
- Reads in the data for the given variable (a variable id can be
substituted for the name). The dimnames are attached and any
attributes are added as Splus attributes. If 'start' is NULL
(default) the origin (1,1,...) is used. If 'count' is NULL
then the rest of the variable is used. If any dimension in
'count' is -1 then it is replaced by the remaining extent of
that dimension.
Internal functions:
CDFgetdata(info, var, start=NULL, count=NULL)
- Does the actual work of reading in the data and attaching
the dimnames and attributes. 'info' is the structure returned
by CDFgetinfo, 'var' is the variable name or ID. 'start' and
'count' are as above.
CDFginfo(cdfp)
- Reads in all the global information about a cdf file,
including dimension names, values, and labels, as well as
global attributes. 'cdfp' is the structure returned from the
cdf_open call.
CDFvinfo(ginfo, var)
- Reads in all the information for the given variable ID
('var'), including dimensions, labels, and attributes.
'ginfo' is the structure returned by CDFginfo.
CDFgetinfo returns the following structure:
(note this is a fictitious example)
$global:
$ncid
23
$numvars
4
$dims
(5, 10, 20)
$dnames
("Time", "Droplets", "Ozone")
$dimlabs
(NULL, "rx ry rz", "r1 r2 r3 r4 dydt dzdt")
$unlimdim
4
$history
"Processed by ..."
$doc
"Ozone samples"
$comment
"Created by Jim"
$variable1
$name
"variable1"
$id
5
$type
5
$dimids
(1, 2)
$dims
(20, 10)
$labels
("r1 r2 r3 r4 dydt", "rx ry rz")
$title
"Just a variable"
$variable2
$name
"variable2"
$id
1
$type
6
$dimids
(2, 3)
$dims
(5, 10)
$labels
( NULL, "rx ry rz")
$title
"Bond, James Bond"
$FillValue
-999
CDFget returns something like this:
rx ry rz
dy 2 3 4
dz 5 11 9
dydz 13 12 11
attr(,"title"):
[1] "Hello"
attr(,"file"):
[1] "test2.nc"
Splus->CDF To write files out of Splus.
CDFappend(filename, varname, data, coord=NULL)
- Add data to a variable in a netCDF file. The variable must
already exist and one of its dimensions must be unlimited. If
'coord' is not NULL the data is appended to the coordinate
variable which has the same name as the unlimited dimension
(see documentation on ncdump).
CDFchange(filename, varname, data, start=NULL, count=NULL)
- Changes an existing netCDF file. If the variable named does
not exist then it is added to the file. If it does exist then
the data given replaces that in the file. When adding a new
variable the 'dnames' attribute must be present. In both cases
the dimensions must match. A hyperslab can be specified with
'start' and 'count' as above.
CDFput(filename, varname, data, clobber=F, unlim=NULL)
- Create a new netCDF file named 'filename'. One variable
'varname' is created, and filled with 'data'. 'data' must
have the 'dnames' attribute. The named dimensions are created
and given the sizes of the array unless an attribute 'dims' is
defined. If the variable has dimnames these are added as
labels for the dimensions. This can also be overridden with the
'labels' attribute. Unless 'clobber' is (T)rue this function
will not overwrite an existing file. If 'unlim' is not NULL
the dimension given is made unlimited.
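A minimal write sketch, assuming a made-up array and output file name
(the 'dnames' attribute is the one requirement called out above):

# Sketch only -- "out.nc" and the data are invented.
x <- array(rnorm(7 * 5), dim = c(7, 5))
attr(x, "dnames") <- c("station", "level")        # dimensions to create
dimnames(x) <- list(letters[1:7], LETTERS[1:5])   # become dimension labels
CDFput("out.nc", "temperature", x)                # clobber=F: won't overwrite
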
Cheers, Phil
---------------------------------------------------------------------
***From tam@xxxxxxxxxxx Thu Mar 17 12:49:20 1994***
A couple of years ago I was asked to interface Splus and HDF so that users
could retrieve and put data into HDF files much like they did in Fortran.
The code is still here, although I don't know whether it is still being used.
It should still work. The package consists of
C code
Glue code that is called by Splus and, in turn, calls
HDF routines after fiddling with the parameters so they'll
work right.
This code was compiled into a module (S2hdf.o) that was
dynamically loaded into Splus via the dyn.load() command.
Splus code
These were a set of Splus routines that made working with
HDF more palatable (instead of trying to do the .C() call
yourself and having to mangle the output into the usual
Splus convention).
Here's the README file (which, upon rereading it myself, is not really friendly and
looks half-baked). I hope you can get the gist of the operations that this
package offers.
--------------------------------------------------------------------------------
Notes on interface to use HDF library for input-output of floating-point arrays
into the Splus data analysis package.
noctiluca:/usr/local/src/hdf/splus/README
laplante@xxxxxxxxxxx 90may09
HDFload() makes the HDF library routine available by dynamically loading
/usr/local/lib/S2hdf.o . NB: UBC fixes to HDF library are necessary
for correct operation in cases where rank>2.
HDFget() reads in an entire HDF Scientific Data Set into an S "object"
(array). By specifying some of the parameters 'start' and 'end' (or 'axis') the
user can get only a part of the SDS. For example
'start=c(1,1,1),end=c(7,5,1)' would get a 2-dimensional array. The form
'axis=2' would get a vector along the second dimension from the specified
start. The resulting array is collapsed to a lower rank if some dimensions
are 1. All parameters have reasonable defaults:
file=HDFcurrentfile (the last file loaded by this user)
ref=1 (select one of several SDS in file)
start=c(1,1,...) (start at beginning)
end=c(...) (end at end)
HDFadddata() appends the specified S array as an SDS to the named HDF file.
Error messages should be self-explanatory ;-) . The functions normally reside
in /usr/local/splus/s/.Functions/HDF* .
=====================================================================
1) EXAMPLE OF USE:
File "753.hdf" is a 7*5*3 test matrix containing 111.0 112.0 ... 752.0 753.0
% Splus
S-PLUS : Copyright (c) 1988, 1989, 1990 Statistical Sciences, Inc.
S : Copyright AT&T.
Version 2.3 Release 1 for Sun4 : 1990
> HDFload()
S - HDF interface library loaded
> ls()
[1] "HDFadddata" "HDFcurrentfile" "HDFget" "HDFgetarray"
[5] "HDFgetdata" "HDFgetdims" "HDFgetslice" "HDFload"
[9] "last.dump"
> HDFget(file="753.hdf",ref=1)
HDFget: start= 1 1 1
slicedims= 7 5 3
, , 1
[,1] [,2] [,3] [,4] [,5]
[1,] 111 121 131 141 151
[2,] 211 221 231 241 251
[3,] 311 321 331 341 351
[4,] 411 421 431 441 451
[5,] 511 521 531 541 551
[6,] 611 621 631 641 651
[7,] 711 721 731 741 751
, , 2
[,1] [,2] [,3] [,4] [,5]
[1,] 112 122 132 142 152
[2,] 212 222 232 242 252
[3,] 312 322 332 342 352
[4,] 412 422 432 442 452
[5,] 512 522 532 542 552
[6,] 612 622 632 642 652
[7,] 712 722 732 742 752
, , 3
[,1] [,2] [,3] [,4] [,5]
[1,] 113 123 133 143 153
[2,] 213 223 233 243 253
[3,] 313 323 333 343 353
[4,] 413 423 433 443 453
[5,] 513 523 533 543 553
[6,] 613 623 633 643 653
[7,] 713 723 733 743 753
> HDFget(end=c(7,5,1))
HDFget: using file 753.hdf
HDFget: start= 1 1 1
slicedims= 7 5 1
[,1] [,2] [,3] [,4] [,5]
[1,] 111 121 131 141 151
[2,] 211 221 231 241 251
[3,] 311 321 331 341 351
[4,] 411 421 431 441 451
[5,] 511 521 531 541 551
[6,] 611 621 631 641 651
[7,] 711 721 731 741 751
> HDFget(start=c(3,4,1),axis=3)
HDFget: using file 753.hdf
HDFget: start= 3 4 1
slicedims= 1 1 3
[,1] [,2] [,3]
341 342 343
=====================================================================
2) FUNCTION LISTINGS
> HDFload
function()
{
dyn.load("S2hdf.o")
cat("S - HDF interface library loaded\n")
invisible()
}
> HDFgetdims
function(file = HDFcurrentfile, maxrank = 10)
{
if(missing(file))
cat("Using file: ", file, "\n")
else HDFcurrentfile <<- file
z <- .C("HDFgetdims",
file,
rank = integer(1),
shape = integer(maxrank),
as.integer(maxrank))
z$shape[1:z$rank]
}
> HDFget
function(file = HDFcurrentfile, ref = 1, start = NULL, end = NULL, axis = NULL
)
{
# HDFget(file,ref,start,end,axis)
# file (default last name, saved in global string HDFcurrentfile)
# ref (serial number of data sets in the file)
# start (vector: coordinates of starting point)
# end (vector: slicedims = end - start + 1)
# -or-
# axis (dimension number for extracting a vector from array)
if(missing(file)) cat("HDFget: using file ", file, "\n") else
HDFcurrentfile <<- file
.C("DFSDrestart") # rewind
for(i in 1:ref)
filedims <- HDFgetdims(file) # step through data sets
filerank <- length(filedims)
if(missing(start))
start <- rep(1, filerank)
else if(filerank!=length(start)) {
cat("HDFget dimension mismatch: start=", start, "but", file,
"has", filedims, "\n")
return(NULL)
}
if(missing(end)) {
if(missing(axis)) {
end <- filedims
}
else {
end <- start # End = start except along axis.
end[axis] <- filedims[axis]
}
}
else if(filerank!=length(end)) {
cat("HDFget: end=", end, "but", file, "has", filedims, "\n")
return(NULL)
}
slicedims <- end - start + 1
cat("HDFget: start=", start, "\n")
cat("\tslicedims=", slicedims, "\n")
z <- .C("HDFgetslice",
file,
as.integer(start),
as.integer(slicedims),
dat = single(prod(slicedims)),
as.integer(slicedims))
array(z$dat, slicedims[slicedims!=1])
}
> HDFadddata
function(file, data, append = F)
{
if(missing(file))
cat("File must be supplied\n")
else if(missing(data))
cat("Missing data parameter\n")
else {
if(append)
cat("Appending data to ", file, " ...\n")
else cat("Replacing data in ", file, " ...\n")
d <- dim(data)
.C("HDFadddata",
file,
as.integer(length(d)),
as.integer(d),
as.single(data),
as.integer(append))
}
invisible()
}
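The example session in 1) only exercises the read side; a write with
HDFadddata might look roughly like this (hypothetical file name and
made-up data):

> HDFload()
S - HDF interface library loaded
> x <- array(runif(7 * 5 * 3), dim = c(7, 5, 3))   # made-up data
> HDFadddata(file = "new.hdf", data = x)           # append=F is the default
Replacing data in  new.hdf  ...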
--------------------------------------------------------------------------------
Still interested?
Joseph Tam <tam@xxxxxxxxxxx> \___ \___________
Department of Oceanography \___ \___________
University of British Columbia \___ \___________
Vancouver, BC, Canada \___ \___
V6T 1Z4 \___ \___ \___
\___ \___ \___
Phone: (604) 822-3911 \_________ \__ \___ \__
Fax: (604) 822-6091 \______ \__ \___ \__
---------------------------------------------------------------
***From russ@xxxxxxxxxxxxxxxx Wed Mar 16 12:36:02 1994***
I got an evaluation copy of Splus and looked into the feasibility of
building a netCDF interface. I also evaluated the possibility of using
netCDF for the external representation of Splus datasets.
I liked the powerful expression language, the fully worked out semantics for
"NA" missing values, the C and Fortran interfaces, and much else about
Splus.
I decided netcdf was not powerful enough to represent some kinds of Splus
recursive data structures, e.g. lists of lists or nested lists with named
elements. This meant it would be impractical to try to export arbitrary
Splus data structures into netCDF files.
However, an interface that allowed the import of netCDF data into Splus
seemed quite practical. That's about as far as I got during the evaluation.
__________________________________________________________________________
Russ Rew UCAR Unidata Program
russ@xxxxxxxxxxxxxxxx P.O. Box 3000
(303)497-8645 Boulder, Colorado 80307-3000
---------------------------------------------------------------
***From maa017@xxxxxxxxxxxxxxxxx Wed Mar 16 13:21:50 1994***
I wrote some code to send HDF SDS data to the DTM outport so that I
could use XDataslice to visualise 3-dimensional arrays. It works quite
nicely:
Splus> arr_array(runif(50*50*50),dim=c(50,50,50))
Splus> sendarraydtm(arr)
and bingo, up it comes in Data Slice for animation, slicing, whatever.
My needs didn't go past SDS data, so I haven't coded raster images,
palettes or anything else!!
My one worry is that the HDF/DTM library uses malloc and so gobbles up
all the memory eventually.
Anyway, if you want to see my S/C code for it, I'll mail it to you!
Barry Rowlingson
Research Associate, Maths/Stats Dept
Lancaster University
----------------------------------------------------------------
***From maclean@xxxxxxxxxxxxxxxxxx Wed Mar 16 12:51:35 1994***
We have a function, fun.readnetcdf(), which will read a scalar or
multidimensional NetCDF variable into S. This function calls C
functions which in turn make the NetCDF calls.
Right now it must read an entire variable - you cannot give the indices
of the NetCDF "hyperslab" that you want to read. However, it should
give you the basic functionality and would be easy to upgrade for your
needs.
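The argument list of fun.readnetcdf() isn't shown here, so the call
below is only a guess at what the usage looks like:
# Hypothetical call -- the actual arguments may differ.
p <- fun.readnetcdf("sounding.nc", "pressure")   # reads the whole variable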
Let me know if you want it and I can send the C and S code to you.
Gordon Maclean
Surface and Sounding Systems Facility
Atmospheric Tech. Div.
National Center for Atmospheric Research
P.O. Box 3000, Boulder CO 80307-3000
email: maclean@xxxxxxxxxxxxx
phone: (303) 497-8794
FAX: (303) 497-8770
-----------------------------------------------------------------
***From Jaason.Haapakoski@xxxxxx Wed Mar 16 22:47:44 1994***
I have built a read-only interface from Splus to netCDF files.
If that's enough then you can get the C routines, Splus functions and
the help pages on request. The interface utilizes generic Splus database
functions so that you can attach to a netcdf file and then read the
variables in the file just by referring to them in the same way you
refer to ordinary Splus objects in a directory. Any netcdf attributes
are read and assigned to the resulting object. Netcdf dimensions are
translated to the dim-attribute of Splus. Missing values are translated
to NA's.
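Something along these lines, although the exact entry points in the
package are not given here, so attach() below is only illustrative of
the behaviour described:
# Illustrative only -- the real function names may differ.
attach("ozone.nc")       # attach the netCDF file as an Splus database
Temperature              # read a netCDF variable like an ordinary object
attributes(Temperature)  # dim attribute and netCDF attributes come along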
Jaason Haapakoski
I forgot to mention in my previous message that our interface supports only
variables of fixed dimensions (it uses the function ncvarget), not records (it
does not use the function ncrecget). We have had no use for record-type netcdf
variables yet. I think it would be fairly simple to add the missing routines,
but I have no time for that now. By contrast, adding a write interface from
Splus to netcdf is not as simple, because there are more types of data
structures in Splus than in netcdf. I think it would be wise to restrict
oneself to Splus arrays, for example (with a data frame as a special case).
Jaason Haapakoski
ATBC Cancer Prevention Study
The National Institute of Public Health in Finland