Hi,
I actually have a rough implementation that does this. The approach I used
is a bit different that what has been said so far but I thought I'd
introduce it just in case it could be useful.
To provide the context, my objective is to develop an efficient
infrastructure for data mining and visualization. I selected VisAD for my
data model.
I created a DataSet class which stores the data in a FlatField with
MathType:
(observation_index -> (co1, col2, ..., last_col))
The columns could be text or numerical. The DataSet class deals with text
columns through a TextComponent class which maps distinct textual values to
numerical values. Internally, the DataSet class stores the numerical values
and does the conversion back to text values as required (e.g., to display
the content of the dataset, to access a specific value, ...)
The DataSet provides many other features including:
- adding observations
- deleting observations
- changing specific value of a column in a given observation
- changing all values in a given observations
- changing all values in a given columns
- automatic growing of the internal structures when adding observations
- compacting datasets
- support for trigger facilities similar to what is found in DBMS
- return internal FlatField
- ...
To help interface with the DataSet class, I defined two java interfaces:
InDataSetAdapter and OutDataSetAdapter. I implemented these interfaces for
some types of data sources. For example, I've a class called InDBAdapter
that does interface with DBMS through JDBC. I also have InWEKAAdapter and
OutWEKAAdapter classes which allow me to interface (in both directions) with
the WEKA system (http://www.cs.waikato.ac.nz/~ml/weka/).
The InDBAdatper class has the following constructor methods:
public InDBAdapter(Connection con, String sqlQuery, int ds_size)
public InDBAdapter(String user, String passwrd, String dbName,
String sqlQuery,int ds_size)
public InDBAdapter(DataSet dataSet, String user, String passwrd,
String dbName, String sqlQuery)
The first 2 methods create a default DataSet given the SQL query or the
table name, respectively. Note that the ds_size parameter in these two
methods can be safely ignored. It is only there to provide an initial
starting size for the dataset that needs to be created. The dataset extends
automatically as needed if more data is provided. The last constructor fills
an already defined DataSet with the data returned from an SQL query. Here
are some simple examples of usage.
Example 1:
String sqlCmd = "SELECT * from table1";
InDBAdapter inAdapter = new InDBAdapter("username", "password",
"dbName", sqlCmd, 100);
// Get all observations at once
inAdapter.getAll();
// Get the newly created dataset
DataSet ds = inAdapter.getDataSet();
System.out.println(ds);
Example 2:
String sqlCmd = "SELECT * from table1";
InDBAdapter inAdapter = new InDBAdapter("username", "password",
"dbName", sqlCmd, 100);
// Get observations one at a time
for (int i=0; i<100; i++) {
Object ob = inAdapter.get();
if (ob == null) {
System.out.println("Null object, something went wrong " +
"during automatic conversion???");
break;
}
if (ob instanceof EndOfStream) {
System.out.println("No more observation.");
break;
}
}
// Get the newly created dataset
DataSet ds = inAdapter.getDataSet();
System.out.println(ds);
Example 3:
DataSet ds;
MathType [] mt;
TupleType tt;
// Create a dataset
mt = new MathType[4];
mt[0] = RealComponent.getRealComponent("EGT",CommonUnit.degree);
mt[1] = RealComponent.getRealComponent("ALT",CommonUnit.meter);
mt[3] = RealComponent.getRealComponent("IN_TEMP",CommonUnit.degree);
mt[2] = TextComponent.getTextComponent("AIRCRAFT_CD");
tt = new TupleType(mt);
ds = new DataSet("test", tt, 100);
// Define the datasource to get the data (an SQL query)
String sqlCmd = "SELECT EXHAUST_GAS_TEMP, " +
" ALTITUDE, " +
" INLET_TEMP, " +
" AIRCRAFT_CD " +
"FROM ENGINE_CRUISE_REPORT " +
"WHERE AIRCRAFT_ID = 201 " +
"ORDER BY AIMS_DATE";
InDBAdapter inAdapter = new InDBAdapter(ds, "username", "password",
"dbName", sqlCmd, 100);
// Fill the above dataset from an SQL query
inAdapter.getAll();
// Print the dataset
System.out.println(ds);
My system is far from being completely tested and documented. Also part of
it has been developed at work and I'd need approval before making it
publicly available. I don't think that would be an issue. If there is some
interests, I'd be quite pleased to try to make it happen . In the best
case, I think it could be released in March (I'm now in a hurry to complete
the writing of my PhD)
All the best,
Sylvain
Sylvain Létourneau
Tel/tél: 613 990-1178 | Facsimile/télécopieur: 613-952-0215
sylvain.letourneau@xxxxxxxxxxxxxx
http://iit-iti.nrc-cnrc.gc.ca
Research Officer Agent de Recherche
Integrated Reasoning Raisonnement Intégré
Institute for Information Technology Institut de technologie de
l`information
National Research Council Conseil national de recherches Canada
M-50, 1200 Montreal Road M-50, 1200, chemin Montréal
Ottawa, Ontario K1A 0R6 Ottawa, Ontario K1A 0R6
Government of Canada | Gouvernement du Canada
http://www.nrc-cnrc.gc.ca