Shishir S. Bharathi wrote:
OK. I assumed that the PICats were also services. This clarifies things.
What I meant by mapping was: at what level is the actual search
performed, based on the required keywords?
I'm trying to summarize how to get from a set of keywords to a data item
(or set) that satisfies those conditions. Is this what happens?
1. The data arrives from its source and is stored on a storage device.
Yes. A lot of the data is archival, so it doesn't need to arrive.
2. Catalog generators mine this data and generate PICats (and also dataset
catalogs? Are these different?)
PICats are all the various THREDDS XML documents, including catalogs
(aka "dataset catalogs"). The catalogs are fairly well defined; we are
still experimenting with the other PICats.
2.1. Since the data can take different forms, you generate metadata
according to different schemas, but the PICat itself adheres to a single
schema.
Yes. There are a lot of details here that we are still prototyping.
3. PICat servers pull this information from the PICats.
So what do PICat servers store? XML documents like InvCatalog.0.6.xml,
which is the PICat itself?
Currently our prototype "PICAT Server", now called "Dataset Searcher",
replicates the entire catalog into an in-memory database. We will
probably revisit this when scalability becomes an issue, since obviously
neither full replication nor an in-memory store will scale. We are
considering relational databases, simple B-trees, and text-indexing
tools such as Lucene.
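For illustration only, here is a minimal Python sketch of the "replicate everything into memory" approach: it parses a toy catalog document into per-dataset keyword sets and builds an inverted keyword index over them. The element and attribute names (`catalog`, `dataset`, `ID`, `keyword`) and the sample IDs are hypothetical; this is not the actual InvCatalog.0.6 schema or the Dataset Searcher code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified catalog document; the real THREDDS schema differs.
SAMPLE_CATALOG = """
<catalog>
  <dataset ID="sst-monthly">
    <keyword>ocean</keyword>
    <keyword>temperature</keyword>
  </dataset>
  <dataset ID="precip-daily">
    <keyword>precipitation</keyword>
    <keyword>atmosphere</keyword>
  </dataset>
</catalog>
"""

def load_catalog(xml_text):
    """Replicate the whole catalog into memory: dataset ID -> set of keywords."""
    root = ET.fromstring(xml_text.strip())
    records = {}
    for ds in root.iter("dataset"):
        records[ds.get("ID")] = {kw.text.strip().lower() for kw in ds.iter("keyword")}
    return records

def build_inverted_index(records):
    """Map each keyword to the IDs of the datasets that carry it."""
    index = defaultdict(set)
    for ds_id, keywords in records.items():
        for kw in keywords:
            index[kw].add(ds_id)
    return index

def lookup(index, keywords):
    """Return the IDs of datasets that match *all* of the given keywords."""
    sets = [index.get(kw.lower(), set()) for kw in keywords]
    return set.intersection(*sets) if sets else set()

records = load_catalog(SAMPLE_CATALOG)
index = build_inverted_index(records)
print(sorted(lookup(index, ["ocean", "temperature"])))  # ['sst-monthly']
```

An inverted index like this is essentially what text-indexing tools such as Lucene maintain on disk. The in-memory dict version has the same scaling limit described above.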
4. Query the PICat server with the required keywords.
5. The PICat server looks at the PICats and returns the ID of a dataset catalog.
How is this done?
Currently we just look for keyword matches. That part is easy; the
space/time filtering is a bit harder. Our prototype fits everything in
memory, so scanning all of it is no big deal. We are considering how to
make this scalable for the next funding cycle.
We return a catalog of matches.
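As a rough sketch of the brute-force search described above, assuming each dataset has been reduced to a keyword set, a bounding box, and a time range, the following Python code scans every entry and keeps those that match all keywords and overlap the requested space/time window. `DatasetEntry`, its fields, and the sample data are hypothetical. The real server returns a catalog of matches, whereas this sketch returns only dataset IDs.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical in-memory record for one dataset (illustrative fields only)."""
    ds_id: str
    keywords: frozenset  # lower-case keywords
    bbox: tuple          # (west, south, east, north) in degrees
    start: date
    end: date

def boxes_overlap(a, b):
    """True if two (west, south, east, north) boxes intersect; ignores dateline wrap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def ranges_overlap(start_a, end_a, start_b, end_b):
    """True if two closed date intervals share at least one day."""
    return start_a <= end_b and start_b <= end_a

def search(entries, keywords, bbox=None, time_range=None):
    """Linear scan over every entry: keep those that carry all keywords and,
    when given, overlap the requested box and (start, end) time range."""
    wanted = {k.lower() for k in keywords}
    matches = []
    for e in entries:
        if not wanted <= e.keywords:
            continue  # missing at least one keyword
        if bbox is not None and not boxes_overlap(e.bbox, bbox):
            continue
        if time_range is not None and not ranges_overlap(e.start, e.end, *time_range):
            continue
        matches.append(e.ds_id)
    return matches

entries = [
    DatasetEntry("sst-monthly", frozenset({"ocean", "temperature"}),
                 (-180.0, -90.0, 180.0, 90.0), date(1990, 1, 1), date(1999, 12, 31)),
    DatasetEntry("gulf-sst", frozenset({"ocean", "temperature"}),
                 (-98.0, 18.0, -80.0, 31.0), date(2001, 1, 1), date(2001, 12, 31)),
]
print(search(entries, ["ocean"], bbox=(-90.0, 20.0, -85.0, 25.0),
             time_range=(date(2001, 6, 1), date(2001, 6, 30))))  # ['gulf-sst']
```

Every query touches every entry. That is acceptable while the whole catalog fits in memory, but an index would need to replace the scan once it no longer does.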
6. Query the dataset catalog if needed.
Same as step 5.
Is this about right?
yup.
Thanks,
Shishir