Hi Bas,
Your plan sounds good. Here are a few comments and answers:
1) We have some JUnit tests in the test/src directory that might help
you get going. Take a look at
test/src/thredds/cataloggen/TestCollectionLevelScanner.java. If you can
get that running and play with it some it should help you understand
CollectionLevelScanner and CrawlableDataset. Some of the files in
test/src/thredds/crawlabledataset might be helpful to look at as well.
Most of the tests use data from the test/data directory (including some
data to crawl) so you should have everything you need to get the tests
working for you. There are a few spots in the crawlabledataset tests
that use data local to my machine (sorry, we're still trying to clean up
our testing).
2) The ant build script, build.xml, contains a "makeWar" target for
building the thredds.war file. Also, "compile" and "test-setup" might
come in handy for getting the above tests running ("cleanCompileTest"
does both after a clean). If you have trouble with that, I can certainly
build a .war with your files included. Just let me know.
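For reference, the targets mentioned above would be invoked roughly like this (assuming ant is on your path and you run it from the directory containing build.xml; the target names are from the current build script):

```shell
ant cleanCompileTest   # clean, then run "compile" and "test-setup"
ant compile            # compile the source
ant test-setup         # stage the test data/classes
ant makeWar            # build the thredds.war file
```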
3) Hmm. Nothing jumps out at me as to why your "Test" class wouldn't
work. Does it actually fail to generate a catalog? Or just generate a
skeleton catalog?
4) Yeah, the collection, catalog, and current levels are kind of
confusing. Basically, if you have a hierarchical data collection (I'll
talk in terms of CrawlableDatasetFile where the collection is
represented by a local file directory structure containing the data
files), the directory at the top of your data collection would be the
collection level. CollectionLevelScanner only scans one level
(directory) at a time. The current level is the directory that you want
to scan. The catalog level is where it gets confusing. The issue is if
you want to build a catalog that shows multiple levels of the
collection. To do this, you need to construct a CollectionLevelScanner
and call generateCatalog() on it for each level to be cataloged and then
piece the resulting catalogs together (see StandardCatalogBuilder for an
example of this). Because the service base URLs might be relative, you
need to know the relative location of the datasets currently being
crawled to construct the dataset URL. The catalog level specifies the
relative location. You leave the catalog level null if you are not
building a multi-level catalog.
If I ever get around to refactoring this stuff, this is one of the
things I might rethink. Sorry it is confusing. Hope this explanation helps.
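A rough sketch of the multi-level pattern described above (the scanner constructor and generateCatalog() are as used in your Test class; listDatasets() and isCollection() are my reading of the CrawlableDataset interface, and the final merge step is only hinted at, so treat this as a sketch rather than working code):

```java
package thredds.cataloggen;

import thredds.catalog.InvCatalogImpl;
import thredds.catalog.InvService;
import thredds.crawlabledataset.CrawlableDataset;
import thredds.crawlabledataset.CrawlableDatasetFactory;

public class MultiLevelSketch {
    public static void main( String[] args ) throws Exception {
        // Top of the data collection (adjust the path for your machine).
        String collectionPath = "myData";
        CrawlableDataset collectionLevel = CrawlableDatasetFactory
                .createCrawlableDataset( "file:///d:/thredds/", null, null );
        InvService service = new InvService( "my_service", "File",
                "file:///d:/", null, null );

        // Single-level catalog: catalogLevel is null, currentLevel is the
        // one directory to scan (here the top of the collection).
        CollectionLevelScanner topScanner = new CollectionLevelScanner(
                collectionPath, collectionLevel, null, collectionLevel,
                null, service );
        topScanner.scan();
        InvCatalogImpl topCatalog = topScanner.generateCatalog();

        // Multi-level catalog: one scanner per sub-directory. Here
        // catalogLevel (the top of the collection) tells each scanner where
        // the assembled catalog will live, so relative dataset URLs come
        // out right.
        for ( Object o : collectionLevel.listDatasets() ) {
            CrawlableDataset child = (CrawlableDataset) o;
            if ( !child.isCollection() )
                continue;
            CollectionLevelScanner subScanner = new CollectionLevelScanner(
                    collectionPath, collectionLevel, collectionLevel, child,
                    null, service );
            subScanner.scan();
            InvCatalogImpl subCatalog = subScanner.generateCatalog();
            // ... then piece subCatalog into topCatalog the way
            // StandardCatalogBuilder does.
        }
    }
}
```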
5) You can see the XML representation of an InvCatalog by creating an
InvCatalogFactory
InvCatalogFactory fac = InvCatalogFactory.getDefaultFactory( false );
and calling one of the writeXML() methods on it.
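For example (a minimal sketch: the InvCatalogImpl constructor call is just a stand-in so the snippet is self-contained, and I am assuming the no-argument-beyond-catalog writeXML() overload returns the document as an XML string; normally the catalog would come from generateCatalog()):

```java
package thredds.cataloggen;

import thredds.catalog.InvCatalogFactory;
import thredds.catalog.InvCatalogImpl;

public class WriteXmlSketch {
    public static void main( String[] args ) throws Exception {
        // Stand-in catalog; in practice use the result of
        // CollectionLevelScanner.generateCatalog().
        InvCatalogImpl catalog = new InvCatalogImpl( "testCatalog", "1.0", null );

        InvCatalogFactory fac = InvCatalogFactory.getDefaultFactory( false );
        // writeXML() has several overloads (String, file, stream); this one
        // is assumed to return the catalog as an XML string.
        System.out.println( fac.writeXML( catalog ) );
    }
}
```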
Hope this all helps. Let me know if you have more questions.
Ethan
Bas Retsios wrote:
Hello Ethan,
Thank you for your e-mail.
I hope you have some time and patience to help me, as I am now
convinced that the best approach is to go with a CrawlableDataset.
I have come up with the following approach:
- Understand how the class CrawlableDatasetFile works
- For this, I will need to easily "run" a part of Thredds from within
Eclipse. As I cannot find any existing code that does this, I will
make my own "Test" class for performing some calls
involving CrawlableDatasetFile.
- After understanding the existing functionality, I will make a
similar class, named e.g. CrawlableDatasetDods. If more classes are
needed, I will also make them. As much as possible, I will copy code
from DodsDirDatasetSource and similar classes.
- I will change my "Test" class, to make calls that involve the new
CrawlableDatasetDods classes.
- All new CrawlableDatasetDods-related classes will be tested
extensively with my "Test" class, using at least two different DODS
servers as a source.
- As I do not know how to make the thredds.war file, I will submit all
classes to you, then I would like you to send me a new .war file
containing the entire thredds.
- If you think the classes I have created are useful, you can include
them in the official Thredds release. I have to discuss the copyright
with my boss, but we do have an "open-source" policy for everything we
develop.
- Unless I have made a terrible mistake in estimating the time needed
for the above, my intention is to have everything ready within one
week.
Please inform me if you have a better approach.
I have started making a simple "Test" class. See attachment:
Test.java. In this file, you will see that I could not figure out how
to properly use CollectionLevelScanner, CrawlableDataset, InvService
and InvCatalogImpl.
Could you please help me improve the "Test" class, and in particular
in the following aspects:
1. I have no data that CrawlableDatasetFile could crawl. Do you have a
link to some data that I can extract on my harddisk?
2. Help me understand collectionPath, collectionLevel and catalogLevel
by changing my code in Test.java (note that I already read the JavaDoc
of CollectionLevelScanner before writing Test.java)
3. Point me to some code with which I could see that a proper
InvCatalogImpl was generated (other than catalog.getName()).
Thanks in advance,
Bas Retsios.
Software Developer
IT Department, Sector Remote Sensing & GIS
International Institute for Geo-information Science and Earth
Observation (ITC)
P.O. Box 6, 7500 AA Enschede, The Netherlands
Phone +31 (0)53 4874 573, telefax +31 (0)53 4874 335
E-mail retsios@xxxxxx, Internet http://www.itc.nl
------------------------------------------------------------------------
package thredds.cataloggen;

import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

import thredds.catalog.InvCatalogImpl;
import thredds.catalog.InvService;
import thredds.crawlabledataset.CrawlableDataset;
import thredds.crawlabledataset.CrawlableDatasetFactory;
import thredds.crawlabledataset.CrawlableDatasetFilter;

public class Test {
    public static void main( String[] args ) throws IOException,
            IllegalArgumentException, ClassNotFoundException,
            NoSuchMethodException, IllegalAccessException,
            InvocationTargetException, InstantiationException {
        String collectionPath = "file:///d:/thredds/";

        // Collection and catalog levels both point at the top of the
        // collection; currentLevel and filter are left null for now.
        CrawlableDataset collectionLevel = CrawlableDatasetFactory
                .createCrawlableDataset( "file:///d:/thredds/", null, null );
        CrawlableDataset catalogLevel = CrawlableDatasetFactory
                .createCrawlableDataset( "file:///d:/thredds/", null, null );
        CrawlableDataset currentLevel = null;
        CrawlableDatasetFilter filter = null;

        InvService service = new InvService( "my_service", "File",
                "file:///d:/", null, null );

        CollectionLevelScanner cls = new CollectionLevelScanner(
                collectionPath, collectionLevel, catalogLevel, currentLevel,
                filter, service );
        cls.scan();
        InvCatalogImpl catalog = cls.generateCatalog();
        System.out.println( catalog.getName() );
    }
}
--
Ethan R. Davis Telephone: (303) 497-8155
Software Engineer Fax: (303) 497-8690
UCAR Unidata Program Center E-mail: edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO 80307-3000 http://www.unidata.ucar.edu/
---------------------------------------------------------------------------