Re: Parsing and misparsing XML

Hi Benno,

Benno Blumenthal wrote:
> 
> In looking through my logs, I noticed that fastsearch.net has managed
> (somehow) to find my thredds directory, but seems to be misparsing it.
> 
> 
> The top of the directory is at
> 
> http://iridl.ldeo.columbia.edu/SOURCES/thredds.xml
> 
> and that file has lines in it like
> 
> <catalogRef xlink:title="DASILVA"
> xlink:href="http://iridl.ldeo.columbia.edu/SOURCES/.DASILVA/thredds.xml"/>
> 
> The robot is hitting URLs like
> 
> http://iridl.ldeo.columbia.edu/SOURCES/.DASILVA/thredds.xml"/
> 
> 
> which I am presuming to mean that it does not understand the '/>' notation
> to end the tag.
> 
> 
> Are we using a non-standard XML notation? I was just following the example I
> was given.

No, that is standard XML: "/>" closes an empty element, and no whitespace is
required before it. Perhaps the crawler is treating the file as HTML rather
than XML. Even so, I would expect it to stop at the closing '"' of the
attribute value. The crawler might do better if there were a space before the
"/>", but either form is valid XML.
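
For what it's worth, here is a rough Python sketch of the difference. It
parses your catalogRef line with a real XML parser, then with a naive scan
that reads the href up to the next whitespace or '>'. The naive scan is only
my guess at the kind of shortcut the crawler might be taking (I have no
knowledge of how it actually works), and the function names are made up for
illustration. I also added an xmlns:xlink declaration so the fragment parses
on its own; I am assuming the real catalog declares it on an enclosing
element.

```python
import re
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# The catalogRef line from the catalog. The xmlns:xlink declaration is an
# addition so this fragment can be parsed standalone (assumption: the real
# file declares the namespace on an enclosing element).
SNIPPET = (
    '<catalogRef xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:title="DASILVA"\n'
    'xlink:href="http://iridl.ldeo.columbia.edu/SOURCES/.DASILVA/thredds.xml"/>'
)


def href_via_xml_parser(text):
    """Read xlink:href with a conforming XML parser."""
    elem = ET.fromstring(text)
    # ElementTree stores namespaced attributes as "{namespace-uri}localname".
    return elem.get("{%s}href" % XLINK_NS)


def href_via_naive_scan(text):
    """Hypothetical buggy scan: take everything after href=" up to the
    next whitespace or '>', ignoring the closing quote."""
    match = re.search(r'href="([^\s>]*)', text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("XML parser:", href_via_xml_parser(SNIPPET))
    print("naive scan:", href_via_naive_scan(SNIPPET))
```

The XML parser returns the URL exactly as written in the attribute, while the
naive scan returns it with a trailing '"/', which matches the requests in your
logs. With a space before the "/>", this particular scan would drop the '/'
but still keep the '"', so the workaround may or may not help depending on
what the crawler really does.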

Ethan


> Benno
> 
> 
> 
> --
> Dr. M. Benno Blumenthal          benno@xxxxxxxxxxxxxxxx
> International Research Institute for climate prediction
> The Earth Institute at Columbia University
> Lamont Campus, Palisades NY 10964-8000   (845) 680-4450
> 
> 

-- 
Ethan R. Davis                       Telephone: (303) 497-8155
Software Engineer                    Fax:       (303) 497-8690
UCAR Unidata Program Center          E-mail:    edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO  80307-3000              http://www.unidata.ucar.edu/