
Re: 20000926: query about reclass messages from UWisconsin



>From: Pete Pokrandt <address@hidden>
>Organization: University of Wisconsin-Madison
>Keywords: 200009260222.e8Q2Mcb03262 LDM reclass

Hi Pete,

> I did see the gazillion reclass statements from around 01 UTC:
> 
> Sep 26 01:06:15 5Q:sunset cirrus(feed)[5349]: Starting Up: 20000926010103.644 
> TS_ENDT {{IDS|DDPLUS,  "(^[A-OQ-X])|(^[YZ].[^AHIJRU])"
> }}
> Sep 26 01:06:15 5Q:sunset cirrus(feed)[5349]: topo:  cirrus.lsc.vsc.edu 
> IDS|DDPLUS
> Sep 26 01:06:16 5Q:sunset cirrus(feed)[5349]: RECLASS: 20000926000615.969 
> TS_ENDT {{IDS|DDPLUS,  "(^[A-OQ-X])|(^[YZ].[^AHIJRU])"}}
> Sep 26 01:06:16 5Q:sunset cirrus(feed)[5349]: RECLASS: 20000926000616.211 
> TS_ENDT {{IDS|DDPLUS,  "(^[A-OQ-X])|(^[YZ].[^AHIJRU])"}}
> Sep 26 01:06:16 5Q:sunset cirrus(feed)[5349]: RECLASS: 20000926000616.337 
> TS_ENDT {{IDS|DDPLUS,  "(^[A-OQ-X])|(^[YZ].[^AHIJRU])"}}

 ...

Without the log from the downstream site we can't tell for sure, but
this looks like the result of a somewhat flaky network connection
combined with the fact that sunset was itself catching up.  Downstream
sites fall even further behind until sunset catches up to the current
data feeds from its upstream sites.
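
If you want to quantify how far behind a downstream was, a small
script along the following lines could help.  This is only a sketch
and not part of the LDM distribution: it assumes each syslog entry has
been joined back onto a single line (the excerpt above is wrapped by
the mailer), that the syslog time and the stamp after "RECLASS:" are
in the same time zone, and that the stamp marks the start of the time
window being offered.  The names RECLASS_RE and reclass_lags are made
up for this example, and the year has to be supplied because syslog
does not record it.

```python
import re
from datetime import datetime

# Matches one unwrapped syslog line of the form seen in the excerpt, e.g.
#   Sep 26 01:06:16 5Q:sunset cirrus(feed)[5349]: RECLASS: 20000926000616.211 TS_ENDT {{...}}
# (hypothetical pattern, written only against the lines quoted in this message)
RECLASS_RE = re.compile(
    r"^(?P<logtime>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+"
    r"(?P<host>[^\s(]+)\(feed\)\[\d+\]:\s+RECLASS:\s+"
    r"(?P<stamp>\d{14})\.\d+"
)


def reclass_lags(lines, year=2000):
    """Return a list of (downstream_host, lag_seconds), one per RECLASS line.

    lag_seconds is the syslog time minus the RECLASS stamp, with the
    fractional seconds of the stamp dropped.  Lines that are not RECLASS
    messages are skipped.
    """
    lags = []
    for line in lines:
        m = RECLASS_RE.match(line)
        if m is None:
            continue
        # syslog omits the year, so it is passed in by the caller
        logged = datetime.strptime(f"{year} {m['logtime']}", "%Y %b %d %H:%M:%S")
        start = datetime.strptime(m["stamp"], "%Y%m%d%H%M%S")
        lags.append((m["host"], (logged - start).total_seconds()))
    return lags
```

Run over the RECLASS lines quoted above, this reports a lag of about
3600 seconds for cirrus on each message.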

It doesn't appear to be connected with the alignment bug that you
found: I'm fairly confident that once pqcreate has run successfully,
no misaligned access would occur afterwards.  Also, if a misaligned
access did occur in a feeder process, the process would exit with a
bus error, as pqcreate did, and I don't think that would generate
RECLASS messages on either the upstream or downstream hosts.

I'm pretty sure I fixed the misaligned access bug in pqcreate
yesterday.  We've been testing the fix here on an SGI MIPS 4000
platform, but we still need to test it on other platforms in both
32-bit and 64-bit mode before we can make a release.  I can give you
the small patch to src/pq/pq.c if you want it, but you might want to
wait until we've finished more testing here and declared it fixed,
since you have a workaround.

--Russ