
Re: 20000923: Sunset.meteor.wisc.edu major problems



Below is the set of messages relevant to the diagnosis of a problem
with LDM 5.1.2 pqcreate resulting in a SIGBUS error on SGI/IRIX 32-bit
platforms for certain combinations of queue size and number of
products.

--Russ

  To: address@hidden, address@hidden
  From: address@hidden (Pete Pokrandt)
  Reply-to: address@hidden
  Subject: Sunset.meteor.wisc.edu major problems
  Date: Sat, 23 Sep 2000 11:10:50 -0500

  >To: address@hidden
  >From: address@hidden (Pete Pokrandt) 
  >Subject: Re: 20000923: Sunset.meteor.wisc.edu major problems
  >Organization: Dept of Atmos & Oceanic Sciences, University of Wisconsin-Madison
  >Keywords: sigbus, bus error, SGI/IRIX

  Hi all,

  Anyone feeding from sunset.meteor.wisc.edu, please fail over
  to your backup until further notice. I'm having major problems
  with the ldm and/or machine crashing regularly. I suspect either
  a bad disk or perhaps a memory problem, but I can't go in
  to deal with it right now, since there's a UW/Northwestern Football
  game happening 2 blocks away from our building. 

  I'll try to get in tonight to have a look and try to see what's
  going on.  If it looks like an extended outage, I'll try to
  get everyone set up on profhorn.meteor.wisc.edu as a backup.

  Unidata support: can you verify that profhorn.meteor.wisc.edu
  is allowed to feed from motherlode?  And if not, can it be
  added until I figure out what's up with sunset? Thanks.


  Sorry for the hassles..

  Pete

  --
  +>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+
  ^ Pete Pokrandt                    V 1447  AOSS Bldg  1225 W Dayton St^
  ^ Systems Programmer               V Madison,         WI     53706    ^
  ^                                  V      address@hidden       ^
  ^ Dept of Atmos & Oceanic Sciences V (608) 262-3086 (Phone/voicemail) ^
  ^ University of Wisconsin-Madison  V       262-0166 (Fax)             ^
  +<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+


  To: address@hidden
  Subject: Re: Sunset.meteor.wisc.edu major problems
  Date: Sat, 23 Sep 2000 10:55:58 -0600
  From: Russ Rew <address@hidden>

  Hi Pete,

  > Unidata support: can you verify that profhorn.meteor.wisc.edu
  > is allowed to feed from motherlode?  And if not, can it be
  > added until I figure out what's up with sunset? Thanks.

  I've verified that you should be able to feed from motherlode, because
  its ldmd.conf contains the following line:

  allow   UNIDATA|FSL2    ^(sunset|profhorn)\.meteor\.wisc\.edu$
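
The host field of that allow line is a regular expression. As a rough
illustration only, the sketch below checks a few host names against the
same pattern using Python's re module. It assumes Python interprets this
simple pattern the same way the LDM does; the helper name is_allowed and
the third host name are made up for the example.

```python
import re

# Host pattern copied from the allow line above. Assumption: Python's re
# module treats this simple pattern the same way the LDM's matcher does.
ALLOW_PATTERN = re.compile(r"^(sunset|profhorn)\.meteor\.wisc\.edu$")


def is_allowed(hostname: str) -> bool:
    """Return True if hostname matches the allow pattern (hypothetical helper)."""
    return ALLOW_PATTERN.search(hostname) is not None


if __name__ == "__main__":
    for host in ("sunset.meteor.wisc.edu",
                 "profhorn.meteor.wisc.edu",
                 "other.meteor.wisc.edu"):  # last name is invented
        print(host, is_allowed(host))
```

Because the pattern is anchored with ^ and $, only the two named hosts
match; a name that merely contains one of them does not.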

  --Russ


  To: address@hidden, address@hidden
  From: address@hidden (Pete Pokrandt)
  Subject: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
  Date: Sat, 23 Sep 2000 13:34:21 -0500


  Hi all,

  Thanks to Unidata support, working hard on a Saturday, anyone who
  normally feeds from sunset.meteor.wisc.edu can instead feed from
  profhorn.meteor.wisc.edu until sunset is fixed and happy again.

  One word of caution, I'm going to be slowly piping through the data from
  the UIUC archive site that I've missed since 0800 UTC, so you may end
  up getting more data than you are expecting until the backlog flushes
  through.

  Also, profhorn.meteor.wisc.edu is the machine that I use to
  ingest the high bandwidth NMC2 feed, so I'm not sure if the 10 Mbps
  line into profhorn will handle the load of everyone feeding from
  it in addition to the NMC2 feed.  I'll keep an eye on it and
  let you all know if it seems to be a problem.

  I'll be in this evening to try to figure out what's up on sunset. Very
  frustrating: at first the ldm was crashing, but now I can't even get
  pqcreate to run. It dumps a core as soon as the queue file has grown to
  its complete size.. I've tried it on different disk drives as well, so
  it's not a bad disk. Strange..

  I'm going to try first swapping in some different RAM, and if that
  doesn't work, maybe a new mother board.. Nice to just happen to have a
  few spare parts lying around.. Unidata Support: does this sound to you
  like a memory problem? I have not seen any bad memory info in my system
  logs.

  Pete




  To: address@hidden
  Cc: support-ldm
  Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
  Date: Sat, 23 Sep 2000 15:44:12 -0600
  From: Russ Rew <address@hidden>

  Pete,

  > I'll be in this evening to try to figure out what's up on sunset. Very
  > frustrating, at first, the ldm was crashing, but now I can't even get
  > pqcreate to run. It dumps a core as soon as the queue file has grown to
  > its complete size.. I've tried it on different disk drives as well, so
  > it's not a bad disk. Strange.. 

  Please send us (address@hidden) the command line you use to
  invoke pqcreate and if possible also a traceback from when it crashes.
  You can get the traceback by running it until it crashes and leaves a
  "core" file, then running "dbx" (or whatever debugger you use, I'm not
  sure what platform you are running this on) giving as arguments the
  pqcreate executable and the core file, something like:

    % dbx /usr/local/ldm/bin/pqcreate core

  At this point dbx may produce a bunch of output, but when it finally
  gives you a prompt, type "where" and then cut and paste the output to
  me, along with how you invoked pqcreate.

  Also it's just worth checking that you are creating the product queue
  on a local disk rather than a remotely mounted disk.  The latter won't
  work, but it should give an error message rather than just dumping
  core ...

  > I'm going to try first swapping in some different RAM, and if that
  > doesn't work, maybe a new mother board.. Nice to just happen to have a
  > few spare parts lying around.. Unidata Support: does this sound to you
  > like a memory problem? I have not seen any bad memory info in my system
  > logs.

  Good luck.  It doesn't sound like a memory problem to me, but I
  haven't had any memory problems recently, so I'm not sure what the
  symptoms would be.  The system should do a memory check when you
  reboot it, which should catch most memory errors.

  --Russ


  To: Russ Rew <address@hidden>
  cc: address@hidden
  Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu 
  In-reply-to: Your message of "Sat, 23 Sep 2000 15:44:12 MDT."
               <address@hidden> 
  Date: Sat, 23 Sep 2000 17:53:07 -0500
  From: Pete Pokrandt <address@hidden>

  Russ,

  This is the same exact setup that has been running mostly flawlessly
  for months. Every so often the ldm will die, usually seems to be
  related to an increase in the data volume. Usually deleting and
  re-making the queue will solve the problem and it'll run for weeks 
  with no problems.

  Yesterday the ldm crashed, so I redid the queue and restarted, then
  last night the machine hung, so I rebooted, redid the queue and 
  started again. It ran for about 1/2 hour and died, so I redid
  the queue again, then again.. you get the picture.. Then this
  morning after another reboot I tried again to make the queue and
  started getting the core dumps.  

  One strange thing is, it works ok for a 2.5 Mb (yes, that small, I've
  tried lots of things :) queue, but 5 Mb, 25 Mb, 250 Mb, 400 Mb, and 
  600 Mb (my normal queue size as of late) all dump a core.

  It is on a local disk, not an nfs mounted one.

  I'm running on an SGI R4000 with IRIX 6.5, 192 Mb of RAM, roughly
  200 Mb of swap.

  As for starting it, I'm just running a normal ldmadmin mkqueue.

  I believe the command that it is spawning is:

  pqcreate -q /usr2/ldm/ldm.pq -s 25000000

  sunset 10% ldmadmin mkqueue
  Sep 23 22:43:37 UTC sunset.meteor.wisc.edu : make_pq: mkqueue failed


  Here's the output from dbx:

  sunset 31% dbx /usr/local/ldm/bin/pqcreate core
  dbx version 7.2.1 patch 2991 May 14 1998 17:09:10
  Core from signal SIGBUS: Bus error
  (dbx) where
  >  0 sx_init(sx = 0x5833a64, nalloc = 6103) ["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":2200, 0x1000a418]
     1 ctl_init(pq = 0x10033fe0, align = 8) ["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":3783, 0x1000ec4c]
     2 pq_create(path = 0x7fff3012 = "/usr2/ldm/ldm.pq", mode = 438, pflags = 0, align = 8, initialsz = 25000000, nproducts = 6103, pqp = 0x7fff2dec) ["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":4306, 0x100105c4]
     3 main(ac = 7, av = 0x7fff2e64) ["/usr/local/ldm/ldm-5.1.2/src/pqcreate/pqcreate.c":186, 0x10003790]
     4 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/csu/crt1text.s":177, 0x10003184]

  Let me know if this helps at all. I'm still planning to go in tonight
  to start swapping hardware to see if that makes a difference.

  You think maybe I should recompile the ldm? Perhaps some of the binaries
  got fu-bar'd somehow?

  Thanks for the help!

  Pete




  Date: Sat, 23 Sep 2000 19:43:32 -0600 (MDT)
  From: Steve Chiswell <address@hidden>
  To: Pete Pokrandt <address@hidden>
  cc: address@hidden, address@hidden
  Subject: 20000923: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
  In-Reply-To: <address@hidden>


  Pete,

  pqcreate would core dump if you ran out of disk space while trying to
  create the queue ... or if creating the queue was exercising some bad
  disk blocks. Assuming you have plenty of disk space, you might want to
  try the format utility to test the disk for bad sectors - and map them
  out if found.


  Steve Chiswell


  To: Steve Chiswell <address@hidden>
  cc: address@hidden, address@hidden
  Subject: Re: 20000923: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
  In-reply-to: Your message of "Sat, 23 Sep 2000 19:43:32 MDT."
               <address@hidden> 
  Date: Sat, 23 Sep 2000 21:00:15 -0500
  From: Pete Pokrandt <address@hidden>


  In a previous message to me, you wrote: 

   >
   >
   >Pete,
   >
   >pqcreate would core dump if you ran out of disk space while trying to
   >create the queue.....or if creating the queue was exercising some bad
   >disk blocks.
   >Assuming you have plenty of disk space, you might want to try the format
   >utility to test the disk for bad sectors - and map them out if found.
   >
   >
   >Steve Chiswell
   >

  Steve,

  The disk is not full, and in fact I have tried it on more than one
  disk, and get the same results on both.  I'll try it on a third and
  see if it still happens. 

  I also just recompiled the ldm, I'll see if that makes any difference.

  Thanks,

  Pete




  To: Steve Chiswell <address@hidden>
  cc: address@hidden, address@hidden
  Subject: Re: 20000923: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
  In-reply-to: Your message of "Sat, 23 Sep 2000 19:43:32 MDT."
               <address@hidden> 
  Date: Sat, 23 Sep 2000 21:15:07 -0500
  From: Pete Pokrandt <address@hidden>


  Steve and all,

  Recompiled the ldm, still dumps core.

  Tried to build the queue on yet a third disk, still dumps core.

  Here's the stack from dbx on the pqcreate core file:

  sunset 18% dbx ~/bin/pqcreate core
  dbx version 7.2.1 patch 2991 May 14 1998 17:09:10
  where
  Core from signal SIGBUS: Bus error
  (dbx) >  0 sx_init(sx = 0x5833a64, nalloc = 6103) ["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":2200, 0x1000a418]
     1 ctl_init(pq = 0x10033fe0, align = 8) ["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":3783, 0x1000ec4c]
     2 pq_create(path = 0x7fff300c = "/cool.pretty/ldm/ldm.pq", mode = 438, pflags = 1, align = 8, initialsz = 25000000, nproducts = 6103, pqp = 0x7fff2dec) ["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":4306, 0x100105c4]
     3 main(ac = 5, av = 0x7fff2e64) ["/usr/local/ldm/ldm-5.1.2/src/pqcreate/pqcreate.c":186, 0x10003790]
     4 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/csu/crt1text.s":177, 0x10003184]
  (dbx) 

  If I make the queue size small enough - ridiculously small,
  278000 bytes in the tests below - then it is successful.

  Check out this sequence of pqcreate commands (I deleted the ldm.pq in
  between each one from a different window):

  sunset 22% pqcreate -q /cool.pretty/ldm/ldm.pq -v -f -s 250000
  Creating /cool.pretty/ldm/ldm.pq, 250000 bytes, 61 products.
  pqcreate: create "/cool.pretty/ldm/ldm.pq" failed: File exists
  sunset 23% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 400000
  Creating /cool.pretty/ldm/ldm.pq, 400000 bytes, 97 products.
  Bus error (core dumped)
  sunset 24% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 300000
  Creating /cool.pretty/ldm/ldm.pq, 300000 bytes, 73 products.
  Bus error (core dumped)
  sunset 25% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 260000
  Creating /cool.pretty/ldm/ldm.pq, 260000 bytes, 63 products.
  sunset 26% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 270000
  Creating /cool.pretty/ldm/ldm.pq, 270000 bytes, 65 products.
  sunset 27% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 280000
  Creating /cool.pretty/ldm/ldm.pq, 280000 bytes, 68 products.
  Bus error (core dumped)
  sunset 28% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 275000
  Creating /cool.pretty/ldm/ldm.pq, 275000 bytes, 67 products.
  sunset 29% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 278000
  Creating /cool.pretty/ldm/ldm.pq, 278000 bytes, 67 products.
  sunset 30% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 279000
  Creating /cool.pretty/ldm/ldm.pq, 279000 bytes, 68 products.
  Bus error (core dumped)

  For whatever reason, 67 products is ok, but 68 is a no-go.

  The exact same behavior is exhibited no matter what local disk
  I try to create the queue on:

  sunset 32% pqcreate -q /usr2/ldm/ldm.pq -v -s 278000
  Creating /usr2/ldm/ldm.pq, 278000 bytes, 67 products.
  sunset 33% pqcreate -q /usr2/ldm/ldm.pq -v -s 279000
  Creating /usr2/ldm/ldm.pq, 279000 bytes, 68 products.
  Bus error (core dumped)


  I'm really stumped here.. could it be something with the
  memory mapping?  In all cases, it seems to create the entire
  length of the file, and right at the very end, when the queue
  is almost at, or at, its proper size, that's when the core
  dump occurs.

  I'm going to swap in some different RAM and if that doesn't work,
  a new motherboard, to see if either of those makes any difference.

  Pete


  To: Steve Chiswell <address@hidden>
  cc: address@hidden, address@hidden
  Subject: Re: 20000923: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
  In-reply-to: Your message of "Sat, 23 Sep 2000 19:43:32 MDT."
               <address@hidden> 
  Date: Sat, 23 Sep 2000 21:27:19 -0500
  From: Pete Pokrandt <address@hidden>


  In a previous message to me, you wrote: 

   >
   >
   >Pete,
   >
   >pqcreate would core dump if you ran out of disk space while trying to
   >create the queue.....or if creating the queue was exercising some bad
   >disk blocks.
   >Assuming you have plenty of disk space, you might want to try the format
   >utility to test the disk for bad sectors - and map them out if found.
   >
   >
   >Steve Chiswell
   >

  Steve and all,

  Update number n+3.. new motherboard, new memory, same problem. Still
  dumping core.  I suppose it is possible that all 3 of the disks that
  I am running this on have bad blocks on them, I'll give the format
  util a try and see if I can find anything along those lines.

  Pete


  To: Pete Pokrandt <address@hidden>
  Cc: support-ldm, chiz, rkambic
  Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
  Date: Sat, 23 Sep 2000 23:41:23 -0600
  From: Russ Rew <address@hidden>

  Pete,

  Sorry to hear replacing the memory and the other things you've tried
  haven't fixed the problem.  The dbx traceback you sent showing the bus
  error seems to indicate an alignment problem, as if something is being
  stored at an address that is not properly aligned for the type of data
  that is stored there, for example trying to store a 32-bit integer at
  an odd byte address.
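
To make the alignment idea concrete, here is a small sketch that tests
whether an address is a multiple of a given alignment. The address is
the sx argument from frame 0 of the dbx traceback earlier in this
thread, and 8 is the align value shown in the same traceback. Whether
that particular address is what triggers the bus error is an assumption
made for illustration, not something established in the thread; the
function name is_aligned is invented for the example.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """Return True if address is a whole multiple of alignment (in bytes)."""
    return address % alignment == 0


# sx argument from frame 0 of the dbx traceback quoted in this thread.
SX_ADDRESS = 0x5833a64

if __name__ == "__main__":
    # Check against 4-byte and 8-byte boundaries.
    for alignment in (4, 8):
        print(f"0x{SX_ADDRESS:x} aligned to {alignment} bytes:",
              is_aligned(SX_ADDRESS, alignment))
```

The address is a multiple of 4 but not of 8, so a store that needs
8-byte alignment at that address would be misaligned, while a 4-byte
store would not.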

  I can't remember seeing anything quite like that, and I couldn't
  reproduce the problem on an SGI/IRIX 6.5 platform here.

  Your experiment with changing queue sizes to show that 67 products
  works but 68 products doesn't leads me to believe you might be able to
  explicitly set the number of products to a larger number using the
  "-S" option to pqcreate.  While you're at it, you should probably be
  using the "-c" (clobber) option as well, so you don't have to manually
  delete the queue each time before you create a new one.

  pqcreate just divides the queue size by 4096 to get the number of
  product slots to use, but you can specify a different number with the
  -S option, something like:

    pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6101

  for example to make the queue have 6101 product slots instead of 6103.
  If you played around with this, you might find a value that worked
  with a large queue and there might be a pattern to the bus errors that
  depends on the number of product slots.
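
As a quick check of the divide-by-4096 rule, the sketch below computes
the default slot count and compares it with the product counts that
"pqcreate -v" printed in Pete's earlier message. Rounding down is an
assumption, chosen because it agrees with those printed counts; the
function name default_product_slots is made up for the example.

```python
SLOT_DIVISOR = 4096  # divisor quoted above for the default slot count


def default_product_slots(queue_bytes: int) -> int:
    """Default number of product slots, assuming the quotient is rounded down."""
    return queue_bytes // SLOT_DIVISOR


# (queue size in bytes, product count printed by "pqcreate -v"),
# copied from the command output earlier in this thread.
REPORTED = [
    (250000, 61), (260000, 63), (270000, 65), (275000, 67),
    (278000, 67), (279000, 68), (280000, 68), (300000, 73),
    (400000, 97), (25000000, 6103),
]

if __name__ == "__main__":
    for size, printed in REPORTED:
        computed = default_product_slots(size)
        print(size, printed, computed, "ok" if computed == printed else "MISMATCH")
```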

  This is pure speculation since I can't reproduce the problem, but
  maybe you are compiling with a compiler flag or optimization level
  that changes the alignment restrictions.  For example, if you set the
  highest level of optimization when compiling, maybe that requires
  strict alignment, whereas if you don't specify optimization but
  instead use the debugging flag "-g", looser alignment works.

  I'm afraid I'll have to wait until Monday to pursue this, but a little
  more information might help:

   - Do you have the CFLAGS environment variable set when you build the
     LDM?  If so, what value?

   - Is this the first time you've tried LDM 5.1.2 on this SGI/IRIX
     platform (sunset)?  If so, what version were you running with
     successfully before?

   - What kind of platform is profhorn?  Are you using LDM 5.1.2 on it?

  You may have found a platform-specific bug in LDM 5.1.2, but until we
  can reproduce it, we'll have trouble fixing it ...

  --Russ


  To: Russ Rew <address@hidden>
  cc: address@hidden
  Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu 
  In-reply-to: Your message of "Mon, 25 Sep 2000 12:44:23 MDT."
               <address@hidden> 
  Date: Mon, 25 Sep 2000 14:11:22 -0500
  From: Pete Pokrandt <address@hidden>


  In a previous message to me, you wrote: 

   >Pete,
   >
   >> I'm running on an SGI R4000 with IRIX 6.5, 192 Mb of RAM, roughly
   >> 200 Mb of swap
   >
   >> Recompiled the ldm, still dumps core.
   >
   >Could you please try using our precompiled binary for SGI/IRIX
   >platforms on sunset, instead of what you compiled?  Maybe just use
   >pqcreate out of our binary to see if it fails the same way yours
   >does.  This would eliminate a lot of the possible sources of problems,
   >such as which compiler with which flags and libraries you used to
   >build LDM 5.1.2.
   >

  Russ,

  Your pqcreate also dumps core.  Also rebuilt the kernel and
  no luck.

   >
   >Also, did you get the message I sent Saturday night?  If not, I've
   >appended another copy.

  Yes, but in the flurry of things I was trying I totally forgot about
  it.  I'll go through that now and get back to you.

  Thanks again,

  Pete




  To: Russ Rew <address@hidden>
  Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu 
  In-reply-to: Your message of "Mon, 25 Sep 2000 12:44:23 MDT."
               <address@hidden> 
  Date: Mon, 25 Sep 2000 14:57:07 -0500
  From: Pete Pokrandt <address@hidden>


  In a previous message to me, you wrote: 

   >
   > Pete,
   >
   >
   > Your experiment with changing queue sizes to show that 67 products
   > works but 68 products doesn't leads me to believe you might be able to
   > explicitly set the number of products to a larger number using the
   > "-S" option to pqcreate.  While you're at it, you should probably be
   > using the "-c" (clobber) option as well, so you don't have to manually
   > delete the queue each time before you create a new one.
   >
   > pqcreate just divides the queue size by 4096 to get the number of
   > product slots to use, but you can specify a different number with the
   > -S option, something like:
   >
   >   pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6101
   >
   > for example to make the queue have 6101 product slots instead of 6103.
   > If you played around with this, you might find a value that worked
   > with a large queue and there might be a pattern to the bus errors that
   > depends on the number of product slots.


  Russ,

  sunset 35% pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6101
  Creating /cool.pretty/ldm/ldm.pq, 25000000 bytes, 6101 products.

  No core dump (woohoo!).

  I'll play around with it some more and see if that queue actually
  works with the ldm..   Shouldn't be any reason why it wouldn't, right?


   > I'm afraid I'll have to wait until Monday to pursue this, but a little
   > more information might help:
   >
   >  - Do you have the CFLAGS environment variable set when you build the
   >    LDM?  If so, what value?

  Shouldn't be, I'm just running with a straight ./configure with
  no CFLAGS env variable set.

   >
   >  - Is this the first time you've tried LDM 5.1.2 on this SGI/IRIX
   >    platform (sunset)?  If so, what version were you running with
   >    successfully before?

  I have been running ldm-5.1.2 on sunset since Sept 2, and a beta
  version before that since August 4.  Both ran just fine up until
  Friday.  That's the most bizarre part of this: I didn't change
  anything, it just stopped working.. Kinda scary.


   >
   >  - What kind of platform is profhorn?  Are you using LDM 5.1.2 on it?

  profhorn is RedHat Linux 

  Red Hat Linux release 6.1 (Cartman)
  Kernel 2.2.14 on an i686

  It is running ldm-5.1.2 beta1 since August 7.

   >
   > You may have found a platform-specific bug in LDM 5.1.2, but until we
   > can reproduce it, we'll have trouble fixing it ...
   >
   > --Russ
   >



  To: Russ Rew <address@hidden>
  cc: address@hidden
  Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu 
  In-reply-to: Your message of "Mon, 25 Sep 2000 12:44:23 MDT."
               <address@hidden> 
  Date: Mon, 25 Sep 2000 16:03:57 -0500
  From: Pete Pokrandt <address@hidden>


  Russ,

  I have been playing a bit more with the queue sizes.. It seems
  that you are correct, that only certain values for the number
  of products work.

  I have had success with these:

  -----
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6101
  (where the default would have been 6103)

  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6099
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6098
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6097
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6094
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6093
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6089


  and
  pqcreate -c -q /usr3/ldm/data/ldm.pq -v -s 650000000 -S 158689
  (where the default would have been 158691)
  -----

  The following all failed:

  Default:
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000
  Creating /cool.pretty/ldm/ldm.pq, 25000000 bytes, 6103 products.
  Bus error (core dumped)

  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6102
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6100
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6096
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6095
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6092
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6091
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6090
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6088
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6087
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6086
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6085
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6084
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6083
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6082
  pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6081
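
The sweep above can be scripted. The sketch below only builds the
pqcreate command lines, stepping the -S value down from the default; it
does not run them. It uses only the options that appear in this thread
(-c, -q, -v, -s, -S). The helper names and the round-down default are
assumptions for illustration.

```python
def default_product_slots(queue_bytes: int) -> int:
    """Default slot count: queue size divided by 4096, assumed rounded down."""
    return queue_bytes // 4096


def probe_commands(queue_path: str, queue_bytes: int, max_offset: int) -> list[str]:
    """Hypothetical helper: command lines trying default-1 .. default-max_offset slots."""
    default = default_product_slots(queue_bytes)
    return [
        f"pqcreate -c -q {queue_path} -v -s {queue_bytes} -S {default - offset}"
        for offset in range(1, max_offset + 1)
    ]


if __name__ == "__main__":
    # Print the first few candidates for the 25000000-byte queue tested above.
    for command in probe_commands("/cool.pretty/ldm/ldm.pq", 25000000, 4):
        print(command)
```

Each printed line can be pasted into a shell on the affected machine;
the first one that completes without a bus error gives a usable -S
value for that queue size.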


  Pete


  To: Russ Rew <address@hidden>
  Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu 
  In-reply-to: Your message of "Mon, 25 Sep 2000 15:56:31 MDT."
               <address@hidden> 
  Date: Mon, 25 Sep 2000 17:05:48 -0500
  From: Pete Pokrandt <address@hidden>


  In a previous message to me, you wrote: 

   >Pete,
   >
   >Thanks for trying our binary and for reporting back on LDM 5.1.2
   >pqcreate values that worked and the ones that caused bus errors on
   >SGI/IRIX.  You're the first one to report this bug, and we have now
   >reproduced it here so we have a chance of fixing it.  The bus error
   >occurs under the following circumstances:
   >
   > - SGI/IRIX 32-bit platform (things seem to work fine on 64-bit IRIX
   >   platforms when compiled with -64 flag)
   >
   > - LDM 5.1.2 (things seem to work with LDM 5.1.2beta3, so this bug was
   >   introduced late in development)
   >
   > - certain values of queue size and number of products, as you have
   >   reported
   >
   >The workaround, to try different values of number of products with
   >"-S" option to pqcreate, will get you going until I can deliver the
   >real fix.

  Russ,

  Got it, yeah, I am running now with the 650 Mb queue I produced with
  Default - 2 and it's running fine.  I must have just been lucky
  with the previous size queues I had been running with.

  Glad I could help find the bug.. well, kinda.. :)

   >
   >I'll put some sort of announcement onto the ldm-users mailing list
   >about this bug and the work-around soon.
   >
   >Thanks again for your persistence, and sorry we didn't catch this
   >during testing ...

  No problem, I'm just happy to have a solution that works..

  Pete

