[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: 20000216: [Fwd: LDM version 5.0.9]



Jeff Ator wrote:

> Anne Wilson wrote:
>
> > Jeff,
> >
> > We have a hypothesis.  It is the case that the
> > products you are receiving are relatively small.
> > The LDM assumes a default product size of 2048 and
> > calculates the number of product slots by
> > <queueSize>/2048. Your products tend to be
> > signficantly smaller.  Thus, it is possible that
> > your queue could run out of slots.
> >
> > I understand that everything was working under
> > IRIX 5.0.5.  But, since you allocated 75Mb perhaps
> > that gave you enough slots that the queue would
> > not need to grow.  Please look at the log entries
> > for this configuration and see what the queue
> > usage is there.
> >
>
> Actually, even the 5.0.5 queue has grown some, albeit over a long period
> of time, under IRIX64.  I apologize if I didn't make this clear in our
> earlier communications.  It was initially allocated with "ldmadmin
> mkqueue -s 75000000" on 7/29/1999, and is currently at around 84.3Mb.
> So it has grown some, but not very much, and all this time we have been
> running pqexpire from cron with "-a 12" to always keep the most recent
> 12 hours of data in the queue.  A recent output from one of these
> pqexpire runs shows that the queue usage is good, around 75Mb:
>
> Feb 18 14:51:04 pqexpire[4932]: Starting Up
> Feb 18 14:56:24 pqexpire[4932]: Exiting
> Feb 18 14:56:24 pqexpire[4932]: > Up since:      20000218145104.783
> Feb 18 14:56:24 pqexpire[4932]: > Queue usage (bytes):75005952
> Feb 18 14:56:24 pqexpire[4932]: >          (nregions):  128589
> Feb 18 14:56:24 pqexpire[4932]: > nbytes recycle:      4068520 (
> 4040.195 kb/hr)
> Feb 18 14:56:24 pqexpire[4932]: > nprods deleted:         8158 (
> 8295.634 per hour)
> Feb 18 14:56:24 pqexpire[4932]: > First deleted: 20000218015200.014
> Feb 18 14:56:24 pqexpire[4932]: > Last  deleted: 20000218025100.286
>
> >
> > Also, we would like you to test 5.0.9 on IRIX
> > using identical parameters to 5.0.5.  That is,
> > start with a 75Mb queue.  Please let us know the
> > results, especially the queue usage.
>
> OK, I began this test at around 1630z this morning.  Specifically, I
> created new ldm.pq files at 75Mb under 5.0.9 on our AIX4.3 and IRIX64
> machines and have been forking a copy of our regular data feed to each
> (via the 5.0.9 version of pqing) since 1632z and 1649z, respectively.
> On the AIX4.3 box, I began seeing "Deleting oldest to get a queue slot"
> messages at 2032z (after exactly 4 hours); whereas, on the IRIX64 box, I
> began seeing "Growing index at 16384" messages at 2058z (after 4 hours,
> 9 minutes).  So the "filling up" of the queue appears comparable on both
> systems under 5.0.9.  It will be late tonight before 12 hours of the
> test have elapsed and I can get a good feel for how big the IRIX64 5.0.9
> queue grows to hold that much data.  At that point, we could compare its
> size with that of the IRIX64 5.0.5 queue and see if it has also grown to
> around 85Mb.  If so, then it may be that the 5.0.5 and 5.0.9 versions
> are comparable in how big a queue it takes to hold 12 hours of data, and
> we can then shift our focus to the peculiarities of mmap under AIX.
> Unfortunately, I will be away from the office (and probably not able to
> check on the test results!) until next Thursday, so I will email you
> those results at that time.  The test will have run for almost a week at
> that point, so we should have some fairly reliable results.  Anyway, for
> now, here are the pqexpire results (showing queue usage statistics) for
> the past few hours (i.e. since the start of the test) on the:
>
> AIX4.3 (5.0.9)
>
> Feb 18 16:51:01 nco3n03 pqexpire[28758]: Starting Up
> Feb 18 16:51:06 nco3n03 pqexpire[28758]: Exiting
> Feb 18 16:51:06 nco3n03 pqexpire[28758]: > Up since:
> 20000218165101.196
> Feb 18 16:51:06 nco3n03 pqexpire[28758]: > Queue usage (bytes): 1341360
> Feb 18 16:51:06 nco3n03 pqexpire[28758]: >          (nregions):    3907
> Feb 18 16:51:06 nco3n03 pqexpire[28758]: > nprods deleted 0
> Feb 18 17:51:01 nco3n03 pqexpire[19050]: Starting Up
> Feb 18 17:51:23 nco3n03 pqexpire[19050]: Exiting
> Feb 18 17:51:23 nco3n03 pqexpire[19050]: > Up since:
> 20000218175101.317
> Feb 18 17:51:23 nco3n03 pqexpire[19050]: > Queue usage (bytes): 5542640
> Feb 18 17:51:23 nco3n03 pqexpire[19050]: >          (nregions):   12558
> Feb 18 17:51:23 nco3n03 pqexpire[19050]: > nprods deleted 0
> Feb 18 18:51:00 nco3n03 pqexpire[28656]: Starting Up
> Feb 18 18:51:47 nco3n03 pqexpire[28656]: Exiting
> Feb 18 18:51:47 nco3n03 pqexpire[28656]: > Up since:
> 20000218185100.566
> Feb 18 18:51:47 nco3n03 pqexpire[28656]: > Queue usage (bytes):10600544
> Feb 18 18:51:47 nco3n03 pqexpire[28656]: >          (nregions):   22034
> Feb 18 18:51:47 nco3n03 pqexpire[28656]: > nprods deleted 0
> Feb 18 19:51:00 nco3n03 pqexpire[28508]: Starting Up
> Feb 18 19:51:59 nco3n03 pqexpire[28508]: Exiting
> Feb 18 19:51:59 nco3n03 pqexpire[28508]: > Up since:
> 20000218195100.187
> Feb 18 19:51:59 nco3n03 pqexpire[28508]: > Queue usage (bytes):15133528
> Feb 18 19:51:59 nco3n03 pqexpire[28508]: >          (nregions):   31078
> Feb 18 19:51:59 nco3n03 pqexpire[28508]: > nprods deleted 0
> Feb 18 20:51:00 nco3n03 pqexpire[18072]: Starting Up
> Feb 18 20:52:06 nco3n03 pqexpire[18072]: Exiting
> Feb 18 20:52:06 nco3n03 pqexpire[18072]: > Up since:
> 20000218205100.977
> Feb 18 20:52:06 nco3n03 pqexpire[18072]: > Queue usage (bytes):18254888
> Feb 18 20:52:06 nco3n03 pqexpire[18072]: >          (nregions):   36692
> Feb 18 20:52:06 nco3n03 pqexpire[18072]: > nprods deleted 0
> Feb 18 21:51:00 nco3n03 pqexpire[30516]: Starting Up
> Feb 18 21:52:07 nco3n03 pqexpire[30516]: Exiting
> Feb 18 21:52:07 nco3n03 pqexpire[30516]: > Up since:
> 20000218215100.355
> Feb 18 21:52:07 nco3n03 pqexpire[30516]: > Queue usage (bytes):18719608
> Feb 18 21:52:07 nco3n03 pqexpire[30516]: >          (nregions):   36692
> Feb 18 21:52:07 nco3n03 pqexpire[30516]: > nprods deleted 0
> Feb 18 22:51:01 nco3n03 pqexpire[25094]: Starting Up
> Feb 18 22:52:07 nco3n03 pqexpire[25094]: Exiting
> Feb 18 22:52:07 nco3n03 pqexpire[25094]: > Up since:
> 20000218225101.028
> Feb 18 22:52:07 nco3n03 pqexpire[25094]: > Queue usage (bytes):18750328
> Feb 18 22:52:07 nco3n03 pqexpire[25094]: >          (nregions):   36692
> Feb 18 22:52:07 nco3n03 pqexpire[25094]: > nprods deleted 0
>
> and IRIX64 (5.0.9)
>
> Feb 18 17:51:04 5Q:sgi98 pqexpire[19576]: Starting Up
> Feb 18 17:51:06 5Q:sgi98 pqexpire[19576]: Exiting
> Feb 18 17:51:06 5Q:sgi98 pqexpire[19576]: > Up since:
> 20000218175104.455
> Feb 18 17:51:06 5Q:sgi98 pqexpire[19576]: > Queue usage (bytes): 4222752
>
> Feb 18 17:51:06 5Q:sgi98 pqexpire[19576]: >          (nregions):    8814
>
> Feb 18 17:51:06 5Q:sgi98 pqexpire[19576]: > nprods deleted 0
> Feb 18 18:51:05 5Q:sgi98 pqexpire[22089]: Starting Up
> Feb 18 18:51:20 5Q:sgi98 pqexpire[22089]: Exiting
> Feb 18 18:51:20 5Q:sgi98 pqexpire[22089]: > Up since:
> 20000218185105.247
> Feb 18 18:51:20 5Q:sgi98 pqexpire[22089]: > Queue usage (bytes): 9175760
>
> Feb 18 18:51:20 5Q:sgi98 pqexpire[22089]: >          (nregions):   18377
>
> Feb 18 18:51:20 5Q:sgi98 pqexpire[22089]: > nprods deleted 0
> Feb 18 19:51:04 5Q:sgi98 pqexpire[2105]: Starting Up
> Feb 18 19:51:16 5Q:sgi98 pqexpire[2105]: Exiting
> Feb 18 19:51:16 5Q:sgi98 pqexpire[2105]: > Up since:
> 20000218195104.958
> Feb 18 19:51:16 5Q:sgi98 pqexpire[2105]: > Queue usage (bytes):13677872
> Feb 18 19:51:16 5Q:sgi98 pqexpire[2105]: >          (nregions):   27255
> Feb 18 19:51:16 5Q:sgi98 pqexpire[2105]: > nprods deleted 0
> Feb 18 20:51:04 5Q:sgi98 pqexpire[11247]: Starting Up
> Feb 18 20:51:33 5Q:sgi98 pqexpire[11247]: Exiting
> Feb 18 20:51:33 5Q:sgi98 pqexpire[11247]: > Up since:
> 20000218205104.436
> Feb 18 20:51:33 5Q:sgi98 pqexpire[11247]: > Queue usage (bytes):17734192
>
> Feb 18 20:51:33 5Q:sgi98 pqexpire[11247]: >          (nregions):   35737
>
> Feb 18 20:51:33 5Q:sgi98 pqexpire[11247]: > nprods deleted 0
> Feb 18 21:51:05 5Q:sgi98 pqexpire[13854]: Starting Up
> Feb 18 21:51:25 5Q:sgi98 pqexpire[13854]: Exiting
> Feb 18 21:51:25 5Q:sgi98 pqexpire[13854]: > Up since:
> 20000218215105.125
> Feb 18 21:51:25 5Q:sgi98 pqexpire[13854]: > Queue usage (bytes):22075696
>
> Feb 18 21:51:25 5Q:sgi98 pqexpire[13854]: >          (nregions):   44431
>
> Feb 18 21:51:25 5Q:sgi98 pqexpire[13854]: > nprods deleted 0
> Feb 18 22:51:04 5Q:sgi98 pqexpire[12959]: Starting Up
> Feb 18 22:51:28 5Q:sgi98 pqexpire[12959]: Exiting
> Feb 18 22:51:28 5Q:sgi98 pqexpire[12959]: > Up since:
> 20000218225104.659
> Feb 18 22:51:28 5Q:sgi98 pqexpire[12959]: > Queue usage (bytes):26171472
>
> Feb 18 22:51:28 5Q:sgi98 pqexpire[12959]: >          (nregions):   52523
>
> Feb 18 22:51:28 5Q:sgi98 pqexpire[12959]: > nprods deleted 0
>
> Again, I'll email you again next Thursday, and we'll see what the queue
> usage statistics are at that time.
>
> >
> >
> > Regardless of the outcome of the above test, you
> > can use the -S option in pqcreate to specify the
> > number of slots you want for your queue.  Give
> > this a try, but, if possible, please do the above
> > tests first so we can confirm our suspicions.
> >
> > By the way, we are aware that the -s option to
> > pqcreate is not working.  Our apologies for that.
> >
> > Also, I understand that it is often easier to talk
> > rather than send email.  However, email is
> > important to us because we use it to track our
> > efforts, interact with each other, and populate
> > our database.  So, please send email in the
> > future.  Thank you!
>
> Hey, no problem!  I appreciate all of your help!
>
> Thanks again,
> -Jeff
>
> --
> __________________________________
>
> Jeffrey B. Ator
>
> National Centers for
>    Environmental Prediction
> Central Operations
> 5200 Auth Road, Rm. 307
> Camp Springs, MD  20746
>
> Phone:  (301) 763-8000, Ext. 7104
> Fax:    (301) 763-8381
>
> E-Mail:  address@hidden
>
> __________________________________

Anne,

It looks like the problem indeed lies within mmap (and/or something else?)
under AIX4.3.  As the below results show, the 5.0.9 version of the LDM on
IRIX64 is behaving very much like 5.0.5 on that same machine.  For example,
the 5.0.9 LDM queue on IRIX64 has grown to about 83.8Mb, and the recent
pqexpire output in the LDM log(s) shows that it is indeed deleting products
from the queue:

Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: Exiting
Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > Up since:
20000224105104.875
Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > Queue usage (bytes):60797864
Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: >          (nregions):  122570
Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > nbytes recycle:      4008832 (
3946.235 kb/hr)
Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > nprods deleted:         8432 (
8499.544 per hour)
Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > First deleted:
20000223215105.341
Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > Last  deleted:
20000223225036.733
Feb 24 11:51:04 5Q:sgi98 pqexpire[12505]: Starting Up
Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: Exiting
Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > Up since:
20000224115104.333
Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > Queue usage (bytes):60797864
Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: >          (nregions):  122570
Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > nbytes recycle:      3916416 (
3872.162 kb/hr)
Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > nprods deleted:         8444 (
8548.952 per hour)
Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > First deleted:
20000223225138.772
Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > Last  deleted:
20000223235054.576
Feb 24 12:51:04 5Q:sgi98 pqexpire[22392]: Starting Up
Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: Exiting
Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > Up since:
20000224125104.915
Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > Queue usage (bytes):60797864
Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: >          (nregions):  122570
Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > nbytes recycle:      5242864 (
5198.195 kb/hr)
Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > nprods deleted:         9790 (
9939.549 per hour)
Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > First deleted:
20000223235144.751
Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > Last  deleted:
20000224005050.586
Feb 24 13:51:04 5Q:sgi98 pqexpire[20045]: Starting Up
Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: Exiting
Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > Up since:
20000224135104.316
Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > Queue usage (bytes):60797864
Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: >          (nregions):  122570
Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > nbytes recycle:      5175560 (
5165.656 kb/hr)
Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > nprods deleted:         9557 (
9767.640 per hour)
Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > First deleted:
20000224005149.562
Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > Last  deleted:
20000224015031.928
Feb 24 14:51:04 5Q:sgi98 pqexpire[19577]: Starting Up
Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: Exiting
Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > Up since:
20000224145104.805
Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > Queue usage (bytes):60797864
Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: >          (nregions):  122570
Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > nbytes recycle:      3985352 (
3955.097 kb/hr)
Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > nprods deleted:         8323 (
8458.050 per hour)
Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > First deleted:
20000224015132.717
Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > Last  deleted:
20000224025035.236
Feb 24 15:51:04 5Q:sgi98 pqexpire[23671]: Starting Up
Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: Exiting
Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > Up since:
20000224155104.894
Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > Queue usage (bytes):60797864
Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: >          (nregions):  122570
Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > nbytes recycle:      4284560 (
4216.266 kb/hr)
Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > nprods deleted:         8742 (
8809.120 per hour)
Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > First deleted:
20000224025132.273
Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > Last  deleted:
20000224035104.843

The "queue usage" statistics look reasonable, in that about 61Mb of the
queue are being utilized.  Furthermore, the most recent "Growing index by
16384" messages appeared in the LDM logs back at around 0050z on 2/23/2000,
suggesting that the LDM queue has indeed reached somewhat of a "stable"
size to always hold the most recent 12 hours of data.

However, on the AIX4.3 box, the 5.0.9 LDM continues "Deleting oldest to get
a queue slot", the LDM queue remains at the original size of around 75Mb at
which it was allocated, and the pqexpire output still shows that it has
nothing to delete *and* that only around 20Mb of the queue are actually
being utilized:

Feb 24 11:51:00 nco3n03 pqexpire[41322]: Starting Up
Feb 24 11:52:08 nco3n03 pqexpire[41322]: Exiting
Feb 24 11:52:08 nco3n03 pqexpire[41322]: > Up since:
20000224115100.340
Feb 24 11:52:08 nco3n03 pqexpire[41322]: > Queue usage (bytes):19862056
Feb 24 11:52:08 nco3n03 pqexpire[41322]: >          (nregions):   36692
Feb 24 11:52:08 nco3n03 pqexpire[41322]: > nprods deleted 0
Feb 24 12:51:00 nco3n03 pqexpire[38614]: Starting Up
Feb 24 12:52:12 nco3n03 pqexpire[38614]: Exiting
Feb 24 12:52:12 nco3n03 pqexpire[38614]: > Up since:
20000224125100.655
Feb 24 12:52:12 nco3n03 pqexpire[38614]: > Queue usage (bytes):19862056
Feb 24 12:52:12 nco3n03 pqexpire[38614]: >          (nregions):   36692
Feb 24 12:52:12 nco3n03 pqexpire[38614]: > nprods deleted 0
Feb 24 13:51:00 nco3n03 pqexpire[38818]: Starting Up
Feb 24 13:52:08 nco3n03 pqexpire[38818]: Exiting
Feb 24 13:52:08 nco3n03 pqexpire[38818]: > Up since:
20000224135100.537
Feb 24 13:52:08 nco3n03 pqexpire[38818]: > Queue usage (bytes):19862056
Feb 24 13:52:08 nco3n03 pqexpire[38818]: >          (nregions):   36692
Feb 24 13:52:08 nco3n03 pqexpire[38818]: > nprods deleted 0
Feb 24 14:51:00 nco3n03 pqexpire[41062]: Starting Up
Feb 24 14:52:05 nco3n03 pqexpire[41062]: Exiting
Feb 24 14:52:05 nco3n03 pqexpire[41062]: > Up since:
20000224145100.194
Feb 24 14:52:05 nco3n03 pqexpire[41062]: > Queue usage (bytes):19862056
Feb 24 14:52:05 nco3n03 pqexpire[41062]: >          (nregions):   36692
Feb 24 14:52:05 nco3n03 pqexpire[41062]: > nprods deleted 0
Feb 24 15:51:00 nco3n03 pqexpire[9244]: Starting Up
Feb 24 15:52:07 nco3n03 pqexpire[9244]: Exiting
Feb 24 15:52:07 nco3n03 pqexpire[9244]: > Up since:      20000224155100.787

Feb 24 15:52:07 nco3n03 pqexpire[9244]: > Queue usage (bytes):20033792
Feb 24 15:52:07 nco3n03 pqexpire[9244]: >          (nregions):   36692
Feb 24 15:52:07 nco3n03 pqexpire[9244]: > nprods deleted 0

Thus, we know that, under IRIX64, the 5.0.9 LDM queue will grow to hold the
amount of data needed (just like it did in version 5.0.5), whereas, under
AIX4.3, pqing simply deletes the oldest entry in the queue when it needs a
queue slot, resulting in a queue containing much less overall data.  So,
what I will try now for AIX4.3 is to make the queue hold more by making it
a little bigger but also by using "pqcreate" (instead of the "ldmadmin
mkqueue" that we had been using) so that I can use the -S option to specify
a larger number of "slots".

Anyway, one last question for you:  you explained previously that mmap
works differently under AIX4.3 in that allocates in regions, and also that
it's "entirely possible that there's a problem related to mmap under
AIX".   Is there anything you can think of that I could inquire of our IBM
sysadmins that maybe they can tweak (or fix?) to get mmap to work more like
it does on other systems?  Or is this just a platform-dependent thing that
I'll have to live with if I want to run under AIX?  I'm asking because I
don't personally know a whole lot about mmap.

Again, thanks for all of your help in this matter!

Best regards,
-Jeff

--
__________________________________

Jeffrey B. Ator

National Centers for
   Environmental Prediction
Central Operations
5200 Auth Road, Rm. 307
Camp Springs, MD  20746

Phone:  (301) 763-8000, Ext. 7104
Fax:    (301) 763-8381

E-Mail:  address@hidden

__________________________________