
Re: 20000216: [Fwd: LDM version 5.0.9]



Hi Jeff,

I have some comments interspersed, below.

Jeff Ator wrote:

> Anne,
>
> It looks like the problem indeed lies within mmap (and/or something else?)
> under AIX4.3.  As the below results show, the 5.0.9 version of the LDM on
> IRIX64 is behaving very much like 5.0.5 on that same machine.  For example,
> the 5.0.9 LDM queue on IRIX64 has grown to about 83.8Mb, and the recent
> pqexpire output in the LDM log(s) shows that it is indeed deleting products
> from the queue:
>
> Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: Exiting
> Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > Up since:      20000224105104.875
> Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > Queue usage (bytes):60797864
> Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: >          (nregions):  122570
> Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > nbytes recycle:      4008832 ( 3946.235 kb/hr)
> Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > nprods deleted:         8432 ( 8499.544 per hour)
> Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > First deleted: 20000223215105.341
> Feb 24 11:01:38 5Q:sgi98 pqexpire[21631]: > Last  deleted: 20000223225036.733
> Feb 24 11:51:04 5Q:sgi98 pqexpire[12505]: Starting Up
> Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: Exiting
> Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > Up since:      20000224115104.333
> Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > Queue usage (bytes):60797864
> Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: >          (nregions):  122570
> Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > nbytes recycle:      3916416 ( 3872.162 kb/hr)
> Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > nprods deleted:         8444 ( 8548.952 per hour)
> Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > First deleted: 20000223225138.772
> Feb 24 12:00:41 5Q:sgi98 pqexpire[12505]: > Last  deleted: 20000223235054.576
> Feb 24 12:51:04 5Q:sgi98 pqexpire[22392]: Starting Up
> Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: Exiting
> Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > Up since:      20000224125104.915
> Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > Queue usage (bytes):60797864
> Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: >          (nregions):  122570
> Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > nbytes recycle:      5242864 ( 5198.195 kb/hr)
> Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > nprods deleted:         9790 ( 9939.549 per hour)
> Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > First deleted: 20000223235144.751
> Feb 24 13:01:52 5Q:sgi98 pqexpire[22392]: > Last  deleted: 20000224005050.586
> Feb 24 13:51:04 5Q:sgi98 pqexpire[20045]: Starting Up
> Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: Exiting
> Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > Up since:      20000224135104.316
> Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > Queue usage (bytes):60797864
> Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: >          (nregions):  122570
> Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > nbytes recycle:      5175560 ( 5165.656 kb/hr)
> Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > nprods deleted:         9557 ( 9767.640 per hour)
> Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > First deleted: 20000224005149.562
> Feb 24 14:00:06 5Q:sgi98 pqexpire[20045]: > Last  deleted: 20000224015031.928
> Feb 24 14:51:04 5Q:sgi98 pqexpire[19577]: Starting Up
> Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: Exiting
> Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > Up since:      20000224145104.805
> Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > Queue usage (bytes):60797864
> Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: >          (nregions):  122570
> Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > nbytes recycle:      3985352 ( 3955.097 kb/hr)
> Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > nprods deleted:         8323 ( 8458.050 per hour)
> Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > First deleted: 20000224015132.717
> Feb 24 14:59:18 5Q:sgi98 pqexpire[19577]: > Last  deleted: 20000224025035.236
> Feb 24 15:51:04 5Q:sgi98 pqexpire[23671]: Starting Up
> Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: Exiting
> Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > Up since:      20000224155104.894
> Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > Queue usage (bytes):60797864
> Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: >          (nregions):  122570
> Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > nbytes recycle:      4284560 ( 4216.266 kb/hr)
> Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > nprods deleted:         8742 ( 8809.120 per hour)
> Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > First deleted: 20000224025132.273
> Feb 24 16:01:18 5Q:sgi98 pqexpire[23671]: > Last  deleted: 20000224035104.843
>
> The "queue usage" statistics look reasonable, in that about 61Mb of the
> queue is being used.  Furthermore, the most recent "Growing index by
> 16384" messages appeared in the LDM logs at around 0050Z on 2/23/2000,
> suggesting that the LDM queue has reached a roughly "stable" size that
> always holds the most recent 12 hours of data.
>
> However, on the AIX4.3 box, the 5.0.9 LDM continues to log "Deleting oldest
> to get a queue slot", the LDM queue remains at the roughly 75Mb size it was
> allocated with, and the pqexpire output still shows that it has nothing to
> delete *and* that only around 20Mb of the queue is actually being used:

First, note that the queue will *not* grow under AIX.  Only the IRIX queue
will grow.  (I may have told you earlier that the queue would grow under IRIX
and some other OS, but I have since found out that it grows only under IRIX.)



>
> Feb 24 11:51:00 nco3n03 pqexpire[41322]: Starting Up
> Feb 24 11:52:08 nco3n03 pqexpire[41322]: Exiting
> Feb 24 11:52:08 nco3n03 pqexpire[41322]: > Up since:      20000224115100.340
> Feb 24 11:52:08 nco3n03 pqexpire[41322]: > Queue usage (bytes):19862056
> Feb 24 11:52:08 nco3n03 pqexpire[41322]: >          (nregions):   36692
> Feb 24 11:52:08 nco3n03 pqexpire[41322]: > nprods deleted 0
> Feb 24 12:51:00 nco3n03 pqexpire[38614]: Starting Up
> Feb 24 12:52:12 nco3n03 pqexpire[38614]: Exiting
> Feb 24 12:52:12 nco3n03 pqexpire[38614]: > Up since:      20000224125100.655
> Feb 24 12:52:12 nco3n03 pqexpire[38614]: > Queue usage (bytes):19862056
> Feb 24 12:52:12 nco3n03 pqexpire[38614]: >          (nregions):   36692
> Feb 24 12:52:12 nco3n03 pqexpire[38614]: > nprods deleted 0
> Feb 24 13:51:00 nco3n03 pqexpire[38818]: Starting Up
> Feb 24 13:52:08 nco3n03 pqexpire[38818]: Exiting
> Feb 24 13:52:08 nco3n03 pqexpire[38818]: > Up since:      20000224135100.537
> Feb 24 13:52:08 nco3n03 pqexpire[38818]: > Queue usage (bytes):19862056
> Feb 24 13:52:08 nco3n03 pqexpire[38818]: >          (nregions):   36692
> Feb 24 13:52:08 nco3n03 pqexpire[38818]: > nprods deleted 0
> Feb 24 14:51:00 nco3n03 pqexpire[41062]: Starting Up
> Feb 24 14:52:05 nco3n03 pqexpire[41062]: Exiting
> Feb 24 14:52:05 nco3n03 pqexpire[41062]: > Up since:      20000224145100.194
> Feb 24 14:52:05 nco3n03 pqexpire[41062]: > Queue usage (bytes):19862056
> Feb 24 14:52:05 nco3n03 pqexpire[41062]: >          (nregions):   36692
> Feb 24 14:52:05 nco3n03 pqexpire[41062]: > nprods deleted 0
> Feb 24 15:51:00 nco3n03 pqexpire[9244]: Starting Up
> Feb 24 15:52:07 nco3n03 pqexpire[9244]: Exiting
> Feb 24 15:52:07 nco3n03 pqexpire[9244]: > Up since:      20000224155100.787
> Feb 24 15:52:07 nco3n03 pqexpire[9244]: > Queue usage (bytes):20033792
> Feb 24 15:52:07 nco3n03 pqexpire[9244]: >          (nregions):   36692
> Feb 24 15:52:07 nco3n03 pqexpire[9244]: > nprods deleted 0
>
> Thus, we know that under IRIX64 the 5.0.9 LDM queue will grow to hold the
> amount of data needed (just as it did in version 5.0.5), whereas under
> AIX4.3 pqing simply deletes the oldest entry in the queue when it needs a
> queue slot, resulting in a queue that holds much less data overall.  So,
> for AIX4.3 I will now try making the queue a little bigger and also
> creating it with "pqcreate" (instead of the "ldmadmin mkqueue" we had been
> using) so that I can use the -S option to specify a larger number of
> "slots".

We will be interested in seeing the results of this.
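To compare runs before and after a change like this, it can help to pull the summary numbers out of the pqexpire log entries automatically.  The following is only a minimal sketch: the helper name `parse_pqexpire` is hypothetical, and its regular expressions are derived solely from the sample lines quoted in this message, so other LDM versions may format these lines differently.

```python
import re

# Patterns derived from the pqexpire summary lines quoted in this message;
# other LDM versions may format these lines differently.
USAGE_RE = re.compile(r"Queue usage \(bytes\):\s*(\d+)")
NREGIONS_RE = re.compile(r"\(nregions\):\s*(\d+)")
# Matches both "nprods deleted:  8432 ( 8499.544 per hour)" and "nprods deleted 0".
DELETED_RE = re.compile(r"nprods deleted:?\s*(\d+)(?:\s*\(\s*([\d.]+) per hour\))?")


def parse_pqexpire(log_text):
    """Hypothetical helper: one dict per pqexpire summary found in log_text."""
    runs = []
    for line in log_text.splitlines():
        if line.rstrip().endswith(": Exiting"):
            # Every summary block in the quoted logs follows an "Exiting" line.
            runs.append({"bytes": None, "nregions": None,
                         "deleted": None, "per_hour": None})
            continue
        if not runs:
            continue  # ignore anything before the first summary block
        run = runs[-1]
        if m := USAGE_RE.search(line):
            run["bytes"] = int(m.group(1))
        elif m := NREGIONS_RE.search(line):
            run["nregions"] = int(m.group(1))
        elif m := DELETED_RE.search(line):
            run["deleted"] = int(m.group(1))
            run["per_hour"] = float(m.group(2)) if m.group(2) else None
    return runs


if __name__ == "__main__":
    sample = """\
Feb 24 11:52:08 nco3n03 pqexpire[41322]: Exiting
Feb 24 11:52:08 nco3n03 pqexpire[41322]: > Queue usage (bytes):19862056
Feb 24 11:52:08 nco3n03 pqexpire[41322]: >          (nregions):   36692
Feb 24 11:52:08 nco3n03 pqexpire[41322]: > nprods deleted 0
"""
    for run in parse_pqexpire(sample):
        print(run)
```

Lines such as "Starting Up" and "nbytes recycle" are simply ignored; only queue usage, region count, and deletion counts are collected.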

FYI, you can sample your queue to see how many products it holds by
running 'pqcat -vl - > /dev/null'.  After reading through all the products,
pqcat will report how many products were in the queue when it ran.  This
may help you choose a value for the -S option.  Since you want to keep
12 hours' worth of data in your queue, I suggest running this after the
LDM has been running for 12 hours.
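As a rough cross-check on a pqcat count, the slot count can also be estimated from an hourly product rate multiplied by the number of hours you want to keep.  This is a minimal sketch under stated assumptions: `estimate_slots` is a hypothetical helper, the 1.25 headroom factor is an arbitrary safety margin rather than an LDM default, and the sample rates are the IRIX64 pqexpire "per hour" figures quoted above, because the AIX product rate is not shown in this thread.

```python
import math


def estimate_slots(hourly_rates, retention_hours, headroom=1.25):
    """Hypothetical helper: rough product-slot count for pqcreate -S.

    hourly_rates    -- observed products per hour (e.g. pqexpire's
                       "nprods deleted ... per hour" figures)
    retention_hours -- how many hours of data the queue should hold
    headroom        -- assumed safety factor for bursts; not an LDM default
    """
    if not hourly_rates or retention_hours <= 0:
        raise ValueError("need at least one rate and a positive retention")
    peak = max(hourly_rates)  # size for the busiest hour seen, not the mean
    return math.ceil(peak * retention_hours * headroom)


# Per-hour deletion rates reported by pqexpire on the IRIX64 machine.
irix_rates = [8499.544, 8548.952, 9939.549, 9767.640, 8458.050, 8809.120]
print(estimate_slots(irix_rates, retention_hours=12))  # prints 149094
```

Sizing from the busiest hour rather than the average is a deliberate, conservative choice.  Without headroom, 12 hours at the peak IRIX rate is about 119,275 products, which is in the same ballpark as the 122570 regions the IRIX queue settled at.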

>
> Anyway, one last question for you:  you explained previously that mmap
> works differently under AIX4.3 in that it allocates in regions, and also that
> it's "entirely possible that there's a problem related to mmap under
> AIX".  Is there anything you can think of that I could ask our IBM
> sysadmins to tweak (or fix?) to get mmap to work more like it does on
> other systems?  Or is this just platform-dependent behavior that I'll
> have to live with if I want to run under AIX?  I'm asking because I
> don't personally know a whole lot about mmap.
>

I now think that what I said earlier about mmap being suspect was wrong and
has become a red herring.  I apologize for that.  The fact that the queue
grows under IRIX but not under AIX has further confused the issue, because it
makes the comparison between the two less valid.  If you set the number of
slots properly using the -S option and the queue usage improves significantly,
then the problem was indeed the small size of your products relative to the
default product size the LDM assumes when it sets the number of slots, and
*not* mmap.  If mmap were causing a problem, the symptom would more likely be
a core dump.
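The slot-versus-bytes reasoning can be made concrete with the AIX pqexpire numbers quoted above.  This is a minimal sketch, not an LDM tool: `slot_diagnosis` is a hypothetical helper, it treats nregions as roughly one region per stored product (an assumption), it ignores any per-product overhead inside the queue, and 75_000_000 bytes stands in for Jeff's "around 75Mb".

```python
import math


def slot_diagnosis(used_bytes, nregions, capacity_bytes):
    """Hypothetical helper: is the queue running out of slots before bytes?

    Treats nregions as roughly one region per stored product (an assumption)
    and ignores any per-product overhead inside the queue.
    """
    avg_size = used_bytes / nregions           # average bytes per product
    utilization = used_bytes / capacity_bytes  # fraction of the bytes in use
    # Slots needed to fill the whole queue with products of that average size.
    slots_to_fill = math.ceil(capacity_bytes / avg_size)
    return avg_size, utilization, slots_to_fill


# AIX figures from the pqexpire output; 75_000_000 approximates "around 75Mb".
avg, util, slots = slot_diagnosis(19862056, 36692, 75_000_000)
print(f"avg product ~{avg:.0f} bytes, {util:.0%} of bytes used, ~{slots} slots to fill")
```

With these inputs the average product is roughly 541 bytes and only about 26% of the bytes are in use.  Filling the whole queue with products of that size would take a little over 138,000 slots, far more than the 36692 regions the queue is holding.  That is consistent with a slot shortage rather than an mmap problem, but it is only an estimate.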


> Again, thanks for all of your help in this matter!
>
> Best regards,
> -Jeff
>

You're welcome!  I hope it's useful.

Anne

--
***************************************************
Anne Wilson                     UCAR Unidata Program
address@hidden                  P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************