
[Fwd: 20001130: Stretching LDM limits]



Hi Brendon,

------- Forwarded Message

>From: Brendon Hoch <address@hidden>
>Subject: Stretching LDM limits
>Organization: University of Columbia/Lamont Doherty
>Keywords: 200011301554.eAUFsFo02514 LDM
>
>Hello,
>
>We wish to do the unthinkable.  We need to have an LDM connection set up
>between our institution (IRI/LDEO) and Central Weather Bureau of Taiwan
>(CWB).  Basically, we want data (primarily WMO data feeds) to get to
>CWB, but we don't really care if it takes a while (1-2 hours) for it to
>get there, as the nature of the work in question is climate related. 
>The idea is that it would be easier to push data to CWB with LDM (then
>they can manage/store as they see fit) than have them run cron FTP
>scripts to try to regularly download data from us.  Obviously, one
>option is to expand queue size so that products don't get lost.  Would
>running rpc.ldmd with customized max latency settings also be a way to
>resolve the issue?  What settings would be appropriate?  Would
>ldmadmin/ldmd.conf settings require this adjustment or would it mean
>starting LDM on a manual basis?
>
>

Except for the quality of the connection between you and CWB,
there should be no problem in doing this.  Unavco here in
Boulder is using the LDM in a similar way, transferring data
between Boulder and Hawaii.  Fortunately, that data is used
more for archival purposes than real-time purposes.  This is
a good thing, because transfer across the ocean can be
problematic due to connectivity problems, as you have seen.

To do this you will probably need to change the default max
latency setting.  Use the -m (max latency) flag to rpc.ldmd
to do this.  The -o (offset) flag refers to what data you
want to request upon start up, and the -m flag refers to
what data you are willing to accept.  (You might find
yourself accepting older data than you would request if your
LDM falls behind for some reason.)  By default, the offset
will be the same as the max latency, which is what I assume
you want, so using the -m flag alone should work.
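As a sketch, a manual start with a large max latency might look like
the following.  The six-hour value and the queue and config paths are
illustrative assumptions only, not recommendations:

```shell
# Hypothetical manual invocation: accept products up to 6 hours old.
# The -m argument is in seconds; the -q queue path and the config file
# path are examples -- substitute your own installation's paths.
rpc.ldmd -m 21600 -q /usr/local/ldm/data/ldm.pq /usr/local/ldm/etc/ldmd.conf
```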

You can set the max latency value at your downstream site to
be very large, in which case the upstream queue serves as an
archive.  Keep in mind that the arguments to rpc.ldmd are in
seconds, so you'll have to calculate how many hours or days
worth of data you want in terms of seconds.  (In the future
we may add a suffix flag to the time argument so that you
could specify times in other units.)
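For example, converting a latency window into the seconds that the -m
flag expects is simple arithmetic (the two-day figure below is just an
illustration):

```shell
# Convert a desired max latency into seconds for rpc.ldmd's -m flag.
# Two days is an arbitrary example value, not a recommendation.
days=2
max_latency=$((days * 24 * 3600))
echo "$max_latency"    # prints 172800
```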

I would suggest editing the ldmadmin script to implement
this change from the default setup.  That way your settings
will be captured in the script, as opposed to always running
everything from the command line.  The downside is that
every time you get a new distribution you'll have to migrate
your changes into the new ldmadmin script that comes with the
distribution.

Regarding settings, you will have to experiment with the
queue size and max latency settings to see what actually
gets products across.  I really can't say what to start
with, knowing nothing about your data products.  It appears
that large products have greater difficulty getting across,
so they may need to be kept around longer to ensure a
successful transfer.  Another possibility is to break them
into pieces.

Robb has had some conversations with the Unavco people
regarding getting large products across.  If you try this
and find problems let us know - Robb may have some more
ideas.

>Below are some stats/log entries.  Any advice you can provide is greatly
>appreciated.
>
>Thanks,
>Brendon Hoch
>
>
>For reference, we're running in Linux with LDM 5.1.2.  I'm unsure as to
>whether CWB is also using LDM 5.1.2
>
>CWB tried connecting their LDM to us unsuccessfully.  I suspect the
>connection was being timed out:
>
>more ldmd.log.1
>Nov 29 05:28:09 arnie rpc.ldmd[20200]: Denying connection from
>dcsdat2.cwb.gov.tw
>Nov 29 06:20:08 arnie fxsvc02a[9624]: Connection from
>fxsvc02a.cwb.gov.tw
>Nov 29 06:20:12 arnie fxsvc02a[9624]: Connection reset by peer
>Nov 29 06:20:12 arnie fxsvc02a[9624]: Exiting
>Nov 29 06:53:37 arnie fxsvc02a[9681]: Connection from
>fxsvc02a.cwb.gov.tw
>Nov 29 06:53:51 arnie fxsvc02a[9681]: Connection reset by peer
>Nov 29 06:53:51 arnie fxsvc02a[9681]: Exiting
>Nov 29 08:41:53 arnie fxsvc02a[10029]: Connection from
>fxsvc02a.cwb.gov.tw
>Nov 29 08:42:05 arnie fxsvc02a[10029]: Connection reset by peer
>Nov 29 08:42:05 arnie fxsvc02a[10029]: Exiting
>Nov 29 08:46:27 arnie fxsvc02a[10039]: Connection from
>fxsvc02a.cwb.gov.tw
>Nov 29 08:46:31 arnie fxsvc02a[10039]: Connection reset by peer
>Nov 29 08:46:31 arnie fxsvc02a[10039]: Exiting
>Nov 29 08:46:38 arnie fxsvc02a[10040]: Connection from
>fxsvc02a.cwb.gov.tw
>Nov 29 08:46:42 arnie fxsvc02a[10040]: Connection reset by peer
>Nov 29 08:46:42 arnie fxsvc02a[10040]: Exiting
>Nov 29 08:46:49 arnie fxsvc02a[10041]: Connection from
>fxsvc02a.cwb.gov.tw
>Nov 29 08:46:51 arnie fxsvc02a[10041]: Connection reset by peer
>Nov 29 08:46:51 arnie fxsvc02a[10041]: Exiting
>Nov 29 08:46:56 arnie fxsvc02a[10042]: Connection from
>fxsvc02a.cwb.gov.tw
>Nov 29 08:49:25 arnie fxsvc02a[10042]: Connection reset by peer
>Nov 29 08:49:25 arnie fxsvc02a[10042]: Exiting                     
>

I suspect you're right about the connection being timed out.
Were they actually trying to get data?  If so, was there any
success?


>Doing an ldmping indicates that they are in fact up & running, though
>the elapsed time is a bit on the high side:
>
>[ldm@arnie]# ldmping fxsvc02a.cwb.gov.tw
>Nov 30 15:12:29      State    Elapsed Port   Remote_Host          
>rpc_stat
>Nov 30 15:12:34 RESPONDING   4.319114  388   fxsvc02a.cwb.gov.tw
>Nov 30 15:12:59 RESPONDING   0.259661  388   fxsvc02a.cwb.gov.tw
>Nov 30 15:13:24 RESPONDING   0.324599  388   fxsvc02a.cwb.gov.tw
>Nov 30 15:13:49 RESPONDING   0.259368  388   fxsvc02a.cwb.gov.tw
>Nov 30 15:14:15 RESPONDING   0.261767  388   fxsvc02a.cwb.gov.tw
>Nov 30 15:14:40 RESPONDING   0.259633  388   fxsvc02a.cwb.gov.tw
>Nov 30 15:15:05 RESPONDING   0.259241  388   fxsvc02a.cwb.gov.tw    
>

Yes, these elapsed times are somewhat high.


>Traceroute doesn't get all the way to fxsvc02a.cwb.gov.tw, but comes
>close, probably because there is a firewall on their end.  It doesn't
>matter too much, though: if ldmping is able to function, the firewall
>is configured correctly
>
>[ldm@arnie]# traceroute fxsvc02a.cwb.gov.tw
>traceroute to fxsvc02a.cwb.gov.tw (163.29.179.202), 30 hops max, 38 byte
>packets
> 1  ph-iri-iri.ldgo.columbia.edu (129.236.110.1)  1.357 ms  0.672 ms 
>0.651 ms
> 2  vortex-ldeo.ldgo.columbia.edu (129.236.3.250)  2.851 ms  2.738 ms 
>1.998 ms
> 3  nyser-gw.net.columbia.edu (128.59.1.4)  3.644 ms  4.003 ms  3.010 ms
> 4  169.130.253.133 (169.130.253.133)  3.217 ms  3.779 ms  3.196 ms
> 5  sl-gw18-nyc-7-0.sprintlink.net (144.232.235.153)  3.940 ms  4.107
>ms  3.599 ms
> 6  sl-bb22-nyc-3-0.sprintlink.net (144.232.13.165)  3.018 ms  3.675 ms 
>3.104 ms
> 7  sl-bb20-rly-15-0.sprintlink.net (144.232.18.26)  23.740 ms  23.217
>ms  24.296 ms
> 8  144.232.9.90 (144.232.9.90)  24.741 ms  26.775 ms  26.111 ms
> 9  gbr3-p50.wswdc.ip.att.net (12.123.9.50)  23.724 ms  25.173 ms 
>23.451 ms
>10  gbr3-p80.sl9mo.ip.att.net (12.122.2.145)  42.008 ms  39.798 ms 
>39.892 ms
>11  gbr3-p20.sffca.ip.att.net (12.122.2.74)  80.623 ms  82.149 ms 
>81.499 ms
>12  gbr5-p60.sffca.ip.att.net (12.122.5.141)  80.821 ms  86.566 ms 
>83.433 ms
>13  ar1-p380.sffca.ip.att.net (12.123.13.73)  81.379 ms  82.603 ms 
>85.067 ms
>14  12.127.193.34 (12.127.193.34)  239.960 ms  239.221 ms  239.758 ms
>15  211.22.33.142 (211.22.33.142)  242.282 ms  244.850 ms  242.611 ms
>16  168.95.2.226 (168.95.2.226)  250.803 ms  250.459 ms  250.561 ms
>17  210.65.161.147 (210.65.161.147)  243.416 ms  244.254 ms  242.652 ms
>18  210.69.250.209 (210.69.250.209)  243.540 ms  244.921 ms  244.320 ms
>19  163.29.154.37 (163.29.154.37)  248.257 ms !X  247.512 ms !X *    
>

I think both ldmping and traceroute show that the
connectivity is not that great.  But the alternatives, such
as FTP, all have to use the same pipes and thus probably
aren't any better.

You're trying to send the data from IRI to CWB and not vice
versa, right?  Because if you ldmping and/or traceroute in
the opposite direction you're likely to get different
results.

You might take a look at netcheck.  It checks network
connectivity over time. This can be informative, but may or
may not actually help in transferring the data.


>End of email
>______________________________________________________
>Brendon Hoch
>International Research Institute for Climate Prediction
>Lamont Doherty Earth Observatory of Columbia University
>141/208 Monell, 61 Route 9W, Palisades, NY 10964
>-------------------------------------------------------
>Phone: (845)680-4444 Fax:(845)680-4488
>Email: address@hidden  
>WWW:   http://iri.ldeo.columbia.edu/~bhoch
>-------------------------------------------------------

Good luck!

Anne
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************