Both sides would need to have jumbo (not jump). I often cannot type :)
On Tue, Feb 16, 2021 at 10:47 AM Evan Breznyik <evan@xxxxxxxxxxxx> wrote:
> Patrick raises a plausible scenario. (We don't move any data -- even
> across our local interfaces -- that I think is large enough to benefit from
> or require a higher MTU than 1500. However, if this is configured in an
> unexpected manner...it can, indeed, cause issues.)
>
> Both sides would need to have jump frames enabled (but the network would
> also need to allow it across the switches/routers). You can uncover the
> MTU from ifconfig or similar -- just make sure you're looking at the right
> interface, as many heavy duty servers have 2 or more (e.g.):
>
> evan@sanjose-ca-1:~$ ifconfig
>> enp0s25 Link encap:Ethernet HWaddr [snip]
>> [snip]
>> UP BROADCAST RUNNING MULTICAST *MTU:1500* Metric:1
>
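When a host has several interfaces, it can help to pull every MTU out of the ifconfig output in one pass instead of reading it by eye. The following is a minimal sketch, assuming ifconfig-style text in either the older "MTU:1500" layout shown above or the newer "mtu 1500" layout; the function name `mtus_by_interface` is hypothetical and not part of any tool mentioned in this thread.

```python
import re

# Matches both the older net-tools layout ("MTU:1500") and the newer one ("mtu 1500").
_MTU_RE = re.compile(r"\bmtu[:\s]\s*(\d+)", re.IGNORECASE)


def mtus_by_interface(ifconfig_output):
    """Return a dict mapping interface name to MTU, parsed from ifconfig-style text."""
    result = {}
    current = None
    for line in ifconfig_output.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new interface stanza.
            current = line.split()[0].rstrip(":")
        match = _MTU_RE.search(line)
        if match and current is not None:
            result[current] = int(match.group(1))
    return result
```

Feeding it the captured output of `ifconfig` from each end of the connection gives a quick side-by-side view of which interfaces, if any, are set above 1500.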
>
> On Tue, Feb 16, 2021 at 9:47 AM Patrick Finnegan <vax@xxxxxxxxxx> wrote:
>
>> This sounds like a link MTU size problem. Try a "ping -s 1500" between
>> the two machines, and see if that works.
>>
>> It's likely that something is set to do jumbo frames (>1500 byte MTU) and
>> something in the middle is limiting that to the standard 1500 byte MTU
>> size. (or something like 1500 byte MTU size on a link that has VLAN tags
>> or VPLS headers).
>>
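One detail worth keeping in mind for the ping test: the `-s` option sets the ICMP payload size, so the packet on the wire is 28 bytes larger (20-byte IPv4 header without options plus 8-byte ICMP header). A payload of 1472 bytes therefore exactly fills a 1500-byte MTU, and on Linux iputils ping the `-M do` option forbids fragmentation so that an undersized link shows up as a failure. The sketch below does that arithmetic and bisects for the largest size that still passes. The `probe` callable is an assumption: it stands in for whatever check you run (a ping wrapper, for example) and is not provided here.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def largest_passing_size(probe, low, high):
    """Binary-search [low, high] for the largest size where probe(size) is true.

    Assumes probe is monotonic: once a size fails, every larger size fails too.
    Returns None if no size in the range passes.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if probe(mid):
            best = mid
            low = mid + 1
        else:
            high = mid - 1
    return best
```

If the largest passing payload comes out well below 1472, something on the path is enforcing a smaller MTU than the endpoints expect.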
>> Patrick Finnegan
>> Data Center Architect
>> Research Computing
>> Purdue University
>>
>>
>> On Tue, Feb 16, 2021 at 11:52 AM Karen Cooper - NOAA Affiliate via
>> ldm-users <ldm-users@xxxxxxxxxxxxxxxx> wrote:
>>
>>> tl;dr -- LDM setup which worked fine last week, now will not
>>> send/receive any files larger than 1292 bytes.
>>>
>>> Full story:
>>> We get data via LDM from another system. This setup/connection was
>>> working fine until last week. As far as we know, no changes were made --
>>> but obviously something has changed, because now we can't get data.
>>>
>>> Both ends are running ldm-6.13.11, which is recent and has been working
>>> well (except for pqact issues, which don't apply here).
>>>
>>> I see connectivity at both ends, and I have restarted and rebuilt the
>>> queues on both ends multiple times during troubleshooting.
>>>
>>> I have enabled traffic both ways, and can ldmping and run notifyme
>>> against the other machine's queue(s).
>>>
>>> Interestingly enough the issue seems to have something to do with
>>> filesize. In my testing I tried using ldmsend to send files to the
>>> downstream server. I have an "accept" line there, and I *AM* able to send
>>> files *IF* they are <1293 bytes. The downstream server receives data
>>> from many other servers, and many of the files it receives are larger than
>>> 1293 bytes.
>>>
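To pin down a size threshold like this one, a set of test files with exact byte counts on either side of the boundary is useful. This is a minimal sketch, assuming you only need files of known length to hand to ldmsend as in the commands later in this message; the helper name `write_test_files` and the `ALAtest_` prefix are illustrative choices, and the contents are random bytes so that each file differs from the others.

```python
import os
from pathlib import Path


def write_test_files(directory, sizes, prefix="ALAtest_"):
    """Create one file per requested size, filled with random bytes.

    Returns the list of created paths, in the same order as sizes.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for size in sizes:
        path = directory / f"{prefix}{size}"
        path.write_bytes(os.urandom(size))
        paths.append(path)
    return paths
```

For example, `write_test_files("/tmp/ldmtest", range(1290, 1296))` produces six files spanning the 1292/1293 boundary reported above.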
>>> Interestingly, smaller files make it through, but they take a
>>> surprisingly long time. For instance, a file of 1274 bytes can take more
>>> than a minute.
>>>
>>> When trying to send the larger file, there is nothing in the downstream
>>> logs, but the upstream logs show:
>>>
>>> 20210216T163901.154847Z dontpanic.nssl.noaa.gov(feed)[20925]
>>> up6.c:up6_run:445NOTE Starting Up(6.13.11/6): 20210216162900.110949
>>> TS_ENDT {{EXP, "/home/operator/ALAtest"}},
>>> SIG=d40ffc815fd74a96c2d7c726dc7012d3, Primary
>>> 20210216T163901.154950Z dontpanic.nssl.noaa.gov(feed)[20925]
>>> up6.c:up6_run:448NOTE topo: dontpanic.nssl.noaa.gov {{EXP, (.*)}}
>>> 20210216T164000.271093Z 140.172.25.37[20982]ldmd.c:cleanup:192NOTE
>>> Exiting
>>> 20210216T164001.213937Z dontpanic.nssl.noaa.gov(feed)[20925]
>>> ldmd.c:cleanup:192NOTE Exiting
>>>
>>> I tried setting up a second downstream system, but had the same
>>> results.
>>>
>>> I have also tried using ldmsend to send data, but again, the small files
>>> make it through, but larger packets fail. In verbose mode for ldmsend I
>>> see:
>>>
>>> ldmsend -xxx -h dontpanic.nssl.noaa.gov ALAtestfile7
>>> 20210216T164634.300292Z ldmsend[21540] error.c:err_log:236
>>> INFO Resolving dontpanic.nssl.noaa.gov to
>>> 140.172.25.37 took 0.000755 seconds
>>> 20210216T164634.329557Z ldmsend[21540] ldmsend.c:main:437
>>> DEBUG version 6
>>> 20210216T164634.359151Z ldmsend[21540]
>>> ldmsend.c:ldmsend:281 INFO Sending ALAtestfile7, 1293 bytes
>>> 20210216T164634.359234Z ldmsend[21540]
>>> LdmProxy.c:my_hereis_6:549 DEBUG Sending file via HEREIS_6
>>> 20210216T164734.361874Z ldmsend[21540]
>>> LdmProxy.c:getStatus:68 ERROR NULLPROC_6 failure to host
>>> "dontpanic.nssl.noaa.gov": RPC: Unable to receive; errno = Connection reset by peer
>>> 20210216T164734.361940Z ldmsend[21540]
>>> ldmsend.c:ldmsend:309 ERROR Couldn't flush connection
>>> 20210216T164734.362006Z ldmsend[21540] ldmsend.c:cleanup:82
>>> ERROR Message-queue isn't empty
>>>
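The timestamps in the ldmsend output are worth a second look: the "Sending file via HEREIS_6" line and the NULLPROC_6 error are almost exactly 60 seconds apart, which reads more like a timeout expiring than an immediate refusal (that interpretation is an inference from the log, not something the log states). A small sketch for computing such gaps, assuming the `YYYYMMDDTHHMMSS.ffffffZ` UTC format seen in these logs; both function names are hypothetical:

```python
from datetime import datetime, timezone


def parse_ldm_timestamp(stamp):
    """Parse a log timestamp such as '20210216T164634.359234Z' as UTC."""
    parsed = datetime.strptime(stamp, "%Y%m%dT%H%M%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def seconds_between(start, end):
    """Elapsed seconds from one log timestamp to another."""
    delta = parse_ldm_timestamp(end) - parse_ldm_timestamp(start)
    return delta.total_seconds()
```

Applied to the two lines in question, `seconds_between("20210216T164634.359234Z", "20210216T164734.361874Z")` gives the gap between the send and the failure.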
>>>
>>>
>>> --
>>> *"Outside of a dog, a book is a man's best friend. Inside of a dog,
>>> it's too dark to read."*
>>> *--Groucho Marx*
>>>
>>> -------------------------------------------
>>> Karen.Cooper@xxxxxxxx
>>>
>>> Phone#: 405-325-6456
>>> Cell: 405-834-8559
>>> National Severe Storms Laboratory
>>>
>>> _______________________________________________
>>> NOTE: All exchanges posted to Unidata maintained email lists are
>>> recorded in the Unidata inquiry tracking system and made publicly
>>> available through the web. Users who post to any of the lists we
>>> maintain are reminded to remove any personal information that they
>>> do not want to be made public.
>>>
>>>
>>> ldm-users mailing list
>>> ldm-users@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe, visit:
>>> https://www.unidata.ucar.edu/mailing_lists/
>>>
>