Patrick raises a plausible scenario. (I don't think we move any data, even
across our local interfaces, that is large enough to benefit from or
require an MTU higher than 1500. However, if jumbo frames have been
configured somewhere unexpectedly, that can indeed cause issues.)
Both sides would need to have jumbo frames enabled, and every switch and
router along the path would also need to allow them. You can read the
MTU from ifconfig or similar. Just make sure you're looking at the right
interface, as many heavy-duty servers have two or more. For example:
evan@sanjose-ca-1:~$ ifconfig
> enp0s25 Link encap:Ethernet HWaddr [snip]
> [snip]
> UP BROADCAST RUNNING MULTICAST *MTU:1500* Metric:1
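
If it helps, here is a rough sketch of how the check could be scripted. It
assumes Linux (MTUs read from /sys/class/net) and the iputils ping, whose
"-M do" option forbids fragmentation. The function names and the host
remote.example.org are placeholders of mine, not anything from LDM. The
point of "-M do" is that a plain "ping -s 1500" may simply be fragmented and
still succeed, which would hide the problem.

```shell
#!/bin/sh
# Sketch only: assumes Linux sysfs paths and iputils ping flags.

# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header, no options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print "interface<TAB>mtu" for every interface that exposes an MTU.
list_mtus() {
    for dev in /sys/class/net/*; do
        if [ -r "$dev/mtu" ]; then
            printf '%s\t%s\n' "${dev##*/}" "$(cat "$dev/mtu")"
        fi
    done
}

# Print (not run) the ping command that sends a full-size packet for the
# given MTU to the given host with fragmentation prohibited.
# $1 = host (placeholder), $2 = MTU to test
probe_cmd() {
    echo "ping -M do -c 3 -s $(icmp_payload_for_mtu "$2") $1"
}

list_mtus
probe_cmd remote.example.org 1500
```

Run the printed ping command from each end toward the other. If it fails
while a much smaller size gets through, something along the path has a
smaller MTU than the endpoints expect.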
On Tue, Feb 16, 2021 at 9:47 AM Patrick Finnegan <vax@xxxxxxxxxx> wrote:
> This sounds like a link MTU size problem. Try a "ping -s 1500" between
> the two machines, and see if that works.
>
> It's likely that something is set to do jumbo frames (>1500 byte MTU) and
> something in the middle is limiting that to the standard 1500 byte MTU
> size. (or something like 1500 byte MTU size on a link that has VLAN tags
> or VPLS headers).
>
> Patrick Finnegan
> Data Center Architect
> Research Computing
> Purdue University
>
>
> On Tue, Feb 16, 2021 at 11:52 AM Karen Cooper - NOAA Affiliate via
> ldm-users <ldm-users@xxxxxxxxxxxxxxxx> wrote:
>
>> tl;dr -- LDM setup which worked fine last week, now will not
>> send/receive any files larger than 1292 bytes.
>>
>> Full story:
>> We get data via LDM from another system. This setup/connection was
>> working fine until last week. As far as we know, no changes were made --
>> but obviously something has changed, because now we can't get data.
>>
>> Both ends are running ldm-6.13.11, which is recent and has been working
>> well (except for pqact issues, which don't apply here).
>>
>> I see connectivity at both ends, and I have restarted and rebuilt the
>> queues on both ends multiple times during troubleshooting.
>>
>> I have enabled traffic both ways, and can ldmping and run notifyme
>> against the other machine's queue(s).
>>
>> Interestingly enough the issue seems to have something to do with
>> filesize. In my testing I tried using ldmsend to send files to the
>> downstream server. I have an "accept" line there, and I *AM* able to send
>> files *IF* they are <1293 bytes. The downstream server receives data
>> from many other servers, and many of the files it receives are larger than
>> 1293 bytes.
>>
>> Interestingly, smaller files make it through, but are taking a
>> significantly long time. For instance a file of 1274 bytes can take more
>> than a minute.
>>
>> When trying to send the larger file, there is nothing in the downstream
>> logs, but the upstream logs show:
>>
>> 20210216T163901.154847Z dontpanic.nssl.noaa.gov(feed)[20925]
>> up6.c:up6_run:445NOTE Starting Up(6.13.11/6): 20210216162900.110949
>> TS_ENDT {{EXP, "/home/operator/ALAtest"}},
>> SIG=d40ffc815fd74a96c2d7c726dc7012d3, Primary
>> 20210216T163901.154950Z dontpanic.nssl.noaa.gov(feed)[20925]
>> up6.c:up6_run:448NOTE topo: dontpanic.nssl.noaa.gov {{EXP, (.*)}}
>> 20210216T164000.271093Z 140.172.25.37[20982]ldmd.c:cleanup:192NOTE
>> Exiting
>> 20210216T164001.213937Z dontpanic.nssl.noaa.gov(feed)[20925]
>> ldmd.c:cleanup:192NOTE Exiting
>>
>> I tried setting up a second downstream system, but had the same results.
>>
>> I have also tried using ldmsend to send data, but again, the small files
>> make it through, but larger packets fail. In verbose mode for ldmsend I
>> see:
>>
>> ldmsend -xxx -h dontpanic.nssl.noaa.gov ALAtestfile7
>> 20210216T164634.300292Z ldmsend[21540] error.c:err_log:236
>> INFO Resolving dontpanic.nssl.noaa.gov to 140.172.25.37
>> took 0.000755 seconds
>> 20210216T164634.329557Z ldmsend[21540] ldmsend.c:main:437
>> DEBUG version 6
>> 20210216T164634.359151Z ldmsend[21540] ldmsend.c:ldmsend:281
>> INFO Sending ALAtestfile7, 1293 bytes
>> 20210216T164634.359234Z ldmsend[21540]
>> LdmProxy.c:my_hereis_6:549 DEBUG Sending file via HEREIS_6
>> 20210216T164734.361874Z ldmsend[21540]
>> LdmProxy.c:getStatus:68 ERROR NULLPROC_6 failure to host
>> "dontpanic.nssl.noaa.gov": RPC: Unable to receive;
>> errno = Connection reset by peer
>> 20210216T164734.361940Z ldmsend[21540] ldmsend.c:ldmsend:309
>> ERROR Couldn't flush connection
>> 20210216T164734.362006Z ldmsend[21540] ldmsend.c:cleanup:82
>> ERROR Message-queue isn't empty
>>
>>
>>
>> --
>> *"Outside of a dog, a book is a man's best friend. Inside of a dog, it's
>> too dark to read."*
>> *--Groucho Marx*
>>
>> -------------------------------------------
>> Karen.Cooper@xxxxxxxx
>>
>> Phone#: 405-325-6456
>> Cell: 405-834-8559
>> National Severe Storms Laboratory
>>
>> _______________________________________________
>> NOTE: All exchanges posted to Unidata maintained email lists are
>> recorded in the Unidata inquiry tracking system and made publicly
>> available through the web. Users who post to any of the lists we
>> maintain are reminded to remove any personal information that they
>> do not want to be made public.
>>
>>
>> ldm-users mailing list
>> ldm-users@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe, visit:
>> https://www.unidata.ucar.edu/mailing_lists/
>>