Why can't SwarmAgent share work with other Agents?

Hello people

I am having problems configuring Swarm Agent to run across multiple machines. First, my current setup:

  • PC A runs the Coordinator, the Agent, and UE4. It is connected to the router over LAN, has a static IP, and can see all other PCs. It runs Windows 8.1 Pro x64.
  • PC B runs the Agent. It is also connected to the router over LAN, has a static IP, and can see all other PCs. It runs Windows 8.1 x32.
  • PC C also runs the Agent. It is connected over WLAN, uses DHCP, and can see all other PCs. It runs Windows 8.1 x64.

I use PC A to work on code and maps. My problem is that I get a different error for each of the other machines:

With PC A and B both running and listed in the Coordinator, when I start building lighting I get the following output on PC A:

16:56:03: [Connect] Successfully opened a remote connection with INGRID-PC
16:56:03: [EndJobSpecification] Tried to begin the job on a valid remote connection, but failed
16:56:03: [CloseConnection] Closing connection 1ADC0C6B using handle 1ADC0C6B
16:56:03: [CloseConnection] Connection confirmed for disconnection 1ADC0C6B
16:56:03: [CloseConnection] Closing bi-directional remote connection (1ADC0C6B)
16:56:03: [CloseConnection] Connection disconnected 1ADC0C6B
16:56:03: [MaintainConnections] Remote connection has closed (1ADC0C6B)
16:56:03: [MaintainConnections] Removed connection 1ADC0C6B

But PC B gives me this output:

16:55:47: [Connect] Remote agent connection object obtained: OS-TERMINAL
16:55:47: [Connect] Remote agent connection confirmed: OS-TERMINAL
16:55:47: [Job] Accepted Job C47A5F45-4B7F7640-86BC5CB9-CE0C84C1
16:55:49: [MaintainConnections] Detected dropped remote connection, cleaning up (1ADC0C5D)
16:55:49: [CloseConnection] Closing connection 1ADC0C5D using handle 1ADC0C5D
16:55:49: [CloseConnection] Connection confirmed for disconnection 1ADC0C5D
16:55:49: [CloseConnection] Closing bi-directional remote connection (1ADC0C5D)
16:55:49: [CloseConnection] Connection disconnected 1ADC0C5D
16:55:50: [MaintainConnections] Remote connection has closed (1ADC0C5D)
16:55:50: [MaintainConnections] Removed connection 1ADC0C5D
16:55:50: [MaintainConnections] All connections have closed

It does not matter whether the firewall is enabled or disabled on both machines; it always happens, and both outputs repeat in the logs until Lightmass has completed.
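For anyone comparing the two sides of a failing connection, it can help to group the log lines by connection handle. Below is a minimal Python sketch, assuming the log format is exactly what is pasted above (`HH:MM:SS: [Tag] message`, with handles written as eight uppercase hex digits). The function names `parse_line` and `group_by_handle` are made up for illustration and are not part of Swarm.

```python
import re
from collections import defaultdict

# Matches lines such as "16:56:03: [Connect] Successfully opened ..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}): \[(\w+)\] (.*)$")

# Assumption: a connection handle is a standalone run of eight hex digits
# (e.g. 1ADC0C6B). The lookarounds skip the dash-separated parts of job IDs.
HANDLE_RE = re.compile(r"(?<![0-9A-F-])[0-9A-F]{8}(?![0-9A-F-])")


def parse_line(line):
    """Split one log line into time, tag, message and the handles it mentions."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    time, tag, message = match.groups()
    # dict.fromkeys removes duplicates while keeping first-seen order
    handles = list(dict.fromkeys(HANDLE_RE.findall(message)))
    return {"time": time, "tag": tag, "message": message, "handles": handles}


def group_by_handle(lines):
    """Return {handle: [(time, tag), ...]} for every handle seen in the lines."""
    grouped = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        for handle in entry["handles"]:
            grouped[handle].append((entry["time"], entry["tag"]))
    return dict(grouped)
```

Feeding the log from each machine through `group_by_handle` gives a short timeline per connection, which makes it easier to see which side closed first.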

If I try to use PC C for building lighting, I get the following output on PC A:

17:09:07: [Connect] Successfully opened a remote connection with OS-TABLET
17:09:08: [PushChannel] Pushing the channel has failed!

where the second line repeats indefinitely.
PC C, on the other hand, gives me this output:

17:12:26: [Connect] Remote agent connection object obtained: OS-TERMINAL
17:12:26: [Connect] Remote agent connection confirmed: OS-TERMINAL
17:12:26: [CloseConnection] Rejected, unrecognized or inactive connection (1ADC0C79)
17:12:26: [Job] Accepted Job E0720E6D-418C4B2C-C0EA1B98-48B4BCB5

I am really clueless as to why this happens. Disabling the firewall did not help, and switching to static IPs did not help either (all were dynamic at the beginning).

Lastly, I am attaching an image of the Swarm Agent configuration screen. All Agents have the same configuration (except for the cache folder).

I had this same problem and messed with it. I have 4 computers in my farm that worked great with UDK Swarm; they just didn’t want to work with this version. I even changed versions forward and back.

"[PushChannel] Pushing the channel has failed!"

Put these three files into Unreal Engine\4.0\Engine\Binaries\DotNET\ and your problem may be solved.

Alright, this patch only affects the connection between A and C (B is not affected at all).
The current problem is that C seems to detect a dropped connection and then terminates the job locally.
I am attaching both logs in a .zip file, and here is a screenshot of the agent on PC A:

I have read the information about your computers.
UnrealLightmass does not support x32 (32-bit) systems.
Some wireless routers have problems with communication between computers.
So let’s swap PC B and C and try again.

Having the same thing on an internal network. All the machines can see each other. All can run swarm on their own UE4 projects. The coordinator schedules jobs. But all jobs block with: [PushChannel] Pushing the channel has failed!

I can solve this now.
Hold on, the guy below has a half-right answer. Let me get it together; give me two minutes.

Okay, the real answer.

Put the three autoreporter files in the folder Unreal Engine\4.X\Engine\Binaries\DotNET\

Put the UnrealLightmass file in the folder Unreal Engine\4.X\Engine\Binaries\Win64

The link below points to Dropbox for the files.

Please vote this answer up if it works for you.

In fact, I found the same solution while answering another similar question: cluster problem with swarm agent and two pc - AnswerHub - Unreal Engine Forums

I gave up on that setup and instead started using a rented server that I communicate with over VPN. It threw the same error as before, but that file solved it in this case. Maybe I will test the original setup again, but for now this works.
(The server is running Windows Server 2012 R2, CPU: Intel Xeon E31230, 16 GB RAM.)

Tried the above setup. The machines get registered with the coordinator but the jobs still don’t get distributed. A custom build of the Swarm tool isn’t required, is it?

Do any ports need to be open between machines to allow this to happen?

To add: I can’t ping any external machines from a client, but they are all listed in the coordinator.
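Since ping uses ICMP, a failed ping does not by itself show that TCP traffic between the machines is blocked. A quick way to test a specific TCP port from a client is a sketch like the one below. The helper name `can_connect` is made up, and the host and port are placeholders: nothing in this thread says which ports Swarm uses, so fill in whatever your setup is configured for.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


# Example usage (hypothetical machine name and port, replace with your own):
# print(can_connect("OS-TERMINAL", 12345))
```

If this returns False with the firewall enabled and True with it disabled, the firewall rules are the first place to look.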

Our current lighting builds are taking 12 hours, which isn’t ideal from a design iteration standpoint!

Hello Naughty Spirit,

I have the same problem:
Detected dropped remote connection, cleaning up
Have you found an answer?

I use the 4.5.1 version.

I am replying to myself in case somebody comes here and sees the same behavior.
My computers are connected to a switch, which is connected to the router (Livebox Play). If I disconnect the Livebox, so that it becomes a purely local network, all the “detected dropped remote connection” problems disappear, as does the other problem of “DNS Mismatch”.
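If you suspect the router is handing out inconsistent names, one general check is to resolve a machine name to an IP and then resolve that IP back to a name. The Python sketch below does that. I do not know what Swarm actually checks before it reports “DNS Mismatch”, so treat this only as a generic diagnostic; `check_name_roundtrip` is a made-up helper name, and comparing only the short host name is my own assumption.

```python
import socket


def check_name_roundtrip(hostname,
                         forward=socket.gethostbyname,
                         reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> name and report whether the names agree."""
    result = {"hostname": hostname, "ip": None,
              "reverse_name": None, "consistent": False}
    try:
        result["ip"] = forward(hostname)
    except OSError:
        return result  # forward lookup failed
    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        result["reverse_name"] = reverse(result["ip"])[0]
    except OSError:
        return result  # reverse lookup failed

    def short(name):
        # Assumption: compare only the part before the first dot, ignoring case
        return name.split(".")[0].lower()

    result["consistent"] = short(result["reverse_name"]) == short(hostname)
    return result


# Example usage with a machine name from this thread:
# print(check_name_roundtrip("OS-TERMINAL"))
```

Running it once with the router connected and once without shows whether the name resolution itself changes between the two setups.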

I also had a problem with “Detected dropped remote connection, cleaning up”, and for me it was solved by changing a setting in the DeveloperSettings tab of the Swarm Agent:

  • On the “Settings” tab, change “ShowDeveloperMenu” to “true”
  • Then, on the now-visible “DeveloperSettings” tab, in the “Distribution Settings” section, change “UpdateAutomatically” to “false”

I changed this setting in the Swarm Agent on both the Coordinator machine and the remote PC.

There are still other problems with other agents, but none that are relevant to this post. Baby steps…