Swarm Coordinator is not coordinating

I’ve set up Swarm and Coordinator to run with my computer (GOG) and my brother’s (BOB). This is the setup I used on both machines:

Apparently it worked. If I hit Network --> Ping Remote Agents, this is the output (all green text):

16:57:06: [Network] Pinging remote agents...
16:57:06: [Network] Remote Agent ping complete

Network --> Ping Coordinator from the client machine (BOB) also works.

But when I build a map…

Only the local machine works. The other stays available but unassigned.

Why?

Hi Jefferson,

I think tasks will only be distributed if they’re sufficiently large. Can you try to start a large job that would take several minutes on your local machine?

You can also enable verbose logging in the settings. Perhaps that will provide some information about what is happening. Your settings look ok.

I tried a High quality build now on a complex map (build takes 5 minutes on Preview quality on my machine). After about 10 minutes, it’s still working alone, it’s at 3.26% and BOB (the remote machine) is still “Available, unassigned”.

Attached the log with verbosity = Verbose. I couldn’t identify any problem. Logging with SuperVerbose now.

SuperVerbose log now, with quality set to Production. The host is still working alone, and that task is certainly large (0.44% after 6 minutes).

I also set SuperVerbose logging on the other machine (BOB); all it logs is “Determined that we shouldn’t ping the coordinator right now” and a cache-clearing message.

Oh, wait. There’s something more on the log file saved in SwarmCache/Logs:

[PostJobStatsToDB] Database error:
[PostJobStatsToDB] A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)

Here are the logs.

Hmm, that’s odd. That SQL Server message doesn’t matter - it’s only for stats reporting, which you don’t have to use.

Let’s do a quick sanity check. Can the two machines resolve each other’s names and ping each other by hostname from the command line, i.e. on BOB do: Command Prompt → “ping GOG”, and on GOG do Command Prompt → “ping BOB”.

Also make sure that you don’t have any personal firewalls on those computers blocking incoming/outgoing network connections.
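
The name-resolution part of this check can also be scripted. Below is a small Python sketch using only the standard library. The helper name resolve_addresses is hypothetical, and GOG and BOB are simply the machine names from this thread.

```python
import socket


def resolve_addresses(hostname):
    """Return the sorted, unique IP addresses that hostname resolves to.

    Returns an empty list if the name cannot be resolved at all.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_addresses("GOG") on BOB, and resolve_addresses("BOB") on GOG, lists every address the name maps to, which shows whether a name resolves to IPv4, IPv6, or both. It only tests resolution, not reachability, so it complements the ping rather than replacing it.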

Yes, ping to machine name works. It resolves to an IPv6 address.

We had Windows Firewall only, with SwarmAgent allowed. We tried disabling Win Firewall though, same issues.

We can also access each other’s shared folders normally.

Finally, we tried entering our IPv4 addresses instead of machine names. Still nothing.

But then… I see this on my brother’s log:

 ...... initializing connection to SwarmCoordinator
 ......... using SwarmCoordinator on GOG
[PingRemoteHost] Successfully pinged GOG with GOG
[Ping] Communication with the coordinator failed, job distribution will be disabled until the connection is established
Exception details: System.Net.Sockets.SocketException (0x80004005): No connection could be made because the target machine actively refused it 192.168.0.110:8009
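
The “actively refused” error above means that nothing accepted a TCP connection on 192.168.0.110:8009 at that moment. Here is a minimal Python sketch for reproducing that check outside Swarm, assuming only the address and port shown in the log. The helper name can_connect is made up for illustration and is not part of Swarm.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Running can_connect("192.168.0.110", 8009) on BOB while the coordinator is running on GOG should return True if the port is reachable. False would be consistent with the exception in the log, although it does not tell you whether a firewall or a missing listener is the cause.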

I went into my router’s config and added a port forward for port 8009 to my computer (192.168.0.110). My brother’s log now reads:

 ...... initializing connection to SwarmCoordinator
 ......... using SwarmCoordinator on 192.168.0.110
[PingRemoteHost] Successfully pinged 192.168.0.110 with 192.168.0.110
 ......... SwarmCoordinator successfully initialized

But it still won’t distribute! My brother’s machine is still “Available, unassigned”.

Attached my brother’s logs.

The opposite is also true. If my brother starts a lighting build, it reads here on my machine’s coordinator:

BOB (my brother): Working for BOB, assigned to BOB

GOG (my comp): Available, unassigned

Are both computers on the same network? Can you tell me more about how they are connected? You shouldn’t have to forward ports on your router if they’re on the same subnet.

We are connected via wireless to the same router. The router and both network cards are 802.11n. The router’s IP is 192.168.0.1, and our IPs are fixed at 192.168.0.110 and .111, subnet mask 255.255.255.0.
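
As a quick sanity check of the addressing described above, the same-subnet question can be answered with Python’s standard ipaddress module. This is only a sketch, and same_subnet is a hypothetical helper name. The addresses and mask are the ones quoted in this post.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """Return True if both addresses fall in the same network for netmask."""
    # strict=False lets ip_network accept a host address and mask off
    # the host bits, yielding the containing network.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b


print(same_subnet("192.168.0.110", "192.168.0.111", "255.255.255.0"))  # True
```

Both machines land in 192.168.0.0/24, so traffic between them should stay on the local network rather than being routed.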

Ah, wireless. Does your router have an option to isolate wireless clients from each other? On many consumer grade routers this is enabled by default to protect wireless clients from each other and to reduce network traffic over the radio.

As a quick test, you could try to connect the computers to the router using Ethernet cables. I bet the problem goes away then.

What is the brand and model of the router?

The isolation option is already off (“WLAN Partition” on my router).

My router is a DLINK DIR-615.

I’ll look for cables and see how it goes.

Ahhh, it still refuses to work. :(

I’ve connected my computer directly to my brother’s with a cable, and checked that it was working (accessed shared folders normally). But when I hit build, same result.

P.S. I disabled the wireless cards before doing that.

I’ll try connecting through the router now, with wires.

I don’t know if name resolution is going to work properly for anything other than Windows networking if you connect the PCs directly. When you connect through the router, can you also try to enter the IP addresses in the settings? Also try “*” for AllowedRemoteAgentNames.

Also try the Network → Ping Remote Agents feature in SwarmAgent and see what you get in the log. It should say something like:

1:16:39 PM: [Network] Pinging remote agents...
1:16:39 PM:     RENDER-01 (User = BUILDMACHINE, IP = 10.1.10.145, Version = 1.7.4718.0) is currently Available, Unassigned
1:16:39 PM:     RENDER-02 (User = BUILDMACHINE, IP = 10.1.10.146, Version = 1.7.4718.0) is currently Available, Unassigned

Connecting with cables to the router didn’t work either. I had also tried using the other computer as the coordinator.

I’ll try your latest 2 suggestions now.

I set CoordinatorRemotingHost to 127.0.0.1 on the coordinator machine and to 192.168.0.110 on the other machine, and set AllowedRemoteAgentNames to * on both. Same problem.

This is what I get from Ping Remote Agents (using SuperVerbose):

15:45:35: [Network] Pinging remote agents...
15:45:35: [Network] Remote Agent ping complete

Same thing on the other computer.

Hmm, it looks like the agent still isn’t able to see the other machine. SwarmAgent is running on both computers, yes?