Client Disconnect after Rejoining Session with Steam

Note that this is only an issue when using the Steam online subsystem.

After successfully connecting to a session, disconnecting, and rejoining that same server, the client stays connected for about 10 seconds before disconnecting, reporting only that it lost connection to the server. Other players connected to the server are unaffected, and this only happens after the client has already disconnected from that server once.

It happens almost every time, and I’ve successfully recreated it in a clean project.
Additionally, this happens whether the session is handled with Blueprint nodes or manually in C++ through the session interface.

Steps to reproduce (in blueprint, for simplicity):

  1. Client A creates a new server via the Create Session BP node and travels to the map once Create Session succeeds.
  2. Client B searches for, finds, and joins that server, loading the map. The client can play for as long as it likes with no issues.
  3. Client B disconnects via the DestroySession node and returns to a menu state.
  4. Client B immediately repeats step 2 and reconnects to the server.
  5. Client B is connected, spawned, and visible to everyone for about 10 seconds.
  6. Client B loses connection to the server, with no further information given.

If step 6 does not happen, it almost always will by the third or fourth rejoin.
I looked into the Steam subsystem's source to see whether it was an issue with authenticating via Steam, but I couldn't find anything. All I can tell from the logs is that the server mysteriously decides to close the client's connection, resulting in the disconnect.

I’d also like to add that we’ve had this issue for as long as we can remember (at least since 4.7), but assumed it was simply us misusing sessions.

In a simplified test project, these are the log outputs with LogNet and LogOnline set to Verbose.
Note that in this example, it happened on the client's first reconnect.

Server

Client

If you need the test project, I can provide it. This is a crucial issue, as our game relies heavily on Steam and Steam sessions.

My initial guess is some bad interaction within the p2p socket code that is trying to clean up your previous disconnection.

Adding some more logging in bool FSocketSubsystemSteam::Tick(float DeltaTime) might help identify the issue.

The code relies on the socket receiving data so that P2PTouch can be called and keep the connection alive. When a player logs out, they get added to the DeadConnections list and are eventually removed. While in that list, new connections from that player shouldn't be accepted in AcceptP2PConnection. P2PRemove might be trying to clean up the new connection based on the existence of the old one. Figuring out the order of operations here would be helpful.
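To make the intended order of operations concrete, here is a minimal standalone sketch of the bookkeeping described above. It is a model only: P2PBookkeeping, its members, and the linger time are hypothetical names and values chosen for illustration, not the engine's actual FSocketSubsystemSteam code.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for the P2P bookkeeping; not the engine's real types.
struct P2PBookkeeping
{
    std::unordered_set<std::uint64_t> Accepted;      // live sessions, by Steam ID
    std::unordered_map<std::uint64_t, double> Dead;  // Steam ID -> seconds left in the dead list

    // A player who is still in the dead list is refused.
    bool Accept(std::uint64_t SteamId)
    {
        if (Dead.count(SteamId) > 0) return false;
        Accepted.insert(SteamId);
        return true;
    }

    // On logout the player moves from the accepted set to the dead list.
    void Logout(std::uint64_t SteamId, double LingerSeconds)
    {
        assert(LingerSeconds >= 0.0);
        Accepted.erase(SteamId);
        Dead[SteamId] = LingerSeconds;
    }

    // Each tick ages the dead entries and drops the expired ones.
    void Tick(double DeltaTime)
    {
        for (auto It = Dead.begin(); It != Dead.end();)
        {
            It->second -= DeltaTime;
            if (It->second <= 0.0) It = Dead.erase(It);
            else ++It;
        }
    }
};
```

In this model a rejoin is only refused while the old entry is still lingering, which is why it matters whether the player is still in DeadConnections at the moment they reconnect.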

When you reconnect quickly, do you see them in the DeadConnections list? Have they been completely cleaned up by then (i.e. has the server removed all traces of this player)?

If you take a much longer period of time to reconnect, it sounds like this problem isn't there? Or did I misread?

It's possible that the lower-level Steam p2p API is having trouble, but let's see what more logging in the engine reveals.

I inserted some log prints at the top of that tick, and it appears AcceptedConnections and DeadConnections are behaving as they should. By the time the second client reconnects, their connection is gone from DeadConnections.

Something interesting I found: inserting a breakpoint at P2PRemove just before the client is forcefully disconnected reveals that it might be the UNetConnection getting garbage collected:


Could this be the previously dead connection being GC'd, and because it belongs to the same client, it's disconnecting their new connection?

And to answer your question about whether the client waits to rejoin: so far in my tests, if they wait before rejoining (2 minutes or so), they can remain on the server without issue. How long they have to wait depends entirely on how long it takes until GC runs (assuming that's what's happening).

Alright, I think I've managed to fix this, but I'll post here just to make sure there aren't any unintended side effects.

In the Steam socket code, USteamNetConnection overrides the CleanUp() method and calls the relevant function for unregistering and removing the Steam P2P connection.

The problem is that this is called twice: once by the NetworkDriver's TickDispatch when it's found to be a straggling connection, and again when that USteamNetConnection is garbage collected. By the time the user rejoins, it's about time for that object to be garbage collected, and its cleanup destroys the same user's new Steam P2P connection.
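The double cleanup can be reproduced in a small standalone model. This is a sketch under the assumption that the P2P session is keyed by Steam ID rather than by connection object; P2PSessions and ModelConnection are hypothetical names, not engine types.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical model: sessions are keyed by Steam ID, not by connection object.
struct P2PSessions
{
    std::set<std::uint64_t> Open;
    void Remove(std::uint64_t SteamId) { Open.erase(SteamId); }
};

struct ModelConnection
{
    P2PSessions* Sessions;
    std::uint64_t SteamId;

    ModelConnection(P2PSessions& InSessions, std::uint64_t InSteamId)
        : Sessions(&InSessions), SteamId(InSteamId)
    {
        Sessions->Open.insert(SteamId);
    }

    // Unguarded: every call tears down whatever session this Steam ID has now.
    void CleanUp() { Sessions->Remove(SteamId); }
};

// Returns true if the rejoined player still has a session after the stale
// object's second CleanUp.
bool RejoinSurvivesSecondCleanUp()
{
    P2PSessions Sessions;
    ModelConnection Old(Sessions, 42);
    Old.CleanUp();                        // 1st call: connection closed
    ModelConnection New(Sessions, 42);    // same player rejoins
    assert(Sessions.Open.count(42) == 1); // the new session is open at this point
    Old.CleanUp();                        // 2nd call: stale object garbage collected
    return Sessions.Open.count(42) > 0;
}
```

In this model the function returns false: the second CleanUp on the stale object removes the session that now belongs to the new connection.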

So a simple fix was to ensure the Steam cleanup code is called exactly once:
(SteamNetConnection.cpp)

void USteamNetConnection::CleanUp()
{
	Super::CleanUp();

	/* Insert new code here */
	// Only unregister the Steam socket the first time we've been told to CleanUp
	if (bAlreadyCleaned) return;
	bAlreadyCleaned = true;
	/* End new code */

	if (!bIsPassthrough)
	{

So far in my testing this has fixed all the issues, and connections still appear to be removed properly, but I haven't done more extensive testing. Will this hold up in the long term? Would this be a proper fix?
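For reference, the effect of the guard can be checked in isolation. The following is a standalone sketch, not the engine class: GuardedConnection and the std::set standing in for Steam's P2P sessions are assumptions made for illustration, and only bAlreadyCleaned comes from the snippet above.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical model of a connection whose CleanUp is guarded by a flag.
struct GuardedConnection
{
    std::set<std::uint64_t>* OpenSessions; // stand-in for Steam's P2P sessions
    std::uint64_t SteamId;
    bool bAlreadyCleaned;

    GuardedConnection(std::set<std::uint64_t>& InSessions, std::uint64_t InSteamId)
        : OpenSessions(&InSessions), SteamId(InSteamId), bAlreadyCleaned(false)
    {
        OpenSessions->insert(SteamId);
    }

    void CleanUp()
    {
        // Only remove the session the first time we are told to clean up.
        if (bAlreadyCleaned) return;
        bAlreadyCleaned = true;
        OpenSessions->erase(SteamId);
    }
};

// Returns true if the rejoined player keeps their session when the stale
// object is cleaned up a second time.
bool GuardedRejoinSurvives()
{
    std::set<std::uint64_t> Sessions;
    GuardedConnection Old(Sessions, 42);
    Old.CleanUp();                        // connection closed
    assert(Sessions.count(42) == 0);
    GuardedConnection New(Sessions, 42);  // same player rejoins
    Old.CleanUp();                        // stale object garbage collected: no-op
    return Sessions.count(42) > 0;
}
```

The cost of this approach is one extra boolean per connection object, which has to be kept in sync with the real cleanup path.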

Do you think there is a fix for this that doesn't involve touching C++?
I have the same issue in a BP-only project. I could get into the code, but I'd rather stick with BPs (also to see to what extent they can be used).

Hi Foohy and ooParanoia,

Josh M is out of the office, but I'd like to get a report in for his return. Unfortunately, I can't reproduce this in a test project following the instructions above. Foohy, you mentioned you'd be able to share the test project. If you still have it, or wouldn't mind creating another for us, could you zip it up, upload it somewhere, and get me a download link? I'd really appreciate it. Thanks!

Yep, I've still got it. I've gone ahead and changed the appid to Spacewar's '480'.

Additionally, the test project is incredibly bare bones. I haven't bothered to set up proper replication on actors and characters when joining someone. The 'server' state is a specific map, and clients are returned to a 'menu' map when they're disconnected.

https://dl.dropboxusercontent.com/u/1179448/SteamSessionExample.

Thanks!

Thanks! I've created UE-24229 and included your project as an example. I'm still not able to reproduce it in a new project, but I can in yours. I suspect there's something I'm missing, but this should help Josh M locate the source of the issue. If he has anything to add about your suggested fix or any other information, we'll post an update here. Thanks again!

Hello Foohy and ooParanoia,

Since Josh has been out and we've had other users reporting the same problem, I went ahead and took a look. You are right about the problem, and your fix will work, although I've decided to fix it in a slightly different way. I changed FSocketSubsystemSteam::UnregisterConnection to the following:

void FSocketSubsystemSteam::UnregisterConnection(USteamNetConnection* Connection)
{
	check(!Connection->bIsPassthrough);

	FWeakObjectPtr ObjectPtr = Connection;
	int32 NumRemoved = SteamConnections.RemoveSingleSwap(ObjectPtr);

	// Don't call P2PRemove again if we didn't actually remove a connection. This
	// will get called twice: once when the connection is closed and once when the
	// connection is garbage collected. It's possible that the player who left rejoined
	// before garbage collection runs (their connection object will be different), so
	// P2PRemove would kick them from the session when it shouldn't.
	if (NumRemoved > 0 && Connection->RemoteAddr.IsValid())
	{
		FInternetAddrSteam& SteamAddr = (FInternetAddrSteam&)(*Connection->RemoteAddr);
		P2PRemove(SteamAddr.SteamId, SteamAddr.SteamChannel);
	}
}

Like I said, your fix is totally fine as well, but I prefer this one because it avoids adding redundant state to the USteamNetConnection object.
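The idea behind the registry-based check can be tried outside the engine. This is a simplified sketch, not the engine code: ModelSubsystem, RegistryConnection, and the std::vector/std::set containers are hypothetical stand-ins, and the local RemoveSingleSwap only mimics the "return how many entries were removed" behavior relied on above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

struct RegistryConnection { std::uint64_t SteamId; };

// Hypothetical stand-in for the socket subsystem's bookkeeping.
struct ModelSubsystem
{
    std::vector<const RegistryConnection*> Registered; // tracked connection objects
    std::set<std::uint64_t> OpenSessions;              // P2P sessions, by Steam ID

    void Register(const RegistryConnection& C)
    {
        Registered.push_back(&C);
        OpenSessions.insert(C.SteamId);
    }

    // Removes one matching entry by swapping it with the last; returns 0 or 1.
    int RemoveSingleSwap(const RegistryConnection* C)
    {
        auto It = std::find(Registered.begin(), Registered.end(), C);
        if (It == Registered.end()) return 0;
        std::iter_swap(It, Registered.end() - 1);
        Registered.pop_back();
        return 1;
    }

    // Only tear down the session if this object was still registered.
    void Unregister(const RegistryConnection& C)
    {
        if (RemoveSingleSwap(&C) > 0) OpenSessions.erase(C.SteamId);
    }
};

// Returns true if the rejoined player keeps their session when the stale
// object is unregistered a second time.
bool RegistryRejoinSurvives()
{
    ModelSubsystem Subsystem;
    RegistryConnection Old{42};
    RegistryConnection New{42};
    Subsystem.Register(Old);
    Subsystem.Unregister(Old);   // connection closed: session removed
    Subsystem.Register(New);     // same player rejoins with a new object
    Subsystem.Unregister(Old);   // stale object garbage collected: nothing removed
    assert(Subsystem.Registered.size() == 1);
    return Subsystem.OpenSessions.count(42) > 0;
}
```

Here the registry itself records whether a connection object has already been handled, so no extra flag is needed on the connection.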

If you decide to give my fix a shot and you run into any issues with it, please post again here. This should make it into 4.11.

Thanks again for all your research into this issue!

As a heads up: this has been fixed in our internal branch and should be included in a future release version of the engine. If you need the GitHub link for the source change before that, let me know and I'll post it here.

Is this still going to make it into 4.11?

Hi erebel55,

I believe this fix was included in the 4.11 branch, and you should be able to see it in 4.11 Preview 2 now if you'd like to test it out.

Is there any chance that this might be causing my problems with Steam as well?

I don't think so, as I already implemented your fix. However, I can't find another possible root of the problem…