Supporting PIE with online subsystems

anonymous_user_735857e9 · May 6, 2016, 1:40pm

We’re working on trying to make a custom online subsystem work with PIE. UE4 has basic support for this, primarily through Online::GetSubsystem which takes a world pointer; with UE_EDITOR compiled in this generates a world specific identifier. The idea is that the factory then creates a new online subsystem which is used for that PIE session.

While this setup exists in theory, in practice it doesn’t work. All existing subsystems, aside from the Null one, have an internal singleton. If a subsystem tries to get created again via CreateSubsystem, an error is logged and nothing is created. For some subsystems this is the right thing to do (for example, Steam only functions as a singleton), but for our system, and others that can support multiple contexts, having more than one active would work just fine.

Once you create a new subsystem that does allow multiple instances, it quickly becomes noticeable that this feature hasn’t been used. There are still a fair few places in UE that call IOnlineSubsystem::Get(), which returns the global subsystem, not the world-specific one. To a large extent this is fixable by changing the call and passing in GetWorld() wherever ::Get is currently called from a UObject-derived class (for example in UGameInstance::Init).

The question comes down to whether there is an architectural desire to make online subsystems work properly for PIE. The commonly used parameterless IOnlineSubsystem::Get() should no longer be allowed, to avoid calling mistakes, and interfaces allocated from an online subsystem (which get passed the owning subsystem) should use that owning subsystem pointer rather than a global Get, to ensure that the correct instance is called. Since this involves quite a few changes to the existing codebase, it is not something that we can easily support as a 3rd party.

Related: in writing an initial implementation, we hit a problem with FOnlineSubsystemModule::GetOnlineSubsystem and how it allocates new subsystem instances. The current subsystems (aside from Null) don’t run into this, since they are all effectively singletons and will fail to create more than one subsystem. The issue is that when CreateSubsystem is called for a specific instance name, the new instance is not immediately added to the FOnlineSubsystemModule::OnlineSubsystems map. Because CreateSubsystem doesn’t only allocate but also calls Init on the newly allocated subsystem, any code that calls Online::GetSubsystem or similar before the Init call completes will result in FOnlineSubsystemModule::GetOnlineSubsystem allocating another instance of the subsystem with the same name. When called from the same thread this is quite noticeable, as the code recurses until a stack overflow occurs. But if Init registers callbacks with the operating system (or other systems), those callbacks can fire and trigger a subsystem creation (with the same instance name) from a different thread.

There are a few possible solutions to this. Init could be called only after the subsystem has been added to the map, but that risks the subsystem being used before it is initialised. An improvement there is to have some kind of flag that gets set on Init completion, and to raise an error if GetOnlineSubsystem is called for an instance that is already registered in the map but hasn’t completed initialisation yet. Another option is to have a clear post-init method from which callbacks etc. have to be registered.

Either of those solutions requires changes to FOnlineSubsystemModule::GetOnlineSubsystem, so it is likely something that has to be driven by Epic.

Crzyhomer · May 27, 2016, 5:54pm

That was quite a bit there, so bear with me, we may have to have a little back and forth.

This code works fine in PIE with one of our custom OSS for backend services, although your comment about the older Get() function not taking a UWorld is a known pitfall. There are a few places where it wasn’t easy to retrofit UWorld access and so it remains, but everywhere else has been cleaned up.

The naming convention for getting an online subsystem is a little obscure. It fits this format.

InstanceName:PlatformName where both sides can be optionally present.

InstanceName ← instance name, default platform specified in ini
InstanceName:Steam ← instance name, Steam platform
:Steam ← default name, Steam platform
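As a rough illustration of that format, here is a standalone C++ sketch of how such an identifier could be split. It is not the engine's parser: ParsedIdentifier and ParseSubsystemIdentifier are hypothetical names, and the default instance and platform are passed in by the caller rather than read from an ini file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: both halves of "InstanceName:PlatformName".
struct ParsedIdentifier {
    std::string InstanceName;
    std::string PlatformName;
};

// Splits an identifier, falling back to the supplied defaults for any
// half that is missing or empty.
ParsedIdentifier ParseSubsystemIdentifier(const std::string& Identifier,
                                          const std::string& DefaultInstance,
                                          const std::string& DefaultPlatform) {
    assert(!DefaultInstance.empty() && !DefaultPlatform.empty());

    ParsedIdentifier Result{DefaultInstance, DefaultPlatform};
    const std::size_t Colon = Identifier.find(':');

    // Everything before the colon (or the whole string) is the instance name.
    const std::string Instance = Identifier.substr(0, Colon);
    if (!Instance.empty()) {
        Result.InstanceName = Instance;
    }

    // Everything after the colon, if present, is the platform name.
    if (Colon != std::string::npos) {
        const std::string Platform = Identifier.substr(Colon + 1);
        if (!Platform.empty()) {
            Result.PlatformName = Platform;
        }
    }
    return Result;
}
```

For example, with placeholder defaults "DefaultInstance" and "Null", the input ":Steam" yields the default instance on the Steam platform, and "Context_1" yields instance Context_1 on the default platform.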

In non PIE most of this is moot, except for the platform name, and is actually optimized out.

It is true that any non-default OSS is initialized at the first call site; the default platform is initialized very early in engine loading (so Steam can hook D3D/keyboard, for example). All calls to Online::GetSubsystem(Identifier) will return the same OSS instance every time. I’m not sure I understand how it could possibly create a duplicate instance.

When a new OSS module is loaded, its job is to register its factory with the main OnlineSubsystem module. The main module then finds the factory, creates an OSS and stores it in a TMap of [FName, IOnlineSubsystem*]. I noticed recently that this “key name” isn’t set on the instance itself, but it hasn’t been an issue yet and is something I’ll address in the future.
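The register-then-look-up flow can be sketched in standalone C++ roughly as follows. This is a toy model, not the engine's module: ToySubsystem and ToySubsystemModule are hypothetical, the method names are only loosely modeled on the ones mentioned in this thread, and std::map with std::string keys stands in for the TMap keyed by FName. Here the toy subsystem does record its own key parts, purely for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical subsystem that remembers which key it was created for.
struct ToySubsystem {
    std::string PlatformName;
    std::string InstanceName;
};

// Hypothetical main module: owns the factories and the created instances.
class ToySubsystemModule {
public:
    using Factory =
        std::function<std::unique_ptr<ToySubsystem>(const std::string&)>;

    // Called by a platform module when it is loaded.
    void RegisterPlatformService(const std::string& PlatformName,
                                 Factory InFactory) {
        assert(InFactory && "a platform must supply a factory");
        Factories[PlatformName] = std::move(InFactory);
    }

    // Returns the cached instance for this key, creating it on first use.
    ToySubsystem* GetOnlineSubsystem(const std::string& InstanceName,
                                     const std::string& PlatformName) {
        const std::string Key = InstanceName + ":" + PlatformName;
        auto Existing = Subsystems.find(Key);
        if (Existing != Subsystems.end()) {
            return Existing->second.get();
        }

        auto FactoryIt = Factories.find(PlatformName);
        if (FactoryIt == Factories.end()) {
            return nullptr;  // no module registered for this platform
        }

        std::unique_ptr<ToySubsystem> Created = FactoryIt->second(InstanceName);
        if (!Created) {
            return nullptr;
        }
        ToySubsystem* Raw = Created.get();
        Subsystems.emplace(Key, std::move(Created));
        return Raw;
    }

private:
    std::map<std::string, Factory> Factories;
    std::map<std::string, std::unique_ptr<ToySubsystem>> Subsystems;
};
```

In this model, two lookups with the same instance and platform names return the same pointer, while a different instance name makes the factory run again and produces a separate subsystem.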

I’ve found no infinite recursion issues, duplicate instance issues, or other problems that you describe. I’d need to see specific repro examples to better understand what you mean.

In PIE, the editor has its own OSS (a side effect of initializing the engine, not really used), but every PIE instance, if credentials are specified, will call LoginPIEInstance before actually creating the PIE world. There will be one complete OSS per PIE instance. It is easier for a single instance to manage a single screen of players (typically 1, but splitscreen should theoretically work).

There were a lot of comments about things not working. Do you mind breaking them out for me? This code is used daily by all our internal teams with no problems.

Steam is special because it doesn’t allow more than one client per machine (I’ve tried virtualization, sandboxing, etc. and it just doesn’t work consistently).

Console platforms don’t work as there isn’t really “PIE Console”.

NULL OSS should work just fine, creating 1 instance per PIE instance.

anonymous_user_735857e9 · May 27, 2016, 6:05pm

Hey,

My main question was about the older Get functions not taking a world. For robust PIE support they should all be moved over and the old Get that doesn’t take a world should stop existing. From your reply I get that this is the longer term goal, but I suspect that beyond the initial work done right now I shouldn’t expect more changes? I guess we could always modify logic on our side and send code upstream? I would have to do a more targeted search to see how many Get() calls are still left in the code base, but iirc there were enough to make it a bit cumbersome.

And as you said, the reason why those still exist is likely because it wasn’t trivial to get to a correct world.

The NULL one should work; it’s the only one that I can see that doesn’t use a singleton internally at the moment. The only exception is where Get is called without a world, which would just return the primary/initial subsystem.

The duplicate instance can come from Online::GetSubsystem(Identifier) being called recursively on the first call. It’s a bit of an odd pattern, but with fairly nested systems I’ve seen this:

Online::GetSubsystem(‘myidentifier’)

This ends up calling ::Init for that identifier, which starts calling a bunch of initialisation logic, but eventually one of the internal classes in the online subsystem wants a pointer to the subsystem, so it calls Online::GetSubsystem(‘myidentifier’) again. Bad stuff happens.

Obviously this is a bug in the client code, but it is easily created and not detected/disallowed by GetSubsystem right now.