Blueprint references to level Actors are lost when streaming the level out and back in

Hi,

We noticed that Level Blueprint references to Actors in that level are invalid after streaming the level out and back in. It happens only in multiplayer, on both server and client.

After digging into the code I noticed that normally FLinkerLoad::Preload() is called for both the Blueprint class object (an object of class “BlueprintGeneratedClass”) and the Blueprint object itself (an object of class “SomeClass_S_C”).

However, when running in multiplayer, FLinkerLoad::Preload() is called only for the Blueprint object. FLinkerLoad::Preload() is not called for the Blueprint class, so FLinkerLoad::FinalizeBlueprint() is never called, and as a result the proper Actor* pointers are never assigned to the references in the Blueprint.

Has anyone come across a similar issue? It may not be related to multiplayer, at least not directly.

best,
Klaudiusz

Hi,

There is a problem with variable replication in Level Blueprints that causes level Actor references to be invalid after a level unload/load. Our workaround is to not use replication on Level Blueprint variables.

Here are steps to reproduce it:

  1. Load a level MyMap_S whose Level Blueprint has a variable Variable_Name with Replication set to Replicated.
  2. Connect with a client.
  3. Unload level MyMap_S with that BP. → The replication system keeps references to Variable_Name and keeps the BP from unloading.
  4. Load level MyMap_S again. → FLinkerLoad::CreateExport() finds the level BP in memory and FLinkerLoad::RegenerateBlueprintClass() isn’t called.
  5. All Actor reference pointers are stale.

“Obj Refs Name=MyMap_S_C” lists a number of reference paths going through the network code and Variable_Name to the level blueprint:


[...]
[2092.23][824]LogReferenceChain: IpConnection /Engine/Transient.IpConnection_1->Driver
[2092.23][824]LogReferenceChain: IpNetDriver /Engine/Transient.IpNetDriver_2->UE4Editor-Engine.dll!UNetDriver::AddReferencedObjects() [f:\perforce2\ue4-main\engine\source\runtime\engine\private\networkdriver.cpp:2174]
[2093.38][824]LogReferenceChain: BoolProperty /Game/Maps/MyMap_S.MyMap_S_C:Variable_Name!->Outer
[2093.39][824]LogReferenceChain: (target) BlueprintGeneratedClass /Game/Maps/MyMap_S.MyMap_S_C

If it’s not forbidden to use replication on level blueprint variables, and it’s not blocked, it seems like a bug in UE4 4.16.

best,
Klaudiusz

Klaudiusz,

I’m sorry for the late reply here.

If it’s not forbidden to use replication on level blueprint variables, and it’s not blocked, it seems like a bug in UE4 4.16.

It’s definitely not forbidden. Level Blueprints are actual blueprints, and are wrapped / managed by an ALevelScriptActor. If you take a look at the constructor, you’ll notice that we do explicitly enable replication and set some other networking state.

We noticed that Level Blueprint references to Actors in that level are invalid after streaming the level out and back in. It happens only in multiplayer, on both server and client.

Just so I fully understand the problem, you have some level A that references actors. When the level is Unloaded (unstreamed) and Reloaded (restreamed) those references are broken.

I have a few questions, mostly just for the sake of clarity:

  1. Are the referenced Actors owned by the streamed level, or another level?
  2. If they’re owned by another level, is that other level being streamed in / out during the same time frame?
  3. Are the levels being streamed in and out on the client, server, or both?

After digging into the code I noticed that normally FLinkerLoad::Preload() is called for both the Blueprint class object (an object of class “BlueprintGeneratedClass”) and the Blueprint object itself (an object of class “SomeClass_S_C”).

In this case, are you talking about the BP of an actor, or the BP of the level that owns the references?

FLinkerLoad::Preload() is not called for the Blueprint class, so FLinkerLoad::FinalizeBlueprint() is never called, and as a result the proper Actor* pointers are never assigned to the references in the Blueprint.

So, it sounds like in this case FLinkerLoad::Preload is not being called on the Blueprint Class for the Level Script?

When we unload a level we go through and mark all Actors and Subobjects owned by the level as Pending Kill and perform a garbage collection. This system is completely separate from Networking, and if any system has references to these objects they should be properly nulled out regardless of whether or not the references are strong or weak (Strong being a UPROPERTY and weak being something like TWeakObjectPtr).

Aside from that, the networking system generally keeps very few strong references. The few things it will keep a reference to are generally other networking objects that shouldn’t be level specific. The main exception to this is UActorChannel, where we’ll keep a pointer to the Actor the channel is associated with. Again though, this should get cleared away by the Garbage Collection that happens as a part of unloading the level.

I’ll work on seeing if I can reproduce the issue based on the description you gave. I’m not sure how many others have seen this issue.

Thanks,
Jon

Hi Jon,

Just so I fully understand the problem, you have some level A that references actors. When the level is Unloaded (unstreamed) and Reloaded (restreamed) those references are broken.

That’s correct. References are only lost when a client is connected to the server in the process. The level is streamed out properly but its BP stays in memory, and obj refs shows references to it through the net code and a replicated variable.

Are the referenced Actors owned by the streamed level, or another level?

References are to Actors from the same level.

Are the levels being streamed in and out on the client, server, or both?

They are streamed in and out on both client and server, and the references are lost on both client and server.

In this case, are you talking about the BP of an actor, or the BP of the level that owns the references?

It’s the level BP.

So, it sounds like in this case FLinkerLoad::Preload is not being called on the Blueprint Class for the Level Script?

Yes, FLinkerLoad::Preload is not being called on the BP because FLinkerLoad::CreateExport exits early when it finds the BP in memory, here:

    UObject* ActualObjectWithTheName = StaticFindObjectFastInternal(NULL, ThisParent, Export.ObjectName, true);
    [...]
    if (ActualObjectWithTheName && (ActualObjectWithTheName->GetClass() == LoadClass))
    {
        Export.Object = ActualObjectWithTheName;
    }

    // Object is found in memory.
    if( Export.Object )
    {
        [...]
        return Export.Object;
    }
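
To make the effect of that early exit easier to see, here is a minimal standalone C++ sketch. It is not engine code: ToyClass, ToyLoader, ActorRefGeneration and HasFreshActorRefs are hypothetical names invented for illustration, and the "fix-up" is reduced to a counter. It only assumes what is described above, namely that a class still found in memory is returned as-is and the step that reassigns Actor pointers is skipped.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a Blueprint generated class (not an engine type).
struct ToyClass {
    std::string Name;
    int ActorRefGeneration = 0; // load number the actor pointers were fixed up for
};

// Hypothetical loader that models only the "found in memory" shortcut.
struct ToyLoader {
    std::unordered_map<std::string, std::shared_ptr<ToyClass>> InMemory;
    int CurrentLoad = 0;

    std::shared_ptr<ToyClass> CreateExport(const std::string& Name) {
        ++CurrentLoad;
        auto Found = InMemory.find(Name);
        if (Found != InMemory.end()) {
            return Found->second; // early exit: object reused, no fix-up
        }
        auto Fresh = std::make_shared<ToyClass>();
        Fresh->Name = Name;
        Fresh->ActorRefGeneration = CurrentLoad; // fix-up only on a fresh load
        InMemory[Name] = Fresh;
        return Fresh;
    }

    // Models a clean unload where nothing else keeps the class in memory.
    void Unload(const std::string& Name) { InMemory.erase(Name); }
};

// True when the class's actor references belong to the most recent load.
bool HasFreshActorRefs(const ToyLoader& Loader, const ToyClass& Class) {
    return Class.ActorRefGeneration == Loader.CurrentLoad;
}

// Reload without a real unload: the old class is reused and its refs are stale.
void RunReloadDemo() {
    ToyLoader Loader;
    auto First = Loader.CreateExport("MyMap_S_C");
    assert(HasFreshActorRefs(Loader, *First));
    auto Second = Loader.CreateExport("MyMap_S_C");
    assert(Second == First);
    assert(!HasFreshActorRefs(Loader, *Second));
}
```

Calling RunReloadDemo() from a main() exercises the stale case. Calling Unload() between the two CreateExport() calls in this toy model produces a fresh object instead, which corresponds to the single-player behaviour where the class really leaves memory.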

best,

Klaudiusz

Hi Jon,

Were you able to reproduce this bug? We prepared a repro project with video on vanilla 4.16.3 that has the same issue.

What you need to do:

  1. Create an empty project in vanilla 4.16.3
  2. Put packages from attached zip in Content folder
  3. On server, run the map TestCase_P
  4. On client connect to server
  5. On server walk into Load trigger, that streams in TestCase_SubLevel
  6. Walk into Unload trigger that streams out TestCase_SubLevel
  7. Again walk into Load trigger, that streams in TestCase_SubLevel
  8. Now walk into Test trigger and note that it isn’t working anymore.

The lost Actor references in TestCase_SubLevel are caused by the Level BP variable ProblemVariable on level TestCase_SubLevel: it has its Replication flag set to Replicated.

Hope this helps you track it down. It is a really serious bug and is making multiplayer scripting basically unusable, causing a lot of issues for us.

best,
Klaudiusz

Hi Jon,

What’s your progress with a fix for references in BP Class? We will need this fix anyway, because our workaround of not using replicated variables is not possible in some cases.

I’m going to look at it, but if you have any progress so far, please let me know.

best,
Klaudiusz

I can confirm that the fix is working for us on 4.17. References are not being lost. I will have an update when our QA tests the integration for any side effects.

Hi,

Unfortunately this patch introduced another bug during level transitions in multiplayer. The game crashes with the following callstack in FRepLayout.

FRepLayout::DestructProperties() [replayout.cpp:3539]
FRepChangelistState::~FRepChangelistState() [replayout.cpp:3598]
FReplicationChangelistMgr::~FReplicationChangelistMgr() [datareplication.cpp:359]
SharedPointerInternals::TReferenceControllerWithDeleter >::DestroyObject() [sharedpointerinternals.h:110]
TSet,TSharedPtr >,TDefaultMapHashableKeyFuncs,TSharedPtr,0>,FDefaultSetAllocator>::Remove() [set.h:614]
UNetDriver::TickFlush() [networkdriver.cpp:836]

Hi everyone,

I did some investigation on this issue while Jon is out of the office, and have shelved a different approach in 3762353. It focuses on clearing references to the level script actor and its blueprint class that were being held by the net driver even after streaming out the sub level.

Using the attached repro levels I was successfully able to stream in, out, and in again multiple times with my variable references intact. I was able to use both seamless and non-seamless server travels as well.

There may still be some issues under high packet lag/loss situations, but let me know if this helps.

Klaudiusz,

Thanks for the assets. Using them I was able to reproduce the issue in the newest version of the engine. I’m going to keep investigating today.

Thanks,
Jon

Klaudiusz,

Ok, so we’ve identified the problem, at least in the newest version of the engine. It’s the same issue, but it’s possible (although unlikely) that there are additional things happening on your end.

The Replication code always assumed that Classes (and Structs and Functions) would never be destroyed at runtime. The assumption was made because these types of things are really more like Meta Data descriptions of classes than actual gameplay objects. This is the assumption that’s causing the problem.

For every class, we create an FRepLayout that knows how the replicated properties are laid out in memory, and how to serialize / deserialize the replicated properties for networking.

The problem is that FRepLayout is storing hard references to the properties. I didn’t catch this at first, because it’s not happening through the normal property system. Instead, it’s happening through AddReferences.
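
As a rough illustration of that reference chain (and of the Obj Refs output earlier in the thread), here is a standalone C++ sketch using std::shared_ptr in place of the engine's garbage collector. None of this is engine code: ToyClass, ToyProperty, ToyRepLayout, ToyNetDriver and the ReferencedProperties member are hypothetical names, and the model only assumes what is stated above, that a per-class layout cached by the net driver holds hard references to properties whose Outer is the class.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the level's generated class.
struct ToyClass { std::string Name; };

// A property reaches its owning class through Outer, as in the Obj Refs chain.
struct ToyProperty {
    std::string Name;
    std::shared_ptr<ToyClass> Outer;
};

// Hypothetical per-class layout that holds hard references to properties.
struct ToyRepLayout {
    std::vector<std::shared_ptr<ToyProperty>> ReferencedProperties;
};

// Hypothetical driver that caches one layout per class name.
struct ToyNetDriver {
    std::unordered_map<std::string, ToyRepLayout> RepLayoutMap;
};

// Returns true if the class is still alive after the level lets go of it.
bool ClassSurvivesUnload(bool bDropLayoutFirst) {
    ToyNetDriver Driver;
    std::weak_ptr<ToyClass> Observer;
    {
        auto LevelClass = std::make_shared<ToyClass>();
        LevelClass->Name = "MyMap_S_C";
        Observer = LevelClass;

        auto Prop = std::make_shared<ToyProperty>();
        Prop->Name = "Variable_Name";
        Prop->Outer = LevelClass;
        Driver.RepLayoutMap[LevelClass->Name].ReferencedProperties.push_back(Prop);

        if (bDropLayoutFirst) {
            Driver.RepLayoutMap.erase(LevelClass->Name);
        }
    } // the level releases its own references here
    return !Observer.expired();
}

void RunReferenceChainDemo() {
    assert(ClassSurvivesUnload(false)); // cached layout keeps the class alive
    assert(!ClassSurvivesUnload(true)); // without the layout, the class goes away
}
```

In this toy model the class outlives the "unload" only because the driver's cached layout still points at one of its properties, which matches the path NetDriver → property → Outer → class shown in the log.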

Basically, the fix is to move these references from the Class level to the Object level and when the Class is destroyed just remove the FRepLayout.

We haven’t implemented a fix for this yet, but after discussion it doesn’t sound that bad. Basically, in FRepLayout change the Owner member to be a TWeakObjectPtr (so we know when the class gets destroyed) and move the AddReferences method onto FRepState (which is per object). Finally, to prevent memory leaks from repeatedly created FRepLayouts, occasionally clean out the UNetDriver::RepLayoutMap so that any RepLayouts whose Owner is null (i.e., the owner has been destroyed) get removed.
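
The shape of that proposal can be sketched in standalone C++, with std::weak_ptr standing in for TWeakObjectPtr. This is only an illustration under those assumptions, not the contents of any CL: ToyClass, ToyRepLayout, ToyNetDriver, GetLayout and PruneDeadLayouts are hypothetical names, and the per-object FRepState part of the plan is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the level's generated class.
struct ToyClass { std::string Name; };

// The layout only observes its owner, so it cannot keep the class alive.
struct ToyRepLayout {
    std::weak_ptr<ToyClass> Owner;
};

struct ToyNetDriver {
    std::unordered_map<std::string, ToyRepLayout> RepLayoutMap;

    // Finds or creates the layout for a class, rebinding a stale entry.
    ToyRepLayout& GetLayout(const std::shared_ptr<ToyClass>& Class) {
        ToyRepLayout& Layout = RepLayoutMap[Class->Name];
        if (Layout.Owner.expired()) {
            Layout.Owner = Class; // new entry, or the old owner was destroyed
        }
        return Layout;
    }

    // Removes layouts whose owner is gone; returns how many were removed.
    std::size_t PruneDeadLayouts() {
        std::size_t Removed = 0;
        for (auto It = RepLayoutMap.begin(); It != RepLayoutMap.end();) {
            if (It->second.Owner.expired()) {
                It = RepLayoutMap.erase(It);
                ++Removed;
            } else {
                ++It;
            }
        }
        return Removed;
    }
};

void RunPruneDemo() {
    ToyNetDriver Driver;
    auto Class = std::make_shared<ToyClass>();
    Class->Name = "MyMap_S_C";
    Driver.GetLayout(Class);

    std::size_t RemovedWhileAlive = Driver.PruneDeadLayouts();
    assert(RemovedWhileAlive == 0);      // owner alive, layout kept

    std::weak_ptr<ToyClass> Observer = Class;
    Class.reset();                       // level unload releases the class
    assert(Observer.expired());          // the layout did not keep it alive

    std::size_t RemovedAfterUnload = Driver.PruneDeadLayouts();
    assert(RemovedAfterUnload == 1);     // stale layout is cleaned out
    assert(Driver.RepLayoutMap.empty());
}
```

The periodic prune is what keeps the map from growing when the same level is streamed in and out repeatedly. How often it runs, and where the per-object references end up, are engine decisions this sketch does not try to model.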

I’ll try to have a CL for this by next week, but hopefully you can work on a solution in the interim with the info I’ve provided.

Thanks,
Jon

Klaudiusz,

Sorry I forgot to update this ticket. CL-3611087 has the changes mentioned. This has not been submitted yet, but I tested it against your case and it seemed to work.

Thanks,
Jon

Hi Jon,

Thanks a lot!

Unfortunately, I tried your fix by just integrating it over our current 4.16 codebase, and it causes the ReplicationChangeListMap cleanup in UNetDriver::TickFlush() to crash. I think there were just too many changes since 4.16 that the new code depends on. I’ll wait a few days until we have 4.17 integrated, then try my luck again. I’ll let you know.

best,
Klaudiusz

Klaudiusz,

I’ve attached three patch files that you can use to see just the isolated changes of the shelf. These shouldn’t have much (if any) impact on other parts of the replication system, so you shouldn’t experience crashes.

P4 doesn’t actually like / handle patches very well, so I generated these using a secondary tool. As such, you probably won’t be able to directly apply them (but they should give you context into what exactly you need).

Patches

Thanks,
Jon

In case any other user comes along these parts and has trouble applying this patch to NetworkDriver.cpp: that change goes into UNetDriver::ServerReplicateActors(float DeltaSeconds), just before TArray ConsiderList;.

Jon,

I confirm that we’re also seeing the crash in ReplicationChangeListMap cleanup that Klaudiusz mentions above. We’re also on 4.16.2, will try again in 4.17, which we are in the process of integrating.

Leszek,

Are the crashes still occurring with the patches?

As I pointed out above, the changes listed above (including the patches) are not final nor have they been pushed into the engine yet. To that end, they are “use at your own risk” sort of things at this point.

Thanks,
Jon

Yes, with the patches. I believe we followed precisely in the footsteps of Klaudiusz here: same problem, finding this thread, applying your patches, getting crashes.

Understood. Can we hope for an update in this thread once they do get pushed?

Once everything is submitted, I will update this thread.

Hi,
I can confirm that I am experiencing the same crash after a level restart (with the patch applied, of course).
The references in the blueprint are now valid after the restart, but the game crashes during the scheduled TickFlush.