Blueprint references to level Actors are lost when streaming out and back in the level

Hi,

We noticed that Level Blueprint references to Actors on that level are invalid after streaming out and streaming back in that level. It happens only in multiplayer on both server and client.

After digging into the code I noticed that normally FLinkerLoad::Preload() is called for both Blueprint class object (object of a class "BlueprintGeneratedClass") and a Blueprint object itself (object of a class "SomeClass_S_C").

However when running in multiplayer, FLinkerLoad::Preload() is called only for a Blueprint object. So FLinkerLoad::Preload() is not called for a Blueprint class and FLinkerLoad::FinalizeBlueprint() is never called which in effect doesn't assign proper Actor* pointers to references in Blueprint.

Has anyone come across a similar issue? It may not be related to multiplayer, at least not directly.

best, Klaudiusz


asked Sep 11 '18 at 08:54 PM in Blueprint Scripting

Answers.Archive STAFF

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hi,

There is a problem with variable replication in level blueprints that causes level Actor references to be invalid after a level unload/load. Our workaround is to not use Replication on Level Blueprint variables.

Here are steps to reproduce it:

  1. Load a Level MyMap_S with a Level Blueprint which has variable Variable_Name that has Replication set (to Replicated).

  2. Connect with a client.

  3. Unload level MyMap_S with that BP. -> Replication system keeps references to Variable_Name and keeps a BP from unloading.

  4. Load level MyMap_S again. -> FLinkerLoad::CreateExport() loads a level BP from memory and FLinkerLoad::RegenerateBlueprintClass() isn't being called.

  5. All Actor reference pointers are stale

"Obj Refs Name=MyMap_S_C" lists a number of reference paths going through network code and Variable_Name to the level blueprint:


[...]
[2092.23][824]LogReferenceChain: IpConnection /Engine/Transient.IpConnection_1->Driver
[2092.23][824]LogReferenceChain: IpNetDriver /Engine/Transient.IpNetDriver_2->UE4Editor-Engine.dll!UNetDriver::AddReferencedObjects() [f:\perforce2\ue4-main\engine\source\runtime\engine\private\networkdriver.cpp:2174]
[2093.38][824]LogReferenceChain: BoolProperty /Game/Maps/MyMap_S.MyMap_S_C:Variable_Name!->Outer
[2093.39][824]LogReferenceChain: (target) BlueprintGeneratedClass /Game/Maps/MyMap_S.MyMap_S_C

If it's not forbidden to use replication on level blueprint variables, and it's not blocked, it seems like a bug in UE4 4.16.

best, Klaudiusz


Klaudiusz,

I'm sorry for the late reply here.

If it's not forbidden to use replication on level blueprint variables, and it's not blocked, it seems like a bug in UE4 4.16.

It's definitely not forbidden. Level Blueprints are actual blueprints, and are wrapped / managed by an ALevelScriptActor. If you take a look at the constructor, you'll notice that we do explicitly enable replication and set some other networking state.

We noticed that Level Blueprint references to Actors on that level are invalid after streaming out and streaming back in that level. It happens only in multiplayer on both server and client.

Just so I fully understand the problem, you have some level A that references actors. When the level is Unloaded (unstreamed) and Reloaded (restreamed) those references are broken.

I have a few questions, mostly just for the sake of clarity:

  1. Are the referenced Actors owned by the streamed level, or another level?

  2. If they're owned by another level, is that other level being streamed in / out during the same time frame?

  3. Are the levels being streamed in and out on the client, server, or both?

After digging into the code I noticed that normally FLinkerLoad::Preload() is called for both Blueprint class object (object of a class "BlueprintGeneratedClass") and a Blueprint object itself (object of a class "SomeClass_S_C").

In this case, are you talking about the BP of an actor, or the BP of the level that owns the references?

FLinkerLoad::Preload() is not called for a Blueprint class and FLinkerLoad::FinalizeBlueprint() is never called which in effect doesn't assign proper Actor* pointers to references in Blueprint.

So, it sounds like in this case FLinkerLoad::Preload is not being called on the Blueprint Class for the Level Script?

When we unload a level we go through and mark all Actors and Subobjects owned by the level as Pending Kill and perform a garbage collection. This system is completely separate from Networking, and if any system has references to these objects they should be properly nulled out regardless of whether or not the references are strong or weak (Strong being a UPROPERTY and weak being something like TWeakObjectPtr).

Aside from that, the networking system generally keeps very few strong references. The few things it will keep a reference to are generally other networking objects that shouldn't be level specific. The main exception to this is UActorChannel, where we'll keep a pointer to the Actor the channel is associated with. Again though, this should get cleared away by the Garbage Collection that happens as a part of unloading the level.

I'll work on seeing if I can reproduce the issue based on the description you gave. I'm not sure how many others have seen this issue.

Thanks, Jon


answered Sep 11 '18 at 08:54 PM


avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hi Jon,

Just so I fully understand the problem, you have some level A that references actors. When the level is Unloaded (unstreamed) and Reloaded (restreamed) those references are broken.

That's correct. References are only lost when a client is connected to the server in the process. The level is streamed out properly, but its BP stays in memory, and Obj Refs points to references held through net code and a replicated variable.

Are the referenced Actors owned by the streamed level, or another level?

References are to Actors from the same level.

Are the levels being streamed in and out on the client, server, or both?

They are streamed in and out on both client and server and lost references are on both client and server.

In this case, are you talking about the BP of an actor, or the BP of the level that owns the references?

It's the level BP.

So, it sounds like in this case FLinkerLoad::Preload is not being called on the Blueprint Class for the Level Script?

Yes, FLinkerLoad::Preload is not being called on BP because FLinkerLoad::CreateExport early exits when it finds a BP in memory, here:

 UObject* ActualObjectWithTheName = StaticFindObjectFastInternal(NULL, ThisParent, Export.ObjectName, true);
 [...]
 if (ActualObjectWithTheName && (ActualObjectWithTheName->GetClass() == LoadClass))
 {
     Export.Object = ActualObjectWithTheName;
 }

 // Object is found in memory.
 if( Export.Object )
 {
 [...]
     return Export.Object;
 }
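
The early exit described above can be illustrated outside the engine. What follows is a minimal standalone C++ sketch, not engine code: `ToyClass`, `ToyLoader`, `FixupGeneration`, and `ReloadedGeneration` are hypothetical names made up for illustration, and `std::shared_ptr` only stands in for whatever keeps the Blueprint class alive. It models a loader that returns an object still in memory without re-running its reference fix-up, which is the behavior described for FLinkerLoad::CreateExport in this thread.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a Blueprint class; not an engine type.
struct ToyClass {
    std::string Name;
    int FixupGeneration = 0;  // which load pass last fixed up this object's references
};

// Hypothetical loader that mimics "return the object if it is already in memory".
class ToyLoader {
public:
    // Returns the existing object when one with this name is still alive;
    // only a freshly created object gets its references fixed up.
    std::shared_ptr<ToyClass> CreateExport(const std::string& Name, int Generation) {
        auto Found = InMemory.find(Name);
        if (Found != InMemory.end()) {
            if (std::shared_ptr<ToyClass> Existing = Found->second.lock()) {
                return Existing;  // early exit: the fix-up below is skipped
            }
        }
        auto Fresh = std::make_shared<ToyClass>();
        Fresh->Name = Name;
        Fresh->FixupGeneration = Generation;  // models the reference fix-up
        InMemory[Name] = Fresh;
        return Fresh;
    }

private:
    std::map<std::string, std::weak_ptr<ToyClass>> InMemory;
};

// Loads, "unloads", and reloads a class, then returns the generation recorded
// on the reloaded object. When bExternalRefHeld is true, something outside the
// level keeps the class alive across the unload.
int ReloadedGeneration(bool bExternalRefHeld) {
    ToyLoader Loader;
    std::shared_ptr<ToyClass> External;
    {
        std::shared_ptr<ToyClass> First = Loader.CreateExport("MyMap_S_C", 1);
        if (bExternalRefHeld) {
            External = First;
        }
    }  // level "unloaded": First is released here
    return Loader.CreateExport("MyMap_S_C", 2)->FixupGeneration;
}
```

In this toy model, `ReloadedGeneration(false)` sees a fresh object from the second load, while `ReloadedGeneration(true)` gets back the first object with its old fix-up state, mirroring the stale references reported here.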


best,

Klaudiusz

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hi Jon,

Were you able to reproduce this bug? We prepared a repro project with video on vanilla 4.16.3 that has the same issue.

What you need to do:

  1. Create an empty project in vanilla 4.16.3

  2. Put packages from attached zip in Content folder

  3. On server, run the map TestCase_P

  4. On client connect to server

  5. On server walk into Load trigger, that streams in TestCase_SubLevel

  6. Walk into Unload trigger that streams out TestCase_SubLevel

  7. Again walk into Load trigger, that streams in TestCase_SubLevel

  8. Now walk into Test trigger and note that it isn't working anymore.

The lost Actor references in TestCase_SubLevel are caused by the Level BP variable ProblemVariable on level TestCase_SubLevel: it has its Replication flag set to Replicated.

Hope this helps you track it down. It is a really serious bug that makes multiplayer scripting basically unusable and causes a lot of issues for us.

best, Klaudiusz

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Klaudiusz,

Thanks for the assets. Using them I was able to reproduce the issue in the newest version of the engine. I'm going to keep investigating today.

Thanks, Jon

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Klaudiusz,

Ok, so we've identified the problem, at least in the newest version of the engine. It's the same issue, but there's potential (although unlikely) that there are additional things happening on your end.

The Replication code always assumed that Classes (and Structs and Functions) would never be destroyed at runtime. The assumption was made because these types of things are really more like Meta Data descriptions of classes than actual gameplay objects. This is the assumption that's causing the problem.

For every class, we create an FRepLayout that knows how the replicated properties are laid out in memory, and how to serialize / deserialize the replicated properties for networking.

The problem is that FRepLayout is storing hard references to the properties. I didn't catch this at first, because it's not happening through the normal property system. Instead, it's happening through AddReferences.

Basically, the fix is to move these references from the Class level to the Object level and when the Class is destroyed just remove the FRepLayout.

We haven't implemented a fix for this yet, but after discussion it doesn't sound that bad. Basically, in FRepLayout change the Owner member to be a TWeakObjectPtr (so we know when the class gets destroyed) and move the AddReferences method onto FRepState (which is per object). Finally, to prevent memory leaks from repeatedly created FRepLayouts, occasionally clean out the UNetDriver::RepLayoutMap so that any RepLayouts whose Owner is null (e.g., the owner has been destroyed) get removed.
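
The weak-owner-plus-cleanup idea can be sketched in plain C++. This is only an analogy under stated assumptions, not the engine change itself: `ToyClass`, `ToyRepLayout`, `ToyNetDriver`, `PurgeStaleLayouts`, and `LayoutsAfterUnload` are hypothetical names, and `std::weak_ptr` stands in for TWeakObjectPtr, which works differently inside the engine's garbage collector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>

// Hypothetical stand-ins; not engine types.
struct ToyClass {
    int Id = 0;
};

struct ToyRepLayout {
    std::weak_ptr<ToyClass> Owner;  // weak: does not keep the class alive
};

class ToyNetDriver {
public:
    // Creates (or reuses) the layout for a class, keyed by the class address.
    ToyRepLayout& GetLayout(const std::shared_ptr<ToyClass>& Class) {
        ToyRepLayout& Layout = LayoutMap[Class.get()];
        Layout.Owner = Class;
        return Layout;
    }

    // Drops every layout whose owner has been destroyed; returns how many were removed.
    std::size_t PurgeStaleLayouts() {
        std::size_t Removed = 0;
        for (auto It = LayoutMap.begin(); It != LayoutMap.end();) {
            if (It->second.Owner.expired()) {
                It = LayoutMap.erase(It);  // erase returns the next valid iterator
                ++Removed;
            } else {
                ++It;
            }
        }
        return Removed;
    }

    std::size_t NumLayouts() const { return LayoutMap.size(); }

private:
    std::unordered_map<const ToyClass*, ToyRepLayout> LayoutMap;
};

// Registers two classes, destroys one, purges, and returns the remaining layout count.
std::size_t LayoutsAfterUnload() {
    ToyNetDriver Driver;
    auto Persistent = std::make_shared<ToyClass>();
    auto LevelClass = std::make_shared<ToyClass>();
    Driver.GetLayout(Persistent);
    Driver.GetLayout(LevelClass);
    LevelClass.reset();          // the level's class is destroyed
    Driver.PurgeStaleLayouts();  // the occasional cleanup pass
    return Driver.NumLayouts();
}
```

Because the map no longer owns the class, destroying the class is not blocked, and the periodic purge keeps the map from accumulating dead entries.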

I'll try to have a CL for this by next week, but hopefully you can work on a solution in the interim with the info I've provided.

Thanks, Jon

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hi Jon,

What's your progress with a fix for references in BP Class? We will need this fix anyway, because our workaround of not using replicated variables is not possible in some cases.

I'm going to look at it, but if you have any progress so far, please let me know.

best, Klaudiusz

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Klaudiusz,

Sorry I forgot to update this ticket. CL-3611087 has the changes mentioned. This has not been submitted yet, but I tested it against your case and it seemed to work.

Thanks, Jon

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hi Jon,

Thanks a lot!

Unfortunately, I tried your fix by simply integrating it over our current 4.16 codebase, and it causes the ReplicationChangeListMap cleanup in UNetDriver::TickFlush() to crash. I think there have been too many changes since 4.16 that the new code depends on. I'll wait a few days until we have 4.17 integrated and try my luck again. I'll let you know.

best, Klaudiusz

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Klaudiusz,

I've attached three patch files that you can use to see just the isolated changes of the shelf. These shouldn't have much (if any) impact on other parts of the replication system, so you shouldn't experience crashes.

P4 doesn't actually like / handle patches very well, so I generated these using a secondary tool. As such, you probably won't be able to directly apply them (but they should give you context into what exactly you need).

Patches

Thanks, Jon

patch.zip (1.5 kB)
avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

In case any other user comes along these parts and has trouble applying this patch to NetworkDriver.cpp: that change goes into UNetDriver::ServerReplicateActors(float DeltaSeconds), just before TArray ConsiderList;.

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Jon,

I confirm that we're also seeing the crash in ReplicationChangeListMap cleanup that Klaudiusz mentions above. We're also on 4.16.2, will try again in 4.17, which we are in the process of integrating.

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Leszek,

Are the crashes still occurring with the patches?

As I pointed out above, the changes listed above (including the patches) are not final nor have they been pushed into the engine yet. To that end, they are "use at your own risk" sort of things at this point.

Thanks, Jon

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Yes, with the patches. I believe we followed precisely in the footsteps of Klaudiusz here: same problem, finding this thread, applying your patches, getting crashes.

Understood. Can we hope for an update in this thread once they do get pushed?

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Once everything is submitted, I will update this thread.

avatar image NMouzourakis Dec 12 '18 at 08:26 PM

Hello, we seem to be seeing a similar issue happening on our end, is there any update on issue UE-60086? Or perhaps a CL that has gone into a recent engine version (or just a current best fix) that we can integrate into our engine? Our version is 4.17.

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hi everyone,

I did some investigation on this issue while Jon is out of the office, and have shelved a different approach in 3762353. It focuses on clearing references to the level script actor and its blueprint class that were being held by the net driver even after streaming out the sub level.

Using the attached repro levels I was successfully able to stream in, out, and in again multiple times with my variable references intact. I was able to use both seamless and non-seamless server travels as well.

There may still be some issues under high packet lag/loss situations, but let me know if this helps.

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hello, sorry for the late response. I've made a workaround, but now we probably have a problem of the same nature.

I don't think I have access to your VCS. Could you upload a patch with your fix? Thanks.

PS: We are not using the Compile Manager in our version, which is why this is still relevant for me.

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Attached a diff generated from the shelved change. This was made against the latest code in Dev-Networking.

patchdiff.txt (2.9 kB)
avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Thanks a lot.

PS: I am in the process of restoring Perforce access to avoid these problems in the future.

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hello,

Thanks for the fix! One issue. It's not enough to:

 RepLayoutMap.Remove(Level->LevelScriptActor->GetClass());

You should also remove all replicated functions/events:

 for (auto Func : TFieldRange<UFunction>(LevelScriptActor->GetClass(), EFieldIteratorFlags::ExcludeSuper))
 {
     if (Func && Func->HasAnyFunctionFlags(EFunctionFlags::FUNC_Net))
     {
         RepLayoutMap.Remove(Func);
     }
 }

LMK if it makes sense.
Cheers, M

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

That does indeed make sense, thanks for catching it! For reference, I've entered a bug to get this fixed in a future engine release, UE-60086.

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Thanks for the patch. It works. But after applying the patch we've run into a very strange crash.

The callstack line numbers may differ slightly, so I will show the code after the callstack:

 CallStack - OTWD!FDebug::AssertFailed() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\core\private\misc\assertionmacros.cpp:414]
 OTWD!UActorChannel::CleanupReplicators() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\datachannel.cpp:1567]
 OTWD!UNetDriver::Shutdown() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\networkdriver.cpp:1167]
 OTWD!DestroyNamedNetDriver_Local() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:9399]
 OTWD!UEngine::ShutdownWorldNetDriver() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:9231]
 OTWD!UEngine::LoadMap() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:10352]
 OTWD!UEngine::Browse() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:9971]
 OTWD!UEngine::TickWorldTravel() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:10189]
 OTWD!UGameEngine::Tick() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\gameengine.cpp:1211]
 OTWD!FEngineLoop::Tick() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\launch\private\launchengineloop.cpp:3301]
 OTWD!GuardedMain() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\launch\private\launch.cpp:166]
 OTWD!GuardedMainWrapper() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\launch\private\windows\launchwindows.cpp:134]
 OTWD!WinMain() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\launch\private\windows\launchwindows.cpp:210]
 OTWD!__scrt_common_main_seh() [f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:253]
 kernel32
 ntdll

The crash callstack leads to the line with the for iterator:

 void UActorChannel::CleanupReplicators( const bool bKeepReplicators )
 {
     // Cleanup or save replicators
     for ( auto CompIt = ReplicationMap.CreateIterator(); CompIt; ++CompIt )
     {
         if ( bKeepReplicators && CompIt.Value()->GetObject() != nullptr )

The crash reason is an out-of-bounds array index.

Is it something with the iterator implementation in this case? I can't imagine how the for line can crash with empty arrays. Or could it just be an access to an invalid object, with the reported line being wrong because of the optimization level? This bug is very hard to reproduce, and we don't have the resources to catch it in a build without optimization.

The top-level code causing the crash is the code from the patch:

         for (auto It = ServerConnection->ActorChannels.CreateIterator(); It; ++It)
         {
             UActorChannel* Channel = It.Value();
             if (Channel)
             {
                 Channel->CleanupReplicators();
             }
         }

Maybe I should add some additional checks somewhere?

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hi Dmitry,

Apologies for the delay. I have not seen this particular crash before while testing the fix, although the test case examples originally used were very simple.

I suspect that you are correct and that code optimization is giving you a misleading crash line; it is more likely that the array assertion is being hit from within one of the cleanup calls. Either that, or the actor channel array is being modified unexpectedly in the midst of the iteration. If you are able to find consistent repro steps that would be helpful for tracking this down, and in the meantime I'll take a look again in a 4.20 build.
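
One of the hypotheses above, a container being modified while it is iterated, can be shown with a small standalone C++ sketch. It is an illustration only and makes no claim about what the engine's cleanup calls actually do: `ToyChannel`, `ToyConnection`, `CleanupAll`, and `CleanedCount` are hypothetical names. The sketch walks a snapshot of the values so that a cleanup call which unregisters its channel cannot invalidate the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a channel; not an engine type.
struct ToyChannel {
    int Id = 0;
    bool bCleaned = false;
};

class ToyConnection {
public:
    void Add(ToyChannel* Channel) { Channels[Channel->Id] = Channel; }

    // Cleanup with a side effect on the container being walked:
    // cleaning a channel also unregisters it.
    void Cleanup(ToyChannel* Channel) {
        Channel->bCleaned = true;
        Channels.erase(Channel->Id);
    }

    // Walks a snapshot of the values, so erasing from Channels inside
    // Cleanup cannot invalidate the loop's iterator.
    std::size_t CleanupAll() {
        std::vector<ToyChannel*> Snapshot;
        Snapshot.reserve(Channels.size());
        for (const auto& Pair : Channels) {
            Snapshot.push_back(Pair.second);
        }
        std::size_t Count = 0;
        for (ToyChannel* Channel : Snapshot) {
            if (Channel) {
                Cleanup(Channel);
                ++Count;
            }
        }
        return Count;
    }

    std::size_t Num() const { return Channels.size(); }

private:
    std::map<int, ToyChannel*> Channels;
};

// Cleans two channels and returns how many were cleaned,
// or 0 if the end state is not what the sketch expects.
std::size_t CleanedCount() {
    ToyChannel A;
    A.Id = 1;
    ToyChannel B;
    B.Id = 2;
    ToyConnection Connection;
    Connection.Add(&A);
    Connection.Add(&B);
    const std::size_t Cleaned = Connection.CleanupAll();
    return (A.bCleaned && B.bCleaned && Connection.Num() == 0) ? Cleaned : 0;
}
```

Iterating `Channels` directly while `Cleanup` erases from it would advance an invalidated iterator, which is the kind of failure that can surface as a misleading crash line in an optimized build.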

Are you able to share the engine version you're currently working with? I would also like to look for interim networking fixes that might explain what's going on.

Thanks, Brian

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hello, Brian, thank you for helping us.

This type of issue is hard to reproduce. I was unable to do it on my machine; it happens sporadically while our QA is finishing the level, so for now there is no 100% repro. I am still looking for one, and as soon as I find something I will post it here, but I can't guarantee success. I will also analyze their reports and add some more info.

We are using 4.18 now. Our modifications include only a very small number of networking-related changes, and we are not touching anything related to replicators and channels (and as soon as I remove the patch from our version, the crash goes away). Anyway, I think it's a good idea to provide as much information as I can, so I will show you the only suspicious change (though it doesn't look related) we've made in the engine:

 void UNetDriver::InternalProcessRemoteFunction( ... ) // line 1295
 {
     ...
     // Get the actor channel.
     UActorChannel* Ch = Connection->ActorChannels.FindRef(Actor);
     if( !Ch )
     {
         if( IsServer )
         {
             //SBZ
             if ( Actor->IsPendingKillPending() || Actor->bTearOff ) // line 1359
             {
                 // Don't try opening a channel for me, I am in the process of being destroyed. Ignore my RPCs.
                 return;
             }
             // SBZ


We don't create a new channel for torn-off actors if someone tries to send an RPC; that's all. I don't think this change leads to the crash.

Can we add some workaround for the replicator issue now, such as a critical section or extra if branches? The only things that remove actors from ActorChannels are:

 void UActorChannel::Close()
 void UActorChannel::SetClosingFlag()
 void UNetDriver::NotifyActorLevelUnloaded( AActor* TheActor )

I think I should log these too, and hope the engine will be able to flush the data before the crash.

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Hi Dmitry,

I have not yet been able to reproduce a crash with the changes. If it really is the iterator/actor channel map that is crashing, you could try switching to iterating on the open channels list instead (this should generally be safer in case this is a garbage collection problem):

 for (UChannel* Channel : ServerConnection->OpenChannels)
 {
     UActorChannel* ActorChannel = Cast<UActorChannel>(Channel);
     if (ActorChannel)
     {
         ActorChannel->CleanupReplicators();
     }
 }

It may be worth adding logging to the replicator cleanup loop with the contents of UChannel::Describe() to help track down what is or is not being cleaned up before the crash, but it would be a lot of log spam in the normal working scenario.

avatar image Answers.Archive STAFF Sep 11 '18 at 08:54 PM

Thanks.

Two days of testing is not a long time for this crash, but it seems to be fixed.

If I receive new information I will come back to this thread. For now I would like to say that this solution clearly fixes the problem. Great job.
