Sub level streaming FPS drops

Hi,

Currently in our project we are using sub level and other asset streaming to reduce memory usage and initial loading times, but whenever streaming is occurring, the FPS drops considerably and causes freezes of up to 2 seconds.

Is streaming not happening on a separate thread? Is there a way to make it more smooth?

Thanks,

Franco

Franco,

There are two parts to level streaming, in terms of loading. The first is reading the assets from the disk, and the second is creating the necessary UObjects for the level.

UE4 requires that UObjects and derived classes be created on the Game Thread (there are a number of reasons for this). The main part of Async Loading that happens in a separate thread is actually reading data from disk.

As long as you have the Async Loading Thread enabled, I/O work will be handled in a separate background thread. However, there’s still some amount of work that needs to happen on the GameThread (like creating UObjects). The setting that enables the thread is defined in AsyncLoading.cpp, and can be changed as you normally would any CVar (including through INI):

static int32 GAsyncLoadingThreadEnabled;
static FAutoConsoleVariableRef CVarAsyncLoadingThreadEnabledg(
	TEXT("s.AsyncLoadingThreadEnabled"),
	GAsyncLoadingThreadEnabled,
	TEXT("Placeholder console variable, currently not used in runtime."),
	ECVF_Default
	);

There are also a few settings in CoreSettings.cpp:

float GAsyncLoadingTimeLimit = 5.0f;
int32 GAsyncLoadingUseFullTimeLimit = 1;
float GPriorityAsyncLoadingExtraTime = 20.0f;

static FAutoConsoleVariableRef CVarAsyncLoadingTimeLimit(
	TEXT("s.AsyncLoadingTimeLimit"),
	GAsyncLoadingTimeLimit,
	TEXT("Maximum amount of time to spend doing asynchronous loading (ms per frame)."),
	ECVF_Default
	);

static FAutoConsoleVariableRef CVarAsyncLoadingUseFullTimeLimit(
	TEXT("s.AsyncLoadingUseFullTimeLimit"),
	GAsyncLoadingUseFullTimeLimit,
	TEXT("Whether to use the entire time limit even if blocked on I/O."),
	ECVF_Default
	);

static FAutoConsoleVariableRef CVarPriorityAsyncLoadingExtraTime(
	TEXT("s.PriorityAsyncLoadingExtraTime"),
	GPriorityAsyncLoadingExtraTime,
	TEXT("Additional time to spend asynchronous loading during a high priority load."),
	ECVF_Default
	);

Second is actually creating the level + objects in game. This does not happen on a separate thread. So, the way level streaming deals with this is by splitting up the Creation and Registration of objects across multiple frames. There are a number of options for this. These options are also CVars that can be set, and there is also a section under property settings which makes editing them easier. These are defined in CoreSettings.cpp:

int32 GUseBackgroundLevelStreaming = 1;
float GAsyncLoadingTimeLimit = 5.0f;
int32 GAsyncLoadingUseFullTimeLimit = 1;
float GPriorityAsyncLoadingExtraTime = 20.0f;
float GLevelStreamingActorsUpdateTimeLimit = 5.0f;
float GLevelStreamingUnregisterComponentsTimeLimit = 1.0f;
int32 GLevelStreamingComponentsRegistrationGranularity = 10;
int32 GLevelStreamingComponentsUnregistrationGranularity = 5;

static FAutoConsoleVariableRef CVarUseBackgroundLevelStreaming(
	TEXT("s.UseBackgroundLevelStreaming"),
	GUseBackgroundLevelStreaming,
	TEXT("Whether to allow background level streaming."),
	ECVF_Default
	);

static FAutoConsoleVariableRef CVarAsyncLoadingTimeLimit(
	TEXT("s.AsyncLoadingTimeLimit"),
	GAsyncLoadingTimeLimit,
	TEXT("Maximum amount of time to spend doing asynchronous loading (ms per frame)."),
	ECVF_Default
	);

static FAutoConsoleVariableRef CVarAsyncLoadingUseFullTimeLimit(
	TEXT("s.AsyncLoadingUseFullTimeLimit"),
	GAsyncLoadingUseFullTimeLimit,
	TEXT("Whether to use the entire time limit even if blocked on I/O."),
	ECVF_Default
	);

static FAutoConsoleVariableRef CVarPriorityAsyncLoadingExtraTime(
	TEXT("s.PriorityAsyncLoadingExtraTime"),
	GPriorityAsyncLoadingExtraTime,
	TEXT("Additional time to spend asynchronous loading during a high priority load."),
	ECVF_Default
	);

static FAutoConsoleVariableRef CVarLevelStreamingActorsUpdateTimeLimit(
	TEXT("s.LevelStreamingActorsUpdateTimeLimit"),
	GLevelStreamingActorsUpdateTimeLimit,
	TEXT("Maximum allowed time to spend for actor registration steps during level streaming (ms per frame)."),
	ECVF_Default
	);

static FAutoConsoleVariableRef CVarLevelStreamingUnregisterComponentsTimeLimit(
	TEXT("s.UnregisterComponentsTimeLimit"),
	GLevelStreamingUnregisterComponentsTimeLimit,
	TEXT("Maximum allowed time to spend for actor unregistration steps during level streaming (ms per frame). If this is zero then we don't timeslice"),
	ECVF_Default
);

static FAutoConsoleVariableRef CVarLevelStreamingComponentsRegistrationGranularity(
	TEXT("s.LevelStreamingComponentsRegistrationGranularity"),
	GLevelStreamingComponentsRegistrationGranularity,
	TEXT("Batching granularity used to register actor components during level streaming."),
	ECVF_Default
	);

static FAutoConsoleVariableRef CVarLevelStreamingComponentsUnregistrationGranularity(
	TEXT("s.LevelStreamingComponentsUnregistrationGranularity"),
	GLevelStreamingComponentsUnregistrationGranularity,
	TEXT("Batching granularity used to unregister actor components during level unstreaming."),
	ECVF_Default
	);

One thing to point out is that any of the “time” values mentioned above are in milliseconds (I’ve seen a few licensees assume it was seconds).

Basically, there are a number of different settings associated with level streaming. There’s definitely no “one size fits all” adjustment of these settings, so it’s up to individual projects to figure out what values work well for them.

One thing you can do to start is enable PERF_TRACK_DETAILED_ASYNC_STATS (which is defined in AsyncLoading.h). This will print out detailed information on how much time each frame was spent on handling Streaming Steps when a world is loaded. From there, you can figure out what you need to adjust in order to get a better framerate. Make sure to disable this when not testing, as there are other pieces of code that also use it and it could bloat your logs a bit.

Of course, you’ll have to make tradeoffs and find a good balance between total loading time and framerate.

Thanks,
Jon

2 Likes

Hi Jon,

I’d like to reopen this thread because we have been testing out a lot of different parameters and no matter what we do we still have large streaming hitches.

There is one section in particular where we would like to be able to play a full screen movie (on PS4) while a couple of sublevels load in the background. The parameters I modify temporarily so I can control the streaming during the movie are as follows:

GAsyncLoadingTimeLimit = 0.5f
GAsyncLoadingUseFullTimeLimit = 0
GPriorityAsyncLoadingExtraTime = 1.0f
GLevelStreamingActorsUpdateTimeLimit = 0.5f
GLevelStreamingUnregisterComponentsTimeLimit = 0.05f
GLevelStreamingComponentsRegistrationGranularity = 1
GLevelStreamingComponentsUnregistrationGranularity = 1

I am attaching part of the logs from when I activated DUMPHITCHES right before starting the movie:
[1]: 253474-moviehitches.txt (73 KB)

Is there something I’m not understanding about streaming? Is there anything else I could try out? Someone suggested I could try bPlayersOnly in the UWorld but that will pause the streaming which defeats the purpose of what we are trying to do.

Thanks,

Franco

Hi Jon, thanks for the answer. I see that for GAsyncLoadingThreadEnabled the comment is “Placeholder console variable, currently not used in runtime.” and if I search the entire code base for this variable, it appears nowhere else. Does this mean that this is not implemented yet and all aspects of streaming happens in the game thread?

Franco,

The variable is used, but admittedly it may not be the most obvious and the comments are outdated.

If you search for “s.AsyncLoadingThreadEnabled” (which is the CVar’s name), you’ll see that it is referenced in 2 places outside its definition.

First, UStreamingSettings::AsyncLoadingThreadEnabled is set up as a Console Variable property, so it should reflect that value. However, this isn’t directly used in the engine, but rather it’s used so the value can be set via the Settings Editor from within the Editor (I believe it’s under Project Settings).

Second, it is read inside FAsyncLoadingThread::IsMultithreaded when THREADSAFE_UOBJECTS is enabled (in ObjectMacros.h, you’ll see this is enabled by default). When IsMultithreaded returns true, the constructor for FAsyncLoadingThread will create a new thread:

if (FAsyncLoadingThread::IsMultithreaded())
{
	Thread = FRunnableThread::Create(this, TEXT("FAsyncLoadingThread"), 0, TPri_Normal);
}
else
{
	Thread = nullptr;
	Init();
}

Thanks,
Jon

I’m still having trouble activating the loading thread. In DefaultEngine.ini I have:

[/Script/Engine.StreamingSettings]
s.EventDrivenLoaderEnabled=False
s.AsyncLoadingThreadEnabled=True

And if I put a breakpoint in UStreamingSettings::PostInitProperties() then UStreamingSettings::AsyncLoadingThreadEnabled is true. But when I get to FAsyncLoadingThread::FAsyncLoadingThread(), FAsyncLoadingThread::IsMultithreaded() returns false and Thread becomes nullptr.

Franco,

Something I should have asked earlier: how are you testing this? Are you building in Development or Shipping? Running PIE, Standalone, or a packaged build?

If you’re using a Packaged build, are you running on a Console or a Desktop?

Here is the full implementation of the IsMultithreaded check:

/** True if multithreaded async loading should be used. */
static FORCEINLINE bool IsMultithreaded()
{
    static struct FAsyncLoadingThreadEnabled
    {
        bool Value;
        FORCENOINLINE FAsyncLoadingThreadEnabled()
        {
#if THREADSAFE_UOBJECTS
            if (FPlatformProperties::RequiresCookedData())
            {
                check(GConfig);
                bool bConfigValue = true;
                GConfig->GetBool(TEXT("/Script/Engine.StreamingSettings"), TEXT("s.AsyncLoadingThreadEnabled"), bConfigValue, GEngineIni);
                bool bCommandLineNoAsyncThread = FParse::Param(FCommandLine::Get(), TEXT("NoAsyncLoadingThread"));
                bool bCommandLineAsyncThread = FParse::Param(FCommandLine::Get(), TEXT("AsyncLoadingThread"));
                Value = bCommandLineAsyncThread || (bConfigValue && FApp::ShouldUseThreadingForPerformance() && !bCommandLineNoAsyncThread);
            }
            else
#endif
            {
                Value = false;
            }
        }
    } AsyncLoadingThreadEnabled;
    return AsyncLoadingThreadEnabled.Value;
}

As you can see, it’s not solely dependent on the CVar. It also checks whether or not the platform requires cooked data (which is generally true, unless you’re running a non-cooked build on a desktop), whether or not you’ve explicitly disabled the thread (via command line), and also FApp::ShouldUseThreadingForPerformance (which could return false for a number of reasons, so I’d suggest taking a look at its implementation).
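To make the decision easier to reason about, here is a minimal standalone sketch that restates the boolean logic of the listing above as a pure function. It is not engine code: the struct AsyncThreadInputs and the functions WouldUseAsyncLoadingThread and PieExample are hypothetical names made up for illustration, and each field simply stands in for one of the checks in IsMultithreaded. The PIE case assumes RequiresCookedData is false in the editor, as described above.

```cpp
// Hypothetical stand-ins for the checks performed in IsMultithreaded.
struct AsyncThreadInputs
{
    bool bThreadSafeUObjects;         // THREADSAFE_UOBJECTS compile-time switch
    bool bRequiresCookedData;         // FPlatformProperties::RequiresCookedData()
    bool bConfigValue;                // s.AsyncLoadingThreadEnabled read from the engine INI
    bool bUseThreadingForPerformance; // FApp::ShouldUseThreadingForPerformance()
    bool bCmdNoAsyncLoadingThread;    // -NoAsyncLoadingThread on the command line
    bool bCmdAsyncLoadingThread;      // -AsyncLoadingThread on the command line
};

// Mirrors the expression assigned to Value in the listing above.
inline bool WouldUseAsyncLoadingThread(const AsyncThreadInputs& In)
{
    // Without thread-safe UObjects or cooked data, the INI value is never consulted.
    if (!In.bThreadSafeUObjects || !In.bRequiresCookedData)
    {
        return false;
    }
    return In.bCmdAsyncLoadingThread
        || (In.bConfigValue && In.bUseThreadingForPerformance && !In.bCmdNoAsyncLoadingThread);
}

// An uncooked editor/PIE session with the INI flag set to true.
inline bool PieExample()
{
    const AsyncThreadInputs Pie{true, false, true, true, false, false};
    return WouldUseAsyncLoadingThread(Pie); // false: the cooked-data check fails first
}
```

Under these assumptions, setting s.AsyncLoadingThreadEnabled=True in the INI has no effect on an uncooked run, which matches a Thread of nullptr in the constructor.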

I’d also suggest you take a look at the Performance and Profiling docs if you haven’t already. These can give you more insight into exactly what’s causing the hitch.

Something I’ll point out is that if you’re running an uncooked build (e.g., standalone etc.) then it’s possible loading is also causing you to compile blueprints on the fly.

It’s also worth noting that any hard references to assets (either directly in the level, or in classes / objects the level has references to, and so on down the chain) will need to be resolved before the level is considered loaded. To that end, if you haven’t been careful, loading certain levels could inadvertently load more content than you’d expect.

Thanks,
Jon N.

Yeah I was using PIE to test this. Thanks Jon, this clears up everything.

Franco

Franco,

Are you still testing this in

At a glance:

  1. Your Components*Granularity values are going to be a bottleneck. Only allowing a single component to be registered in a frame means that it will likely take many frames to load in a single sublevel (assuming your levels are not nearly empty).
  2. Your time limits are very small. These values are supposed to be specified in milliseconds. So, for example, 0.05 is only 50 microseconds. Similarly, the other time values seem very small. Again, this could (and likely will) lead to your levels taking a large number of frames.
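As a rough illustration of the time limit point, the sketch below estimates how many frames a fixed amount of time-sliced work needs under a given per-frame budget. It is standalone arithmetic, not engine code: the function names and the 100 ms workload used in the note are made up, and it ignores both the steps that are not time sliced and any overshoot past the budget within a frame.

```cpp
#include <cassert>
#include <cmath>

// The streaming time limits are in milliseconds, so 0.05 means 50 microseconds.
inline double MsToMicroseconds(double Ms)
{
    return Ms * 1000.0;
}

// Simplified estimate: frames needed to finish TotalWorkMs of time-sliced work
// when at most BudgetMsPerFrame may be spent on it each frame.
inline int EstimateFramesForSlicedWork(double TotalWorkMs, double BudgetMsPerFrame)
{
    assert(TotalWorkMs >= 0.0 && BudgetMsPerFrame > 0.0);
    return static_cast<int>(std::ceil(TotalWorkMs / BudgetMsPerFrame));
}
```

With a hypothetical 100 ms of sliced work, a 0.5 ms budget spreads it over 200 frames, while the default 5 ms budget needs 20.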

Looking at the dump you provided, there are a few things that stand out.

The first hitch is ~121 ms. Most of that time is being spent in this block:

98.343ms (   1)  -  UpdateLevelStreaming Time - STAT_UpdateLevelStreamingTime - STATGROUP_StreamingDetails - STATCAT_Advanced
  98.325ms (  52)  -  LevelStreamingKismet/Game/Maps/ForestNew/FO_Main.FO_Main.LevelStreamingKismet - STATGROUP_UObjects - STATCAT_Advanced
    98.300ms (   1)  -  AddToWorld Time - STAT_AddToWorldTime - STATGROUP_StreamingDetails - STATCAT_Advanced
      98.299ms (   1)  -  Level/Game/Maps/ForestNew/FO_Tutorial_SCR.FO_Tutorial_SCR.PersistentLevel - STATGROUP_UObjects - STATCAT_Advanced
        73.827ms (   6)  -  LoadObject - STAT_LoadObject - STATGROUP_Object - STATCAT_Advanced
          69.349ms (   6)  -  Self

There are a few things that are odd about this. First, there are no explicit LoadObject calls inside AddToWorld. By the time AddToWorld is called, it’s expected that the level (and all of its Actors, and all of their components) have already been loaded. Further, there are no stat groups below the Object scope.

This likely means that there’s something being triggered by Game Code somewhere that’s causing some other large load.

In AsyncLoading.h, we define PERF_TRACK_DETAILED_ASYNC_STATS. If you set that to 1 and recompile, we’ll print out detailed information about where time is being spent. This includes a bunch of more granular logs in AddToWorld. This may help you track down at what point in the process you’re actually spending those cycles.

The other hitches seem more like general things. Your Slate Tick Time generally seems high at 11~12ms. In the final hitch, it’s up to ~41ms. That accounts for a large portion of the second hitch of ~30ms.

Another one of the hitches is from Garbage Collection.

Certain operations in level streaming are not done across multiple frames. For example, all Actors / Components will have BeginPlay called on the final frame, along with UpdateOverlaps.

You’re spending a non-trivial amount of time in both of these. Potential things to help in that case:

  1. Limit heavy lifting done in BeginPlay.
  2. Make sure you only have GenerateOverlaps enabled for components that actually need it.

Thanks,
Jon N.

Hi Jon,

I activated PERF_TRACK_DETAILED_ASYNC_STATS and this is the result.

During the seamless travel things are not so bad but as soon as seamless travel is done, it is loading sublevels. You can see that there is a huge amount of time spent in Update Components and Initialize. What is inside those categories? BeginPlay() and UpdateOverlaps()?

Thanks,

Franco

Hello,

The UpdateComponents time refers to the block where we are repeatedly calling ULevel::IncrementalUpdateComponents.

This is the stage where we’re going Actor by Actor in a level and calling AActor::IncrementalRegisterComponents.

Also, in non-cooked builds we’ll end up calling the Construction Scripts here for each of the Actors in the level (once all Actors have been properly registered).

This is where the setting s.LevelStreamingComponentsRegistrationGranularity comes into play (display name is “Components Registration Granularity”). When this value is 0, you’ll end up forcibly registering all the components of an actor in a single frame.

It’s important to notice that ULevel::IncrementalUpdateComponents does not time slice itself. Time slicing occurs in the repeated calls to IncrementalUpdateComponents made by UWorld::AddToWorld. However, between each Actor we’ll break out of that function and return to UWorld::AddToWorld.

Now, in your post above you only have that set to a single component. That means that we’ll repeatedly return to UWorld::AddToWorld after each component is registered.

Now, I’ll point out something I said in an earlier comment that was incorrect:

Your Components*Granularity values are going to be a bottleneck. Only allowing a single component to be registered in a frame means that it will likely take many frames to load in a single sublevel (assuming your levels are not nearly empty).

That’s actually not true. We’ll still end up using the full amount of time possible for the frame. It just means that there’s more wasted time, as after each component we do a bunch of iteration / bookkeeping.
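The effect of the granularity setting can be sketched with a small standalone simulation. This is not how the engine is implemented: SimulateRegistration, SliceResult, and every cost figure are hypothetical. It only models the behaviour described above, where an outer loop keeps requesting batches until the frame budget is spent and pays a fixed bookkeeping cost each time control returns to it.

```cpp
#include <cassert>

struct SliceResult
{
    int Frames;           // frames needed to register every component
    int OuterLoopReturns; // times control returned to the outer loop for bookkeeping
};

// All costs are in simulated microseconds; nothing here measures real time.
inline SliceResult SimulateRegistration(int NumComponents, int Granularity,
                                        int CostPerComponentUs, int OverheadPerReturnUs,
                                        int BudgetPerFrameUs)
{
    assert(NumComponents >= 0 && Granularity > 0 && BudgetPerFrameUs > 0);
    SliceResult Result{0, 0};
    int Remaining = NumComponents;
    while (Remaining > 0)
    {
        ++Result.Frames;
        int SpentUs = 0;
        // Keep registering batches until this frame's budget is used up.
        while (Remaining > 0 && SpentUs < BudgetPerFrameUs)
        {
            const int Batch = Remaining < Granularity ? Remaining : Granularity;
            SpentUs += Batch * CostPerComponentUs + OverheadPerReturnUs;
            Remaining -= Batch;
            ++Result.OuterLoopReturns;
        }
    }
    return Result;
}
```

With made-up numbers (1000 components at 10 µs each, 5 µs of bookkeeping per return, a 5000 µs budget), granularity 1 and granularity 10 both finish in 3 frames, but the first pays the bookkeeping cost 1000 times and the second only 100 times.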

Back to your hitches though. As I pointed out in my previous comment, the LoadObject time was a bottleneck. Have a look through your components, and make sure nothing is making explicit LoadObject calls in RegisterComponent.

Further, generally audit the code you’re running as a part of RegisterComponent calls (e.g. UActorComponent::OnRegister, UActorComponent::Activate, UActorComponent::OnComponentCreated, UActorComponent::OnCreatePhysicsState, and their similar BP methods and delegates).

Finally, the Initialize time is the time spent handling AActor::InitializeComponents, AActor::PostInitializeComponents, AActor::UpdateOverlaps, and AActor::DispatchBeginPlay (see ULevel::RouteActorInitialize).

Thanks,
Jon N.

Hi Jon,

It turns out that the actors calling LoadObject() in their constructors are actors of type ALevelSequenceActor. InitializePlayer() calls GetSequence() which calls FStringAssetReference::TryLoad().

There seems to be no options to preload this asset or load it asynchronously so these hitches are not fixable without heavily modifying the level sequence actor.

We are planning to make our next project an open world game. Are there any guidelines on what kinds of actors are OK to load asynchronously and what kinds will cause hitches? Is it even feasible to attempt an open world game that has very smooth and unnoticeable loading?

Thanks,

Franco

Franco,

FWIW, Fortnite makes heavy use of Level Streaming to split up its map. In that case, we’re also running against a lot of network traffic and a lot of map placed actors.

As far as your specific use case, I believe the Level Sequence Actors were actually changed in the last few revisions so they don’t forcibly load their assets anymore (or, at least, they do it asynchronously).

This older thread which I had forgotten about actually covers this a bit:
https://udn.unrealengine.com/questions/389991/streaming-hitches-in-416.html

Typically, most things in the engine should be “safe” to use in streaming levels. If they’re not good for streaming levels, then chances are they’re not good for loading times in general, and we’re always auditing that sort of thing.

Thanks,
Jon N.

That’s great information. Thanks a lot for your help.