Only 33 instances of Instanced Static Mesh displayed on Android ES2

I’m trying to display a labyrinth generated at runtime in C++ via InstancedStaticMesh on an Android mobile device with feature level ES2. After setting the material shading model of the walls and floors to unlit in order to see anything at all (although it’s really ugly like that), I was very surprised that only 33 of the 289 wall/floor instances are displayed in the “Android preview” mode of Unreal Engine 4.10.2 (Settings → Preview Rendering Level → Mobile / HTML5 → Android preview). Interestingly, when I eject and select the labyrinth actor, all instances are shown, but maybe the editor uses a special mode for that…

I searched through the Unreal Engine source code and found that, when true instanced rendering is not available on the platform, instances are rendered in batches of (probably) 64. But printing the number of batches used for the walls and floors shows that FStaticMeshSceneProxy::GetNumMeshBatches() returns the expected 2-3 batches for each. So currently I don’t see what could be causing this.

As this works perfectly on Windows and InstancedStaticMesh is supposed to work on mobile since UE 4.4, I consider this a bug.

Here is a collage of screenshots taken while playing in the editor, annotated with the number of visible instances (yes, with the unlit material it’s really ugly, I’m open to suggestions ^_^):

And here’s the code used to instantiate the map:

ALabyrinth::ALabyrinth(const FObjectInitializer &ObjectInitializer)
{
	PrimaryActorTick.bCanEverTick = true; //false;

	static ConstructorHelpers::FObjectFinder<UStaticMesh> labyMeshes[LWT_NumWallTypes] = {
		ConstructorHelpers::FObjectFinder<UStaticMesh>(TEXT("StaticMesh'/Game/Labyrinth/Floor_StaticMesh.Floor_StaticMesh'")),
		ConstructorHelpers::FObjectFinder<UStaticMesh>(TEXT("StaticMesh'/Game/Labyrinth/Wall_StaticMesh.Wall_StaticMesh'"))
	};

	RootComponent = ObjectInitializer.CreateDefaultSubobject<USceneComponent>(this, TEXT("Scene"));

	for (int i = 0; i < LWT_NumWallTypes; i++)
	{
		InstancedComponents[i] = ObjectInitializer.CreateDefaultSubobject<UInstancedStaticMeshComponent>(this, FName(*FString::Printf(TEXT("InstancedComponents_%d"), i)));
		InstancedComponents[i]->AttachTo(RootComponent);
		InstancedComponents[i]->SetStaticMesh(labyMeshes[i].Object);
	}
}

// Called by ALabyrinth::BeginPlay()
void ALabyrinth::InstantiateLabyrinth()
{
	FVector actorLoc = GetActorLocation();
	ScaleFactor = 0.75;

	for (int i = 0; i < LWT_NumWallTypes; i++)
	{
		InstancedComponents[i]->SetRelativeLocation(FVector(-(LabyWidth / 2.f) * 100 * ScaleFactor, -(LabyHeight / 2.f) * 100 * ScaleFactor, 0));
	}

	for (uint32 y = 0; y < LabyHeight; y++)
	{
		for (uint32 x = 0; x < LabyWidth; x++)
		{
			uint8 walltype = Labyrinth[y * LabyWidth + x];  // row-major: the stride of a row is the width
			FVector pos = actorLoc + FVector(x * 100 * ScaleFactor, y * 100 * ScaleFactor, 0);
			InstancedComponents[walltype]->AddInstance(
				FTransform(FQuat::Identity, pos, FVector(ScaleFactor)));
		}
	}
}

// Debug message to prove GetNumMeshBatches is correct.
void ALabyrinth::Tick(float DeltaTime)
{
	Super::Tick(DeltaTime);

	if (InstancedComponents[0]->SceneProxy == NULL)
	{
		GEngine->AddOnScreenDebugMessage(222, 5.f, FColor::Blue, TEXT("SceneProxy 0 is NULL!"));
	}
	else
	{
		int32 numbatches = static_cast<FStaticMeshSceneProxy *>(InstancedComponents[0]->SceneProxy)->GetNumMeshBatches();
		GEngine->AddOnScreenDebugMessage(222, 5.f, FColor::Blue, FString::Printf(TEXT("NumBatches for InstancedComponents[0] = %d"), numbatches));
	}

	if (InstancedComponents[1]->SceneProxy == NULL)
	{
		GEngine->AddOnScreenDebugMessage(223, 5.f, FColor::Blue, TEXT("SceneProxy 1 is NULL!"));
	}
	else
	{
		int32 numbatches = static_cast<FStaticMeshSceneProxy *>(InstancedComponents[1]->SceneProxy)->GetNumMeshBatches();
		GEngine->AddOnScreenDebugMessage(223, 5.f, FColor::Blue, FString::Printf(TEXT("NumBatches for InstancedComponents[1] = %d"), numbatches));
	}
}

After finding out that I can actually debug deep into the renderer without having to recompile the whole engine myself, I found the source of the bug: FInstancedStaticMeshVertexFactory::GetStaticBatchElementVisibility() in Engine\Source\Runtime\Engine\Private\InstancedStaticMesh.h

	/**
	* Get a bitmask representing the visibility of each FMeshBatch element.
	*/
	virtual uint64 GetStaticBatchElementVisibility(const class FSceneView& View, const struct FMeshBatch* Batch) const override
	{
		uint32 NumElements = FMath::Min((uint32)Batch->Elements.Num(), NumBitsForVisibilityMask());
		return (1ULL << (uint64)NumElements) - 1ULL;
	}

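The bug appears when you have a full batch, i.e. a batch with 64 elements. In this case 1ULL << (uint64)NumElements no longer fits into 64 bits (shifting a 64-bit value by 64 is undefined behavior in C++). The x64 disassembly shows what actually happens: the shift instruction only uses the low 6 bits of the shift count.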

So, (1ULL << (uint64)NumElements) - 1ULL with NumElements = 64 = 0x40 results in (1ULL << (0x40 & 0x3f)) - 1ULL = (1ULL << 0) - 1ULL = 1ULL - 1ULL = 0. Thus the return value of the function is 0 and no elements of any full batch will be displayed.

Now we can also explain why it displayed 33 elements: 289 % 64 = 33. Only the last, partially filled batch was displayed.
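To make the arithmetic above easy to check, here is a small standalone sketch that reproduces the effect outside the engine. It assumes a batch size of 64 and emulates the x64 behavior by explicitly masking the shift count with 0x3f, so the sketch itself stays free of undefined behavior. The function names are hypothetical and do not exist in Unreal Engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Emulates what the x64 shift does to the original expression:
// only the low 6 bits of the shift count are used.
inline uint64_t BuggyVisibilityMask(uint32_t NumElements)
{
	assert(NumElements <= 64);
	return (1ULL << (NumElements & 0x3f)) - 1ULL;
}

// Counts the bits set in a mask, i.e. the number of visible elements.
inline uint32_t CountSetBits(uint64_t Mask)
{
	uint32_t Count = 0;
	while (Mask != 0)
	{
		Mask &= Mask - 1;  // clears the lowest set bit
		++Count;
	}
	return Count;
}

// Splits NumInstances into batches of up to 64 and sums up how many
// instances stay visible when every batch uses the buggy mask.
inline uint32_t VisibleInstancesWithBug(uint32_t NumInstances)
{
	const uint32_t MaxPerBatch = 64;  // assumed batch size
	uint32_t Visible = 0;
	for (uint32_t First = 0; First < NumInstances; First += MaxPerBatch)
	{
		const uint32_t InBatch = std::min(MaxPerBatch, NumInstances - First);
		Visible += CountSetBits(BuggyVisibilityMask(InBatch));
	}
	return Visible;
}
```

Calling VisibleInstancesWithBug(289) yields 33: the four full batches get a mask of 0 and only the last batch with 33 elements contributes.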

So here’s a fixed (but untested) version of this function:

	/**
	* Get a bitmask representing the visibility of each FMeshBatch element.
	*/
	virtual uint64 GetStaticBatchElementVisibility(const class FSceneView& View, const struct FMeshBatch* Batch) const override
	{
		uint32 NumElements = FMath::Min((uint32)Batch->Elements.Num(), NumBitsForVisibilityMask());
		return NumElements == 64 ? 0xffffffffffffffffULL : (1ULL << (uint64)NumElements) - 1ULL;
	}
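For anyone who wants to verify the corrected expression without building the engine, the same logic can be pulled into a free function. This is only a sketch: MakeVisibilityMask is a hypothetical name, and the assert stands in for the clamping that FMath::Min performs in the engine code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical free-function version of the fixed mask computation.
// Returns a mask with the lowest NumElements bits set.
inline uint64_t MakeVisibilityMask(uint32_t NumElements)
{
	// The caller is expected to clamp to 64, as FMath::Min does above.
	assert(NumElements <= 64);

	// A shift by 64 must never be evaluated, so the full batch is handled separately.
	return NumElements == 64 ? ~0ULL : (1ULL << NumElements) - 1ULL;
}
```

With this, a full batch gets all 64 bits set, while an empty batch still gets 0.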

I found another potential occurrence of this bug, which applies when the platform is not PLATFORM_MAC (I don’t know the maximum value of NumBits, though). You may want to fix it as well:

master branch: Engine/Source/Runtime/Apple/MetalRHI/Private/MetalRenderPipelineDesc.h: FMetalRenderPipelineDesc::SetHashValue()

template<typename Type>
	inline void SetHashValue(uint32 Offset, uint32 NumBits, Type Value)
	{
		FMetalRenderPipelineHash BitMask = ((((FMetalRenderPipelineHash)1ULL) << NumBits) - 1) << Offset;
		Hash = (Hash & ~BitMask) | (((FMetalRenderPipelineHash)Value << Offset) & BitMask);
	}
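If NumBits can ever equal the width of the hash type, the same guard as above would be needed here. The following is only a sketch under the assumption of a 64-bit hash: FHash64 and SetHashBits are hypothetical stand-ins, not the actual Metal RHI types, and the function returns the new hash instead of modifying a member.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the hash type; the real FMetalRenderPipelineHash may differ.
using FHash64 = uint64_t;

// Writes the lowest NumBits bits of Value into Hash at bit position Offset.
inline FHash64 SetHashBits(FHash64 Hash, uint32_t Offset, uint32_t NumBits, uint64_t Value)
{
	// The bit field has to fit into the hash.
	assert(NumBits >= 1 && NumBits <= 64 && Offset <= 64 - NumBits);

	// Avoid shifting by the full width of the type.
	const FHash64 LowMask = NumBits == 64 ? ~FHash64(0) : (FHash64(1) << NumBits) - 1;
	const FHash64 BitMask = LowMask << Offset;
	return (Hash & ~BitMask) | ((FHash64(Value) << Offset) & BitMask);
}
```

Whether this case can actually occur depends on the maximum NumBits used by the callers, which I have not checked.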

OK, after applying the fix for GetStaticBatchElementVisibility() the result looked like this when the number of floors was exactly 64:

So back to debugging, I found this code in Engine\Source\Runtime\Engine\Private\InstancedStaticMesh.cpp: FInstancedStaticMeshSceneProxy::SetupInstancedMeshBatch()

		const uint32 MaxInstancesPerBatch = FInstancedStaticMeshVertexFactory::NumBitsForVisibilityMask();
		const uint32 NumBatches = FMath::DivideAndRoundUp(NumInstances, MaxInstancesPerBatch);
		uint32 NumInstancesThisBatch = BatchIndex == NumBatches - 1 ? NumInstances % MaxInstancesPerBatch : MaxInstancesPerBatch;

With 128 floors and MaxInstancesPerBatch being 64, we would have exactly 2 batches with 64 floors each.
But for the second batch (BatchIndex = 1) the above code calculates:

NumInstancesThisBatch = (1 == 2 - 1) ? 128 % 64 : 64 = true ? 0 : 64 = 0

Thus the second batch will be empty.

Here’s a fixed version of the code:

		const uint32 MaxInstancesPerBatch = FInstancedStaticMeshVertexFactory::NumBitsForVisibilityMask();
		const uint32 NumBatches = FMath::DivideAndRoundUp(NumInstances, MaxInstancesPerBatch);
		uint32 NumInstancesThisBatch;

		if (BatchIndex == NumBatches - 1)  // Last batch?
		{
			NumInstancesThisBatch = NumInstances % MaxInstancesPerBatch;
			if (NumInstancesThisBatch == 0)  // Last batch is full? -> modulo returns 0, so we have to fix it
				NumInstancesThisBatch = MaxInstancesPerBatch;
		}
		else
		{
			NumInstancesThisBatch = MaxInstancesPerBatch;
		}
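An alternative that avoids the special case entirely is to take the minimum of the batch size and the number of instances that remain from this batch onwards. This is a sketch with a hypothetical function name, not the code from the pull request:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Returns how many instances belong to the batch with the given index.
inline uint32_t NumInstancesInBatch(uint32_t NumInstances, uint32_t BatchIndex, uint32_t MaxInstancesPerBatch)
{
	assert(MaxInstancesPerBatch > 0);

	const uint32_t FirstInstance = BatchIndex * MaxInstancesPerBatch;
	assert(FirstInstance < NumInstances);  // the batch index must be valid

	// Full batches are capped at the maximum; the last batch gets whatever remains.
	return std::min(MaxInstancesPerBatch, NumInstances - FirstInstance);
}
```

For 128 instances this gives 64 for both batches, and for 289 instances the fifth batch (index 4) gets the remaining 33.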

I created a pull request containing the two patches to fix this bug:

https://github.com/EpicGames/UnrealEngine/pull/2031

I’d really appreciate it if you could have a look at my other problem (“How to play two sounds directly after each other on Android?” in the Audio section of the Unreal Engine Forums), as debugging the Android sound system is beyond me :)