GetInstancesOverlappingSphere reports instances that have been removed

I believe that GetInstancesOverlappingSphere is bugged. If I call “GetInstancesOverlappingSphere” and then remove all the instances it finds using “RemoveInstances”, then I check the same area again using “GetInstancesOverlappingSphere”, there is about a 0.01% chance that the same instances show again, even after “RemoveInstances” returns successfully.

I know that “UHierarchicalInstancedStaticMeshComponent” is using some async code, and I think there may be a race condition. I can only reproduce this when spamming my function repeatedly.

Is there some function similar to “UPrimitiveComponent::UpdateOverlaps” that I can use on the HISM? Or is this just a bug? Here is some pseudo code:

// 'this' is a UHierarchicalInstancedStaticMeshComponent
TArray<int32> removalIndexes = this->GetInstancesOverlappingSphere(loc, radius, true);
const int instanceCountBefore = this->GetInstanceCount();
bool removedOk = this->RemoveInstances(removal_list);
check(removedOk); // check passes
const int instanceCountAfter = this->GetInstanceCount();
check((instanceCountBefore - instanceCountAfter) == removalIndexes.Num()); // check passes
TArray<int32> removalIndexes2 = this->GetInstancesOverlappingSphere(loc, radius, true);
check(hismRemovalIndexes2.Num() == 0); // check fails (rarely)

Have you seen this bug here? I wonder if it could be related, though it pertains to instances not rendering even though they are still in existence: Unreal Engine Issues and Bug Tracker (UE-53053)

The reason I wonder if its related, is that in that bug instances fail to render but their collision are still present. That however does not explain why GetInstanceCount() would be passing your check.

Looking at your code, the second check could fail if any instances make their way into the sphere size in the second GetInstancesOverlappingSphere call. How large of a sphere are you using, and are there many instances moving around at very high speed near the call? Note that if you aren’t moving instances around in world space, then this could not possibly be the issue.

Instead of the second check you are running, try logging the index values in removalIndexs2, and then also log the int32 returned by a call to GetInstanceCount(). This will indicate to you how many instances exist, and which ones are in the second collision sweep result. This could give you clues about which instances are either not being removed, or which are somehow finding their way into the sphere trace.

Another test you could do, is to perform your own sphere trace (calling it from GetWorld()) right before or after the second GetInstancesOverlappingSphere call, and examine the ‘Item’ member of the return FHitResult (Item is the index of the instance on the HISMC it hits, if any). Curious to see if it also detects an instance that shouldn’t be there.

Could could also try using sphere traces from GetWorld() (use a multi trace to get an array of FHitResult) instead of the sphere trace built into the HISMC, and collect the ‘Item’ numbers of the hits, and do a RemoveInstances using that array of indices instead. Perhaps the bug is in the HISMC sphere trace function.

Another thing: Perhaps try using a smaller sphere radius for the call to GetInstancesOverlappingSphere. Make it much smaller than the dimensions of the instances you are tracing for. That will guarantee it is not colliding with instances adjacent to the instances returned in the first sweep (if they are not moving). Though… not sure why those instances would not have been returned in the first trace, but who knows. Nice to have a large margin for the trace, at any rate.

Note that RemoveInstances does not preserve the order of the HISMC internal array of instances. It uses RemoveAtSwap to remove items from the internal arrays, rather than RemoveAt, which fills holes in the array with the elements at the end of the array, as elements are removed.

Thus, indices from removal_index_1 and removal_index_2 are not guaranteed to refer to the same instances in the HISMC.

Curious, do you experience the same behavior when using GetInstancesOverlappingBox? Might be worth trying just to see what happens.

Thanks for the ideas superdreamgen.

I have experienced bug UE-53053 as well, but only if i try adding instances before everything is fully loaded in.

I’m not changing the transforms of any of the instances, so it can’t be that.

I did look at the transforms of the blocks that showed in removalIndexes2. It’s interesting that removalIndexes usually had dozens of values, and removalIndexes2 only had 1 or 2. The transforms of the blocks in removalIndexes2 were also in removalIndexes.
I think the most likely cause is that GetInstancesOverlappingSphere is giving me the wrong indexes, or they are changing right after I call it.

I have not yet tried using any other trace methods. I’ll look into that later, but first I’m going to try detect if those index values are changing during my function.

I’m aware that the instance locations will change after calling RemoveInstances. My concern is that some async process is changing the indexes after calling GetInstancesOverlappingSphere and before calling RemoveInstances.

I’m wondering if the physics body of the instance is being removed with just a slight delay, in some async process as you describe. Here is a screenshot of part of RemoveInstanceInternal() of HISMC.

Looking at the API, TermBody() cleans up the PhysX data for the instance, which could perhaps indeed be performed in an asynchronous fashion. I really have on idea, but my intuition says that PhysX probably doesn’t run in the game thread.

Actually, I just dug into it a bit more: PhysX can indeed be run in either synchronous or asynchronous modes! You can toggle between the modes here: Project Settings > Physics > Simulation > Async Scene

Is Async scene enabled in your project settings right now? If so, you may indeed be experiencing a race condition with regards to the physics body of the HISMC instance being released. Try turning off async scene and let’s see what happens.

Another option: If you do collision sweeps with GetWorld() instead, you can actually enable or disable async scene collision using FCollisionQueryParams.bTraceAsyncScene.

If it turns out this is what’s happening, adding a boolean to enable/disable async scene tracing when calling GetInstancesOverlappingSphere() would be a good thing for the devs to add.

Digging some more, GetInstancesOverlappingSphere doesn’t trace against phyics bodies at all (which makes sense since I believe physics can be disabled on HISMCs and their instances). It actually uses some geometry algorithms to literally compute if spheres are overlapping one another (see attached images). Pretty nifty actually.

I would say that these algorithms almost certainly do not run asynchronously.

However, I may have noticed something. If you look in in GetInstancesOverlappingSphere, the first branch check to see if something called the ClusterTreePtr is valid. I think the ClusterTree contains some cached data about the locations of the instances in the HISMC, and that data is used to calculate the sphere overlaps.

Looking in RemoveInstances(), there is this if block:

	if (bAutoRebuildTreeOnInstanceChanges)
	{
		BuildTreeIfOutdated(true, false);
	}

So this ClusterTree is optionally rebuilt each time an instance is removed.

Perhaps try setting bAutoRebuildTreeOnInstanceChanges to ‘true’, and see if the problem still persists. That should force a rebuild of the ClusterTree each time you call RemoveInstances, which will hopefully make the sphere overlap calculations more accurate!

Holy smokes!

First of all, bAutoRebuildTreeOnInstanceChanges is set true by default. SECOND OF ALL, there are INDEED asynchronous functions in HISMC, and looking at it, they affect how: Instances are cleared, and how the ClusterTree is rebuilt!

The plot thickens…

It appears that ClusterTree rebuilding is done asynchronously, all the time. I don’t think it can be disabled. This is almost certainly the reason your problem is occurring, as the SphereCollision algorithms depend entirely upon the state of the ClusterTree.

I think your best bet might bet might be to try using GetWorld() multi traces, instead.

Good catch on the async race condition.

I managed to deal with the problem. It may or may not be considered a bug. Here was the conditions that produced the problem:

  1. I have a function named
    getNearestBlockTransform which
    uses
    UInstancedStaticMeshComponent::GetInstancesOverlappingSphere
    internally to determine an
    instance’s index based on it’s
    transform. The function
    GetInstancesOverlappingSphere was
    occasionally returning multiple
    blocks, even though there should not
    have been any overlapping blocks.
    The radius was specified to be only
    2 units. This was either caused by
    the bounds of the instances being
    too loose, or the blocks had somehow
    penetrated each other while they
    were physics actors, and then turned
    into HISM instances.

  2. Since getNearestBlockTransform was
    returning the wrong block
    occasionally, this would cause my
    removal_indexes array to contain
    duplicates. When I sent this array
    to the
    UHierarchicalInstancedStaticMeshComponent::RemoveInstances,
    that’s when the real problem
    started. I’m assuming that sending
    an array to RemoveInstance which
    contains duplicate values causes
    undefined behavior.

  3. After the HISM component’s data had
    been corrupted by that, the
    proceeding call to
    GetInstancesOverlappingSphere was
    returning transforms that I believed
    I had removed.

To fix the problem, I added a new function to find the index values which is more strict. For multiplayer, I needed to relax the tolerance. I’m not sure why, but the transform values did not mach exactly when sent to clients.

int Utw_HismComponent::getBlockIndex(const FTransform& queryTransform, const float overlap_radius) const {
	const FVector queryLoc = queryTransform.GetLocation();
	const TArray<int32> hismIndexes = this->UInstancedStaticMeshComponent::GetInstancesOverlappingSphere(queryLoc, overlap_radius);

	FTransform actualTransform;
	bool ok;
	int matchCount = 0;
	int matchingIndex = -1;
	for (const int32& index : hismIndexes) {
		
		ok = this->GetInstanceTransform(index, actualTransform);
		check(ok);
		if (actualTransform.Equals(queryTransform,0.01)) {
			matchingIndex = index;
			matchCount++;
		}
	}
	check(matchCount == 1);
	return matchingIndex;
}

It might be ok to use UHierarchicalInstancedStaticMeshComponent::GetInstancesOverlappingSphere instead of UInstancedStaticMeshComponent::GetInstancesOverlappingSphere, but I have not tested it yet.

Thank you to superdreamgen for helping me track down this bug.

Glad you were able to figure it out! Happy to lend a hand. (By the way, I deleted some of my replies here because they contained screenshots of Engine code, which I realized we are not allowed to share in public forums).

About the “transform values did not mach exactly when sent to clients”:

Do you mean that the floats in the FTransform location FVector sent to clients via RPC were not the same value on client-side and server-side? That would be bizarre indeed.

Or do you mean that the location of your instances were not identical client-side and server-side? In this case, if I recall correctly, your objects are actually initially actors moving with physics, which you then convert into HISMC instances once they stop moving.

UE physics are not deterministic, so objects moved by physics interactions will quickly de-sync between clients and server, meaning the HISMC instances you create once the actors stop moving would end up having different transforms almost 100% of the time.