Attachment Replication Questions

Hello,

I’ve been debugging some issues I’ve been seeing with replicating attachments. This is my first deep dive into the replication code, and I have a handful of questions about things that, at least at first glance, look like possible bugs. I am using Unreal 4.14, but I did do a cursory glance at some of the associated code in 4.15 and it looks like it hasn’t changed in these areas.

1 - There seems to be a race condition when spawning an actor, then attaching something to that actor, and having that replicated. In particular, I’m noticing an offset (which looks like the original Relative Location of the object before attaching) which I don’t see if I add a delay between spawning the actor and attaching something to it. One thing I have noticed (although I don’t have any evidence that it’s causing my particular issue) is that it looks like OnRep_AttachParent and OnRep_AttachSocketName usually (or at least with the delay) get called back-to-back, and then PostRepNotifies is called, which handles the actual attaching/detaching. However, when triggering a spawn and then attaching to the spawned actor, it looks like the SocketName can replicate first, followed by the actor spawn, and then the actual attachment. When it finally gets around to replicating the attachment, the flag “bNetUpdateAttachment” is updated from within a call to CallRepNotifies, but not in a flow that calls PostRepNotifies afterward. Should PostRepNotifies actually be called every time CallRepNotifies is called, perhaps at the end of that method? It seems like this could cause strange behavior?

2 - I tried moving PostRepNotifies into the end of CallRepNotifies just in case, but the offset was still present. This looks related to KeepRelativeTransform always being used in PostRepNotifies–is there a reason why that’s always the attachment rule on the client? It does seem to work in other cases (although I haven’t dug into why), but in this case it causes an expected offset.

3 - Inside of PostRepNotifies, AttachToComponent is called to update an attachment, even for when the parent is null. If a parent passed into AttachToComponent is null, the method bails before updating the AttachParent and AttachSocketName. This can cause the client to have an incorrect AttachParent/AttachSocketName in the case of detachment. Should a detach method be called in certain replication cases instead of always calling AttachToComponent?

4 - I noticed that AttachChildren in USceneComponent is set to replicated. Is it possible that this data could be replicated to the client before a component has a chance to replicate a detachment? This looks like it could trigger an ensure inside of DetachComponent (which gets called by AttachToComponent when switching to a non-null parent) saying that it’s detaching from a parent that has no record of the component being a child.

Thanks,

Hello,

So in order to dive in and investigate these issues, it would be great if you could provide a functional example for each as well as a way for us to reproduce it so I can relay that to the developers.

You can either provide a set of repro steps, or create test projects and I’ll be glad to take a look.

Hey Sean,

Thanks for the response! Some of the situations look problematic but I haven’t verified that all of them cause issues. I can probably isolate the specific problems I’m seeing to a test project that won’t violate any of my job confidentiality, and I can definitely see the value in providing that.

However, perhaps even more important than solving the actual problems I’m seeing, is understanding the reasoning behind the particular pieces of code that I’ve mentioned. (This way I’m better able to diagnose these issues in the future and have to bother you less.) I’ve tried to be very specific in my examples so that a developer familiar with that part of the code-base should be able to provide more insight into why it’s written that way. Some of these issues are also race conditions, and there is a chance (although I think it is probably tiny in this case) that the timings on your machine may be different and might not reliably repro.

This is all a very long way of asking if there’s any way that I could have more clarity about the questions I’ve asked before we focus on handing off a test project? (I understand if this isn’t feasible, or just isn’t possible for how you have to triage issues on your end.)

Thanks,

Hey Sean,

I’ve setup a Test Project with several examples which can be downloaded here (Note: I used 4.15): https://www.dropbox.com/sh/hw3e6esfg7fl2jx/AADIaJ3zbeGGuVrA8q452ho2a?dl=0

To reproduce, I’m just playing with “New Editor Window (PIE)” with “Number of Players” being 2 and then looking at each example I’ve set up. Everything on the server should appear fine but not on the client.

There are two examples for the offset issue I mentioned in Questions #1 and #2: one where the mesh is the root component and is being attached to a spawned actor, and another example where the mesh is placed, but the mesh is not the root component. (The latter might not not be the same issue, but it looked plausible.) The first example is at the left side of the “example wall” and the second is behind the player spawn point.

There is one example for Question #3 found in the middle of the wall where I am attaching/detaching an actor on a timer. Here you should see that the parent is wrong on the client whenever it is detached (ie: it shows that it still has a parent).

Finally, the last example, found at the end of the wall, is related to question #4, which should cause an ensure to be triggered, usually fairly soon after startup. I suspect this is a race condition, so actual time may vary slightly. (This may cause the editor to hang for a few seconds while handling the ensure if you don’t launch from Visual Studio.) I think what I’ve mentioned in that question may be the cause, but it’s just a theory.

It is a C++ project, but all of what I’ve set up is in Blueprint (most if it being in the Level Blueprint with a little bit in the two Cube Blueprints I’ve provided). I made it a C++ project just in case I would not be able to repro the issues in Blueprint, but that turned out not to be the case. Again, everything on the server should appear fine, but not on the client.

Thanks,

I think I’ve isolated the cause of the offset issue for the first example (where I’m attaching an actor with a mesh root component to a spawned actor). I haven’t verified if it’s related to the offset in the second example yet; I’m still investigating that case.

Anyway, the shadow buffer for replication gets created after the attachment is set up. Because the attachment sets the relative location to (0,0,0) and the archetype that is used to populate the shadow buffer also has a relative location of (0,0,0), no delta is detected and that position isn’t sent to the client. When the client receives the message to spawn the actor on the client, it sets the relative location of the static mesh to the Actor’s world location (since it’s the root component). The AttachParent still gets through because that is a delta to the archetype, resulting in an attachment with an offset on the client.

I’m still a bit confused why I seem to be the only one hitting this; it definitely seems like an issue that other people would have seen by now. I’m holding out hope that I just need to set up the content differently, but I’m starting to lose confidence in that. :frowning:

Sorry for the delay. I took a look at the project you’ve provided, and I noticed the differences between the server and client, but I’m not entirely sure this is a bug. Unfortunately we don’t currently have the resources to allocate to debugging the project you’ve provided.

What you can do if you’d like further investigation is to isolate each issue that you’re reporting and provide a detailed set of repro steps for each (you can do this in a separate text file and attach that if you’d like just for the sake of organization).

With the way we triage issues, we’ll need to be able to reproduce each issue locally in a clean project, so if you are able to do so feel free to leave a comment to reopen the thread with the steps and I’ll be more than happy to have another look.

Thank you

Hey Sean,

Would you be able to explain why you don’t feel these are bugs, and more specifically, which of these issues you don’t feel are bugs? I’m totally open to alternative ways to set up content, as long as it’s clear why those ways avoid some kind of pitfall in the engine that my current set up is exposing. :slight_smile:

The idea is that the Blueprints here are the repro steps. In the level blueprint, you’ll notice that each example is cleanly organized into a repro case. (The only one not found in Blueprint is the example that is not located on the “example wall”, which is repro-able just based on the hierarchy of the object.) The other actor types are largely just there for debug (printing debug for current parent, or rotating the actor to make parenting obvious) or standard setup (disabling physics on the client copy of the object so that it only simulates on the server).

Attached is a pdf that explains each of the Blueprinted repro cases in the Test Project. (Download that here) Additionally, I’ve also included my notes for the C++ side of things just in case that’s useful! (Download that here)

Thanks,

Sorry for the delay. I’ve gotten some time to look into this, and I’ll respond as follows:

1: This does seem like it could be a bug, and as a result I’ve entered a ticket: Unreal Engine Issues and Bug Tracker (UE-43355)

2: I don’t believe that this is a bug. Changing a setting such as simulate physics on just a client doesn’t seem like it should work, and I think this is why you’re seeing an issue. Remember that having Simulate Physics on detaches the component from its parent, so whenever you’re disabling physics it will try to reattach to the parent, which could definitely cause the offset you’re seeing. However, I don’t think this is intended workflow like you stated in your attached .pdf.

3: I don’t believe that this is a bug either. Function calls will not replicate by default, you’ll need to call the function on the server as well as on the client.

4: I wouldn’t consider this a recommended workflow. Typically, if I’m looking to attach something to a different parent, I’ll detach it from the original first to prevent these sort of issues. Do you see the same behavior if you do this?

Let me know if you have further questions regarding these or if you think I’m misunderstanding any of the issues you’ve listed.

Thank you for your report.

Hey Sean,

Thanks for your response!

1: Awesome. :slight_smile:

2: I think we agree that this case isn’t ideal workflow, but I don’t agree that the call to client side SetSimulatePhysics is the culprit here. This can be verified by taking that node out. (I was able to fix this case with a handful of changes to some of the issues I’ve alluded to in the doc, but we’re not actively pursuing it since the setup does seem strange) That being said, the SetSimulatePhysics node was a mistake, and if it’s helpful, I can re-upload all of the examples with those stripped out.

3: Yes, function calls will not replicate by default. But, attachments are set up to replicate by default. This can be seen in the header file for SceneComponent.h where AttachParent and AttachChildren are set up to replicate:

UPROPERTY(ReplicatedUsing = OnRep_AttachParent)
USceneComponent* AttachParent;

/** List of child SceneComponents that are attached to us. */

UPROPERTY(ReplicatedUsing = OnRep_AttachChildren, Transient)
TArray<USceneComponent*> AttachChildren;

And then in USceneComponent::PostRepNotifies() you can clearly see that the client also runs AttachToComponent if it detects that OnRep_AttachParent was run. (Even with out re-running the attach logic on the client, the attachment should replicate since the AttachParent and AttachChildren are both setup to replicate to the client)

4: Attaching to a component while something is already attached to an existing component is clearly supported in the code flow. AttachToComponent is clearly trying to detach from any existing parent. This makes sense as requiring people to check for an existing parent any time they want to attach might be a bit annoying. The issue here is almost definitely a race condition with how AttachChildren is being replicated for the parent.

Thanks for your help, Sean! I know you guys are probably very busy.

No problem, glad to help.

As far as 2 goes, I do not believe that this is a bug. I wasn’t able to reproduce it cleanly with a new asset until I added the SetSimulatePhysics call and if the object was not simulating to begin with on my end.

3: So here’s how I’m looking at this issue. For attachment, you can replicate it because you’re attaching to (hopefully) a valid reference to an object and then that reference gets replicated through rep notify and the client can attach to that same object.
For detaching an object, let’s look at it like this: the replicated variable for the attachment is now being detached and set to nullptr. The variable is replicated out to the clients but the replicated variable is now nullptr. So on the server you’re detaching from the parent, but then that parent variable is replicated out as a null reference, which would lead to some issues. This is why the detachment is not replicated by default like the attachment is, at least how I look at it.

4: I did some more digging and found that the ensure you’re reporting seems to exist in our database: Unreal Engine Issues and Bug Tracker (UE-40244)
Let me know what you think regarding whether or not that’s the same issue.

Thanks again for all of the information.

Hey Sean,

Thanks for the quick response! Splitting this up into two comments because my text is too long, heh.

I’ve uploaded a new zip to Dropbox that has the client-side SetSimulatePhysics Blueprint nodes stripped out for all examples (but these shouldn’t have been causing any difference in behavior), and I also simplified the repro case for the first bug. All issues still repro.

So just to clarify about #2: the example project does show that the issue still repros without the SetSimulatePhysics node, so I’m not sure what you’re doing differently. Looking in FBodyInstance::SetInstanceSimulatePhysics (which PrimitiveComponent calls from SetSimulatePhysics), it looks like detachment only occurs when setting Simulate Physics to true, not false (which is what the project was doing), which makes sense. So I’m wondering if maybe you’re example content has the checkbox checked?

Regardless, the Simulate Physics checkbox on the actual component can cause a detach on BeginPlay for a PrimitiveComponent (and does in this case), and will do so independently on the client and server. Even though this doesn’t happen without the checkbox being checked, this is still a bug. A detachment shouldn’t cause a divergence in transform on the client and server, especially when attachments are set up to replicate by default (unless of course it was only happening on the client which I think was your original assumption).

3: Yes, that is basically the flow I’ve outlined in my doc. Specifically, the issue that this leads to is that the replicated nullptr is fed into AttachToComponent on the client and this won’t cause the detachment on the client. Because of the way the client swaps NetOldAttachParent and AttachParent before running the AttachToComponent call (I’m assuming to let the client try and run the operation locally), the client ends up with the old parent. This is definitely a bug; it would be very strange to support only replicating attachment correctly and not detachment.

4: So that is the same ensure, and it’s possible the ensure is happening for the same reason, but this issue doesn’t have anything to do with physics. It’s specifically attaching something that already has a parent to a different parent, and then that result is sometimes replicating in a certain order to the client. When the ensure is triggered, it means that the updated AttachChildren from the first parent has already been replicated to not have the child anymore. When the client tries to attach the object to the new parent after that has already been replicated, it will try to detach from the existing parent (which it only does if it is attaching to a non-null new parent), it checks to make sure that the current parent has record of the child. But it doesn’t, because the updated AttachChildren were replicated first, causing the ensure.

2: Unreal Engine Issues and Bug Tracker (UE-43494)
I’ve entered a ticket for further investigation on this issue.

3: I see this in the following way: How can the client, who just got an updated reference of nullptr from the server, detach anything with it? It’s null, and trying to reference it will cause the entire program to crash. It is true that the attachment is tied to a OnRep call but there is nothing that would suggest it also has to be wrapped into the detach call, especially with the conundrum explained above.

4: I believe that the ensure is the same as shown in the ticket I linked earlier. It’s possible that there are multiple ways to trigger the ensure though, so I’ll make a note of that.

Have a great day

2: Awesome! Thankyou for doing that.

3: The problem is that the client isn’t keeping it null. The client is restoring it to the original non-null value, which is why it’s incorrect on the client.

In USceneComponent::PreNetReceive the old (in this case, non-null) AttachParent is being cached into NetOldAttachParent. In USceneComponent::PostRepNotifies, AttachParent and NetOldAttachParent are exchanged before then running AttachToComponent on NetOldAttachParent.

After the exchange, and before the AttachToComponent, this means that AttachParent is its original, pre-replicated, value (which in this case is not null), and NetOldAttachParent now has the newly replicated value. In the case of a non-null parent being replicated, the value of NetOldAttachParent goes through AttachToComponent, causing AttachParent to be overwritten, which means it then matches the host. BUT, when it’s null, AttachToComponent will basically just ignore it, which means AttachParent will stay the original, pre-replicated value (which in this case, is non-null).

So all that’s probably needed is just a null check: if the replicated value is null, do a detach instead of an attach.

4: It is the same ensure, but it might not be the same issue. The two repros are different enough that I think two bugs that are linked might be better, but that works too! :slight_smile:

Thanks for the clarification. Here’s the ticket for that issue:

Thank you for your time and for providing a test project showcasing the issues.

Have a great day

Quote rcarper-defiant | (where I’m attaching an actor with a mesh root component to a spawned actor).

Can confirm this on 4.17 as well. I upvoted the bug ticket.