Nested material functions and shader code optimization

Answers.Archive · September 16, 2016, 2:59pm

Hi,

I noticed that encapsulating two material functions into a larger material function can obstruct shader code optimizations. In such cases, having the encapsulated material function nodes explicit in the actual material asset yields the expected optimized code.

I have tried to come up with a very simple example to illustrate the issue.
Please refer to the attached images below for guidance.

Assume two hypothetical material functions that act like a multiplexer/demultiplexer pair:

FEncoder : receives several inputs, operates on them, and produces a single output value
FDecoder : receives a single input, and outputs multiple values

We could explicitly place and connect both of them in a material, or we could encapsulate them within an FCodec material function and place that abstraction in the material instead.

If we use the FCodec node, as in MTestJoined, and inspect the generated HLSL code, we can see that each of its outputs invokes the FEncoder logic separately (in this case the Custom Expression within it, but it could be a standard material node network), even though the inputs did not change.

On the other hand, if we place FEncoder and FDecoder directly in the material instead, as in MTestSplit, only one invocation is made, as expected.

Sure, for such simple material functions the resulting number of machine instructions will be the same, since the shader optimizer does a good job of detecting the redundancies. However, in many of our real, complex materials the shader optimizer is unable to detect the redundancy, and we end up paying a heavy performance toll for it (especially if the custom expression logic being invoked contains loops and the like).

Is there any way to assist the shader code generator in this matter?
(For the time being, I quickly hacked a simple variable caching mechanism to prevent redundant work, but my caching system can only handle one instance of FEncoder per material, which limits the work of the technical artists.)

Thanks in advance.

Answers.Archive · September 16, 2016, 2:59pm

Hi Marcos,

Thanks for the detailed explanation. It seems you’ve already performed a solid investigation into what’s happening here. When the translator steps into a material function it creates a temporary expression map to allow better scope control and result sharing, but that map is discarded when we leave the function. Other material pins then re-enter the function and create a new map, without realizing the code has already been translated. This leads to custom expressions incorrectly duplicating their definitions, but it sounds like your caching workaround is already attempting to handle that?
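
To make the behavior described above concrete, here is a toy model in Python. It is only a sketch and not the engine's translator: the class `ToyTranslator`, the `keep_function_maps` switch, and the `Encode`/`Decode` pseudo-HLSL strings are all hypothetical names made up for illustration. It assumes one expression map per function entry and shows how discarding that map between output pins re-emits the encoder code.

```python
class ToyTranslator:
    """Toy model of a graph-to-code translator. Hypothetical, not engine code."""

    def __init__(self, keep_function_maps):
        self.keep_function_maps = keep_function_maps
        self.emitted = []        # generated pseudo-HLSL lines, in order
        self.function_maps = {}  # maps kept across entries (hypothetical fix)

    def _emit(self, expr_map, key, code):
        # Translate an expression once per map; later lookups reuse the local.
        if key not in expr_map:
            name = f"Local{len(self.emitted)}"
            self.emitted.append(f"float {name} = {code};")
            expr_map[key] = name
        return expr_map[key]

    def translate_codec_output(self, inputs, output_index):
        # Entering FCodec: choose the expression map for this function scope.
        if self.keep_function_maps:
            expr_map = self.function_maps.setdefault(("FCodec", inputs), {})
        else:
            expr_map = {}  # fresh map on every entry, discarded on return
        encoded = self._emit(expr_map, "FEncoder", f"Encode({', '.join(inputs)})")
        return self._emit(expr_map, ("FDecoder", output_index),
                          f"Decode{output_index}({encoded})")


def count_encoder_calls(keep_function_maps, output_count=3):
    # Translate every FCodec output pin and count the emitted encoder calls.
    translator = ToyTranslator(keep_function_maps)
    for index in range(output_count):
        translator.translate_codec_output(("A", "B"), index)
    return sum("= Encode(" in line for line in translator.emitted)
```

Under these assumptions, `count_encoder_calls(False)` returns 3 (one encoder call per output pin, as in MTestJoined), while `count_encoder_calls(True)` returns 1 (as in MTestSplit).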

I’ve created issue UE-32897 to track this task internally, to better handle custom expression calls and avoid duplicate inclusion. Additionally, we could improve code sharing between function expression stacks. The main issue is that the code generation step is done by a translator only; it doesn’t necessarily understand the underlying code, so it’s not always trivial to remove redundancy. Keeping the graph ‘flatter’, with fewer recursive function calls, is the best way to avoid this.

If you’re working on something more performance-critical, it might be better to write the functions in a .usf file and include that file in the MaterialTemplate or Common shader files so they can be used elsewhere. This will give you the greatest control over the final code and prevent the translator from generating multiple copies of the actual functionality, which helps the compiler.

Thanks,
Chris

Answers.Archive · September 16, 2016, 2:59pm

Hi Chris, thank you for the feedback.

I appreciate you opening an internal ticket to track this problem.
If you will permit me to rant a little, I believe this issue should receive high priority.
Without a fix, the usefulness of a node-based shader/material editor for non-trivial materials is hampered; the issue also undermines all the effort put into better shader code generation around the material pins introduced in UE4.11.

Moving everything to .usf files is out of the question, since most material artists are not trained in shader programming, and in the few instances where they dabbled in it, the long-term burden (maintenance, performance, etc.) on us in the programming team was overwhelming. A group of, say, 10 effects artists can easily work on several dozen different materials each, and it would be impossible for a small handful of graphics software engineers to keep up with that pace while also being productive on other graphics improvements, techniques and tools.

My caching mechanism has some flaws, since it relies on preprocessor code generation tricks around a specially crafted Custom node to cache the results. It can only cache a single instance of a given material function per material. If the material author wishes to use, say, two distance field ray-marching nodes in the same material, it won’t give the expected results. I also messed around with the Material Function logic, but I found it difficult to modify, since it is full of indirections and has little useful documentation. My time budget for working around the issue expired and I had to resort to the custom-node caching scheme.
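
The actual Custom node is not shown in this thread, so the following Python sketch is only a guess at why a one-slot-per-function cache breaks with two instances. `ray_march`, `PerFunctionCache` and `PerInstanceCache` are hypothetical names; the second class illustrates one possible alternative (keying the cache on the inputs as well), not something the engine provides.

```python
def ray_march(origin):
    # Hypothetical stand-in for an expensive node such as a ray march.
    return origin * 2.0


class PerFunctionCache:
    """One slot per function name: a second instance reads a stale value."""

    def __init__(self):
        self.slots = {}

    def call(self, name, fn, arg):
        if name not in self.slots:
            self.slots[name] = fn(arg)
        return self.slots[name]


class PerInstanceCache:
    """One slot per (function, input) pair: instances no longer collide."""

    def __init__(self):
        self.slots = {}

    def call(self, name, fn, arg):
        key = (name, arg)
        if key not in self.slots:
            self.slots[key] = fn(arg)
        return self.slots[key]
```

With the per-function cache, a second call with a different input returns the first call's result; with the per-instance cache, each distinct input is computed once and repeated calls with the same input are still shared.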

Cheers!

Answers.Archive · September 16, 2016, 2:59pm

Would it be possible to add UE-32897 to the public issues tracker?

Answers.Archive · September 16, 2016, 2:59pm

Hi Steven,

This has been requested on the bug, so it should appear soon. There’s no additional information on the JIRA ticket; it’s more a link back to this page to address later on, but I understand it’s helpful to have more visibility.

The bug should appear here when processed: Unreal Engine Issues and Bug Tracker (UE-32897)

Thanks,
Chris