Loading Data TO/FROM Structured Buffer (Compute Shaders)

Note: This post requires some knowledge about RHI/RDG and compute shaders in Unreal Engine.

I am trying to create a compute shader which has as part of its parameters a set of structured buffers, but I do not know how to access the memory of those buffers.

For the time being, there are two input structured buffers and three output structured buffers.
I want to fill the input buffers with some data (either static const data or generated data) and retrieve the data from the output buffers (ultimately into a TArray variable).

I have looked at the following resources:

https://forums.unrealengine.com/development-discussion/c-gameplay-programming/1751313-reading-a-frhistructuredbuffer-on-the-cpu

https://forums.unrealengine.com/development-discussion/rendering/1725311-writing-data-to-rdg-structured-buffer-general-rdg-questions

https://forums.unrealengine.com/development-discussion/rendering/19556-get-data-back-from-compute-shader

The general consensus across these is to use RHILockStructuredBuffer. However, there are some issues with this:

  1. It blocks the thread on which it runs, which makes async copying more difficult. One of the resources proposed a solution that extends a class, but this does not seem very elegant, and I hope there is some way to achieve this with the Unreal Engine codebase itself rather than by extending it.

  2. Using this function does not seem very RDG-friendly. The PowerPoint slides suggest avoiding boilerplate code and (for example) propose adding a pass via FComputeShaderUtils::AddPass(). However, the Lock/UnlockStructuredBuffer functions have to be called within a lambda that is passed as an argument to GraphBuilder.AddPass(), which makes for an awkward situation.

I found out about GraphBuilder.QueueBufferExtraction(), but all it does (to my knowledge) is let you keep a buffer that was initially allocated just for the pass (that is, once the pass is over, the buffer would disappear; please correct me if I’m wrong) by giving you a reference to a pooled buffer.

My issues with this are:

  1. I cannot find a way to get to the underlying data to even issue a standard FMemory::Memcpy().
  2. This is just for retrieving the data. What about storing data in an input buffer? After all, I am dealing with an FRDGBufferRef for the shader input, whereas QueueBufferExtraction() gives back a TRefCountPtr to an FPooledRDGBuffer.

I’m getting rather confused by all this and would appreciate help from others. All suggestions are welcome!

Below is the source that dispatches the shader.

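	// Note: [&] captures by reference, so ScheduleParams and OutputData must outlive
	// this command, which runs later on the render thread.
	ENQUEUE_RENDER_COMMAND(MarchingCubesShader)
	([&](FRHICommandListImmediate& RHICmdList)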
	{
		//Render Thread Assertion
		check(IsInRenderingThread());
		int VoxelCount = ScheduleParams.Width * ScheduleParams.Height * ScheduleParams.Depth;
		int MaxVertexCount = VoxelCount * 15; // max 5 triangles per voxel -> max 15 vertices

		FRDGBuilder GraphBuilder(RHICmdList);

		FRDGBufferDesc ConstCubeEdgeFlagsDesc = FRDGBufferDesc::CreateStructuredDesc(sizeof(int), 256);
		FRDGBufferDesc ConstTriangleConnectionTableDesc = FRDGBufferDesc::CreateStructuredDesc(sizeof(int), 256 * 16);
		FRDGBufferDesc InputDensityDataDesc = FRDGBufferDesc::CreateStructuredDesc(sizeof(float), VoxelCount);
		FRDGBufferDesc OutputVertexPositionsDesc = FRDGBufferDesc::CreateStructuredDesc(sizeof(FVector), MaxVertexCount);
		FRDGBufferDesc OutputVertexNormalsDesc = FRDGBufferDesc::CreateStructuredDesc(sizeof(FVector), MaxVertexCount);
		FRDGBufferDesc OutputTriangleIndicesDesc = FRDGBufferDesc::CreateStructuredDesc(sizeof(int), MaxVertexCount);

		FRDGBufferRef ConstCubeEdgeFlagsBuffer = GraphBuilder.CreateBuffer(ConstCubeEdgeFlagsDesc, TEXT("SB_EdgeFlags"));
		FRDGBufferRef ConstTriangleConnectionTableBuffer = GraphBuilder.CreateBuffer(ConstTriangleConnectionTableDesc, TEXT("SB_TriangleConnectionTable"));
		FRDGBufferRef InputDensityDataBuffer = GraphBuilder.CreateBuffer(InputDensityDataDesc, TEXT("SB_DensityData"));
		FRDGBufferRef OutputVertexPositionsBuffer = GraphBuilder.CreateBuffer(OutputVertexPositionsDesc, TEXT("SB_VertexPositions"));
		FRDGBufferRef OutputVertexNormalsBuffer = GraphBuilder.CreateBuffer(OutputVertexNormalsDesc, TEXT("SB_VertexNormals"));
		FRDGBufferRef OutputTriangleIndicesBuffer = GraphBuilder.CreateBuffer(OutputTriangleIndicesDesc, TEXT("SB_TriangleIndices"));

		FRDGBufferUAVRef ConstCubeEdgeFlagsUAVRef = GraphBuilder.CreateUAV(ConstCubeEdgeFlagsBuffer);
		FRDGBufferUAVRef ConstTriangleConnectionTableUAVRef = GraphBuilder.CreateUAV(ConstTriangleConnectionTableBuffer);
		FRDGBufferUAVRef InputDensityDataUAVRef = GraphBuilder.CreateUAV(InputDensityDataBuffer);
		FRDGBufferUAVRef OutputVertexPositionsUAVRef = GraphBuilder.CreateUAV(OutputVertexPositionsBuffer);
		FRDGBufferUAVRef OutputVertexNormalsUAVRef = GraphBuilder.CreateUAV(OutputVertexNormalsBuffer);
		FRDGBufferUAVRef OutputTriangleIndicesUAVRef = GraphBuilder.CreateUAV(OutputTriangleIndicesBuffer);

		FMarchingCubesCS::FParameters* MarchingCubesCSParameters = GraphBuilder.AllocParameters<FMarchingCubesCS::FParameters>();
		MarchingCubesCSParameters->Width = ScheduleParams.Width;
		MarchingCubesCSParameters->Height = ScheduleParams.Height;
		MarchingCubesCSParameters->Depth = ScheduleParams.Depth;
		MarchingCubesCSParameters->VoxelScale = ScheduleParams.VoxelScale;
		MarchingCubesCSParameters->CubeEdgeFlags = ConstCubeEdgeFlagsUAVRef;
		MarchingCubesCSParameters->TriangleConnectionTable = ConstTriangleConnectionTableUAVRef;
		MarchingCubesCSParameters->DensityData = InputDensityDataUAVRef;
		MarchingCubesCSParameters->OutputVertexPositionsBuffer = OutputVertexPositionsUAVRef;
		MarchingCubesCSParameters->OutputVertexNormalsBuffer = OutputVertexNormalsUAVRef;
		MarchingCubesCSParameters->OutputTriangleIndicesBuffer = OutputTriangleIndicesUAVRef;
    
		// Get a reference to our shader type from global shader map
		TShaderMapRef<FMarchingCubesCS> MarchingCubesCSRef(GetGlobalShaderMap(GMaxRHIFeatureLevel));

		// Compute the group dimensions used for dispatching.
		// GetGroupCount takes the thread count first, then the group size.
		FIntVector MarchingCubesCSGroupCount = FComputeShaderUtils::GetGroupCount(
			FIntVector(ScheduleParams.Width, ScheduleParams.Height, ScheduleParams.Depth),
			NUM_THREADS_PER_GROUP_XYZ);

		ValidateShaderParameters(MarchingCubesCSRef, *MarchingCubesCSParameters);
		
		FComputeShaderUtils::AddPass(GraphBuilder, RDG_EVENT_NAME("Marching Cubes Test"), MarchingCubesCSRef, MarchingCubesCSParameters, MarchingCubesCSGroupCount);
		GraphBuilder.QueueBufferExtraction(OutputVertexPositionsBuffer, OutputData.VertexPositionsBuffer, FRDGResourceState::EAccess::Read, FRDGResourceState::EPipeline::Compute);
		GraphBuilder.QueueBufferExtraction(OutputVertexNormalsBuffer, OutputData.VertexNormalsBuffer, FRDGResourceState::EAccess::Read, FRDGResourceState::EPipeline::Compute);
		GraphBuilder.QueueBufferExtraction(OutputTriangleIndicesBuffer, OutputData.TriangleIndicesBuffer, FRDGResourceState::EAccess::Read, FRDGResourceState::EPipeline::Compute);
		
		// final step
		GraphBuilder.Execute();
	});
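
The sizing arithmetic in the dispatch code above can be checked on the CPU without the engine. The following is a minimal standalone sketch, not Unreal code: IntVector3, DivideAndRoundUp, GetGroupCount, and MaxVertexCount are hypothetical stand-ins written for illustration. It assumes that FComputeShaderUtils::GetGroupCount rounds each dimension up to a whole number of thread groups, and the group size of 8 used in the note below is an assumed value for NUM_THREADS_PER_GROUP_XYZ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for FIntVector; not the engine type.
struct IntVector3
{
    int32_t X, Y, Z;
};

// Ceiling division: the smallest number of groups that covers Value threads.
inline int32_t DivideAndRoundUp(int32_t Value, int32_t Divisor)
{
    assert(Divisor > 0);
    return (Value + Divisor - 1) / Divisor;
}

// Group count per axis for a cubic group size (assumed behavior, see lead-in).
inline IntVector3 GetGroupCount(IntVector3 ThreadCount, int32_t GroupSize)
{
    return IntVector3{
        DivideAndRoundUp(ThreadCount.X, GroupSize),
        DivideAndRoundUp(ThreadCount.Y, GroupSize),
        DivideAndRoundUp(ThreadCount.Z, GroupSize)};
}

// Worst-case vertex count: at most 5 triangles (15 vertices) per voxel.
// Uses 64-bit math so large grids do not overflow a 32-bit int.
inline int64_t MaxVertexCount(IntVector3 Grid)
{
    const int64_t VoxelCount = int64_t(Grid.X) * Grid.Y * Grid.Z;
    return VoxelCount * 15;
}
```

With a group size of 8, a 64x64x64 grid needs 8 groups per axis, while a 65-wide grid needs 9 along that axis, so the shader has to bounds-check the thread index against Width, Height, and Depth.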

P.S. I hope UE 5 will have a better RDG structure and documentation to make compute shaders easier to work with. I beg you Epic!!

Edit: I probably need to post some more source, let me know if you need clarification!


I’ve finally solved this. It took me a while to get there, and I had to dive through the source code, but:

You have to use pooled GPU buffers, e.g. TRefCountPtr<FPooledRDGBuffer> pooled_verticies;

To declare input/output structured buffers that start with initial data, use the macro SHADER_PARAMETER_RDG_BUFFER_UAV(RWStructuredBuffer<FVector>, Verticies)

For input-only buffers, use an SRV: SHADER_PARAMETER_RDG_BUFFER_SRV(StructuredBuffer<uint32>, Triangles)

Then, to load the initial data in the render thread, use

uint32 size = parameters.verticies.Num();        // parameters.verticies type is TArray<FVector> verticies
if (size > 0)
{
    verticies_buff = CreateStructuredBuffer(
                                     graph_builder,
                                     TEXT("NormalsCS_Verticies"),
                                     sizeof(FVector),
                                     size,
                                     parameters.verticies.GetData(),
                                     sizeof(FVector) * size,
                                     ERDGInitialDataFlags::None);
    verticies_uav = graph_builder.CreateUAV(verticies_buff, PF_R32_UINT);
}

For input-only buffers, create an SRV instead of a UAV in the snippet above.

Then set your pass parameter, e.g. pass_parameters->Verticies = verticies_uav;

Add your pass, then queue the buffer extraction for the output buffers with

graph_builder.QueueBufferExtraction(
                               verticies_buff,
                               &pooled_verticies,
                               FRDGResourceState::EAccess::Read, 
                               FRDGResourceState::EPipeline::Compute);

Note: For UE4.26, use ERHIAccess::CPURead in place of the FRDGResourceState::EAccess::Read and FRDGResourceState::EPipeline::Compute arguments.

Execute your graph, then copy the data back out of the pooled GPU buffer. I created a static method for this, although ideally the copy should be done in a pass; see the CreateStructuredBuffer method in the source.

Note: For UE4.26, FPooledRDGBuffer has been renamed to FRDGPooledBuffer; also use ->GetStructuredBufferRHI() in place of ->StructuredBuffer

    FComputeShader::CopyBuffer(RHICmdList, pooled_verticies, parameters.verticies.GetData(), sizeof(FVector) * parameters.verticies.Num());


    // definition:
    // Copies an FPooledRDGBuffer
    static void CopyBuffer(FRHICommandListImmediate &RHICmdList, TRefCountPtr<FPooledRDGBuffer> &source, void *dest, SIZE_T size)
    {
        void *psource = RHICmdList.LockStructuredBuffer(source->StructuredBuffer, 0, size, RLM_ReadOnly);
        FMemory::Memcpy(dest, psource, size);
        RHICmdList.UnlockStructuredBuffer(source->StructuredBuffer);
    }
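
The copy step above boils down to a byte-count check followed by a memcpy from the locked pointer. Here is a minimal engine-free sketch of just that arithmetic. Float3 and CopyLockedBytes are hypothetical names, a std::vector stands in for TArray, and any plain byte pointer stands in for the pointer returned by the lock call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical 3-float vertex, matching a float3 element on the shader side.
struct Float3
{
    float X, Y, Z;
};
static_assert(sizeof(Float3) == 12, "CPU element size must match the buffer stride");

// Copies NumBytes from a locked (mapped) pointer into a typed array.
// The byte count must be a whole multiple of the element size.
template <typename T>
std::vector<T> CopyLockedBytes(const void* Source, std::size_t NumBytes)
{
    static_assert(std::is_trivially_copyable<T>::value, "memcpy needs a trivially copyable type");
    assert(Source != nullptr || NumBytes == 0);
    assert(NumBytes % sizeof(T) == 0);

    std::vector<T> Result(NumBytes / sizeof(T));
    if (NumBytes > 0)
    {
        std::memcpy(Result.data(), Source, NumBytes);
    }
    return Result;
}
```

The static_assert is the part worth carrying over: the CPU-side element size has to equal the stride the buffer was created with. As far as I know, FVector is double precision in UE5, so sizeof(FVector) no longer matches a float3 there and a float-based type such as FVector3f would be the one to use.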

Pooled buffers stay on the GPU until you release them. After the first pass creates them, you can access them again with something like

FRDGBufferRef verticies_buff = graph_builder.RegisterExternalBuffer(pooled_verticies, TEXT("NormalsCS_Verticies"));

Hope this helps, although the buffers still have to be locked because of the CPU-GPU boundary.


In UE5 (and possibly earlier versions), search for examples of FRHIGPUBufferReadback, which allows you to copy buffer contents in a later frame after you queue up a copy pass. Note that at the time of writing this only worked with buffers whose descriptors were created through this function: FRDGBufferDesc::CreateBufferDesc
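
The deferred readback described above follows an enqueue, poll, then lock pattern. The sketch below is an engine-free mock of that control flow only. MockReadback, TickFrame, TryCollect, and the frame latency are invented for illustration, and the method names merely mirror how I understand FRHIGPUBufferReadback to be shaped (EnqueueCopy, IsReady, Lock, Unlock), so check the engine header for the real signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mock of a GPU readback object; not an Unreal class.
class MockReadback
{
public:
    // Snapshot the "GPU" data and pretend the copy lands LatencyFrames frames later.
    void EnqueueCopy(const std::vector<float>& GpuData, int LatencyFrames)
    {
        Staging = GpuData;
        FramesLeft = LatencyFrames;
        bPending = true;
    }

    // Advance the simulated GPU by one frame.
    void TickFrame()
    {
        if (bPending && FramesLeft > 0)
        {
            --FramesLeft;
        }
    }

    // True only once the simulated copy has finished.
    bool IsReady() const { return bPending && FramesLeft == 0; }

    // Lock must only be called after IsReady() returns true.
    const float* Lock() const
    {
        assert(IsReady());
        return Staging.data();
    }

    void Unlock() { bPending = false; }

    std::size_t Num() const { return Staging.size(); }

private:
    std::vector<float> Staging;
    int FramesLeft = 0;
    bool bPending = false;
};

// Call once per frame. Returns true on the frame the data was copied into Out,
// and false while the copy is still in flight, so the caller never blocks.
inline bool TryCollect(MockReadback& Readback, std::vector<float>& Out)
{
    if (!Readback.IsReady())
    {
        return false;
    }
    const float* Data = Readback.Lock();
    Out.assign(Data, Data + Readback.Num());
    Readback.Unlock();
    return true;
}
```

The point of the pattern is that the calling thread only ever polls; the lock happens after the copy is known to be complete, which avoids the stall that a direct lock on the structured buffer causes.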

Thanks for sharing this useful knowledge! I’m trying to make this work on macOS (tested with both 4.27 and 5.0.3), but the output buffer seems unprocessed. Do you have any suggestions?