Given the following shader:
struct Contents {
    int X;
};
RWStructuredBuffer<Contents> Values;
[numthreads(4, 1, 1)]
void main(uint3 TID : SV_DispatchThreadID) {
    uint Sum = 0;
    switch (Values[TID.x].X) {
    case 0:
        Sum += WaveActiveSum(1);
    default:
        Sum += WaveActiveSum(10);
        break;
    }
    Values[TID.x].X = Sum;
}
If the buffer Values refers to is initialized to [ 0, 0, 1, 2 ], what is the buffer’s value when the shader completes?
A. [ 42, 42, 40, 40 ]
B. [ 22, 22, 20, 20 ]
C. [ 22, 22, 10, 10 ]
D. Undefined!
Answer!
Trick question!
On DirectX, this is intended to be well-defined with the answer A, but the specification language is unclear. The documentation states:
These intrinsics are dependent on active lanes and therefore flow control. In the model of this document, implementations must enforce that the number of active lanes exactly corresponds to the programmer’s view of flow control.
There are bugs in drivers that cause this to not always be the case, because it was not rigorously tested in the HLK tests.
In SPIR-V, the OpSwitch instruction’s convergence behavior on switch fall-through cases is undefined, which would make this code undefined if it lowers to SPIR-V’s OpSwitch.
The HLSL team is tracking bugs on both DXC and Clang to avoid the use of OpSwitch.
The Slang compiler is tracking this issue as well.
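Until the compilers settle on a lowering, one way to sidestep the fall-through question in source is to express the same logic without a switch. The sketch below is a hand-written illustration, not the fix the compiler teams are pursuing. It assumes that lanes reconverge at the end of a structured if, so every lane that was active before the if takes part in the second WaveActiveSum. It also assumes all four threads land in the same wave.

```hlsl
struct Contents {
    int X;
};
RWStructuredBuffer<Contents> Values;

[numthreads(4, 1, 1)]
void main(uint3 TID : SV_DispatchThreadID) {
    uint Sum = 0;
    // Equivalent of `case 0`: only lanes whose X is 0 take part in this sum.
    if (Values[TID.x].X == 0)
        Sum += WaveActiveSum(1);
    // Equivalent of `default`: written after the if, so (under the
    // reconvergence assumption stated above) all lanes take part.
    Sum += WaveActiveSum(10);
    Values[TID.x].X = Sum;
}
```

Under those assumptions, the two lanes holding 0 add 2 and then 40, and the other two lanes add only 40, which matches answer A.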
A second example that becomes even more problematic is something like:
struct Contents {
    int X;
};
RWStructuredBuffer<Contents> Values;
groupshared int Reduction;
[numthreads(4, 1, 1)]
void main(uint3 TID : SV_DispatchThreadID) {
    if (WaveIsFirstLane())
        Reduction = 0;
    switch (Values[TID.x].X) {
    case 0:
        Reduction += WaveActiveSum(1);
    default:
        Reduction += WaveActiveSum(Reduction);
        GroupMemoryBarrierWithGroupSync();
        break;
    }
    Values[TID.x].X = Reduction;
}
In this case, under SPIR-V, even though all threads enter the default label, control flow is not guaranteed to be uniform. This means that the group barrier’s behavior is undefined and may cause the shader to deadlock or terminate unexpectedly.