[Performance] Inaccurate Result on "MultiHeadAttention" with Cutlass_fmha Kernel Compared to PyTorch
The native FP16 implementation of the Fused MultiHeadAttention (FMHA) operator in onnxruntime exhibits numerical divergence from the equivalent PyTorch FP16 implementation, even when running ...
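A minimal, hypothetical sketch of how such divergence can be quantified, using NumPy as a stand-in for both backends (this is not the issue's actual repro script): run the same scaled dot-product attention once in FP32 as a reference and once in FP16, then compare the maximum absolute error.

```python
import numpy as np

def attention(q, k, v, dtype):
    # Plain scaled dot-product attention, computed in the requested precision.
    q, k, v = (x.astype(dtype) for x in (q, k, v))
    scale = np.array(1.0 / np.sqrt(q.shape[-1]), dtype=dtype)
    scores = (q @ k.swapaxes(-1, -2)) * scale
    # Numerically stable softmax: subtract the row max before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs @ v

# Hypothetical shapes: (batch, heads, seq_len, head_dim).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 8, 64, 64), dtype=np.float32) for _ in range(3))

ref = attention(q, k, v, np.float32)   # FP32 reference output
out = attention(q, k, v, np.float16)   # FP16 candidate output

abs_err = float(np.abs(ref - out.astype(np.float32)).max())
print(f"max abs error vs FP32 reference: {abs_err:.6f}")
```

In a real comparison between onnxruntime's FMHA kernel and PyTorch, both FP16 outputs would be checked against a common FP32 reference with a tolerance appropriate for half precision (e.g. via `np.allclose` with a relaxed `atol`/`rtol`), since some small error is expected from FP16 accumulation alone.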