Propagate binding correlation information to LLVM codegen for mutable binding fusion#23184
Propagate binding correlation information to LLVM codegen for mutable binding fusion#23184kimm240 wants to merge 21 commits intoiree-org:mainfrom
Conversation
krzysz00
left a comment
There was a problem hiding this comment.
Is this going to be stacked on top of another PR that implements these TODOs? Otherwise, I figure gesturing at a feature that doesn't quite exist doesn't really make sense
| if (auto correlationAttr = funcOp.getArgAttrOfType<ArrayAttr>( | ||
| binding, "stream.binding_correlation")) { | ||
| SmallVector<int64_t> correlatedBindings; | ||
| for (auto attr : correlationAttr) { |
There was a problem hiding this comment.
Can we spell out the types here?
There was a problem hiding this comment.
... more importantly
- Can we turn this as corralatedBindings.getAsValueRange
- If you're worried about the types, you can add a dialect attribute verifier to
streamtAt'll check the format of these attributes
| if (correlationIt != bindingCorrelationMap.end() && !correlationIt->second.empty()) { | ||
| // This binding is in a correlation group. The bindings in the same | ||
| // group don't alias each other (different offsets). | ||
| // TODO: In the future, we could use alias.scope metadata to be more |
There was a problem hiding this comment.
Ah, this is where the code will go for annotating the different subspans of a binding as noasias etc.
Some of that would be made simpler by something like https://discourse.llvm.org/t/n-ary-separate-objects-non-argument-noalias-groups/88420 landing.
I can also flag #22236 as an interesting related piece of work
There was a problem hiding this comment.
Thanks for the guidance! I'm very grateful for the high-level context and the links you shared.
Regarding the TODOs: I agree that gesturing at features that don't yet exist can be confusing. I've removed those comments in commit faf30dd.
However, seeing the discussions in the LLVM community you shared, I’m open to exploring the implementation of that.
MaheshRavishankar
left a comment
There was a problem hiding this comment.
Not reviewed the code yet, but I think for the code that lowers to LLVM GPU, we should be OK just adding noalias to all operands. The fact that they are the same bindings shouldnt matter, cause the generated LLVM function takes a different pointer for each argument. One or more pointers could have been set by using the same base but different offsets, but LLVM shouldnt care about this.
|
@MaheshRavishankar If we can guarantee that IREE's high-level flow/stream always produces non-overlapping subspans for separate bindings, then simply applying noalias to all operands would be simpler. |
Yes. It is guaranteed. |
|
Two subspans into the same binding may overlap, distinct bindings do not |
9ff7ddc to
207ad2e
Compare
|
I completely agree that applying a blanket noalias to all pointer arguments is a simpler approach and works effectively under IREE's current guarantee that distinct bindings never overlap. However, I have preserved the correlation tracking logic in this PR to prepare for the following future scenarios: As discussed in the LLVM forum thread, the next step for maximizing GPU performance is moving beyond function attributes to granular While distinct bindings don't overlap, subspans within the same binding might. The correlation logic is the only way to distinguish these cases. By keeping this infrastructure, we can safely optimize instructions even when multiple subspans are fused into a single binding argument. Relying on a global "no-overlap" guarantee is fine for now, but explicit correlation makes the codegen much more resilient to future changes in resource management or binding fusion strategies. I’d appreciate it if we could keep this logic as a foundation for upcoming metadata work. |
Are you sure? Two subspans into the same binding I think do not overlap. Thats how the combine bindings in Stream works. @benvanik could you confirm that in a dispatch the binding subspans do not overlap. |
I remember debugging an elementwise dispatch that had two inputs coming from the same buffer at different base offsets. |
|
Ok, maybe I am getting confused. I haven't looked at in a while. The subspans are independent, that is guaranteed. Whether the offset is added to the base inside the kernel or outside. I don't remember now. Would be good to check |
|
... the subspan condition is actually slightly weaker, per a bunch of previous discussions I had with @benvanik on the So multiple read-only subspans aliasing each other is allowed, though one could probably get away with |
Well, then if there is a way to say the reads dont alias with the writes, thats enough I guess. I am not blocking on this PR... but the logic seems more complicated, and I thought we could just use some existing gaurantees. I am not reviewing this actively. If @kuhar and @krzysz00 are fine with this, it works for me. |
|
Hi all! Regarding the previous discussion, as @kuhar and @krzysz00 mentioned, read-only subspans can indeed overlap, so I believe the current correlation tracking logic is necessary for safety and for the future alias.scope metadata work. If you think any further refinements or additional implementations are needed to better align this with IREE's architecture, please let me know. I’m more than happy to follow up and implement them. |
kuhar
left a comment
There was a problem hiding this comment.
Can we add some tests for this?
|
@kuhar Please Give me any feedback if there is an issue in test. |
| int64_t binding = subspan.getBinding().getSExtValue(); | ||
| // Try to read correlation information from the original function. | ||
| // Note: This assumes the correlation attribute was preserved through | ||
| // Stream → HAL conversion. If not, we'll need to preserve it explicitly. |
There was a problem hiding this comment.
nit: stick to ascii
| // Stream → HAL conversion. If not, we'll need to preserve it explicitly. | |
| // Stream -> HAL conversion. If not, we'll need to preserve it explicitly. |
| // After ConvertToNVVMPass: llvm.noalias attributes should be applied. | ||
| // LLVM-LABEL: llvm.func @dispatch | ||
| // LLVM-SAME: (%[[LLVM_ARG0:.+]]: !llvm.ptr {llvm.align = 16 : i32, llvm.noalias, llvm.nonnull, llvm.noundef}, | ||
| // LLVM-SAME: %[[LLVM_ARG1:.+]]: !llvm.ptr {llvm.align = 16 : i32, llvm.noalias, llvm.nonnull, llvm.noundef} |
There was a problem hiding this comment.
I delete it for clarity.
| // 2. LLVM dialect: After ConvertToNVVMPass, llvm.noalias attributes are | ||
| // applied to function arguments |
|
Hi @kuhar @krzysz00, just a gentle ping on this PR. |
|
Can you rebase this? We plan to have a release in a couple of days, I'd hold off with merging this until it's out. |
|
Hi @kuhar, thank you for the update. I have already rebased this PR onto the latest main. I'm happy to wait until the upcoming release is out. Thanks again for the review! |
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
…tchBindings.cpp Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
…tchBindings.cpp Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
|
Hi @kuhar! As requested, I've rebased the PR onto the latest main branch again. Please let me know when you have a chance to take a look. Thanks! |
|
@kuhar @krzysz00 @bangtianliu If this is confirmed as a correctness bug, we likely need to decide whether this is:
I can’t reliably reproduce this exact test locally in my current environment, so I can’t confidently conclude which direction is correct yet. Guidance on the preferred direction would be greatly appreciated. |
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
|
Change FfileCheck for this pr |
Problem
When
--iree-stream-resource-alias-mutable-bindings=trueis enabled, theFuseDispatchBindingspass fuses multiple bindings that refer to the same underlying storage. However, the correlation information about which bindings alias or don't alias was not propagated to LLVM code generation, causing LLVM's alias analysis to be overly conservative. This prevents optimizations like vectorization, especially in cases with multiple in-place operands (e.g., thesort_3dtest case with two in-place operands of different types).This addresses issue #7737.
Solution
We propagate binding correlation information from the Stream dialect through to LLVM code generation by:
Storing correlation attributes in Stream dialect: The
FuseDispatchBindingspass now attachesstream.binding_correlationandstream.binding_noaliasattributes to function arguments, indicating which bindings are in the same correlation group (same resource, different offsets) or point to different resources.Reading attributes in LLVM codegen: The
ConvertFuncclass inLLVMGPU/ConvertToLLVM.cppreads these attributes and applies LLVMnoaliasattributes to function arguments, enabling LLVM's alias analysis to make better optimization decisions.Implementation Details
FuseDispatchBindings Pass (
FuseDispatchBindings.cpp)In
updateExecutableSignature, we now generate two types of attributes:stream.binding_correlation: Indicates bindings in the same correlation group (same resource, different offsets - they don't alias each other)stream.binding_noalias: Indicates bindings pointing to different resources (guaranteed not to alias)LLVMGPU ConvertToLLVM (
LLVMGPU/ConvertToLLVM.cpp)The
ConvertFuncclass reads the correlation attributes from the original function and applies LLVMnoaliasattributes:Verification
The implementation has been verified with the
sort_3dtest case:IR was made by
../iree-build/tools/iree-compile --iree-hal-target-device=cuda --iree-cuda-target=sm_86 --mlir-print-ir-after-all test.mlir -o test.vmfb 2> ir_dump.log.test.mliris it. which is@sort_3dtest indispatch_linalg_on_tensors.mlir:Stream dialect:
stream.binding_noaliasattributes are correctly generated:FuseDispatchBindingsPass.LLVM dialect:
llvm.noaliasattributes are correctly applied:ConvertToNVVMPass.Future Work