Skip to content

Propagate binding correlation information to LLVM codegen for mutable binding fusion#23184

Open
kimm240 wants to merge 21 commits intoiree-org:mainfrom
kimm240:kimm240/alias
Open

Propagate binding correlation information to LLVM codegen for mutable binding fusion#23184
kimm240 wants to merge 21 commits intoiree-org:mainfrom
kimm240:kimm240/alias

Conversation

@kimm240
Copy link
Copy Markdown
Contributor

@kimm240 kimm240 commented Jan 19, 2026

Problem

When --iree-stream-resource-alias-mutable-bindings=true is enabled, the FuseDispatchBindings pass fuses multiple bindings that refer to the same underlying storage. However, the correlation information about which bindings alias or don't alias was not propagated to LLVM code generation, causing LLVM's alias analysis to be overly conservative. This prevents optimizations like vectorization, especially in cases with multiple in-place operands (e.g., the sort_3d test case with two in-place operands of different types).

This addresses issue #7737.

Solution

We propagate binding correlation information from the Stream dialect through to LLVM code generation by:

  1. Storing correlation attributes in Stream dialect: The FuseDispatchBindings pass now attaches stream.binding_correlation and stream.binding_noalias attributes to function arguments, indicating which bindings are in the same correlation group (same resource, different offsets) or point to different resources.

  2. Reading attributes in LLVM codegen: The ConvertFunc class in LLVMGPU/ConvertToLLVM.cpp reads these attributes and applies LLVM noalias attributes to function arguments, enabling LLVM's alias analysis to make better optimization decisions.

Implementation Details

FuseDispatchBindings Pass (FuseDispatchBindings.cpp)

In updateExecutableSignature, we now generate two types of attributes:

  • stream.binding_correlation: Indicates bindings in the same correlation group (same resource, different offsets - they don't alias each other)
  • stream.binding_noalias: Indicates bindings pointing to different resources (guaranteed not to alias)
// For bindings in different correlation groups (different resources)
if (binding.value().correlationMap != otherBinding.value().correlationMap) {
  noaliasIndices.push_back(IntegerAttr::get(..., otherBinding.index()));
}
if (!noaliasIndices.empty()) {
  funcOp.setArgAttr(bindingArg.getArgNumber(), "stream.binding_noalias",
                   ArrayAttr::get(funcOp.getContext(), noaliasIndices));
}

LLVMGPU ConvertToLLVM (LLVMGPU/ConvertToLLVM.cpp)

The ConvertFunc class reads the correlation attributes from the original function and applies LLVM noalias attributes:

// Read noalias information
if (auto noaliasAttr = funcOp.getArgAttrOfType<ArrayAttr>(
        binding, "stream.binding_noalias")) {
  // ... process and store ...
}

// Apply LLVM noalias attribute
newFuncOp.setArgAttr(idx, LLVM::LLVMDialect::getNoAliasAttrName(), unit);

Verification

The implementation has been verified with the sort_3d test case:

IR was made by
../iree-build/tools/iree-compile --iree-hal-target-device=cuda --iree-cuda-target=sm_86 --mlir-print-ir-after-all test.mlir -o test.vmfb 2> ir_dump.log.

test.mlir is it. which is @sort_3d test in dispatch_linalg_on_tensors.mlir :

util.func public @sort_3d(%arg0: tensor<?x?x?xi32>, %arg1 : tensor<?x?x?xf32>)  
    -> (tensor<?x?x?xi32>, tensor<?x?x?xf32>) {  
  %0, %1 = iree_linalg_ext.sort dimension(0)  
      outs(%arg0, %arg1 : tensor<?x?x?xi32>, tensor<?x?x?xf32>) {  
      ^bb0(%arg2: i32, %arg3: i32, %arg4 : f32, %arg5 : f32):  
        %2 = arith.cmpf ogt, %arg4, %arg5 : f32  
        iree_linalg_ext.yield %2 : i1  
      } -> tensor<?x?x?xi32>, tensor<?x?x?xf32>  
  util.return %0, %1 : tensor<?x?x?xi32>, tensor<?x?x?xf32>  
}    
  1. Stream dialect: stream.binding_noalias attributes are correctly generated:

    func.func @dispatch(
      %arg0: !stream.binding {stream.binding_noalias = [1 : i32]},
      %arg1: !stream.binding {stream.binding_noalias = [0 : i32]}
    )
    • It is in after FuseDispatchBindingsPass.
  2. LLVM dialect: llvm.noalias attributes are correctly applied:

    llvm.func @dispatch(
      %arg0: !llvm.ptr<1> {llvm.noalias, llvm.nonnull, llvm.noundef},
      %arg1: !llvm.ptr<1> {llvm.noalias, llvm.nonnull, llvm.noundef}
    )
    • It is in after ConvertToNVVMPass.

Future Work

  • Implement similar support for LLVMCPU backend (currently only LLVMGPU is supported)
  • Consider using alias.scope metadata for more precise alias information (as mentioned in the TODO comments)

Comment thread compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispatchBindings.cpp Outdated
Copy link
Copy Markdown
Contributor

@krzysz00 krzysz00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to be stacked on top of another PR that implements these TODOs? Otherwise, I figure gesturing at a feature that doesn't quite exist doesn't really make sense

if (auto correlationAttr = funcOp.getArgAttrOfType<ArrayAttr>(
binding, "stream.binding_correlation")) {
SmallVector<int64_t> correlatedBindings;
for (auto attr : correlationAttr) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we spell out the types here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... more importantly

  1. Can we turn this as corralatedBindings.getAsValueRange
  2. If you're worried about the types, you can add a dialect attribute verifier to stream tAt'll check the format of these attributes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fix it in commit commit 0638ea0.

if (correlationIt != bindingCorrelationMap.end() && !correlationIt->second.empty()) {
// This binding is in a correlation group. The bindings in the same
// group don't alias each other (different offsets).
// TODO: In the future, we could use alias.scope metadata to be more
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is where the code will go for annotating the different subspans of a binding as noasias etc.

Some of that would be made simpler by something like https://discourse.llvm.org/t/n-ary-separate-objects-non-argument-noalias-groups/88420 landing.

I can also flag #22236 as an interesting related piece of work

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the guidance! I'm very grateful for the high-level context and the links you shared.

Regarding the TODOs: I agree that gesturing at features that don't yet exist can be confusing. I've removed those comments in commit faf30dd.

However, seeing the discussions in the LLVM community you shared, I’m open to exploring the implementation of that.

Copy link
Copy Markdown
Collaborator

@MaheshRavishankar MaheshRavishankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not reviewed the code yet, but I think for the code that lowers to LLVM GPU, we should be OK just adding noalias to all operands. The fact that they are the same bindings shouldnt matter, cause the generated LLVM function takes a different pointer for each argument. One or more pointers could have been set by using the same base but different offsets, but LLVM shouldnt care about this.

Comment thread compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp Outdated
Comment thread compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispatchBindings.cpp Outdated
Comment thread compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispatchBindings.cpp Outdated
Comment thread compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispatchBindings.cpp Outdated
Comment thread compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispatchBindings.cpp Outdated
Comment thread compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispatchBindings.cpp Outdated
Comment thread compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispatchBindings.cpp Outdated
Comment thread compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispatchBindings.cpp Outdated
@kimm240
Copy link
Copy Markdown
Contributor Author

kimm240 commented Jan 20, 2026

@MaheshRavishankar
The reason I implemented the correlation tracking is to be conservative and safe. While most GPU dispatch arguments don't overlap, I was concerned about cases where multiple bindings might point to the exact same resource (true aliasing).

If we can guarantee that IREE's high-level flow/stream always produces non-overlapping subspans for separate bindings, then simply applying noalias to all operands would be simpler.

@MaheshRavishankar
Copy link
Copy Markdown
Collaborator

@MaheshRavishankar
The reason I implemented the correlation tracking is to be conservative and safe. While most GPU dispatch arguments don't overlap, I was concerned about cases where multiple bindings might point to the exact same resource (true aliasing).

If we can guarantee that IREE's high-level flow/stream always produces non-overlapping subspans for separate bindings, then simply applying noalias to all operands would be simpler.

Yes. It is guaranteed.

@kuhar
Copy link
Copy Markdown
Member

kuhar commented Jan 20, 2026

Two subspans into the same binding may overlap, distinct bindings do not

@kimm240 kimm240 force-pushed the kimm240/alias branch 2 times, most recently from 9ff7ddc to 207ad2e Compare January 20, 2026 05:18
@kimm240
Copy link
Copy Markdown
Contributor Author

kimm240 commented Jan 20, 2026

I completely agree that applying a blanket noalias to all pointer arguments is a simpler approach and works effectively under IREE's current guarantee that distinct bindings never overlap.

However, I have preserved the correlation tracking logic in this PR to prepare for the following future scenarios:
Foundation for alias.scope Metadata:

As discussed in the LLVM forum thread, the next step for maximizing GPU performance is moving beyond function attributes to granular !alias.scope and !noalias metadata. And My implementation provides the exact 'pointer grouping' information required to emit this metadata.

While distinct bindings don't overlap, subspans within the same binding might. The correlation logic is the only way to distinguish these cases. By keeping this infrastructure, we can safely optimize instructions even when multiple subspans are fused into a single binding argument.

Relying on a global "no-overlap" guarantee is fine for now, but explicit correlation makes the codegen much more resilient to future changes in resource management or binding fusion strategies.

I’d appreciate it if we could keep this logic as a foundation for upcoming metadata work.

@MaheshRavishankar
Copy link
Copy Markdown
Collaborator

Two subspans into the same binding may overlap, distinct bindings do not

Are you sure? Two subspans into the same binding I think do not overlap. Thats how the combine bindings in Stream works. @benvanik could you confirm that in a dispatch the binding subspans do not overlap.

@kuhar
Copy link
Copy Markdown
Member

kuhar commented Jan 20, 2026

Two subspans into the same binding may overlap, distinct bindings do not

Are you sure? Two subspans into the same binding I think do not overlap.

I remember debugging an elementwise dispatch that had two inputs coming from the same buffer at different base offsets.

@MaheshRavishankar
Copy link
Copy Markdown
Collaborator

Ok, maybe I am getting confused. I haven't looked at in a while. The subspans are independent, that is guaranteed. Whether the offset is added to the base inside the kernel or outside. I don't remember now. Would be good to check

@krzysz00
Copy link
Copy Markdown
Contributor

... the subspan condition is actually slightly weaker, per a bunch of previous discussions I had with @benvanik on the writeonly stuff. The summary of that is that, as best as I can recall, we obey Rust rules here: a given region of memory can either be covered by N immutable subspans (including 0) or exactly 1 mutable subspan.

So multiple read-only subspans aliasing each other is allowed, though one could probably get away with noaliasing them anyway.

@MaheshRavishankar
Copy link
Copy Markdown
Collaborator

... the subspan condition is actually slightly weaker, per a bunch of previous discussions I had with @benvanik on the writeonly stuff. The summary of that is that, as best as I can recall, we obey Rust rules here: a given region of memory can either be covered by N immutable subspans (including 0) or exactly 1 mutable subspan.

So multiple read-only subspans aliasing each other is allowed, though one could probably get away with noaliasing them anyway.

Well, then if there is a way to say the reads dont alias with the writes, thats enough I guess. I am not blocking on this PR... but the logic seems more complicated, and I thought we could just use some existing gaurantees. I am not reviewing this actively. If @kuhar and @krzysz00 are fine with this, it works for me.

@kimm240
Copy link
Copy Markdown
Contributor Author

kimm240 commented Feb 4, 2026

Hi all! Regarding the previous discussion, as @kuhar and @krzysz00 mentioned, read-only subspans can indeed overlap, so I believe the current correlation tracking logic is necessary for safety and for the future alias.scope metadata work.

If you think any further refinements or additional implementations are needed to better align this with IREE's architecture, please let me know. I’m more than happy to follow up and implement them.

Copy link
Copy Markdown
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some tests for this?

Comment thread compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp Outdated
@kimm240
Copy link
Copy Markdown
Contributor Author

kimm240 commented Feb 4, 2026

@kuhar
This test confirms that FuseDispatchBindings correctly generates stream.binding_noalias when distinct resources are allocated and alias_mutable_bindings is set to false. It ensures the analysis logic properly identifies non-aliasing bindings at the Stream level.

Please Give me any feedback if there is an issue in test.

Comment thread compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_nvvm_noalias_e2e.mlir Outdated
int64_t binding = subspan.getBinding().getSExtValue();
// Try to read correlation information from the original function.
// Note: This assumes the correlation attribute was preserved through
// Stream → HAL conversion. If not, we'll need to preserve it explicitly.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: stick to ascii

Suggested change
// Stream HAL conversion. If not, we'll need to preserve it explicitly.
// Stream -> HAL conversion. If not, we'll need to preserve it explicitly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fix it in commit ca20bad

Comment on lines +84 to +87
// After ConvertToNVVMPass: llvm.noalias attributes should be applied.
// LLVM-LABEL: llvm.func @dispatch
// LLVM-SAME: (%[[LLVM_ARG0:.+]]: !llvm.ptr {llvm.align = 16 : i32, llvm.noalias, llvm.nonnull, llvm.noundef},
// LLVM-SAME: %[[LLVM_ARG1:.+]]: !llvm.ptr {llvm.align = 16 : i32, llvm.noalias, llvm.nonnull, llvm.noundef}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are never checked

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I delete it for clarity.

Comment on lines +12 to +13
// 2. LLVM dialect: After ConvertToNVVMPass, llvm.noalias attributes are
// applied to function arguments
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is missing from the test

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kuhar
I add test for LLVM dialect in commit 6c07283

@kimm240
Copy link
Copy Markdown
Contributor Author

kimm240 commented Mar 16, 2026

Hi @kuhar @krzysz00, just a gentle ping on this PR.
I addressed all @kuhar 's previous feedback, including adding the requested NVVM/LLVM dialect tests and fixing the ascii comments, back in early February.
Could you please take another look when you have a chance? Please let me know if there are any further refinements needed. Thanks for your time!

@kuhar
Copy link
Copy Markdown
Member

kuhar commented Mar 16, 2026

Can you rebase this? We plan to have a release in a couple of days, I'd hold off with merging this until it's out.

@kimm240
Copy link
Copy Markdown
Contributor Author

kimm240 commented Mar 17, 2026

Hi @kuhar, thank you for the update. I have already rebased this PR onto the latest main. I'm happy to wait until the upcoming release is out. Thanks again for the review!

Hyun Gyu Kim and others added 20 commits April 9, 2026 11:22
1
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
2
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
…tchBindings.cpp

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
…tchBindings.cpp

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
@kimm240
Copy link
Copy Markdown
Contributor Author

kimm240 commented Apr 9, 2026

Hi @kuhar! As requested, I've rebased the PR onto the latest main branch again. Please let me know when you have a chance to take a look. Thanks!

@kimm240
Copy link
Copy Markdown
Contributor Author

kimm240 commented Apr 14, 2026

@kuhar @krzysz00 @bangtianliu
Prev CI is reproducibly failing on llama_8b_fp16/quality_gfx942 with an output mismatch, so this may indicate a correctness issue. (Latest Commit may solve it, but I can't sure what it is.)

If this is confirmed as a correctness bug, we likely need to decide whether this is:

  1. acceptable drift from aliasing-related optimization (and therefore an expected/weight update case), or
  2. a logic issue that requires making noalias propagation more conservative.

I can’t reliably reproduce this exact test locally in my current environment, so I can’t confidently conclude which direction is correct yet.

Guidance on the preferred direction would be greatly appreciated.

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>
@kimm240
Copy link
Copy Markdown
Contributor Author

kimm240 commented Apr 17, 2026

Change FfileCheck for this pr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants